content somehow being transcoded from utf-8 into iso-8859

Posted: Wed Oct 10, 2007 3:02 am
by Chris..S
Hi,

Problem site:  CMS 1.1.2, PHP 5.2.3, MySQL 4.1.22, Apache 1.3.37
Problem: page content has somehow become encoded with iso-8859.  Source content and page headers are all utf-8.

I set up the site on my own server (CMS 1.1.2, PHP 5.2.3, MySQL 5.0.30, Apache 2.0.59).  Everything here is set up to deliver the page as UTF-8 and the page data was entered in UTF-8.  Page data is delivered as utf-8 with utf-8 headers. Everything fine.

To transfer the site to the live server, I dumped the DB with mysqldump using default-character-set=utf8.  The resulting dump was loaded into the MySQL db using phpmyadmin on the live server.  CMS MS files were transferred with the same settings as the development site.

Lo and behold, the pages were being delivered with iso-8859 encoded text.  The response header was still correctly saying utf-8 and the meta content field was still saying utf-8.  The browser then attempted to display the text as utf-8, resulting in a mass of "?" wherever it came across an iso-8859 byte sequence.
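
To illustrate what I think the browser is running into, here is a minimal Python sketch. The sample string is made up and is not actual site content; the point is only that bytes produced by an iso-8859-1 encoder are usually not valid utf-8, so a lenient utf-8 decoder substitutes a replacement character for each of them.

```python
# Hypothetical sample text, not taken from the site.
sample = "café über"

# What the live server appears to be sending in the body.
latin1_bytes = sample.encode("iso-8859-1")

# What the utf-8 headers promise the body contains.
utf8_bytes = sample.encode("utf-8")

# A browser told "utf-8" decodes leniently and substitutes U+FFFD
# (usually drawn as "?" in a box or diamond) for each invalid byte.
as_seen_by_browser = latin1_bytes.decode("utf-8", errors="replace")

print(latin1_bytes)
print(utf8_bytes)
print(as_seen_by_browser)
```

Each accented character is a single byte in iso-8859-1 but two bytes in utf-8, which is why the mismatch only shows up on non-ascii text.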

I checked the db by dumping the content_prop table back to my computer and viewing it in a text editor set to interpret the data as utf-8.  Everything looked fine, i.e. all the non-ascii characters were displaying correctly.  I also checked "show create table content_prop" on the live server - it reported a default charset of utf-8.
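
A text editor can guess or silently repair encodings, so a byte-level check of the dump would be a stricter version of the same test. The following is only a rough sketch: the function names and the dump file name are my own placeholders, and it assumes the dump fits in memory.

```python
def classify_bytes(data: bytes) -> str:
    """Roughly classify raw bytes as 'ascii', 'utf-8', or 'not utf-8'."""
    try:
        data.decode("utf-8")  # strict decoding: raises on any invalid sequence
    except UnicodeDecodeError:
        return "not utf-8"
    # Valid utf-8 with no byte >= 0x80 is plain ascii and proves nothing
    # about how non-ascii characters are stored.
    if all(b < 0x80 for b in data):
        return "ascii"
    return "utf-8"


def classify_file(path: str) -> str:
    """Classify a whole file, read in binary mode so nothing is transcoded."""
    with open(path, "rb") as handle:
        return classify_bytes(handle.read())


# Example usage (hypothetical file name):
# print(classify_file("content_prop.sql"))
```

A result of "utf-8" would back up what the editor showed. It would not rule out double-encoded text, which is also valid utf-8, so it is a sanity check rather than proof.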

This leads me to believe the source data is still in utf-8 and that a charset conversion is occurring somewhere within CMS MS.
Can anyone tell me where that conversion could be happening and what server setting it could be picking up on?

Additional notes:
- original setting in config.php for default_encoding was empty.
- I did try explicitly setting default_encoding to "utf-8".  There was no change, i.e. iso-8859 text was still sent under utf-8 headers.
- I have temporarily fixed the problem by setting the default_encoding and the meta content charset to iso-8859-1.  Content and headers now match allowing web browsers to display the text correctly, but this is not an ideal solution.
- iconv & mbstring modules are available on both servers and have the same settings (although there is a slight difference in the iconv implementation: the dev server uses glibc with lib version 2.5, whereas the live server uses libiconv version 1.9)

Thanks in advance for any assistance.