Problems with utf8 again and again Topic is solved

General project discussion. NOT for help questions.
Sonya

Problems with utf8 again and again

Post by Sonya »

Hello,

We keep running into problems with CMSMS installations on the Russian board. The problem is that the database collation does not match the connection encoding. I know that this can be corrected by uncommenting the "set names" line in include.php, but CMSMS newbies do not.
1. They just give up on CMSMS immediately after installation because of this issue. (They say they have had no problems with other CMSs.)
2. If the browser displays the site correctly, they just start adding their content and corrupt the database by sending content over a latin1 connection to a UTF-8 database. It is possible, but very tricky, to repair the database encoding afterwards.

A possible solution: in the next release, enable the line when the database is MySQL and default_encoding='utf-8'. E.g.

Code: Select all

	
if ($cmsdb->dbtype == 'mysql' && $config['default_encoding'] == 'utf-8') {
	// MySQL calls this character set "utf8"; "utf-8" is not accepted by SET NAMES
	$cmsdb->Execute('SET NAMES utf8'); // database connection with UTF-8
} else {
	# $cmsdb->Execute('SET NAMES utf8'); // left disabled for all other setups
}
You probably do not see this as a big issue because you build your sites in English. For other character sets it really is a big problem, and it costs a lot of time to figure out and solve.

Thank you,
Sonya
vilkis

Re: Problems with utf8 again and again

Post by vilkis »

It would be nice.

Vilkis
Pierre M.

Re: Problems with utf8 again and again

Post by Pierre M. »

Hello,

Please file a bug in the forge. It is where the DevTeam tracks issues. They may miss them in the forums.
Have fun with CMSms

Pierre M.
vilkis

Re: Problems with utf8 again and again

Post by vilkis »

Further, an additional CMSMS parameter could be implemented: "forced encoding for mysql". During installation, if a MySQL database is selected, some non-Latin characters could be written to the database and read back in two ways: with SET NAMES utf8 and without it. If the two results differ, the user can be invited to turn on the "forced encoding for mysql" parameter.
This is the general discussion board, so if somebody says that this feature is necessary I will submit the request in the forge.
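
The installer check described above can be sketched roughly as follows. This is only a simulation in Python, not CMSMS code: `store`, `fetch` and `needs_forced_encoding` are hypothetical names, and the model assumes that the server converts stored text to the connection character set and replaces unconvertible characters with `?`.

```python
def store(raw: bytes, connection_charset: str) -> str:
    """Model of the server interpreting incoming bytes in the connection charset."""
    return raw.decode(connection_charset)


def fetch(stored: str, connection_charset: str) -> bytes:
    """Model of the server converting stored text to the connection charset.

    Assumption: characters that cannot be converted come back as '?'.
    """
    return stored.encode(connection_charset, errors="replace")


def needs_forced_encoding(server_default_charset: str, probe: str = "Привет") -> bool:
    """Write a non-Latin probe, read it back two ways, and compare the answers."""
    raw = probe.encode("utf-8")
    stored = store(raw, "utf-8")                      # written with SET NAMES utf8
    forced = fetch(stored, "utf-8")                   # read with SET NAMES utf8
    default = fetch(stored, server_default_charset)   # read without SET NAMES
    return forced != default


print(needs_forced_encoding("latin-1"))  # server default differs from utf8
print(needs_forced_encoding("utf-8"))    # server default is already utf8
```

If the two reads differ, the installer could suggest turning the parameter on; if they match, the server default already behaves like a UTF-8 connection.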

Vilkis
Sonya

Re: Problems with utf8 again and again

Post by Sonya »

Pierre M. wrote: Please file a bug in the forge. It is where the DevTeam tracks issues. They may miss them in the forums.
vilkis wrote: This is the general discussion board, so if somebody says that this feature is necessary I will submit the request in the forge.
@Vilkis, can you please submit my first suggestion? I do not really know how to deal with the forge  :-[
vilkis

Re: Problems with utf8 again and again

Post by vilkis »

I submitted your post under "Feature Requests" for the CMSMS core. In future you can do it yourself: sign up at http://dev.cmsmadesimple.org/signup .

Then you can log in to the forge (http://dev.cmsmadesimple.org/users). Find the appropriate project in the project list (you can search for a project by entering keywords in the search field in the top right corner of the page), click on the project name, then select the Bug Tracker or Feature Requests tab and click Submit New Bug or Submit New FR, respectively. Fill in the form and submit. That's all.

Vilkis
alby

Re: Problems with utf8 again and again

Post by alby »

Sonya wrote: we are having problems with installation of CMSMS in Russian board again and again.
.....................

Code: Select all

	
if ($cmsdb->dbtype == 'mysql' && $config['default_encoding'] == 'utf-8') {
	// MySQL calls this character set "utf8"; "utf-8" is not accepted by SET NAMES
	$cmsdb->Execute('SET NAMES utf8'); // database connection with UTF-8
} else {
	# $cmsdb->Execute('SET NAMES utf8'); // left disabled for all other setups
}

Well, I am answering in the forum to get more opinions, and this is my 2¢.

1. To include this, the requirements for CMSMS 1.x must be changed first (for example, MySQL from 3.23 to 4.1.0). Before MySQL 4.1 you only had SET CHARACTER SET cp1251_koi8 (the only allowed value, and for Eastern Europe only); yes, you could add new ones, but only by modifying sql/convert.cc and recompiling(!).

2. Unfortunately MySQL defaults to latin1_swedish_ci (I assume you are a normal user and cannot touch the my.cnf file), and the user must have the privileges and know how to change the charset.

3. There are problems with a few UTF-8 characters (very high code points) that MySQL will silently reject and for which PostgreSQL before 8.1.X raises warnings.
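
On point 3, the "very high" characters are presumably those outside the Basic Multilingual Plane: MySQL's `utf8` character set stores at most three bytes per character, while these need four. A minimal Python sketch for spotting them in a piece of text (the function name is hypothetical, and this is an illustration rather than CMSMS code):

```python
def chars_needing_four_bytes(text: str) -> list:
    """Return the characters whose UTF-8 encoding takes four bytes.

    These are the code points above U+FFFF, which do not fit into a
    three-byte-per-character column.
    """
    return [ch for ch in text if len(ch.encode("utf-8")) == 4]


print(chars_needing_four_bytes("Résumé £ Привет"))  # ordinary accented and Cyrillic text
print(chars_needing_four_bytes("clef: \U0001D11E"))   # musical G clef, U+1D11E
```

Ordinary European and Cyrillic text stays well within three bytes, so the limitation mostly affects rare symbols.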


However, I agree :)
Nowadays using utf8 is best practice in all languages (even in en_US: writing Résumé or the pound symbol £ can be a pain),
but given the items above I would prefer an option that is deactivated by default (for compatibility with all previous versions) [for info: in MLE I enable it by default]


Alby

PS: for me, the correct place for the statement is in lib/adodb.functions.php:
if ($config['dbms'] == 'sqlite')
{
	$dbinstance->Execute('PRAGMA short_column_names = 1;');
	sqlite_create_function($config['use_adodb_lite'] ? $db->connectionId : $db->_connectionID, 'now', 'time', 0);
}
//Start MLE
else if (!empty($config['default_encoding']) && $config['default_encoding'] == 'utf-8')
{
	$dbinstance->Execute("SET NAMES 'utf8'");
}
//End MLE
because there it is also valid for mysqli and PostgreSQL (but not for sqlite!) and runs immediately in the connect function
Last edited by alby on Tue Mar 10, 2009 11:09 am, edited 1 time in total.
Sonya

Re: Problems with utf8 again and again

Post by Sonya »

Just adding to the post: new releases such as 1.5.4 ship their own include.php. If you are not careful you overwrite your include.php, and "set names" is back to its default again. This can damage the database.

Would it not be logical to add a new $config variable where I can store my connection preference, so that there is no need to modify include.php after an upgrade?

Please! :)
Ted
Power Poster
Posts: 3329
Joined: Fri Jun 11, 2004 6:58 pm

Re: Problems with utf8 again and again

Post by Ted »

Sonya,

Can you do a bit of testing for me, since I can't easily recreate your environment?  I'm concerned that the set names command could screw up existing installations, so I'd like to figure out a way to solve it for all new installs instead.

Before you install CMSMS, can you create the database with:

Code: Select all

CREATE DATABASE mydb
  DEFAULT CHARACTER SET utf8
  DEFAULT COLLATE utf8_general_ci;
Then comment out the SET NAMES bit and see if it still works properly.

It might just be an issue of modifying the installation instructions.  Though, this encoding stuff in relation to how PHP and MYSQL treat it differently is not my strongest subject...  so tell me if I'm totally off-base.  :)

Thanks
Sonya

Re: Problems with utf8 again and again

Post by Sonya »

Ted wrote: I'm concerned that the set names command could screw up existing installations, so I'd like to figure out a way to solve it for all new installs instead.
Definitely, it can happen. "set names" can screw up installations where the character set was not set properly from the very beginning. It will affect databases that are already "damaged" because of a wrong "set names" command. Let's leave collation aside for now and deal only with the character set, since collation is only needed for comparison:

Code: Select all

CREATE DATABASE mydb
  DEFAULT CHARACTER SET utf8
  DEFAULT COLLATE utf8_general_ci;
This means that tables created in the database will use utf8 and utf8_general_ci by default for any character columns. But it does NOT mean that the data sent to it is in UTF-8! Applications that use the database should configure their connection to the server each time they connect; otherwise the default is used. In our case PHP has to inform the server that the incoming data is UTF-8 data. This can be done with the command "SET NAMES utf8".

If this is done, we are happy and can be sure that everything will be OK. But what if not? Some servers have another character set defined for the connection, e.g. latin1, and if no other command is given the connection will be established with this value. Now, here is what happens in this case, step by step:

   1. We declare UTF-8 for our page with the meta tag encoding and type some special characters into a text area: „üüü“.
   2. The MySQL client knows nothing about UTF-8 at that time and sends „üüü“ as „Ã¼Ã¼Ã¼“ (the latin1 reading of the UTF-8 bytes of „üüü“) to the database, with the note that latin1 data is coming.
   3. The MySQL server knows that the table uses UTF-8 as its default character set and converts the received data from latin1 to UTF-8. But it literally saves the string „Ã¼Ã¼Ã¼“ as UTF-8 data and not „üüü“.
   4. Now we want to query the table. Since our connection is still latin1, the query will demand latin1 data.
   5. The server delivers the data, but before delivering it converts the data from UTF-8 to latin1, just because the client wanted it so.
   6. The client gets „Ã¼Ã¼Ã¼“ as latin1, which is byte for byte the UTF-8 form of „üüü“, and the UTF-8 page outputs it on the display as „üüü“.

In these 6 steps you can see how corrupted data can be saved and queried "without any problems", except that it is impossible to read the data in phpMyAdmin or to sort it, and that the data is not saved properly.
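
The six steps can be replayed byte by byte. The following is a minimal Python sketch, not CMSMS or MySQL code; it assumes Python's `latin-1` codec is close enough to MySQL's latin1 for these particular characters.

```python
typed = "üüü"                         # step 1: text entered on a UTF-8 page

sent = typed.encode("utf-8")          # step 2: the page sends UTF-8 bytes...
stored = sent.decode("latin-1")       # ...but the server reads them as latin1 (step 3)
on_disk = stored.encode("utf-8")      # step 3: saved as UTF-8 text, twice as long

returned = stored.encode("latin-1")   # steps 4-5: converted back to latin1 for the client
shown = returned.decode("utf-8")      # step 6: the UTF-8 page decodes the bytes

print(len(sent), len(on_disk))        # byte counts before and after the wrong conversion
print(stored)                         # what a correctly configured tool sees in the table
print(shown == typed)                 # the site itself still looks fine
```

The round trip is lossless only as long as both writing and reading go through the same wrong connection setting, which is why switching "set names" on for an already filled database changes what users see.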

The connection is established with UTF-8 by default only if the MySQL server is configured accordingly. See in my.cnf:

Code: Select all

default-character-set=utf8
character-set-server=utf8
If default-character-set is set to latin1 (and in my case it is!), I need to tell the MySQL client to use a UTF-8 connection each time before sending and querying data.
Ted wrote: so tell me if I'm totally off-base.  :)
I tried  ;D

To solve the problem for new installation, please see the post of vilkis:
an additional CMSMS parameter could be implemented: "forced encoding for mysql". During installation, if a MySQL database is selected, some non-Latin characters could be written to the database and read back in two ways: with SET NAMES utf8 and without it. If the two results differ, the user can be invited to turn on the "forced encoding for mysql" parameter.
For me, this is the best way to do it for new installations. However, it will not suit existing "damaged" environments. Converting a database is not so trivial. ZYV knows how, but he does not tell!  ;D

I hope this helps a little.
Last edited by Sonya on Sat Apr 11, 2009 9:13 pm, edited 1 time in total.
Ted

Re: Problems with utf8 again and again

Post by Ted »

Ok, so I guess the way around this then is to have a config.php variable.  Have it default to false on upgrade and true on a new installation.  1.6 is going to be in beta for a bit anyway, so it's worth trying to do this in my opinion.
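
A rough sketch of that rule, written in Python only to show the decision logic. The key name `set_names` and both function names are hypothetical and not confirmed by this thread; `dbms` is the config key that appears in alby's snippet.

```python
def default_set_names(is_new_install: bool) -> bool:
    """Default for the setting: true on a new installation, false on upgrade."""
    return is_new_install


def connection_setup(config: dict) -> list:
    """Return the SQL statements to run right after connecting."""
    statements = []
    # Only MySQL-type connections are considered here, and only when the
    # site owner (or the installer) has switched the setting on.
    if config.get("dbms") in ("mysql", "mysqli") and config.get("set_names", False):
        statements.append("SET NAMES 'utf8'")
    return statements


new_site = {"dbms": "mysql", "set_names": default_set_names(is_new_install=True)}
upgraded = {"dbms": "mysql", "set_names": default_set_names(is_new_install=False)}
print(connection_setup(new_site))
print(connection_setup(upgraded))
```

Because the preference lives in the config rather than in include.php, replacing include.php during an upgrade would no longer change the connection encoding.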
Sonya

Re: Problems with utf8 again and again

Post by Sonya »

Ted wrote: Ok, so I guess the way around this then is to have a config.php variable.  Have it default to false on upgrade and true on a new installation.  1.6 is going to be in beta for a bit anyway, so it's worth trying to do this in my opinion.
This would be a genius solution! I can't wait for 1.6. If it is possible to set the variable in the installation program at Step 6, where the database connection settings are made, it will make many, many, many non-English-speaking people happy :)
Ted

Re: Problems with utf8 again and again

Post by Ted »

I've pushed this change into the 1.6 code.  Sonya, when beta time comes around, please give it a test -- both on upgrade and new install.

Thanks!
Sonya

Re: Problems with utf8 again and again

Post by Sonya »

Ted wrote: I've pushed this change into the 1.6 code.  Sonya, when beta time comes around, please give it a test -- both on upgrade and new install.

Thanks!
Will be done :)