All times are UTC




Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 9 posts ] 
Author Message
 Post subject: Question about the transform sql
Posted: Sun Apr 20, 2008 11:27 am
Forum Members

Joined: Wed Mar 26, 2008 1:49 am
Posts: 233
Location: Stuttgart / Germany
Hi Alby,

one additional question:
Why did you set the charset for the new columns explicitly to 'utf8'?

At the moment I'm using the above example sql to transform cmsms to mle.

(Of course, it would be a good idea to really convert the db to utf8.)

Regards,
Carsten


Attachments:
transform_cmsms_to_mle.sql.txt [3.28 KiB]
Downloaded 125 times
transform_mle_to_extended.sql.txt [1.62 KiB]
Downloaded 105 times
 
 Post subject: Re: Question about the transform sql
Posted: Mon Apr 21, 2008 7:15 am
Support Guru

Joined: Mon Jul 04, 2005 5:12 pm
Posts: 4812
Location: Ferrara, Italy
Wiedmann wrote:
Why did you set the charset for the new columns explicitly to 'utf8'?


Generally, people who run multilingual sites must use UTF-8. Setting it this way makes the db use UTF-8 instead of latin1 with Swedish collation (the MySQL default), but it is not important as long as no strange characters appear  :)

Alby

_________________
CMSMS Support Team
Italian Admin and Moderator

Plugins: Geolocate hostip, Multiple random image, Image rotator (beta), Content Pagination
Modules: ForumMadeSimple (Howto), TranslationManager
Multilingual: MLE is not CMSMS


 
 Post subject: Re: Question about the transform sql
Posted: Mon Apr 21, 2008 8:53 am
Forum Members

Joined: Wed Mar 26, 2008 1:49 am
Posts: 233
Location: Stuttgart / Germany
Quote:
Generally, people who run multilingual sites must use UTF-8

Right, and CMSMS is already using UTF-8.

But:
CMSMS itself misbehaves in a really strange way when it uses UTF-8 in databases, because it connects to the db with latin1. Every utf-8 character is stored in the db as a sequence of 1, 2 or 3 latin1 characters, regardless of whether the db is set up as latin1 or utf-8.

If you want to verify this:
DB set up with latin1:
Just make a content page with e.g. the German word "König". The word looks fine in the browser. Now look at the db with phpMyAdmin. What do you see?

In a second test, change the db to utf-8:
Now run the same test as above. What do you see? Right... the same.

In a third test:
Change the content of the UTF-8 db with phpMyAdmin and put some valid utf-8 chars in it (e.g. change König to Müller). What do you see in your browser when you access your website?
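
The three tests can be reproduced without a database. The following is only a rough Python sketch: the function names are hypothetical, no MySQL is involved, and it assumes that MySQL's latin1 behaves like ISO-8859-1 for these characters (in reality it is closer to cp1252).

```python
# Toy model only: no database involved, and "latin-1" stands in for MySQL's latin1.

def stored_bytes(text, connection_charset, column_charset):
    """Bytes that land in a column when the application sends UTF-8 over a
    connection that the server believes is `connection_charset`."""
    wire = text.encode("utf-8")                    # what the application sends
    server_view = wire.decode(connection_charset)  # how the server reads those bytes
    return server_view.encode(column_charset)      # converted to the column charset


def page_bytes(column_text, connection_charset):
    """Bytes the application receives for text that is stored correctly."""
    return column_text.encode(connection_charset)


# Tests 1 and 2: the page contains "König", the connection is latin1.
in_latin1_column = stored_bytes("König", "latin-1", "latin-1")
in_utf8_column = stored_bytes("König", "latin-1", "utf-8")

print(in_latin1_column)  # raw UTF-8 bytes, stored as if they were latin1 characters
print(in_utf8_column)    # every byte of "ö" re-encoded once more

# Test 3: "Müller" is stored correctly via phpMyAdmin, read over the latin1
# connection and sent to a browser that was told the page is UTF-8.
browser_view = page_bytes("Müller", "latin-1").decode("utf-8", errors="replace")
```

In this model a correct client decoding the utf-8 column sees "KÃ¶nig" instead of "König", and in the third test the browser gets a lone latin1 byte that is not valid UTF-8, so it shows a replacement character in place of the "ü".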

At the moment it makes no sense to set only these (MLE)  columns to utf-8.


Of course, the way CMSMS is using charsets in the db should be corrected:
- you can't really change utf-8 content with a third-party program like phpMyAdmin.
- if the server's default charset is utf-8 when you install CMSMS, not all indexes are created, and you get no error message about it.
- And of course, no natural-language sorting in the db will work at the moment.

(I think most of the cmsms devs are us-(ascii) people. And thus they don't have this problem ;-) )


Last edited by Wiedmann on Mon Apr 21, 2008 8:55 am, edited 1 time in total.

 
 Post subject: Re: Question about the transform sql
Posted: Mon Apr 21, 2008 9:35 am
Support Guru

Joined: Mon Jul 04, 2005 5:12 pm
Posts: 4812
Location: Ferrara, Italy
Wiedmann wrote:
Of course, the way CMSMS is using charsets in the db should be corrected:
- you can't really change utf-8 content with a third-party program like phpMyAdmin.
- if the server's default charset is utf-8 when you install CMSMS, not all indexes are created, and you get no error message about it.
- And of course, no natural-language sorting in the db will work at the moment.

(I think most of the cmsms devs are us-(ascii) people. And thus they don't have this problem ;-) )


Yes (most of the cmsms devs are us-(ascii)) and no (you can really change utf-8 content with a third-party program),
but there are many related problems ... the most important one is intrinsic to PHP: everything defaults to ISO-8859-1, not UTF-8.

In one of my scripts I must use htmlspecialchars($query, ENT_QUOTES, 'UTF-8') when inserting into or displaying from the DB.
If you then use an editor that translates characters into HTML entities ...  :o

UTF-8 is not a panacea and should be evaluated case by case. What if a person does not want utf8 because, for example, they use ISO-8859-15 languages?

Alby

 
 Post subject: Re: Question about the transform sql
Posted: Mon Apr 21, 2008 10:47 am
Forum Members

Joined: Wed Mar 26, 2008 1:49 am
Posts: 233
Location: Stuttgart / Germany
Quote:
and no (you can really change utf-8 content with a third-party program) ... the most important one is intrinsic to PHP: everything defaults to ISO-8859-1, not UTF-8

PHP is not the problem. The problem is that CMSMS does not use "SET NAMES utf8" on the db connection, although it uses utf8 for input and output.

Quote:
UTF-8 is not a panacea and should be evaluated case by case. What if a person does not want utf8 because, for example, they use ISO-8859-15 languages?

That's not a problem. With a correct "SET NAMES", the db makes sure the client gets the correct characters. Thus you can output iso-8859-15 to the client but store the data in utf-8 tables. Or output utf-8 but store the text in columns that have a different charset for each language.
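
The separation between storage charset and client charset can be illustrated outside the database. This is a small Python sketch of the conversion idea only; `deliver` is a hypothetical helper, and it assumes the database actually offers the charset the client asks for.

```python
def deliver(stored_utf8: bytes, client_charset: str) -> bytes:
    """Model of the server converting utf-8 column data into the charset
    the client announced for the connection."""
    return stored_utf8.decode("utf-8").encode(client_charset)


stored = "König, 5 €".encode("utf-8")  # as kept in a utf-8 table

to_iso885915_client = deliver(stored, "iso-8859-15")
to_utf8_client = deliver(stored, "utf-8")

print(to_iso885915_client)  # one byte per character
print(to_utf8_client)       # multi-byte sequences for "ö" and "€"
```

The stored data never changes; only the bytes handed to each client differ, which is why the choice of output charset does not have to dictate the table charset.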

Well, the easiest thing really is to use utf-8 for both output and db. And after a small test, this works with cmsms... Of course, just setting the db charset to utf-8 is not enough if you convert a working cmsms. In that case you must first convert the wrong latin1(utf-8) characters to real utf-8 characters. But with a little script this is an easy thing ;-)
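
The core of such a conversion is a single round trip per value. The sketch below is in Python and is not the script discussed in this thread; `repair` is a hypothetical name, and it assumes the damage is exactly one round of "UTF-8 bytes read as ISO-8859-1".

```python
def repair(value: str) -> str:
    """Undo one round of 'UTF-8 bytes read as latin1'.

    Values that do not look double-encoded are returned unchanged.
    """
    try:
        # Turn the latin1 characters back into the original bytes,
        # then read those bytes as the UTF-8 they always were.
        return value.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return value


damaged = "König".encode("utf-8").decode("latin-1")  # what a correct client sees in the db
fixed = repair(damaged)
untouched = repair("König")  # already correct text is left alone
```

A real conversion would also have to deal with cp1252-specific bytes and with text that is legitimately made of latin1 characters that happen to form valid UTF-8, so a backup before running anything like this is essential.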

Quote:
In one of my scripts I must use htmlspecialchars($query, ENT_QUOTES, 'UTF-8')

Isn't that normal anyway? IMHO that's a basic thing when you output text from a db... (BTW: AFAIK there is a cmsms internal function for this.) (Of course, no one should really store entities in the db...)



(But in summary: CMSMS MLE also works if the columns are latin1 or whatever, because cmsms doesn't use charsets in the db.)


 
 Post subject: Re: Question about the transform sql
Posted: Mon Apr 21, 2008 8:51 pm
Support Guru

Joined: Mon Jul 04, 2005 5:12 pm
Posts: 4812
Location: Ferrara, Italy
Wiedmann wrote:
Quote:
and no (you can really change utf-8 content with a third-party program) ... the most important one is intrinsic to PHP: everything defaults to ISO-8859-1, not UTF-8

PHP is not the problem. The problem is that CMSMS does not use "SET NAMES utf8" on the db connection, although it uses utf8 for input and output.


I meant that PHP is a problem for general installations. PHP 5 is OK, but PHP 4 has problems (a mysql_fetch bug in a few 4.4 versions)


Wiedmann wrote:
Quote:
UTF-8 is not a panacea and should be evaluated case by case. What if a person does not want utf8 because, for example, they use ISO-8859-15 languages?

That's not a problem. With a correct "SET NAMES", the db makes sure the client gets the correct characters. Thus you can output iso-8859-15 to the client but store the data in utf-8 tables. Or output utf-8 but store the text in columns that have a different charset for each language.

Well, the easiest thing really is to use utf-8 for both output and db. And after a small test, this works with cmsms... Of course, just setting the db charset to utf-8 is not enough if you convert a working cmsms. In that case you must first convert the wrong latin1(utf-8) characters to real utf-8 characters. But with a little script this is an easy thing ;-)


Again, for general installations, "SET NAMES" (if you want it, uncomment the query in include.php) means you must have MySQL 4.1 for it to work well, but until a month ago I had two MLE sites on MySQL 3.23.58!!


Wiedmann wrote:
Quote:
In one of my scripts I must use htmlspecialchars($query, ENT_QUOTES, 'UTF-8')

Isn't that normal anyway? IMHO that's a basic thing when you output text from a db... (BTW: AFAIK there is a cmsms internal function for this.) (Of course, no one should really store entities in the db...)

(But in summary: CMSMS MLE also works if the columns are latin1 or whatever, because cmsms doesn't use charsets in the db.)


That was a bad example on my part, but my point was that you will still have problems with the WYSIWYG editor.
My personal experience is:

- If I have access to mysql/apache resources:
- my.cnf:
  [mysqld]
  collation_server=utf8_unicode_ci
  character_set_server=utf8

  [client]
  default-character-set=utf8

- htaccess or httpd.conf:
  AddDefaultCharset UTF-8

- No access to mysql/apache resources: php query:
  SET NAMES utf8 or SET CHARACTER SET utf8

- database/table/columns text: UTF8
- header('Content-Type: text/html; charset=utf-8');

Alby

 
 Post subject: Re: Question about the transform sql
Posted: Mon Apr 21, 2008 9:51 pm
Forum Members

Joined: Wed Mar 26, 2008 1:49 am
Posts: 233
Location: Stuttgart / Germany
Quote:
(a mysql_fetch bug in a few 4.4 versions)

Can you explain this in more detail?
--> I've written some of the charset code for the mysql(i) drivers for MDB2, and I don't think we have any PHP 4-specific bug reports regarding charsets.

Of course, if you store data in the db with an application like phpMyAdmin (which handles charsets correctly) and then retrieve the data in PHP with a wrong client encoding, the result is not correct.
--> The same happens with cmsms at the moment.

Quote:
Again, for general installations, "SET NAMES" (if you want it, uncomment the query in include.php) means you must have MySQL 4.1 for it to work well, but until a month ago I had two MLE sites on MySQL 3.23.58!!

That's also no problem, because MySQL < 4.1 (or SQLite 2) doesn't know anything about charsets. Just store the data as utf-8 and you can retrieve it as utf-8.

(BTW: The prerequisite for cmsms is MySQL 4.1)

You only have a problem with dbs that know about charsets. As you can see with cmsms, you now have wrong characters in the db. (Well, cmsms can handle them, but have you ever tried to back up a cmsms site on one server and restore it on a server with a different charset?)

As for enabling "SET NAMES" in include.php: just enabling it is also wrong. First you must correct all the wrong characters in the db.

BTW: "include.php" is also the wrong place. You must use "SET NAMES" in "adodb.functions.php"!
(Sorry, but IMHO the cmsms devs' db knowledge is really limited...)

Quote:
Code:
 [client]
   default-character-set=utf8

Remember: PHP doesn't use this.

Quote:
- htaccess or httpd.conf:
Code:
   AddDefaultCharset UTF-8

Most of the time that's also not a good idea, because not everything on your server (some text files, for example) is in utf-8. The best way is to set the charset only with header() and/or in your (x)html output.

Quote:
- No access to mysql/apache resources: php query:
Quote:
  SET NAMES utf8 or SET CHARACTER SET utf8

What do you mean by "no access"?

BTW:
"SET CHARACTER SET" is most of the time not what you want: it sets the connection charset to that of the default database rather than to the charset you name. "SET NAMES" (or better, mysql(i)_set_charset) is the correct way.
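
The difference between the two statements can be written down as a tiny model of the session variables. This Python sketch follows the behaviour described in the MySQL manual; the two functions are hypothetical stand-ins, not a MySQL API, and nothing here talks to a server.

```python
def set_names(session, charset):
    """Model of SET NAMES: client, connection and results all follow `charset`."""
    updated = dict(session)
    updated["character_set_client"] = charset
    updated["character_set_connection"] = charset
    updated["character_set_results"] = charset
    return updated


def set_character_set(session, charset):
    """Model of SET CHARACTER SET: client and results follow `charset`, but the
    connection charset is taken from the default database instead."""
    updated = dict(session)
    updated["character_set_client"] = charset
    updated["character_set_results"] = charset
    updated["character_set_connection"] = session["character_set_database"]
    return updated


# A session on a server whose default database is latin1 (assumed values).
start = {
    "character_set_client": "latin1",
    "character_set_connection": "latin1",
    "character_set_results": "latin1",
    "character_set_database": "latin1",
}

after_names = set_names(start, "utf8")
after_charset = set_character_set(start, "utf8")

print(after_names["character_set_connection"])
print(after_charset["character_set_connection"])
```

With a latin1 default database, only the first variant ends up with a utf8 connection charset, which is the practical reason to prefer "SET NAMES" or the driver's own set-charset call.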

Quote:
- database/table/columns text: UTF8
- header('Content-Type: text/html; charset=utf-8');

That's what I have too. But as I've written above:
For the DB this only makes sense if you use "SET NAMES 'utf8'", also correct the wrong characters in the db, and adjust one index.
If you don't use "SET NAMES", better use "latin1" in the db (and save space).

That works without problems for base cmsms. If you install additional modules, you can have a problem if only your tables have utf8 as default charset but the db does not.


Last edited by Wiedmann on Tue Apr 22, 2008 5:01 am, edited 1 time in total.

 
 Post subject: Re: Question about the transform sql
Posted: Tue Apr 22, 2008 9:22 am
Support Guru

Joined: Mon Jul 04, 2005 5:12 pm
Posts: 4812
Location: Ferrara, Italy
Wiedmann wrote:
Quote:
(a mysql_fetch bug in a few 4.4 versions)

Can you explain this in more detail?
--> I've written some of the charset code for the mysql(i) drivers for MDB2, and I don't think we have any PHP 4-specific bug reports regarding charsets.


WOW :)
I remember a problem but I could not find it now, I only found this


Wiedmann wrote:
(BTW: The prerequisite for cmsms is MySQL 4.1)


No, the requirement for versions < 2.0 is MySQL 3.23


Wiedmann wrote:
As for enabling "SET NAMES" in include.php: just enabling it is also wrong. First you must correct all the wrong characters in the db.


I agree


Wiedmann wrote:
Quote:
Code:
 [client]
   default-character-set=utf8

Remember: PHP doesn't use this.


True; I set it because I use the mysql programs for backup/check


Wiedmann wrote:
Quote:
- No access to mysql/apache resources: php query:
Quote:
  SET NAMES utf8 or SET CHARACTER SET utf8

What do you mean by "no access"?


;D  No access to my.cnf and httpd.conf/Override None

Why don't you take a look at the 2.0 code?

Alby

 
 Post subject: Re: Question about the transform sql
Posted: Tue Apr 22, 2008 10:14 am
Forum Members

Joined: Wed Mar 26, 2008 1:49 am
Posts: 233
Location: Stuttgart / Germany
Quote:
but I could not find it now, I only found this

The old problem: a MySQL utf-8 table with correct utf-8 chars (created with phpMyAdmin or MySQL Query Browser or...), but the PHP script connects with latin1. They should read up on how MySQL handles charsets ;-)


Quote:
No, the requirement for versions < 2.0 is MySQL 3.23

Oops, my fault. I thought I had read this somewhere.

Quote:
Quote:
First you must correct all the wrong characters in the db.

I agree

If anyone (or you) is interested: I've attached a script that converts all tables correctly to utf-8.
- just put this script in a subdir of your cmsms root and execute it (browser or shell)
- after that, apply the patch to adodb.functions.php to enable utf-8 db connections.
(you don't need to change anything in "my.ini")

(Back up your db and only use this script with MySQL! ;-)  Only tested with a default installation of cmsms with sample data and some Chinese chars...)


Quote:
True; I set it because I use the mysql programs for backup/check

And I always have latin1 as client encoding, because all my shells are set up for iso-8859-1 and I want to work with the command-line clients. For backup/restore I change this with the client's options parameter.


Quote:
Why don't you take a look at the 2.0 code?

Is this code (formatting/docblocks) as bad as that of version 1.2?


Attachments:
transform-charset.php.txt [5.57 KiB]
Downloaded 162 times
lib-adodb.functions.php.diff.txt [442 Bytes]
Downloaded 146 times