Question about the transform sql

This is a FORK of the CMS Made Simple project and is not officially supported in any way by the CMS Made Simple development team.
Locked
Wiedmann
Forum Members
Posts: 233
Joined: Wed Mar 26, 2008 1:49 am
Location: Stuttgart / Germany

Question about the transform sql

Post by Wiedmann »

Hi Alby,

one additional question:
Why did you explicitly set the charset for the new columns to 'utf8'?

At the moment I'm using the above example SQL to transform CMSMS to MLE.

(Of course, it would be a good idea to really convert the db to utf8.)

Regards,
Carsten
Attachments
(two .txt files, no longer available)

alby

Re: Question about the transform sql

Post by alby »

Wiedmann wrote: Why did you explicitly set the charset for the new columns to 'utf8'?
Generally, people who run multilingual sites must use UTF-8, so this makes the db use it instead of latin1 (latin1_swedish_ci, the MySQL default). But it doesn't matter as long as no strange characters appear :)

Alby
Wiedmann

Re: Question about the transform sql

Post by Wiedmann »

alby wrote: Generally, people who run multilingual sites must use UTF-8
Right, and CMSMS is already using UTF-8.

But:
CMSMS itself has a really strange misbehaviour when using UTF-8 in the db: it connects to the db with latin1. So every UTF-8 character is stored as a sequence of 1, 2 or 3 latin1 characters (one per UTF-8 byte), regardless of whether the db is set up for latin1 or utf-8.
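The double encoding described above can be reproduced outside the db. A minimal sketch, assuming Python's `latin-1` codec as a stand-in for MySQL's `latin1` (MySQL's latin1 is really closer to cp1252, but both agree on the bytes used here): the page sends UTF-8 bytes, and a latin1 connection turns each byte into one character.

```python
def store_over_latin1_connection(text: str) -> str:
    """Return the characters a latin1 connection stores for UTF-8 input.

    The browser submits UTF-8 bytes; the latin1 connection treats every
    byte as a separate latin1 character.
    """
    utf8_bytes = text.encode("utf-8")
    return utf8_bytes.decode("latin-1")


stored = store_over_latin1_connection("König")
# stored == "KÃ¶nig": "ö" is 2 UTF-8 bytes, so it became the 2 characters "Ã¶"
```

Reading the value back over the same latin1 connection reverses the step, which would explain why the page still looks right in the browser while phpMyAdmin shows `KÃ¶nig`.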

If you want to verify this:
First test, DB set up with latin1:
Just create a content page with e.g. the German word "König". The word looks fine in the browser. Now look at the db with phpMyAdmin. What do you see?

Second test, change the db to utf-8:
Run the same test as above. What do you see? Right... the same.

Third test:
Change the content of the UTF-8 db with phpMyAdmin and put some valid utf-8 chars in it (e.g. change König to Müller). What do you see in your browser when you access your website?
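For the third test, a rough sketch of why the browser shows garbage, under the same codec assumption: the server converts the correctly stored utf8 value to the latin1 connection charset, so `ü` leaves the server as the single byte 0xFC. That byte is not valid UTF-8, so a page declared as UTF-8 shows it as the replacement character U+FFFD (usually a "?" in a diamond). Both function names are hypothetical.

```python
def read_over_latin1_connection(text: str) -> bytes:
    """Bytes a latin1 connection delivers for a correctly stored utf8 value."""
    # The server transcodes utf8 -> latin1, so "ü" becomes the byte 0xFC.
    return text.encode("latin-1")


def render_as_utf8_page(raw: bytes) -> str:
    """What a browser shows when it decodes the bytes as UTF-8."""
    # A lone 0xFC is not a valid UTF-8 sequence and is replaced by U+FFFD.
    return raw.decode("utf-8", errors="replace")


raw = read_over_latin1_connection("Müller")   # b"M\xfcller"
shown = render_as_utf8_page(raw)              # "M\ufffdller"
```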

At the moment it makes no sense to set only these (MLE) columns to utf-8.


Of course, the way CMSMS uses charsets in the db should be corrected:
- you can't really change utf-8 content with a 3rd-party program like phpMyAdmin.
- if the server's default charset is already utf-8 when you install CMSMS, not all indexes will be created, and you get no error message about it.
- And of course, natural-language sorting in the db doesn't work at the moment.
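The post doesn't say why the indexes go missing. One plausible cause (an assumption, not confirmed in the thread) is the key length limit: MyISAM caps a key at 1000 bytes, and for a utf8 column MySQL reserves 3 bytes per character instead of 1 for latin1. A sketch with a hypothetical two-column VARCHAR(255) index, ignoring the small per-column length prefix:

```python
from typing import Sequence

# Worst-case bytes per character MySQL reserves in an index key.
# ("utf8" here is MySQL's 3-byte utf8, not utf8mb4.)
BYTES_PER_CHAR = {"latin1": 1, "utf8": 3}

MYISAM_MAX_KEY_BYTES = 1000  # MyISAM limit on the total length of one key


def key_bytes(varchar_lengths: Sequence[int], charset: str) -> int:
    """Worst-case key size for an index over VARCHAR columns of one charset."""
    return sum(n * BYTES_PER_CHAR[charset] for n in varchar_lengths)


# Hypothetical composite index over two VARCHAR(255) columns.
hypothetical_index = [255, 255]
fits = {cs: key_bytes(hypothetical_index, cs) <= MYISAM_MAX_KEY_BYTES
        for cs in BYTES_PER_CHAR}
# fits == {"latin1": True, "utf8": False}  (510 vs. 1530 bytes)
```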

(I think most of the CMSMS devs are US-ASCII people, and thus they don't have this problem ;-) )
Last edited by Wiedmann on Mon Apr 21, 2008 8:55 am, edited 1 time in total.
alby

Re: Question about the transform sql

Post by alby »

Wiedmann wrote: Of course, the way CMSMS uses charsets in the db should be corrected:
- you can't really change utf-8 content with a 3rd-party program like phpMyAdmin.
- if the server's default charset is already utf-8 when you install CMSMS, not all indexes will be created, and you get no error message about it.
- And of course, natural-language sorting in the db doesn't work at the moment.

(I think most of the CMSMS devs are US-ASCII people, and thus they don't have this problem ;-) )
Yes (most of the CMSMS devs are US-ASCII) and no (you really can change utf-8 content with a 3rd-party program), but there are many related problems.... The most important one is intrinsic to PHP: the default is ALL ISO-8859-1, not UTF-8.

In one of my scripts I must use htmlspecialchars($query, ENT_QUOTES, 'UTF-8') to insert into / display from the DB.
If you then use an editor that translates to HTML entities..... :o

UTF-8 is not the panacea for everything and should be evaluated case by case: what if someone doesn't want UTF-8 because, for example, they use ISO-8859-15 languages?

Alby
Wiedmann

Re: Question about the transform sql

Post by Wiedmann »

alby wrote: and no (you really can change utf-8 content with a 3rd-party program) .... The most important one is intrinsic to PHP: the default is ALL ISO-8859-1, not UTF-8
PHP is not the problem. But CMSMS doesn't use "SET NAMES utf8" on the db connection, although it uses utf8 for output/input.
alby wrote: UTF-8 is not the panacea for everything and should be evaluated case by case: what if someone doesn't want UTF-8 because, for example, they use ISO-8859-15 languages?
That's not a problem. With a correct "SET NAMES", the db makes sure the client gets the correct chars. So you can output iso-8859-15 to the client but store the data in utf-8 tables. Or output utf-8 but store the text in columns with a different charset for each language.
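The conversion described above can be modelled in a few lines. This is a toy sketch, not MySQL code: the server knows each column's charset, decodes the stored bytes, and re-encodes them into the charset the client announced with SET NAMES. Codec names are Python's, not MySQL's, and `errors="replace"` mimics MySQL substituting `?` for characters the client charset can't represent.

```python
def fetch(stored: bytes, column_charset: str, client_charset: str) -> bytes:
    """Bytes a client receives for a stored value (toy model).

    The server decodes using the column's charset and re-encodes into the
    client charset; unrepresentable characters become "?".
    """
    text = stored.decode(column_charset)
    return text.encode(client_charset, errors="replace")


stored = "König €".encode("utf-8")                # value kept in a utf-8 column
as_latin9 = fetch(stored, "utf-8", "iso8859_15")  # b"K\xf6nig \xa4"
as_utf8 = fetch(stored, "utf-8", "utf-8")         # b"K\xc3\xb6nig \xe2\x82\xac"
as_latin1 = fetch(stored, "utf-8", "latin-1")     # b"K\xf6nig ?" (no euro sign)
```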

Well, the easiest thing is really to use utf-8 for both output and db. And after a small test, this works with CMSMS... Of course, just setting the db charset to utf-8 is not enough if you transform a working CMSMS. In that case you must first convert the wrongly stored latin1(utf-8) chars into real utf-8 chars. But with a little script this is an easy thing ;-)
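A minimal sketch of the "little script" idea for a single value, assuming it was double-encoded exactly once (UTF-8 bytes read as latin1). It re-encodes as latin1 and decodes as UTF-8, and leaves the value unchanged if that fails. Caveats: MySQL's latin1 is really cp1252, so characters whose UTF-8 bytes fall in 0x80-0x9F (e.g. €) would need a cp1252-aware mapping, and a genuine latin1 string that happens to be valid UTF-8 would be "repaired" wrongly. This is an illustration, not the script Wiedmann attached.

```python
def repair(value: str) -> str:
    """Undo one level of UTF-8-read-as-latin1 double encoding, if present."""
    try:
        return value.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Characters outside latin1, or bytes that are not valid UTF-8:
        # most likely not double-encoded, so leave the value alone.
        return value


fixed = repair("KÃ¶nig")     # "König"
unchanged = repair("König")  # "König" (b"K\xf6nig" is not valid UTF-8)
```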
alby wrote: In one of my scripts I must use htmlspecialchars($query, ENT_QUOTES, 'UTF-8')
Isn't this normal? IMHO that's a basic thing when you output text from a db.... (BTW: afaik there is a CMSMS internal function for this.) (Of course, no one should really store entities in the db...)
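A sketch of the "store raw text, escape on output" rule in Python, where `html.escape(..., quote=True)` plays roughly the role of PHP's `htmlspecialchars($s, ENT_QUOTES, 'UTF-8')`: both escape `& < > " '`, though Python writes the single quote as `&#x27;` where PHP writes `&#039;`. The helper name `render_cell` is hypothetical.

```python
import html


def render_cell(value_from_db: str) -> str:
    """Escape a raw db value at output time; the db keeps the raw text."""
    return html.escape(value_from_db, quote=True)


row = 'Müller & Söhne <"Tools">'   # stored as-is, no entities in the db
cell = render_cell(row)
# cell == 'Müller &amp; Söhne &lt;&quot;Tools&quot;&gt;'
```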



(But in summary: CMSMS MLE also works if the columns are latin1 or whatever, because CMSMS doesn't use charsets in the db.)
alby

Re: Question about the transform sql

Post by alby »

Wiedmann wrote:
and no (you really can change utf-8 content with a 3rd-party program) .... The most important one is intrinsic to PHP: the default is ALL ISO-8859-1, not UTF-8
PHP is not the problem. But CMSMS doesn't use "SET NAMES utf8" on the db connection, although it uses utf8 for output/input.
What I wanted to say is that PHP is a problem for general installations. PHP 5 is OK, but PHP 4 has problems (mysql_fetch bug in a few versions of 4.4).

Wiedmann wrote:
UTF-8 is not the panacea for everything and should be evaluated case by case: what if someone doesn't want UTF-8 because, for example, they use ISO-8859-15 languages?
That's not a problem. With a correct "SET NAMES", the db makes sure the client gets the correct chars. So you can output iso-8859-15 to the client but store the data in utf-8 tables. Or output utf-8 but store the text in columns with a different charset for each language.

Well, the easiest thing is really to use utf-8 for both output and db. And after a small test, this works with CMSMS... Of course, just setting the db charset to utf-8 is not enough if you transform a working CMSMS. In that case you must first convert the wrongly stored latin1(utf-8) chars into real utf-8 chars. But with a little script this is an easy thing ;-)
Again, for general installations, "SET NAMES" (if you want it, uncomment the query in include.php) means you must have MySQL 4.1 for it to work well, but until a month ago I had two MLE sites on MySQL 3.23.58!!

Wiedmann wrote:
In one of my scripts I must use htmlspecialchars($query, ENT_QUOTES, 'UTF-8')
Isn't this normal? IMHO that's a basic thing when you output text from a db.... (BTW: afaik there is a CMSMS internal function for this.) (Of course, no one should really store entities in the db...)

(But in summary: CMSMS MLE also works if the columns are latin1 or whatever, because CMSMS doesn't use charsets in the db.)
My example was bad, but I meant that you will still have problems with the WYSIWYG editor.
My personal experience is:

- If I have access to mysql/apache resources:
- my.cnf:
  [mysqld]
  collation_server=utf8_unicode_ci
  character_set_server=utf8

  [client]
  default-character-set=utf8

- htaccess or httpd.conf:
  AddDefaultCharset UTF-8

- No access to mysql/apache resources: php query:
  SET NAMES utf8 or SET CHARACTER SET utf8

- database/table/columns text: UTF8
- header('Content-Type: text/html; charset=utf-8' );

Alby
Wiedmann

Re: Question about the transform sql

Post by Wiedmann »

alby wrote: (mysql_fetch bug in a few versions of 4.4)
Can you explain this in more detail?
--> I've written some of the charset code for the mysql(i) drivers for MDB2, and I don't think we have any PHP4-specific bug reports regarding charsets.

Of course, if you store data in the db with an application like phpMyAdmin (which handles charsets correctly) and then retrieve it in PHP with the wrong client encoding, the result is not correct.
--> the same happens with CMSMS at the moment.
alby wrote: Again, for general installations, "SET NAMES" (if you want it, uncomment the query in include.php) means you must have MySQL 4.1 for it to work well, but until a month ago I had two MLE sites on MySQL 3.23.58!!
That's also no problem, because mysql in your (x)html output.
alby wrote: - No access to mysql/apache resources: php query:
  SET NAMES utf8 or SET CHARACTER SET utf8
What do you mean by "no access"?

BTW:
"SET CHARACTER SET" is usually not what you want: it sets the connection charset to the default charset of the current database. "SET NAMES" (or better, mysql(i)_set_charset) is the correct way.
alby wrote: - database/table/columns text: UTF8
- header('Content-Type: text/html; charset=utf-8' );
That's what I have too. But as I wrote above:
For the DB this only makes sense if you use "SET NAMES 'utf8'", and also correct the wrong chars in the db and adjust one index.
If you don't use "SET NAMES", better use "latin1" in the db (and save space).

That works without problems for the base CMSMS. If you install additional modules, you can run into problems when only your tables default to utf8 but the db itself doesn't (new module tables then get the db's default charset).
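On the "SET CHARACTER SET" vs "SET NAMES" point: per the MySQL manual, the difference is which session variables each statement sets. A toy model (a plain dict, no server involved): `SET NAMES x` sets `character_set_client`, `character_set_connection` and `character_set_results` to x, while `SET CHARACTER SET x` sets client and results to x but copies `character_set_connection` from `character_set_database`. Collation variables are left out of this sketch.

```python
CONNECTION_VARS = ("character_set_client", "character_set_connection",
                   "character_set_results")


def set_names(session: dict, charset: str) -> None:
    """Model of SET NAMES: all three connection variables become charset."""
    for var in CONNECTION_VARS:
        session[var] = charset


def set_character_set(session: dict, charset: str) -> None:
    """Model of SET CHARACTER SET: connection follows the database default."""
    session["character_set_client"] = charset
    session["character_set_results"] = charset
    session["character_set_connection"] = session["character_set_database"]


# Hypothetical sessions on a database whose default charset is latin1.
a = {"character_set_database": "latin1"}
set_character_set(a, "utf8")   # connection charset ends up latin1
b = {"character_set_database": "latin1"}
set_names(b, "utf8")           # all three connection variables are utf8
```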
Last edited by Wiedmann on Tue Apr 22, 2008 5:01 am, edited 1 time in total.
alby

Re: Question about the transform sql

Post by alby »

Wiedmann wrote:
(mysql_fetch bug in a few versions of 4.4)
Can you explain this in more detail?
--> I've written some of the charset code for the mysql(i) drivers for MDB2, and I don't think we have any PHP4-specific bug reports regarding charsets.
WOW :)
I remember a problem, but I couldn't find it now; I only found this

Wiedmann wrote: (BTW: The prerequisite for CMSMS is MySQL 4.1)
No, the requirement for < 2.0 is MySQL 3.23

Wiedmann wrote: And about enabling "SET NAMES" in include.php: just enabling it is also wrong. First you must correct all wrong chars in the db.
I agree

Wiedmann wrote:

 [client]
   default-character-set=utf8
Remember: PHP doesn't use this.
True, that's because I use the MySQL command-line programs for backup/check

Wiedmann wrote:
- No access to mysql/apache resources: php query:
  SET NAMES utf8 or SET CHARACTER SET utf8
What do you mean by "no access"?
;D No access to my.cnf and httpd.conf / AllowOverride None

Why don't you take a look at the 2.0 code?

Alby
Wiedmann

Re: Question about the transform sql

Post by Wiedmann »

alby wrote: but I couldn't find it now; I only found this
The old problem... A MySQL utf-8 table with correct utf-8 chars (created with phpMyAdmin or MySQL Query Browser or...), but the PHP script connects with latin1. They should read how MySQL handles charsets ;-)

alby wrote: No, the requirement for < 2.0 is MySQL 3.23
Oops, my fault. I thought I had read this somewhere.
First you must correct all wrong chars in the db.
I agree
If someone (or you) is interested: I've attached a script to convert all tables correctly to utf-8.
- just put this script in a subdir of your CMSMS root and execute it (browser or shell)
- after that, apply the patch to adodb.functions.php to enable utf-8 db connections.
(you don't need to change anything in "my.ini")
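For reference, a commonly used way to do such a conversion inside MySQL (not necessarily what the attached script does) is to relabel each column through its binary type: converting to BLOB/VARBINARY keeps the stored bytes and drops the latin1 label, and converting back with CHARACTER SET utf8 reinterprets those same bytes as UTF-8. This sketch only generates the statements. Table and column names are hypothetical, and a real script would also need to restate each column's NULL/DEFAULT attributes and handle indexes (a FULLTEXT index can't stay on a BLOB column, for example).

```python
# Binary counterparts of the text types; converting to these keeps the raw
# bytes and drops the charset label.
BINARY_OF = {"TINYTEXT": "TINYBLOB", "TEXT": "BLOB",
             "MEDIUMTEXT": "MEDIUMBLOB", "LONGTEXT": "LONGBLOB"}


def binary_type(col_type: str) -> str:
    """Binary type with the same byte capacity as a VARCHAR/TEXT type."""
    if col_type.startswith("VARCHAR("):
        return "VARBINARY(" + col_type[len("VARCHAR("):]
    return BINARY_OF[col_type]


def relabel_as_utf8(table: str, column: str, col_type: str) -> list:
    """Two ALTERs that turn latin1-labelled UTF-8 bytes into real utf8 text."""
    return [
        f"ALTER TABLE `{table}` MODIFY `{column}` {binary_type(col_type)};",
        f"ALTER TABLE `{table}` MODIFY `{column}` {col_type} CHARACTER SET utf8;",
    ]


for stmt in relabel_as_utf8("example_table", "title", "VARCHAR(255)"):
    print(stmt)
```

Running the two statements in order per column avoids any transcoding: MySQL treats a conversion to or from a binary type as a plain relabel of the bytes.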

(Back up your db and only use this script with MySQL! ;-) Only tested with a default installation of CMSMS with sample data and some Chinese chars...)

alby wrote: True, that's because I use the MySQL command-line programs for backup/check
And I always have latin1 as client encoding, because all my shells are set up for iso-8859-1 and I want to work with the command-line clients. For backup/restore I change this with the client options parameter.

alby wrote: Why don't you take a look at the 2.0 code?
Is this code (formatting/docblocks) as bad as that of version 1.2?
Attachments
(two .txt files, no longer available)
