Search Module - Greek Search

Posted: Wed Oct 25, 2006 12:25 pm
by tsiger
The search function doesn't seem to work with international characters. I enter specific terms in the search field (I know these terms exist in my content), but no results are returned.

Any ideas? Any solutions?

Re: Search Module - Greek Search

Posted: Wed Oct 25, 2006 12:48 pm
by Ted
What version are you using? I thought I solved this problem in 1.0.2.

Re: Search Module - Greek Search

Posted: Wed Oct 25, 2006 1:11 pm
by tsiger
I am using 1.0.2. Are there any related topics I can read? It just doesn't return any results...  ???

Re: Search Module - Greek Search

Posted: Wed Oct 25, 2006 3:41 pm
by kode_fi
I tested the search module with international characters today. It seems to me that search works if the characters are encoded as HTML entities (&gamma;&epsilon; etc.) when stored in the database. If the characters are stored in the database as they are (not as HTML entities), searching doesn't find anything.
Do you use a WYSIWYG editor when adding content?

It seems to me that the admin console stores the Title and Menu Text fields in the database without encoding international characters as HTML entities. I think this prevents the search module from finding strings in Title and Menu Text that contain international characters.

Re: Search Module - Greek Search

Posted: Wed Oct 25, 2006 4:23 pm
by tsiger
Exactly. This is the second day straight I've been playing with this. I used FCKeditor and characters are stored in the database as "&Gamma&Epsilon" etc. In that case search works just fine... If I take FCKeditor out, the data is stored like this in the database: "Σκατά". Both are searchable, but the question is: how am I supposed to store the data correctly in the database, i.e. with the actual characters, like "Ενα Δυο Τρια", and still have it searchable?

Re: Search Module - Greek Search

Posted: Thu Oct 26, 2006 5:18 am
by kode_fi
Well, as you seem to have the same problem as me, I'll file a bug report in the tracker. I'm not 100% sure whether this is a CMSMS core problem or a search module problem.

There are quite a few variables (in php.ini, my.cnf, config.php.....) which can mess up international characters. I've tried every combination I can imagine without success.
tsiger wrote: If I take FCKeditor out, the data is stored like this in the database: "Σκατά".
I made these changes to my.cnf:

Code:

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
# Default to using old password format for compatibility with mysql 3.x
# clients (those using the mysqlclient10 compatibility package).
old_passwords=1
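# init_connect runs for each new connection, except for accounts with the
# SUPER privilege. If the option appears twice, only the last value is used,
# so only 'SET NAMES utf8' below takes effect.
init_connect='SET collation_connection = utf8_general_ci'
init_connect='SET NAMES utf8'
default-character-set = utf8
character-set-server = utf8
collation-server = utf8_general_ci

[mysql.server]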
user=mysql
basedir=/var/lib

[mysqld_safe]
err-log=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

[client]
default-character-set=utf8

With this I tried to make sure that the connection from CMSMS to the MySQL database uses UTF-8 encoding, so that international characters are stored in the database as they are. I made the changes in my.cnf because CMSMS is the only software using MySQL on my server.

Re: Search Module - Greek Search

Posted: Fri Oct 27, 2006 12:24 pm
by jozef
I am just evaluating CMSMS and I can confirm the issues with UTF-8.

My installation:
- CMSMS 1.0.2 on WAMP (Win2k, Apache 2, MySQL 4.1, PHP 4.4).
- CMSMS DB on MySQL 4.1 is set to collation utf8_general_ci

Issues:
Slovak characters stored with FCKEditor are encoded as HTML entities or completely damaged.
Search does not work.

Solution:
1. disable HTML entities in FCKEditor, in fckconfig.js set:
    FCKConfig.ProcessHTMLEntities = false ;
2. always connect to MySQL with utf8. Patch the include.php file; add this line:
    $db->Execute('set names utf8');
after line
    $db =& $gCms->GetDB();

This works for me  ;)

Now I need to build a simple multilingual site. I have found several threads about it here in the forum. Which solution is the best one?
Thanks
Jozef

Re: Search Module - Greek Search

Posted: Fri Oct 27, 2006 12:52 pm
by tsiger
Nope. This didn't work for Greek characters. Everything on my DB (MySQL 5.0) is set to utf-8.

When I use FCKeditor, characters are stored as "&Gamma;&Theta;.. etc" and they are searchable.

When I deactivate FCKeditor, content goes into the db as it should, but it's not searchable.  ???

Re: Search Module - Greek Search

Posted: Fri Oct 27, 2006 1:18 pm
by jozef
tsiger: I wonder. Do you use AdoDb with MySQL 5.0 + utf8 in other PHP software without needing the 'set names utf8' query?

Re: Search Module - Greek Search

Posted: Fri Oct 27, 2006 1:37 pm
by tsiger
Jozef, I added the set names line and did everything you described :)
This is the first time I am using MySQL 5.0, and that's because the site built with cmsmadesimple is going to be hosted on a MySQL 5.0 server.

Here's a step forward.

I edited fckconfig.js like this and now the characters are stored in the db just fine using FCKeditor.

Code:

FCKConfig.ProcessHTMLEntities	= false ;
FCKConfig.IncludeLatinEntities	= true;
FCKConfig.IncludeGreekEntities	= false;

I noticed that, e.g., for the word "Αποτέλεσμα", which is stored in the db properly, when I hit the search button the URL contains
"cntnt01searchinput=%CE%91%CF%80%CE%BF%CF%84%CE%AD%CE%BB%CE%B5%CF%83%CE%BC%CE%B1".


Would that help?

I noticed that in Google as well... so I'm not sure at all... but it's probably something between the script and the db... the characters in the db are stored properly, so... any thoughts?
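
The query string in that URL looks like nothing more than the percent-encoded UTF-8 bytes of the search word, which is what a browser normally sends for a GET form. A minimal PHP sketch to check this (it assumes the script file itself is saved as UTF-8; the variable names are only illustrative):

```php
<?php
// Percent-encode the raw UTF-8 bytes of the search term, the way a browser
// does when it submits a form over GET.
$word = "Αποτέλεσμα";
$encoded = rawurlencode($word);
echo $encoded, "\n"; // %CE%91%CF%80%CE%BF%CF%84%CE%AD%CE%BB%CE%B5%CF%83%CE%BC%CE%B1

// Decoding gives back the original UTF-8 string unchanged.
var_dump(rawurldecode($encoded) === $word); // bool(true)
```

If that holds on your server too, the escaped URL is harmless: PHP decodes it back to the same UTF-8 word before the search code sees it.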

Re: Search Module - Greek Search

Posted: Fri Oct 27, 2006 2:20 pm
by tsiger
OK, let's see, because it works but I am not sure if this is the right way...

After doing everything jozef said, I did the following:

In the file action.dosearch.php I replaced this line:

Code:

$ary[] = "word = " . $db->qstr(htmlentities($word, ENT_COMPAT, 'UTF-8'));

with this:

Code:

$ary[] = "word = " . $db->qstr($word);

In other words, I removed the htmlentities call, together with its ENT_COMPAT and 'UTF-8' arguments (they should not be left behind in qstr(), whose second parameter is the magic-quotes flag)... and it works just fine. Data is stored in the db properly and everything is searchable.

Would removing htmlentities affect any other functionality?
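
To see why dropping htmlentities matters, it may help to look at what the function does to a Greek word. A minimal sketch, assuming the file is saved as UTF-8 and a PHP version whose htmlentities supports the 'UTF-8' charset; it is an illustration, not the module's actual code:

```php
<?php
$word = "Αποτέλεσμα"; // file saved as UTF-8

// What the unpatched line passes to the query: letters that have a named
// HTML 4.01 entity are replaced, e.g. "Α" becomes "&Alpha;".
$asEntities = htmlentities($word, ENT_COMPAT, 'UTF-8');

// What the patched line passes: the UTF-8 bytes, unchanged.
$asRaw = $word;

echo $asEntities, "\n";
var_dump($asEntities === $asRaw); // bool(false)

// Decoding the entities restores the original word.
var_dump(html_entity_decode($asEntities, ENT_COMPAT, 'UTF-8') === $word); // bool(true)
```

So a term passed through htmlentities can only match index rows that were stored in entity form, which would fit what is reported in this thread.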

Re: Search Module - Greek Search

Posted: Fri Oct 27, 2006 2:22 pm
by jozef
Now I am confused. I thought that after my patching there would be no HTML entities in the DB. But they are there, and search works?!
tsiger: do you have the same words in the tables content_props and module_search_index?
core developers: why are you using HTML entities?! It is a road to hell! If you use utf8 you should read about PHP and UTF-8:
http://www.phpwact.org/php/i18n/charsets, http://sourceforge.net/projects/phputf8

Re: Search Module - Greek Search

Posted: Fri Oct 27, 2006 2:27 pm
by tsiger
jozef wrote: Now I am confused. I thought that after my patching there would be no HTML entities in the DB. But they are there!
There are no HTML entities in the DB anymore, jozef. The characters I mentioned before appear in the URL, NOT the database. Try it yourself: copy and paste the word "Αποτέλεσμα" from here and check the URL. You'll see what I mean. I mentioned it because I thought it had something to do with the problem, but apparently it doesn't. Forget about it. :)


Both tables have same words in them. Both stored properly. Everything searchable.

Everything works now :)

Re: Search Module - Greek Search

Posted: Fri Oct 27, 2006 2:39 pm
by jozef
I am still confused. I have only my 2 patches, without yours. I inserted your Greek word into a content page using FCKEditor.
In both previously mentioned tables the word is stored as HTML entities:
"Αποτέλεσμα" = &alpha;&pi;&omicron;&tau;έ&lambda;&epsilon;&sigma;&mu;&alpha;
And the search finds your word without your patch?!

Re: Search Module - Greek Search

Posted: Fri Oct 27, 2006 2:46 pm
by tsiger
Apply these 2 changes in fckconfig.js:

Code:

FCKConfig.IncludeLatinEntities = true;
FCKConfig.IncludeGreekEntities = false;

Re-enter the Greek word in a content page and search again. Without my patch it should not find anything.
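
The behaviour described in the last few posts can be modelled with a toy lookup. This is only a sketch under assumptions: find_pages and the two arrays are hypothetical stand-ins for the search index, not the real module_search_index code, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical stand-in for the module's word index: word => page ids.
function find_pages(array $index, $term)
{
    return isset($index[$term]) ? $index[$term] : array();
}

$word     = "Αποτέλεσμα";
$entities = htmlentities($word, ENT_COMPAT, 'UTF-8');

$entityIndex = array($entities => array(1)); // content saved as HTML entities
$rawIndex    = array($word => array(1));     // content saved as plain UTF-8

// Unpatched lookup: the search term is entity-encoded first.
var_dump(find_pages($entityIndex, $entities)); // page 1 is found
var_dump(find_pages($rawIndex, $entities));    // nothing is found

// Patched lookup: the search term is used as typed.
var_dump(find_pages($rawIndex, $word));        // page 1 is found
```

In this model the search succeeds whenever the index and the search term are in the same form, which would explain why jozef finds the word with entities in the DB and no patch, while plain UTF-8 content needs the patched lookup.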