Search module: Problems with HTML entities

Posted: Thu Jan 24, 2008 11:46 am
by faglork
Hello,

I am running CMSMS 1.2.3 in German and noticed that when I search for a word containing an HTML entity, I get no results.

Example: When a page contains text with German umlauts (e.g. äöü) marked up correctly with HTML entities, I get no search results. When I switch off the WYSIWYG editor and enter the text by hand, without masking the umlauts, the page can be found.

It seems that the search does not mask non-ASCII characters as entities.

Did I overlook something? Any solutions?

Cheers,
Alex
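
Editor's illustration of the mismatch described above: if the stored content has umlauts as entities while the search term is typed as raw characters, a plain comparison cannot match. The sketch below is Python, not CMSMS code; the sample text and the `normalize` helper are made up for the example, and whether the Search module indexes content this way is an assumption.

```python
import html

def normalize(text: str) -> str:
    """Decode HTML entities so that '&ouml;' and 'ö' compare equal."""
    return html.unescape(text)

# Hypothetical page content as a WYSIWYG editor might save it (entities),
# and a search term as a visitor would type it (raw characters).
stored = "Sch&ouml;ne Gr&uuml;&szlig;e aus K&ouml;ln"
query = "Köln"

# Comparing the raw strings misses the word, because "K&ouml;ln" != "Köln".
naive_hit = query in stored

# Decoding both sides to the same form before comparing finds it.
normalized_hit = normalize(query) in normalize(stored)

print(naive_hit, normalized_hit)
```

The point is only that the index and the query have to be brought to the same representation, whichever one that is.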

Re: Search module: Problems with HTML entities

Posted: Thu Feb 14, 2008 12:18 pm
by hibr
Hi Alex,

I use CMSms 1.2.3 with TinyMCE as its content editor, and the search module finds words with German umlauts. But you are partly right: the search for words with umlauts is case-sensitive!!

But I found one other curiosity or bug (maybe a feature :( ):
You have to type in the whole word to find it. If you search for only part of a word, the search fails. E.g. you type in "Dokument" and you will find all pages with "Dokument", but the CMSms search does not show pages which contain the plural form "Dokumente" with a trailing "e".
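
Editor's illustration of the difference between whole-word and prefix matching. This is a Python sketch with invented page names and texts; it is not how the Search module is implemented, it only reproduces the behaviour described above.

```python
import re

# Hypothetical pages: one with the singular, one with the plural form.
pages = {
    "page-a": "Ein Dokument zum Download",
    "page-b": "Alle Dokumente im Archiv",
}

def tokenize(text):
    """Split text into words (Unicode-aware, so umlauts stay inside words)."""
    return re.findall(r"\w+", text)

def whole_word_search(term):
    """Match only pages that contain the term as a complete word."""
    return sorted(name for name, text in pages.items()
                  if term in tokenize(text))

def prefix_search(term):
    """Match pages that contain any word starting with the term."""
    return sorted(name for name, text in pages.items()
                  if any(word.startswith(term) for word in tokenize(text)))

print(whole_word_search("Dokument"))  # misses the plural "Dokumente"
print(prefix_search("Dokument"))      # finds both forms
```

A whole-word index is cheaper to query, which may be why a search works this way, but that is a guess.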

Both characteristics make the search module less useful for surfers.

Regards,

Hani

Re: Search module: Problems with HTML entities

Posted: Thu Feb 14, 2008 1:26 pm
by faglork
Hi,

Well, I don't know how or why, but after re-indexing, everything now works perfectly.

I do not see any case-sensitivity with umlauts ...

As to the other problem: Is there something like an advanced search, where you can use + and - for inclusion/exclusion and the like?

This would be a nice feature ...

Cheers,
Alex

Re: [solved] Search module: Problems with HTML entities

Posted: Sun Feb 17, 2008 7:06 pm
by hibr
OK, I cleared the cache and re-indexed the site, but I still have the case-sensitivity on umlauts.

But when I display the HTML code of the page, the umlauts are not marked up with HTML entities. I see the ÄäÖö etc. In my template I use the default utf:



The database uses UTF, and in config.php the encoding is:

$config['default_encoding'] = '';
$config['admin_encoding'] = 'utf-8';

as the default. The TinyMCE setting for entities was raw, so I switched to named and let TinyMCE create one page's content. The result was correct HTML entities in the content block of this page. But the Search results were the same: case-sensitive for umlauts, even for this page.
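
Editor's illustration of one possible cause of "case-sensitive only for umlauts": lowercasing UTF-8 text byte by byte changes A-Z but leaves multi-byte letters such as Ä untouched. That the Search module does this is an assumption, not something confirmed in this thread; the Python sketch below only demonstrates the effect, and both helper names are invented.

```python
def ascii_only_lower(text: str) -> str:
    """Lowercase at the byte level: only ASCII A-Z are changed,
    so multi-byte UTF-8 letters such as 'Ä' stay as they are."""
    return text.encode("utf-8").lower().decode("utf-8")

def unicode_lower(text: str) -> str:
    """Unicode-aware case folding: 'Ä' becomes 'ä'."""
    return text.casefold()

indexed_word = "Ärger"  # as it appears on the page
query = "ärger"         # as a visitor might type it

# Byte-level lowercasing leaves the capital umlaut in place, so no match.
byte_level_match = ascii_only_lower(indexed_word) == ascii_only_lower(query)

# Unicode-aware folding brings both to the same form.
unicode_match = unicode_lower(indexed_word) == unicode_lower(query)

# Plain ASCII words are unaffected, which fits "only umlauts are case-sensitive".
ascii_word_match = ascii_only_lower("Dokument") == ascii_only_lower("dokument")

print(byte_level_match, unicode_match, ascii_word_match)
```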

What are your settings?

Regards, Hani

PS: Concerning advanced search: "+" works already. Enter two words separated by a space and you only get the page(s) which contain both words. Very nice.
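
Editor's illustration of the "both words must match" behaviour from the PS. This is a minimal Python sketch with invented pages and an invented `search_all` helper; it shows the AND semantics only and makes no claim about the module's actual code.

```python
import re

# Hypothetical pages for the example.
pages = {
    "start": "Dokument Übersicht",
    "archiv": "Dokument im Archiv",
    "kontakt": "Kontakt und Anfahrt",
}

def words(text):
    """Return the set of case-folded words in a text."""
    return {w.casefold() for w in re.findall(r"\w+", text)}

def search_all(query):
    """Return pages that contain every word of the query (AND search)."""
    terms = words(query)
    return sorted(name for name, text in pages.items()
                  if terms and terms <= words(text))

print(search_all("Dokument"))         # both pages that mention the word
print(search_all("Dokument Archiv"))  # only the page containing both words
```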

Re: [solved] Search module: Problems with HTML entities

Posted: Mon Feb 18, 2008 8:41 am
by faglork
hibr wrote: Ok, I cleaned cache and re-indexed the site but I still have the case-sensitivity on umlauts.

But when I display the html-code of the page the umlauts are not marked-up with html-entities. I see the ÄäÖö etc.
Däng!! That did it :-(((

I noticed that I had entered the text of my test page in "source code mode", so the umlauts on this page didn't get changed to entities. I corrected that. Now the encoding is perfect, but search does not work on capitalized umlauts, just as in your case, with one difference: I have correct entities on the page.

Uh, oh, I just noticed: it is even worse. After re-indexing, the search with umlauts does not work at all.
So I am right back where I started.

I have to cross-check with several systems, so it will take a day or two to get a complete picture.
I will post as soon as possible.

Alex