Cannot use emojis in any content (character encoding issue)

For questions and problems with the CMS core. This board is NOT for any 3rd party modules, addons, PHP scripts or anything NOT distributed with the CMS made simple package itself.
emgaron
Forum Members
Forum Members
Posts: 17
Joined: Tue Jul 10, 2012 6:52 am

Cannot use emojis in any content (character encoding issue)

Post by emgaron »

I recently rebuilt my personal website from scratch and everything went fine. I was even able to create my own theme, set up multilingual support and a few other nice things.

Then came the day when I wanted to add the first content containing emojis and ended up with an error message like this:

Code:

DEBUG: SQL = INSERT INTO cms_module_news (news_id, news_category_id, news_title, news_data, summary, status, news_date, start_time, end_time, create_date, modified_date,author_id,news_extra,news_url,searchable) VALUES (19,'1','test2','

😀
','','published','2025-05-10 20:11:36',NULL,NULL,'2025-05-10 20:11:46','2025-05-10 20:11:46',1,'','',1)
Incorrect string value: '\xF0\x9F\x98\x80
(This is from a news item. The same happens when creating a new content page; only the table and the parameters in the error message differ.)
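The bytes in the error message, \xF0\x9F\x98\x80, are the UTF-8 encoding of U+1F600 (the 😀 emoji). UTF-8 needs four bytes for every code point above U+FFFF, which covers most emoji, while a utf8mb3 column stores at most three bytes per character and a latin1 column only one, hence "Incorrect string value". A minimal PHP sketch for checking whether a piece of content contains such characters; `needs_utf8mb4()` is a hypothetical helper, not part of CMSMS:

```php
<?php
// Hypothetical helper (not part of CMSMS): returns true if $text contains any
// code point above U+FFFF, i.e. a character that takes 4 bytes in UTF-8 and
// therefore cannot be stored in a utf8/utf8mb3 (or latin1) column.
function needs_utf8mb4(string $text): bool
{
    // The /u modifier makes PCRE treat the subject as UTF-8;
    // \x{10000}-\x{10FFFF} is the supplementary-plane range.
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $text) === 1;
}

// Byte view of the emoji from the error message above.
$emoji = "\u{1F600}";
echo bin2hex($emoji), "\n";                               // f09f9880
echo var_export(needs_utf8mb4($emoji), true), "\n";       // true
echo var_export(needs_utf8mb4('plain text'), true), "\n"; // false
```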

I realised that I had not paid attention to character encoding when creating the database, so the database and all its tables in MariaDB were created as 'latin1' (the server configuration has been in use for years). I assume that this is the root cause of the issue.

I have since searched and experimented a lot; among other things, I found this post in this forum describing a similar problem. By now I have switched the database to 'utf8mb4' and converted a few tables (the ones showing up in the error messages), but unfortunately that does not seem to make any difference so far. I know that this is very likely not a CMSMS issue as such, but maybe somebody out there has a brainwave that helps me get this solved without having to rebuild the whole site from scratch again... Thank you kindly in advance!
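One detail worth knowing here: `ALTER DATABASE ... CHARACTER SET utf8mb4` only changes the default for tables created later. Existing tables and their columns keep their old charset until each table is converted with `ALTER TABLE ... CONVERT TO CHARACTER SET`. A sketch that only *generates* those statements for review; the database name `cmsms` and the collation `utf8mb4_unicode_ci` are assumptions, and the table list must be filled in by hand:

```php
<?php
// Sketch: build (but do not execute) the statements that switch a database and
// a list of its tables to utf8mb4. Database name, table names and collation
// are placeholders/assumptions; review the output and back up before running.
function utf8mb4_conversion_sql(string $database, array $tables,
                                string $collation = 'utf8mb4_unicode_ci'): array
{
    // Quote identifiers with backticks, doubling any embedded backtick.
    $quote = fn(string $id): string => '`' . str_replace('`', '``', $id) . '`';

    // Changes only the default used for tables created later.
    $sql = [sprintf('ALTER DATABASE %s CHARACTER SET utf8mb4 COLLATE %s;',
                    $quote($database), $collation)];

    // Converts the table default AND the data in all existing text columns.
    foreach ($tables as $table) {
        $sql[] = sprintf('ALTER TABLE %s CONVERT TO CHARACTER SET utf8mb4 COLLATE %s;',
                         $quote($table), $collation);
    }
    return $sql;
}

foreach (utf8mb4_conversion_sql('cmsms', ['cms_module_news']) as $stmt) {
    echo $stmt, "\n";
}
```

Converting the tables is only half of it: as the rest of this thread shows, the connection charset also has to be utf8mb4, otherwise converted tables still reject 4-byte characters.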
emgaron

Re: Cannot use emojis in any content (character encoding issue)

Post by emgaron »

I am one step further: when I copy the INSERT statement from the error message into the MariaDB command-line client, I can insert emojis into the cms_module_news table and display them in the client. The news item is also visible in CMSMS, but the emoji is displayed as '?'.

Inserting the emoji with the command-line client only works if the client and the connection are also set to utf8mb4. Judging from that, CMSMS appears to use a different connection charset: in the command-line client I got exactly the same error when the connection charset was set to utf8mb3.
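The '?' in CMSMS is consistent with the same cause on the read side: when the server converts result data into a connection charset that cannot represent a character, it substitutes '?'. A small PHP simulation of that substitution (it only imitates what MariaDB does server-side; `as_seen_through_utf8mb3()` is a hypothetical name):

```php
<?php
// Hypothetical simulation (not MariaDB code): what a 4-byte character turns
// into when converted to a charset limited to 3 bytes per character.
// Each whole code point (not each byte) is replaced by a single '?'.
function as_seen_through_utf8mb3(string $utf8): string
{
    // preg_replace() returns null on error (e.g. invalid UTF-8); fall back to input.
    return preg_replace('/[\x{10000}-\x{10FFFF}]/u', '?', $utf8) ?? $utf8;
}

echo as_seen_through_utf8mb3("test2 \u{1F600}"), "\n"; // test2 ?
```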

That leads me to the following questions:
  1. How can I tell CMSMS to use the correct connection charset?
  2. Alternatively: What information does CMSMS use to determine which connection charset it should use?
emgaron

Re: Cannot use emojis in any content (character encoding issue)

Post by emgaron »

Next step: setting the MariaDB server's default charset to utf8mb4 (it was latin1) makes the connection start out with utf8mb4 as its character set. I confirmed that by adding the following line to Connect() in lib/classes/Database/mysqli/class.Connection.php:

Code:

echo 'Initial character set: ' . $this->_mysql->character_set_name() . "<br/>\n";

Unfortunately, that still does not solve the problem: emojis (and, I would assume, any other 4-byte Unicode characters) still fail. The MariaDB log shows that "something" sets the session character set to utf8mb3. At this point the database and the server are both set to utf8mb4, and why anything would switch back to utf8mb3 is totally beyond me...
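To narrow down which session setting ends up as utf8mb3, one option is to run `SHOW SESSION VARIABLES LIKE 'character_set_%'` on the CMSMS connection (for example next to the debug echo above) and compare the values. A sketch of the comparison step, taking the rows as an array so it runs without a database; which variables count as relevant is my own choice. `character_set_system` is skipped because MariaDB fixes it at utf8mb3 for identifiers, and `character_set_filesystem` only concerns file names:

```php
<?php
// Sketch: given SHOW SESSION VARIABLES LIKE 'character_set_%' as a
// name => value array, return the variables that matter for storing and
// reading text but are not utf8mb4.
//
// With a live mysqli connection $db, the array could be built like this:
//   $vars = [];
//   $res  = $db->query("SHOW SESSION VARIABLES LIKE 'character_set_%'");
//   while ($row = $res->fetch_row()) { $vars[$row[0]] = $row[1]; }
function non_utf8mb4_session_vars(array $vars): array
{
    $relevant = [
        'character_set_client',
        'character_set_connection',
        'character_set_results',
        'character_set_database',
        'character_set_server',
    ];
    $bad = [];
    foreach ($relevant as $name) {
        $value = $vars[$name] ?? '(missing)';
        if (strtolower((string) $value) !== 'utf8mb4') {
            $bad[$name] = $value;
        }
    }
    return $bad;
}

// Made-up example values resembling the situation described above.
$session = [
    'character_set_client'     => 'utf8mb3',
    'character_set_connection' => 'utf8mb3',
    'character_set_results'    => 'utf8mb3',
    'character_set_database'   => 'utf8mb4',
    'character_set_server'     => 'utf8mb4',
    'character_set_system'     => 'utf8mb3',
];
print_r(non_utf8mb4_session_vars($session));
```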
emgaron

Re: Cannot use emojis in any content (character encoding issue)

Post by emgaron »

Next data point: I just created a completely new website installation. When creating the database in MariaDB, I made certain that it was utf8mb4, then ran the CMSMS installer. Result: inserting four-byte Unicode fails, just like on the original website. On inspection, all tables had been created as utf8mb3 rather than utf8mb4 (which is what the server and the database are set to). In CreateTableSQL() in lib/classes/Database/mysqli/class.DataDictionary.php (I assume that's where the tables are created), I see:

Code:

$str = 'ENGINE=MyISAM CHARACTER SET utf8 COLLATE utf8_general_ci';

I seem to remember reading that on older MariaDB versions (mine is still 10.x; an upgrade is planned), utf8 is equivalent to utf8mb3, but I can't find the source of that right now. So a fresh install on a MariaDB older than 11.something will probably always end up with utf8mb3, I think?
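If new tables should be utf8mb4 no matter what the `utf8` alias maps to, the options string could name the charset explicitly. A hypothetical sketch of what CreateTableSQL() might emit instead; `table_options()` and the default collation are my assumptions, not CMSMS code:

```php
<?php
// Hypothetical replacement for the hard-coded table options seen in
// CreateTableSQL(): name utf8mb4 explicitly instead of the 'utf8' alias.
function table_options(string $engine = 'MyISAM',
                       string $charset = 'utf8mb4',
                       string $collation = 'utf8mb4_unicode_ci'): string
{
    // A collation belongs to exactly one charset; its name starts with that
    // charset's name followed by '_' (e.g. utf8mb4_unicode_ci -> utf8mb4).
    if (strpos($collation, $charset . '_') !== 0) {
        throw new InvalidArgumentException("Collation $collation does not belong to $charset");
    }
    return sprintf('ENGINE=%s CHARACTER SET %s COLLATE %s', $engine, $charset, $collation);
}

echo table_options(), "\n";
// ENGINE=MyISAM CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
```

Whether this works for every CMSMS table is untested here: MyISAM limits index keys to 1000 bytes and utf8mb4 reserves 4 bytes per character, so an index over a VARCHAR(255) column (1020 bytes) would be rejected.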

Anyway, looks like I need to dig further...
creopard
Forum Members
Forum Members
Posts: 89
Joined: Fri Nov 10, 2017 10:25 am
Location: .de

Re: Cannot use emojis in any content (character encoding issue)

Post by creopard »

MariaDB 11.6's default is utf8mb4
https://mariadb.com/kb/en/changes-impro ... iadb-11-6/

MariaDB 10.5 to 10.6 changed the default from utf8 to utf8mb3:
https://mariadb.com/kb/en/upgrading-fro ... iadb-10-6/


However, MariaDB 10.6 also supports collations like "utf8mb4_bin"
emgaron

Re: Cannot use emojis in any content (character encoding issue)

Post by emgaron »

creopard wrote: Wed May 14, 2025 12:44 pm [...]
MariaDB 10.5 to 10.6 changed the default from utf8 to utf8mb3:
https://mariadb.com/kb/en/upgrading-fro ... iadb-10-6/

However, MariaDB 10.6 also supports collations like "utf8mb4_bin"
I'm on 10.9, so that tracks, thanks!

Other than that, I'm slowly getting to something that seems to work. When I force the connection to utf8mb4 in the do_sql() function by adding

Code:

$this->_mysql->query("SET NAMES utf8mb4 COLLATE utf8mb4_unicode_520_ci");

right before the original query statement, things start to work and I can use 4-byte Unicode characters. Now I'm wondering how to proceed...
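Sending SET NAMES before every query works, but it adds an extra round trip per query. The PHP manual also recommends mysqli::set_charset() over a SET NAMES query, because only set_charset() also updates the charset that mysqli::real_escape_string() uses. A sketch of doing it once, right after the connection is opened (for example in Connect(), where the debug echo above was placed). The helper name is hypothetical, and whatever currently switches the session to utf8mb3 would have to run before this call or be changed as well:

```php
<?php
// Hypothetical helper (not part of CMSMS): call it once, right after the
// mysqli connection is opened, instead of sending SET NAMES in every do_sql().
function use_utf8mb4(mysqli $db): void
{
    // set_charset() changes the connection charset on the server AND updates
    // the client-side charset that real_escape_string() relies on; a plain
    // "SET NAMES" query only does the former.
    if (!$db->set_charset('utf8mb4')) {
        throw new RuntimeException('set_charset(utf8mb4) failed: ' . $db->error);
    }
}

// Usage sketch (placeholder credentials, needs a running server):
// $db = new mysqli('localhost', 'user', 'password', 'cmsms');
// use_utf8mb4($db);
// echo $db->character_set_name(); // utf8mb4
```

Note that set_charset() leaves the collation at the server's default for utf8mb4. If utf8mb4_unicode_520_ci specifically matters, a `SET collation_connection = 'utf8mb4_unicode_520_ci'` query could follow once.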