Cannot use emojis in any content (character encoding issue) Topic is solved

For questions and problems with the CMS core. This board is NOT for any 3rd party modules, addons, PHP scripts or anything NOT distributed with the CMS made simple package itself.
Post Reply
emgaron
Forum Members
Forum Members
Posts: 20
Joined: Tue Jul 10, 2012 6:52 am

Cannot use emojis in any content (character encoding issue)

Post by emgaron »

I have recently re-build my personal website from scratch and everything went fine, even was able to create my own theme, set up multilingual support and a few more other nice things.

Then came the day when I wanted to add the first content containing emojis and ended up with an error message like this:

Code: Select all

DEBUG: SQL = INSERT INTO cms_module_news (news_id, news_category_id, news_title, news_data, summary, status, news_date, start_time, end_time, create_date, modified_date,author_id,news_extra,news_url,searchable) VALUES (19,'1','test2','

😀
','','published','2025-05-10 20:11:36',NULL,NULL,'2025-05-10 20:11:46','2025-05-10 20:11:46',1,'','',1)
Incorrect string value: '\xF0\x9F\x98\x80
(this is from a news posting, same happens when creating a new content page, only the table and parameters in the error message differ)

I realised that when creating the database, I did not pay attention to character encoding - which means that the database and all tables in MariaDB were created in 'latin1' (the server config has been in use for years). I assume that that is the core cause behind this issue.

I have since searched and experimented a lot - among other things, I found this post in this forum, describing a similar problem. I have by now switched the database to 'utf8mb4' and converted a few tables (the ones showing up in the error messages) - but unfortunately, that does not seem to make any difference so far. I know that this is very likely not a CMSMS-issue as such, but maybe somebody out there has a brainwave that helps me getting this solved without having to re-build the whole site from scratch again... Thank you kindly in advance!
emgaron
Forum Members
Forum Members
Posts: 20
Joined: Tue Jul 10, 2012 6:52 am

Re: Cannot use emojis in any content (character encoding issue)

Post by emgaron »

I am one step further - when I copy the "INSERT" statement from the error message into the command line MariaDB client, I can insert emojis into the cms_module_news table, and I can display them in the command line client. The newsitem is also visible in CMSMS, but the emoji is displayed as '?'.

Inserting the emoji with the command line client only works if the client and the connection are also set to utf8mb4. Judging from that, it looks like CMSMS is using another connection charset (in the command line client I got the exact same error when the connection charset was set to utf8mb3).

That leads me to the following questions:
  1. How can I tell CMSMS to use the correct connection charset?
  2. Alternatively: What information does CMSMS use to determine which connection charset it should use?
emgaron
Forum Members
Forum Members
Posts: 20
Joined: Tue Jul 10, 2012 6:52 am

Re: Cannot use emojis in any content (character encoding issue)

Post by emgaron »

And the next step: setting the default charset of the MariaDB-server to utf8mb4 (was latin1) causes the initial charset of the connection to be set to utf8mb4. I was able to confirm that by adding the following line to lib/classes/Database/mysqli/class.Connection.php in Connect():

Code: Select all

echo 'Initial character set: ' . $this->_mysql->character_set_name() . "<br/>\n";
Unfortunately, that still does not solve the problem... Emojis (and I would assume any other 4-byte unicode characters) still fail. The log of MariaDB shows that "something" sets the session character set to utf8mb3. At this point, the database is utf8mb4, the server is set to utf8mb4, and why anything would switch back to utf8mb3 is totally beyond me... ???
emgaron
Forum Members
Forum Members
Posts: 20
Joined: Tue Jul 10, 2012 6:52 am

Re: Cannot use emojis in any content (character encoding issue)

Post by emgaron »

Next data point: I just created a completely new website installation. When creating the database in MariaDB, I made certain that it was utf8mb4, then ran the CMSMS-installer. Result: inserting four-byte unicode fails, just like with the original website. Upon inspection, all tables were generated as utf8mb3 rather than utf8mb4 (what server and database are set to). When looking at CreateTableSQL() in lib/classes/Database/mysqli/class.DataDictionary.php (I assume that that's where the tables are created), I see:

Code: Select all

$str = 'ENGINE=MyISAM CHARACTER SET utf8 COLLATE utf8_general_ci';
I seem to remember having read that on older MariaDB (mine is still at 10.x - upgrade is planned), that is equivalent to utf8mb3 (can't find the actual source of that info right now). A fresh install with a MariaDB older than 11.something will therefore likely always end up with utf8mb3, I think?

Anyway, looks like I need to dig further...
User avatar
creopard
Forum Members
Forum Members
Posts: 92
Joined: Fri Nov 10, 2017 10:25 am
Location: .de
Contact:

Re: Cannot use emojis in any content (character encoding issue)

Post by creopard »

MariaDB 11.6's default is utf8mb4
https://mariadb.com/kb/en/changes-impro ... iadb-11-6/

MariaDb 10.5 to 10.6 changed the default from utf8 to utf8mb3:
https://mariadb.com/kb/en/upgrading-fro ... iadb-10-6/


However, MariaDB 10.6 also supports collations like "utf8mb4_bin"
emgaron
Forum Members
Forum Members
Posts: 20
Joined: Tue Jul 10, 2012 6:52 am

Re: Cannot use emojis in any content (character encoding issue)

Post by emgaron »

creopard wrote: Wed May 14, 2025 12:44 pm [...]
MariaDb 10.5 to 10.6 changed the default from utf8 to utf8mb3:
https://mariadb.com/kb/en/upgrading-fro ... iadb-10-6/

However, MariaDB 10.6 also supports collations like "utf8mb4_bin"
I'm on 10.9, so that tracks, thanks!

Other than that, I'm slowly approaching something that seems to be working. Apparently, when I force the connection to utf8mb4 in the do_sql() function by adding

Code: Select all

$this->_mysql->query("SET NAMES utf8mb4 COLLATE utf8mb4_unicode_520_ci");
right before the original query-statement, things start to work and I can use 4-byte unicodes. Now I'm wondering how to proceed...
emgaron
Forum Members
Forum Members
Posts: 20
Joined: Tue Jul 10, 2012 6:52 am

Re: Cannot use emojis in any content (character encoding issue)

Post by emgaron »

I just tried one more thing - I started another install from scratch, but before executing the installation fully, I patched the installer to use "utf8mb4" as charset and "utf8mb4_unicode_520_ci" as collation in every place where either was set in an SQL-statement.

I then proceeded to complete the install. Result: creation of two tables fails:

Code: Select all

Created table permissions: .... Done
Created table siteprefs: .... failed
Created table user_groups: .... Done
[...]
Created table module_smarty_plugins: .... Done
Created index idx_smp_module
Created table routes: .... failed
Created table layout_tpl_type: .... Done
[...]
As a result, the site is not functional. The "siteprefs"-table also gave an error in my other test-site when trying to convert it manually to utf8mb4 - some error about a key being too large, so my guess is that that is the underlying problem. Maybe it's as simple as increasing the field size, but I don't know if there are any side effects of doing so...
emgaron
Forum Members
Forum Members
Posts: 20
Joined: Tue Jul 10, 2012 6:52 am

Re: Cannot use emojis in any content (character encoding issue)

Post by emgaron »

Ok, I've settled on the following workaround to get four byte utf8 (which includes emojis) working in my set-up. The following steps allow me to use utf8mb4 in news articles, pages, page names and in the gallery module (title/comment) - if you are using other text-related modules, you may need to do more steps.
WARNING: since you are changing the encoding of the database/several tables, there is a real risk of getting garbled content! That depends mainly on your initial encoding - in my case it was latin1, which maps reasonably well (though not 100% for non-ASCII characters) to utf8mb4. As I did not have that much text yet, I was willing to take the risk. You have been warned.
Step 0
Make a backup of the database!

Step 1
Change the character set of the database:

Code: Select all

ALTER DATABASE database_name CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;
Step 2
Convert tables where that is possible:

Code: Select all

USE database_name;
ALTER TABLE cms_content_props CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;
ALTER TABLE cms_module_search_index CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;
ALTER TABLE cms_module_news CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;
ALTER TABLE cms_module_gallery CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;
Step 3
Convert certain columns of tables that cannot be converted as a whole due to key size issues, but changing only the relevant columns seems possible:

Code: Select all

USE database_name;
ALTER TABLE cms_content MODIFY content_name varchar(255) CHARACTER SET 'utf8mb4' COLLATE 'utf8mb4_unicode_520_ci';
ALTER TABLE cms_siteprefs MODIFY sitepref_value text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;
Step 4
Make certain that a utf8mb4 connection is used. If you are running newer versions of MariaDB/MySQL, this may not be necessary - try without this step first, and if everything works (i.e. you can add four byte unicode characters in your site), you're done. In my case, it does not seem to work out-of-the-box, despite MariaDB being set to utf8mb4 default (see discussion above). Hence, to get this to work, I needed to change one file of the CMSMS-code:
In config.php, add the line:

Code: Select all

$config['set_names'] = true;
In lib/classes/Database/class.compatibility.php, in function init(\cms_config $config), change the line

Code: Select all

if( $config['set_names'] ) $tmp[] = "NAMES 'utf8'";
to

Code: Select all

if( $config['set_names'] ) $tmp[] = "NAMES 'utf8mb4'";
emgaron
Forum Members
Forum Members
Posts: 20
Joined: Tue Jul 10, 2012 6:52 am

Re: Cannot use emojis in any content (character encoding issue)

Post by emgaron »

emgaron wrote: Thu May 22, 2025 9:03 pm [...]
Step 1
Change the character set of the database:

Code: Select all

ALTER DATABASE database_name CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;
[...]
I probably should give some additional information here - I'm using utf8mb4_unicode_520_ci as collation (which is not the standard collation for utf8mb4) as I found several postings on the web recommending this collation over the standard one for older versions of MariaDB/MySQL. In newer versions, there are apparently even better collations, so if you are using a newer version (e.g. recent MariaDB 11.x), the standard collation may well be good enough.
Post Reply

Return to “CMSMS Core”