Encoding (latin1/ISO-8859-1 vs. UTF-8) conundrum: wiki page
Max_B
I have saved a first draft of this page; it is about halfway done at the moment.
I'm not a native English speaker, so feel free to read it and correct spelling or syntax errors.
If something is unclearly worded, add a small comment here or there and I'll rephrase it.
Overall comments are premature until I finish it.
This discussion has been closed.
Comments
This page is intended to help resolve issues related to character encoding in Vanilla and MySQL.
Please, folks, correct the second part.
I hope I didn't forget something important.
I did not check the installer/upgrader, but the installer may set the connection setting to utf-8 while the upgrader leaves it untouched.
Until 1.1, all new installs on MySQL 4.1+ were created "weird". From 1.1.1, with the database set up as utf-8 while leaving DATABASE_CHARACTER_ENCODING blank, new installations are "worst" OK by default. Much thrilling to clean up. If you switch, just add two lines to the wiki to tell users what the default is from now on.
And, last, this setting is misnamed. It is meant to reflect the client encoding, not the database's. I know that historical reasons led to naming it this way, nevertheless…
That's exactly how I've set it up.
Sorry for having presumed it wasn't. Had no time to check it out.
Hope we have got rid of utf-8 recurrent questions.
Thanx Max_B for the excellent wiki tutorial!
I'm not sure mine is a 'utf-8 recurrent question', but it does have something to do with encoding. The thing is, if you use the Janine extension to post from WordPress to Vanilla in a non-Latin charset, what you get in place of the blog post title is something in an unidentifiable encoding. In my case, everything is in UTF-8: DB, Vanilla and WordPress, and still...
(Interestingly, and in case that helps, this kind of 'encoding' pops up within WP as well in certain areas, also related to crossposting. For example, if you use the Crossposter plugin for posting to LiveJournal, it gives you the option of entering a custom notification message. Try to enter it in Cyrillic, for instance, and then, already in the WP Dashboard, once you have saved the option, the original Cyrillic text is displayed precisely the way it is when it gets from WP to Vanilla. That is, unreadable.)
From the info you gave, I'd bet that the problem is on the WordPress side and is the same weird/worst syndrome. That is, part of WordPress is utf-8 aware/capable, but some add-ons or secondary areas are probably missing the famous "SET NAMES utf8" when opening the MySQL connection, resulting in weird or worst encoding, depending on the WordPress database's default encoding. Try looking at the relevant code.
Also try asking squirrel, the Janine author, if he knows about this stuff.
I'd look at it if you get stuck.
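Editorial illustration: the missing "SET NAMES utf8" scenario described above can be simulated without a database. The sketch below is only an approximation under stated assumptions: the sample string is made up, Python's "latin-1" codec stands in for MySQL's latin1 (which differs slightly for a few byte values), and nothing here is taken from WordPress or Janine code. It shows UTF-8 bytes being misread as one-character-per-byte text, stored again as UTF-8, and then repaired by reversing the two steps.

```python
# Illustrative simulation only; no real MySQL connection is involved.
original = "Привет"  # sample Cyrillic text (assumption, not from the thread)

# The client sends UTF-8 bytes.
utf8_bytes = original.encode("utf-8")

# Without "SET NAMES utf8" the server may assume the bytes are latin1,
# so every byte is treated as a separate character...
misread = utf8_bytes.decode("latin-1")

# ...and storing that text in a utf8 column encodes each of them again.
double_encoded = misread.encode("utf-8")

# 6 characters, 12 UTF-8 bytes, 24 bytes once double-encoded.
print(len(original), len(utf8_bytes), len(double_encoded))

# Read back over a proper utf8 connection, the text is garbled.
print(repr(double_encoded.decode("utf-8")))

# The damage is reversible: undo the two steps in the opposite order.
repaired = double_encoded.decode("utf-8").encode("latin-1").decode("utf-8")
print(repaired == original)
```

Whether a given site ends up in this state or in a milder one depends on the table and connection defaults, so treat this as a way to recognize the symptom, not as a diagnosis.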
I've checked the WP db and even transcoded it, just in case, to utf8 the way you describe the procedure on your wiki page. This did not change anything as far as crossposting through Janine was concerned. But then I enabled the preformatting option in the ecto blogging Mac client and posted everything to WP through it (it encodes characters as HTML entities), while also enabling the Janine feature that posts not just the link but the whole of the blog post content to Vanilla. This is when things started to get really weird, to use your label.
1. If the blog post TITLE was in Cyrillic, then this post never got to Vanilla from WP.
2. If at least one word in Latin script was included in the title (I guess even a single character would be enough), the post appeared in Vanilla with the Latin part displayed correctly and the rest as HTML substitutes for Cyrillic characters (and not just completely unreadable, as before).
3. The content of the post is then displayed in Vanilla correctly in Cyrillic. It is also displayed correctly on my blog page, which is NOT the WP original page. In the WP Dashboard, however, it is then displayed as HTML symbols.
4. If the preformatting in ecto is disabled, then both Cyrillic titles and content get through, but completely garbled. The same happens when something is posted not through ecto but from the WP Dashboard directly ...
Hope this makes sense to you. It surely doesn't to me.
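Editorial illustration: the "HTML substitutes for Cyrillic characters" in point 2 look like numeric character references. The sketch below is one plausible explanation, not something confirmed in the thread: it assumes the client turns non-ASCII characters into references and that a later step escapes the "&" a second time, which would make the references show up literally instead of as letters. The sample title is invented.

```python
import html

# Sample mixed Latin/Cyrillic title (assumption, not from the thread).
title = "Привет Vanilla"

# Replace every non-ASCII character with a numeric character reference,
# roughly what an "encode HTML entities" option might do.
escaped = title.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(escaped)

# Decoded once (as a browser does), the text is readable again.
print(html.unescape(escaped))

# If another layer escapes the markup again, "&" becomes "&amp;" and the
# references are then displayed literally rather than as Cyrillic letters.
double_escaped = html.escape(escaped)
print(double_escaped)

# One round of unescaping only gets back to the reference form.
print(html.unescape(double_escaped) == escaped)
```

The Latin part of the title passes through every step unchanged, which matches the observation that only the Cyrillic part was affected.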
That's easy:
- launch phpMyAdmin (if you use another front end, the operation should be similar).
- when you create the new database, there is a select box (just beside or below the new database name field) for setting the default character set and collation (that is, how characters are sorted and compared) for the database. Scroll down the (huge) list and select utf8_general_ci
That is it. If you start with a fresh install, Vanilla is already set up for utf-8.
Edit: Since you wrote that you have already created the new database, you can either delete and re-create it, or change its encoding from the "Operations" tab in phpMyAdmin.
I'm bumping this because I added a new chapter to the wiki page today.
I recently had to sort out some weird bugs with filenames and thought it could help some folks to have this information in short form somewhere.
Eventually I found this configuration setting: $Configuration['CHARSET']. I then added it and $Configuration['DATABASE_CHARACTER_ENCODING'] to my settings.php file, and my problem was solved.
Here's what they look like:
$Configuration['DATABASE_CHARACTER_ENCODING'] = 'utf8';
$Configuration['CHARSET'] = 'ISO-8859-1';
Max_B: Thank you for the wiki page, which was helpful in my quest for the solution.
Should I change the DATABASE_CHARACTER_ENCODING to 'ISO-8859-1' as well?