Vanilla 1 is no longer supported or maintained. If you need a copy, you can get it here.

Encoding (latin1/ISO-8859-1 vs. UTF-8) conundrum: wiki page

Max_B (New)
edited November 2008 in Vanilla 1.0 Help
I have saved a first draft of this page; it is about halfway done at the moment.
I'm not a native English speaker, so feel free to read it and correct any spelling or grammar errors.
If something is unclearly worded, add a short comment here or there and I'll rephrase it.
General comments would be premature until I finish it.

Comments

  • Mark (Vanilla Staff)
    That looks awesome so far - thanks for this!
  • I just made two tiny changes to what I believe looks more correct/readable. The only other thing that strikes me is that the second sentence, "This page is intended to clear it up and give indication to sort any unclean situation.", seems a bit... strange. What are you aiming for with this? An alternative I'd suggest could be "This page is intended to clear up any issues and provide information to sort out any 'unclean' situation."
  • Mark (Vanilla Staff)
    How about:

    This page is intended to help resolve issues related to character encoding in Vanilla and MySQL.
  • Done!
    Please, folks, correct the second part.

    I hope I didn't forget something important.
  • Max_B (New)
    edited March 2007
    @Mark: Now that we have a central place for encoding-related issues, I suggest that the next minor release of Vanilla be fully utf-8 compliant, rather than leaving it to the admin.
    I did not check the installer/upgrader, but the installer could set the connection setting to utf-8 and the upgrader leave it untouched.
    Until 1.1, all new installs on MySQL 4.1+ were created "weird". From 1.1.1 on, with the database set up as utf-8 while DATABASE_CHARACTER_ENCODING is left blank, new installations are "worst" by default, which is much more thrilling to clean up. If you switch, just add two lines to the wiki telling users what the default is from now on.
    And lastly, this setting is misnamed: it is meant to reflect the client encoding, not the database's. I know historical reasons led to this name; nevertheless…
  • Mark (Vanilla Staff)
    I did not check the installer/upgrader but the installer may set the connection setting to utf-8 and the upgrader leave it untouched.

    That's exactly how I've set it up.
  • Right!
    Sorry for having presumed it wasn't; I had no time to check.
    Hopefully we have now got rid of the recurring utf-8 questions.
  • Mark (Vanilla Staff)
    No doubt! Thanks for all your work on that page, Max :)
  • Successfully upgraded my forum from latin1 to utf-8.
    Thanx Max_B for the excellent wiki tutorial!
  • @Max_B
    I'm not sure mine is a 'recurring utf-8 question', but it does have something to do with encoding. If you use the Janine extension to post from WordPress to Vanilla in a non-Latin charset, what you get in place of the blog post title is text in some unidentifiable encoding. In my case everything is in UTF-8: the database, Vanilla and WordPress, and still...
    (Interestingly, and in case it helps, this kind of 'encoding' pops up within WP as well in certain areas, also related to crossposting. For example, the Crossposter plugin for posting to LiveJournal lets you enter a custom notification message. Enter it in Cyrillic, say, save the option, and the WP Dashboard then displays the original Cyrillic text exactly the way it looks when it gets from WP to Vanilla: unreadable.)
  • It's hazardous to diagnose at this stage: I haven't looked at Janine, and my own WordPress blog has been dormant on the 1.0 release for a long time.
    From the info you gave, I'd bet the problem is on the WordPress side and is the same weird/worst syndrome. That is, part of WordPress is utf-8 aware, but some add-ons or secondary areas are probably missing the famous "SET NAMES utf8" when opening the MySQL connection, resulting in weird or worst encoding depending on the WordPress database's default encoding. Try looking at the relevant code.
    Also try asking squirrel, the Janine author, whether he knows about this.
    I'll look at it if you get stuck.
  • Hmmm... it's not that I insist (or persist), but now it is getting really confusing (and interesting). So, just to share some experiences...

    I checked the WP database and even transcoded it to utf8, just in case, following the procedure described on your wiki page. As far as crossposting through Janine was concerned, this did not change anything. But then I enabled the preformatting option in the ecto Mac blogging client (it encodes HTML entities) and posted everything to WP through it, while also enabling the Janine feature that posts the whole blog-post content to Vanilla rather than just the link. That is when things started to get really weird, to use your label.
    1. If the blog-post TITLE was in Cyrillic, the post never got from WP to Vanilla.
    2. If at least one Latin word (I guess even a single character would be enough) was included in the title, the post appeared in Vanilla with the Latin part displayed correctly and the rest as HTML substitutes for the Cyrillic characters (rather than completely unreadable, as before).
    3. The content of the post is then displayed correctly in Cyrillic in Vanilla. It is also displayed correctly on my blog page, which is NOT the original WP page. In the WP Dashboard, however, it is displayed as HTML symbols.
    4. If preformatting in ecto is disabled, both Cyrillic titles and content get through, but completely garbled. The same happens when something is posted directly from the WP Dashboard rather than through ecto...
    Hope this makes sense to you. It surely doesn't to me :)
  • edited April 2007
    Thanks, Max_B. I added the collations and character sets from phpMyAdmin to the wiki.
  • Hey guys. I have read the wiki page until I'm blue in the face, and I'm short a couple of handfuls of hair. I am trying to set up my forum in Japanese and I am not sure how to change my database to utf-8. I want to start fresh with a new Vanilla and a new database. I successfully created a new database; I have to change it to utf-8 before I connect it to Vanilla, right? How do I do the MySQL side if I am starting from scratch? I tried mysql> set names 'utf8' and it just changed back to latin by itself. I don't really understand what I am doing. What do I need to do before I attach Vanilla if I am starting with a new database? Please help, I am at a total loss. Thanks.
  • Max_B (New)
    edited November 2007
    @legolas
    That's easy:
    - launch phpMyAdmin (if you use another front end, the procedure should be similar).
    - when you create the new database, there is a select box (just beside or below the new database name field) that lets you set the default character set and collation (that is, how characters are sorted) for the database. Scroll down the (huge) list and select utf8_general_ci.

    That is it. If you start with a fresh install, Vanilla is already set-up for utf-8.

    Edit: Since you wrote that you have already created the new database, you can either delete and re-create it, or change its encoding from the "Operations" tab in phpMyAdmin.
  • Thank you, sir!
  • Max_B (New)
    edited October 2008
    This thread eventually sank, hopefully because everybody has now successfully switched to utf-8.
    I'm bumping it because I added a new chapter to the wiki page today.
    I recently had to sort out some weird bugs with filenames and thought it could help some folks to have this information in short form somewhere.
  • I upgraded from 0.9.2.6 yesterday and struggled a bit because all my special characters (Norwegian letters) were showing up like this: "�rsm�te p�"

    Eventually I found this configuration setting: $Configuration['CHARSET']. I then added it and $Configuration['DATABASE_CHARACTER_ENCODING'] to my settings.php file, and my problem was solved.

    Here's what they look like:

    $Configuration['DATABASE_CHARACTER_ENCODING'] = 'utf8';
    $Configuration['CHARSET'] = 'ISO-8859-1';

    Max_B: Thank you for the wiki page, which was helpful in my quest for the solution.
  • I think $Configuration['DATABASE_CHARACTER_ENCODING'] defines the encoding of the text sent to and from the database. If your forum works with ISO-8859-1, surely the connection should also be ISO-8859-1 (MySQL calls it "latin1"?).
  • Yes, as far as I can tell (I have little MySQL knowledge), my database uses latin1.

    Should I change the DATABASE_CHARACTER_ENCODING to 'ISO-8859-1' as well?
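Several posts in this thread hinge on the same mechanism: MySQL converts text between the connection charset (what "SET NAMES" or DATABASE_CHARACTER_ENCODING declares) and the column charset. The Python sketch below is a toy model of that conversion, not MySQL's code. It assumes MySQL's latin1 behaves like Python's latin-1 codec (the real one is closer to cp1252), folds the client, connection and result charsets that SET NAMES sets into a single value, and uses hypothetical helpers store() and fetch() plus a hypothetical sample text "Årsmøte på" standing in for the garbled Norwegian above. The "weird" and "worst" labels are applied the way this thread appears to use them; the wiki page has the authoritative definitions.

```python
# Toy model of MySQL's charset conversion; NOT MySQL's actual implementation.
# Assumptions: MySQL "latin1" behaves like Python's "latin-1" codec (the real
# one is closer to cp1252), and client/connection/result charsets are one value.

CODECS = {"latin1": "latin-1", "utf8": "utf-8"}


def store(payload: bytes, connection: str, column: str) -> bytes:
    """Hypothetical helper: bytes written to the table. The server decodes the
    payload with the declared connection charset and re-encodes it for the column."""
    return payload.decode(CODECS[connection]).encode(CODECS[column])


def fetch(stored: bytes, connection: str, column: str) -> bytes:
    """Hypothetical helper: bytes returned to the client (the reverse step)."""
    return stored.decode(CODECS[column]).encode(CODECS[connection])


text = "Årsmøte på"          # hypothetical Norwegian sample text
sent = text.encode("utf-8")  # a utf-8 page makes the browser submit utf-8 bytes

# "weird": no SET NAMES (connection latin1) and a latin1 column, so the utf-8
# bytes are stored unchanged in a column that claims to hold latin1.
weird = store(sent, "latin1", "latin1")

# "worst": utf8 column but DATABASE_CHARACTER_ENCODING left blank, so the
# connection stays latin1 and each multibyte character gets encoded twice.
worst = store(sent, "latin1", "utf8")

# Consistent setup: the declared connection charset matches what is sent.
good = store(sent, "utf8", "utf8")

# All three round-trip, so the forum pages themselves look fine...
for stored, conn, col in [(weird, "latin1", "latin1"),
                          (worst, "latin1", "utf8"),
                          (good, "utf8", "utf8")]:
    print(fetch(stored, conn, col) == sent)  # True

# ...but only the consistent setup holds readable text inside the table.
print(weird.decode("latin-1"))  # mojibake such as "Ã¸" in place of "ø"
print(worst.decode("utf-8"))    # the same mojibake, now stored as real utf-8
print(good.decode("utf-8"))     # Årsmøte på

# The Norwegian symptom: latin1 bytes rendered by a page declared as utf-8.
shown = text.encode("latin-1").decode("utf-8", errors="replace")
print(shown)  # �rsm�te p�

# HTML numeric references (here the Cyrillic "Пр") are plain ASCII, so they
# come through unchanged whatever connection/column charsets are combined.
entities = "&#1055;&#1088;".encode("ascii")
survives = all(store(entities, conn, col) == entities
               and fetch(store(entities, conn, col), conn, col) == entities
               for conn in CODECS for col in CODECS)
print(survives)  # True
```

In this model the forum pages look fine in both the weird and worst cases, because the wrong conversion on the way in is undone on the way out; the damage only shows up when the stored data is read with a different connection charset, which is why such setups are painful to clean up. It also fits the ecto experiment: numeric HTML references are pure ASCII, so no charset mismatch can alter them.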
This discussion has been closed.