Vanilla 1 is no longer supported or maintained. If you need a copy, you can get it here.
HackerOne users: Testing against this community violates our program's Terms of Service and will result in your bounty being denied.

Migrating latin1 to utf8 in Vanilla 1.0.3

edited April 2007 in Vanilla 1.0 Help
Help! Vanilla was installed in an older server, with an older MySQL, and an older phpMyAdmin (which did not support UTF-8). Even though my forum uses Esperanto characters a lot (ĉ, ŝ, ĝ, ĥ, ĵ, ŭ), I had everything set in its default of 'latin1', and even though if I looked at my database via phpMyAdmin it all looked like garbled text, it still worked and displayed fine in the forum. I kept the forums up-to-date, and was running version 1.0.3 of Vanilla with several extensions, without any problems. A couple of days ago I upgraded our account to a new server, which has a newer MySQL, and a newer phpMyAdmin, supporting UTF-8 in full. I decided to change everything over to UTF-8, and after having read several of the discussions in here, thought that I'd do the migration the 'clever way': 1) In the old server, with the old phpMyAdmin using latin1, I downloaded a full sql export of the entire database. 2) Using a programmer's text editor (TextMate), I opened the sql export file, and got the program to automatically re-read all characters as UTF-8. It worked! All international characters displayed perfectly! I saved the file as UTF-8 text. 3) Went into the NEW phpMyAdmin, setup the new database, made sure its collation is UTF-8, and that the phpMyAdmin interface is also UTF-8. Imported the converted backup into the database, and voilá! In phpMyAdmin, the data is all there, and all the characters display perfectly! So, I opened the forum, now running from the new database, to test it out. But... the Esperanto characters are NOT showing - all I'm getting is the "?" instead. I do not understand, as from what I've read in previous postings, my entire system should be native UTF-8. Obviously, something isn't. I don't understand. Please, help!

Comments

  • Can anyone help?... I've looked at the explanations in other discussions (such as the technical one from Max_B about Vanilla 1.0.1) but it is a bit beyond me - I don't even know if it applies to me or not. Please, please, help...
  • I have just installed a *brand new* Vanilla, on a server with MySQL, with utf-8 enabled. Before I do the install, I go to phpMyAdmin and make sure that the collation is *definitely* utf-8. Installation progresses without any problems, and I start up with a brand new forum. After the installation, I again open phpMyAdmin and check that in the table definitions, in the appropriate columns I see the collation description as being "utf8_general_ci" (for instance, in the "Name" column of the "LUM_Discussion" table. So, in the MySQL end, everything is certainly defined as utf-8. I go back into the brand new forum, and start a new discussion. I give it a title with international characters, such as: "Ĉu ĉi tio funkcios?" I start the discussion, and it displays fine in the forum, with the characters showing as they should. Great. Now, I go back to phpMyAdmin, to see how this was saved in the database - remembering, that the database is most certainly utf-8. It saved is as "Ĉu ĉi tio funkcios?". This seems to show, that Vanilla is NOT sending information as utf-8 to the database - although, once it gets the response from the database, it is displaying it as utf-8! There is a problem in the db<->Vanilla queries, which is what is affecting my capacity to bring in my old forum data in a readable manner. I do not know how to fix this. Can anyone help? Please...
  • The solution to this problem seems to be in this rather cryptic discussion. This seems to be a well-known, documented problem, and hopefully Mark will include this patch in future versions of Vanilla, to avoid this problem resurfacing again.

    For those of us who have 1.0.3 installed, there is only one modification necessary, in order to make Vanilla fully utf-8 savvy. You must edit the file "Vanilla/library/Framework/Framework.Class.MySQL.php" in 2 places, as describe in the previously mentioned discussion. The poster actually includes 2 more edits, to installation files, which do not seem to be necessary, as Mark has already included *those* ones in the latest (1.0.3) version of Vanilla.

    So, for future reference, using version 1.0.3 of Vanilla, the easiest way to migrate your entire system from latin1 (or something else) to utf-8, is:

    1) using phpMyAdmin, export your entire database to an .sql file
    2) using a programmer's text editor (TextMate/TextWrangler/Smultron), translate the file (until you see the characters displayed correctly), then save the file using utf-8 encoding.
    3) make a new database in phpMyAdmin, and make sure it is utf-8 encoded (ie., it is using "utf8_general_ci" collation).
    4) apply the patch described above to a single file in your 1.0.3 installation
    5) browse the site

    It should work, and all your international characters should display correctly, both in Vanilla, as well as in phpMyAdmin.
  • I had the same problem, but I just changed all 'utf-8' in the Vanilla files to 'latin-1' and then installed Vanilla. Works great.
  • @icouto:
    Sorry I didn't read this discussion in time to help you, but you sorted it our by yourself, the right way.

    @Ares:
    Will work as long as nobody will try to post text in non latin language, say Russian or Arab or anything else.
  • Bump: The wonders of the interweb....links dead, I need the "instructions" in the link icouto referes to doesn't work http://lussumo.com/vanilladev/discussion/126/mysql-utf8-and-cyrillic-characters-i-sh-displays-like-/#Item_1 Looks like vanilladev died this week So, anybody know what needs changed in Vanilla/library/Framework/Framework.Class.MySQL.php ?
  • http://www.google.com/search?q=cache:gPnRvxNnOgEJ:lussumo.com/vanilladev/discussion/126/mysql-utf8-and-cyrillic-characters-i-sh-displays-like-/&hl=en&strip=1
  • Ah yes....

    image
  • could someone post the changes please, the cache has expired for me....
  • MarkMark Vanilla Staff
    I was hoping this wasn't going to come up before Vanilla 1.1. I made the call to remove those changes from the core because they were causing a lot of headaches for people already running Vanilla in the way it currently is set up. It's the whole "six of one and half dozen of another" conundrum. But I'm thinking I'll need to add this stuff back into the core and write a migration script for databases that have problems with the new encoding.

    dammit.
  • I though of writing and offering such a script to the community but I was not confident enough in my knowledge of Vanilla installed base and feared to face unsatisfied users. Also I found no spare time…
    Can I help in some way?
  • Max_BMax_B New
    edited March 2007
    @monkie and others in same position:
    If you want to patch your install before Mark release 1.1 (is it worthwhile?) here is a patch. It's different from the posted original because it is in a central place, rather than in several init_* file.

    The file to patch is Library/Framework/Framework.Class.MySQL.php. function GetConnection() { if (!$this->Connection) { if (!function_exists('mysql_connect')) $this->Context->ErrorManager->AddError($this->Context, $this->Name, 'GetConnection', 'You do not appear to have MySQL enabled for PHP.'); $this->Connection = @mysql_connect($this->Context->Configuration['DATABASE_HOST'], $this->Context->Configuration['DATABASE_USER'], $this->Context->Configuration['DATABASE_PASSWORD']); @mysql_query("SET NAMES 'utf8'"); //this line is UTF-8 patch
    and function GetFarmConnection() { if ($this->FarmConnection) { return $this->FarmConnection; } elseif ($this->Context->Configuration['FARM_DATABASE_HOST'] != '') { if (!function_exists('mysql_connect')) $this->Context->ErrorManager->AddError($this->Context, $this->Name, 'GetConnection', 'You do not appear to have MySQL enabled for PHP.'); $this->FarmConnection = @mysql_connect($this->Context->Configuration['FARM_DATABASE_HOST'], $this->Context->Configuration['FARM_DATABASE_USER'], $this->Context->Configuration['FARM_DATABASE_PASSWORD']); @mysql_query("SET NAMES 'utf8'"); //this line is UTF-8 patch

    In each code snipet the last line is the only mod.
  • MarkMark Vanilla Staff
    I was thinking about this last night. I think that I'm going to create a configuration setting called something like MYSQL_CHARACTER_ENCODING and set the default value to utf8. That way if you are running an existing vanilla and the new setting screws up your forum, you can just remove the utf8 and go back to the way it was before. Otherwise it should work forwards without any headaches.
  • Yes, if you are looking for quick and simple solution to release 1.1 this one will do.

    But admins of boards running in the "weird" or "worst" case (referring to my own term in this post) are on their own to clean things up.
    The weird case is rather simple to sort out, following a path outlined by Icouto in this thread, at the cost of a base re-install.
    The "worst" case is more cryptic. Incorrect characters bust be read as utf-8 (looking bad), converted to latin1 and saved (still looking bad) then read again as utf-8 (should looks ok).
    Users running MySQL < 4.1 are either all latin1 (OK) or weird.
    Mysql >= 4.1 are weird if database is set up as latin1 or worst if utf-8.
    Not to mention that Mysql charset cant be setup at different level, global, table, column or even literal data, and then there may be mixed case.
  • MarkMark Vanilla Staff
    Okay, I've done as I said and made a new configuration setting called DATABASE_CHARACTER_ENCODING.

    I've also upgraded the installer and upgrader scripts so that they make the db and it's tables utf8 if it is above mysql 4.1 - they will automatically save this variable in the conf/database.php file as well.

    The only thing I did differently was to make the default value for the DATABASE_CHARACTER_ENCODING setting blank. I did this so that existing working copies of Vanilla wouldn't break when they upgrade to Vanilla 1.1.

    Max, I was hoping you could write a page in the wiki about what to do if you encounter problems with character encoding in Vanilla. You just know so much more about it than me. If you are interested, I'd really appreciate it.
  • Mark wrote: Max, I was hoping you could write a page in the wiki about what to do if you encounter problems with character encoding in Vanilla. You just know so much more about it than me. If you are interested, I'd really appreciate it.
    OK, I'll do this, hopefully this week, I have to register there. I'll add it to the "Administrators" chapter.
  • First I upgraded my forum on a local machine (with MySQL 4.1.13a) and it went wel (although my database DEFAULT_CHARACTER_SET_NAME is latin1 and my DEFAULT_COLLATION_NAME is latin1_swedish_ci). All the latin characters on my forum (é ë etc) were fine.
    But when I upgraded my forum on my online server (with MySQL 5.0.18-Debian_3.dotdeb.1) the characters were all garbled, while my database DEFAULT_CHARACTER_SET_NAME and DEFAULT_COLLATION_NAME are the same as in on my local machine.
    How come? Because the newer MySQL version?

    I fixed this by replacing all 'utf-8' to 'ISO-8859-1' in the Vanilla 1.1 upgrade files.
  • Not sure if this is related. My site (in spanish) renders the accents ok, however when i syndicate the feed via widget box on my main site I get the corrupted characters where there should be accents etc. (see the widget box at the end of the sidebar here http://mamidelux.com/)
  • @johoney: have you read this memo?
    Your site does not seem to be using Vanilla but the problem is of the same nature as soon as PHP application use MySQL databases.
This discussion has been closed.