
Support for storage and retrieval of all Unicode characters in Vanilla?*



  • matt ✭✭

    Fantastic work, @chuck911

  • Anonymoose ✭✭
    edited August 2012

    Testing emoji.

    Edit: Didn't work. All the characters were gone once the post was saved. I suspect this is either a PHP or database issue, where these characters are discarded. It could also be the JavaScript processing the post.

  • matt ✭✭

    Welcome to the subject of the discussion, @Anonymoose.

  • Vanilla translation into Emoji posted:

  • dfdfdsfdsfdsfsd

  • Anonymoose ✭✭
    edited October 2012

    Nice work!

    Let's hope it gets more attention than my mention of the change to utf8mb4 on August 2nd.

  • @Todd said:
    Completely agree with x00 here. This is not a change we are going to make lightly nor anytime soon. The fact of the matter is that encoding support in mysql and php is just not that robust. And switching encodings often leads to mangled results.

    We could most likely make this change on the forums we host, but the skill level of most of the open source community just isn't there to handle a change like this.

    In this case, it doesn't lead to mangled results, because utf8mb4 is a superset of utf8: it only adds the ability to use more characters and does nothing to the existing content in the database.

    Meanwhile, CJK characters continue to be added to the supplementary Unicode planes, which require utf8mb4, so not supporting it is not an option if Vanilla is to be truly internationalized.

  • Anonymoose ✭✭
    edited October 2012

    For a BMP character, utf8 and utf8mb4 have identical storage characteristics: same code values, same encoding, same length.

    For a supplementary character, utf8 cannot store the character at all, while utf8mb4 requires four bytes to store it. Since utf8 cannot store the character at all, you do not have any supplementary characters in utf8 columns and you need not worry about converting characters or losing data when upgrading utf8 data from older versions of MySQL.
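
The storage difference quoted above can be checked without a database. The following is only a rough Python sketch: the helper names `utf8_length` and `fits_in_mysql_utf8` are made up for illustration, and it assumes that MySQL's legacy utf8 accepts exactly the code points up to U+FFFF (at most three bytes each).

```python
def utf8_length(ch: str) -> int:
    """Number of bytes a single character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_mysql_utf8(text: str) -> bool:
    """True if every code point is in the BMP (U+0000..U+FFFF).

    Assumption: this mirrors what a 3-byte MySQL utf8 column can hold.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


samples = {
    "A": "Latin letter, BMP",
    "é": "accented letter, BMP",
    "漢": "CJK ideograph, BMP",
    "\U00020000": "CJK Extension B, supplementary",
    "\U0001F603": "emoji, supplementary",
}

for ch, label in samples.items():
    # Supplementary characters are the only ones that need four bytes.
    print(f"U+{ord(ch):04X} ({label}): {utf8_length(ch)} byte(s)")
```

A check like `fits_in_mysql_utf8` could be run over a post before saving it, to spot text that a utf8 column would not store intact.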

  • That's great, thanks for digging up that information.

  • Anonymoose ✭✭
    edited October 2012

    I released a new plugin for Emoji, based on the old one, except that it doesn't use CSS and image files to display emoji, but displays real Unicode emoji.

    Unlike the original Emoji plugin, it saves Emoji in the database as htmlentities:


    which it later converts to utf8.

    It also converts emoticons to Emoji. 😃
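
The store-as-entities workaround described in the post above might look roughly like this. It is a Python sketch of the general idea, not the plugin's actual code (the plugin's exact entity format is not shown in this thread). The function names `encode_supplementary` and `decode_entities` are hypothetical, and hexadecimal numeric character references are an assumption.

```python
import html


def encode_supplementary(text: str) -> str:
    """Replace code points above U+FFFF with numeric character references.

    BMP characters pass through unchanged, so the result fits in a
    3-byte utf8 column.
    """
    return "".join(
        f"&#x{ord(ch):X};" if ord(ch) > 0xFFFF else ch
        for ch in text
    )


def decode_entities(text: str) -> str:
    """Turn character references back into real Unicode characters.

    Note: html.unescape decodes every reference it finds, including
    ones such as &amp; that the user may have typed literally.
    """
    return html.unescape(text)


stored = encode_supplementary("Testing emoji \U0001F603")
print(stored)
print(decode_entities(stored) == "Testing emoji \U0001F603")
```

The trade-off is that the database holds markup rather than the characters themselves, so searching and length limits operate on the entity text.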

  • Anonymoose ✭✭
    edited October 2012

    Vanilla really should upgrade to utf8mb4 (if supported by the installed version of MySQL) though.
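
The "if supported" condition could be decided from the server's version string. The sketch below is illustrative only: `parse_version` and `pick_charset` are hypothetical helpers, not part of Vanilla, and it relies on utf8mb4 having been introduced in MySQL 5.5.3. It assumes the version string starts with three dot-separated numbers, as in `5.6.10-log`.

```python
import re

# utf8mb4 was introduced in MySQL 5.5.3.
UTF8MB4_MIN_VERSION = (5, 5, 3)


def parse_version(version: str) -> tuple:
    """Extract (major, minor, patch) from a string such as '5.6.10-log'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def pick_charset(version: str) -> str:
    """Choose utf8mb4 when the server is new enough, else fall back to utf8."""
    if parse_version(version) >= UTF8MB4_MIN_VERSION:
        return "utf8mb4"
    return "utf8"


for v in ("5.1.73", "5.5.3", "5.6.10-log"):
    print(v, "->", pick_charset(v))
```

In practice the version string would come from the server itself, for example via `SELECT VERSION()`.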
