Vanilla 1 is no longer supported or maintained. If you need a copy, you can get it here.

Searching with UTF-8 Character Set



  • Do you think this change (removing the line) should be made in the release source too? Is it likely to have any ill effects?
  • Thank you all very much!

    @minisweeper: I am not a MySQL expert, but in my installations (on the live server and on my development servers) I have not seen any negative side effects. MySQL seems to behave case-insensitively with the generated search strings. Even if that were not the case, does it make any sense to change the case of the search string, and thereby change the user's "search intention"? So, all in all: yes, as far as I can see, it would make sense to change the release source too. (I would be proud to have contributed at least a very small piece to Vanilla...)

    I have received final information from the hosting company and have done some final research and thinking:

    On my (Debian) live server, a number of locales are installed:

    de_DE ISO-8859-1
    de_DE.UTF-8 UTF-8
    de_DE@euro ISO-8859-15
    en_US ISO-8859-1
    pl_PL ISO-8859-2
    nl_NL@euro ISO-8859-15
    fr_FR@euro ISO-8859-15
    cs_CZ ISO-8859-2
    tr_TR ISO-8859-9
    sv_SE ISO-8859-1
    ru_RU KOI8-R
    pt_PT ISO-8859-1
    es_ES ISO-8859-1

    So in my case de_DE.UTF-8 would be the right choice. If I include an additional line saying

    setlocale (LC_CTYPE, 'de_DE.UTF-8');

    immediately before the line in question, searching works even if I leave everything else intact. But setlocale() should be used with caution (the PHP documentation warns that the locale is maintained per process, not per thread), and it makes things even more complicated, because you would need to set the right locale for your language...
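A minimal sketch of why the locale matters, assuming the line in question lower-cases the search string with a byte-oriented, locale-dependent function such as PHP's strtolower() (which was locale-sensitive in PHP versions before 8.2). It is written in Python only to make the byte arithmetic visible, and the helper names are hypothetical. Under an ISO-8859-1 locale such as de_DE, the UTF-8 lead byte of "ğ" (0xC4, which Latin-1 reads as "Ä") is "lower-cased" to 0xE4 ("ä"), leaving a byte sequence that is no longer valid UTF-8 and cannot match the stored text.

```python
# Sketch: why lower-casing a UTF-8 search string byte by byte under a
# single-byte locale corrupts it. Helper names are hypothetical.


def latin1_bytewise_lower(raw: bytes) -> bytes:
    """Lower-case each byte as if it were an ISO-8859-1 character.

    Assumption: this mimics a locale-dependent strtolower() running
    under an ISO-8859-1 LC_CTYPE such as de_DE.
    """
    # latin-1 maps every byte 0x00-0xFF to exactly one character, and every
    # Latin-1 letter lower-cases to another Latin-1 letter, so this round
    # trip never raises.
    return raw.decode("latin-1").lower().encode("latin-1")


def utf8_aware_lower(raw: bytes) -> bytes:
    """Decode as UTF-8, lower-case whole characters, then re-encode."""
    return raw.decode("utf-8").lower().encode("utf-8")


query = "ğ".encode("utf-8")  # b'\xc4\x9f': the lead byte 0xC4 reads as 'Ä' in Latin-1

broken = latin1_bytewise_lower(query)
print(broken)  # b'\xe4\x9f': 0xC4 ('Ä') became 0xE4 ('ä')
print(broken.decode("utf-8", errors="replace"))  # a replacement character, not 'ğ'

# The character-aware version leaves an already lower-case letter alone
# and folds the capital letter correctly.
print(utf8_aware_lower(query).decode("utf-8"))
print(utf8_aware_lower("Ğ".encode("utf-8")).decode("utf-8"))
```

Under a UTF-8 locale, the C library's tolower() generally leaves bytes above 0x7F untouched, which is presumably why the extra setlocale() line makes the search work. A multibyte-aware function (in PHP, mb_strtolower() from the mbstring extension) avoids depending on the server's locale at all.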

    Another possibility would be to configure the server's locale settings (which can be done via SSH, if you have SSH access), but on my live server these changes have no effect on PHP. I have not found out whether this is typical for Debian servers or simply my fault, because I did something wrong or incomplete... Anyway, this would make installing Vanilla much more complicated!

    So as long as I don't know of any negative effects, I will stick to my solution as proposed above.

    Thanks again!
  • edited January 2008
    I have no problem with the line erased; actually, my search results are more logical now. But I have another problem (not related to the line; it happens whether the line is there or not). When I search with comments selected, I get Unicode codes displayed instead of the letters themselves. I realised that some of my users' data is written to the database differently: the letter ğ is stored as Ä� for some users, but as &#287; for others. The search finds the letters stored as Ä�, and they are displayed normally, but it never finds any letter stored as &#287;. Interestingly, when I check the discussions table, all the characters look like Ä�, not Unicode codes, so there is no problem if I search within titles.

    So I have two problems... and two questions :=)
    - Is it normal that some users' text is encoded differently in my database?
    - Is it actually normal that all the letters are converted to Unicode codes or something else in my database, or should I see all the characters normally?

    And my problem is how to clean up this whole mess... I think that if my comments were encoded the same way as the titles, the problem would be solved. Thanks in advance.
  • @selflearner: it is likely that your database is not UTF-8 clean. It is even possible that part of it is clean and the rest "weird", if you changed the Vanilla settings without cleaning up the MySQL side. See for complete information.
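A small Python sketch of the two representations described above, assuming that "Ä�" is the UTF-8 bytes of "ğ" (0xC4 0x9F) displayed as ISO-8859-1 text (0x9F is an unprintable control character there, hence the "�"), while "&#287;" is an HTML numeric character reference for U+011F. The function names are hypothetical. If that assumption holds, the mis-decoded form can be reversed by re-encoding as Latin-1 and decoding as UTF-8, while the entity form needs HTML unescaping instead. Try it on a copy of the data first: some genuine Latin-1 text can happen to look like valid UTF-8 and would be "repaired" wrongly.

```python
import html


def show_representations(ch: str) -> dict:
    """Return the forms one character can take on its way into the database."""
    utf8 = ch.encode("utf-8")
    return {
        "utf8_bytes": utf8,                            # what a UTF-8 clean table stores
        "misread_as_latin1": utf8.decode("latin-1"),   # what a Latin-1 viewer shows
        "numeric_entity": f"&#{ord(ch)};",             # what an HTML editor may emit
    }


def repair_latin1_misread(text: str) -> str:
    """Undo one round of 'UTF-8 bytes decoded as Latin-1'.

    Raises UnicodeEncodeError or UnicodeDecodeError for most text
    that was not mangled this way.
    """
    return text.encode("latin-1").decode("utf-8")


forms = show_representations("ğ")
print(forms["utf8_bytes"])               # b'\xc4\x9f'
print(repr(forms["misread_as_latin1"]))  # 'Ä\x9f'
print(forms["numeric_entity"])           # &#287;

print(repair_latin1_misread(forms["misread_as_latin1"]))  # ğ
print(html.unescape(forms["numeric_entity"]))              # ğ
```

The two repairs are deliberately separate: a row may contain either form, so each needs its own conversion before the table is UTF-8 clean.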
  • Hey, I wasn't really satisfied with the feed publishers that were available here. If the user was online, they presented headlines from categories he was not supposed to see, they only showed the first comment of a thread and never the latest, etc. This is my good old-school standalone version, presenting only the 20 newest entries (and only one entry per discussion). Maybe someone would like to make it Vanilla-like... :) [Sorry, I posted this twice in two different threads.]
  • @Max_B - once again, thank you very much. I dumped all the SQL, erased my old database and recreated it as utf8_general_ci. The worst part was correcting all the data: I couldn't find any way to do that automatically, so I did a find-and-replace on all the characters and imported them into my new database. Now everything is OK... oops, did I say everything? No, of course not. Now there is a problem (which I probably should have investigated before asking): I use FCKeditor, and if I choose "Enable Visual editing toolbar", all the characters are written to my database as numeric entities (like &#287;). As a result, my search still does not find those characters. If I disable "Enable Visual editing toolbar", everything is OK: all characters are stored as they are, and all my search results are correct.

    PS. I will search for information on this issue, but if you know any way to keep my characters normal while using FCKeditor, please tell me. Maybe someone can suggest another editor (I really need bold, italic, lists and, importantly, embedding YouTube videos). Thank you again and again.
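One possible server-side workaround, sketched in Python under the assumption that the editor only turns characters into numeric or named HTML entities and that the stored comment is HTML anyway: convert those entities back into real characters before saving (or before matching), so that "&#287;" and "ğ" end up identical and the search finds both. The function name is hypothetical, and this is not an FCKeditor setting. Entities for markup-significant characters (<, >, &, ", ') are deliberately left escaped so that unescaping cannot inject tags into the stored HTML.

```python
import html
import re

# Matches decimal (&#287;), hexadecimal (&#x11F;) and named (&gbreve;) entities.
_ENTITY = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")

# Characters that must stay escaped because they are significant in HTML markup.
_MARKUP_CHARS = {"<", ">", "&", '"', "'"}


def normalize_entities(fragment: str) -> str:
    """Replace HTML entities with the characters they stand for,
    except entities for markup-significant characters."""

    def replace(match: re.Match) -> str:
        decoded = html.unescape(match.group(0))
        if decoded in _MARKUP_CHARS:
            return match.group(0)  # keep &lt;, &gt;, &amp;, ... as they are
        return decoded  # unknown names come back unchanged from html.unescape

    return _ENTITY.sub(replace, fragment)


print(normalize_entities("<p>Da&#287; &lt;b&gt;</p>"))  # <p>Dağ &lt;b&gt;</p>
print(normalize_entities("&amp;#287;"))                 # &amp;#287; (escaped literal kept)
```

Normalizing on save also fixes how comments are stored and displayed, but rows saved earlier would need a one-time pass through the same function; normalizing only inside the search code leaves the table inconsistent.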
  • I'm looking for a client-side XHTML editor myself, and the contenders are:
    TinyMCE: almost as big as FCKeditor
    NicEdit: small and minimal
    Apple FancyToolbar: requires CSS3/WebKit, so it is reserved for intranet/private pages
    jTagEditor: light (if you already have jQuery) and sufficient, but not WYSIWYG

    I have not done any testing yet.
  • Bumping this thread because folks running into problems when searching for UTF-8 characters might be interested in the new chapter I just wrote on the wiki page, which covers exactly this issue.
This discussion has been closed.