Vanilla 1 is no longer supported or maintained. If you need a copy, you can get it here.
HackerOne users: Testing against this community violates our program's Terms of Service and will result in your bounty being denied.

Searching with UTF-8 Character Set

edited October 2008 in Vanilla 1.0 Help
I have a strange problem in a live installation of Vanilla (which is not public). I hope that someone else had this issue before so he can point me in the right direction.

The forum in question is running perfectly with UTF-8 (as explained by Max_B - thank you for that!). In need of investigating a small but strange problem I have three installations in the meantime – the live forum (running on my client's webspace) an two test installations (locally, on my OS X machine), the latter being copies of the files and the database:

(1) live forum running on:
* Apache 2.0
* PHP 5.2.3
* MySQL 5.0.32
(2) test forum #1 running on:
* Apache 1.3
* PHP 5.2.2
* MySQL 5.0.24a
(3) test forum #2 running on:
* Apache 2.0
* PHP 5.2.3
* MySQL 5.0.45

In the test forums everything works perfectly, but in the live forum the search function does not work if I use special German characters in the search string. So in short:

* any UTF-8 characters are stored and delivered fine, but:
* searching for phrases using special (German) characters gives no results

Here's an example: Let's assume you have an entry with the German word "präsentieren" in the title. In this case

* searching for "prasentieren" (please note there is "a" instead of "ä") will find the article
* searchibg for "präsentieren" (the original phrase) will find nothing

The weird thing about it all is that everything's fine in the test installations. So there must be some incompatibility and/or misconfiguration on the live server, I guess. I did a lot of tests, but could not find the problem.

The problem with the search function (on the live web server) even exists with a completely fresh copy of Vanilla, without any extensions (which is installation #4, of course).

I hope that somebody has an idea where to go from here...
«1

Comments

  • I tested searching präsentieren on this (Lussumo) forum with success.

    Besides server software version, did you check default php options (magic_quote and the like) and .htaccess also ?
  • There is no .htaccess file at all. Regarding PHP options I just did some quick check, because I do not really know where to search. Do you know some server or PHP parameters which could have an influence on handling GET data? I am aksing this because I've seen that searching the database works fine with PHPMyAdmin! So maybe the problem is in the handling of GET data on the server?
  • I compared PHP core configurations now. There are no significant differences, beside: 1. "allow_url_include" is ON on the live server 2. "enable_dl" is OFF on the live server 3. "register_argc_argv" is ON on the live server 4. "short_open_tag" is ON on the live server 5. the live server uses the suhosin patch (which AFAIK is meant to increase safety)
  • Max_BMax_B New
    edited January 2008
    Well, this suhosin patch looks for a candidate, you may want to try the suhosin.simulation flag.
    Nevertheless, if phpMyAdmin is ok, you may want to double check you live forum utf-8 configuration.
  • edited January 2008
    Max_B, thank you very much for your help. I tried and switched the local value of suhosin.simulation to ON, but nothing changed. The Hosting Company told me to take a deeper look in the Vanilla software. Haha - I did tell them it works perfectly elsewhere (like in my local installations or in this forum here)! I double checked the Vanilla side, of course. (One of the installations even is a brand new one.) I do not know what to do now. I would not like to switch to a different hosting company, because I have many clients' websites there... If you have any new ideas, please post them here. I appreciate any help on the topic.
  • I'm not sure if you switched it on the right side:
    You told that suhosin patch is on the live server. You must test the simulation flag on this one, not on the local one. If you can't because it's hosted, you may try to install this patch locally and see if it is the culprit.

    I see no other clue, right now, beside MySQL version. MySQL doc say something about searching with LIKE operator (Vanilla uses LIKE for search):
    Per the SQL standard, LIKE performs matching on a per-character basis, thus it can produce results different from the = comparison operator:
    mysql> SELECT 'ä' LIKE 'ae' COLLATE latin1_german2_ci;
    +-----------------------------------------+
    | 'ä' LIKE 'ae' COLLATE latin1_german2_ci |
    +-----------------------------------------+
    | 0 |
    +-----------------------------------------+
    mysql> SELECT 'ä' = 'ae' COLLATE latin1_german2_ci;
    +--------------------------------------+
    | 'ä' = 'ae' COLLATE latin1_german2_ci |
    +--------------------------------------+
    | 1 |
    +--------------------------------------+
    Be sure to test phpMyAdmin search with a LIKE operator.
  • Yes, I switched on the live server. There's a local and a global value for all the parameters, and by adding
    php_value suhosin.simulation ON
    
    to the .htaccess file I switched the local value to "ON". I can not disable Suhosin completely (because it's on a hosted webspace), but shouldn't it be enough to change the local value for suhosin.simulation?

    OK, this did not work.

    To be extra sure, I thought about installing the Suhosin patch on one of my development (OS X) servers. But this is rather tricky stuff, too tricky for me... I'm rather a coder than a programmer.

    I tried searching for my user family name (which includes the German letter "ö") with PHPMyAdmin using the LIKE operator:
    SELECT * 
    FROM  `db1083101-test`.`LUM_User` 
    WHERE (
     `LastName` LIKE  '%ö%'
    )
    LIMIT 0 , 30
    
    and also
    SELECT * 
    FROM  `db1083101-test`.`LUM_User` 
    WHERE (
    `LastName` LIKE  '%ö%' COLLATE utf8_general_ci
    )
    LIMIT 0 , 30
    
    Both works fine.
  • Also, to be extra sure, check for register_argc_argv as it implies copying the request arguments to system environment , which might not be utf-8 capable, it can possibly mess-it.

    The fact that searching is ok for phpMyAdmin and not Vanilla is the only element we have. Check if both use GET or POST (vanilla may be POST while PMA GET) and set up some testing around this.
  • additionally there is some character stuff within the apache 2 conf- files, sorry I'm on my way to bed so if anyone believes this could be the issue then i need to research that again, but i once had a problem with the same perl-generated pages looking differently on two identical (so i thought) servers, one was delivered as UTF and one was delivered being ISO... ...
  • I'm suspecting something in MySQL is causing this, since it matches on a but not ä.

    Curious if it would be possible to point a local Vanilla installation to the remote DB server to verify this. (It might be inaccessible from the outside...)

    Another tool that might help is to upload this file to your installation, and while logged in as administrator visit it and turn on the debug mode (only the administrator roles who toggle this can view it). Then you will be able to see and test the various SQL queries that Vanilla generates at the bottom of the page.
  • edited January 2008
    Thank you all! Meanwhile I did some more testing on the subject. Here are the results:
    • changing the PHP parameter register_argc_argv does nothing to the phenomenon
    • there is no difference using GET or POST parameters. (I verified this using a manually edited search page.)
    • the problem is not in the database. As WallPhone suggested, I opened the live database for external connections and connected my development server to this database. Everything worked perfectly. So the problem must be in the Apache/PHP configuration on the live server WORKING TOGETHER with Vanilla. (Everything is OK with PHPMyAdmin, but this Software has been pre-installed by the hosting company somewhere out of the webspace; I have no file access to PHPMyAdmin.)
    • I verified that the problem is in Apache/PHP using Vanilla's debug mode. After hitting "search" the German character is still OK in the search field, but obviously it has been garbled on it's way to the database query!
    I have written another E-Mail to the hosting company. Unfortunately I have no access to Apache 2 configuration files, I only can switch some PHP configuration options via htaccess. I already tried a lot of PHP options without any success. Maybe the problem is in the Apache config (meaning keith_ is on the right track)?
  • http://httpd.apache.org/docs/2.2/mod/core.html#adddefaultcharset

    This should not be necessary because Vanilla does output an utf-8 charset header but I have read that some installation override charset someway.
  • I already checked the HTTP headers. The character set is included correctly in the header section: Content-Type: text/html; charset=utf-8 (So it does not change anything to use the AddDefaultCharset utf-8 directive.)
  • Having checked the Apache configuration options once again, I am rather sure that the problem must be either in a faulty compilation or in PHP's configuration. I hope that the hosting company will have s.th. interesting to say (once they checked Vanilla's debug information). Nevertheless I will try and evaluate any ideas in the meantime!
  • edited January 2008
    I have almost exactly the same problem. The difference is that it's not working even in my testing server. EVERYTHING, both in the testing and the production server, is UTF-8. I can search the "body" column of table "comment" using the "search" function for that column in phpmyadmin, for example for 'á' (spanish), and it will find results both for 'a' and 'á'. However, if I make the same search inside Vanilla, if I search for 'á', it will return both results for 'á' and 'a', but if I search for 'a', it will return results with 'a' but omit 'á'. If I search for "ñ", it will find nothing, but if I search for "n", it will return both "n" and "ñ". Again, MySQL is OK. As I've never been able to make this work, I was thinking it was a Vanilla issue and I should "filter" every "special" character before sending them to be searched in the database, but after seeing you have made it work and after trying to search directly inside MySQL, definititely there must be something misconfigurated somewhere...
  • I think you(both) did, but have you checked the PHP default_charset value? It should be empty.
    Also, just to keep searching, what are PHP mbstring settings.
  • I have found and solved the problem!!!

    In line 35 of /library/Framework/Framework.Class.SqlSearch.php the user query is converted with PHP's 'strtolower' function.
    $this->UserQuery = strtolower(trim($this->UserQuery));
    This function relies on the current locale setting. This means that in i.e. the default “C” locale, multibyte characters such as "ä" will not be converted properly! Thus the database query will include garbled characters.

    On the server in question (Apache/2.0.54 (Debian GNU/Linux) PHP/5.2.3 with Suhosin-Patch DAV/2) the 'locales' seem not to work for the 'strtolower' function! Even adding
    setlocale (LC_ALL, 'de_DE@euro', 'de_DE', 'de', 'ge');
    (immediately before the function call) had no effect. I assume that Debian is not set up correctly. I found out that it needs a package named "locales" in order to properly support locales. Maybe they did not compile this package? I do not know this for sure. Anyway.

    I solved the problem by simply removing the mentioned line in the source code. (In my eyes there is no need for it, because MySQL itself will behave case insensitive.)

    I wrote the above to the hosting company as well. I will be back here if they have anything interesting to say!
  • Megazilions of thanks. You're the man, Michael-e!! I was struggling a lot with this. Smoke was going out of my head. I thought it had something to do with character encoding, misconfiguration on the server and the like. The solution provided works like a charm. Thank a lot!!!!! :D
  • @ Pere: Thank you. I was struggling a lot with this, too... Many thanks to Max_B and WallPhone!
  • Michael-e you are a survivor.. thanks a lot, it was my big headache.. :=) also thanks for everyone who tried to solve that problem.
This discussion has been closed.