Characters encoding in Vanilla Wordpress plugin

hispanico87 · June 2013

When I post on my WordPress blog, Vanilla automatically creates a discussion in the forum. All is OK, but in the Vanilla forum I see some strange characters in the post title and summary: characters like é è à etc. are "transformed" into something like Â Ã. (In Italian, these characters are heavily used.)
Character encoding seems to be OK in WordPress, in Vanilla and in the MySQL database. Where is the problem? How can I resolve this issue?

P.S. Sorry for my bad English!

hgtonight · June 2013

Welcome to the community!

What version of Vanilla are you running?

hispanico87 · June 2013

I'm using Vanilla 2.0.18.8!

hispanico87 · June 2013

You can see an example here: goo.gl/CnNCe

hgtonight · June 2013

Does this title look fine in the Vanilla discussion db table?

hispanico87 · June 2013

Yes! In the database everything seems to be OK!

UnderDog · June 2013

What is the character encoding for WordPress (theme), Vanilla (theme) and your database? How about for your database table?

hispanico87 · June 2013

utf8_unicode_ci for WordPress and Vanilla! The database server has UTF-8 Unicode (utf8) character encoding. Could this be the problem? I think not.

x00 · June 2013

If you make a comment directly, you don't have this issue?

If you are using that plugin, it doesn't use an API; it is purely scraping the content.

x00 · June 2013

if in that database it is ok. Then you need to ensure the connection has

Notice the slug in the URL too, so this is not a client-side issue.

x00 · June 2013

Yes, I think the problem is during scraping. Multi-byte characters are being interpreted as single bytes; this can also be caused by single-byte string functions. It has nothing to do with your database connection or anything after it.

Please double-check that it is OK in the database. I suspect not; I suspect it is entered as Ã¹, as in two bytes. This is useful information.
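
To make the suspicion concrete, here is a minimal PHP sketch of how "ù" can end up as "Ã¹". It assumes the mbstring extension is loaded, and the helper name `misread_as_latin1` is made up for illustration; it is not part of Vanilla or the WordPress plugin.

```php
<?php
// Hypothetical helper: simulate reading UTF-8 bytes as if every byte were
// one ISO-8859-1 character, then storing the result as UTF-8 again.
function misread_as_latin1(string $utf8): string {
    return mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
}

$original = "\u{00F9}";                    // "ù", stored as the two bytes C3 B9
$garbled  = misread_as_latin1($original);  // each byte becomes its own character

echo strlen($original), "\n";              // counts bytes (single-byte function)
echo mb_strlen($original, 'UTF-8'), "\n";  // counts characters
echo $garbled, "\n";                       // the two bytes shown as two characters
```

The difference between `strlen()` and `mb_strlen()` is the same reason single-byte string functions can damage multi-byte text.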

hispanico87 · June 2013

With direct commenting there isn't any issue!

hispanico87 · June 2013

@x00 said:
Please double-check that it is OK in the database. I suspect not; I suspect it is entered as Ã¹, as in two bytes. This is useful information.

How can I check this? I don't understand your suspicion!

x00 · June 2013

Look up DiscussionID 34 in the database and look at the name and body fields.
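
If the database client hides the problem, comparing raw bytes is one way to be sure. This is only a sketch: `show_bytes` is a made-up helper, and the two sample strings stand in for values copied from the name or body field.

```php
<?php
// Hypothetical helper: show the raw bytes of a field value as hex,
// so double-encoded text is easy to tell apart from clean UTF-8.
function show_bytes(string $value): string {
    return bin2hex($value);
}

$clean   = "\u{00F9}";          // "ù" stored correctly
$garbled = "\u{00C3}\u{00B9}";  // "Ã¹", the double-encoded form

echo show_bytes($clean), "\n";   // c3b9
echo show_bytes($garbled), "\n"; // c383c2b9
```

Two bytes per accented letter means the text is fine; four bytes means it was encoded twice before it was saved.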

hispanico87 · June 2013

Oh... nice! In the body and name fields the text is "strange"!

x00 · June 2013

yep as I suspected.

hispanico87 · June 2013

Is there a way to fix this problem?

x00 · June 2013

this is why

http://www.glenscott.co.uk/blog/2012/08/07/html5-character-encodings-and-domdocument-loadhtml-and-loadhtmlfile/

x00 · June 2013

There is a fix: create a file conf/bootstrap.before.php (if it doesn't already exist)

<?php if (!defined('APPLICATION')) exit();
   function FetchPageInfo($Url, $Timeout = 0) {
      $PageInfo = array(
         'Url' => $Url,
         'Title' => '',
         'Description' => '',
         'Images' => array(),
         'Exception' => FALSE
      );
      try {
         $PageHtml = ProxyRequest($Url, $Timeout, TRUE);
         $Dom = new DOMDocument();
         // The XML encoding prefix makes loadHTML() read the page as UTF-8 (the one change from core)
         @$Dom->loadHTML('<?xml encoding="UTF-8">'.$PageHtml);
         // Page Title
         $TitleNodes = $Dom->getElementsByTagName('title');
         $PageInfo['Title'] = $TitleNodes->length > 0 ? $TitleNodes->item(0)->nodeValue : '';
         // Page Description
         $MetaNodes = $Dom->getElementsByTagName('meta');
         foreach($MetaNodes as $MetaNode) {
            if (strtolower($MetaNode->getAttribute('name')) == 'description')
               $PageInfo['Description'] = $MetaNode->getAttribute('content');
         }
         // Keep looking for page description?
         if ($PageInfo['Description'] == '') {
            $PNodes = $Dom->getElementsByTagName('p');
            foreach($PNodes as $PNode) {
               $PVal = $PNode->nodeValue;
               if (strlen($PVal) > 90) {
                  $PageInfo['Description'] = $PVal;
                  break;
               }
            }
         }
         if (strlen($PageInfo['Description']) > 400)
            $PageInfo['Description'] = SliceString($PageInfo['Description'], 400);

         // Page Images (retrieve first 3 if bigger than 100w x 300h)
         $Images = array();
         $ImageNodes = $Dom->getElementsByTagName('img');
         $i = 0;
         foreach ($ImageNodes as $ImageNode) {
            $Images[] = AbsoluteSource($ImageNode->getAttribute('src'), $Url);
         }

         // Sort by size, biggest one first
         $ImageSort = array();
         // Only look at first 10 images (speed!)
         $i = 0;
         foreach ($Images as $Image) {
            $i++;
            if ($i > 10)
               break;

            list($Width, $Height, $Type, $Attributes) = getimagesize($Image);
            $Diag = (int)floor(sqrt(($Width*$Width) + ($Height*$Height)));
            if (!array_key_exists($Diag, $ImageSort))
               $ImageSort[$Diag] = $Image;
         }
         krsort($ImageSort);
         $PageInfo['Images'] = array_values($ImageSort);
      } catch (Exception $ex) {
         $PageInfo['Exception'] = $ex;
      }
      return $PageInfo;
   }

This is exactly the same as the core version, bar one difference.

Note that the '<?xml encoding="UTF-8">' prefix is the key: it forces UTF-8.
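
To see the effect of the prefix in isolation, here is a small sketch that parses the same page with and without it. It assumes the PHP dom extension is available; `page_title` and the sample HTML are made up for illustration. What the second call prints depends on libxml's default guess, so no output is claimed for it.

```php
<?php
// Hypothetical helper: parse an HTML string and return its <title> text.
// $forceUtf8 toggles the XML encoding prefix discussed above.
function page_title(string $html, bool $forceUtf8): string {
    $dom = new DOMDocument();
    $prefix = $forceUtf8 ? '<?xml encoding="UTF-8">' : '';
    // @ hides the warnings libxml raises for imperfect real-world markup
    @$dom->loadHTML($prefix . $html);
    $titles = $dom->getElementsByTagName('title');
    return $titles->length > 0 ? (string) $titles->item(0)->nodeValue : '';
}

// A UTF-8 page that does not declare its own charset (made-up sample).
$html = "<html><head><title>Perch\u{00E9} pi\u{00F9}</title></head>"
      . "<body><p>test</p></body></html>";

echo page_title($html, true), "\n";   // parsed as UTF-8
echo page_title($html, false), "\n";  // left to libxml's default guess
```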

The fix will only apply to new articles. You will have to edit the old articles manually.
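
If there are too many old discussions to retype, the damage can sometimes be reversed in code. This is a rough sketch only, assuming the text was misread as ISO-8859-1 exactly once and that mbstring is loaded; `undo_double_encoding` is a made-up name, and it is worth trying on a copy of the data first.

```php
<?php
// Hypothetical helper: undo one round of "UTF-8 read as ISO-8859-1".
// Returns the input unchanged when it does not look double-encoded.
function undo_double_encoding(string $text): string {
    // Map each character back to the single byte it was misread from.
    $bytes = mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');

    // Characters outside ISO-8859-1 come back as "?"; if any appeared,
    // the text was not purely double-encoded, so leave it alone.
    if (substr_count($bytes, '?') !== substr_count($text, '?')) {
        return $text;
    }

    // Accept the result only if it has non-ASCII bytes and is valid UTF-8.
    $hasHighBytes = preg_match('/[\x80-\xFF]/', $bytes) === 1;
    if ($hasHighBytes && mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    return $text;
}

echo undo_double_encoding("\u{00C3}\u{00B9}"), "\n"; // garbled "Ã¹" in
echo undo_double_encoding("\u{00F9}"), "\n";         // already clean "ù" in
```

Text that is already clean fails the validity check after conversion, so it passes through untouched.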

hispanico87 · June 2013

Fantastic! It works!!! tizenitalia.net/forum/discussion/39/test-e-a-o-i-e-tizen-italia

Thank you!!

hispanico87 · June 2013

I don't know if this is directly related, but is there a chance to fix this issue with a custom bootstrap.before.php as well? https://vanillaforums.org/discussion/22090/auto-truncate-titles-of-discussions-created-by-embedded-comments-in-wordpress#latest
