
HTML encoding issue

meshugymeshugy Musician/Hacker ✭✭
edited November 2013 in Vanilla 2.0 - 2.8

Hello,

First, I'd like to say hi to the Vanilla forum community. I just imported a PHPBB3 forum into Vanilla 2.0.18.8 and found a lot of useful info here, thanks! Also, the software is fantastic! I'm looking forward to going live with our new Vanilla forum.

One issue I haven't been able to resolve is how Vanilla displays raw HTML in posts. My PHPBB forum was set up to allow the admin to post HTML, so I have hundreds of posts with raw HTML. When I look in the database, the HTML was stored using Unicode special characters.

So, for example, if a post has HTML in the database using special characters, it will just display to users showing the encoded HTML tags.

So, how do I configure Vanilla to display the Unicode HTML correctly? PHPBB never had this issue, so I'm not sure what's different about how Vanilla displays HTML. I suppose I could use the brute force method of just replacing all the Unicode special characters with their UTF8 equivalents. But, I was hoping there was a more elegant way to do this.

I checked, and the HtmLawed plugin is active. I also tried turning it off, along with HTML Purifier, but neither made a difference. I turned all other plugins off, including quotes and other display-related plugins, but still had the same issue.

Does anyone have a solution?

Thanks in advance for your help,

Michael


Comments

  • meshugymeshugy Musician/Hacker ✭✭
    edited November 2013

    BTW, I tried to post examples of specific code above using the code tag, but the code was parsed by the forum so I removed it.

  • If you want the admin to post in raw html you want to use the Raw formatter.

    You can use this plugin

    https://github.com/vanillaforums/Addons/tree/master/plugins/AllowRawFormat

    I would not advise giving the capability to too many people.

    I don't know what you are on about with Unicode special characters. Just about everything in Vanilla is stored in UTF-8, but I suspect you mean HTML special characters like &, which have nothing to do with Unicode.

    It is not really clear what you want. Are you wanting to display HTML as text? Because this is not the same as raw HTML. Raw HTML is unsanitized HTML which is rendered as is.

    It is important to note that the default formatter is Html, so posts are displayed by the rules of HTML rendering. It is sanitised, but it does not convert characters for you, as that would be presumptuous.

    grep is your friend.

  • meshugymeshugy Musician/Hacker ✭✭
    edited November 2013

    Yes, the special characters are the issue. Sorry for the reference to Unicode I'm a noob!

    How do I install the Raw HTML plugin? I ran into that earlier but wasn't sure how to install it.

    Thanks for the quick response!

  • hgtonighthgtonight ∞ · New Moderator

    Welcome to the community!

    You install plugins by placing the plugin folder in /plugins and enabling it via the dashboard. In this case, you would end up with /plugins/AllowRawFormat/class.allowrawformat.plugin.php.

    Search first

    Check out the Documentation! We are always looking for new content and pull requests.

    Click on insightful, awesome, and funny reactions to thank community volunteers for their valuable posts.

  • meshugymeshugy Musician/Hacker ✭✭

    Thanks, I installed the AllowRaw plugin, enabled it, and assigned privileges to the admin. However, all the old posts still have the same encoding issue. All the html with special characters is displayed as regular HTML tags when viewed in the forum. Any other ideas?

    Thanks so much for the quick response!

    M

  • meshugymeshugy Musician/Hacker ✭✭
    edited November 2013

    Comments with HTML in special characters display like this:

     <a href="http://link.com">The link</a> 

    What I'd like is for them to display like this:

     The link

  • I don't think he wants the raw HTML. I think he wants to be able to display HTML as text.

    if you want to display a code snippet you can format it like so

    <code>&lt;tag&gt;</code> which will render as <tag>

    if you want a code block

    <pre>&lt;tag&gt;</pre> which will render as

    <tag>

    By default it will not convert to special characters, but you could do so in a text editor.

    With the markdown formatter you can do

    ```
    ~~~
    <tag>
    ~~~
    ```

    It will convert to special characters.
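
The manual conversion described above (turning `<tag>` into `&lt;tag&gt;` so it shows as text) can also be scripted. The following is a minimal sketch in Python, not something from the thread; it uses the standard library's `html.escape`, and the variable names are illustrative only.

```python
import html

# A snippet we want readers to see literally, not have the browser interpret.
snippet = "<tag>"

# html.escape replaces <, > and & (and quotes by default) with HTML entities.
encoded = html.escape(snippet)

# Wrapped in <code>, the browser shows the entities as the characters they stand for.
wrapped = f"<code>{encoded}</code>"

print(encoded)
print(wrapped)
```

This is the opposite direction from what the imported posts need: escaping makes markup display as text, while the phpBB bodies need their entities decoded so the markup is interpreted again.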


  • Old posts will be rendered with the formatter they were input with. In the database you can see this in the Format column.


  • x00x00 MVP
    edited November 2013

    I'm having difficulty getting what the problem is. Is it that the old posts are rendered as text and you want them rendered as HTML, or do you want to be able to display code snippets?

    This is not what we call an encoding issue by any stretch. Well, at least not character encoding. It sounds more like a formatting issue.

    What format were the posts originally, BBCode?


  • x00x00 MVP
    edited November 2013

    I think I'm getting the problem.

    phpBB saved the html as special characters when it should have just saved it as raw html. Then it was brought in as is.

    phpBB does do weird pre-processing, presumably for backwards compatibility. I have seen stuff like that before. This could be an issue if you don't know exactly how they parse it, because you may want to convert some of it but not all.

    The way Vanilla works is that it saves everything raw, and then formats it. There is normally no saved pre-processing. It just saves the format type in the Format column, then renders based on that formatter.
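
To make the "store raw, format on display" idea concrete, here is a toy model in Python. It is only a sketch under stated assumptions: `render_post` is a hypothetical function, the format names `Text`, `Html` and `Raw` are used for illustration, and sanitising is left out. It is not Vanilla's actual code.

```python
import html

def render_post(body: str, fmt: str) -> str:
    """Toy renderer: choose the output based on the stored Format value."""
    if fmt == "Text":
        # Plain-text format: escape so any markup shows up literally.
        return html.escape(body)
    if fmt in ("Html", "Raw"):
        # Markup formats: pass the stored body through untouched.
        # (A real Html formatter would also sanitise; omitted in this sketch.)
        return body
    raise ValueError(f"unknown format: {fmt}")

# An entity-encoded body, as an import from a pre-processing forum might leave it.
stored = "&lt;b&gt;hi&lt;/b&gt;"
rendered = render_post(stored, "Raw")
print(rendered)
```

Because no formatter in this model decodes entities, the body comes out exactly as stored and the browser shows literal tags. That would explain why switching formatters alone does not change how the imported posts look.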


  • meshugymeshugy Musician/Hacker ✭✭

    The original posts were created in PHPBB with a mod that allowed the admin to use HTML. I'm not trying to post code in my comments, but rather trying to get the HTML to display as regular content.

    So, this post in my PHPBB forum was created with HTML:

    http://www.djangobooks.com/forum/viewtopic.php?f=9&t=12451

    But in Vanilla it's displaying like this:

    http://www.djangobooks.com/vanilla-forum/discussion/12451/2012-hahl-gitano-d-hole#Item_1

  • hgtonighthgtonight ∞ · New Moderator

    You will need to update the old posts' format to 'Raw'. Then you will have to convert the HTML entities to their character equivalents. It would be best to back up the db, work on a db dump in your favorite editor, and then import your changed bodies.

    I did something similar when I moved from phpBB last year to convert the smileys and links into something more compatible.
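
The two steps above (decode the entities, then mark the post as Raw) could also be scripted rather than done by hand in an editor. The sketch below is an assumption-heavy illustration in Python: it uses an in-memory SQLite table named `Comment` with `Body` and `Format` columns as a stand-in for the real database, and `convert_bodies` is a hypothetical helper. Check the actual table and column names in your own install, and run anything like this against a backup first.

```python
import html
import sqlite3

def convert_bodies(conn: sqlite3.Connection, table: str = "Comment") -> int:
    """Decode entity-encoded bodies and mark them Raw. Returns rows changed."""
    rows = conn.execute(f"SELECT rowid, Body FROM {table}").fetchall()
    changed = 0
    for rowid, body in rows:
        decoded = html.unescape(body)
        if decoded != body:
            # Only touch rows that actually contained entities.
            conn.execute(
                f"UPDATE {table} SET Body = ?, Format = 'Raw' WHERE rowid = ?",
                (decoded, rowid),
            )
            changed += 1
    conn.commit()
    return changed

# Stand-in data: one entity-encoded post and one plain post (both hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Comment (Body TEXT, Format TEXT)")
conn.execute(
    "INSERT INTO Comment VALUES (?, ?)",
    ("&lt;a href=&quot;http://link.com&quot;&gt;The link&lt;/a&gt;", "Html"),
)
conn.execute("INSERT INTO Comment VALUES (?, ?)", ("plain text", "Html"))

changed = convert_bodies(conn)
result = conn.execute("SELECT Body, Format FROM Comment ORDER BY rowid").fetchall()
print(changed, result)
```

Leaving rows without entities untouched is a design choice in this sketch: it keeps the change set small and makes it easier to review what was converted.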


  • meshugymeshugy Musician/Hacker ✭✭

    Ok, I'll go through and convert all the HTML entities.

    Just curious, why is PHPBB able to correctly parse the HTML entities while Vanilla is not?

  • hgtonighthgtonight ∞ · New Moderator

    It is a different way of processing it. As @x00 explained earlier, Vanilla stores raw input in the db and then formats it for display. PHPBB stores it in a preformatted manner and then displays it.

    So PHPBB expects it to be preformatted while Vanilla expects it raw. You could modify the raw formatter to decode html entities before render, but it wouldn't really be a raw formatter at that point.


  • To be honest it is unusual to store it this way, because special characters are expected to be rendered as literals, not interpreted as HTML tags, so it is back to front. You would use special characters if you wanted something to be interpreted as a character and NOT parsed as an HTML tag.

    So what vanilla is doing is correct.

    I can only explain why they would do this by historic reasons. This is why you don't parse things before you save: it cannot easily be undone.
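
One way to see why parsing before saving is hard to undo is sketched below in Python. The pre-processing step here is purely hypothetical (it encodes angle brackets but leaves `&` alone) and is not a description of what phpBB really does. It only shows how an inconsistent save-time transform loses information.

```python
import html

def lossy_preprocess(text: str) -> str:
    """Hypothetical save-time step: encodes < and > but not &."""
    return text.replace("<", "&lt;").replace(">", "&gt;")

# The author typed real markup, plus an entity meant to show a literal tag.
typed = "Use <b>bold</b>, written as &lt;b&gt;"

stored = lossy_preprocess(typed)
restored = html.unescape(stored)

print(stored)
print(restored)
print(restored == typed)
```

After the save-time step, the real markup and the intentionally literal entity look identical in storage, so a blanket decode turns both into tags and the original intent cannot be recovered. This is the kind of case where you may want to convert some entities but not all.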


  • @hgtonight said:
    It is a different way of processing it. As x00 explained earlier, Vanilla stores raw input in the db and then formats it for display. PHPBB stores it in a preformatted manner and then displays it.

    It is more complicated than that. It does a bit of both, and it is a hell of a mess.


  • meshugymeshugy Musician/Hacker ✭✭

    thanks again for all the help...

    Is there a method you'd recommend for converting the HTML entities? I figured I'd just search and replace everything in my database. If there's a better way, let me know.

  • x00x00 MVP
    edited November 2013

    tbh it is impossible to know what exactly is the problem based on your description, without example data.

    I did build a version of the porter with phpBB search-and-replaces to fix all sorts of problems like this. However, if I remember correctly, it covered a few things, not all HTML, because there is typically very little HTML.

    Basic phpBB is typically BBCode, so you are not entering HTML. There was just some pre-parsed HTML from plugins like smilies. My version of the porter actually converted the BBCode to HTML.

    I don't know if you used a plugin to enter raw HTML. However, it is highly unusual to store HTML that is intended to be rendered as HTML (and not as a text representation of HTML to copy) using special characters like <, unless you want those characters to be rendered as literal text. However, having handled phpBB data in the past, nothing surprises me.

    You mention [code]. Was that intended to be a form of BBCode that you used, or were you just trying to annotate in this discussion?

    BBCode BTW isn't a standard, there is no unified specification for it.


  • meshugymeshugy Musician/Hacker ✭✭

    Yes, I did use your special porter and it did fix a lot of the BBCode issues but the HTML was still an issue.

    Thanks for building the porter, very handy!

  • hgtonighthgtonight ∞ · New Moderator

    You can convert HTML entities to their characters quite easily using PHP's html_entity_decode() function. Many text editors also provide this functionality.
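
As a quick illustration of what entity decoding does, here is a short sketch using Python's `html.unescape`, which plays a role comparable to PHP's `html_entity_decode()`. The encoded string is an assumption about how the example link from earlier in the thread might look in the database. PHP's results depend on the flags passed to the function, so treat this as an analogy rather than a statement of identical behaviour.

```python
import html

# Assumed stored form of the example link from the thread.
encoded = "&lt;a href=&quot;http://link.com&quot;&gt;The link&lt;/a&gt;"
decoded = html.unescape(encoded)

# Numeric entities are decoded too, and text without entities is unchanged.
numeric = html.unescape("&#60;b&#62;")
untouched = html.unescape("no entities here")

print(decoded)
print(numeric)
print(untouched)
```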

