Vanilla 1 is no longer supported or maintained.

Transmogrifier's bug with HTML entities

edited January 2007 in Vanilla 1.0 Help
Because Transmogrifier parses entity-escaped text, it has a problem with HTML entities. For example, suppose config.txt contains the following rule:

;) == <img alt=";)" src="./images/smiles/wink.gif" />

And suppose we have text like this:
("A quoted text in brackets")
When Transmogrifier's parser starts, it receives the text already entity-escaped, which looks like this:
(&quot;A quoted text in brackets&quot;)
Then Transmogrifier's parser replaces its tokens and finds the ";)" formed by the semicolon of &quot; and the closing bracket. The result is:
(&quot;A quoted text in brackets&quot<img alt=";)" src="./images/smiles/wink.gif" />
As you can see, the rule fires where it shouldn't: &quot; loses its closing semicolon, and the output is no longer valid XHTML markup.
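A minimal PHP sketch of the problem, for anyone who wants to reproduce it. It assumes the formatter escapes the comment with htmlspecialchars() and that the token is matched with an unguarded, preg_quote()'d pattern; Transmogrifier's real code isn't shown in this thread, so this is only an approximation:

```php
<?php
// Assumption: the text formatter escapes the comment with htmlspecialchars()
// before Transmogrifier runs, and the token pattern has no entity guard.
$raw = '("A quoted text in brackets")';
$escaped = htmlspecialchars($raw, ENT_QUOTES);
// $escaped is now: (&quot;A quoted text in brackets&quot;)

$token = ';)';
$value = '<img alt=";)" src="./images/smiles/wink.gif" />';
$naivePattern = '/' . preg_quote($token, '/') . '/';

// The ";)" made of the end of &quot; plus the closing bracket matches.
$result = preg_replace($naivePattern, $value, $escaped);
echo $result, "\n";
// (&quot;A quoted text in brackets&quot<img alt=";)" src="./images/smiles/wink.gif" />
```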

I suggest the following fix in Transmogrifier/default.php:

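// $value[0] is the token (e.g. ";)"), $value[1] is its replacement
$value = preg_split("/\s*==\s*/", $line, 2);
// Skip a token that directly follows an entity body such as "&quot", "&#39" or "&#x2F"
// (the '/' passed to preg_quote() escapes the delimiter, so tokens like ":/" stay valid)
$this->token[$counter] = '/(?<!&[A-Za-z]{2}|&[A-Za-z]{3}|&[A-Za-z]{4}|&[A-Za-z]{5}|&[A-Za-z]{6}|&#\d{2}|&#\d{4}|&#x[0-9A-Fa-f]{2}|&#x[0-9A-Fa-f]{4})'.preg_quote($value[0], '/').'/';
$this->tokenValue[$counter] = $value[1];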

I know this looks dirty and ugly, but it works, and I can't find a better solution at the moment.
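To check the fix outside Vanilla, here is a small PHP sketch that builds the same lookbehind pattern (passing '/' to preg_quote() so a token containing a slash can't break the regex) and runs it on a few inputs. The helper name guardedTokenPattern() is hypothetical. It shows that &quot; is protected and a real smiley still works, but also that entities longer than the listed alternatives, such as &thetasym; or &#0000033;, slip through:

```php
<?php
// Hypothetical helper: builds a token regex like the proposed fix, i.e. a
// negative lookbehind made of fixed-length alternatives, one per entity shape.
function guardedTokenPattern(string $token): string
{
    $guard = '(?<!&[A-Za-z]{2}|&[A-Za-z]{3}|&[A-Za-z]{4}|&[A-Za-z]{5}|&[A-Za-z]{6}'
        . '|&#\d{2}|&#\d{4}|&#x[0-9A-Fa-f]{2}|&#x[0-9A-Fa-f]{4})';
    return '/' . $guard . preg_quote($token, '/') . '/';
}

// Parse the config.txt rule the same way the fix does.
[$token, $value] = preg_split("/\s*==\s*/", ';) == <img alt=";)" src="./images/smiles/wink.gif" />', 2);
$pattern = guardedTokenPattern($token);

// &quot; is protected, while a real smiley after a space is still replaced.
$ok = preg_replace($pattern, $value, htmlspecialchars('("A quoted text in brackets") ;)', ENT_QUOTES));
echo $ok, "\n";

// Entities longer than the listed alternatives are NOT protected.
$gap1 = preg_replace($pattern, $value, '(&thetasym;)');  // 8-letter name
$gap2 = preg_replace($pattern, $value, '(&#0000033;)');  // 7-digit code
echo $gap1, "\n", $gap2, "\n";
```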

Comments

  • That wouldn't work in all cases. What if you wanted to use the &thetasym; entity, or had a character code three digits long? You can also simplify it a good deal: '/(?<!&(?>[A-Za-z_\-]+|#(?>[xX][A-Fa-f\d]+|\d+)))'.preg_quote($value[0]).'/';
  • edited January 2007
    No, it's not that easy. In PHP, lookbehind assertions must be fixed length. Your variant generates an error:

    Warning: preg_replace(): Compilation failed: lookbehind assertion is not fixed length at offset 46 in ...../forum/extensions/Transmogrifier/default.php on line 56

    To handle longer HTML entities, you can add another "&[A-Za-z]{X}" alternative to my regexp for each entity length.
  • Ah yeah, oops. I tested it without the lookbehind and then forgot about the fixed-length requirement...

    The problem, though, is that entities aren't really limited in length: you could write &#33; or &#0000033;, and both would come out as an exclamation mark.
  • Yes. A better way is to work on the string before it is entity-escaped, but I haven't found a way to do this (I'm new to Vanilla and my reverse-engineering skills are limited). The message is escaped by the text formatter (BBCodeParser in my case), so our preg_replace calls need to run earlier in the processing chain.
  • Stash
    edited January 2007
    I maintain the BBCodeParser and am in contact with a guy who makes mods to the PEAR extension, so if you have a suggestion for improving the parser, please let me know and I'll see what I can do about adding it :) I'm always looking for ways to improve it.
  • Actually, this shouldn't be an improvement to BBCodeParser; it should be a way to run Transmogrifier before BBCodeParser. If such a way exists, I'd like to know about it. If not, Vanilla should be improved to allow the chain of formatters to be organized more flexibly.
  • I'd LOVE to be able to specify the order in which formatters are applied to comments as it would seriously help me out with fixing fringe cases. Any way of doing this?
  • Maybe a hack? An extension hack! We could write a text formatter extension that wraps all the others: it would run first, detach the other formatters, and then call them itself in the proper order. I don't know yet whether this is possible; it's just an idea...
  • I'd like to be able to order them the way you can order Roles & Permissions.
  • I don't understand what you mean, sorry. :(
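On the lookbehind error discussed above: PCRE accepts a lookbehind whose top-level alternatives have different fixed lengths (which is why the long alternation in the original fix compiles), but it rejects an unbounded quantifier such as + inside one. A small PHP check of both, assuming PHP's PCRE-based preg_* functions; the exact warning text depends on the PCRE version:

```php
<?php
// The simplified lookbehind suggested in the thread: "+" makes its length unbounded.
$variable = '/(?<!&(?>[A-Za-z_\-]+|#(?>[xX][A-Fa-f\d]+|\d+)));\)/';
// Top-level alternatives of different but fixed lengths (a subset of the fix).
$fixed = '/(?<!&[A-Za-z]{4}|&#\d{2});\)/';

// preg_match() returns false (and emits a warning, suppressed here with @)
// when the pattern fails to compile.
$variableResult = @preg_match($variable, '&quot;)');
$fixedResult = preg_match($fixed, '&quot;)');

var_dump($variableResult); // bool(false)
var_dump($fixedResult);    // int(0): the ";)" after "&quot" is not matched
```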
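The "work on the non-escaped string" idea can be sketched in PHP: split the raw comment on the smiley tokens, escape only the plain-text pieces, and emit the replacement for each token. formatWithSmileys() is a hypothetical helper, not a Vanilla or Transmogrifier API, and it ignores BBCode and any other formatter in the chain; it only shows why matching before escaping avoids the &quot; problem:

```php
<?php
// Hypothetical helper: replaces smileys on the raw (unescaped) text and escapes
// only the text between them, so no entity exists when tokens are matched.
function formatWithSmileys(string $raw, array $smileys): string
{
    if (!$smileys) {
        return htmlspecialchars($raw, ENT_QUOTES);
    }
    // One alternation of all tokens, longest first so ":))" wins over ":)".
    $tokens = array_keys($smileys);
    usort($tokens, fn($a, $b) => strlen($b) <=> strlen($a));
    $pattern = '/(' . implode('|', array_map(fn($t) => preg_quote($t, '/'), $tokens)) . ')/';

    // PREG_SPLIT_DELIM_CAPTURE keeps the matched tokens in the result array.
    $parts = preg_split($pattern, $raw, -1, PREG_SPLIT_DELIM_CAPTURE);
    $out = '';
    foreach ($parts as $i => $part) {
        // Odd indexes are captured tokens; even indexes are ordinary text.
        $out .= ($i % 2 === 1) ? $smileys[$part] : htmlspecialchars($part, ENT_QUOTES);
    }
    return $out;
}

$html = formatWithSmileys('("A quoted text in brackets") ;)', [
    ';)' => '<img alt=";)" src="./images/smiles/wink.gif" />',
]);
echo $html, "\n";
```

Sorting the tokens longest-first keeps a longer smiley from being cut short by a shorter one that is its prefix.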
This discussion has been closed.