htmlawed and allowing classes
In my forum I have Markdown which outputs classes for the syntax highlighter to indicate what language it should highlight the code in (GFM style). I've just been testing the 2.2 update (I'm a bit slow off the mark here!) and have found that the Htmlawed plug-in included in 2.2 now has more advanced security and will strip classes unless they are listed - makes sense!
However, adding the classes I need to allow to the AllowedClasses array doesn't appear to make any difference to the output in my forum - htmlawed is still stripping the classes. If I bypass Htmlawed then the output works as expected, but that obviously isn't ideal.
I've also tried removing the class option from the deny_attribute list and that makes no difference.
Does anyone have any suggestions for how classes can be allowed in the output?
Thanks,
Allan
Best Answer
Eclipse
@allanj I looked at the source a bit and it's a regex nightmare haha.
I see that $spec has:
str_replace(array("\t", "\r", "\n", ' '), '', preg_replace_callback('/"(?>(`.|[^"])*)"/sm', create_function('$m', 'return substr(str_replace(array(";", "|", "~", " ", ",", "/", "(", ")", \'`"\'), array("\x01", "\x02", "\x03", "\x04", "\x05", "\x06", "\x07", "\x08", "\""), $m[0]), 1, -1);'), trim($t)));
Which looks as if the spaces could potentially be escaped with double quotes judging by the nested str_replace
However, right before the element is passed to the oneof switch case (via the hl_attrval function) here:
if(isset($rl[$k]) && is_array($rl[$k]) && ($v = hl_attrval($v, $rl[$k])) === 0){continue;}
It may get passed through:
str_replace("\xad", ' ', (strpos($v, '&') !== false ? str_replace(array('&#xad;', '&#173;', '&shy;'), ' ', $v) : $v));
I'd say try some cursory experiments with escaping the spaces - maybe try putting \x04 between the classes instead of spaces - and hopefully you will find better help on their forums
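To see what the first snippet above does to a rule, here is a minimal sketch that re-creates the quoted expression as a standalone function. The function name normalize_spec and the sample rules are made up for illustration, and a closure stands in for create_function (which newer PHP versions no longer provide). It only reproduces the one expression quoted above, not the rest of htmLawed's spec handling.

```php
<?php
// Hypothetical wrapper around the normalisation expression quoted above.
function normalize_spec(string $t): string
{
    // Inside double-quoted runs, swap special characters for placeholder bytes
    // (a space becomes "\x04") and drop the surrounding quotes.
    $masked = preg_replace_callback(
        '/"(?>(`.|[^"])*)"/sm',
        function (array $m): string {
            $inner = str_replace(
                array(";", "|", "~", " ", ",", "/", "(", ")", '`"'),
                array("\x01", "\x02", "\x03", "\x04", "\x05", "\x06", "\x07", "\x08", "\""),
                $m[0]
            );
            return substr($inner, 1, -1);
        },
        trim($t)
    );

    // Everything still outside quotes loses its tabs, newlines and spaces.
    return str_replace(array("\t", "\r", "\n", ' '), '', $masked);
}

// Unquoted: the space between the two classes is simply removed.
$unquoted = normalize_spec('code=class(oneof=multiline language-php)');

// Quoted: the space survives as the "\x04" placeholder.
$quoted = normalize_spec('code=class(oneof="multiline language-php")');

var_dump($unquoted === 'code=class(oneof=multilinelanguage-php)');
var_dump(strpos($quoted, "\x04") !== false);
```

If the placeholder is mapped back to a space later in the library (not checked here), that would explain why writing \x04 directly into the rule is worth trying.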
5
Answers
A little bit more experimentation and it looks like the issue comes from there being multiple classes on the element. If I use a single class that is in the AllowedClasses array then that is allowed. If I use two or more (space separated in the HTML attribute) they get stripped.
The Htmlawed documentation seems to suggest that multiple should be allowed:
But it seems to be only one.
Anyone got any suggestions?
Allan
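One way to read the behaviour described in this post, assuming oneof compares the entire attribute value against each |-separated alternative instead of checking each class on its own (an assumption drawn from the results above, not a confirmed description of htmLawed), is sketched below. The helper name matches_one_of is hypothetical.

```php
<?php
// Hypothetical helper: passes only if the whole attribute value
// is exactly equal to one of the |-separated alternatives.
function matches_one_of(string $value, string $alternatives): bool
{
    return in_array($value, explode('|', $alternatives), true);
}

$allowed = 'multiline|language-php';

// A single listed class equals one alternative.
var_dump(matches_one_of('multiline', $allowed));

// Two classes form one string that equals neither alternative.
var_dump(matches_one_of('multiline language-php', $allowed));

// It would only pass if the full combination were itself an alternative.
var_dump(matches_one_of('multiline language-php', 'multiline language-php'));
```

Under that assumption, every class combination that should survive has to appear as its own alternative, spaces included.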
Full disclosure, I am not too familiar with htmLawed, but I have read your link to $spec and have a suggestion.
Say you have:
If you have been trying:
Try instead:
OK, I'm confused. I think you are going about this the wrong way.
What you want is syntax like ```php, correct?
grep is your friend.
@Eclipse - Unfortunately one of the first things that Htmlawed appears to do is run the spec through a str_replace which strips out white space characters, so it would end up searching for abc|de.
@x00 - exactly. The resulting HTML is <code class="multiline language-php">...</code>. The classes are used to tell the Javascript syntax highlighter library information about what it should do.
This is probably evolving into an Htmlawed question - I've opened a thread on the board there.
Allan
There are four ways of inserting block code in Vanilla
This is the result of the preview
You could make htmLawed play ball. Or you could force it into post parsing.
I used this to test what you want
http://www.bioinformatics.org/phplabware/internal_utilities/htmLawed/htmLawedTest.php
Don't use oneof; use match, e.g.
p=class(match=%a \(b|c\)%)
Note the escaping of the parentheses. Btw, test it out, because I notice that incorrect syntax can open the whole attribute up.
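Before putting a pattern into a match rule, it can help to try it with preg_match on its own. The sketch below is only that: the pattern, the class names and the function names are illustrative, and how the pattern has to be escaped inside the htmLawed spec is not covered here.

```php
<?php
// Hypothetical check: "multiline" followed by exactly one language-* class,
// anchored at both ends so nothing extra can ride along.
function class_value_allowed(string $value): bool
{
    return preg_match('%\Amultiline language-[a-z0-9]+\z%', $value) === 1;
}

// The same idea without anchors, shown for contrast.
function class_value_allowed_loose(string $value): bool
{
    return preg_match('%language-[a-z0-9]+%', $value) === 1;
}

var_dump(class_value_allowed('multiline language-php'));
var_dump(class_value_allowed('multiline language-php extra-class'));
var_dump(class_value_allowed_loose('multiline language-php extra-class'));
```

The unanchored version accepts any value that merely contains an allowed class, which is one way a pattern can end up more permissive than intended.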
@Eclipse - \x04 - brilliant! That works really nicely. Good thinking :-)
@x00 - The problem I had with match was the space aspect, and as you note it defaults to valid.
Using \x04 seems to work in place of a space at the moment, and the classes I want to allow are always in a known order, so I'm okay with that.
Thanks to both for your input on this - massively appreciated!
Allan
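For anyone wanting to try the same workaround, a rough sketch of building such a rule in PHP follows. The helper name build_class_rule and the class combinations are hypothetical, and the element=class(oneof=...) shape is assumed from the rules discussed in this thread; it has not been checked against every htmLawed version.

```php
<?php
// Hypothetical helper: turn allowed class combinations into a oneof rule,
// replacing each space with the "\x04" placeholder byte.
function build_class_rule(string $element, array $combinations): string
{
    $alternatives = array_map(
        function (string $combo): string {
            return str_replace(' ', "\x04", $combo);
        },
        $combinations
    );

    return $element . '=class(oneof=' . implode('|', $alternatives) . ')';
}

// Each combination must be listed in the order the classes appear in the HTML.
$rule = build_class_rule('code', array('multiline language-php', 'multiline language-js'));

var_dump(substr_count($rule, "\x04"));
```

Because each combination is matched as a whole, this only works while the classes are always emitted in a known order, as noted above.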
It is not the only one that defaults to valid, though.
If you are good at PCRE it is very versatile.