Yes, HTML is stripped by the extension, for security purposes. I believe the ideal setup would be to chain this formatter with the Kses formatter so that the post passes through them both.
You'd have to replace line 18 ($String = $this->ProtectString($String);) with some other function that strips out potentially harmful HTML.
I'd recommend copying kses.php into the markdown folder, adding include('kses.php'); just after the dictionary definition, and changing line 18 to $String = kses($String, array());
This would also give you a tag policy: the array is where you list any HTML tags and attributes you want to allow that Kses would otherwise block.
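As a rough sketch of what such a tag policy could look like: kses takes the allowed elements as an array keyed by tag name, each mapping to an array of permitted attributes. The particular tags listed here and the helper name SanitizeWithPolicy are assumptions for illustration only, not part of the extension, and the fallback branch simply escapes everything when kses.php has not been included.

```php
<?php
// Hypothetical allow-list: tag name => array of allowed attributes.
// Passing an empty array() as the policy makes kses strip every tag.
$AllowedHtml = array(
    'a'      => array('href' => 1, 'title' => 1),
    'em'     => array(),
    'strong' => array(),
    'code'   => array(),
    'pre'    => array(),
);

// Illustrative helper (an assumption, not part of the Markdown extension).
function SanitizeWithPolicy($String, $AllowedHtml) {
    if (function_exists('kses')) {
        // kses removes any tag or attribute that is not in the policy.
        return kses($String, $AllowedHtml);
    }
    // Fallback when kses.php is not loaded: escape all markup.
    return htmlspecialchars($String, ENT_QUOTES, 'UTF-8');
}
```

With something like this in place, line 18 could become $String = SanitizeWithPolicy($String, $AllowedHtml); and widening the policy is then a matter of adding entries to the array.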
I notice that the Markdown StringFormatter doesn't call StringFormatter's ParseChildren() to give any child formatters a chance to work their magic.
Since you are supposed to be able to enter HTML tags in Markdown, why not parse the document through the HTML StringFormatter, if that plugin is present?
The following patch will achieve both these aims:
--- default.php.orig 2009-03-13 15:57:59.000000000 +0000
+++ default.php 2009-03-19 12:52:39.000000000 +0000
@@ -17,13 +17,21 @@
 class MarkdownFormatter extends StringFormatter {
     function Parse($String, $Object, $FormatPurpose) {
         if ($FormatPurpose == FORMAT_STRING_FOR_DISPLAY) {
-            $String = $this->ProtectString($String);
-            return Markdown($String);
+            if( isset( $Object->Context->StringManipulator->Formatters[ "Html" ] ) ) {
+                $String = Markdown( $String );
+                $String = $Object->Context->StringManipulator->Formatters[ "Html" ]->Execute( $String, true );
+            }
+            else {
+                $String = $this->EscapeHtml( $String );
+                $String = Markdown( $String );
+            }
+            $String = $this->ParseChildren($String, $Object, $FormatPurpose);
+            return $String;
         } else {
             return $String;
         }
     }
-    function ProtectString ($String) {
+    function EscapeHtml ($String) {
         //$String = str_replace("<", "&lt;", $String);
         // $String = str_replace(">", "&gt;", $String);
         $String = explode("\n", $String);
There is a problem with this, though. By default, the HTML formatter converts newlines to <br/> tags, which makes its output look wrong: the rendered Markdown ends up spread out over far too many lines. There are two ways to fix this.
First, you could turn off the HTML formatter's HTML_CONVERT_NEWLINES option at the top of the HtmlFormatter/default.php plugin file. I don't like this solution, though, because I want the HTML formatter to convert newlines. The second solution is to patch the HTML formatter so that a caller can specify that it should only parse and not do any formatting.
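The second approach boils down to a flag that suppresses newline conversion. Here is a minimal sketch of that idea as a stand-alone function rather than the real HtmlFormatter class. The name ConvertNewlinesSketch and the $ConvertNewlines parameter are made up for illustration (in the plugin that role is played by the HTML_CONVERT_NEWLINES constant). Giving the flag a default value is a suggestion of this sketch, so that any caller still passing one argument keeps working.

```php
<?php
// Stand-in for the newline handling at the end of the formatter's Execute().
// $ParseOnly defaults to false, so one-argument calls behave as before.
function ConvertNewlinesSketch($String, $ParseOnly = false, $ConvertNewlines = true) {
    if ($ConvertNewlines && !$ParseOnly) {
        $String = str_replace(array("\r\n", "\r", "\n"), '<br />', $String);
    }
    return $String;
}

// Normal display path: newlines become <br /> tags.
echo ConvertNewlinesSketch("one\ntwo"), "\n";
// Parse-only path (e.g. called on Markdown output): newlines are left alone.
echo ConvertNewlinesSketch("<p>one</p>\n<p>two</p>", true), "\n";
```

The Markdown formatter would pass true here, since Markdown has already turned paragraphs and line breaks into markup by the time the HTML formatter sees the string.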
Warning: as it stands, it is easy to insert malicious code with this plugin. The built-in sanitiser only sanitises tags that aren't in code blocks. Unfortunately, it doesn't always correctly identify what is and isn't a code block, so it lets through HTML tags that Markdown then doesn't escape (as it would if they really were in a code block). Some additional sanitiser is therefore required.
Comments
Patch for the HTML formatter (the second solution above), adding a $ParseOnly argument to Execute() so that newline conversion can be skipped:
--- default.php.orig 2009-03-19 13:02:20.000000000 +0000
+++ default.php 2009-03-19 13:04:46.000000000 +0000
@@ -111,7 +111,7 @@
         $this->TagArray = &$GLOBALS['Html_TagArray'];
     }
-    function Execute($String)
+    function Execute($String, $ParseOnly)
     {
         $this->TagArray = array('normal' => array(), 'extraclosing' => array());
         $String = str_replace(chr(0), ' ', $String);
@@ -172,7 +172,7 @@
             $sReturn
         );
-        if(HTML_CONVERT_NEWLINES)
+        if(HTML_CONVERT_NEWLINES && !$ParseOnly)
             $sReturn = str_replace(
                 array("\r\n", "\r", "\n"),
                 '<br />',
@@ -389,7 +389,7 @@
     function Parse($String, $Object, $FormatPurpose)
     {
-        if($FormatPurpose == FORMAT_STRING_FOR_DISPLAY) $sReturn = $this->Execute($String);
+        if($FormatPurpose == FORMAT_STRING_FOR_DISPLAY) $sReturn = $this->Execute($String, false);
         else $sReturn = $String;
         return $this->ParseChildren($sReturn, $Object, $FormatPurpose);