BBCode Sanitizer

Rowan Lewis · September 2006

Just so everyone knows, this is about the engine that will drive my BBCode extension (sorry if the formatting gets messed up too...). Any feedback would be useful.

1. Features:

Completely customizable output.
Nesting correction: never have a validation error due to user input.
What other features can a BBCode parser have!?
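
For anyone wondering what nesting correction looks like in practice, here is a minimal sketch of one common approach: keep a stack of open tags and rebalance as you go. This is not the extension's actual code. The function name fix_nesting, the allowed-tag list, and the decision to drop stray closing tags instead of reopening anything are all assumptions made for illustration.

```php
<?php
// Rebalance BBCode tags so every opening tag is closed in order.
// $allowed is a hypothetical list of tag names to treat as BBCode.
function fix_nesting($input, array $allowed)
{
    // Split into tags and text, keeping the tags as tokens.
    $tokens = preg_split(
        '~(\[/?[a-z]+(?:=[^\]]*)?\])~i',
        $input,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );
    $stack = array();
    $output = '';

    foreach ($tokens as $token) {
        $isTag = preg_match('~^\[(/?)([a-z]+)(?:=[^\]]*)?\]$~i', $token, $m);
        if (!$isTag || !in_array(strtolower($m[2]), $allowed, true)) {
            $output .= $token; // plain text or an unknown tag: pass through
            continue;
        }
        $name = strtolower($m[2]);
        if ($m[1] === '') {
            // Opening tag: remember it and emit it unchanged.
            $stack[] = $name;
            $output .= $token;
        } elseif (in_array($name, $stack, true)) {
            // Closing tag: close anything opened after $name, then $name.
            do {
                $top = array_pop($stack);
                $output .= '[/' . $top . ']';
            } while ($top !== $name);
        }
        // A closing tag with no matching opener is silently dropped.
    }

    // Close whatever the user left open.
    while ($stack) {
        $output .= '[/' . array_pop($stack) . ']';
    }
    return $output;
}

echo fix_nesting('[b]bold [i]both[/b] italic[/i]', array('b', 'i'));
// prints: [b]bold [i]both[/i][/b] italic
```

The sketch trades fidelity for simplicity: the trailing "italic" loses its formatting because the [i] tag is not reopened after [/b]. A fuller version could reopen it.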

2. Explanation:
To me this all makes sense, probably because this is the second time I've written something using the same methods. Anyhow, to customize the output you have to be familiar with Matches and Elements:

2.A. Matches:
To put it in simple terms, a Match is a regular expression that you check data against; in this case, either the BBCode element's attribute or its text.

You add Matches like so:

BBCodeSanitizer::AddMatches(Array(
	"uri"		=> '~^(http|https|ftp|mailto)(://|:)\S+$~i'
));
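
To make the idea concrete, here is a small standalone sketch of how a named Match could be tested against a value with preg_match. The $matches array and the value_matches helper are hypothetical stand-ins, not part of BBCodeSanitizer. The pattern is written along the lines of the "uri" one above, and treating an unknown match name as a failed match is an assumption.

```php
<?php
// Hypothetical stand-in for the sanitizer's internal match table.
$matches = array(
    'uri' => '~^(https?|ftp|mailto)(://|:)\S+$~i',
);

// Returns true when $value satisfies the named match.
// Unknown match names count as "no match" (an assumption).
function value_matches(array $matches, $name, $value)
{
    if (!isset($matches[$name])) {
        return false;
    }
    return preg_match($matches[$name], $value) === 1;
}

var_dump(value_matches($matches, 'uri', 'http://example.com/'));        // bool(true)
var_dump(value_matches($matches, 'uri', 'mailto:someone@example.com')); // bool(true)
var_dump(value_matches($matches, 'uri', 'javascript:alert(1)'));        // bool(false)
```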

2.B. Elements:
Elements describe three things: which elements are allowed, whether line breaks can be used, and how to format the element. The format consists of the output you want, Conditionals (which rely on Matches) and Values.

Conditionals look like:

!target{matches}{output}

Values look like:

#target

Target: Because BBCode is simple, there are only two, 'attribute' and 'text'.
Matches: Any number of matches, each prefixed with 'is' or 'not' for positive or negative matching, joined together with '&' for 'and' or '|' for 'or'.
Output: Whatever you want, but it cannot be another Conditional.

You can add Elements like so:

BBCodeSanitizer::AddElements(Array(
	"url"		=> Array(
		"Format"		=> '!attribute{isuri&notgoatse}{#text}!attribute{noturi|isgoatse}{#text}',
		"LineBreaks"	=> False
	)
));
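
If it helps to see the Conditional and Value rules in action, the sketch below evaluates a Format string by hand. It is a rough illustration, not the engine's real code. The function names, the "goatse" pattern, the sample format with a link in it, and the HTML escaping of values are all assumptions. I have also assumed that '&' binds tighter than '|' and that an output never contains a '}'.

```php
<?php
// Hypothetical match table; the "goatse" pattern is made up for the example.
$matches = array(
    'uri'    => '~^(https?|ftp|mailto)(://|:)\S+$~i',
    'goatse' => '~goatse~i',
);

// Evaluate one term such as "isuri" or "notgoatse" against $value.
function eval_term(array $matches, $term, $value)
{
    if (strncmp($term, 'is', 2) === 0) {
        $name = substr($term, 2);
        $negate = false;
    } elseif (strncmp($term, 'not', 3) === 0) {
        $name = substr($term, 3);
        $negate = true;
    } else {
        return false; // malformed term
    }
    if (!isset($matches[$name])) {
        return false; // unknown match name: fail closed (an assumption)
    }
    $hit = preg_match($matches[$name], $value) === 1;
    return $negate ? !$hit : $hit;
}

// '|' separates alternatives; every '&'-joined term in one must hold.
function eval_expression(array $matches, $expression, $value)
{
    foreach (explode('|', $expression) as $alternative) {
        $all = true;
        foreach (explode('&', $alternative) as $term) {
            if (!eval_term($matches, $term, $value)) {
                $all = false;
                break;
            }
        }
        if ($all) {
            return true;
        }
    }
    return false;
}

function render_format(array $matches, $format, $attribute, $text)
{
    $values = array('attribute' => $attribute, 'text' => $text);

    // Resolve Conditionals first: keep the output only when the test holds.
    $resolved = preg_replace_callback(
        '~!(attribute|text)\{([^}]*)\}\{([^}]*)\}~',
        function ($m) use ($matches, $values) {
            return eval_expression($matches, $m[2], $values[$m[1]]) ? $m[3] : '';
        },
        $format
    );

    // Then substitute Values, escaped for HTML (an assumption).
    return preg_replace_callback(
        '~#(attribute|text)~',
        function ($m) use ($values) {
            return htmlspecialchars($values[$m[1]], ENT_QUOTES);
        },
        $resolved
    );
}

// Hypothetical format: link when the attribute is a safe URI, else plain text.
$format = '!attribute{isuri&notgoatse}{<a href="#attribute">#text</a>}'
        . '!attribute{noturi|isgoatse}{#text}';

echo render_format($matches, $format, 'http://example.com/', 'Example'), "\n";
// prints: <a href="http://example.com/">Example</a>
echo render_format($matches, $format, 'javascript:alert(1)', 'click here'), "\n";
// prints: click here
```

Resolving Conditionals before substituting Values means user-supplied text that happens to look like a Conditional is never evaluated.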

3. Code
You can view it here, or look at the source here.

NickE · September 2006

Just so you know, if you wrap your code/examples in <code> tags, tags won't be parsed.

Rowan Lewis · September 2006

Hmm, I've wrapped them in PRE elements, but I guess the HTML parser is a little... demented :P

NickE · September 2006

lol, no it isn't. It just makes as few changes as possible, so it will only convert instances of '<', '>' and '&' if you put them in code tags. I could've made it do it for pre, but I went ahead and chose code mainly because people were already using the tag for code examples and whatnot.

Alternately you could put them in xmp tags, and let the browser ignore the tags :-P

Rowan Lewis · September 2006

CODE elements are supposed to be inline, only PRE elements should be used for examples. Anyhow... any help in testing this thing would be appreciated, or perhaps I should offer it as an add-on untested?

Rowan Lewis · September 2006

Ok, I've packaged the BBCodeSanitizer extension: http://lussumo.com/addons/index.php?PostBackAction=AddOn&AddOnID=157

It adds an option for "BBCode Sanitizer", because I figure people would probably want to keep the normal BBCode parser, as this IS in Alpha.

Here is a line that might help cut down on the extra line breaks in the HTML parser, however it won't remove them from OL or UL elements:

$Text = Preg_Replace("~(<br />)(\s<br />)((\s<br />)+)~", "\\2", $Text);
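
To show what that replacement does, here is a quick standalone check on a made-up string. The sample input is an assumption, and the pattern is the same one as in the line above. A run of three or more BR elements separated by whitespace collapses to a single whitespace-plus-BR, while a run of only two is left alone.

```php
<?php
// Made-up sample: four <br /> in a row, then a separate run of two.
$Text = "one<br />\n<br />\n<br />\n<br />two<br />\n<br />three";

// Same pattern as above: keep only the second <br /> (with its leading
// whitespace) out of any run of three or more.
$Text = preg_replace("~(<br />)(\s<br />)((\s<br />)+)~", "\\2", $Text);

var_dump($Text === "one\n<br />two<br />\n<br />three"); // bool(true)
```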
