Hi,
My question is about HTML entities and XML output (although I use PHP, I felt this was more of an XSLT question).
Given the below, am I safe to not escape HTML entities on input anymore?
I used to retrieve HTML fragments from a MySQL backend and echo them out to the page (using PHP). I performed the HTML entity escaping at the point of storage, to reduce processing when the page was loaded.
The characters converted (using PHP's htmlentities function) were mainly: single quotes, double quotes, ampersands, and angle brackets.
I have since changed from direct echo to an XML/XSLT methodology. Now, I retrieve the data from MySQL and convert it into XML (using DOMDocument), serving that up to the browser with XSLT.
Since the XML serializer automatically escapes both less-than signs and ampersands in text content, does this provide the same level of protection against HTML injection (e.g. script tags)?
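To show what I mean by the automatic escaping, here is a minimal sketch of how I build the XML. The element name `item` and the sample string are made up for this example, and I am assuming the value is added as a text node rather than parsed as markup.

```php
<?php
// Hypothetical value as it might come back from MySQL, stored WITHOUT htmlentities().
$raw = '<script>alert("x")</script> & more';

$doc  = new DOMDocument('1.0', 'UTF-8');
$item = $doc->createElement('item');              // 'item' is an illustrative element name
$item->appendChild($doc->createTextNode($raw));   // added as text, not as markup
$doc->appendChild($item);

// Serialize just this element so the escaping is easy to see.
$xml = $doc->saveXML($item);
echo $xml, "\n";
```

As far as I can tell, the angle brackets and the ampersand come out as entity references here, while the double quotes are left as they are, which is what prompts the question about quotes.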
Previously I just called PHP's htmlentities function before storing, but data stored that way is not compatible with the XML/XSLT approach unless I use the disable-output-escaping attribute so that the stored entities are output as-is, which I don't want to do for every output tag.
Otherwise, the ampersand at the start of each stored HTML entity gets escaped a second time, so the entity text itself (e.g. &quot;) shows up literally all over the rendered page.
Essentially, is it safe for me to leave normal text characters like quotes unescaped within HTML, or is there a reason why PHP converts all of those as well?
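For reference, this is a small sketch that reproduces the double escaping I am describing. Again, the element name `item` and the sample text are only illustrative, and the htmlentities call stands in for what I used to do at storage time.

```php
<?php
// Simulate the old storage step: entities are baked into the stored string.
$stored = htmlentities('Tom & "Jerry" <b>', ENT_QUOTES, 'UTF-8');

$doc  = new DOMDocument('1.0', 'UTF-8');
$item = $doc->createElement('item');                // illustrative element name
$item->appendChild($doc->createTextNode($stored));  // pre-escaped text goes in as plain text
$doc->appendChild($item);

// On output, the serializer escapes the leading '&' of every stored entity again.
$xml = $doc->saveXML($item);
echo $xml, "\n";
```

The stored `&quot;` ends up as `&amp;quot;` in the XML, so the browser displays the entity text instead of the quote character.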
Regards,
Rob.