Previewing user input HTML 
September 30th, 2008, 08:45 AM
I have a page that accepts user input, including HTML. I would like to
offer a preview of what the user's HTML will look like, but I'd also like
to avoid having to parse their HTML to ensure that it is valid.

The sorts of things that cause problems are unmatched quotes inside the
HTML and mismatched <>'s around the HTML. There are probably others
(thus demonstrating why I need to avoid parsing it).

The mismatched <>'s are not too difficult - I can add a ">" of my own,
but then it will be visible.

I realise we are into the land of handling invalid HTML, so all bets are
off, but is there any good approach to such a problem?

If I do end up parsing the user's HTML, do I need to worry about more
than mismatched <>'s and quotes (inside the <>'s)? Remember, I don't
actually care what it looks like, as long as it doesn't upset my own
HTML which follows the preview.
--
Steve Swift http://www.swiftys.org.uk/swifty.html http://www.ringers.org.uk
September 30th, 2008, 09:05 AM
re: Previewing user input HTML
On 2008-09-30, Steve Swift <Steve.J.Swift@gmail.com> wrote:
> I have a page that accepts user input, including HTML. I would like to
> offer a preview of what the user's HTML will look like, but I'd also like
> to avoid having to parse their HTML to ensure that it is valid.
>
> The sorts of things that cause problems are unmatched quotes inside the
> HTML and mismatched <>'s around the HTML. There are probably others
> (thus demonstrating why I need to avoid parsing it).
>
> The mismatched <>'s are not too difficult - I can add a ">" of my own,
> but then it will be visible.
>
> I realise we are into the land of handling invalid HTML, so all bets are
> off, but is there any good approach to such a problem?
>
> If I do end up parsing the user's HTML, do I need to worry about more
> than mismatched <>'s and quotes (inside the <>'s)? Remember, I don't
> actually care what it looks like, as long as it doesn't upset my own
> HTML which follows the preview.

I think if you use innerHTML, your own HTML will probably be OK.

The browser will parse their garbage to create a subtree for the element
whose innerHTML you're setting, and then attach that subtree to your DOM
tree. It won't paste their garbage into your HTML and parse the whole
lot again.

To be absolutely sure, you could parse their input before attaching it
to your DOM tree.

Something like:

    var div = document.createElement("div"); // unattached node
    div.innerHTML = userGarbage;

Then use appendChild to attach the div into your DOM tree.

But I don't think that will be necessary.
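
The detached-node idea above can be sketched as one small function. This is only an illustration, not anything from the thread's author: the name `renderPreview` and its parameters are hypothetical, and the document object is passed in as `doc` purely so the function can also be exercised outside a browser.

```javascript
// Hypothetical helper: parse user markup in a detached element, then
// swap it into the preview container.
function renderPreview(doc, container, userHtml) {
  var div = doc.createElement("div"); // unattached node
  div.innerHTML = userHtml;           // the browser's error-tolerant parser runs here

  // Remove whatever an earlier preview left behind.
  while (container.firstChild) {
    container.removeChild(container.firstChild);
  }

  // Attach the already-parsed subtree; the host page is not re-parsed.
  container.appendChild(div);
  return div;
}
```

In a page this might be called as `renderPreview(document, document.getElementById("preview"), textarea.value)`, where the `preview` id and the `textarea` variable are assumptions about the surrounding page.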
September 30th, 2008, 12:35 PM
re: Previewing user input HTML
Steve Swift <Steve.J.Swift@gmail.com> writes:
> I have a page that accepts user input, including HTML. I would like to
> offer a preview of what the user's HTML will look like, but I'd also
> like to avoid having to parse their HTML to ensure that it is valid.

<snip>

> ... Remember, I don't
> actually care what it looks like, as long as it doesn't upset my own
> HTML which follows the preview.

Can you side-step the problem by keeping the user HTML separate and
displaying it using an <object> element?
--
Ben.
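
One way to read "keeping the user HTML separate" without a round trip to the server is to hand the markup to the <object> element as a data: URL, so it loads as its own document. This is a sketch under that assumption; the post may equally have meant a separate URL served by the server. The function name `previewDataUrl` is hypothetical, and how a given browser treats data: URLs inside <object> is worth checking before relying on it.

```javascript
// Hypothetical helper: wrap user markup in a data: URL so it is loaded as
// a separate document instead of being pasted into the host page's markup.
function previewDataUrl(userHtml) {
  // encodeURIComponent percent-encodes <, >, double quotes, # and spaces.
  // It leaves single quotes alone, so assign the result with setAttribute
  // rather than concatenating it into an attribute string.
  return "data:text/html;charset=utf-8," + encodeURIComponent(String(userHtml));
}
```

The result would then be assigned to an existing <object> element with something like `obj.setAttribute("type", "text/html")` followed by `obj.setAttribute("data", previewDataUrl(text))`, where `obj` and `text` are assumed names.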
September 30th, 2008, 02:15 PM
re: Previewing user input HTML
In article <slrnge3n19.3s0.spamspam@bowser.marioworld>,
Ben C <spamspam@spam.eggs> wrote:
> I think if you use innerHTML, your own HTML will probably be OK.
>
> The browser will parse their garbage to create a subtree for the element
> whose innerHTML you're setting, and then attach that subtree to your DOM
> tree. It won't paste their garbage into your HTML and parse the whole
> lot again.
>
> To be absolutely sure, you could parse their input before attaching it
> to your DOM tree.
>
> Something like:
>
>     var div = document.createElement("div"); // unattached node
>     div.innerHTML = userGarbage;
>
> Then use appendChild to attach the div into your DOM tree.
>
> But I don't think that will be necessary.

I don't know about that, but it seems to me that you will need to run the
user-provided HTML through something first, just to ensure that no
malicious code has been inserted that could pose a security risk. I
believe the Perl CGI module has a function or functions you can use to
do this, and I would be willing to bet you can find equivalent JS tools.

Which leads to the thought that, since you're going to have to
pre-process the user HTML anyway, maybe you could also pipe it through
something like HTML Tidy (I think that's its name)?
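
The most conservative form of "running it through something first" is plain entity escaping. The sketch below is a hand-written illustration, not a function from any particular library, and `escapeHtml` is a hypothetical name. Note the trade-off: escaped input is displayed as source text rather than rendered, so this neutralises the markup instead of previewing it. Letting some tags through safely needs a real sanitizer, which this does not attempt.

```javascript
// Hypothetical helper: replace the characters that are significant in HTML
// with character references, so the input can only ever appear as text.
function escapeHtml(text) {
  var entities = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;"
  };
  // Each character is replaced exactly once, so the "&" in an emitted
  // entity is never escaped a second time.
  return String(text).replace(/[&<>"']/g, function (ch) {
    return entities[ch];
  });
}
```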
October 1st, 2008, 08:25 AM
re: Previewing user input HTML
On 2008-09-30, David Stone <no.email@domain.invalid> wrote:
> I don't know about that, but it seems to me that you will need to run the
> user-provided HTML through something first, just to ensure that no
> malicious code has been inserted that could pose a security risk. I
> believe the Perl CGI module has a function or functions you can use to
> do this, and I would be willing to bet you can find equivalent JS tools.
>
> Which leads to the thought that, since you're going to have to
> pre-process the user HTML anyway, maybe you could also pipe it through
> something like HTML Tidy (I think that's its name)?

I think we're thinking about different things. Perhaps because he used
the word "preview" I got it into my head that this user HTML was not
going back to the server (wiki style) but being added to the page there
and then with JS on the client.

The idea of innerHTML is that you're using the browser's own normal
handling of broken HTML to deal with things, and it's basically all
you've got on the client.

But it's much more likely that the data is going back to the server, in
which case, yes, you could easily run it through Tidy and other checkers
like that.