Previewing user input HTML

Steve Swift
#1: Sep 30 '08

I have a page that accepts user input, including HTML. I would like to
offer a preview of what the user's HTML will look like, but I'd also like
to avoid having to parse their HTML to ensure that it is valid.

The sorts of things that cause problems are unmatched quotes inside the
HTML and mismatched <>'s around the HTML. There are probably others
(thus demonstrating why I need to avoid parsing it).

The mismatched <>'s are not too difficult - I can add a ">" of my own,
but then it will be visible.

I realise we are into the land of handling invalid HTML, so all bets are
off, but is there any good approach to such a problem?

If I do end up parsing the user's HTML, do I need to worry about more
than mismatched <>'s and quotes (inside the <>'s)? Remember, I don't
actually care what it looks like, as long as it doesn't upset my own
HTML which follows the preview.

--
Steve Swift
http://www.swiftys.org.uk/swifty.html
http://www.ringers.org.uk

Ben C
#2: Sep 30 '08

On 2008-09-30, Steve Swift <Steve.J.Swift@gmail.com> wrote:
<snip>
> Remember, I don't
> actually care what it looks like, as long as it doesn't upset my own
> HTML which follows the preview.

I think if you use innerHTML, your own HTML will probably be OK.

The browser will parse their garbage to create a subtree for the element
whose innerHTML you're setting, and then attach that subtree to your DOM
tree. It won't paste their garbage into your HTML and parse the whole
lot again.

To be absolutely sure, you could parse their input before attaching it
to your DOM tree.

Something like:

var div = document.createElement("div"); // unattached node
div.innerHTML = userGarbage;

Then use appendChild to attach the div into your DOM tree.

But I don't think that will be necessary.
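
A slightly fuller sketch of the same idea, for anyone who wants something to paste and adapt. The helper name renderPreview and its parameters are made up for illustration; it assumes you already have a container element on the page where the preview should go.

```javascript
// Sketch only: renderPreview and its parameters are hypothetical names.
// doc is the page's document; container is an element already in the page.
function renderPreview(doc, container, userHtml) {
  // Parse the user's markup inside a detached node, so the browser's
  // error recovery is confined to this subtree.
  var holder = doc.createElement("div");
  holder.innerHTML = userHtml;

  // Drop any earlier preview, then attach the freshly parsed subtree.
  while (container.firstChild) {
    container.removeChild(container.firstChild);
  }
  container.appendChild(holder);
  return holder;
}
```

In a page this would be called along the lines of renderPreview(document, document.getElementById("preview"), textarea.value), where "preview" and textarea are again assumed names. Note that this only keeps the user's markup from disturbing the surrounding HTML; it does not filter anything out of it.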

Ben Bacarisse
#3: Sep 30 '08

Steve Swift <Steve.J.Swift@gmail.com> writes:
> I have a page that accepts user input, including HTML. I would like to
> offer a preview of what the user's HTML will look like, but I'd also
> like to avoid having to parse their HTML to ensure that it is valid.
<snip>
> ... Remember, I don't
> actually care what it looks like, as long as it doesn't upset my own
> HTML which follows the preview.

Can you side-step the problem by keeping the user HTML separate and
displaying it using an <object> element?
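
One way this could look, as a rough sketch: the user's HTML is packed into a data: URL and handed to the <object> element, so it is parsed as a separate document. The helper name objectPreviewMarkup is invented for the example, and the data: URL is an assumption standing in for "separate"; serving the user's HTML from its own URL on the server would be the other route, and whether a particular browser renders a data: URL inside <object> is worth checking before relying on it.

```javascript
// Sketch only: objectPreviewMarkup is a hypothetical helper name.
function objectPreviewMarkup(userHtml) {
  // encodeURIComponent escapes <, >, " and &, so the resulting URL
  // can sit inside a double-quoted attribute without breaking it.
  var url = "data:text/html;charset=utf-8," + encodeURIComponent(userHtml);
  return '<object type="text/html" data="' + url + '"></object>';
}
```

Because the user's markup never appears literally in the page source, unmatched quotes or angle brackets in it cannot run into the HTML that follows the preview.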

--
Ben.

David Stone
#4: Sep 30 '08

In article <slrnge3n19.3s0.spamspam@bowser.marioworld>,
Ben C <spamspam@spam.eggs> wrote:
<snip>
> To be absolutely sure, you could parse their input before attaching it
> to your DOM tree.
<snip>
> But I don't think that will be necessary.

I don't know about that, but it seems to me that you will need to run the
user-provided HTML through something first, just to ensure that no
malicious code has been inserted that could pose a security risk. I
believe the Perl CGI module has a function or functions you can use to
do this, and I would be willing to bet you can find equivalent JS tools.

Which leads to the thought that, since you're going to have to
pre-process the user HTML anyway, maybe you could also pipe it through
something like HTML Tidy (I think that's its name)?
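
If the CGI function in question is escapeHTML, a JS equivalent is short enough to write by hand. The sketch below is only that: escapeHtml is a made-up name, and it is the bluntest possible filter, since it turns every piece of markup into visible text rather than picking out the dangerous parts.

```javascript
// Sketch only: escapeHtml is a hypothetical helper, not a library function.
// It neutralises all markup, so the user's input is shown as source text.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must come first, before other entities are added
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
```

This suits a "show me what I typed" view. For a rendered preview that still lets some tags through, a proper allowlist-based filter would be needed instead.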

Ben C
#5: Oct 1 '08

On 2008-09-30, David Stone <no.email@domain.invalid> wrote:
<snip>
> I don't know about that, but it seems to me that you will need to run the
> user-provided HTML through something first, just to ensure that no
> malicious code has been inserted that could pose a security risk.
<snip>
> Which leads to the thought that, since you're going to have to
> pre-process the user HTML anyway, maybe you could also pipe it through
> something like HTML Tidy (I think that's its name)?

I think we're thinking about different things. Perhaps because he used
the word "preview" I got it into my head that this user HTML was not
going back to the server (wiki style) but being added to the page there
and then with JS on the client.

The idea of innerHTML is that you're using the browser's own normal
broken HTML handling to deal with things, and it's basically all you've
got on the client.

But it's much more likely that the data is going back to the server, in
which case, yes, you could easily run it through tidy and other checkers
like that.