Previewing user input HTML

I have a page that accepts user input, including HTML. I would like to
offer a preview of what the user's HTML will look like, but I'd also like
to avoid having to parse their HTML to ensure that it is valid.

The sorts of things that cause problems are unmatched quotes inside the
HTML and mismatched <>'s around the HTML. There are probably others
(thus demonstrating why I need to avoid parsing it).

The mismatched <>'s are not too difficult - I can add a ">" of my own,
but then it will be visible.

I realise we are into the land of handling invalid HTML, so all bets are
off, but is there any good approach to such a problem?

If I do end up parsing the user's HTML, do I need to worry about more
than mismatched <>'s and quotes (inside the <>'s)? Remember, I don't
actually care what it looks like, as long as it doesn't upset my own
HTML which follows the preview.

--
Steve Swift
http://www.swiftys.org.uk/swifty.html
http://www.ringers.org.uk
Sep 30 '08 #1
4 Replies


On 2008-09-30, Steve Swift <St***********@gmail.com> wrote:
> I have a page that accepts user input, including HTML. I would like to
> offer a preview of what the user's HTML will look like, but I'd also like
> to avoid having to parse their HTML to ensure that it is valid.
>
> The sorts of things that cause problems are unmatched quotes inside the
> HTML and mismatched <>'s around the HTML. There are probably others
> (thus demonstrating why I need to avoid parsing it).
>
> The mismatched <>'s are not too difficult - I can add a ">" of my own,
> but then it will be visible.
>
> I realise we are into the land of handling invalid HTML, so all bets are
> off, but is there any good approach to such a problem?
>
> If I do end up parsing the user's HTML, do I need to worry about more
> than mismatched <>'s and quotes (inside the <>'s)? Remember, I don't
> actually care what it looks like, as long as it doesn't upset my own
> HTML which follows the preview.

I think if you use innerHTML, your own HTML will probably be OK.

The browser will parse their garbage to create a subtree for the element
whose innerHTML you're setting, and then attach that subtree to your DOM
tree. It won't paste their garbage into your HTML and parse the whole
lot again.

To be absolutely sure, you could parse their input before attaching it
to your DOM tree.

Something like:

var div = document.createElement("div"); // unattached node
div.innerHTML = userGarbage;

Then use appendChild to attach the div into your DOM tree.

But I don't think that will be necessary.
Sep 30 '08 #2
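
The innerHTML approach from reply #2 could be sketched roughly as below. This is a minimal sketch, not a definitive implementation: the function name `renderPreview` and its parameters `doc`, `container`, and `userHtml` are assumptions made for illustration. In a browser you would pass `document` and whichever element should hold the preview.

```javascript
// Sketch only: `doc`, `container`, and `userHtml` are assumed inputs.
// In a browser, pass `document` and the element that holds the preview.
function renderPreview(doc, container, userHtml) {
  // Build the preview in an unattached node so the browser parses the
  // user's markup on its own, not as part of the surrounding page source.
  var div = doc.createElement("div");
  div.innerHTML = userHtml;

  // Drop any earlier preview, then attach the freshly parsed subtree.
  while (container.firstChild) {
    container.removeChild(container.firstChild);
  }
  container.appendChild(div);
  return div;
}
```

A call might look like `renderPreview(document, document.getElementById("preview"), textarea.value)`, where the `preview` id and the `textarea` variable are hypothetical. Because the markup is parsed into a subtree of the new div, an unclosed tag or quote ends with that div and cannot swallow the page's own HTML that follows. This protects the page structure only; it does not make the markup safe, since inline event handlers in the user's HTML can still run.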

Steve Swift <St***********@gmail.com> writes:
> I have a page that accepts user input, including HTML. I would like to
> offer a preview of what the user's HTML will look like, but I'd also
> like to avoid having to parse their HTML to ensure that it is valid.
<snip>
> ... Remember, I don't
> actually care what it looks like, as long as it doesn't upset my own
> HTML which follows the preview.

Can you side-step the problem by keeping the user HTML separate and
displaying it using an <object> element?

--
Ben.
Sep 30 '08 #3
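
One way to read the <object> suggestion in reply #3 is sketched below. It assumes the server can return the user's markup as its own text/html page at some URL; the names `escapeAttr`, `buildPreviewObject`, and the `previewUrl` parameter are hypothetical, as is the example URL in the usage note.

```javascript
// Escape a string for use inside a double-quoted HTML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build an <object> that loads the user's HTML as a separate document.
// `previewUrl` is a hypothetical URL at which the server serves the
// user's markup as its own text/html page.
function buildPreviewObject(previewUrl, width, height) {
  return '<object type="text/html" data="' + escapeAttr(previewUrl) + '"' +
    ' width="' + Number(width) + '" height="' + Number(height) + '">' +
    'Preview unavailable.</object>';
}
```

For example, `buildPreviewObject("/preview?id=42", 600, 300)` yields markup that can be written into the host page. Since the user's HTML is then parsed as a document of its own, its unmatched quotes or brackets cannot affect the host page's markup. Scripts in the embedded document would still run, so this addresses broken layout rather than security.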

In article <sl*********************@bowser.marioworld>,
Ben C <sp******@spam.eggs> wrote:
> On 2008-09-30, Steve Swift <St***********@gmail.com> wrote:
>> <snip>
>
> I think if you use innerHTML, your own HTML will probably be OK.
>
> The browser will parse their garbage to create a subtree for the element
> whose innerHTML you're setting, and then attach that subtree to your DOM
> tree. It won't paste their garbage into your HTML and parse the whole
> lot again.
>
> To be absolutely sure, you could parse their input before attaching it
> to your DOM tree.
>
> Something like:
>
> var div = document.createElement("div"); // unattached node
> div.innerHTML = userGarbage;
>
> Then use appendChild to attach the div into your DOM tree.
>
> But I don't think that will be necessary.

I don't know about that, but it seems to me that you will need to run the
user-provided HTML through something first, just to ensure that no
malicious code has been inserted that could pose a security risk. I
believe the Perl CGI module has a function or functions you can use to
do this, and I would be willing to bet you can find equivalent JS tools.

Which leads to the thought that, since you're going to have to
pre-process the user HTML anyway, maybe you could also pipe it through
something like HTML Tidy (I think that's its name)?
Sep 30 '08 #4
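
To illustrate the kind of pre-processing reply #4 has in mind, here is a rough allowlist filter over an already parsed DOM subtree. It is an illustration under stated assumptions, not a vetted sanitizer: the tag list, the rule that only http(s) links keep an attribute, and the names `ALLOWED_TAGS`, `isSafeAttribute`, and `sanitizeTree` are all invented for this sketch. Real user content would be better served by a maintained sanitizing library.

```javascript
// Tags allowed through; this list is an illustrative assumption.
var ALLOWED_TAGS = {
  A: true, B: true, BR: true, EM: true, I: true,
  LI: true, OL: true, P: true, STRONG: true, UL: true
};

// Only links keep an attribute, and only an absolute http(s) href.
function isSafeAttribute(tagName, name, value) {
  return tagName === "A" && name === "href" && /^https?:\/\//i.test(value);
}

// Remove disallowed elements and attributes below `parent`, in place.
function sanitizeTree(parent) {
  // Copy the child list first because the loop removes nodes from it.
  var children = Array.prototype.slice.call(parent.childNodes);
  children.forEach(function (node) {
    if (node.nodeType === 3) { return; } // text node: keep as-is
    var allowed = node.nodeType === 1 &&
      Object.prototype.hasOwnProperty.call(ALLOWED_TAGS, node.tagName);
    if (!allowed) {
      parent.removeChild(node); // comments, <script>, <img>, and so on
      return;
    }
    var attrs = Array.prototype.slice.call(node.attributes);
    attrs.forEach(function (attr) {
      if (!isSafeAttribute(node.tagName, attr.name, attr.value)) {
        node.removeAttribute(attr.name);
      }
    });
    sanitizeTree(node); // recurse into the element's children
  });
  return parent;
}
```

Note that a disallowed element is dropped together with everything inside it, which is the conservative choice. In a browser the tree passed in would ideally come from an inert parse, for example `new DOMParser().parseFromString(userHtml, "text/html").body`, because markup assigned directly to innerHTML can fire handlers such as an image's onerror before any cleanup runs.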

On 2008-09-30, David Stone <no******@domain.invalid> wrote:
> In article <sl*********************@bowser.marioworld>,
> Ben C <sp******@spam.eggs> wrote:
>> <snip>
>
> I don't know about that, but it seems to me that you will need to run the
> user-provided HTML through something first, just to ensure that no
> malicious code has been inserted that could pose a security risk. I
> believe the Perl CGI module has a function or functions you can use to
> do this, and I would be willing to bet you can find equivalent JS tools.
>
> Which leads to the thought that, since you're going to have to
> pre-process the user HTML anyway, maybe you could also pipe it through
> something like HTML Tidy (I think that's its name)?

I think we're thinking about different things. Perhaps because he used
the word "preview" I got it into my head that this user HTML was not
going back to the server (wiki style) but being added to the page there
and then with JS on the client.

The idea of innerHTML is that you're using the browser's own normal
broken HTML handling to deal with things, and it's basically all you've
got on the client.

But it's much more likely that the data is going back to the server, in
which case yes you could run it through tidy and other checkers like
that easily.
Oct 1 '08 #5
