
getting correct xhtml from DIV

P: n/a
Hi guys, I've run into a problem with innerHTML when reading XHTML
content inside a DIV: the content is returned as HTML, with the closing
of some nodes removed. Is there a way to read the content of the div
exactly as it is in the page, something like:

var obj = document.getElementById("myDIV");
var realContent = obj.toString();

Thanks a lot, chr

Dec 23 '05 #1
22 Replies


P: n/a
One option could be a regular expression that checks whether nodes like
img and br are closed correctly and, if not, closes them. In that case,
does anyone have a clue which regex is needed?

thanks, chr

Dec 23 '05 #2

P: n/a
gabon wrote:
> Hi guys, I've run into a problem with innerHTML when reading XHTML
> content inside a DIV: the content is returned as HTML, with the closing
> of some nodes removed.

1. What do you expect of inner_HTML_? That it contains HTML only is
not a bug at all.

2. Nodes cannot be closed, you are talking of elements.

> Is there a way to read the content of the div exactly as it is
> in the page, something like:
>
> var obj = document.getElementById("myDIV");
> var realContent = obj.toString();

Use an XMLSerializer object or build your own serializer. Since the only
UAs that fully support XHTML to date are Gecko-based ones (Mozilla/5.0) --
IE does _not_ support it at all! -- that would be, in the Gecko DOM:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>XMLSerializer Example (application/xhtml+xml)</title>
<meta http-equiv="Content-Script-Type" content="text/javascript"/>
<script type="text/javascript">
<![CDATA[
function isMethodType(s)
{
  return (s == "function" || s == "object");
}

function getSerialized(s)
{
  var obj;
  // typeof XMLSerializer is "object" in older Gecko and "function" in
  // later UAs, so accept both instead of testing for "object" only
  if (isMethodType(typeof document.getElementById)
      && (obj = document.getElementById(s))
      && isMethodType(typeof XMLSerializer))
  {
    var o = new XMLSerializer();
    if (o && typeof o.serializeToString == "function")
    {
      var realContent = o.serializeToString(obj);
      alert(realContent);
    }
  }
}
]]>
</script>
</head>
<body>
<div id="foo"><img src="bar.png" alt="bar"/></div>
<div>
<input type="button" value="Get Serialized Content"
onclick="getSerialized('foo');"/>
</div>
</body>
</html>

Note that this adds an `xmlns' attribute, with the namespace of the
document's root element as its value, to the root element of the
serialized string in order to make the markup valid.

WFM in

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8) Gecko/20051217
Debian/1.5.dfsg-2 Firefox/1.5 Mnenhy/0.7.3.0

What is nice is that it works with Valid HTML 4.01 as well; the only
difference is that element types are uppercased, meaning that they must
be lowercased before being valid XHTML again as that is case-sensitive.

A peculiarity (or a bug?) of the result of the serialization of XHTML
served as application/xhtml+xml in

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20051007
Debian/1.7.12-1

is that each element type is prefixed, and the added `xmlns' attribute
name is suffixed with "a0:".
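
For UAs without XMLSerializer, the "build your own serializer" route mentioned above might look roughly like the following sketch. It is not taken from the thread: `serializeNode`, `escapeMarkup` and `EMPTY_ELEMENTS` are hypothetical names, the code handles only element and text nodes, and plain objects stand in for DOM nodes so that it runs outside a browser.

```javascript
// Hypothetical helper, not part of any browser API: serializes a DOM-like
// node tree to XHTML-style markup. It relies only on the DOM Level 1
// properties nodeType, nodeName, nodeValue, attributes and childNodes.

// Elements declared EMPTY in XHTML 1.0 Strict; they are written as <name />.
var EMPTY_ELEMENTS = {
  area: true, base: true, br: true, col: true, hr: true,
  img: true, input: true, link: true, meta: true, param: true
};

// Escapes markup characters; double quotes are escaped for attribute values.
function escapeMarkup(s, isAttribute) {
  s = String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  return isAttribute ? s.replace(/"/g, "&quot;") : s;
}

function serializeNode(node) {
  switch (node.nodeType) {
    case 3: // TEXT_NODE
      return escapeMarkup(node.nodeValue, false);
    case 1: // ELEMENT_NODE
      var name = node.nodeName.toLowerCase(); // XHTML names are lowercase
      var out = "<" + name;
      var attrs = node.attributes || [];
      for (var i = 0; i < attrs.length; i++) {
        out += " " + attrs[i].name.toLowerCase()
          + '="' + escapeMarkup(attrs[i].value, true) + '"';
      }
      if (EMPTY_ELEMENTS[name] === true) {
        return out + " />";
      }
      out += ">";
      var children = node.childNodes || [];
      for (var j = 0; j < children.length; j++) {
        out += serializeNode(children[j]);
      }
      return out + "</" + name + ">";
    default: // comments, processing instructions etc. are skipped here
      return "";
  }
}

// Plain objects standing in for the `foo' div of the example document.
var div = {
  nodeType: 1, nodeName: "DIV",
  attributes: [{ name: "id", value: "foo" }],
  childNodes: [{
    nodeType: 1, nodeName: "IMG",
    attributes: [{ name: "src", value: "bar.png" }, { name: "alt", value: "bar" }],
    childNodes: []
  }]
};

console.log(serializeNode(div));
// -> <div id="foo"><img src="bar.png" alt="bar" /></div>
```

With a real DOM the attributes collection may need extra filtering (for instance by its `specified' flag), which this sketch leaves out.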
HTH

PointedEars
Dec 23 '05 #3

P: n/a
Thomas 'PointedEars' Lahn wrote:
> [...]
> <?xml version="1.0" encoding="ISO-8859-1"?>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml">

This is Valid, however it should be

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

to be perfect :)

> [...]
> A peculiarity (or a bug?) of the result of the serialization of XHTML
> served as application/xhtml+xml in
>
> Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20051007
> Debian/1.7.12-1
>
> is that each element type is prefixed, and the added `xmlns' attribute
> name is suffixed with "a0:".

Prefixed with "a0:", suffixed with ":a0".
PointedEars
Dec 23 '05 #4

P: n/a

Thomas 'PointedEars' Lahn wrote:
> A peculiarity (or a bug?) of the result of the serialization of XHTML
> served as application/xhtml+xml in
>
> Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20051007
> Debian/1.7.12-1
>
> is that each element type is prefixed, and the added `xmlns' attribute
> name is suffixed with "a0:".


In terms of XML with namespaces e.g.
<p xmlns="http://www.w3.org/1999/xhtml"></p>
and
<a0:p xmlns:a0="http://www.w3.org/1999/xhtml"></a0:p>
are semantically the same and a pure XML serialization can choose any
prefix it likes. Of course in terms of the XHTML DTD the first is the
desired serialization. I think the XMLSerializer in Mozilla has been
enhanced after Gecko 1.7 not to choose a random prefix if elements in
the DOM do have a certain prefix or do not have any prefix at all.
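
The equivalence of the two serializations can be illustrated with a small sketch that resolves an element's qualified name to its expanded name (namespace URI plus local name). The helpers `expandName` and `sameExpandedName` are hypothetical and only model the namespace declarations in scope as a plain object; they are not a DOM or parser API.

```javascript
// Hypothetical helper: resolves an element's qualified name such as "a0:p"
// to its expanded name, given the namespace declarations in scope as a
// plain object (prefix -> namespace URI, "" standing for the default).
function expandName(qualifiedName, declarations) {
  var colon = qualifiedName.indexOf(":");
  var prefix = colon < 0 ? "" : qualifiedName.slice(0, colon);
  var localName = colon < 0 ? qualifiedName : qualifiedName.slice(colon + 1);
  var declared = Object.prototype.hasOwnProperty.call(declarations, prefix);
  return {
    namespaceURI: declared ? declarations[prefix] : null,
    localName: localName
  };
}

// Two element names denote the same element type if both parts agree;
// the prefix itself plays no role in the comparison.
function sameExpandedName(a, b) {
  return a.namespaceURI === b.namespaceURI && a.localName === b.localName;
}

var XHTML_NS = "http://www.w3.org/1999/xhtml";

// <p xmlns="http://www.w3.org/1999/xhtml"></p>
var unprefixed = expandName("p", { "": XHTML_NS });

// <a0:p xmlns:a0="http://www.w3.org/1999/xhtml"></a0:p>
var prefixed = expandName("a0:p", { "a0": XHTML_NS });

console.log(sameExpandedName(unprefixed, prefixed)); // true
```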
--

Martin Honnen
http://JavaScript.FAQTs.com/
Dec 23 '05 #5

P: n/a
Thanks Thomas, it works fine on Firefox like you say, but how can I
make it work on IE? Isn't a regular expression that fixes the broken
nodes returned by innerHTML a more compatible solution?

Cheers, chr

Jan 3 '06 #6

P: n/a
gabon wrote:
> Thanks Thomas, it works fine on Firefox like you say,

Where did I say anything?

<URL:http://jibbering.com/faq/faq_notes/pots1.html#ps1Post>
<URL:http://www.safalra.com/special/googlegroupsreply/>

> but how can I make it work on IE? Isn't a regular expression that fixes
> the broken nodes returned by innerHTML a more compatible solution?

IE does not support XHTML, and innerHTML returns HTML, not XHTML, so no.

And that is not even considering the -- let's say -- challenge introduced
by the attempt to parse a context-free language (ℒ₂) like HTML with a
regular expression, which describes a regular language (ℒ₃).

<URL:http://en.wikipedia.org/wiki/Chomsky_hierarchy>
PointedEars
Jan 3 '06 #7

P: n/a
Well, here is a test where I did the dirty job of converting HTML to
XHTML with a regexp. Although it seems to work fine, the regexp could
easily be updated to make it more consistent (any suggestion is more
than welcome!).

http://nuthinking.com/goodies/accessible_bar_chart/

Of course, in an optimal project the Flash chart would replace the HTML
one when the player is present and no screen reader is in use.
Cheers, chr

Jan 9 '06 #8

P: n/a
gabon wrote:
> Well, here is a test where I did the dirty job of converting HTML to
> XHTML with a regexp. Although it seems to work fine, the regexp could
> easily be updated to make it more consistent (any suggestion is more
> than welcome!).
>
> http://nuthinking.com/goodies/accessible_bar_chart/


I expected that.

1. You are serving XHTML as text/html.
<URL:http://hixie.ch/advocacy/xhtml>

2. You continue to use the even-then obsolete and error-prone HTML technique
of trying to hide script and style content with SGML comment strings in
XHTML, which is likely to end up in an empty `script' and `style' element
if parsed by an XML parser. If those comments are not removed by the
parser and the delimiters ignored instead, the content of the `script'
element triggers a parsing error, hence is very likely to prevent the
display of the document, as its PCDATA content has not been declared
CDATA and contains markup characters like `<'.

<URL:http://groups.google.com/groups?as_q=CDATA&as_ugroup=comp.lang.javascript&scoring=d&filter=0>

3. The markup is not Valid XHTML at all.

<URL:http://validator.w3.org/check?uri=http://nuthinking.com/goodies/accessible_bar_chart/&ss=1;verbose=1>

4. You recognize the `br' and `img' elements which are but two of some to
have an EMPTY content model, but you do not recognize omitted end tags
or elements and attributes that are in HTML 4 but not in XHTML 1.0
Strict.

<URL:http://www.w3.org/TR/html4/>
<URL:http://www.w3.org/TR/xhtml1/>

5. You do not use an XML processing instruction to specify the document's
stylesheet.

<URL:http://www.w3.org/TR/xhtml1/#diffs>

6. You use ridiculous positioning like

.auraltext
{
  position: absolute;
  font-size: 0;
  left: -1000px;
}

And probably I forgot something.
PointedEars
Jan 9 '06 #9

P: n/a
well, to be honest, I started from this theoretically accessible chart:
http://www.standards-schmandards.com/exhibits/barchart/

fortunately it seems that most of the errors come from that, but in
any case I'll be happy to analyze each one you pointed out.

thanks, chr

Jan 9 '06 #10

P: n/a
Thomas 'PointedEars' Lahn wrote:
> gabon wrote:
> > http://nuthinking.com/goodies/accessible_bar_chart/
>
> [...]
> 4. You recognize the `br' and `img' elements which are but two of some to
> have an EMPTY content model, but you do not recognize omitted end tags

and omitted start tags, of course.
PointedEars
Jan 9 '06 #11

P: n/a
I tried to fix as much as I could. About the 4th, do you mean that I
just check br and img instead of many others? As I said my regexp is
not very consistent :(

if you have any further suggestion, it will be very helpful.

Thanks a lot Thomas, chr

Jan 9 '06 #12

P: n/a
well, the html comes from the browser, so I presume it will have a
start tag, otherwise it won't be present, right? If the html is wrong,
it's kind of impossible to fix it automatically, isn't it? This is
not a general wrong-html-to-xhtml parser but a limited,
correct-html-to-xhtml parser.

thanks, chr

Jan 9 '06 #13

P: n/a
gabon wrote:
> well, the html comes from the browser, so I presume it will
> have a start tag, otherwise it won't be present, right?


Wrong. <URL:http://www.w3.org/TR/html4/index/elements.html>
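
The referenced index lists several element types whose tags may be left out of Valid HTML 4.01 source while the element is still part of the document. As a rough illustration, the tag omission rules from that index can be put into a lookup; the helper name `omissibleTags` is hypothetical.

```javascript
// Tag omission rules as listed in the HTML 4.01 index of elements.
var START_TAG_OPTIONAL = ["html", "head", "body", "tbody"];
var END_TAG_OPTIONAL = [
  "body", "colgroup", "dd", "dt", "head", "html", "li", "option",
  "p", "tbody", "td", "tfoot", "th", "thead", "tr"
];

// Hypothetical helper: reports which tags of an element type may be
// missing from Valid HTML 4.01 source although the element itself exists.
function omissibleTags(elementType) {
  var name = elementType.toLowerCase();
  return {
    startTag: START_TAG_OPTIONAL.indexOf(name) >= 0,
    endTag: END_TAG_OPTIONAL.indexOf(name) >= 0
  };
}

console.log(omissibleTags("TBODY")); // { startTag: true, endTag: true }
console.log(omissibleTags("p"));     // { startTag: false, endTag: true }
console.log(omissibleTags("div"));   // { startTag: false, endTag: false }
```

So `<table><tr><td>x</table>` contains a `tbody' element even though no `<tbody>' tag appears in the source.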
PointedEars
Jan 9 '06 #14

P: n/a
gabon wrote:
> I tried to fix as much as I could. About the 4th,

Please quote what you are replying to.

<URL:http://jibbering.com/faq/faq_notes/pots1.html#ps1Post>
<URL:http://www.safalra.com/special/googlegroupsreply/>

| 4. You recognize the `br' and `img' elements which are but two of some to
| have an EMPTY content model, but you do not recognize omitted end tags
| or elements and attributes that are in HTML 4 but not in XHTML 1.0
| Strict.
|
| <URL:http://www.w3.org/TR/html4/>
| <URL:http://www.w3.org/TR/xhtml1/>

> do you mean that I just check br and img instead of many others?

Yes, I do.
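
To make the gap concrete: HTML 4.01 declares thirteen element types with an EMPTY content model, three of which do not exist in XHTML 1.0 Strict. The sketch below reports which of the remaining ones a converter misses if it handles only a given list; `missingEmptyElements` is a hypothetical helper, not code from the page under discussion.

```javascript
// Element types with an EMPTY content model in HTML 4.01.
var HTML4_EMPTY = [
  "area", "base", "basefont", "br", "col", "frame", "hr",
  "img", "input", "isindex", "link", "meta", "param"
];

// Of those, the ones that are not part of XHTML 1.0 Strict.
var NOT_IN_XHTML1_STRICT = ["basefont", "frame", "isindex"];

// Hypothetical helper: given the element types a converter already closes,
// returns the EMPTY element types of XHTML 1.0 Strict it still misses.
function missingEmptyElements(handled) {
  return HTML4_EMPTY.filter(function (name) {
    return handled.indexOf(name) < 0
      && NOT_IN_XHTML1_STRICT.indexOf(name) < 0;
  });
}

console.log(missingEmptyElements(["br", "img"]).join(", "));
// -> area, base, col, hr, input, link, meta, param
```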
> As I said my regexp is not very consistent :(


It is not possible to transform HTML into XHTML with one application of a
few regular expressions. Regular expressions describe regular languages,
and SGML-based markup languages such as HTML and XHTML are more like
context-free languages. The possibility of optional tags for elements
makes this even harder to accomplish.
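
A small sketch may illustrate the difference, assuming a simplified input that consists only of attribute-less `<div>' tags and text. A single regular expression has no notion of nesting depth, whereas even a plain counter (a degenerate stack) does. The function name `matchBalancedDiv` is hypothetical.

```javascript
var markup = "<div>a<div>b</div>c</div>";

// A non-greedy regular expression stops at the first end tag it finds,
// so it pairs the outer start tag with the inner end tag.
var regexMatch = /<div>[\s\S]*?<\/div>/.exec(markup)[0];
console.log(regexMatch); // -> <div>a<div>b</div>

// Hypothetical helper: returns the balanced element at the start of `s`,
// or null if the tags do not balance. The regular expression is used only
// as a tokenizer; the nesting itself is tracked with a depth counter.
function matchBalancedDiv(s) {
  var tag = /<(\/?)div>/g;
  var depth = 0;
  var m;
  while ((m = tag.exec(s)) !== null) {
    depth += (m[1] === "/") ? -1 : 1;
    if (depth === 0) {
      return s.slice(0, tag.lastIndex); // outermost element is closed
    }
    if (depth < 0) {
      return null; // end tag without a matching start tag
    }
  }
  return null; // ran out of input before the element was closed
}

console.log(matchBalancedDiv(markup)); // -> <div>a<div>b</div>c</div>
```

A greedy quantifier would not help either: it would run to the last end tag in the input and so swallow any sibling elements that follow.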
PointedEars
Jan 9 '06 #15

P: n/a
> It is not possible to transform HTML into XHTML with one application of a
> few regular expressions. Regular expressions describe regular languages,
> and SGML-based markup languages such as HTML and XHTML are more like
> context-free languages. The possibility of optional tags for elements
> makes this even harder to accomplish.

well, so it means that we are talking about different aims, which
strangely seemed clear to both of us...
Thanks for the advice, chr

Jan 9 '06 #16

P: n/a
unfortunately google groups seems very bad for quoting, so to avoid
further misunderstanding...

you wrote:

> It is not possible to transform HTML into XHTML with one application of a
> few regular expressions. Regular expressions describe regular languages,
> and SGML-based markup languages such as HTML and XHTML are more like
> context-free languages. The possibility of optional tags for elements
> makes this even harder to accomplish.

I wrote:

> well, so it means that we are talking about different aims, which
> strangely seemed clear to both of us...

Thanks for the advice, chr

Jan 9 '06 #17

P: n/a
gabon wrote:
> > It is not possible to transform HTML into XHTML with one application of
> > a few regular expressions. Regular expressions describe regular
> > languages, and SGML-based markup languages such as HTML and XHTML are
> > more like context-free languages. The possibility of optional tags for
> > elements makes this even harder to accomplish.

If you followed the advice given on quoting, it probably would not
have looked that bad.

> well, so it means that we are talking about different aims, which
> strangely seemed clear to both of us...

You want to transform HTML into XHTML, and I was talking about that.
Where is the difference?
PointedEars
Jan 9 '06 #18

P: n/a
> You want to transform HTML into XHTML, and I was talking about that.
> Where is the difference?

Well, my transformation was aimed at predefined html, so it wasn't
meant to be a universal converter, but one that could be used, for
instance, to generate a simple chart, where in any case you need a
predefined swf to interpret the data passed, so it can't handle just
anything ;)
I made a mistake in posting that example in this thread, which was
rightly more generic :)

cheers, chr

Jan 9 '06 #19

P: n/a
> 6. You use ridiculous positioning like
>
> .auraltext
> {
>   position: absolute;
>   font-size: 0;
>   left: -1000px;
> }

Do you know other ways to make text readable by a screen reader without
showing it?

chr

Jan 9 '06 #20

P: n/a
gabon wrote:
> > 6. You use ridiculous positioning like
> >
> > .auraltext
> > {
> >   position: absolute;
> >   font-size: 0;
> >   left: -1000px;
> > }

Please provide attribution of quoted material.

> Do you know other ways to make text readable by a screen reader without
> showing it?

The above is not a way of hiding text at all. The below is
Valid CSS2 that applies to what you want to accomplish:

.auraltext {
  display:none;
}

@media speech {
  .auraltext
  {
    display:inherit; /* or any other non-none value that applies */
  }
}

BTW: Unscripted CSS is off-topic here and on-topic in ciwas.
PointedEars
Jan 9 '06 #21

P: n/a
gabon wrote:
> > 6. You use ridiculous positioning like
> >
> > .auraltext
> > {
> >   position: absolute;
> >   font-size: 0;
> >   left: -1000px;
> > }

Please provide attribution of quoted material.

> Do you know other ways to make text readable by a screen reader without
> showing it?

The above is not a way of hiding text at all. The below is
Valid CSS2 that applies to what you want to accomplish:

.auraltext {
  display:none;
}

@media speech, aural { /* `aural' for backwards compatibility */
  .auraltext
  {
    display:inherit; /* or any other non-none value that applies */
  }
}

BTW: Unscripted CSS is off-topic here and on-topic in ciwas.
PointedEars
Jan 10 '06 #22

P: n/a
> .auraltext { ...

interesting, thanks, chr

Jan 10 '06 #23
