Bytes IT Community

Reading RSS XML with IE

When loading an RSS feed into IE on Windows, doc.childNodes.length always
equals 0. If I manually delete the <!DOCTYPE declaration,
doc.childNodes.length is correct.

I'm using
doc = new ActiveXObject("Microsoft.XMLDOM");
to load the RSS. Is this where the problem lies?

(Using document.implementation.createDocument with FF reads the XML
correctly with or without a DOCTYPE.)

Andrew Poulos
Jan 14 '06 #1
28 Replies




Andrew Poulos wrote:
When loading an rss feed into Windows IE, doc.childNodes.length always
equals 0. If I manually delete the <!DOCTYPE tag doc.childNodes.length
is correct.

I'm using
doc = new ActiveXObject("Microsoft.XMLDOM");
to load the rss. Is this where the problem lies?


Hard to tell; we need to see more code, for instance whether you load
synchronously or asynchronously.
Some settings to play with are
doc.resolveExternals = false
doc.validateOnParse = false
And what exactly does that !DOCTYPE declaration look like? Is the XML
valid with regard to that declared document type?
Have you checked
doc.parseError.errorCode
doc.parseError.reason

While Mozilla uses Expat, a non-validating parser that ignores external
resources, IE uses MSXML, and MSXML can validate against a DTD.
If you want to validate against the DTD, then you need to check whether
there is a parseError.
--

Martin Honnen
http://JavaScript.FAQTs.com/
Jan 14 '06 #2
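A minimal sketch of Martin's suggestions, assuming a synchronous load in IE/JScript. The DOMDocument members used (async, resolveExternals, validateOnParse, load, and parseError with errorCode, line and reason) are MSXML's; the function name loadXmlDocument and the createDoc parameter are hypothetical, added only so the logic can be exercised outside IE with a stand-in object.

```javascript
// Hypothetical helper: load an XML file with MSXML (IE/JScript) and turn a
// failed load into an error that carries parseError's details, instead of
// silently continuing with an empty DOM (childNodes.length == 0).
function loadXmlDocument(url, createDoc) {
  // createDoc is an assumption for testability; in IE it defaults to the
  // Microsoft.XMLDOM ActiveX object used in the original post.
  createDoc = createDoc || function () {
    return new ActiveXObject("Microsoft.XMLDOM");
  };
  var doc = createDoc();
  doc.async = false;            // synchronous: load() returns when parsing is done
  doc.resolveExternals = false; // do not fetch external DTDs or entities
  doc.validateOnParse = false;  // require well-formedness only, not DTD validity

  if (!doc.load(url)) {         // load() returns false on failure
    var err = doc.parseError;
    throw new Error("Could not load " + url + " (error " + err.errorCode +
                    ", line " + err.line + "): " + err.reason);
  }
  return doc;
}
```

In IE you would simply call loadXmlDocument("feed.xml") and let the default factory create the ActiveX object; the error message then tells you why the DOM came back empty.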

Martin Honnen wrote:


Have you checked
doc.parseError.errorCode
doc.parseError.reason


Checking the error reason did it. I copied the RSS feed to my hard drive
and I must've inadvertently edited a tag name.

Andrew Poulos
Jan 14 '06 #3

Andrew Poulos wrote:


Checking the error reason did it. I copied the RSS feed to my hard drive
and I must've inadvertently edited a tag name.


I spoke too soon. I tried parsing the RSS from this link:
<url: http://www.nasa.gov/rss/image_of_the_day.rss >
but IE tells me that "The element 'rss' is used but not declared in the
DTD/Schema."

This seems odd to me. Does this mean that the XML itself is invalid, or
that there's some resource that I don't have access to that is causing
the problem?

Andrew Poulos
Jan 15 '06 #4

Andrew Poulos wrote:
I spoke too soon. I tried parsing the RSS from this link:
<url: http://www.nasa.gov/rss/image_of_the_day.rss >
but IE tells me that "The element 'rss' is used but not declared in the
DTD/Schema."

This seems odd to me. Does this mean that the XML itself is invalid, or
that there's some resource that I don't have access to that is causing
the problem?


Yes, it means the XML is invalid.

The XML contains a DTD embedded inline in the document, but the DTD only
defines some entities & not any elements, so the document will fail to
validate. If you can turn off validation, or tell the parser to ignore the
inline DTD, you may be able to handle it.

The simplest thing is probably to set the 'validateOnParse' property to
false, although this could hide more serious errors.
Jan 15 '06 #5

VK

Duncan Booth wrote:
Andrew Poulos wrote:
I spoke too soon. I tried parsing the RSS from this link:
<url: http://www.nasa.gov/rss/image_of_the_day.rss >
but IE tells me that "The element 'rss' is used but not declared in the
DTD/Schema."

This seems odd to me. Does this mean that the XML itself is invalid, or
that there's some resource that I don't have access to that is causing
the problem?


Yes, it means the XML is invalid.

The XML contains a DTD embedded inline in the document, but the DTD only
defines some entities & not any elements, so the document will fail to
validate. If you can turn off validation, or tell the parser to ignore the
inline DTD, you may be able to handle it.


Did you try (IE-only):

<html>
<head>
<title>Untitled Document</title>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
</head>

<body onload="alert(document.scripts[0])">
<script type="text/xml"
src="http://www.nasa.gov/rss/image_of_the_day.rss"></script>
</body>
</html>

That works just fine (meaning no parsing errors).

Jan 15 '06 #6

VK wrote:
Duncan Booth wrote:
Andrew Poulos wrote:
I spoke too soon. I tried parsing the RSS from this link:
<url: http://www.nasa.gov/rss/image_of_the_day.rss >
but IE tells me that "The element 'rss' is used but not declared in the
DTD/Schema."

This seems odd to me. Does this mean that the XML itself is invalid, or
that there's some resource that I don't have access to that is causing
the problem?

Yes, it means the XML is invalid.

The XML contains a DTD embedded inline in the document, but the DTD only
defines some entities & not any elements, so the document will fail to
validate. If you can turn off validation, or tell the parser to ignore the
inline DTD, you may be able to handle it.


Did you try (IE-only):

<html>
<head>
<title>Untitled Document</title>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
</head>

<body onload="alert(document.scripts[0])">
<script type="text/xml"
src="http://www.nasa.gov/rss/image_of_the_day.rss"></script>
</body>
</html>

That works just fine (means no parsing errors).

I'm not sure what you're doing. How would I access nodes etc?

I'm using the ActiveX object to load the XML file so that I can later
walk its DOM. I agree with Duncan Booth that the DTD fails to define
any ELEMENTS and so the XML is invalid.

Andrew Poulos
Jan 15 '06 #7



Andrew Poulos wrote:

I spoke too soon. I tried parsing the RSS from this link:
<url: http://www.nasa.gov/rss/image_of_the_day.rss >
but IE tells me that "The element 'rss' is used but not declared in the
DTD/Schema."


As already suggested you can set
xmlDocument.validateOnParse = false;
before calling the load method and that way you can ensure that the DOM
is built if the XML is well-formed without being valid.

--

Martin Honnen
http://JavaScript.FAQTs.com/
Jan 15 '06 #8

Martin Honnen wrote:

Andrew Poulos wrote:
I spoke too soon. I tried parsing the RSS from this link:
<url: http://www.nasa.gov/rss/image_of_the_day.rss >
but IE tells me that "The element 'rss' is used but not declared in
the DTD/Schema."


As already suggested you can set
xmlDocument.validateOnParse = false;
before calling the load method and that way you can ensure that the DOM
is built if the XML is well-formed without being valid.


I used parseError.reason to show me what was wrong with the sample RSS
XML I was testing, and I didn't understand what this "new" error meant on
"real" RSS. I think I'll try to read the XML twice: if it fails when
validating the XML, I'll try again without validating.

thanks
Andrew Poulos
Jan 15 '06 #9
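Andrew's two-pass plan could look roughly like this in IE/JScript. It is a sketch, not tested against the NASA feed: loadFeed and createDoc are hypothetical names, and it assumes, as Martin describes, that with validateOnParse = false MSXML only requires the document to be well-formed. External DTD subsets are only resolved on the validating pass.

```javascript
// Two-pass loading (IE/JScript): validate first; if that fails, retry and
// require only well-formedness. loadFeed and createDoc are hypothetical.
function loadFeed(url, createDoc) {
  createDoc = createDoc || function () {
    return new ActiveXObject("Microsoft.XMLDOM");
  };

  function attempt(validate) {
    var doc = createDoc();
    doc.async = false;
    doc.validateOnParse = validate;
    doc.resolveExternals = validate; // external DTD subsets only matter when validating
    var ok = doc.load(url);
    return { ok: ok, doc: doc, reason: ok ? "" : doc.parseError.reason };
  }

  var strict = attempt(true);
  if (strict.ok) {
    return { doc: strict.doc, valid: true, warning: "" };
  }

  // Validation failed, e.g. because the internal DTD subset declares no
  // elements. Keep the validation message as a warning and retry.
  var loose = attempt(false);
  if (loose.ok) {
    return { doc: loose.doc, valid: false, warning: strict.reason };
  }

  // Not even well-formed: give up and report the parser's reason.
  throw new Error("Cannot parse " + url + ": " + loose.reason);
}
```

Returning the validation message as a warning rather than discarding it keeps Duncan's caveat in view: turning validation off can hide more serious problems.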

VK

Andrew Poulos wrote:

<html>
<head>
<title>Untitled Document</title>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
</head>

<body onload="alert(document.scripts[0])">
<script type="text/xml"
src="http://www.nasa.gov/rss/image_of_the_day.rss"></script>
</body>
</html>
I'm not sure what you're doing. How would I access nodes etc?
It is called a "Dynamic Data Island", and it implants a dynamic XML data
source into the document. You access the data later (after script.onload)
using document.scripts[i].XMLDocument plus the standard XML DOM methods.

Nevertheless it fails on the data source in question,
"image_of_the_day.rss", for the same reason that responseXML is
sometimes not set: an RSS feed is *not* XML, though it uses XML format.
Its MIME type (if served properly) should be, say, "application/rss+xml"
or another one (depending on which format is used: RSS, Atom). In any
case it is nothing like the needed "text/xml", as you can see. You need
to have a MIME type registered in your browser (this comes with
installed RSS readers). Or you have to read it as plain text and parse
it manually, or feed it to the browser's XML parser yourself.
DTD fails to define
any ELEMENTS and so the XML is invalid.


This sentence has no meaning for my humble mind. XML by definition can
consist of any proprietary nodes, as long as they are paired properly. I
can use:
<foobar>
<foo>Foo</foo>
<bar>Bar</bar>
</foobar>

w/o any DTD "permissions" to use <foobar>, <foo>, <bar>

It fails because the Content-Type doesn't match the expected one: "wants
text/xml, got application/rss+xml"

Jan 15 '06 #10
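VK's last suggestion, fetching the feed as text and feeding it to the XML parser yourself, might look like this in IE. It assumes an XMLHttpRequest-style object with responseXML and responseText, and uses MSXML's loadXML method, which parses a string. The names responseDocument, parseXmlText and createDoc are hypothetical. The responseXML check covers both a null value and an empty document, since what you get back for a response that was not parsed as XML may differ between browsers.

```javascript
// Parse XML text with MSXML's loadXML() (IE/JScript). Validation is off, so
// a well-formed feed with an incomplete internal DTD is still accepted.
// createDoc is a hypothetical hook; by default it creates the ActiveX object.
function parseXmlText(text, createDoc) {
  createDoc = createDoc || function () {
    return new ActiveXObject("Microsoft.XMLDOM");
  };
  var doc = createDoc();
  doc.async = false;
  doc.resolveExternals = false;
  doc.validateOnParse = false;
  if (!doc.loadXML(text)) {   // loadXML() returns false if parsing fails
    throw new Error("XML parse error " + doc.parseError.errorCode +
                    ": " + doc.parseError.reason);
  }
  return doc;
}

// Use the browser's own parse result if there is one; otherwise parse
// responseText ourselves, whatever Content-Type the server sent.
function responseDocument(xhr, createDoc) {
  var doc = xhr.responseXML;
  // An unparsed response may show up as null or as an empty document
  // without a documentElement, depending on the browser.
  if (doc && doc.documentElement) {
    return doc;
  }
  return parseXmlText(xhr.responseText, createDoc);
}
```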

VK wrote:
DTD fails to define
any ELEMENTS and so the XML is invalid.


This sentence has no meaning for my humble mind. XML by definition can
consist of any proprietary nodes, as long as they are paired properly. I
can use:
<foobar>
<foo>Foo</foo>
<bar>Bar</bar>
</foobar>

w/o any DTD "permissions" to use <foobar>, <foo>, <bar>

Yes, that's fine: you can use whatever tags you want without a DTD and the
only problem is that the parser won't be able to validate it.

The problem with the XML in question was that a DTD *was* specified, and
when a DTD or schema is supplied a validating parser will reject XML which
does not validate correctly.
Jan 15 '06 #11

VK

Duncan Booth wrote:
Yes, that's fine: you can use whatever tags you want without a DTD and the
only problem is that the parser won't be able to validate it.

The problem with the XML in question was that a DTD *was* specified, and
when a DTD or schema is supplied a validating parser will reject XML which
does not validate correctly.


Uhm... Are we still talking about the same file?
<http://www.nasa.gov/rss/image_of_the_day.rss>

I don't see any DTD for the tags, just a set of named entity definitions
like "nbsp" or "pound" (what are they doing in a UTF-8 doc anyway?)

IE 6.0 displays it without a problem (no error of any kind) if you trick
it into believing that this is indeed XML; I just did it. By the way,
it's RSS 2.0 format.

<rss version="2.0">...</rss> has no value to IE by itself; it's not a
DTD, just another proprietary XML tag. One could use <foo
version="0.1 beta">...</foo> instead with the same results.

I guess the real problem is that both Firefox (?) and Opera (for sure)
come with a built-in RSS reader, so they know in advance which parser to
use for the given Content-Type.
IE doesn't have a built-in RSS reader, so you either have to install one
or get the text and handle it manually. Just a suggestion.

Jan 15 '06 #12

VK wrote:
Uhm... Are we still talking about the same file?
<http://www.nasa.gov/rss/image_of_the_day.rss>

I don't see any DTD for the tags, just a set of named entity definitions
like "nbsp" or "pound" (what are they doing in a UTF-8 doc anyway?)
Yes, that's right. The DTD is specified using an internal subset rather
than an external subset. The internal subset only contains entity
declarations, but its presence is sufficient to give something for the
parser to try to validate the document. If they are going to put some
entity declarations there then they *must* either make it a complete DTD
including element declarations as well or specify an external subset.

From Extensible Markup Language (XML) 1.0, section 2.8 Prolog and Document
Type Declaration:
The document type declaration can point to an external subset (a special
kind of external entity) containing markup declarations, or can contain
the markup declarations directly in an internal subset, or can do both.
The DTD for a document consists of both subsets taken together.


So there is a DTD, but it is incomplete.
Jan 16 '06 #13
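To spot the situation Duncan describes before handing a feed to a validating parser, a rough string-level check might look like this. It is a heuristic, not a DTD parser: inspectInternalSubset is a hypothetical helper, it relies on regular expressions, and unusual entity values could fool it. It flags a document whose DOCTYPE has an internal subset with no ELEMENT declarations and no external subset (SYSTEM or PUBLIC) that could supply them.

```javascript
// Rough heuristic, not a DTD parser: looks at the <!DOCTYPE ...[ ... ]>
// internal subset and counts ENTITY and ELEMENT declarations.
// inspectInternalSubset is a hypothetical helper name.
function inspectInternalSubset(xmlText) {
  var m = /<!DOCTYPE\s+([^\s\[>]+)([^\[>]*)\[([\s\S]*?)\]\s*>/.exec(xmlText);
  if (!m) {
    return { hasInternalSubset: false, root: null, entities: 0, elements: 0,
             hasExternalSubset: false, likelyInvalid: false };
  }
  var subset = m[3];
  function count(re) {
    var found = subset.match(re);
    return found ? found.length : 0;
  }
  var entities = count(/<!ENTITY\s/g);
  var elements = count(/<!ELEMENT\s/g);
  // SYSTEM or PUBLIC before the "[" means there is also an external subset,
  // which may supply the element declarations.
  var hasExternalSubset = /\b(SYSTEM|PUBLIC)\b/.test(m[2]);
  return {
    hasInternalSubset: true,
    root: m[1],
    entities: entities,
    elements: elements,
    hasExternalSubset: hasExternalSubset,
    // Nothing declares the elements, so a validating parser (e.g. MSXML
    // with validateOnParse = true) should report them as undeclared.
    likelyInvalid: elements === 0 && !hasExternalSubset
  };
}
```

For a feed whose internal subset only declares entities such as nbsp and pound, likelyInvalid comes back true, which matches the "element 'rss' is used but not declared" error.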

VK

Duncan Booth wrote:
The DTD is specified using an internal subset rather
than an external subset. The internal subset only contains entity
declarations, but its presence is sufficient to give something for the
parser to try to validate the document. If they are going to put some
entity declarations there then they *must* either make it a complete DTD
including element declarations as well or specify an external subset.

From Extensible Markup Language (XML) 1.0, section 2.8 Prolog and Document
Type Declaration:
The document type declaration can point to an external subset (a special
kind of external entity) containing markup declarations, or can contain
the markup declarations directly in an internal subset, or can do both.
The DTD for a document consists of both subsets taken together.


So there is a DTD, but it is incomplete.


Thanks for the insight. It still brings us back to round one: an RSS file
is not XML, it's a data package using XML format. If the client machine
has an RSS reader, then it knows what to read, where and how. Otherwise
the most you can do is to spoof the XML parser by feeding it the RSS data
as real, valid XML. The parser may swallow it or spit it out, and it has
every right to do the latter.

Truthfully, I consider both RSS and Atom feeds unfit to eat :-) but I
had to learn them while working on JSONet news feeds.

To OP:
Try adding the relevant namespace before reading the feed (no guarantee,
though):
....
document.namespaces.add('dc','http://purl.org/dc/elements/1.1/');
....

There are more RSS specs than bugs on a tree, so you may experiment
with the namespace reference.

Jan 16 '06 #14

VK wrote:
Duncan Booth wrote:
The DTD is specified using an internal subset rather
than an external subset. The internal subset only contains entity
declarations, but its presence is sufficient to give something for the
parser to try to validate the document. If they are going to put some
entity declarations there then they *must* either make it a complete DTD
including element declarations as well or specify an external subset.

From Extensible Markup Language (XML) 1.0, section 2.8 Prolog and
Document Type Declaration:
> The document type declaration can point to an external subset (a
> special kind of external entity) containing markup declarations, or can
> contain the markup declarations directly in an internal subset, or can
> do both. The DTD for a document consists of both subsets taken
> together.
So there is a DTD, but it is incomplete.


Thanks for the insight. It still brings us back to round one: an RSS file
is not XML,


Of course, since XML is but a metalanguage that defines its applications.
it's a data package using XML format.


First, RSS is an XML application. And if it is to be used as one, adhering
to the XML standard, its DTD has to be complete, or the undeclared elements
and attributes MUST NOT be used. If the DTD is incomplete and the DTD author
is not inclined to change that, authors will have to augment the DTD through
internal subset declarations in order to produce a valid and well-formed XML
document, one that a validating XML parser can parse without error.
PointedEars
Jan 16 '06 #15

"VK" <sc**********@yahoo.com> wrote in message
news:11*********************@g44g2000cwa.googlegro ups.com...

Duncan Booth wrote:
The DTD is specified using an internal subset rather
than an external subset. The internal subset only contains entity
declarations, but its presence is sufficient to give something for the
parser to try to validate the document. If they are going to put some
entity declarations there then they *must* either make it a complete DTD
including element declarations as well or specify an external subset.

From Extensible Markup Language (XML) 1.0, section 2.8 Prolog and
Document Type Declaration:
> The document type declaration can point to an external subset (a
> special kind of external entity) containing markup declarations, or can
> contain the markup declarations directly in an internal subset, or can
> do both. The DTD for a document consists of both subsets taken together.


So there is a DTD, but it is incomplete.


Thanks for the insight. It still brings us back to round one: an RSS file
is not XML, it's a data package using XML format. If the client machine
has an RSS reader, then it knows what to read, where and how. Otherwise
the most you can do is to spoof the XML parser by feeding it the RSS data
as real, valid XML. The parser may swallow it or spit it out, and it has
every right to do the latter.


Oh! Interesting...

Explain to me why rss isn't XML...?

(http://www-128.ibm.com/developerwork...?dwzone=web#h0)

--
Dag.
Jan 16 '06 #16

VK

Dag Sunde wrote:
Explain to me why rss isn't XML...?


Because its content type is neither "text/xml" nor
"application/xml".

Or, to make it more visual: a .cpp file is not a C++ program, but it
can become a program if you pass it through the right parser. Or it may
remain what it is: a chaotic (from the machine's point of view)
agglomeration of characters in a text file.

XML is eXtensible Markup Language, and an RSS feed is a data source (with
its own MIME type) where data chunks are *marked up* using XML syntax.

Jan 16 '06 #17

VK wrote:
Dag Sunde wrote:
Explain to me why rss isn't XML...?


Because its content type is neither "text/xml" nor
"application/xml".


For $GOD's sake, could you please read _once_ the
reference material you have been referred to?

Here's another one:

<URL:http://blogs.law.harvard.edu/tech/rss>

Now go read and shut up for a while, TIA.
PointedEars
Jan 16 '06 #18

VK

Thomas 'PointedEars' Lahn wrote:
<URL:http://blogs.law.harvard.edu/tech/rss>


A perfect piece of unreadable junk; it has nothing to do with reality
(unless it is OK to skip 80%-95% of your visitors). Any better links?
(at least >50% coverage)

Jan 16 '06 #19

VK wrote:
Thomas 'PointedEars' Lahn wrote:
<URL:http://blogs.law.harvard.edu/tech/rss>


A perfect piece of unreadable junk; it has nothing to do with reality
(unless it is OK to skip 80%-95% of your visitors). Any better links?
(at least >50% coverage)


"The document type specification itself does not have anything to do
with the reality."

YMMD.
PointedEars
Jan 16 '06 #20

VK

Thomas 'PointedEars' Lahn wrote:
"The document type specification itself does not have anything to do
with the reality."


Totally right - if you're dealing with the Web. In this case there is
only His Majesty Content-Type.

For example, you can name your file reallyRealXML.xml.xml.xml and put a
bunch of declarations inside, but if it's served without a "text/xml"
Content-Type, it will be treated the same as some looseText.txt file. It
may seem unfair, but that's how it is.

Jan 16 '06 #21

"VK" <sc**********@yahoo.com> wrote in message
news:11**********************@g47g2000cwa.googlegr oups.com...

Dag Sunde wrote:
Explain to me why rss isn't XML...?
Because it's content type is not neither "text/xml" nor
"application/xml".


???

What the hell does Content-Type have to do with XML, or RSS for that
matter?

Content-Type belongs to HTTP. It doesn't have *anything* to do with
RSS or XML!

Incidentally, if you happen to send RSS XML over HTTP and don't send
the correct HTTP headers to the UA, you're *still* sending RSS/XML.
You're just running the risk that the UA won't understand what you're
sending.

Most XML doesn't even care about web browsers, HTTP, or the Internet at
all. In that context, Content-Type and MIME don't make sense.

You are confusing/mixing a lot of terms here.
XML is eXtensible Markup Language, and RSS feed is a data source (with
its own MIME) where data chunks are *marked up* using XML syntaxs.


RSS is no such thing!
RSS is a very specific application of XML.
There is nothing "chunky" about it! To be proper RSS, it must be
completely well-formed and valid XML.

--
Dag.
Jan 16 '06 #22

P: n/a
VK

Dag Sunde wrote:
What the hell does Content-Type have to do with XML, or RSS for that
matter?


Syllabically:

1) If the Content-Type is *not* "text/xml" or "application/xml" (for newer
IE), then the XML parser is not turned on, and all input goes as plain
text into the responseText property while the responseXML property
remains empty.

2) If the Content-Type is "application/rss+xml", then what to do and how
to deal with such content depends on the MIME association in the current
UA.

If there is no association for the given Content-Type, then no parser is
used and all content goes unparsed into responseText.

Exceptions: some Content-Types are recognized, but access to their
content is limited for security reasons. For instance, binary files like
.exe or images are recognized, but responseText will contain only the
file header string.

Jan 16 '06 #23

VK wrote:
Thomas 'PointedEars' Lahn wrote:
"The document type specification itself does not have anything to do
with the reality."


Totally right - if you're dealing with the Web. In this case there is
only His Majesty Content-Type.
[...]


Please do not drink and post.
PointedEars
Jan 16 '06 #24

VK wrote:
Dag Sunde wrote:
What the hell does Content-Type have to do with XML, or RSS for that
matter?


Syllabically:
[...]


You are posting nonsense again, probably on purpose. The media type
of a resource has exactly NO meaning regarding the markup language
status of a document type. RSS is an XML application; it has been
developed as such, and its INVENTORS themselves at Harvard call it <q
cite="http://blogs.law.harvard.edu/tech/rss">a dialect of XML</q>.

Please, dance faster!
PointedEars, buying popcorn shares
Jan 16 '06 #25

VK

Thomas 'PointedEars' Lahn wrote:
Please do not drink and post.


Please do not participate in threads if you have no clue about the
question. I see no RegExp issues here so I presume you can take a rest.

Overall it is just amazing: I gave a clear explanation of why it doesn't
work and what should be changed to make it work (for IE). In response,
some people are wasting their time and effort trying to convince me
that it should work as it is. You can convince VK (OK, I think you did
it). But I doubt very much that your arguments will have any effect
on the MSXML module. You can try, though: sit down in front of it and
scream into the monitor that "no matter what Content-Type it is, you
have to parse it as I want". I dunno... miracles happen... but very
rarely.

Also, a big problem with IXMLHTTPRequest / MSXML is that everyone knows
about it, but the majority managed to learn it without visiting "that
terrible non-standard Micro$oft M$DN". People often prefer to get their
data from the most fantastic sources, even from mozilla.org, just so as
not to visit Micro$oft. As a result, some of this "knowledge" would
amaze a lot of IE developers.

Some basic reading with samples:
<http://msdn.microsoft.com/library/en-us/xmlsdk/html/10bd8230-6092-4e69-b7b3-273315b57161.asp>
<http://msdn.microsoft.com/library/en-us/xmlsdk/html/e0e0ec4b-1431-45ec-a72c-8114a092a3c7.asp>

Jan 16 '06 #26

VK wrote:
Please do not participate in threads if you have no clue about the
question. I see no RegExp issues here so I presume you can take a rest.


VK, thank you for trying to help me with my original post.

I've read all the posts and experimented with the various suggestions
offered and what Thomas et al have said, as far as I can tell, is correct.
Andrew Poulos
Jan 17 '06 #27

"Andrew Poulos" <ap*****@hotmail.com> wrote in message
news:43***********************@per-qv1-newsreader-01.iinet.net.au...
VK wrote:
Please do not participate in threads if you have no clue about the
question. I see no RegExp issues here so I presume you can take a rest.


VK, thank you for trying to help me with my original post.

I've read all the posts and experimented with the various suggestions
offered and what Thomas et al have said, as far as I can tell, is correct.


And there... we rest our case...

VK, get some sleep!

;-)

--
Dag.

Jan 17 '06 #28

VK wrote:
Dag Sunde wrote:
What the hell does Content-Type have to do with XML, or RSS for that
matter?


Syllabically:

1) If the Content-Type is *not* "text/xml" or "application/xml" (for newer
IE), then the XML parser is not turned on, and all input goes as plain
text into the responseText property while the responseXML property
remains empty.

I'm afraid you are utterly wrong. XML is a self-contained format whereas
the MIME type is a browser-related transport type and does *not* define
what is transported. Its intent is to *explain* to the browser what is
transported. Try it for yourself - set up a video on your website, give
it a nice standard format like (ugh) WMV or MPG, but don't give it a
MIME type. Your browser will still play it.

The content does not change even if the *description* of it (the MIME
type) says it does.
Jan 17 '06 #29

This discussion thread is closed

Replies have been disabled for this discussion.