
XMLReader skip current element

For example, I have this fragment of an XML file.

<AppSettings>
  <Object ClassVersion="1.0.0.0" Type="AppSettings">
    <Fields>
      <Field Name="App_ID" Type="System.Int32">
        <Value>
          <int>-1</int>
        </Value>
      </Field>
      <Field Name="AppDate Type="System.DateTime">
        <Value>
          <dateTime>2007-05-25T00:00:00</dateTime>
        </Value>
      </Field>
      <Field Name="AppFileName" Type="System.String">
        <Value>
          <string>TEST 03222007.daf</string>
        </Value>
      </Field>
      <Field Name="AppVersion" Type="System.String">
        <Value>
          <string>1.0.3.3</string>
        </Value>
      </Field>
      <Field Name="_ClassVersion" Type="System.String">
        <Value>
          <string>1.0.0.0</string>
        </Value>
      </Field>
    </Fields>
  </Object>
</AppSettings>

As you can see, it's corrupted, because the Name attribute of AppDate
is missing its closing ".
I get an exception from reader.MoveToContent (after I have read App_ID).
This is all inside a try..catch block, and after the exception I end up
with something like string fieldname == "AppDate Type=";
I can't work out how to jump to AppFileName and skip the corrupted
AppDate.
So how can I jump to the next element from the catch block? (At run
time I don't know the name of the next element.)

Thanks

Jun 5 '07 #1
13 Replies


On Jun 5, 4:29 pm, Alex <a...@douweb.org> wrote:

<snip>
> As you can see, its corrupted, because AppDate doesn't gave second ".

Right. It's an invalid XML file. I would strongly recommend that you
completely reject such files - trying to cope with broken files like
this is a real pain, and I don't know whether XmlReader (or any of the
other .NET XML types) supports it.

Jon

Jun 5 '07 #2

> <snip>
>
> > As you can see, its corrupted, because AppDate doesn't gave second ".
>
> Right. It's an invalid XML file. I would strongly recommend that you
> completely reject such files - trying to cope with broken files like
> this is a real pain, and I don't know whether XmlReader (or any of the
> other .NET XML types) support it.
>
> Jon

Sure, I made the file invalid by hand, because I want to improve my
code so that it can avoid or recover from this problem.

This is just a fragment; the file is currently 100 KB and will get
bigger later.
Also, the file is something like an XML serialization of some classes
I want to persist.
So the stored data is large, and I really don't want the user to have
to fill it all out again.

So if there is a solution to this, I would be glad to hear it.
Jun 5 '07 #3

Alex wrote:
> For example, i have some part of XML file.
>
> <AppSettings>
> <Object ClassVersion="1.0.0.0" Type="AppSettings">
> <Fields>
> <Field Name="App_ID" Type="System.Int32">
> <Value>
> <int>-1</int>
> </Value>
> </Field>
> <Field Name="AppDate Type="System.DateTime">
> As you can see, its corrupted, because AppDate doesn't gave second ".
> I am getting exception when reader.MoveToContent (after i read App_ID)
> this all are in try..catch section...
> and after that i am receiving smth like string fieldname == "AppDate
> Type=";
> I can't understand, how i can jump to AppFileName and skip corrupted
> AppDate ?
> so, how in catch section i can jump to next element ? (during
> application's work, i dont know what is the name of next element)

XML has strict rules, the sample markup is not well-formed and therefore
the XML parser will not parse it but throw an exception. There is no way
to simply skip markup that is not well-formed. So you will not be able
to parse that markup successfully with XmlReader. You have to fix
whatever application generates the markup to produce well-formed XML.
With .NET, generating the file through XmlWriter can help.
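
As a rough illustration of that last point, here is a minimal C# sketch that writes two of the Field elements from the sample through XmlWriter instead of building the markup by hand. The names SettingsWriter, WriteField and BuildSample are invented for this example and do not come from the original poster's code. The only point is that XmlWriter emits the attribute quotes and the closing tags itself, so a missing quote like the one in AppDate cannot be introduced on the writing side.

```csharp
using System.Text;
using System.Xml;

static class SettingsWriter
{
    // Writes one <Field Name=".." Type=".."><Value><tag>value</tag></Value></Field>.
    // XmlWriter quotes and escapes the attribute values on its own.
    static void WriteField(XmlWriter writer, string name, string type,
                           string valueTag, string value)
    {
        writer.WriteStartElement("Field");
        writer.WriteAttributeString("Name", name);
        writer.WriteAttributeString("Type", type);
        writer.WriteStartElement("Value");
        writer.WriteElementString(valueTag, value);
        writer.WriteEndElement(); // </Value>
        writer.WriteEndElement(); // </Field>
    }

    // Builds a shortened version of the sample document as a string.
    public static string BuildSample()
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.OmitXmlDeclaration = true;

        StringBuilder sb = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement("AppSettings");
            writer.WriteStartElement("Object");
            writer.WriteAttributeString("ClassVersion", "1.0.0.0");
            writer.WriteAttributeString("Type", "AppSettings");
            writer.WriteStartElement("Fields");
            WriteField(writer, "App_ID", "System.Int32", "int", "-1");
            WriteField(writer, "AppDate", "System.DateTime", "dateTime",
                       "2007-05-25T00:00:00");
            writer.WriteEndElement(); // </Fields>
            writer.WriteEndElement(); // </Object>
            writer.WriteEndElement(); // </AppSettings>
        }
        return sb.ToString();
    }
}
```

Calling SettingsWriter.BuildSample() returns markup that XmlReader or XmlDocument can read back without error, because the writer never leaves an attribute or element unclosed.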
--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Jun 5 '07 #4

ok :(

Is it possible to read the file some other way, still fairly
automatically, and skip problems like this as needed?
I mean not using XmlReader, because it can't jump ahead, but something
else.
I certainly don't want to write every single field to the XML file by
hand (this is just a serialization of the class fields I need).

But if an exception occurs, just skip that field?

Jun 5 '07 #5

On Tue, 05 Jun 2007 09:06:51 -0700, Alex <al**@mail.ru> wrote:

> is it possible to read in some another way, but a bit automatically,
> and skip problem like that as i need ?
> i mean not to use XmlReader, because it can't jump, but use smth else.
> But for sure i dont want to write to xmlfile all-all fields manually
> (this is just serialization of classes' fields i need).
>
> but, if exception appears - skip field

No. The general-purpose XML classes have no practical way to make
intelligent decisions about where to start looking again for valid data.
The only way to do what you want, even in some limited way, is to do
everything yourself.

You as a person can look at the file visually and tell where valid data
again starts, but that's because you have a LOT of "meta-information"
about the XML and can recognize things that would never appear inside
quoted text, but which are definitely part of the XML structure. If you
want your code to handle that, you will need to write it yourself, taking
advantage of this knowledge. If you do this, you will likely want to
implement your entire XML reading code from scratch, so that when you run
across something that doesn't make sense you can recover immediately based
on where you've already read.
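
For anyone who decides to go down this road anyway, a very limited version might look like the C# sketch below. It is not a general recovery mechanism. It leans on assumptions about this particular file that a general-purpose parser cannot make: every field starts with "<Field " and ends with "</Field>", fields never nest, and the text "</Field>" never appears inside a value. Each block is cut out with a regular expression and handed to XmlDocument on its own, and blocks that fail to parse are collected rather than loaded. The names FieldSalvager and Salvage are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml;

static class FieldSalvager
{
    // Assumption: fields are never nested, so the first "</Field>" after
    // "<Field " closes that field. \b keeps "<Fields>" from matching.
    static readonly Regex FieldBlock =
        new Regex(@"<Field\b.*?</Field>", RegexOptions.Singleline);

    // Returns name -> text value for every block that parses on its own.
    // Blocks that are not well-formed are appended to 'skipped'.
    public static Dictionary<string, string> Salvage(string text, List<string> skipped)
    {
        Dictionary<string, string> result = new Dictionary<string, string>();
        foreach (Match m in FieldBlock.Matches(text))
        {
            try
            {
                XmlDocument doc = new XmlDocument();
                doc.LoadXml(m.Value); // throws XmlException on broken markup
                string name = doc.DocumentElement.GetAttribute("Name");
                if (name.Length == 0)
                {
                    skipped.Add(m.Value); // parsed, but has no usable Name
                    continue;
                }
                result[name] = doc.DocumentElement.InnerText.Trim();
            }
            catch (XmlException)
            {
                skipped.Add(m.Value); // e.g. the AppDate field with the missing quote
            }
        }
        return result;
    }
}
```

With the sample from the first post, App_ID, AppFileName, AppVersion and _ClassVersion would be returned and the AppDate block would land in the skipped list. Whether the application can do anything sensible without AppDate is a separate question that this code does not answer.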

Personally, I would not bother. As has been pointed out, the XML is
simply invalid. It's not going to be invalid unless some user hand-edits
the file and starts mucking it up, and once you assume users may do that,
it is impossible to ensure that you can in any sensible way recover from
their doings. You should definitely make sure that bad data doesn't bring
your application crashing down, but it's not reasonable for a user to
expect you to come up with some graceful way to reconstruct the invalid
data in the general case, and so you should probably not waste a lot of
time implementing code that does so.
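
A minimal sketch of that more modest goal, in C#: reject the whole file when it is not well-formed, but do it without crashing and with a message that says where the parser gave up. The names SettingsLoader and TryLoad are made up for this example. XmlException.LineNumber and XmlException.LinePosition are the standard properties that carry the error location.

```csharp
using System.Xml;

static class SettingsLoader
{
    // Returns true and the loaded document when the text is well-formed.
    // Otherwise returns false, sets 'document' to null and puts the
    // parser's error position into 'error'.
    public static bool TryLoad(string xmlText, out XmlDocument document, out string error)
    {
        document = new XmlDocument();
        error = null;
        try
        {
            document.LoadXml(xmlText);
            return true;
        }
        catch (XmlException ex)
        {
            document = null; // never hand back a half-loaded document
            error = string.Format(
                "Settings file rejected at line {0}, position {1}: {2}",
                ex.LineNumber, ex.LinePosition, ex.Message);
            return false;
        }
    }
}
```

The caller can then fall back to default settings or to a backup copy of the file and show the message to the user, instead of trying to guess what the broken part was meant to say.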

Pete
Jun 5 '07 #6

Alex <al**@mail.ru> wrote:
> > Right. It's an invalid XML file. I would strongly recommend that you
> > completely reject such files - trying to cope with broken files like
> > this is a real pain, and I don't know whether XmlReader (or any of the
> > other .NET XML types) support it.
>
> Sure, i made file to be invalid manually, because i want to add some
> improvements to my code, to avoid or solve this problem.

Is there any real reason why you need to handle an invalid XML file?
Most XML-based applications don't, as far as I'm aware. (Obviously XML
editors have to, but other than that...)

> This is just fragment, now file size is 100KB and will be bigger
> later.
> Also, this file is like XmlSerialization of some classes i want to be
> serialized.
> So, the data which stored are big, and i really don't want user to
> fill out all again.
>
> So, if there is some solution about this, i will be glad to here.

Why would the user have to fill anything out again? Why are you
expecting invalid XML?

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Jun 5 '07 #7

On Tue, 05 Jun 2007 10:47:27 -0700, Jon Skeet [C# MVP] <sk***@pobox.com>
wrote:
> Is there any real reason why you need to handle an invalid XML file?
> Most XML-based applications don't, as far as I'm aware. (Obviously XML
> editors have to, but other than that...)

Well, and in fact I'm not sure that XML editors have to either. As an
imprecise but similar example, consider Visual Studio's code editor. If
you miss some sort of closing quote, comment closure, closing bracket,
etc. the editor makes no attempt to recover from that. It just shows you
that there's a problem, treating the file as "valid" all the way up to the
point where it knows for sure it's not valid (which is often the end of
the file).

I can imagine someone writing an XML editor that goes to a lot of effort
to try to detect and correct invalid XML, just as the OP wants to do in
his program. But it would surprise me if this is the norm, even when
looking only at XML editors.

Pete
Jun 5 '07 #8

Peter Duniho <Np*********@nnowslpianmk.com> wrote:
> > Is there any real reason why you need to handle an invalid XML file?
> > Most XML-based applications don't, as far as I'm aware. (Obviously XML
> > editors have to, but other than that...)
>
> Well, and in fact I'm not sure that XML editors have to either. As an
> imprecise but similar example, consider Visual Studio's code editor. If
> you miss some sort of closing quote, comment closure, closing bracket,
> etc. the editor makes no attempt to recover from that. It just shows you
> that there's a problem, treating the file as "valid" all the way up to the
> point where it knows for sure it's not valid (which is often the end of
> the file).

It depends on quite how broken you make it.

If you miss off a semi-colon or have a random extra character like "+"
between statements, it's still syntactically invalid, but it recovers
quickly. An extra closing brace certainly confuses it though, yes.

> I can imagine someone writing an XML editor that goes to a lot of effort
> to try to detect and correct invalid XML, just as the OP wants to do in
> his program. But it would surprise me if this is the norm, even when
> looking only at XML editors.

Maybe it's just the ones I've used - and that's only from memory,
admittedly...

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Jun 5 '07 #9

On Tue, 05 Jun 2007 12:47:18 -0700, Jon Skeet [C# MVP] <sk***@pobox.com>
wrote:
> It depends on quite how broken you make it.
>
> If you miss off a semi-colon or have a random extra character like "+"
> between statements, it's still syntactically invalid, but it recovers
> quickly. An extra closing brace certainly confuses it though, yes.

I suppose "recovers" is in the eye of the beholder. What I see when one
leaves off a semi-colon is that the end of the statement where the
semi-colon was expected is flagged. However, the only reason it can do
that is that it is apparent upon seeing the first thing that doesn't make
sense in that statement (ie, the next statement) where the error is.

But I don't really see that the editor has "recovered". It is simply
pointing out the first place it has detected a problem. Just as the
compiler won't compile a file even though it could usually correctly infer
the correct location of the semicolon, it's not really like the VS editor
has judged the remainder of the file correct and accurate. In fact, it
gives up on a variety of automatic stuff once it's stumbled (for example,
I've lost count of the number of times that I don't get Intellisense
feedback because of a localized compiler-type error in my source code).

Compilers, code editors, and XML editors alike can all make inferences
about what the input data *should* look like, and try to produce correct
behavior based on those inferences. But my experience (granted, limited
in the case of XML editors, but not so limited in other areas) is that if
the input data does not comply exactly with what's expected, the user is
simply told "this data is bad...I'm not going any further until you fix
it".

Pete
Jun 5 '07 #10

Peter Duniho <Np*********@nnowslpianmk.com> wrote:
> On Tue, 05 Jun 2007 12:47:18 -0700, Jon Skeet [C# MVP] <sk***@pobox.com>
> wrote:
> > It depends on quite how broken you make it.
> >
> > If you miss off a semi-colon or have a random extra character like "+"
> > between statements, it's still syntactically invalid, but it recovers
> > quickly. An extra closing brace certainly confuses it though, yes.
>
> I suppose "recovers" is in the eye of the beholder. What I see when one
> leaves off a semi-colon is that the end of the statement where the
> semi-colon was expected is flagged. However, the only reason it can do
> that is that it is apparent upon seeing the first thing that doesn't make
> sense in that statement (ie, the next statement) where the error is.
>
> But I don't really see that the editor has "recovered". It is simply
> pointing out the first place it has detected a problem.

It recovers to the extent that it's able to find errors later on, and
you can still use Intellisense etc.

For example, take this code:

using System;

public class Test
{
    static void Main()
    {
        int x = 5
        int y = 10;

        Console.WriteLine("Hello");
    }
}

If you type another "Console." underneath the current call to
Console.WriteLine, VS (2005 at least) offers Intellisense.

It's hard for me to judge exactly how well VS does as opposed to
resharper, but if you change Console.WriteLine to Console.Foo, I
certainly get some feedback that Foo isn't a valid member of Console.

> Just as the
> compiler won't compile a file even though it could usually correctly infer
> the correct location of the semicolon, it's not really like the VS editor
> has judged the remainder of the file correct and accurate. In fact, it
> gives up on a variety of automatic stuff once it's stumbled (for example,
> I've lost count of the number of times that I don't get Intellisense
> feedback because of a localized compiler-type error in my source code).

You should try Eclipse some time - it will compile (in some cases, at
least) syntactically invalid code, generating code which throws an
exception when it's got to somewhere that the compilation broke. Not
terribly handy, but quite cute.

> Compilers, code editors, and XML editors alike can all make inferences
> about what the input data *should* look like, and try to produce correct
> behavior based on those inferences. But my experience (granted, limited
> in the case of XML editors, but not so limited in other areas) is that if
> the input data does not comply exactly with what's expected, the user is
> simply told "this data is bad...I'm not going any further until you fix
> it".

Certainly things are more limited after an error, but there's often
still *some* functionality available. If I find the time I might see
what a few XML editors do past an error - whether they still
automatically close tags, find further errors etc. Certainly the VS
2005 XML editor was able to automatically close the "blech" tag in the
below XML, despite the previous error:

<?xml version="1.0" encoding="utf-8" ?>
<foo>
  <bar>
    <baz text="Hello otherText="There"/>

    <blech></blech>
  </bar>
</foo>

Also if you change </blech> to </blech2> it notices that as a second
error.
--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Jun 5 '07 #11

On Tue, 05 Jun 2007 13:40:01 -0700, Jon Skeet [C# MVP] <sk***@pobox.com>
wrote:
> [...]
> You should try Eclipse some time - it will compile (in some cases, at
> least) syntactically invalid code, generating code which throws an
> exception when it's got to somewhere that the compilation broke. Not
> terribly handy, but quite cute.

Well, sure. I can appreciate "cute". :) But as you say, not terribly
handy. Likewise, just how handy would it be to just skip over an invalid
section of XML, when you have no idea what the overall effect of doing so
would be? Just because the remaining XML can be parsed, that doesn't mean
that it can be *used* without the part that was erroneous.

> [...] Certainly the VS
> 2005 XML editor was able to automatically close the "blech" tag in the
> below XML, despite the previous error:

I certainly agree that it *can* be done. I just am not convinced it makes
sense to bother writing the code to do so. It does seem to me that in an
editor, where the user is actively modifying the data, it makes more sense
to put the effort in, but even there I wouldn't necessarily insist on it
(even in VS there are limits to what it can recover from, and frankly it
only handles the simplest situations). I expect it's something you see in
editors that are intended to be feature-laden, considered "heavy-duty"
(that's certainly how I'd describe VS).

In a situation where the data is static though, I don't see the use in
recovering. You never know when the data that was in error was critical
to the use of the larger XML document. Just because you can successfully
parse the rest of the document doesn't mean you should, just as just
because a compiler could make an assumption about where to insert a
missing semi-colon doesn't mean it should.

Pete
Jun 5 '07 #12

Peter Duniho <Np*********@nnowslpianmk.com> wrote:
> > [...]
> > You should try Eclipse some time - it will compile (in some cases, at
> > least) syntactically invalid code, generating code which throws an
> > exception when it's got to somewhere that the compilation broke. Not
> > terribly handy, but quite cute.
>
> Well, sure. I can appreciate "cute". :) But as you say, not terribly
> handy. Likewise, just how handy would it be to just skip over an invalid
> section of XML, when you have no idea what the overall effect of doing so
> would be? Just because the remaining XML can be parsed, that doesn't mean
> that it can be *used* without the part that was erroneous.

On the other hand, if I open an invalid XML file it's nice to know
whether there's just one error or whether the whole thing is pooched.

> > [...] Certainly the VS
> > 2005 XML editor was able to automatically close the "blech" tag in the
> > below XML, despite the previous error:
>
> I certainly agree that it *can* be done. I just am not convinced it makes
> sense to bother writing the code to do so. It does seem to me that in an
> editor, where the user is actively modifying the data, it makes more sense
> to put the effort in, but even there I wouldn't necessarily insist on it
> (even in VS there are limits to what it can recover from, and frankly it
> only handles the simplest situations). I expect it's something you see in
> editors that are intended to be feature-laden, considered "heavy-duty"
> (that's certainly how I'd describe VS).

Agreed in the last bit - and I'm *certainly* not suggesting that the OP
should try to recover.

> In a situation where the data is static though, I don't see the use in
> recovering. You never know when the data that was in error was critical
> to the use of the larger XML document. Just because you can successfully
> parse the rest of the document doesn't mean you should, just as just
> because a compiler could make an assumption about where to insert a
> missing semi-colon doesn't mean it should.

Oh absolutely. I was only talking about editors, where it can be handy
to be able to show more than the first error.

Even with static document reading, it *may* be useful to bomb out with
an error which has a good stab at working out where all the error parts
are, rather than just the first one. That's not the same as really
trying to recover though.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Jun 5 '07 #13

On Tue, 05 Jun 2007 15:00:04 -0700, Jon Skeet [C# MVP] <sk***@pobox.com>
wrote:
> On the other hand, if I open an invalid XML file it's nice to know
> whether there's just one error or whether the whole thing is pooched.

Sure, I agree. If you're using an editor, that would be a nice feature to
have. But that still doesn't mean it would be a ubiquitous feature in all
XML editors (though I can see how it might appear in advanced editors).

> [...]
> Even with static document reading, it *may* be useful to bomb out with
> an error which has a good stab at working out where all the error parts
> are, rather than just the first one. That's not the same as really
> trying to recover though.

Nope. :)

If I wanted to provide feedback as to a place to look for the error, I
would tell the user the last place in the file where I had valid data.
That's not really the same as trying to do anything fancy with figuring
out the erroneous part though. All it requires is keeping track of how
far into the file you got before you failed to generate new valid data.
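
As a sketch of that bookkeeping in C#, assuming the file is read node by node with XmlReader: remember the name and position of the last element that was read successfully, and report it alongside the position where the parser threw. The names ProgressTrackingReader and FindLastGoodElement are invented for this example. IXmlLineInfo is the standard interface through which text-based readers expose line and column numbers.

```csharp
using System.IO;
using System.Xml;

static class ProgressTrackingReader
{
    // Returns null when the whole text is well-formed. Otherwise returns a
    // message naming the last element start tag that was read before the
    // parser failed, together with the failure position.
    public static string FindLastGoodElement(string xmlText)
    {
        string lastName = null;
        int lastLine = 0, lastPos = 0;

        using (XmlReader reader = XmlReader.Create(new StringReader(xmlText)))
        {
            IXmlLineInfo lineInfo = reader as IXmlLineInfo;
            try
            {
                while (reader.Read())
                {
                    if (reader.NodeType != XmlNodeType.Element)
                        continue;

                    // This element start tag was read successfully: remember it.
                    lastName = reader.Name;
                    if (lineInfo != null && lineInfo.HasLineInfo())
                    {
                        lastLine = lineInfo.LineNumber;
                        lastPos = lineInfo.LinePosition;
                    }
                }
                return null;
            }
            catch (XmlException ex)
            {
                return string.Format(
                    "Last valid element <{0}> at line {1}, position {2}; " +
                    "parser failed at line {3}, position {4}.",
                    lastName, lastLine, lastPos, ex.LineNumber, ex.LinePosition);
            }
        }
    }
}
```

Nothing here tries to interpret the broken part. It only narrows down where the user should look.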

It's the parsing bad data that I think is normally going to be outside the
scope of typical software. Sorry if I seem to have taken this thread off
on a tangent. I just got set off by the statement that an XML editor
*has* to handle errors. An XML editor *could* in fact just display the
text beyond the error and tell the user "I'm not going to help you with
this until you fix it". :)

Pete
Jun 6 '07 #14

This discussion thread is closed
