Stop and Resume parsing of large XML file

Question posted by: Brian Cryer (Guest) on June 27th, 2008 07:20 PM
Currently I am using XmlReader (but I am open to other options) to parse an
XML file, and I would like to be able to stop/break the current parse
(simple enough) and then resume it later (say after a reboot). Is there any
way to get the current location in the file that the XmlReader has reached
so as to be able to restore that and start from that point later?

TIA.

Reply #2 posted by: Martin Honnen (Guest) on June 27th, 2008 07:20 PM

I don't think so. If you have an underlying stream you could store the
stream position but I don't know of any way to store and restore the
state of the XmlReader.


--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/

Reply #3 posted by: Brian Cryer (Guest) on June 27th, 2008 07:20 PM

I'm not too worried about the "state" of the XmlReader (I might be when I
get there but for now I'm assuming if there are any issues that I'll be able
to work round them).

I've looked at storing the stream position, but it's evident that the
XmlReader reads a buffer-load at a time, because my stream position is at
about the 8KB mark when I get to the first tag in the XmlReader.
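
The read-ahead described here is ordinary buffered I/O and can be reproduced outside .NET. The following is a minimal Python sketch, not XmlReader itself; the 8192-byte buffer size and the sample data are assumptions chosen to mirror the 8KB observation. It shows why the position of the underlying stream runs ahead of what the consumer has actually seen:

```python
import io

# Hypothetical stand-in for a large XML file (about 20,000 bytes).
raw = io.BytesIO(b"<r>" + b"x" * 20000 + b"</r>")

# A buffered reader with an assumed 8 KB buffer sitting on top of the raw stream.
buffered = io.BufferedReader(raw, buffer_size=8192)

first = buffered.read(3)      # the consumer has only asked for "<r>"
consumed = buffered.tell()    # logical position: bytes handed to the consumer
underlying = raw.tell()       # physical position: bytes pulled from the source

print(first, consumed, underlying)
```

The underlying position is a whole buffer ahead of the logical one, so saving it as a checkpoint would skip data that has been fetched but not yet parsed.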

Ahh ... Martin, you've been a great "sounding board". Knowing that the
XmlReader doesn't provide any way of doing this is useful. But thinking
about it, if the XmlReader reads in 8KB chunks (an assumption on my part,
but one which I ought to be able to test) then as a way of "restoring" I may
be able to get away with simply putting my read point 8192 bytes (8KB) before the
last known position in the underlying stream and then deal with any errors
that get thrown up when XmlReader hits what it thinks is malformed XML. Bit
yucky, but this might work for me (if XmlReader will play ball). At least it
gives me an avenue to explore.

TA.


Reply #4 posted by: Family Tree Mike (Guest) on June 27th, 2008 07:20 PM

It seems like a lot of work to go through, and likely prone to errors due to
machine dependencies. How are you persisting the part that was read before
the reboot? Are you no longer interested in that portion of the XML after it
has been processed?


Reply #5 posted by: Brian Cryer (Guest) on June 27th, 2008 07:20 PM

Fortunately, in this case the XML file, whilst rather long, is quite shallow. So
I can forget about what went on before, and if I come across a duplicate
section (which I will) then I can handle that (because each has a unique
ID). So, in short, I don't need to worry too much about what went on before
or the context. So, this isn't a generic solution by any means. (If I were
processing something like an HTML file then it would get too messy to be
viable.)
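
Handling re-read sections by their unique ID can be sketched as follows. This is a hypothetical Python illustration, not code from the thread: `process_new`, `seen_ids`, and the `(id, payload)` record shape are all assumed names, and in practice the set of seen IDs would have to be persisted across the reboot (or the handler made idempotent, for example an insert-or-update keyed on the ID).

```python
def process_new(records, seen_ids, handle):
    """Call handle(payload) once per unique id; re-read duplicates are skipped."""
    for rec_id, payload in records:
        if rec_id in seen_ids:
            continue              # already processed before the interruption
        handle(payload)
        seen_ids.add(rec_id)


handled = []
seen = set()

# First run: interrupted after two records.
process_new([("a", "A"), ("b", "B")], seen, handled.append)

# Resumed run: starts a little early, so record "b" is seen again.
process_new([("b", "B"), ("c", "C")], seen, handled.append)

print(handled)
```

Because duplicates are harmless, the resume point only needs to be at or before the last unprocessed record, not exactly on it.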

However, all this is still theory at the moment, as other work has pulled me
away from this. I am hoping to be able to prove whether or not this approach
works for me either today or tomorrow.



Reply #6 posted by: Brian Cryer (Guest) on June 27th, 2008 07:20 PM

In case anyone is monitoring this or wants to do something similar one day:
I've decided to abandon this approach. It just started to get too messy.
Since the XML is well structured, I'm going to implement a reader from scratch
which does exactly what I need.
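
The thread does not show the reader that was eventually written. As one possible shape for such a reader, here is a minimal Python sketch under loudly stated assumptions: the file is a flat sequence of `<item id="...">...</item>` records (a hypothetical element name), records are never nested, and the closing tag never appears inside comments or CDATA. It works on raw bytes and reports, after each record, the exact byte offset at which reading can resume.

```python
import io

# Hypothetical record delimiters; every record is assumed to carry an id attribute.
OPEN, CLOSE = b"<item ", b"</item>"


def read_records(stream, start=0, chunk_size=8192):
    """Yield (record_bytes, resume_offset) for each complete record from `start`."""
    stream.seek(start)
    buf = b""
    base = start                          # file offset of buf[0]
    while True:
        chunk = stream.read(chunk_size)
        buf += chunk
        while True:
            begin = buf.find(OPEN)
            end = buf.find(CLOSE, begin + 1) if begin != -1 else -1
            if end == -1:
                break                     # no complete record buffered yet
            end += len(CLOSE)
            yield buf[begin:end], base + end
            buf = buf[end:]               # drop everything up to the record end
            base += end
        if not chunk:
            return                        # end of file


data = (b'<catalog><item id="1">a</item><item id="2">b</item>'
        b'<item id="3">c</item></catalog>')

# First run: read one record, then stop and save the checkpoint.
first, checkpoint = next(read_records(io.BytesIO(data), chunk_size=16))

# Later run: seek straight to the saved offset and carry on.
resumed = [rec for rec, _ in
           read_records(io.BytesIO(data), start=checkpoint, chunk_size=16)]

print(first, checkpoint, resumed)
```

Since the checkpoint always falls on a record boundary, no guessing about buffer sizes is needed on resume; the cost is that this is only valid for files as regular as the one described here.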


 