Stop and Resume parsing of large XML file

Currently I am using XmlReader (but I am open to other options) to parse an
XML file, and I would like to be able to stop/break the current parse
(simple enough) and then resume it later (say after a reboot). Is there any
way to get the current location in the file that the XmlReader has reached
so as to be able to restore that and start from that point later?

TIA.

Jun 27 '08 #1
Brian Cryer wrote:
> Currently I am using XmlReader (but I am open to other options) to parse
> an XML file, and I would like to be able to stop/break the current parse
> (simple enough) and then resume it later (say after a reboot). Is there
> any way to get the current location in the file that the XmlReader has
> reached so as to be able to restore that and start from that point later?

I don't think so. If you have an underlying stream you could store the
stream position, but I don't know of any way to store and restore the
state of the XmlReader.
--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Jun 27 '08 #2
"Martin Honnen" <ma*******@yahoo.de> wrote in message
news:er**************@TK2MSFTNGP03.phx.gbl...
> I don't think so. If you have an underlying stream you could store the
> stream position but I don't know of any way to store and restore the state
> of the XmlReader.

I'm not too worried about the "state" of the XmlReader (I might be when I
get there, but for now I'm assuming that if there are any issues I'll be able
to work round them).

I've looked at storing the stream position, but it's evident that the
XmlReader reads in a bufferful at a time, because my stream position is
already at about the 8KB mark when I get to the first tag in the XmlReader.

Ahh ... Martin, you've been a great "sounding board". Knowing that the
XmlReader doesn't provide any way of doing this is useful. But thinking
about it, if the XmlReader reads in 8KB chunks (an assumption on my part,
but one which I ought to be able to test) then as a way of "restoring" I may
be able to get away with simply putting my read point 8192 bytes (8KB) before
the last known position in the underlying stream and then dealing with any
errors that get thrown up when XmlReader hits what it thinks is malformed XML.
Bit yucky, but this might work for me (if XmlReader will play ball). At least
it gives me an avenue to explore.

TA.

Jun 27 '08 #3
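
The read-ahead effect described in the post above (the underlying stream position running well ahead of what the parser has actually delivered) is not specific to XmlReader; any buffered reader behaves this way. As a language-neutral illustration only, here is a minimal Python sketch. It is not .NET code, and the helper name `raw_position_after_small_read` and the 8192-byte buffer size are assumptions chosen to mirror the 8KB figure mentioned in the thread.

```python
import io
import os
import tempfile


def raw_position_after_small_read(path, wanted=10, buffer_size=8192):
    """Return (bytes handed to the caller, position of the underlying file).

    Hypothetical helper for illustration: the buffered layer stands in for
    the parser's internal buffer, the raw file for the underlying stream.
    """
    raw = open(path, "rb", buffering=0)   # unbuffered: tell() is the real file position
    reader = io.BufferedReader(raw, buffer_size=buffer_size)
    try:
        data = reader.read(wanted)        # the caller only asks for a few bytes
        return len(data), raw.tell()      # but the raw file has been read ahead
    finally:
        reader.close()                    # closing the wrapper also closes raw


# Demo on a throwaway 20,000-byte file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 20000)
consumed, raw_pos = raw_position_after_small_read(tmp.name)
os.remove(tmp.name)
print(consumed, raw_pos)  # raw_pos is far beyond the 10 bytes actually consumed
```

This is why a saved stream position on its own is not a usable resume point: it records how far the buffer has been filled, not how far the parser has got.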

"Brian Cryer" wrote:
> Bit yucky, but this might work for me (if XmlReader will play ball). At
> least it gives me an avenue to explore.

It seems like a lot of work to go through, and likely prone to errors due to
machine dependencies. How are you persisting the part that was read before
the reboot? Are you no longer interested in that portion of the XML after it
has been processed?

Jun 27 '08 #4
"Family Tree Mike" <Fa************@discussions.microsoft.com> wrote in
message news:60**********************************@microsoft.com...
> It seems like a lot of work to go through, and likely prone to errors due
> to machine dependencies. How are you persisting the part that was read
> before the reboot? Are you no longer interested in that portion of the XML
> after it has been processed?

Fortunately, in this case the XML file, whilst rather long, is quite shallow.
So I can forget about what went on before, and if I come across a duplicate
section (which I will) then I can handle that (because each has a unique
ID). So, in short, I don't need to worry too much about what went on before
or the context. So, this isn't a generic solution by any means. (If I were
processing something like an HTML file then it would get too messy to be
viable.)

However, all this is still theory at the moment, as other work has pulled me
away from this. I am hoping to be able to prove whether or not this approach
works for me either today or tomorrow.
Jun 27 '08 #5
"Brian Cryer" <www.cryer.co.uk> wrote in message
news:u5**************@TK2MSFTNGP02.phx.gbl...
> However, all this is still theory at the moment, as other work has pulled
> me away from this. I am hoping to be able to prove whether or not this
> approach works for me either today or tomorrow.

In case anyone is monitoring this or wants to do something similar one day
... I've decided to abandon this approach. It just started to get too messy.
Since the XML is well structured, I'm going to implement a reader from scratch
which does exactly what I need.

Jun 27 '08 #6
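
The thread does not show the reader that was eventually written, so the following is only a sketch of what a resumable, purpose-built reader for a long but shallow file might look like. It is in Python rather than .NET, and everything specific in it is an assumption: the function name `iter_records`, the element name `entry`, the `id` attribute, and the premise that records are never nested and that the tag text never appears inside content, comments or CDATA. The idea is to scan the raw bytes yourself, so that the byte offset just past each complete record is known exactly and can be saved as a checkpoint.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def iter_records(path, start_offset=0, start_tag=b"<entry",
                 end_tag=b"</entry>", chunk_size=8192):
    """Yield (resume_offset, record_bytes) for each complete record.

    resume_offset is the byte offset just past the record, so it is a safe
    place to restart from once that record has been processed.
    """
    with open(path, "rb") as f:
        f.seek(start_offset)
        buf = b""
        base = start_offset                  # file offset of buf[0]
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf += chunk
            while True:
                s = buf.find(start_tag)
                if s == -1:
                    # No record start yet: keep only a tail that could
                    # contain the beginning of a tag split across chunks.
                    drop = max(0, len(buf) - (len(start_tag) - 1))
                    buf, base = buf[drop:], base + drop
                    break
                e = buf.find(end_tag, s)
                if e == -1:
                    break                    # record incomplete, read more
                e += len(end_tag)
                yield base + e, buf[s:e]
                buf, base = buf[e:], base + e


# Demo: process one record, "stop", then resume from the saved offset.
with tempfile.NamedTemporaryFile(delete=False, suffix=".xml") as tmp:
    tmp.write(b'<log><entry id="1">a</entry><entry id="2">b</entry>'
              b'<entry id="3">c</entry></log>')

gen = iter_records(tmp.name)
checkpoint, first = next(gen)                # checkpoint would be persisted to disk
gen.close()                                  # closes the file held by the generator

resumed_ids = [ET.fromstring(rec).get("id")
               for _, rec in iter_records(tmp.name, start_offset=checkpoint)]
os.remove(tmp.name)
print(ET.fromstring(first).get("id"), resumed_ids)  # prints: 1 ['2', '3']
```

Each record is handed to a normal XML parser once it has been cut out, so only the record boundaries are found by hand. If the checkpoint is saved slightly late or early, the unique IDs mentioned earlier in the thread would let a repeated record be detected and skipped.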
