Chunking out data from a huge xml file (Ajax)

Hi,
I am faced with quite a challenge. I need to open a 70-100 MB file and chunk
it out to the client using AJAX, but that isn't really my problem. What I
need to do is open the file and get pieces of it out without loading the
entire thing into memory. The pieces themselves are random, although of a
fixed size. If I read the entire file into a string and parse pieces out, I
use too much memory. If I use XmlTextReader and its Skip method, my memory
problems are solved, but performance becomes a huge issue. If I don't have
to, I don't want to write my own parser and read and track the tags byte by
byte. Basically, I would love to be able to go to a point in my file and read
from it, saving only the position of my cursor in the file (but not the
actual cursor, because of session-sharing issues).
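
The "save only the position" idea can be sketched as a stateless read: the
server keeps nothing between requests, and the client sends back the byte
offset it was last given. This is a minimal sketch in Python for brevity, not
the poster's code; the function name `read_chunk` is hypothetical. In .NET the
same seek would go through `FileStream.Seek`. Note that fixed-size byte
chunks can cut a tag or a multi-byte character in half, so the client has to
reassemble them before parsing.

```python
def read_chunk(path, offset, size):
    """Read up to `size` bytes starting at byte `offset`.

    Returns (data, next_offset). The caller stores next_offset and passes
    it back on the next request, so no file handle is held between calls.
    """
    with open(path, "rb") as f:
        f.seek(offset)        # jump straight to the saved position
        data = f.read(size)   # returns fewer bytes (or b"") near end of file
    return data, offset + len(data)
```

Only the requested bytes are read, so memory use stays near `size` regardless
of how large the file is.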

Any help would be greatly appreciated, and I apologize if this double posted.
I asked this question this morning but I haven't seen it come up in the past
few hours, so I am trying again.

thanks
Jan 3 '06 #1
Maybe XML is just not the right storage. If you have a way to preprocess
your data and store the pieces in a data store that can index them
efficiently (a database, smaller XML files indexed by a directory structure,
etc.), you will be much better off.

Writing your own parser won't help. The .NET classes benefit from man-years
of work, so you would have to work really hard to get better performance (and
you could only expect a small percentage gain, not a major factor).

My 2 cents.

Bruno

Jan 4 '06 #2
The thing is that the files reside in XML format on disk at a given location.
The size is variable but can be very large. This I cannot change. What I
need to do, though, is pull back parts of the file without having to load the
whole file into memory. XmlTextReader can do that, but I cannot take the
performance hit of skipping nodes until I pass the ones already sent to the
client, because the file could get very, very large. If I were to parse
something myself it would be a painful process, but I do know the opening
tag's name, so I would have to look for it by pattern matching on bytes.
There has to be a better way than this, though.
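
For reference, the byte-pattern search described above does not need a full
parser. The following is a rough sketch in Python, under the assumption that
the element name never appears inside comments or CDATA sections (a plain
byte search cannot tell the difference). The names `find_tag_offsets` and
`DELIMITERS` are made up for illustration. The file is read in fixed-size
blocks, and a short tail is carried over so a tag split across two blocks is
still found.

```python
DELIMITERS = b" \t\r\n/>"  # bytes that may follow a tag name in an opening tag


def find_tag_offsets(path, tag, block_size=1 << 20):
    """Return the byte offset of every opening `<tag ...>` in the file."""
    needle = b"<" + tag.encode("utf-8")
    keep = len(needle)  # tail kept so a match split across blocks is seen
    offsets = []
    buf = b""
    base = 0  # file offset of buf[0]
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            buf += block
            pos = buf.find(needle)
            while pos != -1:
                end = pos + len(needle)
                if end >= len(buf):
                    break  # byte after the name not read yet (or file ends)
                if buf[end:end + 1] in DELIMITERS:  # skip e.g. <records>
                    offsets.append(base + pos)
                pos = buf.find(needle, pos + 1)
            if not block:  # end of file reached
                return offsets
            cut = max(0, len(buf) - keep)
            buf = buf[cut:]
            base += cut
```

Memory use is bounded by `block_size` plus the short tail, but every call
still reads the file from the start, so this is better run once to collect
offsets than on every request.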

Thanks.

Jan 4 '06 #3

There is no free lunch. If you want fast indexing, you have to organize your
storage so that it can be indexed efficiently, and XML is just not designed
for fast indexing from a file (without loading the data into memory).

As I said before, you have to use some other indexing mechanism (a database,
directories, a hashtable where you index the end offsets of the XML
fragments, or something else). But if you only have the big XML file on disk
and don't want to do any kind of preprocessing to build an index on it,
there is not much hope!
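
The end-offset index could look roughly like this. It is a sketch in Python
under stated assumptions, not a definitive implementation: fragments share one
element name and are not nested inside each other, and the function names and
the `first_start` parameter (the offset where the first fragment begins) are
hypothetical. The one-time pass memory-maps the file, so the OS pages it in
on demand instead of the program reading it into a string.

```python
import mmap
import re


def build_end_offset_index(path, tag):
    """One-time pass: return the end offset (exclusive) of every `</tag>`."""
    closing = re.compile(b"</" + re.escape(tag.encode("utf-8")) + rb"\s*>")
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return [m.end() for m in closing.finditer(mm)]


def read_fragment(path, ends, i, first_start):
    """Read fragment number `i` using only the index and a single seek."""
    start = first_start if i == 0 else ends[i - 1]
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(ends[i] - start)
```

The list of offsets is small (one integer per fragment), so it can be cached
or written next to the XML file and rebuilt only when the file changes. Each
request then costs one seek and one read, independent of the fragment's
position in the file.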

Bruno

Jan 5 '06 #4
There is a new XML processing model/API that may work for the case you
described. It is called vtd-xml (http://vtd-xml.sf.net). It consumes less
memory than DOM (5x less), retains random access, and is 10x faster than DOM.
It is well suited to grabbing a chunk of XML out of a file (is that what you
mean by chunking?). A demo is at http://vtd-xml.sf.net/demo.html

Jan 9 '06 #5
This is interesting info, but according to the link, "Its memory usage is
typically between 1.3x~1.5x the size of the XML document, with 1 being the
XML itself". So the 70-100 MB file would need roughly 90 to 150 MB of
memory. I am not sure this is the answer.

Bruno.

Jan 9 '06 #6
