SAX - is there an equivalent to the DOM .nodeTypedValue for reading the whole node data at once?

Hi,

I am using VB6, SAX (implementing IVBSAXContentHandler).

I need to extract binary encoded data (images) from large XML files and
decode this data and generate the appropriate images onto disk. My XML
files have the following structure:

<?xml version="1.0" encoding="utf-8" ?>
<imagepla xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <attachment>
    <primary_id>28899</primary_id>
    <filename>userguide3.pdf</filename>
    <file dt:dt="bin.base64">JVBERi0xLjMNJeLjz9MNCjU5NTAgMCBvYmoNPDwgDS9MaW5lYXJpemVkIDEgDS9PIDU5NTMgDS9I
IFsgMTM4OSAzODY0IF0gDS9MIDUwNTEyOTggDS9FIDEwMTQ3NCANL04gMTUzIA0vVCA0OTMyMTc4
.........
...................
    </file>
  </attachment>
  <attachment>
    ......
    ......
  </attachment>
</imagepla>

The encoded data (in the <file> element) needs to be extracted and then
decoded. I am trying to use SAX, but I cannot read the whole of the
<file> element data at once (with DOM I would use
DOMDoc.nodeTypedValue). I understand that the DOM loads the whole
document into memory, which is why nodeTypedValue can be used there.

I am using the following extract of code:

Dim strTmp As String
Dim byArr() As Byte

Private Sub IVBSAXContentHandler_characters(text As String)
    ...
    strTmp = strTmp & text
    ...
    byArr = strTmp
    Open strAttFile For Binary As #1
    Put #1, 1, byArr
    Close #1
    ...
End Sub

The problem is that only 1 line at a time of the <file> node data is
passed to this sub. Therefore I need to reconstruct the whole of the
binary data for the image in a temp variable (strTmp), before I
determine the end of the file and then write it to disk.

This takes a vast amount of time (e.g. 20 minutes to try to decode a
4MB image). The XML file will contain hundreds of images, so the
current way of processing is no good at all.
Is there a way to read the whole of the data from the <file> node in
one go?
Also, I will be extracting the binary data and then use DOM to rewrite
the XML file without the binary data (so the user has a copy of the
original XML file - but a much smaller one since no binary in it).
Should I use DOM or SAXReader/SAXWriter?

Greatly appreciated. Thanks.

Jimmy

Sep 9 '05 #1
ji***********@yahoo.co.uk wrote:
: This takes a vast amount of time (i.e. 20 minutes to try and decode a
: 4MB image). The XML file will contain 100s of images, so really the
: current way of processing is no good at all.
: Is there a way to read the whole of the data from the <file> node in
: one go?

With SAX in general you can never be sure of getting all of an element's
character data in one call; the parser is free to deliver it in several
pieces. There is a slim chance that the SAX module you have available in
VB has an option to do that (I have no idea, and I wouldn't count on it).

But why do you need to read the whole thing into memory? Base64 can be
decoded on the fly. Each group of four characters gives you three
bytes of data. Read a chunk, drop any line breaks, decode as many
complete groups of four characters as it contains, and write the result
out. The only thing to watch is the last few characters of each chunk
(up to three), which have to be held over to the next read so that you
always decode a multiple of four.
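
Here is a minimal sketch of that hold-over logic. It is written in Python rather than VB6, purely to show the idea; the class name `StreamingBase64Decoder` and its `feed`/`finish` methods are made up for illustration and are not part of any SAX library.

```python
import base64


class StreamingBase64Decoder:
    """Decode Base64 text that arrives in arbitrary chunks (hypothetical helper)."""

    def __init__(self):
        self._carry = ""  # 0-3 characters held over from the previous chunk

    def feed(self, text):
        # Remove whitespace (line breaks, spaces) before counting characters.
        data = self._carry + "".join(text.split())
        usable = len(data) - (len(data) % 4)  # largest multiple of four
        self._carry = data[usable:]           # keep the remainder for next time
        return base64.b64decode(data[:usable])

    def finish(self):
        # Well-formed input leaves nothing behind; leftovers mean truncated data.
        if self._carry:
            raise ValueError("Base64 input length is not a multiple of four")
```

Each call to `feed` returns whatever bytes can be decoded so far, so they can be written straight to the output file and nothing larger than one chunk is ever kept in memory.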

And where is the slowdown? I suspect that the string concatenation is to
blame. VB may be allocating a longer string each time and then copying
all the existing data plus the appended data into it. If you keep doing
that for a string that eventually gets large, it could get very slow. Can
you preallocate a much larger string and write each piece into it in
place? (In VB I believe that is the Mid statement:
Mid(the_string, offset, length) = data_to_insert, or something like that.)
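
To illustrate the preallocation idea, here is a rough sketch, again in Python rather than VB6. `ChunkBuffer` is a hypothetical name; it keeps one large buffer plus a write position, and doubles the buffer only when it runs out of room, so the total amount of copying stays roughly proportional to the data size instead of growing with every append.

```python
class ChunkBuffer:
    """Append-only byte buffer with preallocated storage (hypothetical helper)."""

    def __init__(self, capacity=1 << 20):
        self._buf = bytearray(capacity)  # preallocated storage
        self._len = 0                    # number of bytes actually used

    def append(self, chunk):
        needed = self._len + len(chunk)
        if needed > len(self._buf):
            # Grow to at least double the current size, in one step.
            new_size = max(needed, 2 * len(self._buf))
            self._buf.extend(bytes(new_size - len(self._buf)))
        # Overwrite in place at the current write position.
        self._buf[self._len:needed] = chunk
        self._len = needed

    def value(self):
        return bytes(self._buf[:self._len])
```

If the data is decoded and written out chunk by chunk, a buffer like this is not needed at all; it only matters if the whole payload really has to be assembled in memory first.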
: Also, I will be extracting the binary data and then use DOM to rewrite
: the XML file without the binary data (so the user has a copy of the
: original XML file - but a much smaller one since no binary in it).
: Should I use DOM or SAXReader/SAXWriter?

If you are not changing anything else in the XML except removing the
file data (and possibly replacing that one tag), then I would think it
easiest to use a SAX approach. As you read the data you also spool it back
out, except for that one tag. I suppose a SAXWriter would help do that.
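
A sketch of that read-and-spool approach, using Python's standard `xml.sax` module as a stand-in for the VB SAXReader/SAXWriter pair. `DropFileContent` and `strip_file_data` are illustrative names, and the sketch assumes that `<file>` elements are never nested inside each other.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


class DropFileContent(XMLGenerator):
    """Copy every SAX event to the output, except text inside <file>."""

    def __init__(self, out, skip_tag="file"):
        super().__init__(out, encoding="utf-8")
        self._skip_tag = skip_tag
        self._inside = False  # True while between <file> and </file>

    def startElement(self, name, attrs):
        super().startElement(name, attrs)  # the tag itself is still written
        if name == self._skip_tag:
            self._inside = True

    def endElement(self, name):
        if name == self._skip_tag:
            self._inside = False
        super().endElement(name)

    def characters(self, content):
        if not self._inside:
            super().characters(content)


def strip_file_data(xml_text):
    """Return the document with the character data of every <file> removed."""
    out = io.StringIO()
    xml.sax.parseString(xml_text.encode("utf-8"), DropFileContent(out))
    return out.getvalue()
```

The same handler could decode and save the skipped data instead of just discarding it, which would give the extraction and the smaller copy of the XML in a single pass.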
$0.10

--

This programmer available for rent.
Sep 9 '05 #2

Try NOT to open/close the file on each "characters" event.
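
Roughly, the idea is to open the output file when the <file> element starts, write to it from each "characters" event, and close it when the element ends. Here is a sketch using Python's `xml.sax` rather than VB6, just to show the shape. `AttachmentExtractor` is a made-up name, and the sketch assumes that each `<filename>` appears before its `<file>` (as in the sample document) and contains a plain file name.

```python
import base64
import os
import xml.sax


class AttachmentExtractor(xml.sax.ContentHandler):
    """Write each <file> payload to disk, opening the output once per element."""

    def __init__(self, out_dir):
        super().__init__()
        self._out_dir = out_dir
        self._element = None  # name of the element currently being read
        self._filename = ""   # text of the most recent <filename>
        self._out = None      # open file object while inside <file>
        self._carry = ""      # Base64 characters left over from the last chunk

    def startElement(self, name, attrs):
        self._element = name
        if name == "filename":
            self._filename = ""
        elif name == "file":
            # Open once here, not in characters().
            target = os.path.basename(self._filename.strip())
            self._out = open(os.path.join(self._out_dir, target), "wb")
            self._carry = ""

    def characters(self, content):
        if self._element == "filename":
            self._filename += content
        elif self._out is not None:
            # Decode only complete groups of four; hold the rest over.
            data = self._carry + "".join(content.split())
            usable = len(data) - len(data) % 4
            self._out.write(base64.b64decode(data[:usable]))
            self._carry = data[usable:]

    def endElement(self, name):
        if name == "file" and self._out is not None:
            self._out.close()  # close once, when the element ends
            self._out = None
        self._element = None
```

It would be run with something like `xml.sax.parse("input.xml", AttachmentExtractor("out"))`, where both the input path and the output directory are placeholders.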

Sep 13 '05 #3
