SAX - is there an equivalent to the DOM .nodeTypedValue for reading the whole node data at once?

Hi,

I am using VB6, SAX (implementing IVBSAXContentHandler).

I need to extract binary encoded data (images) from large XML files and
decode this data and generate the appropriate images onto disk. My XML
files have the following structure:

<?xml version="1.0" encoding="utf-8" ?>
<imagepla xmlns:dt="urn:schemas-microsoft-com:datatypes">
<attachment>
<primary_id>28899</primary_id>
<filename>userguide3.pdf</filename>
<file
dt:dt="bin.base64">JVBERi0xLjMNJeLjz9MNCjU5NTAgMCB vYmoNPDwgDS9MaW5lYXJpemVkIDEgDS9PIDU5NTMgDS9I
IFsgMTM4OSAzODY0IF0gDS9MIDUwNTEyOTggDS9FIDEwMTQ3NC ANL04gMTUzIA0vVCA0OTMyMTc4
.........
...................
</file>
</attachment>
<attachment>
......
......
</attachment>
</imagepla>

The encoded data (in the <file> element) needs to be extracted and then
decoded. I am trying to use SAX but I cannot read the whole of the
<file> element data at once (with DOM I would use
DOMDoc.nodeTypedValue). I understand that the DOM loads the whole
document into memory, which is why nodeTypedValue can be used there.

I am using the following extract of code:

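Dim strTmp As String
Dim btArr() As Byte

Private Sub IVBSAXContentHandler_characters(text As String)
...
' Append this chunk to everything received so far
strTmp = strTmp & text
...
' Copy the accumulated string into a byte array and rewrite the file
btArr = strTmp
Open strAttFile For Binary As #1
Put #1, 1, btArr
Close #1
...
End Sub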

The problem is that only 1 line at a time of the <file> node data is
passed to this sub. Therefore I need to reconstruct the whole of the
binary data for the image in a temp variable (strTmp), before I
determine the end of the file and then write it to disk.

This takes a vast amount of time (e.g. 20 minutes to decode a
4MB image). The XML file will contain 100s of images, so really the
current way of processing is no good at all.
Is there a way to read the whole of the data from the <file> node in
one go?
Also, I will be extracting the binary data and then use DOM to rewrite
the XML file without the binary data (so the user has a copy of the
original XML file - but a much smaller one since no binary in it).
Should I use DOM or SAXReader/SAXWriter?

Greatly appreciated. Thanks.

Jimmy

Sep 9 '05 #1
2 Replies

ji***********@yahoo.co.uk wrote:

: The problem is that only 1 line at a time of the <file> node data is
: passed to this sub. Therefore I need to reconstruct the whole of the
: binary data for the image in a temp variable (strTmp), before I
: determine the end of the file and then write it to disk.

: This takes a vast amount of time (i.e. 20 minutes to try and decode a
: 4MB image). The XML file will contain 100s of images, so really the
: current way of processing is no good at all.
: Is there a way to read the whole of the data from the <file> node in
: one go?

In SAX in general you can never be sure of receiving the whole of the
character data in one call, though there is a slim chance that the SAX
module you have available in VB has an option to do that (I have no idea,
and I wouldn't count on it).

But why do you need to read the whole thing into memory? Base64 can be
decoded on the fly. Each sequence of four characters gives you three
bytes of data. Read a chunk, strip any line breaks or other whitespace,
decode as many complete groups of four characters as you have in one go,
and write the result out. You will have to hold over the last few
characters (up to three) from one read to the next so that each decode
works on a multiple of four.
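
The chunk-by-chunk decoding described above can be sketched as follows. This is Python rather than VB6, purely to illustrate the carry-over logic; the function name decode_base64_chunks is hypothetical and not part of any SAX library.

```python
import base64


def decode_base64_chunks(chunks):
    """Yield decoded bytes for base64 text that arrives in arbitrary pieces.

    `chunks` is any iterable of strings, for example the text passed to
    successive SAX characters events (an assumption for this sketch).
    """
    carry = ""  # 0 to 3 characters held over from the previous chunk
    for text in chunks:
        # Line breaks and spaces are not part of the encoding, so drop them.
        data = carry + "".join(text.split())
        # Only complete groups of four characters can be decoded.
        usable = len(data) - (len(data) % 4)
        carry = data[usable:]
        if usable:
            yield base64.b64decode(data[:usable])
    if carry:
        raise ValueError("base64 input did not end on a multiple of four")
```

Each yielded block can be written straight to the output file, so nothing larger than one chunk plus three characters is ever held in memory.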

And where is the slowdown? I suspect that the string concatenation is to
blame. VB may be allocating a longer string each time and then copying
all the existing data plus the appended data into it. If you keep doing
that for an eventually large string it could get very slow. Can you
preallocate a much larger string and use the Mid statement to push the
data into that single large string? (In VB that is
Mid(the_line, offset, len) = data_to_insert.)

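
A rough sketch of the preallocation idea, in Python rather than VB6 and with a hypothetical class name: the buffer is allocated once, each chunk is copied into place at the current offset, and the buffer is only regrown (by doubling) when it runs out of room.

```python
class PreallocatedBuffer:
    """Append-only byte buffer that avoids reallocating on every append."""

    def __init__(self, capacity):
        self._buf = bytearray(capacity)  # allocated once up front
        self._used = 0                   # number of bytes actually filled

    def append(self, data):
        end = self._used + len(data)
        if end > len(self._buf):
            # Grow rarely: at least double, so total copying stays linear.
            new_size = max(end, 2 * len(self._buf))
            self._buf.extend(bytes(new_size - len(self._buf)))
        # Same-length slice assignment overwrites in place, no reallocation.
        self._buf[self._used:end] = data
        self._used = end

    def value(self):
        return bytes(self._buf[:self._used])
```

This plays the same role as assigning through Mid into a large preallocated string: the cost of each append depends on the size of the new chunk, not on how much has already been collected.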
: Also, I will be extracting the binary data and then use DOM to rewrite
: the XML file without the binary data (so the user has a copy of the
: original XML file - but a much smaller one since no binary in it).
: Should I use DOM or SAXReader/SAXWriter?

If you are not changing anything else in the XML except removing the
file data (and possibly replacing that one tag) then I would think it
easiest to use a SAX approach. As you read the data you also spool it
back out, except for that one tag. I suppose a SAXWriter would help do that.
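
As a sketch of that spooling approach, here is the same idea using Python's standard xml.sax module instead of the MSXML SAXWriter (which this example does not attempt to reproduce). The class and function names are hypothetical, and the element name "file" is taken from the sample document in the question.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


class StripFileContent(XMLGenerator):
    """Re-emit every SAX event except character data inside <file>."""

    def __init__(self, out, strip_tag="file"):
        super().__init__(out, encoding="utf-8")
        self._strip_tag = strip_tag
        self._inside = False

    def startElement(self, name, attrs):
        super().startElement(name, attrs)  # keep the tag and its attributes
        if name == self._strip_tag:
            self._inside = True

    def endElement(self, name):
        if name == self._strip_tag:
            self._inside = False
        super().endElement(name)

    def characters(self, content):
        if not self._inside:               # drop only the base64 payload
            super().characters(content)


def strip_file_data(xml_bytes):
    out = io.StringIO()
    xml.sax.parseString(xml_bytes, StripFileContent(out))
    return out.getvalue()
```

The same handler is also the natural place to hand the dropped text to a decoder, so extraction and rewriting can happen in a single pass over the file.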
$0.10

--

This programmer available for rent.
Sep 9 '05 #2

Try NOT to open/close the file on each "characters" event.
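
One way to follow that advice, sketched in Python rather than VB6 with hypothetical names: open the output when the <file> start tag arrives, write on each characters event, and close on the end tag. To keep the focus on the file handle's lifetime, this sketch writes the base64 text as-is (minus whitespace) and leaves decoding out; it also assumes a single output path.

```python
import xml.sax


class FileElementWriter(xml.sax.ContentHandler):
    """Write the text of a <file> element, opening the output only once."""

    def __init__(self, out_path):
        super().__init__()
        self._out_path = out_path
        self._out = None       # open handle while inside <file>, else None
        self.open_count = 0    # how many times the output was opened

    def startElement(self, name, attrs):
        if name == "file":
            self._out = open(self._out_path, "w", encoding="ascii")
            self.open_count += 1

    def characters(self, content):
        # Called many times per element; only write, never open or close.
        if self._out is not None:
            self._out.write("".join(content.split()))

    def endElement(self, name):
        if name == "file" and self._out is not None:
            self._out.close()
            self._out = None
```

However many characters events the parser delivers for one element, the file is opened and closed exactly once.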

Sep 13 '05 #3
