Bytes | Software Development & Data Engineering Community

SAX - is there an equivalent to the DOM .nodeTypedValue for reading the whole node data at once?

Hi,

I am using VB6, SAX (implementing IVBSAXContentHandler).

I need to extract base64-encoded binary data (images) from large XML files,
decode it, and write the resulting images to disk. My XML files have the
following structure:

<?xml version="1.0" encoding="utf-8" ?>
<imagepla xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <attachment>
    <primary_id>28899</primary_id>
    <filename>userguide3.pdf</filename>
    <file dt:dt="bin.base64">JVBERi0xLjMNJeLjz9MNCjU5NTAgMCBvYmoNPDwgDS9MaW5lYXJpemVkIDEgDS9PIDU5NTMgDS9I
IFsgMTM4OSAzODY0IF0gDS9MIDUwNTEyOTggDS9FIDEwMTQ3NCANL04gMTUzIA0vVCA0OTMyMTc4
.........
...................
    </file>
  </attachment>
  <attachment>
    ......
    ......
  </attachment>
</imagepla>

The encoded data (in the <file> element) needs to be extracted and then
decoded. I am trying to use SAX, but I cannot read all of the <file>
element data at once (with DOM I would just read the node's
nodeTypedValue). I understand that this works with DOM because it loads
the whole document into memory.

I am using the following extract of code:

Dim strTmp As String
Dim btArr() As Byte

Private Sub IVBSAXContentHandler_characters(text As String)
    ' Called by the SAX reader for each chunk of character data
    ...
    strTmp = strTmp & text    ' append this chunk to the text collected so far
    ...
    btArr = strTmp
    Open strAttFile For Binary As #1
    Put #1, 1, btArr
    Close #1
    ...
End Sub

The problem is that only one line of the <file> node data at a time is
passed to this sub. So I have to rebuild all of the encoded data for the
image in a temporary variable (strTmp) until I detect the end of the
<file> element, and only then write it to disk.

This takes a vast amount of time (e.g. 20 minutes to decode a 4 MB
image). The XML file will contain hundreds of images, so the current way
of processing is no good at all.
Is there a way to read all of the data from the <file> node in one go?
Also, I will be extracting the binary data and then using DOM to rewrite
the XML file without it (so the user has a copy of the original XML file,
but a much smaller one since it has no binary data in it).
Should I use DOM or SAXReader/SAXWriter?

Greatly appreciated. Thanks.

Jimmy

Sep 9 '05 #1
ji***********@yahoo.co.uk wrote:
: [snip]
: This takes a vast amount of time (i.e. 20 minutes to try and decode a
: 4MB image). The XML file will contain 100s of images, so really the
: current way of processing is no good at all.
: Is there a way to read the whole of the data from the <file> node in
: one go?

In SAX in general you can never be sure of getting all of an element's
character data in one call; the parser is free to split it across as many
characters events as it likes. There is a slim chance that the SAX module
available to you in VB has an option to deliver it all at once (I have no
idea; I wouldn't count on it).

But why do you need to read the whole thing into memory? Base64 can be
decoded on the fly: each group of four characters gives you three bytes
of data. Read a chunk, decode as many complete groups of four characters
as it contains (ignoring the line breaks, which are not part of the data)
and write them out. You may have to hold over the last few characters (at
most three) from one read to the next so that you always decode a
multiple of four.
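
A minimal sketch of that on-the-fly decoding, written in Python rather than VB6 so it runs on its own. The class name StreamingBase64Decoder is made up for illustration; the assumption is that feed() would be called from the SAX characters handler with whatever piece of text the parser hands over, and that the output is any binary file-like object.

```python
import base64
import io


class StreamingBase64Decoder:
    """Decode base64 text that arrives in arbitrary pieces (hypothetical helper)."""

    def __init__(self, out):
        self.out = out        # any binary file-like object, e.g. open(path, "wb")
        self.pending = ""     # 0-3 characters held over until a group of four is complete

    def feed(self, text):
        # Line breaks and other whitespace between base64 lines are not data.
        data = self.pending + "".join(text.split())
        usable = len(data) - len(data) % 4          # largest multiple of four
        if usable:
            self.out.write(base64.b64decode(data[:usable]))
        self.pending = data[usable:]

    def close(self):
        # Valid base64 always ends on a group boundary (padded with '=').
        if self.pending:
            raise ValueError("base64 data ended in the middle of a 4-character group")


if __name__ == "__main__":
    original = b"%PDF-1.3 example payload!"   # 25 bytes, so the encoding ends in '=='
    encoded = base64.b64encode(original).decode("ascii")
    sink = io.BytesIO()
    decoder = StreamingBase64Decoder(sink)
    # Simulate the parser delivering uneven pieces, each followed by a line break.
    for i in range(0, len(encoded), 7):
        decoder.feed(encoded[i:i + 7] + "\n")
    decoder.close()
    print(sink.getvalue() == original)  # True
```

Because at most three characters are ever held back, memory use stays small no matter how large the image is.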

And where is the slowdown? I suspect the string concatenation is to
blame. VB may be allocating a longer string each time and then copying
all the existing data plus the appended data into it. If you keep doing
that for a string that eventually gets large, it becomes very slow, since
the total amount of copying grows with the square of the length. Can you
preallocate a much larger string and write each piece into that single
large string at the right offset? (In VB I believe that is the Mid
statement: Mid(the_buffer, offset, len) = data_to_insert.)
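
The preallocate-and-overwrite idea, sketched in Python as a stand-in for VB6: a bytearray plays the role of the preallocated string, and slice assignment at the current offset plays the role suggested above for Mid. The class name and the initial capacity are arbitrary choices for illustration. Growing the buffer by doubling when it fills up keeps the total copying proportional to the final length instead of recopying everything on every append.

```python
class PreallocatedBuffer:
    """Accumulate ASCII text (e.g. base64) in one preallocated, doubling buffer."""

    def __init__(self, capacity=64 * 1024):
        self.buf = bytearray(max(1, capacity))   # the "much larger string"
        self.length = 0                           # how much of it is filled so far

    def append(self, text):
        data = text.encode("ascii")
        end = self.length + len(data)
        if end > len(self.buf):
            # Out of room: double the capacity until the new piece fits.
            new_capacity = len(self.buf)
            while new_capacity < end:
                new_capacity *= 2
            self.buf.extend(bytes(new_capacity - len(self.buf)))
        # Overwrite in place at the current offset, like Mid(buffer, offset, len) = data.
        self.buf[self.length:end] = data
        self.length = end

    def value(self):
        return self.buf[:self.length].decode("ascii")


if __name__ == "__main__":
    acc = PreallocatedBuffer(capacity=8)
    for line in ["JVBERi0xLjMN", "JeLjz9MN", "CjU5NTAgMCBvYmoN"]:
        acc.append(line)
    print(acc.value())  # JVBERi0xLjMNJeLjz9MNCjU5NTAgMCBvYmoN
```

In everyday Python you would simply collect the pieces in a list and call "".join once at the end; the explicit buffer is shown only to mirror the VB suggestion.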

: Also, I will be extracting the binary data and then use DOM to rewrite
: the XML file without the binary data (so the user has a copy of the
: original XML file - but a much smaller one since no binary in it).
: Should I use DOM or SAXReader/SAXWriter?

If you are not changing anything else in the XML except removing the
file data (and possibly replacing that one tag), then I would think it
easiest to use a SAX approach: as you read the data you also spool it
back out, except for that one element. I suppose a SAXWriter (MXXMLWriter
in MSXML) would help do that.
$0.10
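
A minimal sketch of that spool-it-back-out approach, using Python's standard xml.sax parser and xml.sax.saxutils.XMLGenerator as stand-ins for the MSXML SAX reader and writer. It assumes the element to empty is called file, as in the sample document; it keeps that element's start and end tags and attributes and drops only what is inside. The class name StripFileData is made up for illustration.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


class StripFileData(XMLGenerator):
    """Echo every SAX event to the output, except whatever is inside <file>."""

    def __init__(self, out, skip_element="file"):
        super().__init__(out, encoding="utf-8")
        self.skip_element = skip_element
        self.skip_depth = 0   # > 0 while inside the element being emptied

    def startElement(self, name, attrs):
        if self.skip_depth:
            self.skip_depth += 1           # element nested inside <file>: drop it
            return
        super().startElement(name, attrs)  # copy the tag and its attributes
        if name == self.skip_element:
            self.skip_depth = 1

    def endElement(self, name):
        if self.skip_depth:
            self.skip_depth -= 1
            if self.skip_depth:
                return                     # still inside <file>
        super().endElement(name)           # copies </file> and all other end tags

    def characters(self, content):
        if not self.skip_depth:            # the base64 text is simply never written
            super().characters(content)


SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<imagepla xmlns:dt="urn:schemas-microsoft-com:datatypes">
<attachment><primary_id>28899</primary_id><filename>userguide3.pdf</filename>
<file dt:dt="bin.base64">JVBERi0xLjMN
JeLjz9MN</file></attachment></imagepla>"""

if __name__ == "__main__":
    out = io.StringIO()
    xml.sax.parseString(SAMPLE, StripFileData(out))
    print(out.getvalue())
```

For real files, xml.sax.parse() accepts a filename or file object, and the output can be a text file opened with encoding="utf-8". Note that this sketch does not copy XML comments, since a plain ContentHandler is never told about them.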

--

This programmer available for rent.
Sep 9 '05 #2

Try NOT to open and close the output file on every "characters" event.
Open it once when the <file> element starts and close it once when it ends.
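
Along those lines, here is a sketch with Python's xml.sax standing in for the VB6 IVBSAXContentHandler: the output is opened once in startElement, decoded pieces are written in characters, and the output is closed once in endElement. The class name AttachmentExtractor and the open_output callback are hypothetical; the element names come from the sample document, and the decoding carries leftover characters over so that only complete groups of four are decoded.

```python
import base64
import xml.sax


class AttachmentExtractor(xml.sax.ContentHandler):
    """Write each <file> element's decoded bytes to its own output stream."""

    def __init__(self, open_output):
        super().__init__()
        self.open_output = open_output   # hypothetical callback: filename -> binary stream
        self.filename = ""               # text of the latest <filename> element
        self.in_filename = False
        self.out = None                  # open stream while inside <file>
        self.pending = ""                # fewer than 4 base64 chars not yet decoded

    def startElement(self, name, attrs):
        if name == "filename":
            self.in_filename, self.filename = True, ""
        elif name == "file":
            self.out = self.open_output(self.filename)   # open ONCE per attachment
            self.pending = ""

    def characters(self, content):
        if self.in_filename:
            self.filename += content
        elif self.out is not None:
            data = self.pending + "".join(content.split())   # skip line breaks
            usable = len(data) - len(data) % 4
            self.out.write(base64.b64decode(data[:usable]))
            self.pending = data[usable:]

    def endElement(self, name):
        if name == "filename":
            self.in_filename = False
        elif name == "file":
            self.out.close()                             # close ONCE, after the last piece
            self.out = None
            if self.pending:
                raise ValueError("truncated base64 data in <file>")


if __name__ == "__main__":
    import os
    import tempfile

    doc = (b'<imagepla xmlns:dt="urn:schemas-microsoft-com:datatypes"><attachment>'
           b'<filename>userguide3.pdf</filename><file dt:dt="bin.base64">JVBERi0x\n'
           b'LjMN</file></attachment></imagepla>')
    folder = tempfile.mkdtemp()

    def opener(name):
        # basename() keeps a hostile <filename> from escaping the output folder.
        return open(os.path.join(folder, os.path.basename(name)), "wb")

    xml.sax.parseString(doc, AttachmentExtractor(opener))
    with open(os.path.join(folder, "userguide3.pdf"), "rb") as f:
        print(f.read())  # b'%PDF-1.3\r'
```

With hundreds of images per XML file, this means one open and one close per attachment rather than one per line of base64 text.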

Sep 13 '05 #3
