XmlReaders and fragments

When reading fragments, it seems like XmlReaders try to read too much. I'm
working on a file parser for a new file format, and I've run into a problem.
The format has an XML fragment for a header, then a (frequently) large amount
of binary data beneath. In certain situations, there may be XML fragments
further down in the file. An example of the file might look like this:

<header>
<name>some name</name>
<size>1000</size>
<stuff>more data</stuff>
<otherstuff>19.2</otherstuff>
</header>...some huge block of binary data...

The header isn't a fixed size. While writing a little test to parse out the
header, I ran into what seemed like a really weird decision on the part of
the XmlReader. When I tried to stream the data through using ReadOuterXml
(to get the header for processing later), it threw an exception regarding
invalid characters MUCH further down in the file. In other words, the reader
had gotten past the end tag of the fragment, then kept going. ReadOuterXml
is just supposed to get the tags and children of the current node, which
in this case would have been the node labelled "header".

Here's a short program to demonstrate what I mean.

using System;
using System.IO;
using System.Xml;

namespace XmlParseTest
{
    class Program
    {
        static void Main(string[] args)
        {
            FileStream file;
            XmlReader baseReader;
            XmlTextReader reader;
            XmlReaderSettings readerSettings;
            string xml;

            file = new FileStream(@"c:\test.xml", FileMode.Open, FileAccess.Read);
            file.Seek(0, SeekOrigin.Begin);

            // Parse the stream as an element-level fragment.
            reader = new XmlTextReader(file, XmlNodeType.Element, null);
            reader.Normalization = false;

            readerSettings = new XmlReaderSettings();
            readerSettings.ConformanceLevel = ConformanceLevel.Fragment;
            readerSettings.IgnoreWhitespace = false;
            readerSettings.IgnoreComments = true;
            readerSettings.CheckCharacters = false;

            baseReader = XmlReader.Create(reader, readerSettings);
            baseReader.MoveToContent();

            // This is the call that throws the exception about invalid characters.
            xml = baseReader.ReadOuterXml();

            baseReader.Close();
            file.Close();
        }
    }
}

Am I missing something?

Lee Crabtree
Jan 8 '08 #1
When positioned on an element, ReadOuterXml() reads everything up to and
including the end tag and then positions the reader on the next node. If
binary data follows the end tag, you get an error. I am not sure what you
expected: ConformanceLevel.Fragment only means there is no requirement to
have exactly one root element; it does not mean binary data is allowed.

If you want to consume an element without positioning the reader after the
end tag, try ReadSubtree(). It gives you a second XmlReader that you can use
to consume only that element; once you close it, the main reader is
positioned on the end tag, not after it.
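
The ReadSubtree() suggestion might look roughly like the sketch below. It is an illustration under assumptions, not code from this thread: the class name HeaderReader, the methods BuildSample and ReadHeaderXml, and the in-memory sample bytes are all made up here to stand in for the real file.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class HeaderReader
{
    // Hypothetical sample: a small XML header followed by a few non-XML bytes.
    public static byte[] BuildSample()
    {
        byte[] header = Encoding.UTF8.GetBytes(
            "<header><name>some name</name><size>1000</size></header>");
        byte[] payload = { 0x00, 0x01, 0x02, 0xFF };

        byte[] all = new byte[header.Length + payload.Length];
        Buffer.BlockCopy(header, 0, all, 0, header.Length);
        Buffer.BlockCopy(payload, 0, all, header.Length, payload.Length);
        return all;
    }

    // Returns the outer XML of the first element without asking the
    // main reader to move past that element's end tag.
    public static string ReadHeaderXml(Stream input)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ConformanceLevel = ConformanceLevel.Fragment;

        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            reader.MoveToContent(); // position on the first element

            // The subtree reader only exposes that element and its children.
            using (XmlReader sub = reader.ReadSubtree())
            {
                sub.MoveToContent();
                return sub.ReadOuterXml();
            }
        }
    }
}

class Program
{
    static void Main()
    {
        using (MemoryStream stream = new MemoryStream(HeaderReader.BuildSample()))
        {
            Console.WriteLine(HeaderReader.ReadHeaderXml(stream));
        }
    }
}
```

One caveat: the reader buffers its input, so the stream's Position afterwards is not necessarily the byte immediately following the end tag. Locating the start of the binary block would need some other means, which this sketch does not attempt.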
--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Jan 9 '08 #2
Fantastic. That worked. Thanks.

Lee Crabtree
Jan 9 '08 #3
