XmlReaders and fragments

Lee Crabtree

When reading fragments, it seems like XmlReaders try to read too much. I'm
working on a file parser for a new file format, and I've run into a problem.
The format has an XML fragment for a header, then a (frequently) large amount
of binary data beneath. In certain situations, there may be XML fragments
further down in the file. An example of the file might look like this:

<header>
<name>some name</name>
<size>1000</size>
<stuff>more data</stuff>
<otherstuff>19.2</otherstuff>
</header>...some huge block of binary data...

The header isn't a fixed size. In writing a little test to try to parse out
the header, I ran into what seemed like a really weird decision on the part
of the XmlReader. When I tried to stream the data through using ReadOuterXml
(to get the header for processing later), it threw an exception about invalid
characters MUCH further down in the file. In other words, the reader had gotten
past the end tag of the fragment and kept going. ReadOuterXml is supposed to
get only the tags and children of the current node, which in this case would
have been the node labelled "header".
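
A side note not raised in the thread: an XmlReader reading from a FileStream buffers ahead, so after the header has been read the stream's Position may already be past the end of the header. A minimal sketch of one alternative (a swapped-in technique, not what the thread used), assuming the header is ASCII/UTF-8 and its closing tag is written exactly as `</header>` with no comment or CDATA containing that text: find the closing tag in the raw bytes, hand only that slice to an XmlReader, and treat everything after it as binary. `HeaderSplitter`, `FindHeaderEnd` and `ReadHeader` are hypothetical names, and for a very large file you would scan a leading block rather than the whole array.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper, not from the thread: splits a buffer that starts with an
// XML header and continues with raw binary data.
public static class HeaderSplitter
{
    // Returns the index just past the first occurrence of closingTag, or -1.
    // Assumes the header is ASCII/UTF-8 and the tag is written exactly as given.
    public static int FindHeaderEnd(byte[] data, string closingTag = "</header>")
    {
        byte[] tag = Encoding.UTF8.GetBytes(closingTag);
        for (int i = 0; i + tag.Length <= data.Length; i++)
        {
            int j = 0;
            while (j < tag.Length && data[i + j] == tag[j])
                j++;
            if (j == tag.Length)
                return i + tag.Length;
        }
        return -1;
    }

    // Parses only the header bytes; binaryOffset is where the binary data starts.
    public static string ReadHeader(byte[] data, out int binaryOffset)
    {
        binaryOffset = FindHeaderEnd(data);
        if (binaryOffset < 0)
            throw new InvalidDataException("No </header> found.");

        // The XmlReader is given a read-only view that ends at the closing tag,
        // so it never sees a byte of the binary data.
        using (var slice = new MemoryStream(data, 0, binaryOffset, false))
        using (XmlReader reader = XmlReader.Create(slice))
        {
            reader.MoveToContent();
            return reader.ReadOuterXml();
        }
    }
}
```

The offset returned through `binaryOffset` is a byte index into the original buffer, so the binary block can be read from there without relying on the stream position the XmlReader left behind.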

Here's a short program to demonstrate what I mean.

using System;
using System.IO;
using System.Xml;

namespace XmlParseTest
{
    class Program
    {
        static void Main(string[] args)
        {
            FileStream file;
            XmlReader baseReader;
            XmlTextReader reader, sReader;
            XmlReaderSettings readerSettings;
            string xml;

            file = new FileStream(@"c:\test.xml", FileMode.Open, FileAccess.Read);
            file.Seek(0, SeekOrigin.Begin);

            reader = new XmlTextReader(file, XmlNodeType.Element, null);
            reader.Normalization = false;

            readerSettings = new XmlReaderSettings();
            readerSettings.ConformanceLevel = ConformanceLevel.Fragment;
            readerSettings.IgnoreWhitespace = false;
            readerSettings.IgnoreComments = true;
            readerSettings.CheckCharacters = false;

            baseReader = XmlReader.Create(reader, readerSettings);
            baseReader.MoveToContent();
            xml = baseReader.ReadOuterXml();

            baseReader.Close();
            file.Close();
        }
    }
}

Am I missing something?

Lee Crabtree

Jan 8 '08 #1


Martin Honnen

ReadOuterXml(), when the reader is positioned on an element, reads everything
up to and including the end tag and then positions the reader on the next
node. If there is binary data after the end tag, that move to the next node
is where you get the error. I am not sure what you expect: ConformanceLevel.Fragment
only means there is no requirement to have exactly one root element; it does
not mean binary data is allowed.
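
A small in-memory sketch of that behaviour, using a well-formed fragment instead of binary data so that the move to the next node succeeds and can be observed. `OuterXmlDemo` and `ReadFirstElement` are illustrative names; the calls themselves are the standard System.Xml API.

```csharp
using System.IO;
using System.Xml;

public static class OuterXmlDemo
{
    // Reads the first element with ReadOuterXml() and reports which node the
    // reader is left on afterwards.
    public static (string Xml, XmlNodeType NextType, string NextName) ReadFirstElement(string fragment)
    {
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        using (XmlReader reader = XmlReader.Create(new StringReader(fragment), settings))
        {
            reader.MoveToContent();             // positioned on the first element
            string xml = reader.ReadOuterXml(); // consumes the end tag, then reads on
            return (xml, reader.NodeType, reader.Name);
        }
    }
}
```

With the input `<header><size>1000</size></header><next/>`, `Xml` holds the header element and `NextType`/`NextName` report the `<next/>` element: the reader has already parsed the node after `</header>`. If that node were binary data, this is the read that would fail.
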

If you want to consume an element without leaving the reader positioned after
the end tag, try whether ReadSubtree() does what you want. It gives you a second
XmlReader that you can use to consume only the element; once you close it, the
main reader is positioned on the end tag, not after it.
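
A minimal sketch of the ReadSubtree() approach, with a string standing in for the file and a few control characters standing in for the binary data after `</header>`. `SubtreeDemo` and `ReadHeader` are hypothetical names. Whether this is enough for a real file may also depend on how much the underlying reader buffers from the stream, so treat it as a starting point.

```csharp
using System.IO;
using System.Xml;

public static class SubtreeDemo
{
    // Reads the first element through ReadSubtree() so the main reader is not
    // advanced past that element's end tag.
    public static (string Xml, XmlNodeType MainType, string MainName) ReadHeader(TextReader input)
    {
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            reader.MoveToContent(); // main reader on <header>
            string xml;
            using (XmlReader sub = reader.ReadSubtree())
            {
                sub.Read();               // the subtree reader starts before <header>
                xml = sub.ReadOuterXml(); // stops at the end of the subtree
            }
            // Closing the subtree reader leaves the main reader on </header>.
            return (xml, reader.NodeType, reader.Name);
        }
    }
}
```

The main reader is never asked to move past `</header>`, so the data that follows is not parsed as XML.
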
--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/

Jan 9 '08 #2

Lee Crabtree

Fantastic. That worked. Thanks.

Lee Crabtree

Jan 9 '08 #3
