472,137 Members | 1,549 Online
Bytes | Software Development & Data Engineering Community
Post +

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 472,137 software developers and data experts.

Strange behaviour reading XML into DataSet

Hi,

I've got a simple console app that just reads an XML file into a DataSet
then prints out a description of each table in the DataSet, including
column names and row values for each column. I'm getting some strange
results depending the input XML file I use. I was wondering if somebody
could help me understand what is going on or point me to a good reference.

The code for my program looks like this:

using System;
using System.Data;

static void Main(string[] args)
{
DataSet ds = new DataSet();
ds.ReadXml("test.xml");
PrintDataSet(ds);
}

public static void PrintDataSet(DataSet ds)
{
Console.WriteLine("DataSet name: " + ds.DataSetName);
foreach (DataTable dt in ds.Tables)
{
int rowCount = dt.Rows.Count;
Console.WriteLine("\nTable: " + dt.ToString() + " (" + rowCount + " rows)");
foreach (DataColumn dc in dt.Columns)
{
Console.WriteLine("Column: " + dc.ColumnName);
for (int i = 0; i < rowCount; i++)
{
Console.WriteLine("Row " + i + ": " + dt.Rows[i][dc].ToString());
}
}
}
}

When "test.xml" looks like this:

<?xml version="1.0" standalone="yes"?>
<test>
<product>Product 1</product>
<customer>
<name>Bill</name>
<company>Bill's Co.</company>
</customer>
<customer>
<name>Sue</name>
<company>Sue's Co.</company>
</customer>
</test>

The output looks like this:

DataSet name: NewDataSet

Table: test (1 rows)
Column: product
Row 0: Product 1
Column: test_Id
Row 0: 0

Table: customer (2 rows)
Column: name
Row 0: Bill
Row 1: Sue
Column: company
Row 0: Bill's Co.
Row 1: Sue's Co.
Column: test_Id
Row 0: 0
Row 1: 0

However when "test.xml" looks like this (NOTE: the only difference
is in the <product> element):

<?xml version="1.0" standalone="yes"?>
<test>
<product>
<name>Product 1</name>
</product>
<customer>
<name>Bill</name>
<company>Bill's Co.</company>
</customer>
<customer>
<name>Sue</name>
<company>Sue's Co.</company>
</customer>
</test>

The output looks like this:

DataSet name: test

Table: product (1 rows)
Column: name
Row 0: Product 1

Table: customer (2 rows)
Column: name
Row 0: Bill
Row 1: Sue
Column: company
Row 0: Bill's Co.
Row 1: Sue's Co.

Questions:

1) I think I see why the test_Id column gets created "on the fly" for
the "customer" table in the first XML example. A "test" table got
created which might have multiple rows. So each row of the "customer"
table has to have some way of relating back to its parent table row,
which is what "test_Id" contains. But why does the "test" table
itself need a "test_Id" column?

2) In the first example a "test" table got created and the name of the
DataSet stayed at the default "NewDataSet". In the second example there
was no "test" table created. Instead the name of the DataSet is "test"
and there is a "product" table and "customer" table. Since there is no
"test" table there is no need for the "test_Id" column in any of the
tables that do exist. Why does it work this way? Why would just changing
that one element (ie. <product>) make such a big difference in the way the
way the DataSet is constructed.

Any help appreciated. And again if you know of some good reference books
or links where these issues are discussed that would be fantastic.

Thanks in advance.

Bill
Nov 12 '05 #1
3 4447
XML is hierarchical. DataSet is not. So there are some "hieristics" that
are used to "normalize" the XML into a relational shape. One of those
hieristics is that if an element contains children then it is mapped to a
table, just in case those children are repeated somewhere.

You can make the DataSet behave much more predictably by providing it with
an XML Schema. To see how this is done call WriteXml with
XmlWriteMode.WriteSchema. Then be sure to read this file back in using
XmlReadMode.ReadSchema.

You are using XmlReadMode.InferSchema which is one of those "let's help
people get started" kinds of features. Not really something you should use
in a "production" application.
"Bill C." <bi***********@yahoo.com> wrote in message
news:1e**************************@posting.google.c om...
Hi,

I've got a simple console app that just reads an XML file into a DataSet
then prints out a description of each table in the DataSet, including
column names and row values for each column. I'm getting some strange
results depending the input XML file I use. I was wondering if somebody
could help me understand what is going on or point me to a good reference.

The code for my program looks like this:

using System;
using System.Data;

static void Main(string[] args)
{
DataSet ds = new DataSet();
ds.ReadXml("test.xml");
PrintDataSet(ds);
}

public static void PrintDataSet(DataSet ds)
{
Console.WriteLine("DataSet name: " + ds.DataSetName);
foreach (DataTable dt in ds.Tables)
{
int rowCount = dt.Rows.Count;
Console.WriteLine("\nTable: " + dt.ToString() + " (" + rowCount + " rows)"); foreach (DataColumn dc in dt.Columns)
{
Console.WriteLine("Column: " + dc.ColumnName);
for (int i = 0; i < rowCount; i++)
{
Console.WriteLine("Row " + i + ": " + dt.Rows[i][dc].ToString()); }
}
}
}

When "test.xml" looks like this:

<?xml version="1.0" standalone="yes"?>
<test>
<product>Product 1</product>
<customer>
<name>Bill</name>
<company>Bill's Co.</company>
</customer>
<customer>
<name>Sue</name>
<company>Sue's Co.</company>
</customer>
</test>

The output looks like this:

DataSet name: NewDataSet

Table: test (1 rows)
Column: product
Row 0: Product 1
Column: test_Id
Row 0: 0

Table: customer (2 rows)
Column: name
Row 0: Bill
Row 1: Sue
Column: company
Row 0: Bill's Co.
Row 1: Sue's Co.
Column: test_Id
Row 0: 0
Row 1: 0

However when "test.xml" looks like this (NOTE: the only difference
is in the <product> element):

<?xml version="1.0" standalone="yes"?>
<test>
<product>
<name>Product 1</name>
</product>
<customer>
<name>Bill</name>
<company>Bill's Co.</company>
</customer>
<customer>
<name>Sue</name>
<company>Sue's Co.</company>
</customer>
</test>

The output looks like this:

DataSet name: test

Table: product (1 rows)
Column: name
Row 0: Product 1

Table: customer (2 rows)
Column: name
Row 0: Bill
Row 1: Sue
Column: company
Row 0: Bill's Co.
Row 1: Sue's Co.

Questions:

1) I think I see why the test_Id column gets created "on the fly" for
the "customer" table in the first XML example. A "test" table got
created which might have multiple rows. So each row of the "customer"
table has to have some way of relating back to its parent table row,
which is what "test_Id" contains. But why does the "test" table
itself need a "test_Id" column?

2) In the first example a "test" table got created and the name of the
DataSet stayed at the default "NewDataSet". In the second example there
was no "test" table created. Instead the name of the DataSet is "test"
and there is a "product" table and "customer" table. Since there is no
"test" table there is no need for the "test_Id" column in any of the
tables that do exist. Why does it work this way? Why would just changing
that one element (ie. <product>) make such a big difference in the way the
way the DataSet is constructed.

Any help appreciated. And again if you know of some good reference books
or links where these issues are discussed that would be fantastic.

Thanks in advance.

Bill

Nov 12 '05 #2
Thanks,

I've done some experimenting with schemas but I'll try some more.

Still, though, it seems to me like there is something more
fundamental going on here (possibly a bug?).

I just don't understand why just changing that one element (i.e.
<product>) should make such a difference in the way the DataSet
gets constructed? From everything I've read the
DataSet.DataSetName property represents the root element of the
XML hierarchy. My root element in both examples is "test". So
why is it that in one example, DataSetName is "test" and in the
other it's "NewDataSet"? Why is there a "test" table in one and
no "test" table in the other?

Also, another odd thing I've noticed is that when I call the
DataSet.WriteXML method, in both cases, the root node gets
correctly created as "test". Why would this be for the 1st
example where the DataSetName property is "NewDataSet"?

BTW, to further confuse the issue when I use VS.NET to create an
XML schema for each XML example in my original post, I get some
strange results.

For the first XML file, I get:

<xs:schema id="NewDataSet"...

For the 2nd XML file, I get:

<xs:schema id="test"...

I want the latter. There are times where I'll want to start
with a completely empty DataSet and add things to it. I was
planning on loading the schema to start out with in order to
get the tables and relations correct. But I found that when
I add something then call WriteXML, the root node in this
case is "NewDataSet" and the first sub-node is "test".
Obviously not what I want.

Very confusing.

thanks,

Bill

"Chris Lovett" <chris@!nospam!.net> wrote in message news:<vr************@corp.supernews.com>...
XML is hierarchical. DataSet is not. So there are some "hieristics" that
are used to "normalize" the XML into a relational shape. One of those
hieristics is that if an element contains children then it is mapped to a
table, just in case those children are repeated somewhere.

You can make the DataSet behave much more predictably by providing it with
an XML Schema. To see how this is done call WriteXml with
XmlWriteMode.WriteSchema. Then be sure to read this file back in using
XmlReadMode.ReadSchema.

You are using XmlReadMode.InferSchema which is one of those "let's help
people get started" kinds of features. Not really something you should use
in a "production" application.
"Bill C." <bi***********@yahoo.com> wrote in message
news:1e**************************@posting.google.c om...
Hi,

I've got a simple console app that just reads an XML file into a DataSet
then prints out a description of each table in the DataSet, including
column names and row values for each column. I'm getting some strange
results depending the input XML file I use. I was wondering if somebody
could help me understand what is going on or point me to a good reference.

The code for my program looks like this:

using System;
using System.Data;

static void Main(string[] args)
{
DataSet ds = new DataSet();
ds.ReadXml("test.xml");
PrintDataSet(ds);
}

public static void PrintDataSet(DataSet ds)
{
Console.WriteLine("DataSet name: " + ds.DataSetName);
foreach (DataTable dt in ds.Tables)
{
int rowCount = dt.Rows.Count;
Console.WriteLine("\nTable: " + dt.ToString() + " (" + rowCount +

" rows)");
foreach (DataColumn dc in dt.Columns)
{
Console.WriteLine("Column: " + dc.ColumnName);
for (int i = 0; i < rowCount; i++)
{
Console.WriteLine("Row " + i + ": " +

dt.Rows[i][dc].ToString());
}
}
}
}

When "test.xml" looks like this:

<?xml version="1.0" standalone="yes"?>
<test>
<product>Product 1</product>
<customer>
<name>Bill</name>
<company>Bill's Co.</company>
</customer>
<customer>
<name>Sue</name>
<company>Sue's Co.</company>
</customer>
</test>

The output looks like this:

DataSet name: NewDataSet

Table: test (1 rows)
Column: product
Row 0: Product 1
Column: test_Id
Row 0: 0

Table: customer (2 rows)
Column: name
Row 0: Bill
Row 1: Sue
Column: company
Row 0: Bill's Co.
Row 1: Sue's Co.
Column: test_Id
Row 0: 0
Row 1: 0

However when "test.xml" looks like this (NOTE: the only difference
is in the <product> element):

<?xml version="1.0" standalone="yes"?>
<test>
<product>
<name>Product 1</name>
</product>
<customer>
<name>Bill</name>
<company>Bill's Co.</company>
</customer>
<customer>
<name>Sue</name>
<company>Sue's Co.</company>
</customer>
</test>

The output looks like this:

DataSet name: test

Table: product (1 rows)
Column: name
Row 0: Product 1

Table: customer (2 rows)
Column: name
Row 0: Bill
Row 1: Sue
Column: company
Row 0: Bill's Co.
Row 1: Sue's Co.

Questions:

1) I think I see why the test_Id column gets created "on the fly" for
the "customer" table in the first XML example. A "test" table got
created which might have multiple rows. So each row of the "customer"
table has to have some way of relating back to its parent table row,
which is what "test_Id" contains. But why does the "test" table
itself need a "test_Id" column?

2) In the first example a "test" table got created and the name of the
DataSet stayed at the default "NewDataSet". In the second example there
was no "test" table created. Instead the name of the DataSet is "test"
and there is a "product" table and "customer" table. Since there is no
"test" table there is no need for the "test_Id" column in any of the
tables that do exist. Why does it work this way? Why would just changing
that one element (ie. <product>) make such a big difference in the way the
way the DataSet is constructed.

Any help appreciated. And again if you know of some good reference books
or links where these issues are discussed that would be fantastic.

Thanks in advance.

Bill

Nov 12 '05 #3
Ok, FYI, I think we figured it out.

If your root element has attributes, e.g.:

<rootElem attrib1="xyz">

or direct sub-elements of the form:

<subElem>whatever</subElem>

then a DataTable will get created within the DataSet with your root
element as its name. The DataSetName property will remain
"NewDataSet" (or whatever you passed to the DataSet constructor).

Why? Because the DataSet class represents such attributes and sub-
elements as COLUMNs in a DataTable. Therefore the root element
must be created as a DataTable containing these columns.

However, if the root element has no attributes or such sub-elements
then there are no columns to create and therefore no need for the
DataSet class to create a DataTable for the root element. In these
cases, the DataSetName property is just set to the root element name
and any direct sub-elements are created as DataTables contained in
the DataSet.

So, e.g., if your XML looks like this:

<rootElem>
<subElem>
<value>whatever</value>
</subElem>
</rootElem>

The DataSetName will be "rootElem" and there will just be one DataTable
called "subElem" contained within the DataSet.

The fact that it works this way causes some interesting problems:

1) If the root element gets created as a DataTable, then all child
DataTables of the root element DataTable will have a new column added
called "rootElem_Id". This contains the root element DataTable row index
for the row with which the child DataTable entry is associated.

However, this column is created transparently so you don't even know it's
there. So when a new row gets added to the child table, this value has to
be initialized to 0. If you don't do this you'll get the following
exception:

Unhandled Exception: System.InvalidOperationException: Token StartElement
in state Epilog would result in an invalid XML document.

which is not very useful in telling you what the real problem is and
it thus was very hard to debug.

2) When you create an XML schema in VS.NET from an XML file whose
root element has attributes or the previously described sub elements,
the schema gets created with a schema id of "NewDataSet" and there
is a sub-element with the "correct" root element name. At least in
our case, this is not what we wanted. We wanted the schema id to be
our root element name. The only way we could do this was to change
our XML so there were no attributes on the root element and no
sub-elements of the form <subElem>whatever</subElem>

Thanks for the help.

Bill

"Chris Lovett" <chris@!nospam!.net> wrote in message news:<vr************@corp.supernews.com>...
XML is hierarchical. DataSet is not. So there are some "hieristics" that
are used to "normalize" the XML into a relational shape. One of those
hieristics is that if an element contains children then it is mapped to a
table, just in case those children are repeated somewhere.

You can make the DataSet behave much more predictably by providing it with
an XML Schema. To see how this is done call WriteXml with
XmlWriteMode.WriteSchema. Then be sure to read this file back in using
XmlReadMode.ReadSchema.

You are using XmlReadMode.InferSchema which is one of those "let's help
people get started" kinds of features. Not really something you should use
in a "production" application.
"Bill C." <bi***********@yahoo.com> wrote in message
news:1e**************************@posting.google.c om...
Hi,

I've got a simple console app that just reads an XML file into a DataSet
then prints out a description of each table in the DataSet, including
column names and row values for each column. I'm getting some strange
results depending the input XML file I use. I was wondering if somebody
could help me understand what is going on or point me to a good reference.

The code for my program looks like this:

using System;
using System.Data;

static void Main(string[] args)
{
DataSet ds = new DataSet();
ds.ReadXml("test.xml");
PrintDataSet(ds);
}

public static void PrintDataSet(DataSet ds)
{
Console.WriteLine("DataSet name: " + ds.DataSetName);
foreach (DataTable dt in ds.Tables)
{
int rowCount = dt.Rows.Count;
Console.WriteLine("\nTable: " + dt.ToString() + " (" + rowCount +

" rows)");
foreach (DataColumn dc in dt.Columns)
{
Console.WriteLine("Column: " + dc.ColumnName);
for (int i = 0; i < rowCount; i++)
{
Console.WriteLine("Row " + i + ": " +

dt.Rows[i][dc].ToString());
}
}
}
}

When "test.xml" looks like this:

<?xml version="1.0" standalone="yes"?>
<test>
<product>Product 1</product>
<customer>
<name>Bill</name>
<company>Bill's Co.</company>
</customer>
<customer>
<name>Sue</name>
<company>Sue's Co.</company>
</customer>
</test>

The output looks like this:

DataSet name: NewDataSet

Table: test (1 rows)
Column: product
Row 0: Product 1
Column: test_Id
Row 0: 0

Table: customer (2 rows)
Column: name
Row 0: Bill
Row 1: Sue
Column: company
Row 0: Bill's Co.
Row 1: Sue's Co.
Column: test_Id
Row 0: 0
Row 1: 0

However when "test.xml" looks like this (NOTE: the only difference
is in the <product> element):

<?xml version="1.0" standalone="yes"?>
<test>
<product>
<name>Product 1</name>
</product>
<customer>
<name>Bill</name>
<company>Bill's Co.</company>
</customer>
<customer>
<name>Sue</name>
<company>Sue's Co.</company>
</customer>
</test>

The output looks like this:

DataSet name: test

Table: product (1 rows)
Column: name
Row 0: Product 1

Table: customer (2 rows)
Column: name
Row 0: Bill
Row 1: Sue
Column: company
Row 0: Bill's Co.
Row 1: Sue's Co.

Questions:

1) I think I see why the test_Id column gets created "on the fly" for
the "customer" table in the first XML example. A "test" table got
created which might have multiple rows. So each row of the "customer"
table has to have some way of relating back to its parent table row,
which is what "test_Id" contains. But why does the "test" table
itself need a "test_Id" column?

2) In the first example a "test" table got created and the name of the
DataSet stayed at the default "NewDataSet". In the second example there
was no "test" table created. Instead the name of the DataSet is "test"
and there is a "product" table and "customer" table. Since there is no
"test" table there is no need for the "test_Id" column in any of the
tables that do exist. Why does it work this way? Why would just changing
that one element (ie. <product>) make such a big difference in the way the
way the DataSet is constructed.

Any help appreciated. And again if you know of some good reference books
or links where these issues are discussed that would be fantastic.

Thanks in advance.

Bill

Nov 12 '05 #4

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

reply views Thread by Grzegorz Kaczor | last post: by
reply views Thread by Grzegorz Kaczor | last post: by
4 posts views Thread by ignw82 | last post: by
reply views Thread by theintrepidfox | last post: by
31 posts views Thread by gamehack | last post: by
45 posts views Thread by simnav | last post: by
8 posts views Thread by Dox33 | last post: by

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.