Strange behaviour reading XML into DataSet

Hi,

I've got a simple console app that just reads an XML file into a DataSet
then prints out a description of each table in the DataSet, including
column names and row values for each column. I'm getting some strange
results depending on the input XML file I use. I was wondering if somebody
could help me understand what is going on or point me to a good reference.

The code for my program looks like this:

using System;
using System.Data;

class Program
{
    static void Main(string[] args)
    {
        DataSet ds = new DataSet();
        ds.ReadXml("test.xml");
        PrintDataSet(ds);
    }

    // Prints each table in the DataSet with its column names and row values.
    public static void PrintDataSet(DataSet ds)
    {
        Console.WriteLine("DataSet name: " + ds.DataSetName);
        foreach (DataTable dt in ds.Tables)
        {
            int rowCount = dt.Rows.Count;
            Console.WriteLine("\nTable: " + dt.ToString() + " (" + rowCount + " rows)");
            foreach (DataColumn dc in dt.Columns)
            {
                Console.WriteLine("Column: " + dc.ColumnName);
                for (int i = 0; i < rowCount; i++)
                {
                    Console.WriteLine("Row " + i + ": " + dt.Rows[i][dc].ToString());
                }
            }
        }
    }
}

When "test.xml" looks like this:

<?xml version="1.0" standalone="yes"?>
<test>
    <product>Product 1</product>
    <customer>
        <name>Bill</name>
        <company>Bill's Co.</company>
    </customer>
    <customer>
        <name>Sue</name>
        <company>Sue's Co.</company>
    </customer>
</test>

The output looks like this:

DataSet name: NewDataSet

Table: test (1 rows)
Column: product
Row 0: Product 1
Column: test_Id
Row 0: 0

Table: customer (2 rows)
Column: name
Row 0: Bill
Row 1: Sue
Column: company
Row 0: Bill's Co.
Row 1: Sue's Co.
Column: test_Id
Row 0: 0
Row 1: 0

However when "test.xml" looks like this (NOTE: the only difference
is in the <product> element):

<?xml version="1.0" standalone="yes"?>
<test>
    <product>
        <name>Product 1</name>
    </product>
    <customer>
        <name>Bill</name>
        <company>Bill's Co.</company>
    </customer>
    <customer>
        <name>Sue</name>
        <company>Sue's Co.</company>
    </customer>
</test>

The output looks like this:

DataSet name: test

Table: product (1 rows)
Column: name
Row 0: Product 1

Table: customer (2 rows)
Column: name
Row 0: Bill
Row 1: Sue
Column: company
Row 0: Bill's Co.
Row 1: Sue's Co.

Questions:

1) I think I see why the test_Id column gets created "on the fly" for
the "customer" table in the first XML example. A "test" table got
created which might have multiple rows. So each row of the "customer"
table has to have some way of relating back to its parent table row,
which is what "test_Id" contains. But why does the "test" table
itself need a "test_Id" column?

2) In the first example a "test" table got created and the name of the
DataSet stayed at the default "NewDataSet". In the second example there
was no "test" table created. Instead the name of the DataSet is "test"
and there is a "product" table and "customer" table. Since there is no
"test" table there is no need for the "test_Id" column in any of the
tables that do exist. Why does it work this way? Why would just changing
that one element (i.e. <product>) make such a big difference in the
way the DataSet is constructed?

Any help appreciated. And again if you know of some good reference books
or links where these issues are discussed that would be fantastic.

Thanks in advance.

Bill
Nov 12 '05 #1
XML is hierarchical. DataSet is not. So there are some "heuristics" that
are used to "normalize" the XML into a relational shape. One of those
heuristics is that if an element contains children then it is mapped to a
table, just in case those children are repeated somewhere.

You can make the DataSet behave much more predictably by providing it with
an XML Schema. To see how this is done, call WriteXml with
XmlWriteMode.WriteSchema. Then be sure to read this file back in using
XmlReadMode.ReadSchema.

You are effectively using XmlReadMode.InferSchema (ReadXml with no mode
argument falls back to it when no schema is available), which is one of
those "let's help people get started" kinds of features. Not really
something you should use in a "production" application.
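
A minimal sketch of the round trip suggested above, assuming the first sample document from the original post is supplied as a string rather than a file. The class and method names (SchemaRoundTrip, WriteWithSchema, ReadWithSchema) are made up for illustration. The schema is inferred once, written inline with XmlWriteMode.WriteSchema, and the result is reloaded with XmlReadMode.ReadSchema so nothing has to be inferred on the second read.

```csharp
using System;
using System.Data;
using System.IO;

public static class SchemaRoundTrip
{
    // First sample document from the original post.
    public const string SampleXml =
        "<test>" +
        "<product>Product 1</product>" +
        "<customer><name>Bill</name><company>Bill's Co.</company></customer>" +
        "<customer><name>Sue</name><company>Sue's Co.</company></customer>" +
        "</test>";

    // Infers a schema from the XML once, then returns the data with the schema inline.
    public static string WriteWithSchema(string xml)
    {
        var inferred = new DataSet();
        inferred.ReadXml(new StringReader(xml)); // no schema available, so one is inferred

        var buffer = new StringWriter();
        inferred.WriteXml(buffer, XmlWriteMode.WriteSchema);
        return buffer.ToString();
    }

    // Loads a document produced by WriteWithSchema using its inline schema.
    public static DataSet ReadWithSchema(string xmlWithSchema)
    {
        var ds = new DataSet();
        ds.ReadXml(new StringReader(xmlWithSchema), XmlReadMode.ReadSchema);
        return ds;
    }

    public static void Main()
    {
        DataSet ds = ReadWithSchema(WriteWithSchema(SampleXml));
        foreach (DataTable table in ds.Tables)
        {
            Console.WriteLine(table.TableName + ": " + table.Rows.Count + " row(s)");
        }
    }
}
```

In a real application the schema would more likely be written once (by hand or by a tool), kept next to the data, and loaded before any data is read, so the table layout no longer depends on the shape of a particular input file.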
Nov 12 '05 #2
Thanks,

I've done some experimenting with schemas but I'll try some more.

Still, though, it seems to me like there is something more
fundamental going on here (possibly a bug?).

I just don't understand why just changing that one element (i.e.
<product>) should make such a difference in the way the DataSet
gets constructed. From everything I've read, the
DataSet.DataSetName property represents the root element of the
XML hierarchy. My root element in both examples is "test". So
why is it that in one example DataSetName is "test" and in the
other it's "NewDataSet"? Why is there a "test" table in one and
no "test" table in the other?

Also, another odd thing I've noticed is that when I call the
DataSet.WriteXml method, in both cases, the root node gets
correctly created as "test". Why would this be for the 1st
example, where the DataSetName property is "NewDataSet"?

BTW, to further confuse the issue when I use VS.NET to create an
XML schema for each XML example in my original post, I get some
strange results.

For the first XML file, I get:

<xs:schema id="NewDataSet" ...

For the 2nd XML file, I get:

<xs:schema id="test"...

I want the latter. There are times where I'll want to start
with a completely empty DataSet and add things to it. I was
planning on loading the schema to start out with in order to
get the tables and relations correct. But I found that when
I add something then call WriteXml, the root node in this
case is "NewDataSet" and the first sub-node is "test".
Obviously not what I want.

Very confusing.

Thanks,

Bill
Nov 12 '05 #3
Ok, FYI, I think we figured it out.

If your root element has attributes, e.g.:

<rootElem attrib1="xyz">

or direct sub-elements of the form:

<subElem>whatever</subElem>

then a DataTable will get created within the DataSet with your root
element as its name. The DataSetName property will remain
"NewDataSet" (or whatever you passed to the DataSet constructor).

Why? Because the DataSet class represents such attributes and sub-
elements as COLUMNs in a DataTable. Therefore the root element
must be created as a DataTable containing these columns.

However, if the root element has no attributes or such sub-elements
then there are no columns to create and therefore no need for the
DataSet class to create a DataTable for the root element. In these
cases, the DataSetName property is just set to the root element name
and any direct sub-elements are created as DataTables contained in
the DataSet.

So, e.g., if your XML looks like this:

<rootElem>
    <subElem>
        <value>whatever</value>
    </subElem>
</rootElem>

The DataSetName will be "rootElem" and there will just be one DataTable
called "subElem" contained within the DataSet.
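
A small sketch that checks this rule against shortened versions of the two documents from the first post (one customer each, to keep it brief). The class and method names are made up for illustration. It also touches on question 1 from the first post: as far as I understand the inference process, the "test_Id" column is added to both tables, as the key on the parent "test" table and as the foreign key on "customer", and the nested relation between them is what Describe prints.

```csharp
using System;
using System.Data;
using System.IO;

public static class InferenceDemo
{
    // The root has a text-only child (<product>), which is inferred as a column.
    public const string RootWithSimpleChild =
        "<test><product>Product 1</product>" +
        "<customer><name>Bill</name></customer></test>";

    // Every child of the root has child elements of its own.
    public const string RootWithTableChildrenOnly =
        "<test><product><name>Product 1</name></product>" +
        "<customer><name>Bill</name></customer></test>";

    // Reads the XML with no schema, so the structure is inferred.
    public static DataSet Load(string xml)
    {
        var ds = new DataSet();
        ds.ReadXml(new StringReader(xml));
        return ds;
    }

    // Prints the DataSet name, its tables, and the key columns of each relation.
    public static void Describe(DataSet ds)
    {
        Console.WriteLine("DataSet name: " + ds.DataSetName);
        foreach (DataTable table in ds.Tables)
        {
            Console.WriteLine("  table: " + table.TableName);
        }
        foreach (DataRelation rel in ds.Relations)
        {
            Console.WriteLine("  relation: " + rel.RelationName + " (" +
                rel.ParentTable.TableName + "." + rel.ParentColumns[0].ColumnName + " -> " +
                rel.ChildTable.TableName + "." + rel.ChildColumns[0].ColumnName + ")");
        }
    }

    public static void Main()
    {
        Describe(Load(RootWithSimpleChild));
        Describe(Load(RootWithTableChildrenOnly));
    }
}
```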

The fact that it works this way causes some interesting problems:

1) If the root element gets created as a DataTable, then all child
DataTables of the root element DataTable will have a new column added
called "rootElem_Id". This contains the "rootElem_Id" key value of the
root element DataTable row with which the child DataTable entry is
associated.

However, this column is created transparently so you don't even know it's
there. So when a new row gets added to the child table, this value has to
be set to the parent row's key (0 in our case, with a single root row).
If you don't do this you'll get the following exception:

Unhandled Exception: System.InvalidOperationException: Token StartElement
in state Epilog would result in an invalid XML document.

which is not very useful in telling you what the real problem is, and
it was thus very hard to debug.
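
A hedged sketch of the workaround for problem 1, assuming the first sample document from the original post and the inferred column name "test_Id" shown in its output. The class and method names are made up for illustration. Rather than hard-coding 0, it copies the key from the parent row, which gives the same value here.

```csharp
using System;
using System.Data;
using System.IO;

public static class ChildRowDemo
{
    // First sample document from the original post.
    public const string SampleXml =
        "<test>" +
        "<product>Product 1</product>" +
        "<customer><name>Bill</name><company>Bill's Co.</company></customer>" +
        "<customer><name>Sue</name><company>Sue's Co.</company></customer>" +
        "</test>";

    // Loads the XML (schema inferred) and adds one customer under the single root row.
    public static DataSet LoadAndAddCustomer(string xml, string name, string company)
    {
        var ds = new DataSet();
        ds.ReadXml(new StringReader(xml));

        DataRow parent = ds.Tables["test"].Rows[0];
        DataTable customers = ds.Tables["customer"];

        DataRow added = customers.NewRow();
        added["name"] = name;
        added["company"] = company;
        // Link the new row to its parent through the hidden key column.
        added["test_Id"] = parent["test_Id"];
        customers.Rows.Add(added);
        return ds;
    }

    public static void Main()
    {
        DataSet ds = LoadAndAddCustomer(SampleXml, "Ann", "Ann's Co.");
        ds.WriteXml(Console.Out);
    }
}
```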

2) When you create an XML schema in VS.NET from an XML file whose
root element has attributes or the previously described sub-elements,
the schema gets created with a schema id of "NewDataSet" and there
is a sub-element with the "correct" root element name. At least in
our case, this is not what we wanted. We wanted the schema id to be
our root element name. The only way we could do this was to change
our XML so there were no attributes on the root element and no
sub-elements of the form <subElem>whatever</subElem>.

Thanks for the help.

Bill
Nov 12 '05 #4
