XML Cleaner

I was hoping that someone could point me in the right direction. I'm
looking to develop a tool that will run an XML file against an XSD
schema and if a node doesn't conform to the schema, remove that node
from the xml (or output a new xml without that node) and continue
through the whole document until it is "Clean" (valid).
The code to validate against the schema is straightforward, but how do I
use the exceptions thrown by the XmlValidatingReader to clean the XML file?
thanks!

Nov 12 '05 #1
First, you shouldn't let the validating reader throw exceptions, but call a
delegate you provide for the ValidationEventHandler.
Even so, as I explained in
http://weblogs.asp.net/cazzu/archive.../24/95588.aspx, you can't just
call sender.Skip() because due to a bug in v1.x, it's not set to the reader
raising the event.
Therefore, you will need to keep a reference to the reader at the class
level in a field, and in the validation handler method, skip the current
node:

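private void OnValidationError(object sender, ValidationEventArgs e)
{
    // Compare with ==; a single = is an assignment and does not compile here.
    if (e.Severity == XmlSeverityType.Error)
    {
        // Accumulate error, set flag.
        // _thereader is the class-level field holding the validating reader.
        _thereader.Skip();
    }
}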

That should do the job.
--
Daniel Cazzulino [MVP XML]
Clarius Consulting SA
http://weblogs.asp.net/cazzu
http://aspnet2.com
Nov 12 '05 #2
I don't think I was clear. I am calling a delegate to handle the event.
The question is, once I have accumulated all of the errors, how do I
then go and programmatically edit the XML document based on where those
errors are? The errors give a line number, but how do I map that to a
node in the XML document that I can then go and remove?
thanks!

Nov 12 '05 #3
Thanks for Daniel's quick response.

Hi Matthew,

First of all, I would like to confirm my understanding of your issue. From
your description, I understand that you need to remove the invalid nodes
from an XmlDocument against an XSD. If there is any misunderstanding,
please feel free to let me know.

Based on my experience, it's very hard to achieve this. As the validator
only returns the line number and line position of the invalid node, we
have to write our own code to map the file position to an XML node. However,
the errors reported by the validator refer only to certain key nodes, while
the underlying problem may involve other nodes. So removing a single
node might not make the document valid.
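
A minimal sketch of the "collect line number and position" step. It assumes the XmlReaderSettings validation API (available since .NET 2.0) instead of the v1.x XmlValidatingReader discussed in this thread; ValidationIssue and ErrorCollector are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical record of one reported schema error.
public sealed class ValidationIssue
{
    public int Line;
    public int Position;
    public string Message;
}

public static class ErrorCollector
{
    // Reads the whole document once and records where each schema error was reported.
    public static List<ValidationIssue> Collect(TextReader xml, XmlSchemaSet schemas)
    {
        var issues = new List<ValidationIssue>();

        var settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas = schemas;
        settings.ValidationEventHandler += (sender, e) =>
        {
            if (e.Severity == XmlSeverityType.Error)
            {
                issues.Add(new ValidationIssue
                {
                    Line = e.Exception.LineNumber,
                    Position = e.Exception.LinePosition,
                    Message = e.Message
                });
            }
        };

        using (XmlReader reader = XmlReader.Create(xml, settings))
        {
            // Reading to the end drives validation; errors arrive through the handler.
            while (reader.Read()) { }
        }
        return issues;
    }
}
```

Because a handler is attached, validation errors do not throw; the list simply fills up as the reader advances.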

HTH.

Kevin Yu
=======
"This posting is provided "AS IS" with no warranties, and confers no
rights."

Nov 12 '05 #4
You have the issue correct. As you say:
"As the validator only returns the line number and line position of the
invalid node, we have to write our own code to map the file position
to an XML node."
This is the code I would like to write. Can you advise how I would do this?
thanks!

Nov 12 '05 #5
Hi Matthew,

Generally, I think we have to write code that first finds the invalid node in
the XmlDocument according to the line and position, then removes this
node. After removing all the nodes in the list, validate the XmlDocument
again. We can repeat this until no errors are found. This is
just my suggestion. Let's see if any other community member
has better advice.
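
The find, remove, and re-validate loop described above could be sketched roughly as follows. This is an illustration under assumptions, not a tested tool: it uses XmlReaderSettings and LINQ to XML (later APIs than the v1.x classes in this thread), the XmlCleaner name is hypothetical, and it only handles errors reported exactly at an element's start position. Errors reported elsewhere (for example at an end tag) make it stop with an exception rather than guess.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

public static class XmlCleaner
{
    // Returns (line, position) of the first schema error, or null when the text is valid.
    static Tuple<int, int> FirstError(string xml, XmlSchemaSet schemas)
    {
        Tuple<int, int> first = null;
        var settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas = schemas;
        settings.ValidationEventHandler += (sender, e) =>
        {
            if (first == null && e.Severity == XmlSeverityType.Error)
                first = Tuple.Create(e.Exception.LineNumber, e.Exception.LinePosition);
        };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return first;
    }

    // Removes one offending element per pass and validates again until clean.
    public static string Clean(string xml, XmlSchemaSet schemas)
    {
        while (true)
        {
            Tuple<int, int> error = FirstError(xml, schemas);
            if (error == null)
                return xml; // no errors left

            XDocument doc = XDocument.Parse(
                xml, LoadOptions.PreserveWhitespace | LoadOptions.SetLineInfo);

            // Look for the element that starts exactly where the error was reported.
            XElement bad = doc.Descendants().FirstOrDefault(el =>
            {
                IXmlLineInfo info = el;
                return info.HasLineInfo()
                    && info.LineNumber == error.Item1
                    && info.LinePosition == error.Item2;
            });

            if (bad == null || bad == doc.Root)
                throw new InvalidOperationException(
                    "Could not map the error to a removable element.");

            bad.Remove();
            xml = doc.ToString(SaveOptions.DisableFormatting);
        }
    }
}
```

Each pass either removes one element or throws, so the loop always terminates. As noted earlier in the thread, removing the reported node is not guaranteed to be the right repair.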

HTH.

Kevin Yu

Nov 12 '05 #6
"write code that can find the invalid node in the XmlDocument according
to the line and position first" - can you help with this part?
thanks,
-Matthew

Nov 12 '05 #7
Have you tried my approach? Having the reader variable at the class level,
and skipping invalid nodes using Skip() method?

--
Daniel Cazzulino [MVP XML]
Nov 12 '05 #8
Unless I misunderstood your post, all that will allow me to do is to
accumulate a list of the errors. The problem is that the
XmlValidatingReader exception just gives a line number and I don't see
how to translate that into a node. If I misunderstood or you have a
solution, please let me know.
thanks!

Nov 12 '05 #9
Here's my "solution":
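using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.XPath;

public class MyLoader
{
    // Kept in a field because, in v1.x, the handler's sender is not the reader.
    XmlValidatingReader _reader;

    public XPathDocument LoadFilteredDocument(Stream theDoc)
    {
        _reader = new XmlValidatingReader(new XmlTextReader(theDoc));
        // Add your schemas
        // The event and its delegate type are both named ValidationEventHandler.
        _reader.ValidationEventHandler += new
            ValidationEventHandler(OnValidate);
        return new XPathDocument(_reader);
    }

    private void OnValidate(object sender, ValidationEventArgs e)
    {
        // Just skip the failing node.
        _reader.Skip();
    }
}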

HTH,

--
Daniel Cazzulino [MVP XML]
Nov 12 '05 #10
Hi Matthew,

As far as I can think, we would go through each line in the XML file
up to the position where the validation error occurs. Along the way, we count
how many tags we have passed and so find the node that causes the
error. This is quite complicated, and I can only provide a general idea.

It seems that Daniel has provided us with an example. I think his way is
better than mine. HTH.

Kevin Yu

Nov 12 '05 #11
Matthew Wieder wrote:
"write code that can find the invalid node in the XmlDocument according
to the line and position first" - can you help with this part?


Take a look at "Extending the DOM"
http://msdn.microsoft.com/library/en...asp?frame=true
It shows how to extend XmlDocument to support the IXmlLineInfo interface.
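
For readers without access to that article, here is a rough sketch of the idea, written from the description above rather than copied from the MSDN sample. The class names and the FindElementAt helper are hypothetical. It assumes the document is loaded through an XmlReader that implements IXmlLineInfo (XmlTextReader does).

```csharp
using System.Xml;

// Element that remembers where it started in the source text.
public class LineInfoElement : XmlElement, IXmlLineInfo
{
    private int _line;
    private int _position;

    internal LineInfoElement(string prefix, string localName, string ns, XmlDocument doc)
        : base(prefix, localName, ns, doc) { }

    internal void SetLineInfo(int line, int position)
    {
        _line = line;
        _position = position;
    }

    public int LineNumber { get { return _line; } }
    public int LinePosition { get { return _position; } }
    public bool HasLineInfo() { return _line > 0; }
}

public class LineInfoDocument : XmlDocument
{
    // Line info of the reader currently being loaded, if it provides any.
    private IXmlLineInfo _current;

    // Called by the loader for every element while the reader sits on that element.
    public override XmlElement CreateElement(string prefix, string localName, string namespaceURI)
    {
        var element = new LineInfoElement(prefix, localName, namespaceURI, this);
        if (_current != null && _current.HasLineInfo())
            element.SetLineInfo(_current.LineNumber, _current.LinePosition);
        return element;
    }

    public override void Load(XmlReader reader)
    {
        _current = reader as IXmlLineInfo;
        try { base.Load(reader); }
        finally { _current = null; }
    }

    // Returns the element recorded at the given position, or null if there is none.
    public XmlElement FindElementAt(int line, int position)
    {
        foreach (XmlNode node in SelectNodes("//*"))
        {
            var e = node as LineInfoElement;
            if (e != null && e.LineNumber == line && e.LinePosition == position)
                return e;
        }
        return null;
    }
}
```

A node returned by FindElementAt is part of the loaded tree, so it can be removed with node.ParentNode.RemoveChild(node).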

--
Oleg Tkachenko [XML MVP, XmlInsider]
http://blog.tkachenko.com
Nov 12 '05 #12
Thanks - I understand now. The drawback to this route is that it
doesn't allow me to display a list of the validation errors to the user
and have the user tell me which ones to fix - I must remove them as I
find them. I think that the user may want us to take care of certain
validation errors, but they may want to manually fix other errors
themselves, or they may not want to fix certain errors at all. Is there
some way to maintain a list of the validation errors and then iterate
through the list, fixing as we go?
thanks,
-Matthew
Nov 12 '05 #13
Thanks, that was very helpful and similar to what I need. I believe the
best way to proceed is to run the validator and maintain a list of the
bad element names, with their line numbers. Then, I would get the
elements matching that name from the document and iterate through until
I find the one with the matching line number. Once I find it, I can do
my repair work.
thanks!

Nov 12 '05 #14
Hi Matthew,

I'd like to know if this issue has been resolved yet. Is there anything
that I can help with? I'm still monitoring it. If you have any questions,
please feel free to post them in the community.

Kevin Yu

Nov 12 '05 #15
Using the implementation of the LineInfo XmlDocument class (thanks
Oleg!), I have a process which captures the errors in an array, then goes
back using the LineInfoDocument and compares the line and position info
of each node to the ones in my array. For some reason, the line and
position information is not aligned properly: an error in an
element which the XmlValidatingReader reports at Line Number 100 and Line
Position 100 matches a node in the LineInfoDocument at Line Number
99 and Line Position 99. The LineInfo implementation is available here:
http://msdn.microsoft.com/library/en...asp?frame=true

thanks!
Nov 12 '05 #16
Hi Matthew,

If you need to maintain a list of validation errors, I think we can go
through the whole document twice. The first time, we get the list of
errors and their positions. Then the second time we let the user choose
which ones to fix. We can keep the errors in document order so that each
error will be found at the correct position.

Kevin Yu

Nov 12 '05 #17
Not bad, but what about the following scenario:
The XML document has in it an element of type X which, according to the
schema, must contain a type Y element, but does not. Let's say that, instead
of removing the X element and thereby losing all the information
contained in it, all we need to do is add an empty element of type Y.
I understand this is a little different from the initial problem I
proposed, but I'm trying to cover all the scenarios as they come up.
How would we handle that?
thanks!
Nov 12 '05 #18
I have the following situation:
I loop through using the XmlValidatingReader to get a list of all the
validation errors (line and position number). I then loop through the
XML document with the XmlTextReader class, and keep calling reader.Read
until the reader.LineNumber equals the line number of the first error in
the list. I then execute:
XmlNode node = xmlLIDoc.ReadNode(reader);
node.ParentNode.RemoveChild(node);
(where xmlLIDoc is an instance of the XmlDocument class implemented with
the IXmlLineInfo interface)
but the value of node.ParentNode is undefined. It appears that the node
got "removed" from the XML document and hence is orphaned. How can I
get hold of the node in the XmlDocument so I can delete it?
thanks!

Nov 12 '05 #19
Hi Matthew,

Generally, if a node's ParentNode property is null, it means that the node
hasn't been added to the DOM tree, or that it is the document node itself,
which doesn't have a parent. So please check whether this node was ever
part of the tree or has already been removed from it.
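
A small sketch of that distinction, assuming a made-up two-item document: XmlDocument.ReadNode creates a new node from the reader and does not insert it into the tree, so its ParentNode is null, whereas a node selected from the tree has a parent and can be removed through it.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ReadNodeDemo
{
    public static void Main()
    {
        const string xml = "<root><item>1</item><item>2</item></root>";

        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // ReadNode builds a brand-new node from the reader. It is owned by doc
        // but not attached anywhere in doc's tree, so it has no parent.
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.ReadToFollowing("item");
            XmlNode copy = doc.ReadNode(reader);
            Console.WriteLine(copy.ParentNode == null); // True
        }

        // A node obtained from the tree itself can be removed through its parent.
        XmlNode inTree = doc.SelectSingleNode("/root/item[1]");
        inTree.ParentNode.RemoveChild(inTree);
        Console.WriteLine(doc.DocumentElement.ChildNodes.Count); // 1
    }
}
```

To delete the offending node, it has to be located inside the loaded document (for example by its recorded line information) rather than re-read from a separate reader.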

Kevin Yu

Nov 12 '05 #20
