
Is RegEx a good choice for reading malformed xml?

I download xml logs from several servers every day and read the data out of
them using the XmlTextReader. But about 10% of them each day throw
exceptions because they are not well formed. I don't want to lose the data
in the files that won't load into an XmlDocument. So I was thinking maybe
using a RegEx function, sending a Node Name to the function and having it
return the InnerText.

Is this a good use for RegEx, or is there a better way to do what I want?
I'm not versed in RegEx either, so what would a RegEx expression look like
for this?

Thanks.
Jun 2 '06 #1


Terry,
| Is this a good use for RegEx, or is there a better way to do what I want?
IMHO the "better" way, i.e. the *correct* way, would be to fix the program
that is allegedly writing Xml so that it *actually writes* Xml (have it use
a "parser" & write well-formed Xml). Then your program would not (should
not) have an issue reading valid Xml!

For details see "Item 29 - Always Use a Parser" in Elliotte Rusty Harold's
excellent book "Effective XML - 50 Specific Ways to Improve Your XML" from
Addison-Wesley.

Although RegEx could possibly parse the malformed Xml, what's to say the
source program won't write Xml so bad that even your regex can't read
it?
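
To make that trade-off concrete, here is a minimal sketch of the "node name
in, InnerText out" idea from the question. It is written in Python rather
than .NET purely for brevity, and everything in it is an assumption for
illustration: the helper name inner_texts, the sample logs, and the element
names are all made up. It tries a real parser first and only falls back to
a regex when the document is not well formed.

```python
import re
import xml.etree.ElementTree as ET


def inner_texts(xml_text, node_name):
    """Return the text of every <node_name> element (hypothetical helper)."""
    try:
        # Preferred path: a real parser. iter() walks the whole tree.
        root = ET.fromstring(xml_text)
        return ["".join(el.itertext()) for el in root.iter(node_name)]
    except ET.ParseError:
        pass  # Not well formed: fall through to the regex.

    # Fallback: <name ...>text</name>, matched non-greedily.
    # Known gaps: an element with no closing tag is skipped, nested
    # same-named elements are cut short, entities such as &amp; are
    # returned undecoded, and CDATA/comments are not understood.
    name = re.escape(node_name)
    pattern = re.compile(
        r"<" + name + r"(?:\s[^>]*)?(?<!/)>(.*?)</" + name + r"\s*>",
        re.DOTALL,
    )
    return pattern.findall(xml_text)


# Made-up sample logs.
well_formed = "<log><entry><msg>disk ok</msg></entry></log>"
unclosed_entry = "<log><entry><msg>disk full</msg><entry><msg>retry</msg></log>"
truncated = "<log><msg>first</msg><msg>second"

print(inner_texts(well_formed, "msg"))     # ['disk ok']
print(inner_texts(unclosed_entry, "msg"))  # ['disk full', 'retry']
print(inner_texts(truncated, "msg"))       # ['first']
```

The last call shows the risk: the regex recovers "first" but silently drops
"second" because its closing tag never arrives. The same pattern syntax
should carry over to the .NET Regex class, with RegexOptions.Singleline
playing the role of re.DOTALL.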

Before using RegEx to pull data out of the files that throw an exception, I
would consider using alternate Xml parsers/readers, such as the SgmlReader
from GotDotNet:

http://www.gotdotnet.com/Community/U...4-c3bd760564bc

Some RegEx resources:

Expresso:
http://www.ultrapico.com/Expresso.htm

RegEx Workbench:
http://www.gotdotnet.com/Community/U...1-4ee2729d7322

A tutorial & reference on using regular expressions:
http://www.regular-expressions.info/

The MSDN's documentation on regular expressions:
http://msdn.microsoft.com/library/de...geElements.asp

Expresso & RegEx Workbench are helpful tools for learning regular
expressions & testing them.

I use regular-expressions.info as a general regex reference, then fall
back to MSDN for the specifics. The above link is .NET 1.x; I don't have the
.NET 2.0 link handy; not sure if anything changed in 2.0.

--
Hope this helps
Jay B. Harlow [MVP - Outlook]
.NET Application Architect, Enthusiast, & Evangelist
T.S. Bradley - http://www.tsbradley.net
"Terry Olsen" <to******@hotmail.com> wrote in message
news:up**************@TK2MSFTNGP05.phx.gbl...
|I download xml logs from several servers every day and read the data out of
| them using the XmlTextReader. But about 10% of them each day throw
| exceptions because they are not well formed. I don't want to lose the data
| in the files that won't load into an XmlDocument. So I was thinking maybe
| using a RegEx function, sending a Node Name to the function and having it
| return the InnerText.
|
| Is this a good use for RegEx, or is there a better way to do what I want?
| I'm not versed in RegEx either, so what would a RegEx expression look like
| for this?
|
| Thanks.
|
|
Jun 9 '06 #2
