Is RegEx a good choice for reading malformed xml?

I download xml logs from several servers every day and read the data out of
them using the XmlTextReader. But about 10% of them each day throw
exceptions because they are not well formed. I don't want to lose the data
in the files that won't load into an XmlDocument. So I was thinking of using
a RegEx function: send it a node name and have it return the InnerText.

Is this a good use for RegEx, or is there a better way to do what I want?
I'm not versed in RegEx either, so what would a RegEx expression look like
for this?

Thanks.
Jun 2 '06 #1
Terry,
| Is this a good use for RegEx, or is there a better way to do what I want?
IMHO the "better" way, i.e. the *correct* way, would be to fix the program
that is allegedly writing Xml so that it *actually writes* Xml (have it use
a "parser" and write well-formed Xml). Then your program would not (should
not) have an issue reading valid Xml!

For details see "Item 29 - Always Use a Parser" in Elliotte Rusty Harold's
excellent book "Effective XML - 50 Specific Ways to Improve Your XML" from
Addison-Wesley.
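
To make the "write it with a parser" point concrete, here is a minimal sketch. It is in Python rather than .NET (an assumption on my part, just to keep it short and runnable), and the element names "entry", "server" and "message" are hypothetical. The idea is that the writing program builds the entry through an XML library instead of concatenating strings, so special characters get escaped for it.

```python
import xml.etree.ElementTree as ET


def build_log_entry(server: str, message: str) -> str:
    """Serialize one log entry; the library escapes &, < and > in text."""
    entry = ET.Element("entry")  # hypothetical element names
    ET.SubElement(entry, "server").text = server
    ET.SubElement(entry, "message").text = message
    return ET.tostring(entry, encoding="unicode")


# Text that would break hand-concatenated XML.
xml_text = build_log_entry("web01", "disk usage > 90% & rising <check>")

# Round-trip through a strict parser to confirm the output is well formed.
parsed = ET.fromstring(xml_text)
```

Output built this way should load into a strict reader without exceptions, which removes the need for any salvage step downstream.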

Although RegEx could possibly parse the malformed Xml, what's to say the
source program won't write Xml bad enough that your regex can't read it
either?
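
For what the regex itself might look like, here is a rough sketch, again in Python as an assumption rather than .NET. The function name inner_texts and the sample tags are hypothetical. It takes a node name and returns the text between each matching open/close pair, and it also shows the weakness described above: an element that is never closed returns nothing.

```python
import re


def inner_texts(xml_text: str, node_name: str) -> list[str]:
    """Return the text between each <node_name ...> and </node_name> pair."""
    name = re.escape(node_name)
    pattern = re.compile(
        rf"<{name}(?:\s[^>]*)?>(.*?)</{name}\s*>",
        re.DOTALL,  # let "." span line breaks inside an element
    )
    return [m.group(1).strip() for m in pattern.finditer(xml_text)]


# Hypothetical malformed sample: <msg> is reopened instead of closed.
sample = "<log><host>web01</host><msg>started<msg><host>web02</host>"
hosts = inner_texts(sample, "host")  # both <host> elements are closed
msgs = inner_texts(sample, "msg")    # no </msg> anywhere, so no match
```

This does not decode entities, ignores CDATA, and gets confused by nested elements of the same name. The pattern syntax should carry over to the .NET Regex class, where RegexOptions.Singleline plays the role of re.DOTALL.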

Before using RegEx to parse info out of the files that throw an exception, I
would consider using alternate Xml parsers/readers, such as the SgmlReader
from GotDotNet:

http://www.gotdotnet.com/Community/U...4-c3bd760564bc
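
A sketch of the "strict parser first, lenient reader second" idea follows. This is not SgmlReader. As a stand-in I am assuming Python's standard html.parser.HTMLParser, which tolerates unclosed tags and stray ampersands. The names read_values and _TextCollector are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Tolerant tag scanner: collects text found inside one tag name."""

    def __init__(self, node_name: str) -> None:
        super().__init__()
        # HTMLParser lowercases tag names, so matching is case-insensitive.
        self.node_name = node_name.lower()
        self.inside = False
        self.values: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == self.node_name:
            self.inside = True
            self.values.append("")

    def handle_endtag(self, tag):
        if tag == self.node_name:
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.values[-1] += data


def read_values(xml_text: str, node_name: str):
    """Try the strict XML parser; fall back to the lenient scanner."""
    try:
        root = ET.fromstring(xml_text)
        return [el.text or "" for el in root.iter(node_name)], "strict"
    except ET.ParseError:
        collector = _TextCollector(node_name)
        collector.feed(xml_text)
        collector.close()  # flush any buffered text
        return [v.strip() for v in collector.values], "lenient"
```

Returning which path was taken lets you log the files that needed the fallback, so the program that writes them can still be fixed.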

Some RegEx resources:

Expresso:
http://www.ultrapico.com/Expresso.htm

RegEx Workbench:
http://www.gotdotnet.com/Community/U...1-4ee2729d7322

A tutorial & reference on using regular expressions:
http://www.regular-expressions.info/

The MSDN's documentation on regular expressions:
http://msdn.microsoft.com/library/de...geElements.asp

Expresso & RegEx Workbench are helpful tools for learning regular
expressions & testing them.

I use regular-expressions.info as a general regex reference, then fall
back to MSDN for the specifics. The above link is .NET 1.x; I don't have the
.NET 2.0 link handy; not sure if anything changes in 2.0.

--
Hope this helps
Jay B. Harlow [MVP - Outlook]
.NET Application Architect, Enthusiast, & Evangelist
T.S. Bradley - http://www.tsbradley.net
"Terry Olsen" <to******@hotmail.com> wrote in message
news:up**************@TK2MSFTNGP05.phx.gbl...
Jun 9 '06 #2
