Function or approach to test if XML is well-formed? Design advice?


I periodically receive a 5+ MB XML document that I hand-load into SQL Server
using SQLXML running under a DTS process.

Unfortunately, the document is human-created, and (very unfortunately) often
has invalid elements, which breaks the bulk load. I've been managing this
problem by loading the document into Visual Studio and using it to identify
the offending line numbers, and fixing it by hand. Given my truly minimal
coding skills when I designed the app way back when, this was the best
approach.

I'd like to automate this crummy work now.

Using ASP, is there a way to generically test whether a given snippet of XML
is well-formed?

I've mapped out a very crude approach: walk through the document with ASP's
regex matching, assign the contents between <item> and </item> to a variable,
test that section for well-formedness, parse out the pieces of information I
want, insert them into the database, and loop until the job is done.
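
The extraction step of that approach could look roughly like the sketch below. It is only an illustration: ExtractItems is a made-up helper name, and the pattern assumes VBScript 5.5 or later (for the non-greedy quantifier) and <item> elements that are not nested inside one another.

```vbscript
' Hypothetical helper (not from the thread): returns an array of
' "<item>...</item>" strings found in xmlText, or an empty array.
Function ExtractItems(xmlText)
    Dim re, matches, result, i
    Set re = New RegExp
    re.Global = True                       ' find every match, not just the first
    re.Pattern = "<item\b[\s\S]*?</item>"  ' non-greedy; [\s\S] spans line breaks

    Set matches = re.Execute(xmlText)
    If matches.Count = 0 Then
        ExtractItems = Array()
        Exit Function
    End If

    ReDim result(matches.Count - 1)
    For i = 0 To matches.Count - 1
        result(i) = matches(i).Value
    Next
    ExtractItems = result
End Function
```

One caveat: if an <item> is missing its closing tag, the match runs on to the next </item> and swallows the following item as well, so both end up in a single fragment that still has to be checked for well-formedness.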

If there are better approaches that can accommodate the occasionally broken
incoming XML, I'd love to hear suggestions. I do not have a formal CS
background, and sometimes what should be obvious is not.

Thanks,
-KF
Jul 19 '05 #1
FYI, I was using the colloquial meaning of "invalid" when I wrote
"...Unfortunately, the document is human-created, and (very unfortunately)
often has invalid elements..."

What I meant was that sometimes a given section of XML is not well-formed.
Validating against an XML schema isn't the important thing here.

-KF

Jul 19 '05 #2
I believe I may have found the answer to my own question. You can employ the
MSXML/XMLDOM parser to test, as follows:

<%@LANGUAGE="VBSCRIPT" CODEPAGE="1252"%>
<%
Dim mydoc, strXML
Set mydoc = Server.CreateObject("Microsoft.XMLDOM")

strXML = "<book><author>author1</author><title>title1</title></book>"
mydoc.loadXML strXML

' parseError.errorCode is non-zero when the string could not be parsed
If mydoc.parseError.errorCode <> 0 Then
    Response.Write "failure"
    ' error handling code, jump out of script
Else
    Response.Write "success"
    ' proceed with DB insert, etc...
End If
%>

4guys has a good article about all this:
http://www.4guysfromrolla.com/webtech/101200-1.shtml

Still curious if there's a better design approach for this. I'm wondering if
splitting the valid XML nodes into an array would be more efficient than
looping, but I don't know if it's possible to do the tests for
well-formedness if I try to generate an array.
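
On the array question, a rough sketch of one way it could work: keep the fragments in an array and build a parallel array of True/False flags from loadXML. WellFormedFlags and the sample fragments are made up for illustration, and CreateObject is used instead of Server.CreateObject only so the same code also runs outside ASP.

```vbscript
' Hypothetical helper (not from the thread): for an array of XML strings,
' returns a parallel array of Booleans, True where the string is well-formed.
Function WellFormedFlags(fragments)
    Dim doc, flags, i
    If UBound(fragments) < 0 Then
        WellFormedFlags = Array()
        Exit Function
    End If

    Set doc = CreateObject("Microsoft.XMLDOM")
    ReDim flags(UBound(fragments))
    For i = 0 To UBound(fragments)
        ' loadXML replaces the previous content and returns True or False
        flags(i) = doc.loadXML(fragments(i))
    Next
    WellFormedFlags = flags
End Function

' Example use with two made-up fragments
Dim fragments, flags, i
fragments = Array("<item><title>ok</title></item>", _
                  "<item><title>broken</item>")
flags = WellFormedFlags(fragments)
For i = 0 To UBound(fragments)
    If flags(i) Then
        ' proceed with DB insert for fragments(i)
    Else
        ' set fragments(i) aside for manual repair
    End If
Next
```

The array does not remove the loop, since every fragment still has to be parsed once. Also, a fragment that depends on namespace prefixes or entities declared elsewhere in the full document may fail when tested on its own.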

-KF

Jul 19 '05 #3
Ken,

The loadXML method is actually a function that returns True or False to
indicate whether the document loaded. A failure to load usually means the
XML is malformed. The parseError object then tells you more about the
failure:

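If objXMLDoc.parseError.errorCode <> 0 Then
    ' MsgBox is for client-side or WSH script; in server-side ASP,
    ' write the message with Response.Write instead.
    MsgBox "Parse Error line " & objXMLDoc.parseError.line & ", character " & _
        objXMLDoc.parseError.linePos & vbCrLf & objXMLDoc.parseError.srcText
End If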
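
Putting the return value and parseError together, a minimal sketch might look like this. ParseErrorMessage is a made-up helper name; the parseError properties it reads (line, linePos, reason) are the standard MSXML ones.

```vbscript
' Hypothetical helper (not from the thread): returns "" when xmlText is
' well-formed, otherwise a one-line description of the parse error.
Function ParseErrorMessage(xmlText)
    Dim doc
    Set doc = CreateObject("Microsoft.XMLDOM")

    If doc.loadXML(xmlText) Then
        ParseErrorMessage = ""
    Else
        With doc.parseError
            ' reason usually ends with a line break, so strip it
            ParseErrorMessage = "Line " & .line & ", character " & .linePos & _
                                ": " & Replace(.reason, vbCrLf, "")
        End With
    End If
End Function
```

In an ASP page the result could be written out with Response.Write Server.HTMLEncode(ParseErrorMessage(strXML)). The parser stops at the first problem it finds, so a document with several broken spots has to be re-tested after each fix.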

Brian

Jul 19 '05 #4
