Tidy; how to make it XML-conform? <BR> needs to be closed

Hi

I have one question regarding Tidy (http://tidy.sourceforge.net). My
source XML file has a lot of unclosed <BR> tags. Which option do I
need (in my Tidy config file) to close them as <BR/> and make valid XML
out of it?
regards
Rag.

Oct 23 '06 #1
In article <11*********************@e3g2000cwe.googlegroups.com>,
Ragnar <r@gnar.de> wrote:
>I have one question regarding Tidy (http://tidy.sourceforge.net). My
>source XML-file has got a lot of unclosed <BR>-tags. Which command do I
>need (in my tidy config-file) to close it <BR/> and make valid XML out
>of it?
Use the -asxml or -asxhtml flag.

-- Richard
Oct 23 '06 #2
* Ragnar wrote in comp.text.xml:
>I have one question regarding Tidy (http://tidy.sourceforge.net). My
>source XML-file has got a lot of unclosed <BR>-tags. Which command do I
>need (in my tidy config-file) to close it <BR/> and make valid XML out
>of it?
HTML Tidy is not designed to clean up arbitrary XML documents, so if by
"XML-file" you really mean some arbitrary XML document, then it might be
difficult to address your problem. If you mean "HTML" or "XHTML" instead,
then use the output-* family of options (output-xml, output-xhtml), or the
-asxml command line option, and ensure that you have not set the input-xml
flag.
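As a rough sketch of that advice, assuming the input really is HTML and the `tidy` executable is on the PATH, the helper below builds a command line with `-asxml` (or `-asxhtml`) and deliberately never adds `-xml`, the switch that tells Tidy its *input* is XML (i.e. sets input-xml). The function names are hypothetical, and any config file passed in must not contain `input-xml: yes` either.

```python
import shutil
import subprocess


def build_tidy_command(src, config=None, out=None, xhtml=False):
    """Build a Tidy command line that turns HTML input into XML/XHTML output.

    -xml is intentionally never added: it would declare the input as XML
    (input-xml), which is exactly the setting that must stay off here.
    """
    cmd = ["tidy", "-quiet", "-asxhtml" if xhtml else "-asxml"]
    if config is not None:
        cmd += ["-config", config]  # options file, e.g. config.txt from the thread
    if out is not None:
        cmd += ["-o", out]          # write the result to a file instead of stdout
    cmd.append(src)
    return cmd


def run_tidy(src, **kwargs):
    """Run Tidy if installed; return its exit status, or None if tidy is missing."""
    if shutil.which("tidy") is None:
        return None
    return subprocess.run(build_tidy_command(src, **kwargs)).returncode
```

Tidy's exit status is 0 for no problems, 1 for warnings and 2 for errors, so a caller can treat 2 as "output not trustworthy".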
--
Björn Höhrmann · mailto:bj****@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Oct 24 '06 #3
Thank you for your help. It is very important to get support because
I have to finish this today.

My command line looks like this: tidy -asxml -config config.txt old.xml

I get the same error as without "-asxml":

Error: unexpected </reference> in <BR>

That means it finds an unclosed <BR> tag inside the "reference" element.

To get rid of it I could use "no-xml" as the input format, but then Tidy
would transform my XML into an HTML structure, which is not what I want.
Ragnar

Oct 24 '06 #4
Another question regarding Tidy:

I want to use the COM wrapper of Tidy, and I have found the example below.
I don't know why "Stat As Long" is used. I tried to work without "Stat",
but I cannot call objTidyDoc.MethodName directly.

Dim objTidyDoc As TidyDocument
Dim Stat As Long
' Each call returns a status code, which Stat captures. libtidy itself uses
' 0 = success, 1 = warnings, 2 = errors, negative = severe error; the COM
' wrapper presumably passes these values through.
Set objTidyDoc = New TidyDocument
Stat = objTidyDoc.LoadConfig(strTidyConfig)
Stat = objTidyDoc.ParseFile(strFilePath & strXmlFileName)
Stat = objTidyDoc.CleanAndRepair()
Stat = objTidyDoc.RunDiagnostics()
Stat = objTidyDoc.SaveFile(strFilePath & strXmlFileName)

Oct 24 '06 #5
Now I know how to use the COM wrapper, but my main question is still
open:

How can I transform this source XML into valid XML without the
workaround of producing HTML output? I don't want HTML tags like
<HEAD> and <BODY> around it.

http://www.ticope.de/tmp/source.xml/download

Help is VERY much appreciated; this task has kept me busy for too long.
Rag.

Oct 26 '06 #6
If your input isn't HTML, Tidy may not be able to help you, and nothing
else out there is likely to be able to read your mind and guess that you
intended <BR> tags to autoterminate.

Since you know that *was* your intent, how about just doing a text-level
global replace of <BR> with <BR/>?
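A minimal sketch of that text-level replace in Python, assuming the unclosed tags are plain `<BR>` (any case, optional trailing spaces) with no attributes. It works on the raw text, so it must run before any XML parser sees the file; it is crude and would also rewrite `<BR>` inside comments or CDATA sections. The function name is hypothetical.

```python
import re

# Matches <BR>, <br>, <Br >, ... but not an already-closed <BR/>
# (the "/" is not whitespace) and not <BR> tags that carry attributes.
_UNCLOSED_BR = re.compile(r"<br\s*>", re.IGNORECASE)


def close_br_tags(text):
    """Rewrite every unclosed <BR> as the self-closing <BR/>."""
    return _UNCLOSED_BR.sub("<BR/>", text)
```

For example, `close_br_tags("<reference>a<BR>b</reference>")` yields `<reference>a<BR/>b</reference>`, which an XML parser will accept.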
Oct 26 '06 #7

Joseph Kesselman wrote:
>Since you know that *was* your intent, how about just doing a text-level
>global replace of <BR> with <BR/>?
Joseph,
that is a very nice idea.

It could look like this (assuming <BR> appears in the "reference" element):
Set objDOMnode = objDom.selectSingleNode("//reference")
If Not objDOMnode Is Nothing Then
    strReference = objDOMnode.Text
End If
strReference = Replace(strReference, "<BR>", "<BR/>", 1, -1, vbTextCompare)

But I don't get a value in strReference, which means that the XML has to
be valid before working with XMLDOM. Am I right? I checked it by closing
the <BR/> tags manually; then I do get a value for strReference.

Oct 26 '06 #8
Ragnar wrote:
>But I dont get a value in strReference which means that XML has to be
>valid before working with XMLDOM.
XML has to be well-formed before using any XML tools. An unterminated
element, such as your <BR>, is not well-formed XML. Fix it first.
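This can be checked directly. A minimal sketch using Python's standard `xml.etree.ElementTree` (swapped in here for the MSXML XMLDOM used in the thread): any conforming parser rejects the unterminated `<BR>`, so the text-level fix has to be applied to the raw file before the DOM is built. The helper name is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
    except ET.ParseError:  # e.g. "mismatched tag" for an unclosed <BR>
        return False
    return True
```

`is_well_formed("<reference>a<BR>b</reference>")` returns False, while the same text with `<BR/>` returns True.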

--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
Oct 26 '06 #9

Ragnar wrote:
>How can I transform this source-xml into valid xml without using the
>workaround of getting an HTML-output?
Find some non-Tidy, Tidy-like XML tool? Maybe write one for your
specific task?

Tidy uses an approximation of an SGML parser and a tag-soup strainer to
take "approximate HTML", turn it into the best-guess internal
(DOM-like) model of the intended page, then serialise it accurately.
This relies on three things that you don't have available:

* SGML parsing (omitted tags can often be inferred cleanly)
* A known HTML DTD
* Fix-up code outside the SGML parser that has assumed HTML-soup
behaviours coded explicitly into it.

If your problem is "bad XML" that isn't even approximating HTML, then I
sympathise, but Tidy has three of its hands tied.

Why is your bad XML bad? What's the problem? Can you build some specific
tool that fixes some specific problem? Even if it has to work with
simple text-file processing and can't support more than one encoding,
it might be enough.
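One possible shape for such a single-purpose tool, as a sketch under stated assumptions: the file is in one known encoding (UTF-8 is assumed here), and the only defect is HTML-style empty elements (`<BR>` and `<HR>` are used as the example list) left unclosed, possibly with attributes. It is plain text processing, so an attribute value containing `>` would defeat it. All names are hypothetical.

```python
import re

# <br>, <BR clear="all">, <hr class="x"> ... but not ones already ending in "/>".
_EMPTY_TAG = re.compile(r"<(br|hr)(\s[^<>]*?)?\s*(?<!/)>", re.IGNORECASE)


def close_empty_tags(text):
    """Turn unclosed <BR ...>/<HR ...> tags into self-closing ones."""
    def repl(m):
        # Keep attributes, drop trailing whitespace or a stray "/".
        attrs = (m.group(2) or "").rstrip(" \t\r\n/")
        return "<%s%s/>" % (m.group(1), attrs)
    return _EMPTY_TAG.sub(repl, text)


def fix_file(src, dst, encoding="utf-8"):
    """Read src in one fixed encoding, close the tags, write dst unchanged otherwise."""
    with open(src, encoding=encoding, newline="") as f:  # keep original line endings
        text = f.read()
    with open(dst, "w", encoding=encoding, newline="") as f:
        f.write(close_empty_tags(text))
```

Keeping the tool this narrow is the point: it fixes exactly the known defect and leaves everything else byte-for-byte alone.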

I've done a lot of work with RSS, which is only approximate XML at best
and often significantly invalid. Typically it includes HTML entity
references (e.g. &eacute;) that aren't part of XML. It's not too hard to
scan the whole document with a crude entity reference expander that can
map these (from a known list) onto the numeric form. I usually try to
XML parse them, then if this fails I check for the presence of such
entities, convert them and then attempt to re-parse.
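The parse, convert, re-parse strategy might look like this minimal Python sketch. It assumes the "known list" is the HTML 4 entity table in Python's `html.entities.name2codepoint`; the five entities XML predefines and any unknown names are left untouched. The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Entities XML already defines; these must not be rewritten.
_XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
_ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand_html_entities(text):
    """Map known HTML entity references (e.g. &eacute;) to numeric form (&#233;)."""
    def repl(match):
        name = match.group(1)
        if name in _XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave predefined or unknown references alone
        return "&#%d;" % name2codepoint[name]
    return _ENTITY_REF.sub(repl, text)


def parse_lenient(text):
    """Try a normal XML parse; on failure, expand HTML entities and re-parse once."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return ET.fromstring(expand_html_entities(text))
```

`parse_lenient("<t>caf&eacute;</t>")` fails on the first attempt ("undefined entity"), then succeeds after `&eacute;` becomes `&#233;`; if the second parse also fails, the `ParseError` propagates to the caller.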

Oct 27 '06 #10
