Hi,
Two issues remain with my Tidy work:
1) Tidy transforms an "&amp;" in the source XML into a "&" in the tidied
version. My XML importer cannot handle it.
2) In a long <title> string, Tidy inserts a line break, like this:
<title>my very long title blab la blab la
Blabla bla </title>
The importer also has problems with that.
My tidy.bat:
tidy.exe --output-xhtml yes --show-body-only yes --new-blocklevel-tags component,bblocation,title2,short_intro,long_intro,date,reference,category,image_small,image_medium,image_large,body2,external_link_text1,external_link_url1 --indent auto --write-back yes %1
regards
Ragnar
Ragnar wrote:
> 1) Tidy transforms a "&amp;" in the source-xml into a "&" in the tidied
> version.
Hold it a moment -- if your source is XML, why are you going through Tidy?
Having said that, this shouldn't happen in XHTML output mode. Contact
Tidy's authors, and/or show us a failing example so we can cross-check
this.
> 2) in a long <title>-string a wrap is produced like:
> <title>my very long title blab la blab la
> Blabla bla </title>
> Importer also has got problems with it
Turn off auto-indent.
--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
On Sat, 04 Nov 2006 10:17:58 -0500, Joe Kesselman
<ke************@comcast.net> wrote:
> Hold it a moment -- if your source is XML, why are you going through Tidy?
Is there a better way to check the well-formedness of an XML file than
tidy -xml?
-Timo
Timo Harmo wrote:
> Is there a better way to check the well-formedness of a xml-file than
> tidy -xml ?
Tidy is not primarily an XML tool. It's a tool for repairing
sloppily written HTML and XHTML.
To check well-formedness of XML, feed it to any proper XML parser. If
the parser doesn't accept it, the XML is not well-formed.
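For Timo's question, a minimal sketch of "feed it to any proper XML parser", assuming Python 3 and its standard-library `xml.etree.ElementTree`. That module uses expat, a non-validating parser, so it checks well-formedness only, not validity against a DTD or schema. The names `well_formedness_error` and `check_file` are hypothetical helpers, not part of any library.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_data):
    """Return None if xml_data (str or bytes) parses as XML,
    otherwise the parser's error message (which includes line and column)."""
    try:
        ET.fromstring(xml_data)
    except ET.ParseError as err:
        return str(err)
    return None


def check_file(path):
    """Read a file as raw bytes so the parser honours its encoding declaration."""
    with open(path, "rb") as f:
        return well_formedness_error(f.read())
```

Usage: `check_file("source.xml")` returns `None` for a well-formed file, or a message such as "not well-formed (invalid token): line ..., column ..." pointing at the first error.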
You never answered my question: If this is already XML, why are you
putting it through Tidy in the first place?
Ragnar wrote:
> Here is my file: http://www.ticope.de/tmp/source.xml
Not well formed, so it isn't XML, despite the file name. First obvious
error is that someone failed to put quotes around the value of the lang
attribute. I'd recommend you fix this where it originates, rather than
trying to patch it later by running it through Tidy, especially since
you say Tidy's doing things you don't expect.
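To see why the unquoted `lang` value matters: HTML permits unquoted attribute values, XML does not. A small sketch, assuming Python 3's standard library, that feeds the same made-up fragment to an HTML parser and to an XML parser; `AttrCollector`, `html_attrs` and `xml_error_position` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Records (tag, attrs) for every start tag the HTML parser sees."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))


def html_attrs(text):
    collector = AttrCollector()
    collector.feed(text)
    collector.close()
    return collector.seen


def xml_error_position(text):
    """Return (line, column) of the first well-formedness error, or None."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return err.position
    return None


snippet = '<p lang=de>Hallo</p>'
print(html_attrs(snippet))          # [('p', [('lang', 'de')])]
print(xml_error_position(snippet))  # a (line, column) tuple: rejected as XML
```

The HTML parser happily reads `lang=de`; the XML parser refuses the same bytes, which is why fixing the generator beats patching afterwards.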
Tried running the most recent copy of Tidy against your input file,
using your batch file. It is *NOT* damaging the &amp;. Either you're
confusing yourself badly (for example, looking at the text in an XML
tool, which of course will see &amp; as the & character since that's
what &amp; represents), or you're running a damaged copy of Tidy and
need to upgrade.
I'll bet on the former.
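A quick way to see the confusion described above, sketched with Python's standard-library ElementTree (the title text is a made-up example): once parsed, the text holds a plain `&`, while re-serializing writes `&amp;` back out.

```python
import xml.etree.ElementTree as ET

source = "<title>Tom &amp; Jerry</title>"
elem = ET.fromstring(source)

# What an XML-aware tool shows you: the character the reference stands for.
print(elem.text)                              # Tom & Jerry

# What is actually stored in the file: serializing escapes it again.
print(ET.tostring(elem, encoding="unicode"))  # <title>Tom &amp; Jerry</title>
```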
Oh, forgot to say: The only thing I did differently was that I named the
input file test.html.
I may also have accidentally dropped the "--write-back yes".
Still, this does suggest that Tidy isn't your problem.
Joe Kesselman wrote:
> Tried running the most recent copy of Tidy against your input file,
> using your batchfile. It is *NOT* damaging the &amp;. Either you're
> confusing yourself badly (for example, looking at the text in an XML
> tool, which of course will see &amp; as the & character since that's
> what &amp; represents), or you're running a damaged copy of Tidy and
> need to upgrade.
Hi Joe,
thank you so much for your work and help.
Yes, you might be right. I was confused by the tool, which presented
&amp; as &.
So you're saying I don't have well-formed XML and therefore I cannot use Tidy.
The content was exported automatically from an older version of a CMS,
and the rich-text fields were not XHTML-compliant. But you are right: I
should focus on optimizing the exporter rather than the importer.
Maybe it is enough to run Tidy there, or to do a lot of string
manipulation (Replace) in the phase where the content is exported
using SOAP.
Ragnar
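If the exporter goes the string-manipulation route mentioned above, the order of replacements matters. A sketch in Python, assuming plain-text field values (rich-text fields that already contain markup would need different handling, since escaping them turns the markup into literal text). `naive_escape` and `field_to_xml` are hypothetical names; `escape` and `quoteattr` are real functions in `xml.sax.saxutils`.

```python
from xml.sax.saxutils import escape, quoteattr


def naive_escape(text):
    # Buggy on purpose: '&' is replaced last, so the '&' that '&lt;'
    # just introduced gets escaped a second time.
    return text.replace("<", "&lt;").replace(">", "&gt;").replace("&", "&amp;")


def field_to_xml(name, lang, text):
    # escape() handles '&' first, then '<' and '>';
    # quoteattr() returns the attribute value already wrapped in quotes.
    return "<%s lang=%s>%s</%s>" % (name, quoteattr(lang), escape(text), name)


print(naive_escape("a < b"))                              # a &amp;lt; b
print(field_to_xml("title", "de", "Tom & Jerry <3"))
# <title lang="de">Tom &amp; Jerry &lt;3</title>
```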
Ragnar wrote:
> So you say I dont have wellformed xml and therefore I cannot use tidy.
Tidy's job is to (take an informed guess at how to) fix ill-formed HTML,
not ill-formed XML. And even there, it should be considered a stopgap,
used only because so few people (or tools!) produce officially correct HTML.
If you're working in XML, you should start by producing real XML. That
really shouldn't be hard to do.
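One way to produce real XML is to let a serializer handle quoting and escaping instead of concatenating strings. A minimal sketch with Python's standard-library ElementTree; the element names are borrowed from the `--new-blocklevel-tags` list in Ragnar's batch file, but the document layout and the `build_component` helper are assumptions.

```python
import xml.etree.ElementTree as ET


def build_component(lang, fields):
    """Build a <component> element from a dict of plain-text fields
    (hypothetical layout; dict order is preserved in the output)."""
    root = ET.Element("component", lang=lang)  # attribute values are always quoted
    for name, value in fields.items():
        child = ET.SubElement(root, name)
        child.text = value                     # '&', '<', '>' are escaped on output
    return root


doc = build_component("de", {"title2": "Tom & Jerry", "category": "News"})
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
# <component lang="de"><title2>Tom &amp; Jerry</title2><category>News</category></component>
```

Because the serializer owns the escaping, the output always round-trips through any XML parser.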
Joe Kesselman wrote:
> To check well-formedness of XML, feed it to any proper XML parser. If
> the parser doesn't accept it, the XML is not well-formed.
What would you suggest if it _isn't_ well-formed XML? (dodgy use of
HTML entities being an obvious "fixable" problem that springs to mind)
It's not an uncommon problem to have to deal with cruddy XML like this.
I'd be interested to hear what other people's favourite tools for
helping with it are.
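For the specific case Andy mentions, one common stopgap is rewriting HTML named entities (which XML does not predefine) as numeric character references. A sketch, assuming Python 3's `html.entities.name2codepoint` table (HTML 4 entity names); `html_entities_to_numeric` is a hypothetical helper. It leaves XML's five predefined entities and unknown names alone, and it does nothing about bare `&` characters, so it is exactly the kind of guess Joe warns about in his reply.

```python
import re
from html.entities import name2codepoint

# The five entities XML predefines; any other name must be declared or replaced.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def html_entities_to_numeric(text):
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)              # already legal XML
        if name in name2codepoint:
            return "&#%d;" % name2codepoint[name]
        return match.group(0)                  # unknown: let the parser report it
    return ENTITY_REF.sub(replace, text)


print(html_entities_to_numeric("caf&eacute;&nbsp;&amp; more"))
# caf&#233;&#160;&amp; more
```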
Andy Dingley wrote:
> What would you suggest if it _isn't_ well-formed XML? (dodgy use of
> HTML entities being an obvious "fixable" problem that springs to mind)
There really is no good way to repair a damaged document without deep
knowledge of exactly what the intended document structure was -- which
is why Tidy is such a complicated application; it needs to understand
HTML well enough to make intelligent guesses about what the author's
intent was.
The *best* you can hope to do is to sweep the problem under the carpet
and guess right most of the time.
So I would, very strongly, suggest fixing the problem at the source. If
it isn't well-formed XML, fix the tool that generated it.
--
Joe Kesselman / Beware the fury of a patient man. -- John Dryden