lxml question

Hi,

I have to parse some text which pretends to be XML. lxml does not want
to parse it, because it lacks a root element.
I think that this situation is not unusual, so: is there a way to
force lxml to parse it?

My workaround is wrapping the text with "<root>...</root>" before
feeding lxml's parser.

Greetings, Uwe
Sep 26 '08 #1
On Sep 26, 11:19 am, Uwe Schmitt <rocksportroc...@googlemail.com>
wrote:
> I have to parse some text which pretends to be XML. lxml does not want
> to parse it, because it lacks a root element.
> I think that this situation is not unusual, so: is there a way to
> force lxml to parse it?

By "pretends to be XML", do you mean XML-like but not really XML?

> My workaround is wrapping the text with "<root>...</root>" before
> feeding lxml's parser.

That's actually not a bad solution, if you know that the document is
otherwise well-formed. Another thing you can do is use libxml2's
"recover" mode which accommodates non-well-formed XML.

from lxml import etree

parser = etree.XMLParser(recover=True)  # tolerate non-well-formed input
tree = etree.XML(your_xml_string, parser)

You'll still need to use your wrapper root element, because recover
mode will ignore everything after the first root closes (and it won't
throw an error).

-- Mark.
Sep 26 '08 #2
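
The combination suggested in the reply above (a wrapper root plus recover mode) can be sketched roughly as follows. The helper name parse_rootless, its wrapper_tag parameter, and the sample fragment are made up for illustration and are not part of lxml. The sketch assumes the text carries no XML declaration or DOCTYPE of its own, since those cannot appear inside an element.

```python
from lxml import etree


def parse_rootless(text, wrapper_tag="root"):
    """Wrap rootless XML-like text in a synthetic root and parse it.

    parse_rootless and wrapper_tag are hypothetical names for this sketch.
    """
    # recover=True asks libxml2 to keep going past well-formedness errors
    parser = etree.XMLParser(recover=True)
    wrapped = "<%s>%s</%s>" % (wrapper_tag, text, wrapper_tag)
    return etree.fromstring(wrapped, parser)


# Sample input: two sibling elements with no common root
fragment = "<item>one</item><item>two</item>"
root = parse_rootless(fragment)

# The original top-level elements are now children of the wrapper
print([child.text for child in root])
```

Iterating over the returned element gives back the original top-level elements, so the synthetic root never has to show up in later processing.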
On Sep 27, 1:19 am, Uwe Schmitt <rocksportroc...@googlemail.com>
wrote:
> I have to parse some text which pretends to be XML. lxml does not want
> to parse it, because it lacks a root element.

Another option is BeautifulSoup, which handles badly formed XML really
well:

http://www.crummy.com/software/BeautifulSoup/
Sep 27 '08 #3
Uwe Schmitt wrote:
> I have to parse some text which pretends to be XML. lxml does not want
> to parse it, because it lacks a root element.
> I think that this situation is not unusual, so: is there a way to
> force lxml to parse it?
>
> My workaround is wrapping the text with "<root>...</root>" before
> feeding lxml's parser.

Yes, you can do that. To avoid creating an intermediate string, you can use
the feed parser and do something like this:

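from lxml import etree

parser = etree.XMLParser()
parser.feed("<root>")                    # open the synthetic root
parser.feed(your_xml_tag_sequence_data)  # the rootless content itself
parser.feed("</root>")                   # close the synthetic root
root = parser.close()                    # returns the root element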

Stefan
Oct 3 '08 #4
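
The feed-parser approach above extends naturally to input that arrives in pieces, which is where avoiding the intermediate string pays off. The following is a minimal sketch: parse_fragments, wrapper_tag, and the sample chunks are hypothetical names and data, not lxml API. As an added assumption, it falls back to the standard library's ElementTree when lxml is not installed, since both expose the same feed()/close() interface on XMLParser.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption for this sketch: the stdlib parser offers the same
    # feed()/close() interface, so it can stand in when lxml is missing.
    import xml.etree.ElementTree as etree


def parse_fragments(chunks, wrapper_tag="root"):
    """Feed an iterable of text chunks to the parser inside a synthetic root.

    parse_fragments and wrapper_tag are hypothetical names for this sketch.
    """
    parser = etree.XMLParser()
    parser.feed("<%s>" % wrapper_tag)
    for chunk in chunks:
        # Chunks may split the markup at arbitrary points
        parser.feed(chunk)
    parser.feed("</%s>" % wrapper_tag)
    return parser.close()


# Sample input: the second element is split across two chunks
root = parse_fragments(["<a>1</a>", "<b>2", "</b>"])
print([child.tag for child in root])
```

Because the chunks are fed one at a time, they can come from a file or socket read loop without first being joined into one large string.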
