I'm dealing with XML files in which there are lots of tags of the
following form: <a><b>x</b><c>y</c></a> (all of these letters are being
used as 'metalinguistic variables'). Not all of the tags in the file are
of that form, but that's the only type of tag I'm interested in. (For
the insatiably curious, I'm talking about a conversation log from MSN
Messenger.) What I need to do is to pull out all the x's and y's in a
form I can use. In other words, from...
..
..
<a><b>x1</b><c>y1</c></a>
..
..
<a><b>x2</b><c>y2</c></a>
..
..
<a><b>x3</b><c>y3</c></a>
..
..
....I would like to produce, for example,...
[ (x1,y1), (x2,y2), (x3,y3) ]
Now, I'm aware that there are extensive libraries for dealing with
marked-up text, but here's the thing: I think I have a reasonable
understanding of python, but I use it in a lisplike way, and in
particular I only know the rudiments of how classes work. So here's
what I'm asking for:
Can anybody give me a rough idea how to come to grips with the problem
described above? Or even (dare to dream) example code? Any help will be
very much appreciated.
Peace,
STM
Sean McIlroy wrote:
[...]
...I would like to produce, for example,...
[ (x1,y1), (x2,y2), (x3,y3) ]
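how about:

# In Python 2.5 and later the same module ships in the standard
# library as xml.etree.ElementTree.
from elementtree import ElementTree

TEXT = """\
<doc>
<a><b>x1</b><c>y1</c></a>
<a><b>x2</b><c>y2</c></a>
<a><b>x3</b><c>y3</c></a>
</doc>
"""

tree = ElementTree.XML(TEXT)

# Collect the text of the <b> and <c> children of every <a> element.
data = []
for elem in tree.findall(".//a"):
    data.append((elem.findtext("b"), elem.findtext("c")))

print data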
=> [('x1', 'y1'), ('x2', 'y2'), ('x3', 'y3')]
more here: http://effbot.org/zone/element-index.htm
</F>
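For readers on a current Python, the same approach can be sketched with the standard library's xml.etree.ElementTree, so no separate install is needed. This is a minimal sketch and not part of the reply above: the sample document, the <note> element, and the function name extract_pairs are made up for illustration. It also assumes that an <a> element missing either <b> or <c> is one of the tags the question wants ignored, and skips it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input: the <note> element and the incomplete <a>
# stand in for the tags the question is not interested in.
TEXT = """\
<doc>
<note>ignored</note>
<a><b>x1</b><c>y1</c></a>
<a><b>only-b</b></a>
<a><b>x2</b><c>y2</c></a>
</doc>
"""

def extract_pairs(xml_text):
    """Return a list of (b, c) text pairs, one per complete <a> element."""
    root = ET.fromstring(xml_text)
    pairs = []
    for elem in root.iter("a"):
        b = elem.findtext("b")  # None when there is no <b> child
        c = elem.findtext("c")  # None when there is no <c> child
        if b is not None and c is not None:
            pairs.append((b, c))
    return pairs

print(extract_pairs(TEXT))
```

For a log stored on disk, ET.parse(path).getroot() could take the place of ET.fromstring(xml_text); the loop stays the same.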
Exactly what I was looking for. Thanks.
On Sat, 2005-03-19 at 00:14 -0800, Sean McIlroy wrote:
[...]
Can anybody give me a rough idea how to come to grips with the problem described above? Or even (dare to dream) example code? Any help will be very much appreciated.
There are many tools you can use to get this done in Python. Here's a
recipe using Amara ( http://www.xml.com/pub/a/2005/01/19/amara.html ):

DOC = """\
<matrix>
<a><b>x1</b><c>y1</c></a>
<a><b>x2</b><c>y2</c></a>
<a><b>x3</b><c>y3</c></a>
</matrix>
"""

from amara import binderytools

matrix = []
for row in binderytools.pushbind(u'a', string=DOC):
    matrix.append((unicode(row.b), unicode(row.c)))

print matrix
Which outputs:
[(u'x1', u'y1'), (u'x2', u'y2'), (u'x3', u'y3')]
If your matrix actually has a variable or previously unknown number of
columns (e.g. <a><b>x1</b><c>y1</c><d>z1</d></a> ), the following
version of the for loop is a more general solution:
for row in binderytools.pushbind(u'a', string=DOC):
    matrix.append(tuple([unicode(e) for e in row.xml_xpath(u'*')]))
Same output, of course. I even tested it for you in Amara 0.9.4. And
what the heck, while I was there, I added it to the demos.
You can make things even more obfuscated^H^H^H^H^H^H^H^H^H^H terse using
further lambda or list comp tricks, but I leave that as an exercise for
the perverse ;-)
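The variable-column idea also carries over to the standard library. The following is a rough sketch assuming Python 3 and xml.etree.ElementTree rather than Amara; the sample document and the name rows_as_tuples are hypothetical, chosen only for illustration. Each row element becomes a tuple with one entry per child, however many children it has.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input with rows of different widths.
DOC = """\
<matrix>
<a><b>x1</b><c>y1</c><d>z1</d></a>
<a><b>x2</b><c>y2</c></a>
</matrix>
"""

def rows_as_tuples(xml_text, row_tag="a"):
    """Return one tuple per row element, holding the text of each child."""
    root = ET.fromstring(xml_text)
    # Iterating over an element yields its direct children in document order;
    # an empty child (text is None) is recorded as an empty string.
    return [tuple(child.text or "" for child in row)
            for row in root.iter(row_tag)]

print(rows_as_tuples(DOC))
```

The list comprehension keeps it close to a lisplike map over the rows; an explicit for loop with append would do the same job.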
--
Uche Ogbuji Fourthought, Inc. http://uche.ogbuji.net http://4Suite.org http://fourthought.com
Use CSS to display XML, part 2 - http://www-128.ibm.com/developerwork...xmlcss2-i.html
Writing and Reading XML with XIST - http://www.xml.com/pub/a/2005/03/16/py-xml.html
Introducing the Amara XML Toolkit - http://www.xml.com/pub/a/2005/01/19/amara.ht
Be humble, not imperial (in design) - http://www.adtmag.com/article.asp?id=10286
Querying WordNet as XML - http://www.ibm.com/developerworks/xm...x-think29.html
Packaging XSLT lookup tables as EXSLT functions - http://www.ibm.com/developerworks/xm...-tiplook2.html