how to use structured markup tools


I'm dealing with XML files in which there are lots of tags of the
following form: <a><b>x</b><c>y</c></a> (all of these letters are being
used as 'metalinguistic variables'). Not all of the tags in the file are
of that form, but that's the only type of tag I'm interested in. (For
the insatiably curious, I'm talking about a conversation log from MSN
Messenger.) What I need to do is to pull out all the x's and y's in a
form I can use. In other words, from...

.
.
<a><b>x1</b><c>y1</c></a>
.
.
<a><b>x2</b><c>y2</c></a>
.
.
<a><b>x3</b><c>y3</c></a>
.
.

...I would like to produce, for example,...

[ (x1,y1), (x2,y2), (x3,y3) ]

Now, I'm aware that there are extensive libraries for dealing with
marked-up text, but here's the thing: I think I have a reasonable
understanding of Python, but I use it in a Lisp-like way, and in
particular I only know the rudiments of how classes work. So here's
what I'm asking for:

Can anybody give me a rough idea how to come to grips with the problem
described above? Or even (dare to dream) example code? Any help will be
very much appreciated.

Peace,
STM

Jul 18 '05 #1
Sean McIlroy wrote:
I'm dealing with XML files in which there are lots of tags of the
following form: <a><b>x</b><c>y</c></a> (all of these letters are being
used as 'metalinguistic variables') Not all of the tags in the file are
of that form, but that's the only type of tag I'm interested in. (For
the insatiably curious, I'm talking about a conversation log from MSN
Messenger.) What I need to do is to pull out all the x's and y's in a
form I can use. In other words, from...
.
<a><b>x1</b><c>y1</c></a>
.
<a><b>x2</b><c>y2</c></a>
.
<a><b>x3</b><c>y3</c></a>
.
...I would like to produce, for example,...

[ (x1,y1), (x2,y2), (x3,y3) ]


how about:

from elementtree import ElementTree

TEXT = """\
<doc>
<a><b>x1</b><c>y1</c></a>
<a><b>x2</b><c>y2</c></a>
<a><b>x3</b><c>y3</c></a>
</doc>
"""

tree = ElementTree.XML(TEXT)

data = []

for elem in tree.findall(".//a"):
    data.append((elem.findtext("b"), elem.findtext("c")))

print data

=> [('x1', 'y1'), ('x2', 'y2'), ('x3', 'y3')]
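
On current Python versions, ElementTree ships in the standard library as xml.etree.ElementTree, so the same idea can be sketched without the separate elementtree package. This is an adaptation rather than the code as posted: the helper name extract_pairs and the extra <other> element are made up for illustration, and skipping <a> elements that lack a <b> or <c> child is an assumption about what a real log would need.

```python
import xml.etree.ElementTree as ET

TEXT = """\
<doc>
<a><b>x1</b><c>y1</c></a>
<other>ignored</other>
<a><b>x2</b><c>y2</c></a>
<a><b>x3</b><c>y3</c></a>
</doc>
"""


def extract_pairs(xml_text):
    """Return a list of (b_text, c_text) tuples, one per <a> element."""
    root = ET.fromstring(xml_text)
    pairs = []
    # iter("a") walks the whole tree, so <a> elements at any depth are found.
    for elem in root.iter("a"):
        b = elem.findtext("b")
        c = elem.findtext("c")
        # findtext returns None when the child is missing; skip such elements.
        if b is not None and c is not None:
            pairs.append((b, c))
    return pairs


data = extract_pairs(TEXT)
print(data)
```

No classes need to be written for this; the loop only calls methods on the element objects the parser hands back.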

more here:

http://effbot.org/zone/element-index.htm

</F>

Jul 18 '05 #2
Exactly what I was looking for. Thanks.

Jul 18 '05 #3
On Sat, 2005-03-19 at 00:14 -0800, Sean McIlroy wrote:
I'm dealing with XML files in which there are lots of tags of the
following form: <a><b>x</b><c>y</c></a> (all of these letters are being
used as 'metalinguistic variables') Not all of the tags in the file are
of that form, but that's the only type of tag I'm interested in. (For
the insatiably curious, I'm talking about a conversation log from MSN
Messenger.) What I need to do is to pull out all the x's and y's in a
form I can use. In other words, from...

.
.
<a><b>x1</b><c>y1</c></a>
.
.
<a><b>x2</b><c>y2</c></a>
.
.
<a><b>x3</b><c>y3</c></a>
.
.

...I would like to produce, for example,...

[ (x1,y1), (x2,y2), (x3,y3) ]

Now, I'm aware that there are extensive libraries for dealing with
marked-up text, but here's the thing: I think I have a reasonable
understanding of python, but I use it in a lisplike way, and in
particular I only know the rudiments of how classes work. So here's
what I'm asking for:

Can anybody give me a rough idea how to come to grips with the problem
described above? Or even (dare to dream) example code? Any help will be
very much appreciated.


There are many tools you can use to get this done in Python. Here's a
recipe using Amara (http://www.xml.com/pub/a/2005/01/19/amara.html):

DOC = """\
<matrix>
<a><b>x1</b><c>y1</c></a>
<a><b>x2</b><c>y2</c></a>
<a><b>x3</b><c>y3</c></a>
</matrix>
"""

from amara import binderytools

matrix = []
for row in binderytools.pushbind(u'a', string=DOC):
    matrix.append((unicode(row.b), unicode(row.c)))

print matrix

Which outputs:

[(u'x1', u'y1'), (u'x2', u'y2'), (u'x3', u'y3')]
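
For readers without Amara installed, a rough standard-library analogue is sketched below. It assumes that the point of pushbind here is to hand back one matching subtree at a time, and it swaps in xml.etree.ElementTree.iterparse to get similar behaviour; it is not Amara's API. The function name iter_rows is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

DOC = b"""\
<matrix>
<a><b>x1</b><c>y1</c></a>
<a><b>x2</b><c>y2</c></a>
<a><b>x3</b><c>y3</c></a>
</matrix>
"""


def iter_rows(source, tag="a"):
    """Yield (b, c) text pairs as each <tag> element finishes parsing."""
    # With the default events, iterparse reports an element once its end tag
    # has been read, so its children are already attached at that point.
    for _event, elem in ET.iterparse(source):
        if elem.tag == tag:
            yield (elem.findtext("b"), elem.findtext("c"))
            # Release the element's contents once the row has been consumed.
            elem.clear()


matrix = list(iter_rows(io.BytesIO(DOC)))
print(matrix)
```

iterparse also accepts a filename, so for a log on disk the BytesIO wrapper can be replaced by the path to the file.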

If your matrix actually has a variable or previously unknown number of
columns (e.g. <a><b>x1</b><c>y1</c><d>z1</d></a>), the following
version of the for loop is a more general solution:

for row in binderytools.pushbind(u'a', string=DOC):
    matrix.append(tuple([unicode(e) for e in row.xml_xpath(u'*')]))

Same output, of course. I even tested it for you in Amara 0.9.4. And
what the heck, while I was there, I added it to the demos.
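
The variable-column idea can also be sketched with the standard library alone. This again uses xml.etree.ElementTree instead of Amara, and the sample document with rows of different widths is invented for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """\
<matrix>
<a><b>x1</b><c>y1</c><d>z1</d></a>
<a><b>x2</b><c>y2</c></a>
</matrix>
"""

root = ET.fromstring(DOC)
# Iterating over an element yields its direct children in document order,
# so each row becomes a tuple as wide as the number of children it has.
matrix = [tuple(child.text for child in row) for row in root.iter("a")]
print(matrix)
```

Each tuple takes its length from the row it came from, so no column names have to be known in advance.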

You can make things even more obfuscated^H^H^H^H^H^H^H^H^H^H terse using
further lambda or list comp tricks, but I leave that as an exercise for
the perverse ;-)
--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com

Jul 18 '05 #4
