searching an XML doc

Hello,

I've been reading about ElementTree and ElementPath so I could use
them to find the right elements in the DOM. Unfortunately, neither of
these seems to offer XPath-like capabilities where I can find elements
based on tag, attribute values etc. Are there any libraries which can
give me XPath-like functionality?

Thanks in advance
Jan 15 '08 #1
Gowri wrote:
> Hello,
>
> I've been reading about ElementTree and ElementPath so I could use
> them to find the right elements in the DOM. Unfortunately, neither of
> these seems to offer XPath-like capabilities where I can find elements
> based on tag, attribute values etc. Are there any libraries which can
> give me XPath-like functionality?

lxml does that.

Diez
Jan 15 '08 #2
On Jan 15, 3:49 pm, "Diez B. Roggisch" <de...@nospam.web.de> wrote:
> Gowri wrote:
>> Hello,
>> I've been reading about ElementTree and ElementPath so I could use
>> them to find the right elements in the DOM. Unfortunately, neither of
>> these seems to offer XPath-like capabilities where I can find elements
>> based on tag, attribute values etc. Are there any libraries which can
>> give me XPath-like functionality?
>
> lxml does that.
>
> Diez

Hi Diez

I was trying lxml out and was unable to find any examples that would
help me parse an XML file with namespaces. For example, my XML file
looks like this:

<phedexData xmlns="http://a.b.com/phedex"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://a.b.com/phedex requests.xsd">
<!-- Low priority replication request -->
<request id="1234" last_update="1060199000.0">
<status>
<approved>T1_RAL_MSS</approved>
<approved>T2_London_ICHEP</approved>
<disapproved>T2_Southgrid_Bristol</disapproved>
<pending/>
<move_pending/>
</status>
<subscription open="1" priority="0" type="replicate">
<items>
<dataset>/PrimaryDS1/ProcessedDS1/Tier</dataset>
<block>/PrimaryDS2/ProcessedDS2/Tier/block</block>
</items>
</subscription>
</request>
</phedexData>

If my XPath query is //request, it obviously will not work. Is there
some sort of namespace registration etc. that has to be done before
issuing a query? Example code would help a lot.
Jan 16 '08 #3
On Jan 15, 9:33 pm, Gowri <gowr...@gmail.com> wrote:
> Hello,
>
> I've been reading about ElementTree and ElementPath so I could use
> them to find the right elements in the DOM. Unfortunately, neither of
> these seems to offer XPath-like capabilities where I can find elements
> based on tag, attribute values etc. Are there any libraries which can
> give me XPath-like functionality?
>
> Thanks in advance

Create your query like:

ns0 = '{http://a.b.com/phedex}'

query = '%srequest/%sstatus' % (ns0, ns0)
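
The query construction above can be sketched end to end as follows. This is a minimal illustration, assuming Python 3 and only the standard library's xml.etree.ElementTree; the trimmed-down document and the names SAMPLE and NS0 are chosen for the example and are not part of the original post.

```python
import xml.etree.ElementTree as ET

# Cut-down version of the document from the question (illustrative only).
SAMPLE = """<phedexData xmlns="http://a.b.com/phedex">
  <request id="1234">
    <status>
      <approved>T1_RAL_MSS</approved>
      <approved>T2_London_ICHEP</approved>
    </status>
  </request>
</phedexData>"""

NS0 = "{http://a.b.com/phedex}"

root = ET.fromstring(SAMPLE)

# An unqualified path matches nothing, because every element in the
# document belongs to the default namespace.
unqualified = root.findall("request/status")

# Prefixing each step with {namespace-uri} matches the real tag names.
query = "%srequest/%sstatus" % (NS0, NS0)
qualified = root.findall(query)

print(len(unqualified), len(qualified))
print(qualified[0].tag)
```

The parser stores each tag as "{namespace-uri}localname", which is why the plain path finds nothing while the qualified one finds the status element.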

Also, although imperfect, some people have found this useful:

http://gflanagan.net/site/python/uti...tfilter.py.txt

test = '''<phedexData xmlns="http://a.b.com/phedex"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://a.b.com/phedex requests.xsd">
<!--  Low priority replication request -->
<request id="1234" last_update="1060199000.0">
<status>
<approved>T1_RAL_MSS</approved>
<approved>T2_London_ICHEP</approved>
<disapproved>T2_Southgrid_Bristol</disapproved>
<pending/>
<move_pending/>
</status>
<subscription open="1" priority="0" type="replicate">
<items>
<dataset>/PrimaryDS1/ProcessedDS1/Tier</dataset>
<block>/PrimaryDS2/ProcessedDS2/Tier/block</block>
</items>
</subscription>
</request>
</phedexData>
'''

from xml.etree import ElementTree as ET

root = ET.fromstring(test)

ns0 = '{http://a.b.com/phedex}'

from rattlebag.elementfilter import findall, data

#http://gflanagan.net/site/python/utils/elementfilter/elementfilter.py.txt

query0 = '%(ns)srequest/%(ns)sstatus' % {'ns': ns0}
query1 = '%(ns)srequest/%(ns)ssubscription[@type=="replicate"]/%(ns)sitems' % {'ns': ns0}
query2 = '%(ns)srequest[@id==1234]/%(ns)sstatus/%(ns)sapproved' % {'ns': ns0}

print 'With ElementPath: '
print root.findall(query0)
print
print 'With ElementFilter:'
for query in [query0, query1, query2]:
    print
    print '+'*50
    print 'query: ', query
    print
    for item in findall(root, query):
        print 'item: ', item
        print 'xml:'
        ET.dump(item)

print '-'*50
print
print 'approved: ', data(root, query2)

[OUTPUT]
With ElementPath:
[<Element {http://a.b.com/phedex}status at b95ad0>]

With ElementFilter:

++++++++++++++++++++++++++++++++++++++++++++++++++
query: {http://a.b.com/phedex}request/{http://a.b.com/phedex}status

item: <Element {http://a.b.com/phedex}status at b95ad0>
xml:
<ns0:status xmlns:ns0="http://a.b.com/phedex">
<ns0:approved>T1_RAL_MSS</ns0:approved>
<ns0:approved>T2_London_ICHEP</ns0:approved>
<ns0:disapproved>T2_Southgrid_Bristol</ns0:disapproved>
<ns0:pending />
<ns0:move_pending />
</ns0:status>
++++++++++++++++++++++++++++++++++++++++++++++++++
query: {http://a.b.com/phedex}request/{http://a.b.com/phedex}subscription[@type=="replicate"]/{http://a.b.com/phedex}items

item: <Element {http://a.b.com/phedex}items at b95eb8>
xml:
<ns0:items xmlns:ns0="http://a.b.com/phedex">
<ns0:dataset>/PrimaryDS1/ProcessedDS1/Tier</ns0:dataset>
<ns0:block>/PrimaryDS2/ProcessedDS2/Tier/block</ns0:block>
</ns0:items>
++++++++++++++++++++++++++++++++++++++++++++++++++
query: {http://a.b.com/phedex}request[@id==1234]/{http://a.b.com/phedex}status/{http://a.b.com/phedex}approved

item: <Element {http://a.b.com/phedex}approved at b95cd8>
xml:
<ns0:approved xmlns:ns0="http://a.b.com/phedex">T1_RAL_MSS</ns0:approved>

item: <Element {http://a.b.com/phedex}approved at b95cb0>
xml:
<ns0:approved xmlns:ns0="http://a.b.com/phedex">T2_London_ICHEP</ns0:approved>

--------------------------------------------------

approved: ['T1_RAL_MSS', 'T2_London_ICHEP']
INFO End logging.
[/OUTPUT]
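
For comparison, the second and third queries can probably be expressed without the ElementFilter module. The sketch below assumes Python 3, where the standard library's ElementPath accepts attribute predicates of the form [@attr='value'] and a prefix-to-URI mapping; note that the predicate syntax differs from ElementFilter's `==`. The prefix "p" and the names SAMPLE and NS are arbitrary choices made for this example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample document (comment and empty elements omitted).
SAMPLE = """<phedexData xmlns="http://a.b.com/phedex">
  <request id="1234" last_update="1060199000.0">
    <status>
      <approved>T1_RAL_MSS</approved>
      <approved>T2_London_ICHEP</approved>
      <disapproved>T2_Southgrid_Bristol</disapproved>
    </status>
    <subscription open="1" priority="0" type="replicate">
      <items>
        <dataset>/PrimaryDS1/ProcessedDS1/Tier</dataset>
        <block>/PrimaryDS2/ProcessedDS2/Tier/block</block>
      </items>
    </subscription>
  </request>
</phedexData>"""

# Map an arbitrary prefix to the document's default namespace URI.
NS = {"p": "http://a.b.com/phedex"}

root = ET.fromstring(SAMPLE)

# Rough equivalent of query1: items of subscriptions with type="replicate".
items = root.findall(
    "p:request/p:subscription[@type='replicate']/p:items", NS
)

# Rough equivalent of query2: approved sites for the request with id="1234".
approved = [
    element.text
    for element in root.findall(
        "p:request[@id='1234']/p:status/p:approved", NS
    )
]

print(len(items))
print(approved)
```

The standard library only covers a subset of XPath, so anything beyond simple tag and attribute matching still calls for a fuller XPath implementation such as lxml.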
Jan 16 '08 #4
Hi Gerard,

I don't know what to say :) Thank you so much for taking the time to
post all of this. I truly appreciate it :)
Jan 16 '08 #5
grflanagan wrote:
> On Jan 15, 9:33 pm, Gowri <gowr...@gmail.com> wrote:
>> I've been reading about ElementTree and ElementPath so I could use
>> them to find the right elements in the DOM. Unfortunately, neither of
>> these seems to offer XPath-like capabilities where I can find elements
>> based on tag, attribute values etc. Are there any libraries which can
>> give me XPath-like functionality?
>
> Create your query like:
>
> ns0 = '{http://a.b.com/phedex}'
>
> query = '%srequest/%sstatus' % (ns0, ns0)

lxml supports the same thing, BTW, and how to work with namespaces is
explained in the tutorial:

http://codespeak.net/lxml/dev/tutorial.html#namespaces

Stefan
Jan 16 '08 #6
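
To round off the lxml suggestion, here is a hedged sketch of a namespace-aware XPath query. It assumes the third-party lxml package is installed; the prefix "p", the helper name approved_sites and the shortened SAMPLE document are illustrative and do not come from the thread. The import is guarded only so that the snippet still loads where lxml is missing.

```python
# Sketch only: lxml is a third-party package (pip install lxml).
try:
    from lxml import etree
except ImportError:  # keep the snippet importable where lxml is missing
    etree = None

PHEDEX_NS = "http://a.b.com/phedex"

# Shortened copy of the sample document from the question.
SAMPLE = b"""<phedexData xmlns="http://a.b.com/phedex">
  <request id="1234">
    <status>
      <approved>T1_RAL_MSS</approved>
      <approved>T2_London_ICHEP</approved>
    </status>
  </request>
</phedexData>"""


def approved_sites(xml_bytes, request_id):
    """Return the text of every <approved> element under the given request."""
    if etree is None:
        raise RuntimeError("lxml is not installed")
    root = etree.fromstring(xml_bytes)
    # XPath 1.0 has no default namespace, so bind a prefix to the
    # namespace URI and use that prefix in every location step.
    return root.xpath(
        "//p:request[@id=$rid]/p:status/p:approved/text()",
        namespaces={"p": PHEDEX_NS},
        rid=request_id,  # passed to the expression as the XPath variable $rid
    )


if etree is not None:
    print(approved_sites(SAMPLE, "1234"))
```

A plain //request returns nothing for this document for the same reason as in ElementTree: the elements are in the http://a.b.com/phedex namespace, so each step needs a bound prefix.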
