On Feb 17, 6:55 pm, "snewma...@gmail.com" <snewma...@gmail.com> wrote:
I'm trying to parse out some XML nodes with namespaces using
BeautifulSoup. I can't seem to get the syntax correct. It doesn't like
the colon in the tag name, and I'm not sure how to refer to that tag.
I'm trying to get the attributes of this tag:
<yweather:forecast day="Sun" date="18 Feb 2007" low="39" high="55"
text="Partly Cloudy/Wind" code="24">
The only way I've been able to get it is by doing a findAll with
regex. Is there a better way?
----------
from BeautifulSoup import BeautifulStoneSoup
import urllib2
url = 'http://weather.yahooapis.com/forecastrss?p=33609'
page = urllib2.urlopen(url)
soup = BeautifulStoneSoup(page)
print soup['yweather:forecast']
----------
If you are just trying to extract a single particular tag, pyparsing
can do this pretty readily, and the results returned make it very easy
to pick out the tag attribute values.
-- Paul
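from pyparsing import makeHTMLTags
import urllib2

# Fetch the raw feed text
url = 'http://weather.yahooapis.com/forecastrss?p=78732'
page = urllib2.urlopen(url)
html = page.read()
page.close()

# makeHTMLTags returns (openTag, closeTag); only the opening tag is needed
forecastTag = makeHTMLTags('yweather:forecast')[0]

# searchString scans the text and yields one result per matching tag;
# the tag attributes are available by name on each result
for fc in forecastTag.searchString(html):
    print fc.asList()
    print "Date: %(date)s, hi:%(high)s lo:%(low)s" % fc
    print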
Prints:
['yweather:forecast', ['day', 'Sat'], ['date', '17 Feb 2007'], ['low',
'34'], ['high', '67'], ['text', 'Clear'], ['code', '31'], True]
Date: 17 Feb 2007, hi:67 lo:34
['yweather:forecast', ['day', 'Sun'], ['date', '18 Feb 2007'], ['low',
'42'], ['high', '65'], ['text', 'Sunny'], ['code', '32'], True]
Date: 18 Feb 2007, hi:65 lo:42
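
If the document is well-formed XML, a namespace-aware parser is another option. What follows is only a rough sketch using the standard library's xml.etree.ElementTree rather than pyparsing or BeautifulSoup, written for Python 3 and run against an inline sample instead of the live feed. The sample markup, the namespace URI, and the forecasts() helper are assumptions for illustration; use whatever xmlns:yweather value the real feed declares.

```python
import xml.etree.ElementTree as ET

# Inline sample standing in for the feed. The namespace URI is an
# assumption; it must match the xmlns:yweather declared in the real document.
SAMPLE = """<rss version="2.0"
     xmlns:yweather="http://xml.weather.yahoo.com/ns/rss/1.0">
  <channel>
    <item>
      <yweather:forecast day="Sat" date="17 Feb 2007" low="34" high="67"
                         text="Clear" code="31" />
      <yweather:forecast day="Sun" date="18 Feb 2007" low="42" high="65"
                         text="Sunny" code="32" />
    </item>
  </channel>
</rss>"""

# Prefix-to-URI mapping used when resolving 'yweather:' in search paths.
NS = {"yweather": "http://xml.weather.yahoo.com/ns/rss/1.0"}


def forecasts(xml_text):
    """Return one attribute dict per yweather:forecast element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # iterfind resolves the prefix through NS, so the colon in the tag
    # name is handled by the parser rather than by a regex.
    return [dict(el.attrib) for el in root.iterfind(".//yweather:forecast", NS)]


for fc in forecasts(SAMPLE):
    print("Date: %(date)s, hi:%(high)s lo:%(low)s" % fc)
```

The trade-off is that ElementTree requires well-formed XML, whereas the pyparsing scan above only needs the tag itself to be recognizable in the text.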