Help Parsing XML Namespaces with BeautifulSoup

snewman18

I'm trying to parse out some XML nodes with namespaces using
BeautifulSoup. I can't seem to get the syntax correct. It doesn't like
the colon in the tag name, and I'm not sure how to refer to that tag.

I'm trying to get the attributes of this tag:

<yweather:forecast day="Sun" date="18 Feb 2007" low="39" high="55"
text="Partly Cloudy/Wind" code="24">

The only way I've been able to get it is by doing a findAll with
regex. Is there a better way?

----------

from BeautifulSoup import BeautifulStoneSoup
import urllib2

url = 'http://weather.yahooapis.com/forecastrss?p=33609'
page = urllib2.urlopen(url)
soup = BeautifulStoneSoup(page)

print soup['yweather:forecast']

----------
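The colon in `yweather:forecast` is a namespace prefix. Somewhere in the feed an `xmlns:yweather` declaration binds it to a URI. A namespace-aware parser stores the element under the expanded name `{namespace-URI}forecast`, so you search for that name. Alternatively, you pass a prefix-to-URI mapping and keep writing `yweather:forecast`.

The sketch below uses the standard library's `xml.etree.ElementTree` (Python 3) instead of BeautifulSoup. The namespace URI and the trimmed sample document are assumptions for illustration, so check the `xmlns:yweather` declaration in the real feed. The helper name `forecasts` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the URI the feed binds to the "yweather" prefix.
# Check the xmlns:yweather declaration in the actual document.
YWEATHER_NS = "http://xml.weather.yahoo.com/ns/rss/1.0"

# Hypothetical, trimmed stand-in for the feed, using the tag from the question.
SAMPLE = """<rss version="2.0" xmlns:yweather="http://xml.weather.yahoo.com/ns/rss/1.0">
  <channel>
    <item>
      <yweather:forecast day="Sun" date="18 Feb 2007" low="39" high="55"
          text="Partly Cloudy/Wind" code="24"/>
    </item>
  </channel>
</rss>"""


def forecasts(xml_text):
    """Return the attribute dicts of every yweather:forecast element."""
    root = ET.fromstring(xml_text)
    # The parser expands "yweather:forecast" to "{uri}forecast"; this
    # prefix mapping lets the search path use the short prefix again.
    ns = {"yweather": YWEATHER_NS}
    return [dict(el.attrib) for el in root.findall(".//yweather:forecast", ns)]


# Without a mapping, search for the expanded "{uri}localname" form directly.
root = ET.fromstring(SAMPLE)
forecast = root.find(".//{%s}forecast" % YWEATHER_NS)
print(forecast.tag)
for attrs in forecasts(SAMPLE):
    print(attrs["day"], attrs["date"], attrs["low"], attrs["high"])
```

The attributes themselves (`day`, `date`, ...) are unprefixed, so once the element is found they are read as ordinary keys.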

Feb 18 '07 #1

Paul McGuire

On Feb 17, 6:55 pm, "snewma...@gmail.com" <snewma...@gmail.com> wrote:

If you are just trying to extract a single particular tag, pyparsing
can do this pretty readily, and the results returned make it very easy
to pick out the tag attribute values.

-- Paul
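from pyparsing import makeHTMLTags
import urllib2

url = 'http://weather.yahooapis.com/forecastrss?p=78732'
page = urllib2.urlopen(url)
html = page.read()
page.close()

# makeHTMLTags returns an (opening tag, closing tag) pair of expressions;
# only the opening tag is needed, since it carries the attributes.
forecastTag = makeHTMLTags('yweather:forecast')[0]

# searchString scans the raw text for every match; each result exposes
# the tag's attributes by name, so it can drive %-formatting directly.
for fc in forecastTag.searchString(html):
    print fc.asList()
    print "Date: %(date)s, hi:%(high)s lo:%(low)s" % fc
    print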

Prints:

['yweather:forecast', ['day', 'Sat'], ['date', '17 Feb 2007'], ['low', '34'], ['high', '67'], ['text', 'Clear'], ['code', '31'], True]
Date: 17 Feb 2007, hi:67 lo:34

['yweather:forecast', ['day', 'Sun'], ['date', '18 Feb 2007'], ['low', '42'], ['high', '65'], ['text', 'Sunny'], ['code', '32'], True]
Date: 18 Feb 2007, hi:65 lo:42
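If you'd rather not pull in pyparsing, the standard library's `html.parser` (Python 3) offers a similarly tolerant, namespace-unaware approach. It reports a prefixed tag name such as `yweather:forecast` as a plain lowercased string, colon included.

This is a swapped-in technique, not how pyparsing works internally. `TagAttrCollector` and `find_tag_attrs` are hypothetical helper names. The sample markup is rebuilt from the two forecast tags in the output above rather than fetched from the live feed.

```python
from html.parser import HTMLParser


class TagAttrCollector(HTMLParser):
    """Collect the attributes of every start tag with a given literal name.

    html.parser ignores namespaces: "yweather:forecast" is reported
    as-is (lowercased), so the prefix is just part of the tag name.
    """

    def __init__(self, tag_name):
        super().__init__()
        self.tag_name = tag_name.lower()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == self.tag_name:
            self.found.append(dict(attrs))

    # Self-closing tags like <yweather:forecast ... /> go through the
    # default handle_startendtag, which calls handle_starttag for us.


def find_tag_attrs(markup, tag_name):
    parser = TagAttrCollector(tag_name)
    parser.feed(markup)
    parser.close()
    return parser.found


# Sample markup rebuilt from the two forecast tags printed above.
SAMPLE = (
    '<yweather:forecast day="Sat" date="17 Feb 2007" low="34" high="67" text="Clear" code="31" />'
    '<yweather:forecast day="Sun" date="18 Feb 2007" low="42" high="65" text="Sunny" code="32" />'
)

for fc in find_tag_attrs(SAMPLE, "yweather:forecast"):
    print("Date: %(date)s, hi:%(high)s lo:%(low)s" % fc)
```

This approach matches only the literal prefix used in the document. If a feed bound the same namespace URI to a different prefix, a namespace-aware parser would be the more robust choice.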

Feb 18 '07 #2

Similar topics

Parsing HTML

by: Anders Eriksson

Hello! I want to extract some info from a some specific HTML pages, Microsofts International Word list (e.g. http://msdn.microsoft.com/library/en-us/dnwue/html/swe_word_list.htm). I want to...

Python

html parsing? Or just simple regex'ing?

by: Dan Stromberg

I'm working on writing a program that will synchronize one database with another. For the source database, we can just use the python sybase API; that's nice and normal. For the target...

Python

Regular Expression help for parsing html tables

by: steve551979

Hello, I am having some difficulty creating a regular expression for the following string situation in html. I want to find a table that has specific text in it and then extract the html just...

Python

HTML Parsing

by: mtuller

Alright. I have tried everything I can find, but am not getting anywhere. I have a web page that has data like this: <tr > <td headers="col1_1" style="width:21%" > <span class="hpPageText"...

Python

How use XML parsing tools on this one specific URL?

by: seberino

I understand that the web is full of ill-formed XHTML web pages but this is Microsoft: http://moneycentral.msn.com/companyreport?Symbol=BBBY I can't validate it and xml.minidom.dom.parseString...

Python

Parsing HTML/XML documents

by: pabloski

I need to parse real world HTML/XML documents and I found two nice python solution: BeautifulSoup and Tidy. However I found pyXPCOM that is a wrapper for Gecko. So I was thinking Gecko surely...

Python

String parsing

by: HMS Surprise

The string below is a piece of a longer string of about 20000 characters returned from a web page. I need to isolate the number at the end of the line containing 'LastUpdated'. I can find...

Python

Help Parsing an HTML File

by: egonslokar

Hello Python Community, It'd be great if someone could provide guidance or sample code for accomplishing the following: I have a single unicode file that has descriptions of hundreds of...

Python

Re: HTML Parsing

by: Victor Noagbodji

Hi everyone Hello urllib2: http://docs.python.org/lib/module-urllib2.html BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/...

Python