BeautifulSoup fetch help

ted
Hi,

I'm using the BeautifulSoup module and having some trouble processing a
file. It's not printing what I'm expecting. In the code below, I'm expecting
cells with only "bgcolor" attributes to be printed, but I'm getting cells
with other attributes and some without any attributes.

Any help appreciated. Thanks,
Ted

import re
from BeautifulSoup import BeautifulSoup

text = open('yahoo.html').read()
soup = BeautifulSoup(text)
tables = soup('table', {'border': re.compile('.+')})

for table in tables:
    cells = table.fetch('td', {'bgcolor': re.compile('.+')})
    for cell in cells:
        print cell
        print "================"

Jan 7 '06 #1
"ted" <te*********************@sbcglobal.net> writes:
I'm using the BeautifulSoup module and having some trouble processing a
file. It's not printing what I'm expecting. In the code below, I'm expecting
cells with only "bgcolor" attributes to be printed, but I'm getting cells
with other attributes and some without any attributes.


BeautifulSoup's matching is for any tag with a matching attribute, not
tags that have only that attribute. That's why you're getting tags
with other attributes.
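
To make the distinction concrete, here is a minimal sketch that runs without BeautifulSoup. The `Tag` namedtuple is a hypothetical stand-in for a parsed tag, with `attrs` as a list of (name, value) pairs, which is the shape the code in this thread assumes. `has_attr` and `has_only_attr` are illustrative names, not library functions.

```python
from collections import namedtuple

# Hypothetical stand-in for a parsed tag; attrs is a list of (name, value)
# pairs, the shape the thread's code assumes.
Tag = namedtuple("Tag", ["name", "attrs"])


def has_attr(tag, attr):
    """True if the tag carries `attr`, whatever else it carries."""
    return any(key == attr for key, _ in tag.attrs)


def has_only_attr(tag, attr):
    """True if `attr` is the tag's one and only attribute."""
    return len(tag.attrs) == 1 and tag.attrs[0][0] == attr


cells = [
    Tag("td", [("bgcolor", "#eeeeee")]),
    Tag("td", [("bgcolor", "#ffffff"), ("width", "50%")]),
    Tag("td", []),
]

# Loose matching keeps every cell that has bgcolor at all.
loose = [c for c in cells if has_attr(c, "bgcolor")]
# Strict matching keeps only the cell whose sole attribute is bgcolor.
strict = [c for c in cells if has_only_attr(c, "bgcolor")]
print(len(loose), len(strict))
```

The loose filter corresponds to passing an attribute dictionary to the search; the strict filter is what a callable has to check by hand.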

However, you can use a callable as the tag argument to check for what
you want:

def findtagswithonly(name, attr):
    # Match tags called `name` whose single attribute is `attr`.
    return (lambda tag: tag.name == name and
                        len(tag.attrs) == 1 and
                        tag.attrs[0][0] == attr)

....

cells = table.fetch(findtagswithonly('td', 'bgcolor'))

Or, because I wrote it to check this out:

def findtagswithoneattrib(name):
    # Match tags called `name` that have exactly one attribute, whatever it is.
    return lambda tag: tag.name == name and len(tag.attrs) == 1

....
cells = table.fetch(findtagswithoneattrib('td'), {'bgcolor': re.compile('.+')})

I'm not sure why you're getting tags without attributes. If the above
code does that, post some sample data along with the code.
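
A small reproducible sample of the kind asked for above might look like the following sketch. It uses the standard library's `html.parser` instead of BeautifulSoup, purely so that it runs without extra packages. The HTML snippet and the class name `OnlyAttrCollector` are made up for illustration.

```python
from html.parser import HTMLParser


class OnlyAttrCollector(HTMLParser):
    """Collect start tags named `name` whose sole attribute is `attr`."""

    def __init__(self, name, attr):
        super().__init__()
        self.name = name
        self.attr = attr
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if tag == self.name and [key for key, _ in attrs] == [self.attr]:
            self.matches.append(dict(attrs))


# Made-up sample data: one cell of each kind described in the question.
sample = """
<table border="1">
  <tr>
    <td bgcolor="#eeeeee">only bgcolor</td>
    <td bgcolor="#ffffff" width="50%">bgcolor and width</td>
    <td>no attributes</td>
  </tr>
</table>
"""

collector = OnlyAttrCollector("td", "bgcolor")
collector.feed(sample)
print(collector.matches)
```

Only the first cell should be collected. Running the BeautifulSoup code against the same three cells is a quick way to see which filter lets the extra ones through.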

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Jan 7 '06 #2
ted
Thanks Mike, works like a charm.

-Ted
Jan 7 '06 #3
