
BeautifulSoup to get string inner 'p' and 'a' tags

I'm trying to get the 'FOO' string, but the problem is that inside the 'p'
tag there is another tag, 'a'. So:
from BeautifulSoup import BeautifulSoup
s = '<td width="88%" valign="TOP"><p class="contentBody">FOO <a name="f"></a></p></td>'
tree = BeautifulSoup(s)
print tree.first('p')
<p class="contentBody">FOO <a name="f"></a></p>

So if I run 'print tree.first('p').string' to get the 'FOO' string, it
shows Null, because the 'p' tag also contains the 'a' tag:
print tree.first('p').string
Null

Any solution?
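For readers on the current bs4 package (an assumption: the thread uses the old BeautifulSoup module, and `find()` stands in for its `first()` call), here is a minimal sketch that reproduces the behavior. `.string` only has a value when a tag has exactly one child, and this `<p>` has two: the text node and the `<a>` tag. The `html.parser` backend is also an assumed choice.

```python
from bs4 import BeautifulSoup

s = '<td width="88%" valign="TOP"><p class="contentBody">FOO <a name="f"></a></p></td>'
tree = BeautifulSoup(s, "html.parser")
p = tree.find("p")  # bs4 spelling of the thread's tree.first('p')

# The <p> has two children: the text node 'FOO ' and the empty <a> tag
print(len(p.contents))  # 2

# .string is only defined for a tag with a single child,
# so with two children it is None (the old module printed Null)
print(p.string)  # None
```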

Jul 24 '06 #1
In <11**********************@i42g2000cwa.googlegroups .com>, GinTon wrote:
> Any solution?
In [53]: print tree.first('p').contents[0]
FOO
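Marc's `contents[0]` answer carries over to the bs4 package. A minimal sketch, assuming bs4 with the stdlib `html.parser` backend; `first_child` and `foo` are illustrative names. It relies on the text being the first child of `<p>`, and the text node keeps its trailing space, hence the `strip()`.

```python
from bs4 import BeautifulSoup

s = '<td width="88%" valign="TOP"><p class="contentBody">FOO <a name="f"></a></p></td>'
p = BeautifulSoup(s, "html.parser").find("p")

# contents lists the direct children; index 0 is the text before <a>
first_child = p.contents[0]

# The text node is 'FOO ' (trailing space), so strip it to get a plain str
foo = first_child.strip()
print(foo)  # FOO
```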

Ciao,
Marc 'BlackJack' Rintsch
Jul 24 '06 #2

Marc 'BlackJack' Rintsch wrote:
In [53]: print tree.first('p').contents[0]
FOO
Thanks! I was going crazy with this.

Jul 24 '06 #3
Quick-n-dirty way:
After you get your whole p string, <p class="contentBody">FOO <a name="f"></a></p>,
remove any tags delimited by '<' and '>' with a regex. In your short
example you _don't_ show whether there might be something between the <a>
and </a> tags, so I assume either there is nothing there or, if there is,
you want it included in the final text. For example,
'<p class="contentBody">FOO <a name="f">URLNAME</a></p>' becomes 'FOO URLNAME'.
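If the goal is all of the text under `<p>`, link text included, a parser can produce the same 'FOO URLNAME' result without a regex. A minimal sketch of that alternative (not the regex approach described here), assuming the bs4 package, whose `get_text()` joins every text node below a tag in document order:

```python
from bs4 import BeautifulSoup

s = '<p class="contentBody">FOO <a name="f">URLNAME</a></p>'
p = BeautifulSoup(s, "html.parser").find("p")

# get_text() concatenates 'FOO ' and the <a>'s 'URLNAME'
text = p.get_text()
print(text)  # FOO URLNAME
```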

For the regex, start with something simple like <.*?> and see if it
works, then improve it. Use kiki or kodos (Python visual regex
helpers).
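A minimal sketch of the quick-n-dirty regex approach using Python's `re` module. `strip_tags` is a hypothetical helper name, and the `re.DOTALL` flag and final `.strip()` are small assumptions added on top of the plain `<.*?>` pattern. Expect it to misbehave on edge cases such as a `>` inside an attribute value or an HTML comment, which is where a real parser does better.

```python
import re

# Non-greedy: each match runs from a '<' to the nearest following '>'.
# DOTALL lets '.' match newlines, so a tag split across lines is still removed.
TAG_RE = re.compile(r'<.*?>', re.DOTALL)

def strip_tags(html):
    """Delete anything that looks like a tag, then trim surrounding whitespace."""
    return TAG_RE.sub('', html).strip()

print(strip_tags('<p class="contentBody">FOO <a name="f"></a></p>'))         # FOO
print(strip_tags('<p class="contentBody">FOO <a name="f">URLNAME</a></p>'))  # FOO URLNAME
```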

Hope this helps,
Nick V.
Jul 24 '06 #4
