BeautifulSoup: getting the string inside a 'p' tag that also contains an 'a' tag

I'm trying to get the 'FOO' string, but the problem is that inside the 'p'
tag there is another tag, 'a'. So:

from BeautifulSoup import BeautifulSoup
s = '<td width="88%" valign="TOP"><p>FOO <a name="f"></a></p></td>'
tree = BeautifulSoup(s)

print tree.first('p')

<p>FOO <a name="f"></a></p>

So if I run print tree.first('p').string to get the 'FOO' string, it
shows a Null value because of the nested 'a' tag:

print tree.first('p').string

Null

Any solution?

Jul 24 '06 #1

Marc 'BlackJack' Rintsch

In <11**********************@i42g2000cwa.googlegroups .com>, GinTon wrote:

> I'm trying to get the 'FOO' string, but the problem is that inside the 'p'
> tag there is another tag, 'a'. So:
>
> from BeautifulSoup import BeautifulSoup
> s = '<td width="88%" valign="TOP"><p>FOO <a name="f"></a></p></td>'
> tree = BeautifulSoup(s)
>
> print tree.first('p')
> <p>FOO <a name="f"></a></p>
>
> So if I run print tree.first('p').string to get the 'FOO' string, it
> shows a Null value because of the nested 'a' tag:
>
> print tree.first('p').string
> Null
>
> Any solution?

In [53]: print tree.first('p').contents[0]
FOO

Ciao,
Marc 'BlackJack' Rintsch

Jul 24 '06 #2
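
For anyone wondering why .string came back as Null while .contents[0] works: as far as I understand that BeautifulSoup version, .string is only filled in when a tag has exactly one child and that child is a string. Here the 'p' tag has two children, the text 'FOO ' and the 'a' tag, so the text node has to be picked out explicitly. The sketch below illustrates the same idea (keep only the text that sits directly inside the first 'p') in Python 3 using just the standard library's html.parser. It is not BeautifulSoup's implementation, and the names FirstParagraphText and first_p_text are made up for this example.

```python
from html.parser import HTMLParser


class FirstParagraphText(HTMLParser):
    """Collect text that sits directly inside the first <p>, skipping nested tags.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0      # 0 = outside <p>, 1 = directly inside it, >1 = inside a nested tag
        self.done = False   # becomes True once the first <p> has been closed
        self.pieces = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0:
            if tag == "p":
                self.depth = 1
        else:
            # Any tag opened inside the <p> pushes us one level deeper.
            self.depth += 1

    def handle_endtag(self, tag):
        if self.done or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        # Keep only text whose parent is the <p> itself.
        if self.depth == 1 and not self.done:
            self.pieces.append(data)


def first_p_text(html):
    parser = FirstParagraphText()
    parser.feed(html)
    parser.close()
    return "".join(parser.pieces).strip()


s = '<td width="88%" valign="TOP"><p>FOO <a name="f"></a></p></td>'
print(first_p_text(s))
```

The depth counter assumes every tag opened inside the 'p' is also closed, so an unclosed tag such as a bare 'br' would throw it off. For real pages the parser-library route from the post above is the safer choice.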

GinTon

Marc 'BlackJack' Rintsch wrote:

> In [53]: print tree.first('p').contents[0]
> FOO

Thanks! I was going crazy with this.

Jul 24 '06 #3

Nick Vatamaniuc

Quick-n-dirty way:
After you get your whole p string: <p>FOO <a name="f"></a></p>
Remove any tags delimited by '<' and '>' with a regex. In your short
example you _don't_ show that there might be something between the <a>
and </a> tags, so I assume there won't be anything, or if there is
something then you also want it included in the final text. As in
'<p>FOO <a name="f">URLNAME</a></p>' ==> 'FOO URLNAME'

For the regex, start with something simple like <.*?> and see if it
works, then improve it. Use kiki or kodos - python visual regex
helpers.

Hope this helps,
Nick V.

Jul 24 '06 #4
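
Nick's quick-n-dirty suggestion above could look roughly like this in Python 3. It is only a sketch: the helper name strip_tags is invented here, and the pattern <.*?> is the simple starting point from the post, so it will trip over tags that span several lines or attribute values that contain '>'.

```python
import re

# Non-greedy match of anything between '<' and the next '>'.
# Assumption: tags fit on one line and attribute values contain no '>'.
TAG_RE = re.compile(r"<.*?>")


def strip_tags(fragment):
    """Remove everything that looks like a tag and trim surrounding whitespace."""
    return TAG_RE.sub("", fragment).strip()


print(strip_tags('<p>FOO <a name="f"></a></p>'))
print(strip_tags('<p>FOO <a name="f">URLNAME</a></p>'))
```

Unlike the .contents[0] approach, this keeps any text inside the nested 'a' tag, which matches the 'FOO URLNAME' case described in the post.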
