Python Regex Question

I need to extract the number from each <td> tag in an HTML file.

e.g. 49.950 from the following:

<td align=right width=80><font size=2 face="New Times
Roman,Times,Serif">&nbsp;49.950&nbsp;</font></td>

The actual number between the two &nbsp; entities can have any number of
digits before and after the decimal point.

<td align=right width=80><font size=2 face="New Times
Roman,Times,Serif">&nbsp;######.####&nbsp;</font></td>

How can I just extract the real/integer number using regex?

Sep 20 '07 #1
jo***********@gmail.com wrote:
> How can I just extract the real/integer number using regex?

'[0-9]*\.[0-9]*'
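
A minimal sketch of how that pattern behaves in Python, for anyone trying it out. The sample string is the cell from the question; the names `loose`, `strict` and `in_cell`, and the last two patterns, are assumptions added for illustration and are not part of the reply above. The suggested pattern requires a decimal point, so it skips plain integers and will also match a lone ".". Allowing integers picks up attribute values such as width=80, which is why the last pattern anchors on the surrounding &nbsp; entities.

```python
import re

# The cell from the question, joined onto one logical line.
html = ('<td align=right width=80><font size=2 face="New Times '
        'Roman,Times,Serif">&nbsp;49.950&nbsp;</font></td>')

# Pattern as suggested in the reply: a decimal point is mandatory.
loose = re.compile(r'[0-9]*\.[0-9]*')

# Assumption: a variant that also accepts integers (digits, optional fraction).
strict = re.compile(r'[0-9]+(?:\.[0-9]+)?')

# Assumption: the same variant, anchored between the two &nbsp; entities.
in_cell = re.compile(r'&nbsp;([0-9]+(?:\.[0-9]+)?)&nbsp;')

print(loose.findall(html))      # ['49.950']
print(loose.findall('a . b'))   # ['.']  (a lone dot also matches)
print(strict.findall(html))     # ['80', '2', '49.950']  (attributes too)
print(in_cell.findall(html))    # ['49.950']
```

Anchoring on the entities only works if every cell really follows the &nbsp;value&nbsp; layout shown in the question.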


Sep 20 '07 #2
jo***********@gmail.com wrote:
> How can I just extract the real/integer number using regex?

If every td's content follows the &nbsp;[value_to_extract]&nbsp; pattern,
things are simple:

[untested]

/<td.*&nbsp;([^&]*)&nbsp;/

the parentheses form a group, so group(1) returns the result (and extracts what you
really want)
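
A sketch of how the untested pattern above might be written in Python, under a few assumptions: the slashes are Perl-style delimiters and are dropped, `.*` is made non-greedy so each cell gets its own match, and `re.DOTALL` is added because the tag in the question wraps across two lines. The second cell (1200) is made-up sample data, and the names `page`, `cell` and `values` are only for illustration.

```python
import re

# Sample input: the first cell is from the question, the second is invented.
page = '''<tr>
<td align=right width=80><font size=2 face="New Times
Roman,Times,Serif">&nbsp;49.950&nbsp;</font></td>
<td align=right width=80><font size=2 face="New Times
Roman,Times,Serif">&nbsp;1200&nbsp;</font></td>
</tr>'''

# Non-greedy .*? stops at the first &nbsp; after each <td;
# re.DOTALL lets "." cross the line break inside the tag.
cell = re.compile(r'<td.*?&nbsp;([^&]*)&nbsp;', re.DOTALL)

# group(1) is the text captured by the parentheses.
values = [m.group(1) for m in cell.finditer(page)]
print(values)  # ['49.950', '1200']
```

If a cell lacks the &nbsp; pair, the non-greedy match can run on into the next cell, so this only holds when every td follows the same layout.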

Cheers
Gerardo
Sep 20 '07 #3
On Sep 20, 4:12 pm, Tobiah <t...@tobiah.org> wrote:
> joemystery...@gmail.com wrote:
>> How can I just extract the real/integer number using regex?
>
> '[0-9]*\.[0-9]*'

I am trying to use BeautifulSoup:

soup = BeautifulSoup(page)

td_tags = soup.findAll('td')
i=0
for td in td_tags:
    i = i+1
    print "td: ", td
    # re.search('[0-9]*\.[0-9]*', td)
    price = re.compile('[0-9]*\.[0-9]*').search(td)

I am getting an error:

price= re.compile('[0-9]*\.[0-9]*').search(td)
TypeError: expected string or buffer

Does BeautifulSoup return an array of objects? If so, how do I pass the
"td" instance as a string to re.search? What is the difference between
re.search vs re.compile().search?
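
On the TypeError: findAll returns a list of tag objects rather than strings, and the re functions only accept strings, so the usual fix is to convert each tag first, for example with str(td). The sketch below shows the idea using only the standard library. `FakeTag` is a hypothetical stand-in, not BeautifulSoup's real class, and in current Python 3 the message reads "expected string or bytes-like object" instead of "expected string or buffer".

```python
import re

class FakeTag:
    """Hypothetical stand-in for a parsed tag object; NOT BeautifulSoup."""
    def __init__(self, markup):
        self.markup = markup

    def __str__(self):
        # Like a real tag object, converting to str yields the markup.
        return self.markup

td = FakeTag('<td align=right width=80>&nbsp;49.950&nbsp;</td>')
number = re.compile(r'[0-9]*\.[0-9]*')

# Passing the object itself reproduces the error from the post.
try:
    number.search(td)
    raised = False
except TypeError:
    raised = True

# Converting to a string first lets the search run.
match = number.search(str(td))
price = match.group() if match else None
print(raised, price)
```

search returns None when nothing matches, hence the check before calling group().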

Sep 20 '07 #4
Ivo
crybaby wrote:
> I am getting an error:
>
> price= re.compile('[0-9]*\.[0-9]*').search(td)
> TypeError: expected string or buffer
>
> Does BeautifulSoup return an array of objects? If so, how do I pass the
> "td" instance as a string to re.search? What is the difference between
> re.search vs re.compile().search?

I don't know anything about BeautifulSoup, but to the other questions:

var = re.compile(regexpr) compiles the expression; after that you can
use var as a reference to the compiled expression (costs less).

re.search(expr, string) compiles and searches every time. This can
potentially be more expensive in computing power, especially if you
have to use the expression a lot of times.

The way you use it, it doesn't matter.

Do:
pattern = re.compile('[0-9]*\.[0-9]*')
result = pattern.findall(your_text_here)

Now you can reuse pattern.
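
A small runnable sketch of that reuse, compiling once and applying the pattern to several strings. The `rows` list is made-up sample data and the names are only for illustration.

```python
import re

# Compile once, outside the loop.
pattern = re.compile(r'[0-9]*\.[0-9]*')

# Hypothetical cell contents; the last one has no number in it.
rows = ['&nbsp;49.950&nbsp;', '&nbsp;7.25&nbsp;', '&nbsp;n/a&nbsp;']

# findall returns a list of every match in the string (possibly empty).
results = [pattern.findall(row) for row in rows]
print(results)  # [['49.950'], ['7.25'], []]
```

findall returns an empty list for rows without a match, so there is no None to check for here.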

Cheers,
Ivo.
Sep 21 '07 #5
> re.search(expr, string) compiles and searches every time. This can
> potentially be more expensive in computing power, especially if you
> have to use the expression a lot of times.

The re module-level helper functions cache expressions and their
compiled form in a dict. They are only compiled once. The main
overhead would be for repeated dict lookups.

See sre.py (included from re.py) for more details. /usr/lib/python2.4/sre.py
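
One way to check this on your own machine is to time both forms, as in the sketch below. The function names and the repeat count are arbitrary choices for illustration, and the cache is an implementation detail of the re module (re.purge() is the documented way to clear it), so measure before drawing conclusions.

```python
import re
import timeit

PATTERN = r'[0-9]*\.[0-9]*'
TEXT = '&nbsp;49.950&nbsp;'

# Compiled once up front and reused on every call.
compiled = re.compile(PATTERN)

def with_module_function():
    # Looks the pattern up in the module's cache on every call.
    return re.search(PATTERN, TEXT).group()

def with_compiled_pattern():
    # Skips the lookup and goes straight to the compiled object.
    return compiled.search(TEXT).group()

# Both forms find the same text; only the per-call overhead differs.
same_result = with_module_function() == with_compiled_pattern()

# Timings vary by machine and Python version, so none are quoted here.
t_module = timeit.timeit(with_module_function, number=20000)
t_compiled = timeit.timeit(with_compiled_pattern, number=20000)
print(same_result, t_module, t_compiled)
```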
Sep 21 '07 #6
