Python Regex Question

I need to extract the number in each <td> tag from an HTML file,

e.g. 49.950 from the following:

<td align=right width=80><font size=2 face="New Times
Roman,Times,Serif">&nbsp;49.950&nbsp;</font></td>

The actual number between the &nbsp; entities (&nbsp;49.950&nbsp;) can have
any number of digits before and after the decimal point.

<td align=right width=80><font size=2 face="New Times
Roman,Times,Serif">&nbsp;######.####&nbsp;</font></td>

How can I just extract the real/integer number using regex?

Sep 20 '07 #1
jo***********@gmail.com wrote:
> How can I just extract the real/integer number using regex?

'[0-9]*\.[0-9]*'
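A quick check of this pattern as a sketch in Python 3 (the sample cell is the one from the question, joined onto one line; the other two test strings are made up). Because `*` allows zero digits on either side, the pattern also matches a lone `.`, and because it requires a decimal point, an integer value is not matched:

```python
import re

# Sample cell from the question, joined onto one line.
cell = ('<td align=right width=80><font size=2 face="New Times '
        'Roman,Times,Serif">&nbsp;49.950&nbsp;</font></td>')

# Tobiah's pattern, written as a raw string so '\.' reaches re intact.
pattern = re.compile(r'[0-9]*\.[0-9]*')

match = pattern.search(cell)
print(match.group())                         # 49.950

# '*' allows zero digits on both sides, so a lone '.' also matches ...
print(pattern.findall('end of sentence.'))   # ['.']
# ... and a value without a decimal point is not matched at all.
print(pattern.findall('&nbsp;50&nbsp;'))     # []
```

Anchoring the match on the &nbsp; entities, as Gerardo suggests in the next reply, avoids picking up stray dots elsewhere in the page.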

--
Posted via a free Usenet account from http://www.teranews.com

Sep 20 '07 #2
jo***********@gmail.com wrote:
> How can I just extract the real/integer number using regex?
If every td's content follows the &nbsp;[value_to_extract]&nbsp; pattern,
things get simpler:

[untested]

r'<td.*?&nbsp;([^&]*)&nbsp;'

The parentheses define a group, so you can call group(1) on the match
object to extract just the part you really want.

Cheers
Gerardo
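Gerardo's idea translated to Python 3, as a sketch (the two-cell HTML string is made up for illustration). The non-greedy `.*?` keeps each match inside one cell, and `re.DOTALL` lets `.` cross the line break inside the tag; with one group, `findall` returns only the captured text:

```python
import re

# Two made-up cells; the second holds an integer. The line break inside
# the face attribute mirrors the wrapped sample in the question.
html = ('<td align=right width=80><font size=2 face="New Times\n'
        'Roman,Times,Serif">&nbsp;49.950&nbsp;</font></td>\n'
        '<td align=right width=80><font size=2 face="New Times\n'
        'Roman,Times,Serif">&nbsp;1250&nbsp;</font></td>')

# Non-greedy '.*?' stops at the first '&nbsp;' after each '<td';
# re.DOTALL lets '.' match the newline inside the tag.
cell_value = re.compile(r'<td.*?&nbsp;([^&]*)&nbsp;', re.DOTALL)

# With a single group, findall returns just the captured text.
values = cell_value.findall(html)
print(values)                        # ['49.950', '1250']
print([float(v) for v in values])    # [49.95, 1250.0]

# For a single match, group(1) gives the same captured text.
first = cell_value.search(html)
print(first.group(1))                # 49.950
```

If a cell has no &nbsp; pair, the non-greedy match runs on into the following cell, so for messier pages an HTML parser is the safer tool.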
Sep 20 '07 #3
On Sep 20, 4:12 pm, Tobiah <t...@tobiah.org> wrote:
> joemystery...@gmail.com wrote:
>> How can I just extract the real/integer number using regex?
>
> '[0-9]*\.[0-9]*'
I am trying to use BeautifulSoup:

soup = BeautifulSoup(page)

td_tags = soup.findAll('td')
i=0
for td in td_tags:
    i = i+1
    print "td: ", td
    # re.search('[0-9]*\.[0-9]*', td)
    price = re.compile('[0-9]*\.[0-9]*').search(td)

I am getting an error:

price= re.compile('[0-9]*\.[0-9]*').search(td)
TypeError: expected string or buffer

Does BeautifulSoup return an array of objects? If so, how do I pass a
"td" instance as a string to re.search? What is the difference between
re.search and re.compile().search?
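BeautifulSoup's findAll returns a list of Tag objects rather than strings, and re.search only accepts strings (or bytes), hence the TypeError. The sketch below does not need BeautifulSoup: `FakeTag` is a hypothetical stand-in whose `__str__` returns markup, the way `str()` on a real tag does, so converting with `str()` before searching is enough:

```python
import re

class FakeTag:
    """Hypothetical stand-in for a BeautifulSoup Tag: not a str
    subclass, but str() renders it as markup."""
    def __init__(self, markup):
        self.markup = markup

    def __str__(self):
        return self.markup

td = FakeTag('<td align=right width=80>&nbsp;49.950&nbsp;</td>')
price_re = re.compile(r'[0-9]*\.[0-9]*')

try:
    price_re.search(td)            # passing the object itself fails
except TypeError as exc:
    print('TypeError:', exc)

match = price_re.search(str(td))   # convert to a string first
print(match.group())               # 49.950
```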

Sep 20 '07 #4
Ivo
crybaby wrote:
> Does BeautifulSoup return an array of objects? If so, how do I pass a
> "td" instance as a string to re.search? What is the difference between
> re.search and re.compile().search?
I don't know anything about BeautifulSoup, but as for the other questions:

var = re.compile(regexpr) compiles the expression once; after that you can
use var as the reference to that compiled expression (which costs less).

re.search(expr, string) compiles and searches every time. This can
potentially be more expensive in computing power, especially if you
have to use the expression many times.

The way you use it, it doesn't matter.

Do:
pattern = re.compile(r'[0-9]*\.[0-9]*')
result = pattern.findall(text)  # text is the string you want to search

Now you can reuse pattern.

Cheers,
Ivo.
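A small sketch in Python 3 (the sample strings are made up) showing that the module-level call and the compiled pattern give the same result, and that one compiled pattern can be reused across many inputs:

```python
import re

text = '&nbsp;49.950&nbsp; &nbsp;7.5&nbsp;'

# Module-level helper: the pattern string is passed on every call.
via_module = re.search(r'[0-9]*\.[0-9]*', text).group()

# Compiled pattern: compile once, then call its methods repeatedly.
pattern = re.compile(r'[0-9]*\.[0-9]*')
via_pattern = pattern.search(text).group()

print(via_module, via_pattern)    # 49.950 49.950
print(pattern.findall(text))      # ['49.950', '7.5']

# Reusing the same compiled object across several inputs.
rows = ['&nbsp;1.25&nbsp;', '&nbsp;300.0&nbsp;']
print([pattern.search(row).group() for row in rows])   # ['1.25', '300.0']
```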
Sep 21 '07 #5
> re.search(expr, string) compiles and searches every time. This can
> potentially be more expensive in computing power, especially if you
> have to use the expression many times.
The re module-level helper functions cache each expression's compiled
form in a dict, so an expression is only compiled once (as long as it
stays in the cache). The main overhead is the repeated dict lookup.

See sre.py (which re.py imports) for more details: /usr/lib/python2.4/sre.py
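A sketch of that cache in modern CPython 3, where the code lives in the re module itself rather than sre.py. Getting the very same object back from two re.compile calls is a CPython implementation detail, shown only to make the cache visible; it is not guaranteed behavior:

```python
import re

p = r'[0-9]*\.[0-9]*'

# Compiling the same pattern string twice returns the cached object
# (a CPython implementation detail, not a language guarantee).
first = re.compile(p)
second = re.compile(p)
print(first is second)    # True on CPython

# re.purge() empties the cache, so the next call compiles afresh.
re.purge()
third = re.compile(p)
print(first is third)     # False

# Module-level calls go through the same cache, so repeating them
# costs a cache lookup rather than a recompilation.
print(re.search(p, '&nbsp;49.950&nbsp;').group())   # 49.950
```

The cache is bounded, so a program that cycles through many distinct patterns can still see recompilation; keeping a compiled object around sidesteps that.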
Sep 21 '07 #6