I need to extract the number from each <td> tag in an HTML file,
e.g. 49.950 from the following:
<td align=right width=80><font size=2 face="New Times Roman,Times,Serif"> 49.950 </font></td>
The number (49.950 here) can have any number of digits before and
after the decimal point:
<td align=right width=80><font size=2 face="New Times Roman,Times,Serif"> ######.#### </font></td>
How can I extract just the real/integer number using a regex?
jo***********@gmail.com wrote:
> How can I extract just the real/integer number using a regex?
'[0-9]*\.[0-9]*'
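A minimal sketch of how that pattern behaves on the sample cell from the question, written for Python 3. The names `SAMPLE`, `loose` and `tight` are made up for illustration, and the tighter pattern is only one possible alternative, not something proposed in the thread. It shows that the suggested pattern needs a decimal point (so integers are missed, and a lone "." is accepted), while a pattern that also accepts integers picks up attribute values such as width=80 when run over raw markup.

```python
import re

# Sample cell taken from the question above.
SAMPLE = ('<td align=right width=80><font size=2 '
          'face="New Times Roman,Times,Serif"> 49.950 </font></td>')

# The pattern from the reply: optional digits, a mandatory dot, optional digits.
loose = re.compile(r'[0-9]*\.[0-9]*')
price = loose.search(SAMPLE).group()
print(price)                                # 49.950

# Edge cases: an integer has no dot, and a bare dot has no digits.
integer_hit = loose.search('<td> 50 </td>')
dot_hit = loose.search('The end.').group()
print(integer_hit, repr(dot_hit))           # None '.'

# Hypothetical tighter pattern: digits, then an optional fractional part.
tight = re.compile(r'[0-9]+(?:\.[0-9]+)?')
all_hits = tight.findall(SAMPLE)
print(all_hits)                             # ['80', '2', '49.950']
```

The last result suggests applying such a pattern to the cell text rather than to the whole tag.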
--
Posted via a free Usenet account from http://www.teranews.com
jo***********@gmail.com wrote:
> How can I extract just the real/integer number using a regex?
If every td's content follows the [value_to_extract] pattern,
things are at their simplest
[untested]
/<td.* ([^&]*) /
The parentheses form a group, so group() on the match returns just the
part you really want.
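The grouping idea can be sketched in Python 3 roughly as follows. The pattern here is an adaptation anchored on the `<td>` and `<font>` tags of the sample, not the exact expression above, and `CELL` and `TWO_CELLS` are illustrative names. Regex-based HTML parsing like this assumes the markup is as regular as the sample.

```python
import re

# Assumed cell layout, copied from the question; the second value is made up.
TWO_CELLS = (
    '<td align=right width=80><font size=2 '
    'face="New Times Roman,Times,Serif"> 49.950 </font></td>'
    '<td align=right width=80><font size=2 '
    'face="New Times Roman,Times,Serif"> 1234.5 </font></td>'
)

# The parentheses capture only the text between the font tags;
# everything outside them is matched but not returned.
CELL = re.compile(r'<td[^>]*>\s*<font[^>]*>\s*([^<&]+?)\s*</font>')

first = CELL.search(TWO_CELLS)
first_value = first.group(1)        # contents of the first group only
values = CELL.findall(TWO_CELLS)    # with one group, findall returns the groups
print(first_value, values)
```

`group(0)` would return the whole matched tag, while `group(1)` returns only the captured value.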
Cheers
Gerardo
On Sep 20, 4:12 pm, Tobiah <t...@tobiah.org> wrote:
> '[0-9]*\.[0-9]*'
I am trying to use BeautifulSoup:
    soup = BeautifulSoup(page)
    td_tags = soup.findAll('td')
    i = 0
    for td in td_tags:
        i = i + 1
        print "td: ", td
        # re.search('[0-9]*\.[0-9]*', td)
        price = re.compile('[0-9]*\.[0-9]*').search(td)
I am getting an error:
    price = re.compile('[0-9]*\.[0-9]*').search(td)
TypeError: expected string or buffer
Does Beautiful Soup return an array of objects? If so, how do I pass
the "td" instance as a string to re.search? What is the difference
between re.search and re.compile().search?
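The TypeError is consistent with findAll returning tag objects rather than strings: the re functions accept only strings (or bytes), so any other object has to be converted first. The sketch below reproduces this in Python 3 without Beautiful Soup. `FakeTag` is a hypothetical stand-in class, assumed to behave like a parsed tag only in that str() returns its markup.

```python
import re


class FakeTag(object):
    """Hypothetical stand-in for a parsed <td> element (not Beautiful Soup)."""

    def __init__(self, markup):
        self.markup = markup

    def __str__(self):
        return self.markup


td = FakeTag('<td align=right width=80><font size=2> 49.950 </font></td>')
number = re.compile(r'[0-9]*\.[0-9]*')

# Passing the object itself fails, as in the traceback above.
try:
    number.search(td)
    error = None
except TypeError as exc:
    error = exc

# Passing its string form works.
match = number.search(str(td))
price = match.group() if match else None
print(type(error).__name__, price)
```

With a real Beautiful Soup tag the same str(td) conversion should apply, and searching only the tag's text content avoids matching digits inside attributes.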
crybaby wrote:
> Does Beautiful Soup return an array of objects? If so, how do I pass
> the "td" instance as a string to re.search? What is the difference
> between re.search and re.compile().search?
I don't know anything about BeautifulSoup, but to the other questions:
var = re.compile(regexpr) compiles the expression; after that you can
use var as a reference to the compiled expression (which costs less).
re.search(expr, string) compiles and searches every time. This can
potentially be more expensive in computing power, especially if you
have to use the expression a lot of times.
The way you use it, it doesn't matter.
Do:
    pattern = re.compile(r'[0-9]*\.[0-9]*')
    result = pattern.findall(text)  # text is the string to search
Now you can reuse pattern.
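As a small illustration of that reuse (Python 3), the sketch below compiles the pattern once and applies it to several strings. The `rows` list is made-up sample data, not taken from the original page.

```python
import re

# Compile once, outside the loop.
NUMBER = re.compile(r'[0-9]*\.[0-9]*')

# Hypothetical table rows; only the first value comes from the question.
rows = [
    '<td> 49.950 </td>',
    '<td> 7.5 </td><td> 120.25 </td>',
    '<td> n/a </td>',
]

# findall returns every non-overlapping match in each string.
found = [NUMBER.findall(row) for row in rows]
print(found)   # [['49.950'], ['7.5', '120.25'], []]
```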
Cheers,
Ivo.
Ivo wrote:
> re.search(expr, string) compiles and searches every time. This can
> potentially be more expensive in computing power, especially if you
> have to use the expression a lot of times.
The re module-level helper functions cache expressions and their
compiled form in a dict, so each expression is compiled only once. The
main overhead is the repeated dict lookups.
See sre.py (included from re.py) for more details: /usr/lib/python2.4/sre.py
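One way to observe this caching, sketched for Python 3 on CPython: compiling the same pattern string twice hands back the same object until the cache is cleared with re.purge(). The identity checks rely on an implementation detail and serve only as an illustration, not as something to build on.

```python
import re

NUMBER = r'[0-9]*\.[0-9]*'

# The second call finds the pattern in the module's cache (CPython detail).
first = re.compile(NUMBER)
second = re.compile(NUMBER)
same_object = first is second

# re.purge() empties the cache, so the next call compiles a new object.
re.purge()
third = re.compile(NUMBER)
rebuilt = third is not first

print(same_object, rebuilt)
```

The module-level functions such as re.search go through the same cache, so the practical difference from keeping a compiled pattern is mostly the lookup per call.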