Hello hello, I'm very much a beginner. I've done one task successfully (with help), and now I want to deviate just a little and I'm stumped. Here's what I've done...
In a previous task I needed to get a specific number out of this source code:
<TD HEIGHT="24" CLASS="bubblemiddle" ALIGN="right" id="homeindexvolume" name="homeindexvolume">2,017,798,400</TD>
so I used:
re.compile('<TD>.*name="homeindexvolume">(.*?)</TD>',re.M|re.DOTALL)
Now, from a different piece of source code, I need a specific number when there is a lot more on the original line.
Here's the source code:
<tr><td bgcolor="EEEEEE"><FONT FACE="Arial,Helvetica,sans-serif" SIZE="2" COLOR="#000000"><b>Total</b></font></td><td bgcolor="EEEEEE" align="right"><FONT FACE="Arial,Helvetica,sans-serif" SIZE="2" COLOR="#000000">1,508,577,000</font></td><td bgcolor="EEEEEE" align="right"><FONT FACE="Arial,Helvetica,sans-serif" SIZE="2" COLOR="#000000">51,073,000</font></td><td bgcolor="EEEEEE" align="right"><FONT FACE="Arial,Helvetica,sans-serif" SIZE="2" COLOR="#000000">1,966,371,000</font></td><td bgcolor="EEEEEE" align="right"><FONT FACE="Arial,Helvetica,sans-serif" SIZE="2" COLOR="#000000">2,125,754,373</font></td></tr>
Now all I want is 1,508,577,000.
How would I grab just that number?
How about if I wanted a different number in there, say 51,073,000?
Thanks
Reply from bvdet (Expert Mod):
This will extract the numbers from the string:

import re

s = '<tr><td bgcolor="EEEEEE"><FONT FACE="Arial,Helvetica,sans-serif" SIZE="2" COLOR="#000000"><b>Total</b></font></td><td bgcolor="EEEEEE" align="right"><FONT FACE="Arial,Helvetica,sans-serif" SIZE="2" COLOR="#000000">1,508,577,000</font></td><td bgcolor="EEEEEE" align="right"><FONT FACE="Arial,Helvetica,sans-serif" SIZE="2" COLOR="#000000">51,073,000</font></td><td bgcolor="EEEEEE" align="right"><FONT FACE="Arial,Helvetica,sans-serif" SIZE="2" COLOR="#000000">1,966,371,000</font></td><td bgcolor="EEEEEE" align="right"><FONT FACE="Arial,Helvetica,sans-serif" SIZE="2" COLOR="#000000">2,125,754,373</font></td></tr>'

# Capture each run of digits and commas that sits directly between '>' and '<'
patt = r'>([0-9,]+)<'
dataList = re.findall(patt, s)
print(dataList)
# Output: ['1,508,577,000', '51,073,000', '1,966,371,000', '2,125,754,373']

Use the list index to get individual items:

>>> number = dataList[0]
>>> number
'1,508,577,000'
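
For the second question (getting 51,073,000 instead), the same list can be indexed at position 1. If the values are needed as integers rather than strings, a minimal sketch might look like the following. The `extract_numbers` helper and the shortened `row` string are made up for illustration and are not part of the original reply; the sketch assumes every number sits directly between a `>` and a `<`, as in the row above.

```python
import re

# Shortened stand-in for the table row in the question (illustrative only;
# it assumes the same >number< layout as the real page).
row = ('<tr><td><b>Total</b></td>'
       '<td align="right"><font>1,508,577,000</font></td>'
       '<td align="right"><font>51,073,000</font></td></tr>')


def extract_numbers(html):
    """Return every comma-grouped number found between '>' and '<' as an int."""
    matches = re.findall(r'>([0-9,]+)<', html)
    # Strip the thousands separators before converting to int.
    return [int(m.replace(',', '')) for m in matches]


numbers = extract_numbers(row)
second = numbers[1]  # index 1 is the second number in the row
print(second)
```

Indexing by position only works while the column order on the page stays the same, so it is worth checking the length of the list before relying on a particular index.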