Parsing HTML [solved using the re module]

Hello, I'm very much a beginner. I've completed one task successfully (with help), and now I want to deviate just a little and I'm stumped. Here's what I've done...

In a previous task I needed to get a specific number out of this source code:
<TD HEIGHT="24" CLASS="bubblemiddle" ALIGN="right" id="homeindexvolume" name="homeindexvolume">2,017,798,400</TD>

so I used:
re.compile('<TD>.*name="homeindexvolume">(.*?)</TD>',re.M|re.DOTALL)

Now, from a different piece of source code, I need a specific number, but there is a lot more on the original line.
Here's the source code:

<tr><td bgcolor="EEEEEE"><FONT FACE="Arial,Helvetica,sans-serif" SIZE="2" COLOR="#000000"><b>Total</b></font></td><td bgcolor="EEEEEE" align="right"><FONT FACE="Arial,Helvetica,sans-serif" SIZE="2" COLOR="#000000">1,508,577,000</font></td><td bgcolor="EEEEEE" align="right"><FONT FACE="Arial,Helvetica,sans-serif" SIZE="2" COLOR="#000000">51,073,000</font></td><td bgcolor="EEEEEE" align="right"><FONT FACE="Arial,Helvetica,sans-serif" SIZE="2" COLOR="#000000">1,966,371,000</font></td><td bgcolor="EEEEEE" align="right"><FONT FACE="Arial,Helvetica,sans-serif" SIZE="2" COLOR="#000000">2,125,754,373</font></td></tr>

Now all I want is 1,508,577,000.

How would I grab just that number?

What if I wanted a different number in there, say 51,073,000?

Thanks
May 21 '07 #1


bvdet
This will extract the numbers from the string:
import re

s = '<tr><td bgcolor="EEEEEE"><FONT FACE="Arial,Helvetica,sans-serif" SIZE="2" COLOR="#000000"><b>Total</b></font></td><td bgcolor="EEEEEE" align="right"><FONT FACE="Arial,Helvetica,sans-serif" SIZE="2" COLOR="#000000">1,508,577,000</font></td><td bgcolor="EEEEEE" align="right"><FONT FACE="Arial,Helvetica,sans-serif" SIZE="2" COLOR="#000000">51,073,000</font></td><td bgcolor="EEEEEE" align="right"><FONT FACE="Arial,Helvetica,sans-serif" SIZE="2" COLOR="#000000">1,966,371,000</font></td><td bgcolor="EEEEEE" align="right"><FONT FACE="Arial,Helvetica,sans-serif" SIZE="2" COLOR="#000000">2,125,754,373</font></td></tr>'

# Capture any run of digits and commas that sits directly between '>' and '<'
patt = r'>([0-9,]+)<'
dataList = re.findall(patt, s)
print(dataList)

# Output:
# ['1,508,577,000', '51,073,000', '1,966,371,000', '2,125,754,373']
Use the list index to get individual items:
>>> number = dataList[0]
>>> number
'1,508,577,000'
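The same indexing answers the second question: 51,073,000 is the second match, so it sits at index 1. If you need the values as integers rather than strings, here is a minimal sketch. The helper name extract_numbers and the shortened row string are hypothetical stand-ins for illustration, not part of the original page, and the pattern is slightly stricter than the one above in that it requires each match to start with a digit.

```python
import re

# Shortened, hypothetical stand-in for the table row in the question
row = ('<tr><td><b>Total</b></td>'
       '<td align="right"><FONT SIZE="2">1,508,577,000</font></td>'
       '<td align="right"><FONT SIZE="2">51,073,000</font></td></tr>')

def extract_numbers(html):
    """Return each comma-grouped number found between '>' and '<' as an int."""
    matches = re.findall(r'>(\d[\d,]*)<', html)
    # Drop the thousands separators before converting to int
    return [int(text.replace(',', '')) for text in matches]

values = extract_numbers(row)
first = values[0]    # the number right after the "Total" label
second = values[1]   # the next cell in the same row
print(first)
print(second)
```

Picking by position assumes the columns always appear in the same order; if the page layout changes, the indexes will need to change with it.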
May 21 '07 #2
