Bytes IT Community

Regular expression issue

I'm trying to parse some HTML that looks like this:

<td style="width:20%" align="left">101.120:( KPA (-)</td>
<td style="width:35%" align="left">Snow on Ground)0 </td>

However, sometimes it looks like this:

<td style="width:20%" align="left">N/A</td>
<td style="width:35%" align="left">Snow on Ground)0 </td>

I want to get either the numerical value 101.120 (which could be a
different number depending on the data that's been fed into the page)
or, in the second case, 'N/A'.

The regexp I'm using is:

.*?Pressure.*?"left">(?P<baro>\d+?|N/A)</td>|\sKPA.*?Snow\son\sGround

Can someone help me debug this? It's not picking up the number, and
I'm not sure I've got the syntax for '|' right, but I can't find a
detailed tutorial on how to use it.
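
A quick way to see what's going on is to try the pieces of the pattern separately. This is a sketch assuming Python 3's re module and that the pattern is compiled with re.DOTALL (the post doesn't say which flags are used); the sample strings are the two snippets above, and the names original, value_part and fixed are just for illustration.

```python
import re

# The two snippets from the question.
number_html = (
    '<td style="width:20%" align="left">101.120:( KPA (-)</td>\n'
    '<td style="width:35%" align="left">Snow on Ground)0 </td>'
)
na_html = (
    '<td style="width:20%" align="left">N/A</td>\n'
    '<td style="width:35%" align="left">Snow on Ground)0 </td>'
)

# "|" binds more loosely than anything else, so at the top level this is
# two separate alternatives:
#   A: .*?Pressure.*?"left">(?P<baro>\d+?|N/A)</td>
#   B: \sKPA.*?Snow\son\sGround
original = re.compile(
    r'.*?Pressure.*?"left">(?P<baro>\d+?|N/A)</td>|\sKPA.*?Snow\son\sGround',
    re.DOTALL,  # assumption: lets .*? cross the newline between the cells
)
# The snippets don't contain "Pressure", so A can't match; on the numeric
# snippet B matches instead, and the "baro" group is left unset.
print(original.search(number_html).group("baro"))

# Inside the group, \d+? only matches digits, so "101.120" (with its ".")
# never fits, and </td> would have to follow the value immediately.
value_part = re.compile(r'"left">(?P<baro>\d+?|N/A)</td>')
print(value_part.search(number_html))            # no match
print(value_part.search(na_html).group("baro"))  # N/A matches

# Allowing "." in the number and dropping the </td> requirement handles both.
fixed = re.compile(r'"left">(?P<baro>[\d.]+|N/A)')
print(fixed.search(number_html).group("baro"))
print(fixed.search(na_html).group("baro"))
```

On the real page you would presumably keep the Pressure context from the original pattern (or something like it) in front of "left"> so that the right cell is picked; the simplified fixed pattern just takes the first matching cell.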

Any help would be appreciated.

Thanks

Matt

Jul 19 '06 #1


What about something like

align="left">((?P<baro>[\d.]+):\(\sKPA)|(?P<na>N/A).*Ground\)

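You need the re.DOTALL flag when compiling the regular expression, so that
'.' also matches the newline between the two cells. (re.MULTILINE only
changes how ^ and $ behave, so it isn't strictly needed for this pattern,
though it does no harm.)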

You'll have to check the 'baro' and 'na' groups to decide if it matched a
numerical value or 'N/A'.
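
Put together, the suggestion might look like this in Python 3. It's a sketch: read_baro is a hypothetical helper name, converting the number with float() is an assumption about what you want back, and only re.DOTALL is passed because the pattern has no ^ or $ for re.MULTILINE to affect.

```python
import re

# The suggested pattern. The top-level "|" gives two alternatives:
# a number followed by ":( KPA", or "N/A" followed (eventually) by "Ground)".
BARO_RE = re.compile(
    r'align="left">((?P<baro>[\d.]+):\(\sKPA)|(?P<na>N/A).*Ground\)',
    re.DOTALL,  # lets ".*" run across the newline between the two cells
)


def read_baro(html):
    """Hypothetical helper: return the pressure as a float, 'N/A', or None."""
    m = BARO_RE.search(html)
    if m is None:
        return None  # neither alternative matched
    if m.group("baro") is not None:
        return float(m.group("baro"))
    return m.group("na")


number_html = (
    '<td style="width:20%" align="left">101.120:( KPA (-)</td>\n'
    '<td style="width:35%" align="left">Snow on Ground)0 </td>'
)
na_html = (
    '<td style="width:20%" align="left">N/A</td>\n'
    '<td style="width:35%" align="left">Snow on Ground)0 </td>'
)
print(read_baro(number_html))
print(read_baro(na_html))
```

Exactly one of the two named groups is set after a successful match, which is why checking "baro" first and falling back to "na" is enough.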

Ciao,
Marc 'BlackJack' Rintsch
Jul 19 '06 #2

Wouldn't it be simpler to use HTMLParser or something similar first, to
separate text from HTML tags and get the content of each cell separately?
Then you only have to find the 'right' cell, possibly quite simply by
its position in the HTML table, and check whether it contains 'N/A' or
something numeric (that check wouldn't need a regular expression if it's
really that simple).

No Python here so I can't try it out to be more specific, but look for
HTMLParser in the library reference.
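
A rough sketch of that approach with Python 3's html.parser module (in Python 2 the same class lives in the HTMLParser module). CellTextParser and pressure_from_cell are hypothetical names, the cell index used below only holds for the two-cell snippet from the question, and the numeric check simply takes the text before the first ':' instead of using a regex.

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Collect the text of every <td> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign.
        if self._in_td:
            self.cells[-1] += data


def pressure_from_cell(text):
    """Return 'N/A', or the number before the ':' in a cell like '101.120:( KPA (-)'."""
    text = text.strip()
    if text == "N/A":
        return "N/A"
    return float(text.split(":", 1)[0])  # plain string handling, no regex


for html in (
    '<td style="width:20%" align="left">101.120:( KPA (-)</td>\n'
    '<td style="width:35%" align="left">Snow on Ground)0 </td>',
    '<td style="width:20%" align="left">N/A</td>\n'
    '<td style="width:35%" align="left">Snow on Ground)0 </td>',
):
    parser = CellTextParser()
    parser.feed(html)
    parser.close()
    # Index 0 is only right for this two-cell snippet; on the real page the
    # position of the pressure cell in the table would have to be looked up.
    print(parser.cells[0], "->", pressure_from_cell(parser.cells[0]))
```

Separating the parsing from the value check this way means a change in the page's markup only affects the parser, not the (very simple) test for 'N/A' versus a number.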

--
Dr. Sibylle Koczian
Universitaetsbibliothek, Abt. Naturwiss.
D-86135 Augsburg
e-mail : Si*************@Bibliothek.Uni-Augsburg.DE
Jul 24 '06 #3
