Bytes IT Community

Simple Regular Expression?

I would like a regex for the following HTML snippet:

<h2>Field1</h2>

<h3>

$123,456.78 - $987,654.32

</h3>

I would like to capture Field1 and the first numeric value only.

I have created the following that works somewhat:
$pattern='%<h2>(?P<field1>.*?)</h2>
.*?
<h3>.*?\$(?P<field2>.*?)\s.*?</h3>
%six';
However, I would like to improve field2's capture so that it is the first series
of digits after <h3>, excluding the thousands separator, and stops as soon as a
character other than a digit or the decimal point is encountered. I cannot
depend on the dollar sign always being present. In this case I'd capture
123456.78

Thanks in advance...

Jun 26 '06 #1
7 Replies


Rik
Simple one: capture at least one digit, followed by any further digits, decimal
points or thousands separators:
<h3>.*?(?P<field2>[0-9]+[0-9\.,]*).*?</h3>
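
A minimal PHP sketch of how the simple pattern could be used, assuming the markup from the question is held in a variable called $html (a hypothetical name). The pattern captures the commas together with the digits, so in this sketch the thousands separator is stripped afterwards with str_replace rather than inside the regex:

```php
<?php
// Sample markup from the question; $html is a hypothetical variable name.
$html = "<h2>Field1</h2>\n\n<h3>\n\n\$123,456.78 - \$987,654.32\n\n</h3>";

// 's' lets '.' cross newlines, 'i' ignores the case of the tags.
$pattern = '%<h2>(?P<field1>.*?)</h2>.*?<h3>.*?(?P<field2>[0-9]+[0-9.,]*).*?</h3>%si';

$field1 = null;
$amount = null;
if (preg_match($pattern, $html, $m)) {
    $field1 = $m['field1'];
    // The capture is "123,456.78"; removing the commas is a separate step.
    $amount = str_replace(',', '', $m['field2']);
}

echo $field1, ' ', $amount, "\n"; // Field1 123456.78
```

The dollar sign is never part of the pattern, so the same code works whether or not it is present.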

Advanced, will validate the currency format:
<h3>.*?(?P<field2>(?:[1-9][0-9]{0,2}(?:,[0-9]{3})*|0)(?:\.[0-9]{2})?).*?</h3>
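
To see what the currency part accepts, it can be tried on its own. The sketch below is only an illustration: the ^ and $ anchors and the sample values are additions that do not appear in the pattern above.

```php
<?php
// The currency sub-pattern, anchored so that the whole string has to match.
$currency = '/^(?:[1-9][0-9]{0,2}(?:,[0-9]{3})*|0)(?:\.[0-9]{2})?$/';

$samples = [
    '123,456.78', // groups of three digits plus two decimals
    '0.50',       // a lone zero before the decimals
    '1234.56',    // four digits without a separator
    '12,34',      // a group of two digits after the comma
];

$results = [];
foreach ($samples as $s) {
    $results[$s] = preg_match($currency, $s) === 1;
    printf("%-12s %s\n", $s, $results[$s] ? 'valid' : 'invalid');
}
// The first two samples print "valid", the last two "invalid".
```

Inside the <h3> pattern there are no anchors, so an input such as 1234.56 would still produce a capture, namely just 123. The strict check only holds when the sub-pattern is anchored as in this sketch.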


Allow for unexpected HTML tags/attributes, where we don't want to match the
'10' in a '<span margin="10px">', for instance:
<h3[^>]*>(?:[^<]*?(?:<[^>]*>)?)*?(?P<field2>(?:[1-9][0-9]{0,2}(?:,[0-9]{3})*|0)(?:\.[0-9]{2})?).*?</h3>
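
A short PHP sketch of the tag-tolerant pattern at work. The sample markup, including the class attribute and the span, is made up for illustration, and the pattern is only split across several string pieces for readability:

```php
<?php
// Hypothetical markup with an attribute value that contains a number.
$html = '<h3 class="price"><span margin="10px">$123,456.78</span> - $987,654.32</h3>';

$pattern = '%<h3[^>]*>(?:[^<]*?(?:<[^>]*>)?)*?'                              // skip text and whole tags
         . '(?P<field2>(?:[1-9][0-9]{0,2}(?:,[0-9]{3})*|0)(?:\.[0-9]{2})?)'  // the price itself
         . '.*?</h3>%s';

preg_match($pattern, $html, $m);
echo $m['field2'], "\n"; // 123,456.78
```

Because a tag is only ever consumed as a whole by <[^>]*>, the capture cannot start inside the margin="10px" attribute.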

Of course, if you're naming your captures 'field1' & 'field2', you might as
well not name them at all.

Grtz,
--
Rik Wasmus
Jun 26 '06 #2


"Rik" <lu************@hotmail.com> wrote in message
news:d3**************************@news1.tudelft.nl ...
Rik, I started a new thread because I had already asked you plenty and didn't
want to push your generosity. Having said that, I am glad you responded, thanks.
> Simple one: capture at least one digit, followed by any further digits, decimal
> points or thousands separators:
> <h3>.*?(?P<field2>[0-9]+[0-9\.,]*).*?</h3>
I'll stick to this one as the ones below are over my head...

Why could we not simply have used the following? This is what I tried, and it
didn't work:
<h3>.*?(?P<field2>[0-9\.,]*).*?</h3>


> Advanced, will validate the currency format:
> <h3>.*?(?P<field2>(?:[1-9][0-9]{0,2}(?:,[0-9]{3})*|0)(?:\.[0-9]{2})?).*?</h3>
>
> Allow for unexpected HTML tags/attributes, where we don't want to match the
> '10' in a '<span margin="10px">', for instance:
> <h3[^>]*>(?:[^<]*?(?:<[^>]*>)?)*?(?P<field2>(?:[1-9][0-9]{0,2}(?:,[0-9]{3})*|0)(?:\.[0-9]{2})?).*?</h3>
>
> Of course, if you're naming your captures 'field1' & 'field2', you might as
> well not name them at all.


This was simply to help illustrate where the fields were in the regex.


Jun 26 '06 #3

Rik
McHenry wrote:
> Why could we not simply have used the following? This is what I tried, and it
> didn't work:
> <h3>.*?(?P<field2>[0-9\.,]*).*?</h3>


1. It matches a single dot or comma, which is not desired.
2. It matches 'nothing' (* = 0 or more). Because the .*? in front of it is
lazy, that empty match right after <h3> is the one that gets used.

In <h3>.*?(?P<field2>[0-9]+[0-9\.,]*).*?</h3>, we use [0-9]+ to say: once
you have found at least one digit, and maybe more, capture all digits,
commas and dots.
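
The difference can be checked side by side. This is a small PHP sketch, assuming the <h3> block from the original question is held in a hypothetical $html variable and that the s modifier is set so that . also matches the newlines:

```php
<?php
// The <h3> block from the question; $html is a hypothetical variable name.
$html = "<h3>\n\n\$123,456.78 - \$987,654.32\n\n</h3>";

// Everything in the capture is optional, so an empty match satisfies it.
preg_match('%<h3>.*?(?P<field2>[0-9.,]*).*?</h3>%s', $html, $star);

// [0-9]+ forces the capture to begin at an actual digit.
preg_match('%<h3>.*?(?P<field2>[0-9]+[0-9.,]*).*?</h3>%s', $html, $plus);

var_dump($star['field2']); // string(0) ""
var_dump($plus['field2']); // string(10) "123,456.78"
```

Both patterns match the block as a whole. Only the content of field2 differs.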

Grtz,
--
Rik Wasmus
Jun 26 '06 #4

Thanks Rik, I understand the difference. Wow, are these things normally
easy to follow, or always this hard?

I have a regex that performs three captures of three prices. It works when
all three are present and numeric; however, if one is missing or listed as POA
or similar, then all three fail. Can this be overcome, or do I need three
separate preg_match statements?

Thanks in advance...
Jun 27 '06 #5

Rik
McHenry wrote:
> Thanks Rik, I understand the difference. Wow, are these things
> normally easy to follow, or always this hard?


You'll just have to get used to it; the more you use them, the easier they
become. One reason I normally reply to regex questions is to sharpen my
skills :-).
> I have a regex that performs three captures of three prices. It
> works when all three are present and numeric; however, if one is
> missing or listed as POA or similar, then all three fail. Can this be
> overcome, or do I need three separate preg_match statements?
>
> Thanks in advance...


By 'listed as POA', do you mean that literally?
If you're using the simple one:

<h3>.*?(?:(?P<prices>(?:[0-9]+[0-9\.,]*)|POA).*?){1,3}</h3>

Grtz,
--
Rik Wasmus
Jun 27 '06 #6

Rik, thanks as always. POA was simply an example; it could be anything.
What I meant was: if one regex performs three captures in a single
statement and one of them fails, must all three fail, or can the other two
still capture?
Jun 27 '06 #7

Rik
McHenry wrote:
> Rik, thanks as always. POA was simply an example; it could be anything.
> What I meant was: if one regex performs three captures in a single
> statement and one of them fails, must all three fail, or can the other two
> still capture?


<h3>.*?(?:(?P<prices>[0-9]+[0-9\.,]*).*?){0,3}</h3>

Searches for at most three prices; also matches when there are none.

<h3>.*?(?:(?P<prices>[0-9]+[0-9\.,]*).*?){1,3}</h3>

Searches for at most three prices; requires at least one.

<h3>.*?(?:(?P<prices>[0-9]+[0-9\.,]*).*?)+?</h3>

Searches for as many prices as there are; requires at least one.

<h3>.*?(?:(?P<prices>[0-9]+[0-9\.,]*).*?)*?</h3>

Searches for as many prices as there are; also matches when there are none.
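
One caveat when using these from PHP: a capture group that sits inside a repeated group is overwritten on every iteration, so preg_match only reports the last price it found. The sketch below shows that, plus one possible way to collect every price with preg_match_all. The sample string with POA in the middle and the variable names are assumptions made for illustration, and the two-step approach is an alternative, not the pattern given above.

```php
<?php
// Hypothetical <h3> block in which the middle price is listed as POA.
$html = "<h3>\n\n\$123,456.78 - POA - \$987,654.32\n\n</h3>";

// Repeated group: 'prices' only keeps what the final iteration matched.
preg_match('%<h3>.*?(?:(?P<prices>[0-9]+[0-9.,]*).*?){1,3}</h3>%s', $html, $m);
$last = $m['prices'];
echo $last, "\n"; // 987,654.32

// Two-step alternative: isolate the <h3> body, then collect every number in it.
$prices = [];
if (preg_match('%<h3[^>]*>(.*?)</h3>%s', $html, $h3)) {
    preg_match_all('/[0-9]+[0-9.,]*/', $h3[1], $all);
    $prices = $all[0];
}
print_r($prices); // two entries: 123,456.78 and 987,654.32
```

With this approach a missing or non-numeric entry is simply skipped and the remaining prices are still returned, without needing three separate preg_match statements.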

Grtz,
--
Rik Wasmus
Jun 27 '06 #8
