Bytes IT Community

regex question

Hi folks,
I have to do the following:

match everything between "start match after this text:" and "</td>".
My problem is that there are other html-tags between, so [^<] doesn't work.
How can I do something like [^<\/td>] (yes, I know this means not < or /
or ...), but do it right?

Many thanks in advance,
yours Henri

--
| Henri Schomäcker - BYTECONCEPTS, VIRTUAL HOMES
| Data design for Internet and intranet
| http://www.byteconcepts.de
| http://www.virtual-homes.de
Jul 17 '05 #1
6 Replies


Henri Schomaecker wrote:
> I have to do the following:
>
> match everything between "start match after this text:" and "</td>".
> My problem is that there are other html-tags between, so [^<] doesn't work.
> How can I do something like [^<\/td>] (yes, I know this means not < or /
> or ...), but do it right?


What's wrong with preg_match('/STARTTEXT(.*)<\/td>/', $text, $array)?
Where STARTTEXT is the start match.

Maybe I'm misunderstanding your requirement, in which case you would
need to post some explicit examples of what you want.

--
Oli

Jul 17 '05 #2
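
A minimal sketch of the preg_match() call suggested above, for readers who want to try it. The sample HTML and the variable names $start and $pattern are assumptions made up for illustration; only the marker text comes from the original question. preg_quote() is used on the assumption that the start text should be treated as a literal string rather than as a pattern.

```php
<?php
// Hypothetical sample input; only the marker text is taken from the question.
$text  = '<tr><td>start match after this text: some <b>bold</b> value</td></tr>';
$start = 'start match after this text:';

// preg_quote() escapes regex metacharacters in the literal start text;
// its second argument additionally escapes the chosen delimiter.
$pattern = '/' . preg_quote($start, '/') . '(.*)<\/td>/';

// preg_match() returns 1 on a match, 0 on no match, false on error.
$found = preg_match($pattern, $text, $array);

// $array[0] is the whole match, $array[1] the first parenthesised group.
if ($found === 1) {
    echo $array[1], "\n";
}
```

With only one </td> in the subject, the greedy (.*) captures everything between the marker and that closing tag, nested tags included.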

'/STARTTEXT(.*)<\/td>/' will continue matching until the last
occurrence of </td>.

Use this regex instead:
'|STARTTEXT([^<>]*)</td>|'

You shouldn't have any < or > characters (other than those that make up
tags), right? :)

Jul 17 '05 #3


BKDotCom wrote:
> '/STARTTEXT(.*)<\/td>/' will continue matching until the last
> occurrence of </td>

In that case, use /STARTTEXT(.*?)<\/td>/ instead.

> use this regex:
> '|STARTTEXT([^<>]*)</td>|'
>
> you shouldn't have any < or >s (other than those that make up tags)
> right. :)

The OP stated that there *will* be other HTML tags in between, so this
regex won't work.

--
Oli

Jul 17 '05 #4
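
To make the difference between the three patterns discussed so far concrete, here is a small sketch that runs each of them against the same subject. The two-cell input string and the variable names are hypothetical; the patterns are the ones quoted in the posts above.

```php
<?php
// Hypothetical input: two cells, with a nested tag inside the first one.
$text = '<td>STARTTEXT one <b>two</b></td><td>three</td>';

// Greedy: .* runs to the end of the subject, then backtracks to the LAST </td>.
preg_match('/STARTTEXT(.*)<\/td>/', $text, $greedy);

// Lazy: .*? grows one character at a time and stops at the FIRST </td>.
preg_match('/STARTTEXT(.*?)<\/td>/', $text, $lazy);

// Negated class: [^<>]* cannot cross the "<" of <b>, so nothing matches.
$classFound = preg_match('|STARTTEXT([^<>]*)</td>|', $text, $class);

echo $greedy[1], "\n";  // " one <b>two</b></td><td>three"
echo $lazy[1], "\n";    // " one <b>two</b>"
var_dump($classFound);  // int(0)
```

Only the lazy variant returns the contents of the first cell when other tags sit between the marker and </td>.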

Thanks to all of you!

I solved it. It was a greediness problem.
I just don't understand why in PHP .* matches far beyond the (...) when I
don't set the N (non-greedy) option. In my opinion it should at least
stop matching when the matching ) is reached. But it doesn't!
In Perl this is no problem; I just tried a few one-liners with the g option
(Perl's greedy option) on my example.
PHP seems to match and match, and does not stop matching until the
end of the subject string is reached.

I recently wrote a (unfortunately closed-source at the moment) C++ API for
libpcre. Because the PHP API seems to be more or less copied from pcre, I
think I'll have to run some tests. If this behaviour is also present in the
pcre API, it will really be a problem for me.

Question: Is it correct PHP pcre behaviour to keep matching past the
group's closing ) ?

Many thanks for every answer,
yours Henri

--
| Henri Schomäcker - BYTECONCEPTS, VIRTUAL HOMES
| Data design for Internet and intranet
| http://www.byteconcepts.de
| http://www.virtual-homes.de
Jul 17 '05 #5

BKDotCom wrote (11 May 2005 14:57:11 -0700):
> '/STARTTEXT(.*)<\/td>/' will continue matching until the last
> occurrence of </td>


Unless you turn greediness off:

'/STARTTEXT(.*)<\/td>/U'

or just

'#STARTTEXT(.*)</td>#U'
--
-- Álvaro G. Vicario - Burgos, Spain
-- http://bits.demogracia.com - My site about web programming
-- Don't e-mail me your questions, post them to the group
--
Jul 17 '05 #6
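
As a rough illustration of the U (ungreedy) modifier mentioned above: it inverts the default greediness of quantifiers for the whole pattern, so a plain .* becomes lazy and .*? becomes greedy. The input string and variable names below are hypothetical.

```php
<?php
// Hypothetical input, same shape as the problem described in the thread.
$text = '<td>STARTTEXT one <b>two</b></td><td>three</td>';

// With U, the plain .* is lazy and stops at the first </td>.
preg_match('/STARTTEXT(.*)<\/td>/U', $text, $a);

// Same pattern with # as delimiter, so the / in </td> needs no escaping.
preg_match('#STARTTEXT(.*)</td>#U', $text, $b);

// Under U the meaning of ? after a quantifier flips: .*? is greedy again.
preg_match('/STARTTEXT(.*?)<\/td>/U', $text, $c);

echo $a[1], "\n";  // " one <b>two</b>"
echo $b[1], "\n";  // " one <b>two</b>"
echo $c[1], "\n";  // " one <b>two</b></td><td>three"
```

Because U changes the meaning of every quantifier in the pattern, writing .*? without U is often easier to read when only one quantifier needs to be lazy.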

Henri Schomaecker <hs@byteconcepts.de> wrote:

> I solved it. It was a greedy problem.
> I just don't understand why in PHP .* catches far over the (...) when I
> don't set the N (non-greedy) Option. - In my Opinion it should at least
> stop matching, when the match-making ) is reached. - But it doesn't!

That's your opinion, because it conveniently suits your current
requirement. Regular expressions have been greedy right from the start.

> In perl, this is no problem, I tried a few one-liners with the g option
> (perl's greedy option) with my example now.

Perl is greedy by default (as are all regular expression matchers).
Perhaps you should post your test so we can figure out what you really did.

> PHP seems to match, and match ..., and does not stop with matching until the
> end of the subject string is found.

Please post your exact tests. I want to make sure we can explain this to
everyone.
--
- Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Jul 17 '05 #7
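
One possible source of the confusion about the "g option": in Perl, /g requests global matching (find every match), not greediness. The closest PHP counterpart is preg_match_all(). The sketch below, using a hypothetical two-cell input and <td> as the start marker, shows that greediness and global matching are independent of each other.

```php
<?php
// Hypothetical input: two cells, the first containing a nested tag.
$text = '<td>a <i>b</i></td><td>c</td>';

// Global + greedy: the single match swallows both cells.
$n1 = preg_match_all('#<td>(.*)</td>#', $text, $greedy);

// Global + lazy: one match per cell.
$n2 = preg_match_all('#<td>(.*?)</td>#', $text, $lazy);

// With the default PREG_PATTERN_ORDER, index 1 lists every capture of group 1.
echo $n1, ': ', implode(' | ', $greedy[1]), "\n";  // 1: a <i>b</i></td><td>c
echo $n2, ': ', implode(' | ', $lazy[1]), "\n";    // 2: a <i>b</i> | c
```

In both calls the closing ) of the group does not limit what .* can consume; only the rest of the pattern (here </td>) and backtracking decide where the capture ends.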
