
regular expression - help

Can anyone translate this into plain English?

preg_match_all("/(\w+[,. ?])+/U", $text, $words);
Jul 17 '05 #1
4 Replies


kaptain kernel wrote:
can anyone translate this into plain english?

preg_match_all("/(\w+[,. ?])+/U", $text, $words);


http://us2.php.net/manual/en/functio...-match-all.php
-Eric Kincl
Jul 17 '05 #2

kaptain kernel wrote:
can anyone translate this into plain english?

preg_match_all("/(\w+[,. ?])+/U", $text, $words);


/(\w+[,. ?])+/U

\w+ : word character, one or more times

[,. ?] : (looks like) a literal comma, period, space, or question mark

(\w+[,. ?])+ : the above explanations, one or more times

/U : ungreedy (doesn't affect this pattern)

See here for more info:
http://www.comp.leeds.ac.uk/Perl/matching.html
http://www.anaesthetist.com/mnm/perl/regex.htm
http://sitescooper.org/tao_regexps.html

--
Justin Koivisto - sp**@koivi.com
PHP POSTERS: Please use comp.lang.php for PHP related questions,
alt.php* groups are not recommended.

Jul 17 '05 #3

Justin Koivisto wrote:
preg_match_all("/(\w+[,. ?])+/U", $text, $words);


/U : ungreedy (doesn't affect this pattern)


I had thought that initially too.

The pattern has one capturing subpattern, meaning -- with the
default flag for preg_match_all (PREG_PATTERN_ORDER), which
applies in this case as no flag was explicitly specified -- that
the $words array contains two further arrays: one array containing
full pattern matches, and another array containing the capturing
subpattern matches.

But because the U modifier (PCRE_UNGREEDY) is set, both arrays
will have exactly the same contents; that is, they'll both contain
values of one or more word characters followed by a single comma,
period, space, or question mark. That seems redundant to me.
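
A quick way to check this is to run the pattern with and without the U modifier. This is only a sketch: the sample string is my own assumption, since the original post never shows what $text contains.

```php
<?php
// Assumed sample input; the real $text is not shown in the thread.
$text = "one two three.";

// Pattern from the original post, with U (PCRE_UNGREEDY) set.
preg_match_all("/(\w+[,. ?])+/U", $text, $ungreedy);

// Same pattern without U, for comparison.
preg_match_all("/(\w+[,. ?])+/", $text, $greedy);

print_r($ungreedy[0]); // full matches: "one ", "two ", "three."
print_r($ungreedy[1]); // subpattern matches: identical to the above
print_r($greedy[0]);   // a single full match: "one two three."
```

With U set, the outer group stops after one repetition, so each full match is exactly one word plus its trailing delimiter. Without U, the group repeats as often as it can and the full match swallows the whole run.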

I wonder what this pattern is supposed to accomplish.

--
Jock
Jul 17 '05 #4

On Mon, 10 Nov 2003 17:16:47 +0000, kaptain kernel <no****@nospam.gov> wrote:
can anyone translate this into plain english?

preg_match_all("/(\w+[,. ?])+/U", $text, $words);


/U isn't a Perl regex modifier, but the PCRE manual says it makes
quantifiers non-greedy by default.
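
As a rough check of that reading, the sketch below compares the /U form against the same pattern with '?' appended to each quantifier by hand. The sample string is an assumption of mine, not something from the original post.

```php
<?php
// Assumed sample input; the real $text is not shown in the thread.
$text = "one two three.";

// U modifier: quantifiers are non-greedy by default.
preg_match_all("/(\w+[,. ?])+/U", $text, $withU);

// No modifier, but every quantifier made lazy explicitly.
preg_match_all("/(\w+?[,. ?])+?/", $text, $explicitLazy);

var_dump($withU === $explicitLazy); // bool(true)

// Under U the meaning of '?' is inverted, so '+?' is greedy again.
preg_match_all("/(\w+?[,. ?])+?/U", $text, $invertedBack);
preg_match_all("/(\w+[,. ?])+/", $text, $plainGreedy);

var_dump($invertedBack === $plainGreedy); // bool(true)
```

So for this pattern, /U and the hand-written lazy quantifiers give the same result arrays, which is why the lazy form can stand in for it in Perl.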

For the rest, YAPE::Regex::Explain (a useful Perl module) comes up with:
(after appending '?' to all the quantifiers to make them non-greedy)

The regular expression:

(?-imsx:(\w+?[,. ?])+?)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1 (1 or more times
                           (matching the least amount possible)):
----------------------------------------------------------------------
    \w+?                     word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the least amount
                             possible))
----------------------------------------------------------------------
    [,. ?]                   any character of: ',', '.', ' ', '?'
----------------------------------------------------------------------
  )+?                      end of \1 (NOTE: because you're using a
                           quantifier on this capture, only the LAST
                           repetition of the captured pattern will be
                           stored in \1)
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

--
Andy Hassall (an**@andyh.co.uk) icq(5747695) (http://www.andyh.co.uk)
Space: disk usage analysis tool (http://www.andyhsoftware.co.uk/space)
Jul 17 '05 #5
