I have recently been experimenting with GNU C library regular
expression functions and noticed a problem with pattern matching. It
seems to recognize only the first match but ignoring the rest of them.
An example:
mikko.c:
-----
#include <stdio.h>
#include <regex.h>
#include <sys/types.h>
int main(int argc, char *argv[]) {
regex_t p;
regmatch_t pm[2];
regcomp(&p,"k",0);
regexec(&p,"mikko",2,pm,0);
printf("start=%d end=%d\n",pm[0].rm_so,pm[0].rm_eo);
printf("start=%d end=%d\n",pm[1].rm_so,pm[1].rm_eo);
regfree(&p);
return 0;
}
-----
This intends to match regular expression 'k' against string 'mikko'
and return start and end of two first matches in the array pm of
regmatch_t:s. The output is, however:
$ ./mikko
start=2 end=3
start=-1 end=-1
instead of the expected
start=2 end=3
start=3 end=4
Is this a bug in GNU library or have I overlooked something? I have
not found any examples from the Internet of multiple subexpression
matching with <regex.heither.
With more complicated regular expressions it usually seems to return
only the first match as here, but with wildcards the largest match,
nevertheless only one of them.
Thanks,
Mikko Nummelin 5 8724
In article <ba**********************************@x41g2000hsb. googlegroups.com>,
mikko.n <mn******@gmail.comwrote:
>I have recently been experimenting with GNU C library regular expression functions and noticed a problem with pattern matching.
Then you should ask in a GNU newsgroup. Regular expressions are
not part of the C standard, so the proper usage of
any particular regular expression library should be discussed
in the appropriate forum for that library.
--
"They called it golf because all the other four letter words
were taken." -- Walter Hagen
On 2 Apr 2008 at 6:20, mikko.n wrote:
I have recently been experimenting with GNU C library regular
expression functions and noticed a problem with pattern matching. It
seems to recognize only the first match but ignoring the rest of them.
An example:
mikko.c:
-----
#include <stdio.h>
#include <regex.h>
#include <sys/types.h>
int main(int argc, char *argv[]) {
regex_t p;
regmatch_t pm[2];
regcomp(&p,"k",0);
regexec(&p,"mikko",2,pm,0);
printf("start=%d end=%d\n",pm[0].rm_so,pm[0].rm_eo);
printf("start=%d end=%d\n",pm[1].rm_so,pm[1].rm_eo);
regfree(&p);
return 0;
}
-----
This intends to match regular expression 'k' against string 'mikko'
and return start and end of two first matches in the array pm of
regmatch_t:s. The output is, however:
$ ./mikko
start=2 end=3
start=-1 end=-1
instead of the expected
start=2 end=3
start=3 end=4
Is this a bug in GNU library or have I overlooked something? I have
not found any examples from the Internet of multiple subexpression
matching with <regex.heither.
With more complicated regular expressions it usually seems to return
only the first match as here, but with wildcards the largest match,
nevertheless only one of them.
The problem is that you misunderstand what a match is.
If the regex matches, then pm[0] contains the offsets of the (first)
match for the whole regex. But pm[1],... don't contain the offets for
subsequent matches of the whole regex, but rather contain the offsets of
any parenthesized subexpressions that matched (in the match recorded in
pm[0]).
For example, try:
#include <stdio.h>
#include <regex.h>
#include <sys/types.h>
int main(void)
{
regex_t p;
regmatch_t pm[2];
regcomp(&p,"k\\(.\\)",0);
regexec(&p,"mikko",2,pm,0);
printf("start=%d end=%d\n",pm[0].rm_so,pm[0].rm_eo);
printf("start=%d end=%d\n",pm[1].rm_so,pm[1].rm_eo);
regfree(&p);
return 0;
}
$ ./a
start=2 end=4
start=3 end=4
On 2 huhti, 11:01, Antoninus Twink <nos...@nospam.invalidwrote:
On 2 Apr 2008 at 6:20, mikko.n wrote:
I have recently been experimenting with GNU C library regular
expression functions and noticed a problem with pattern matching. It
seems to recognize only the first match but ignoring the rest of them.
An example:
mikko.c:
-----
#include <stdio.h>
#include <regex.h>
#include <sys/types.h>
int main(int argc, char *argv[]) {
regex_t p;
regmatch_t pm[2];
regcomp(&p,"k",0);
regexec(&p,"mikko",2,pm,0);
printf("start=%d end=%d\n",pm[0].rm_so,pm[0].rm_eo);
printf("start=%d end=%d\n",pm[1].rm_so,pm[1].rm_eo);
regfree(&p);
return 0;
}
-----
This intends to match regular expression 'k' against string 'mikko'
and return start and end of two first matches in the array pm of
regmatch_t:s. The output is, however:
$ ./mikko
start=2 end=3
start=-1 end=-1
instead of the expected
start=2 end=3
start=3 end=4
Is this a bug in GNU library or have I overlooked something? I have
not found any examples from the Internet of multiple subexpression
matching with <regex.heither.
With more complicated regular expressions it usually seems to return
only the first match as here, but with wildcards the largest match,
nevertheless only one of them.
The problem is that you misunderstand what a match is.
If the regex matches, then pm[0] contains the offsets of the (first)
match for the whole regex. But pm[1],... don't contain the offets for
subsequent matches of the whole regex, but rather contain the offsets of
any parenthesized subexpressions that matched (in the match recorded in
pm[0]).
For example, try:
#include <stdio.h>
#include <regex.h>
#include <sys/types.h>
int main(void)
{
regex_t p;
regmatch_t pm[2];
regcomp(&p,"k\\(.\\)",0);
regexec(&p,"mikko",2,pm,0);
printf("start=%d end=%d\n",pm[0].rm_so,pm[0].rm_eo);
printf("start=%d end=%d\n",pm[1].rm_so,pm[1].rm_eo);
regfree(&p);
return 0;
}
$ ./a
start=2 end=4
start=3 end=4
Is there then a simple alternative which would work so that it returns
all the matches of the original regexp in the text?
Mikko Nummelin
mikko.n wrote, On 02/04/08 09:37:
On 2 huhti, 11:01, Antoninus Twink <nos...@nospam.invalidwrote:
>On 2 Apr 2008 at 6:20, mikko.n wrote:
<snip>
Is there then a simple alternative which would work so that it returns
all the matches of the original regexp in the text?
As Walter suggested, ask in a GNU group or mailing list where your
question would be topical (there is one specifically for regexp) instead
of comp.lang.c where it is not.
I note that this time you have added a cross post to
comp.unix.programmer where your question might be topical, but why
continue posting where it is not?
--
Flash Gordon
On 2 Apr 2008 at 8:37, mikko.n wrote:
Is there then a simple alternative which would work so that it returns
all the matches of the original regexp in the text?
Just use a loop, like this:
#include <stdio.h>
#include <regex.h>
#include <sys/types.h>
int main(void)
{
regex_t p;
regmatch_t pm;
char *s="mikko mikko";
regoff_t last_match=0;
regcomp(&p, "k", 0);
while(regexec(&p, s+last_match, 1, &pm, 0) == 0) {
printf("start=%d end=%d\n", pm.rm_so + last_match, pm.rm_eo + last_match);
last_match += pm.rm_so+1;
}
regfree(&p);
return 0;
} This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics
by: Kenneth McDonald |
last post by:
I'm working on the 0.8 release of my 'rex' module, and would appreciate
feedback, suggestions, and criticism as I work towards finalizing the
API and feature sets. rex is a module intended to make...
|
by: Christian Staffe |
last post by:
Hi,
I would like to check for a partial match between an input string and a
regular expression using the Regex class in .NET. By partial match, I mean
that the input string could not yet be...
|
by: Ben Dewey |
last post by:
Hey,
I have only been playing with regular expressions for some time. I am
working on some code that parses and object 560 event log. I have created
two expressions the first one which works...
|
by: Billa |
last post by:
Hi,
I am replaceing a big string using different regular expressions (see
some example at the end of the message). The problem is whenever I
apply a "replace" it makes a new copy of string and I...
|
by: Pete Davis |
last post by:
I'm using regular expressions to extract some data and some links from some
web pages. I download the page and then I want to get a list of certain
links.
For building regular expressions, I use...
|
by: Mike |
last post by:
I have a regular expression (^(.+)(?=\s*).*\1 ) that results in
matches. I would like to get what the actual regular expression is.
In other words, when I apply ^(.+)(?=\s*).*\1 to " HEART...
|
by: teo |
last post by:
I need to implement a boolean evaluation
in a Regular Expression like this:
(aaa AND bbb) OR (ccc AND ddd)
(see the #3 case)
- - -
1)
If I need to match a single word only,
|
by: Zeba |
last post by:
Hi guys,
I need some help regarding regular expressions. Consider the following
statement :
System.Text.RegularExpressions.Match match =...
|
by: Thomas Dybdahl Ahle |
last post by:
Hi, I'm writing a program with a large data stream to which modules can
connect using regular expressions.
Now I'd like to not have to test all expressions every time I get a line,
as most of...
|
by: Charles Arthur |
last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
|
by: emmanuelkatto |
last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud.
Please let me know.
Thanks!
Emmanuel
|
by: Sonnysonu |
last post by:
This is the data of csv file
1 2 3
1 2 3
1 2 3
1 2 3
2 3
2 3
3
the lengths should be different i have to store the data by column-wise with in the specific length.
suppose the i have to...
|
by: Hystou |
last post by:
There are some requirements for setting up RAID:
1. The motherboard and BIOS support RAID configuration.
2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
|
by: marktang |
last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
|
by: jinu1996 |
last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
|
by: tracyyun |
last post by:
Dear forum friends,
With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
|
by: agi2029 |
last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...
|
by: isladogs |
last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM).
In this session, we are pleased to welcome a new...
| |