Regular expressions (multiple match problem)

I have recently been experimenting with the GNU C library regular
expression functions and noticed a problem with pattern matching. It
seems to recognize only the first match and ignore the rest of them.
An example:

mikko.c:
-----

#include <stdio.h>
#include <regex.h>
#include <sys/types.h>

int main(int argc, char *argv[]) {
    regex_t p;
    regmatch_t pm[2];
    regcomp(&p, "k", 0);
    regexec(&p, "mikko", 2, pm, 0);
    printf("start=%d end=%d\n", pm[0].rm_so, pm[0].rm_eo);
    printf("start=%d end=%d\n", pm[1].rm_so, pm[1].rm_eo);
    regfree(&p);
    return 0;
}

-----

This is intended to match the regular expression 'k' against the string
'mikko' and return the start and end of the first two matches in the
array pm of regmatch_t structures. The output is, however:

$ ./mikko
start=2 end=3
start=-1 end=-1

instead of the expected

start=2 end=3
start=3 end=4

Is this a bug in the GNU library, or have I overlooked something? I have
not found any examples on the Internet of multiple subexpression
matching with <regex.h> either.
With more complicated regular expressions it usually seems to return
only the first match, as here, but with wildcards the largest match;
either way, only one of them.

Thanks,

Mikko Nummelin
Apr 2 '08 #1
In article <ba**********************************@x41g2000hsb.googlegroups.com>,
mikko.n <mn******@gmail.com> wrote:
>I have recently been experimenting with GNU C library regular
>expression functions and noticed a problem with pattern matching.
Then you should ask in a GNU newsgroup. Regular expressions are
not part of the C standard, so the proper usage of
any particular regular expression library should be discussed
in the appropriate forum for that library.
--
"They called it golf because all the other four letter words
were taken." -- Walter Hagen
Apr 2 '08 #2
On 2 Apr 2008 at 6:20, mikko.n wrote:
>I have recently been experimenting with GNU C library regular
>expression functions and noticed a problem with pattern matching. It
>seems to recognize only the first match but ignoring the rest of them.
<snip>
The problem is that you misunderstand what a match is.

If the regex matches, then pm[0] contains the offsets of the (first)
match for the whole regex. But pm[1], ... don't contain the offsets of
subsequent matches of the whole regex; rather, they contain the offsets
of any parenthesized subexpressions that matched (within the match
recorded in pm[0]).

For example, try:

#include <stdio.h>
#include <regex.h>
#include <sys/types.h>

int main(void)
{
    regex_t p;
    regmatch_t pm[2];
    regcomp(&p, "k\\(.\\)", 0);
    regexec(&p, "mikko", 2, pm, 0);
    printf("start=%d end=%d\n", pm[0].rm_so, pm[0].rm_eo);
    printf("start=%d end=%d\n", pm[1].rm_so, pm[1].rm_eo);
    regfree(&p);
    return 0;
}
$ ./a
start=2 end=4
start=3 end=4
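
To make the slot layout a bit more concrete, here is a minimal sketch with two parenthesized subexpressions. It is an illustration only: match_groups is a hypothetical helper name, and it assumes POSIX extended syntax (REG_EXTENDED), where plain parentheses group, rather than the basic syntax with backslashed parentheses used above. It relies on the re_nsub member of regex_t, which POSIX sets to the number of parenthesized subexpressions.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of <regex.h>): run one match of
 * `pattern` against `text` and fill `out` with the offsets.
 * out[0] is the whole match; out[1..] are the parenthesized
 * subexpressions of that same match, not later matches.
 * `max` must be at least 1.
 * Returns the number of slots filled, 0 if there is no match,
 * or -1 if the pattern does not compile. */
int match_groups(const char *pattern, const char *text,
                 regmatch_t *out, size_t max)
{
    regex_t re;
    size_t wanted;
    int rc;

    /* Assumption: extended syntax, so "(k)(.)" needs no backslashes. */
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* One slot for the whole match plus one per subexpression. */
    wanted = re.re_nsub + 1;
    if (wanted > max)
        wanted = max;

    rc = regexec(&re, text, wanted, out, 0);
    regfree(&re);
    return rc == 0 ? (int)wanted : 0;
}
```

With the pattern "(k)(.)" and the text "mikko", slot 0 covers offsets 2 to 4 (the whole match "kk"), slot 1 covers 2 to 3 and slot 2 covers 3 to 4. All three describe the single leftmost match.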

Apr 2 '08 #3
On 2 Apr, 11:01, Antoninus Twink <nos...@nospam.invalid> wrote:
>The problem is that you misunderstand what a match is.
<snip>
Is there then a simple alternative which would work so that it returns
all the matches of the original regexp in the text?

Mikko Nummelin
Apr 2 '08 #4
mikko.n wrote, On 02/04/08 09:37:
>On 2 Apr, 11:01, Antoninus Twink <nos...@nospam.invalid> wrote:
>>On 2 Apr 2008 at 6:20, mikko.n wrote:
<snip>
>Is there then a simple alternative which would work so that it returns
>all the matches of the original regexp in the text?
As Walter suggested, ask in a GNU group or mailing list where your
question would be topical (there is one specifically for regexp) instead
of comp.lang.c where it is not.

I note that this time you have added a cross post to
comp.unix.programmer where your question might be topical, but why
continue posting where it is not?
--
Flash Gordon
Apr 2 '08 #5
On 2 Apr 2008 at 8:37, mikko.n wrote:
>Is there then a simple alternative which would work so that it returns
>all the matches of the original regexp in the text?
Just use a loop, like this:
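#include <stdio.h>
#include <regex.h>
#include <sys/types.h>

int main(void)
{
    regex_t p;
    regmatch_t pm;
    const char *s = "mikko mikko";
    regoff_t last_match = 0;  /* offset in s where the next search starts */

    regcomp(&p, "k", 0);
    /* regexec reports offsets relative to s+last_match, so add it back */
    while (regexec(&p, s + last_match, 1, &pm, 0) == 0) {
        printf("start=%d end=%d\n",
               (int)(pm.rm_so + last_match), (int)(pm.rm_eo + last_match));
        /* resume one character past the start of this match */
        last_match += pm.rm_so + 1;
    }
    regfree(&p);
    return 0;
}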
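
A variation on the same loop, offered only as a sketch: the version above resumes one character past the start of each match, so it can report overlapping matches. The hypothetical helper find_all below instead resumes at the end of each match (non-overlapping matches), passes REG_NOTBOL on later calls so that a leading ^ is not treated as matching in the middle of the string, and steps one character after an empty match so the loop terminates. The name, the signature and the choice of REG_EXTENDED are assumptions made for this illustration.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of <regex.h>): store the offsets of
 * successive non-overlapping matches of `pattern` in `text` into
 * `out`, at most `max` of them. Offsets are relative to `text`.
 * Returns the number of matches stored, or -1 if the pattern does
 * not compile. */
int find_all(const char *pattern, const char *text,
             regmatch_t *out, size_t max)
{
    regex_t re;
    regmatch_t m;
    size_t len = strlen(text);
    size_t pos = 0;   /* where the next search starts */
    size_t n = 0;     /* matches stored so far */

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while (n < max && pos <= len) {
        /* After the first call, text+pos is not the real start of
         * the line, so tell regexec not to match ^ there. */
        int flags = pos > 0 ? REG_NOTBOL : 0;

        if (regexec(&re, text + pos, 1, &m, flags) != 0)
            break;

        /* regexec offsets are relative to text+pos; rebase them. */
        out[n].rm_so = (regoff_t)pos + m.rm_so;
        out[n].rm_eo = (regoff_t)pos + m.rm_eo;
        n++;

        /* Continue after the match; after an empty match move one
         * extra character so the loop cannot stall. */
        if (m.rm_eo > m.rm_so)
            pos += (size_t)m.rm_eo;
        else
            pos += (size_t)m.rm_eo + 1;
    }

    regfree(&re);
    return (int)n;
}
```

For the pattern "k" and the text "mikko mikko" this stores four matches: offsets 2 to 3, 3 to 4, 8 to 9 and 9 to 10. For "kk" it stores two, 2 to 4 and 8 to 10.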

Apr 2 '08 #6
