I have recently been experimenting with the GNU C library regular
expression functions and noticed a problem with pattern matching. It
seems to recognize only the first match and ignore the rest of them.
An example:
mikko.c:
-----
#include <stdio.h>
#include <regex.h>
#include <sys/types.h>
int main(int argc, char *argv[]) {
    regex_t p;
    regmatch_t pm[2];
    regcomp(&p, "k", 0);
    regexec(&p, "mikko", 2, pm, 0);
    printf("start=%d end=%d\n", pm[0].rm_so, pm[0].rm_eo);
    printf("start=%d end=%d\n", pm[1].rm_so, pm[1].rm_eo);
    regfree(&p);
    return 0;
}
-----
This is meant to match the regular expression 'k' against the string
'mikko' and return the start and end of the first two matches in the
array pm of regmatch_t's. The output is, however:
$ ./mikko
start=2 end=3
start=-1 end=-1
instead of the expected
start=2 end=3
start=3 end=4
Is this a bug in the GNU library or have I overlooked something? I have
not found any examples on the Internet of multiple subexpression
matching with <regex.h> either.
With more complicated regular expressions it usually seems to return
only the first match, as here; with wildcards it returns the longest
match, but still only one of them.
Thanks,
Mikko Nummelin
In article <ba**********************************@x41g2000hsb.googlegroups.com>,
mikko.n <mn******@gmail.com> wrote:
>I have recently been experimenting with GNU C library regular expression functions and noticed a problem with pattern matching.
Then you should ask in a GNU newsgroup. Regular expressions are
not part of the C standard, so the proper usage of
any particular regular expression library should be discussed
in the appropriate forum for that library.
--
"They called it golf because all the other four letter words
were taken." -- Walter Hagen
On 2 Apr 2008 at 6:20, mikko.n wrote:
> I have recently been experimenting with GNU C library regular
> expression functions and noticed a problem with pattern matching. It
> seems to recognize only the first match but ignoring the rest of them.
<snip>
The problem is that you misunderstand what a match is.
If the regex matches, then pm[0] contains the offsets of the (first)
match for the whole regex. But pm[1],... don't contain the offsets of
subsequent matches of the whole regex; they contain the offsets of
any parenthesized subexpressions that matched (in the match recorded in
pm[0]).
For example, try:
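#include <stdio.h>
#include <regex.h>
#include <sys/types.h>
int main(void)
{
    regex_t p;
    regmatch_t pm[2];
    /* Basic regex syntax: \( and \) delimit a subexpression. */
    regcomp(&p, "k\\(.\\)", 0);
    regexec(&p, "mikko", 2, pm, 0);
    /* regoff_t need not be int, so cast before printing with %d. */
    printf("start=%d end=%d\n", (int)pm[0].rm_so, (int)pm[0].rm_eo);
    printf("start=%d end=%d\n", (int)pm[1].rm_so, (int)pm[1].rm_eo);
    regfree(&p);
    return 0;
}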
$ ./a
start=2 end=4
start=3 end=4
On 2 Apr, 11:01, Antoninus Twink <nos...@nospam.invalid> wrote:
> The problem is that you misunderstand what a match is.
<snip>
Is there then a simple alternative which would work so that it returns
all the matches of the original regexp in the text?
Mikko Nummelin
mikko.n wrote, On 02/04/08 09:37:
> On 2 Apr, 11:01, Antoninus Twink <nos...@nospam.invalid> wrote:
>> On 2 Apr 2008 at 6:20, mikko.n wrote:
<snip>
> Is there then a simple alternative which would work so that it returns
> all the matches of the original regexp in the text?
As Walter suggested, ask in a GNU group or mailing list where your
question would be topical (there is one specifically for regexp) instead
of comp.lang.c where it is not.
I note that this time you have added a cross post to
comp.unix.programmer where your question might be topical, but why
continue posting where it is not?
--
Flash Gordon
On 2 Apr 2008 at 8:37, mikko.n wrote:
> Is there then a simple alternative which would work so that it returns
> all the matches of the original regexp in the text?
Just use a loop, like this:
#include <stdio.h>
#include <regex.h>
#include <sys/types.h>
int main(void)
{
    regex_t p;
    regmatch_t pm;
    const char *s = "mikko mikko";
    regoff_t last_match = 0;   /* offset in s where the next search starts */
    regcomp(&p, "k", 0);
    /* Each call searches the rest of s; offsets are relative to s + last_match. */
    while (regexec(&p, s + last_match, 1, &pm, 0) == 0) {
        printf("start=%d end=%d\n", (int)(pm.rm_so + last_match), (int)(pm.rm_eo + last_match));
        /* Resume one character past the start of this match, so overlapping matches are found too. */
        last_match += pm.rm_so + 1;
    }
    regfree(&p);
    return 0;
}