
Regex Matches

Any takers?

I have a DNA string as the input sequence, GGATGGATG, and apply the simple
regex "GGATG" to it, as in

Regex r = new Regex("GGATG", RegexOptions.Compiled);

MatchCollection matches = r.Matches("GGATGGATG");

Now I would expect to get two matches, right? One at index 0 in the
string and the second at index 4? Or am I being really dumb or
something (EricGu, where art thou?).

Thanks for the help.

Kofi.

Nov 24 '05 #1
It would match twice if you inserted an extra G at index 4 of the input
string:
GGATGGGATG

Gabriel Lozano-Morán
Nov 24 '05 #2

Well, yes, but I think that what the OP wanted to know is why Regex
doesn't re-scan after a match. That is, in the string GGATGGATG, the
Regex first matches the initial GGATG. Where does the Regex processor
start looking for the next match? Does it resume one character after
the start of the first match, scanning the substring GATGGATG, in which
case it would find a second match at the fifth character of the original
string (the fourth character of the substring)? Or does it resume after
the last character of the first match, scanning only GATG, which yields
no second match?

Regex appears to display the latter behaviour, according to the OP.
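
To make the two cases concrete, here is a minimal sketch that lists the start index of each match reported by Regex.Matches. The class and method names (ScanDemo, MatchIndexes) are made up for illustration and are not part of .NET.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ScanDemo
{
    // Hypothetical helper: returns the start index of every match that
    // Regex.Matches reports. Each new scan resumes after the END of the
    // previous match, so overlapping occurrences are skipped.
    public static List<int> MatchIndexes(string input, string pattern)
    {
        var indexes = new List<int>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            indexes.Add(m.Index);
        }
        return indexes;
    }
}
```

Calling ScanDemo.MatchIndexes("GGATGGATG", "GGATG") should give a single index, 0, because the G at index 4 is consumed by the first match. With the extra G, ScanDemo.MatchIndexes("GGATGGGATG", "GGATG") should give 0 and 5.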

I checked the RegexOptions enumeration, and don't see any flag for
Rescan. I have seen this option for other Regex pattern matchers, but
it doesn't appear to be in the .NET one.

One thing the OP could do is use Match instead of Matches:

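string dna = "GGATGGATG";
int matchIndex = 0;
Regex r = new Regex("GGATG");
// Start each search at matchIndex instead of after the previous match.
Match sequence = r.Match(dna, matchIndex);
while (sequence.Success)
{
    matchIndex = sequence.Index;
    Console.WriteLine("Sequence matched at index {0}", matchIndex);
    // Resume one character past the start of this match so that
    // overlapping occurrences are found too.
    matchIndex++;
    sequence = r.Match(dna, matchIndex);
}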

Or something like that. Then he could determine where Regex should
start searching again after it finds a match.

Nov 25 '05 #3
Barry,

Thanks for your helpful reply - spot on.

Kofi.

Nov 25 '05 #4
It is logical that you get only one result.
If you want the indexes of all matches, you can use this trick:
use GGAT(?=G) instead of GGATG.
This matches every GGAT sequence that is followed by G. Of course, the match
result will contain GGAT rather than GGATG, but that does not matter because
you already know you are looking for GGATG.
So with
Regex r = new Regex("GGAT(?=G)", RegexOptions.Compiled);
MatchCollection matches = r.Matches("GGATGGATG");

you will get 2 matches, the first at position 0 and the second at position 4.
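
A variation on the same trick, sketched here as an assumption rather than as part of the suggestion above: put the whole pattern inside the lookahead and wrap it in a capturing group, so that each match is zero-width but the full GGATG text is still available from group 1. The names LookaheadDemo and FindWithLookahead are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Hypothetical helper: returns "index:text" for every position where
    // GGATG starts, including overlapping occurrences.
    public static List<string> FindWithLookahead(string input)
    {
        // The lookahead consumes no characters, so the engine only moves
        // forward one position after each zero-width match. Group 1
        // captures the text that the lookahead saw.
        var regex = new Regex("(?=(GGATG))");
        var results = new List<string>();
        foreach (Match m in regex.Matches(input))
        {
            results.Add(m.Index + ":" + m.Groups[1].Value);
        }
        return results;
    }
}
```

For the input GGATGGATG this should report 0:GGATG and 4:GGATG. Note that m.Value itself is empty here; only the group holds the sequence.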

Hope it helps,

Ludovic SOEUR.

Nov 25 '05 #5

Bruce Wood wrote:
string dna = "GGATGGATG";
int matchIndex = 0;
Regex r = new Regex("GGATG");
Match sequence = r.Match(dna, matchIndex);
while (sequence.Success)
{
    matchIndex = sequence.Index;
    Console.WriteLine("Sequence matched at index {0}", matchIndex);
    matchIndex++;
    sequence = r.Match(dna, matchIndex);
}


I should point out that there's a bug in my code. The loop test should
read:

while (sequence.Success && matchIndex < dna.Length) ...

The bug will show up only when matching a one-character Regex pattern
that matches on the last character of the string.
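
For reference, the loop can be wrapped in a small helper with the bounds check built in. This is only a sketch; FindOverlapping and OverlapDemo are made-up names. As far as I know, Regex.Match(input, startat) accepts a start index equal to the input length and throws ArgumentOutOfRangeException only when it is greater, so the guard below matters mainly for patterns that can match an empty string at the very end of the input.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OverlapDemo
{
    // Hypothetical helper: returns the start index of every match,
    // including matches that overlap an earlier one.
    public static List<int> FindOverlapping(string input, string pattern)
    {
        var regex = new Regex(pattern);
        var indexes = new List<int>();
        Match m = regex.Match(input);
        while (m.Success)
        {
            indexes.Add(m.Index);
            int next = m.Index + 1;
            // Guard: never pass a start index beyond the end of the input.
            if (next > input.Length)
            {
                break;
            }
            // Resume one character past the START of the previous match.
            m = regex.Match(input, next);
        }
        return indexes;
    }
}
```

OverlapDemo.FindOverlapping("GGATGGATG", "GGATG") should return 0 and 4, and a one-character pattern that matches the last character, such as FindOverlapping("AAG", "G"), should return just 2 without throwing.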

Nov 25 '05 #6
