Complex Regular Expression

Hello,

I'm having a bit of trouble creating my regular expression and need a guru's
help!

Here's what I have... I have a sequence of characters that needs to be
validated against the database.

string: ACCCGUCAU[5Br]IAACCU

What I'm trying to do is load the available values from the database and
create my regex pattern from that. Right now I'm basically just using the "|"
operator which gets a lot of it but it still needs more. I'm also escaping
the "[" and "]" characters during generation.

pattern: A|C|G|U|U\[5Br\]|C\[5F\]|U\[5F\]|U\[5I\]|5-M-C|2'-N-C|I

My problem is I think I'm escaping things improperly or something because if
I use this whole pattern I'm able to locate all of my "A,C,G,U,I" characters.
However, if I trim off those characters from my regex and start at U\[5Br]...
I can then locate the U[5Br] in my string. This is why I think I've screwed
something up.

What I would really like for this to do is not show me what matches but what
doesn't match.

string: ACCCGUCAU[5Bxxx]IAACCU

pattern: A|C|G|U|U\[5Br\]|C\[5F\]|U\[5F\]|U\[5I\]|5-M-C|2'-N-C|I

From this I'd hope to see "U[5Bxxx]" since it's not in the database.

Any ideas?

Thanks in advance.

Nov 17 '05 #1
"ENIZIN" <EN****@discussions.microsoft.com> wrote in
news:A8**********************************@microsoft.com...
> My problem is I think I'm escaping things improperly or something because if
> I use this whole pattern I'm able to locate all of my "A,C,G,U,I" characters.
You can use Regex.Escape to escape a string; however, I don't think that's
your problem.
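
For illustration, here is a minimal sketch of building the alternation from a list of values with an escaping helper. It uses Python's re module rather than .NET (re.escape plays the role of Regex.Escape here), and the tokens list is a hypothetical stand-in for whatever is loaded from the database.

```python
import re

# Hypothetical stand-in for the values loaded from the database.
tokens = ["A", "C", "G", "U", "U[5Br]", "C[5F]", "U[5F]", "U[5I]",
          "5-M-C", "2'-N-C", "I"]

# re.escape() backslash-escapes regex metacharacters such as "[" and "]",
# so each database value is matched literally.
pattern = "|".join(re.escape(t) for t in tokens)

# Each escaped value matches exactly the literal text it came from.
print(all(re.fullmatch(re.escape(t), t) for t in tokens))  # True
```

Escaping through a library helper avoids hand-written slips such as escaping "[" but forgetting "]".
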
> However, if I trim off those characters from my regex and start at U\[5Br]...
> I can then locate the U[5Br] in my string. This is why I think I've screwed
> something up.
There's a 'U' in your alternation before the 'U\[5Br\]' part. It matches
the "U" in the input string; the following "[5Br]" part can't be matched
anymore, so the match ends there. The regex engine has no reason to
backtrack, so it simply returns this match (although it's not the longest
possible). You can either give it a reason to backtrack like this:

(A|C|G|U|U\[5Br\]|C\[5F\]|U\[5F\]|U\[5I\]|5-M-C|2'-N-C|I)*$

This will backtrack after the failed attempt to match, and find the correct
match (if there is one).
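
The behaviour described above can be reproduced with a short sketch. It is written in Python's re module rather than .NET, purely for illustration; the two engines treat alternation the same way in this respect. Note that the sketch also anchors at the start of the string (via match()) and uses a non-capturing group, neither of which is in the pattern above.

```python
import re

alternation = r"A|C|G|U|U\[5Br\]|C\[5F\]|U\[5F\]|U\[5I\]|5-M-C|2'-N-C|I"
text = "ACCCGUCAU[5Br]IAACCU"

# Alternatives are tried left to right and the first one that succeeds wins,
# so the plain "U" is returned and "U[5Br]" is never reported as one piece.
pieces = [m.group(0) for m in re.finditer(alternation, text)]
print("U[5Br]" in pieces)  # False

# Repeating the alternation and requiring the end of the string gives the
# engine a reason to backtrack: after "U" fails to lead to the end, it goes
# back and tries "U\[5Br\]" instead.
whole = re.compile(rf"(?:{alternation})*$")
m = whole.match(text)
print(m is not None and m.end() == len(text))  # True

# A string containing an unknown modification cannot be consumed completely.
print(whole.match("ACCCGUCAU[5Bxxx]IAACCU"))  # None
```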

Another way to get a full match is to modify the original alternation
sequence. If you put the 'U' part after the 'U\[...' parts in the
alternation, those will be tried first, resulting in a good match, too. If I
got you right, you build the pattern programmatically anyway, so it seems
possible to me to eliminate this kind of situation (multiple alternation
members starting with the same substring). You should be able to build an
"alternation tree" from your input patterns, recursively combining the ones
starting with a common substring:

A|C(\[5F\]|)|G|U(\[5(Br\]|F\]|I\])|)|5-M-C|2'-N-C|I

I think this should always work, as it does more or less the same thing I'd
do if I had to do it without regexes.
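
A simpler variant of the reordering idea, sketched here in Python's re module (not .NET), is to sort the values longest-first before joining them. This is not the alternation tree itself, just a shortcut with the same effect on ordering: any value that is a prefix of a longer one ends up after it. The tokens list is again a hypothetical stand-in for the database values.

```python
import re

# Hypothetical stand-in for the values loaded from the database.
tokens = ["A", "C", "G", "U", "U[5Br]", "C[5F]", "U[5F]", "U[5I]",
          "5-M-C", "2'-N-C", "I"]
text = "ACCCGUCAU[5Br]IAACCU"

# Longest values first: "U[5Br]" is now tried before the plain "U".
ordered = sorted(tokens, key=len, reverse=True)
token_re = re.compile("|".join(re.escape(t) for t in ordered))

# With no capturing groups, findall() returns the matched pieces themselves.
pieces = token_re.findall(text)
print("U[5Br]" in pieces)        # True
print("".join(pieces) == text)   # True: every character was recognised
```
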
> What I would really like for this to do is not show me what matches but what
> doesn't match.
>
> string: ACCCGUCAU[5Bxxx]IAACCU
>
> pattern: A|C|G|U|U\[5Br\]|C\[5F\]|U\[5F\]|U\[5I\]|5-M-C|2'-N-C|I
>
> From this I'd hope to see "U[5Bxxx]" since it's not in the database.


But "U" is in the database, so why wouldn't the output be "[5Bxxx]"? If
there actually is a way to find out these characters belong together
(although they're not in the DB), that could make your task a lot easier.

Anyway, assuming you have a pattern that recognizes all correct input
sequences, and assuming you want the lowest number of "mismatch
characters" (which would be "[5Bxxx]" in your example), this should be
possible. I didn't test this too much, but it seems to work:

((?>(A|C(\[5F\]|)|G|U(\[5(Br\]|F\]|I\])|)|5-M-C|2'-N-C|I)*)(?<mismatch>.*?))*$

But I don't know how fast it is if input strings get longer. You can get the
"mismatch characters" from the "mismatch"-group's captures list.

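For comparison, the same mismatch report can be sketched without a captures list. Python's re module does not keep every capture of a repeated group the way .NET does, so this sketch swaps in a plain scanning loop: match a known value at the current position if possible, otherwise add the character to the current mismatch run. The function name find_mismatches and the tokens list are hypothetical, and the left-to-right scan is not guaranteed to give the smallest possible mismatch for every input.

```python
import re

def find_mismatches(text, tokens):
    """Return the runs of characters in text not covered by any known token."""
    # Longest values first, so "U[5Br]" is preferred over the plain "U".
    ordered = sorted(tokens, key=len, reverse=True)
    token_re = re.compile("|".join(re.escape(t) for t in ordered))
    mismatches = []
    pos = 0            # current scan position
    run_start = None   # start index of the current unmatched run, if any
    while pos < len(text):
        m = token_re.match(text, pos)
        if m and m.end() > pos:
            # A known value starts here; close any open mismatch run.
            if run_start is not None:
                mismatches.append(text[run_start:pos])
                run_start = None
            pos = m.end()
        else:
            # Nothing known starts here; extend (or open) a mismatch run.
            if run_start is None:
                run_start = pos
            pos += 1
    if run_start is not None:
        mismatches.append(text[run_start:])
    return mismatches

# Hypothetical stand-in for the values loaded from the database.
tokens = ["A", "C", "G", "U", "U[5Br]", "C[5F]", "U[5F]", "U[5I]",
          "5-M-C", "2'-N-C", "I"]
print(find_mismatches("ACCCGUCAU[5Bxxx]IAACCU", tokens))  # ['[5Bxxx]']
print(find_mismatches("ACCCGUCAU[5Br]IAACCU", tokens))    # []
```

Because "U" on its own is a known value, the reported run is "[5Bxxx]" rather than "U[5Bxxx]".
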
Hope this helps,

Niki
Nov 17 '05 #2
