In a C# Regex expression which would be faster when run against say 10,000
strings:
Regex(@"\d+/\d+/\d+ The quick brown fox.*");
or
Regex(@"\d+/\d+/\d+ The.*");
The reason I'm asking is that I'm not sure how Regex works internally and
it's not clear why one would be faster than the other. 7 2051
Try both tests and then you will know!
"Chuck B" <ch****@shc1.co mwrote in message
news:Ou******** ******@TK2MSFTN GP05.phx.gbl...
In a C# Regex expression which would be faster when run against say 10,000
strings:
Regex(@"\d+/\d+/\d+ The quick brown fox.*");
or
Regex(@"\d+/\d+/\d+ The.*");
The reason I'm asking is that I'm not sure how Regex works internally and
it's not clear why one would be faster than the other.
Yes I will - but then that won't tell me much about what went on internally.
I was hoping that someone with knowledge of the Regex engine could help me
understand why one was better than the other.
Thanks for your help. Oh wait... nm... ;)
"Stephany Young" <noone@localhos twrote in message
news:em******** ******@TK2MSFTN GP03.phx.gbl...
Try both tests and then you will know!
"Chuck B" <ch****@shc1.co mwrote in message
news:Ou******** ******@TK2MSFTN GP05.phx.gbl...
>In a C# Regex expression which would be faster when run against say 10,000 strings:
Regex(@"\d+/\d+/\d+ The quick brown fox.*");
or
Regex(@"\d+/\d+/\d+ The.*");
The reason I'm asking is that I'm not sure how Regex works internally and it's not clear why one would be faster than the other.
The Regex engine works by looping through the string one character at a
time, applying the rules of the test to each fragment. In some cases, it
may backtrack, depending on the rules. Therefore, the more rules there are
in the regular expression, the longer it will take. The rules in the regular
expression are defined by the characters in it.
--
HTH,
Kevin Spencer
Microsoft MVP
DSI PrintManager, Miradyne Component Libraries: http://www.miradyne.net
"Chuck B" <ch****@shc1.co mwrote in message
news:Ou******** ******@TK2MSFTN GP05.phx.gbl...
In a C# Regex expression which would be faster when run against say 10,000
strings:
Regex(@"\d+/\d+/\d+ The quick brown fox.*");
or
Regex(@"\d+/\d+/\d+ The.*");
The reason I'm asking is that I'm not sure how Regex works internally and
it's not clear why one would be faster than the other.
Chuck B wrote:
Yes I will - but then that won't tell me much about what went on internally.
I was hoping that someone with knowledge of the Regex engine could help me
understand why one was better than the other.
Thanks for your help. Oh wait... nm... ;)
Trying it for yourself and seeing the results is a perfectly reasonable
answer.
While it may not allow you to recreate the source code, it certainly
will tell you which way is more efficient. Because your original regexes
were completely unequal, it's hard to say whether you could actually
learn anything about what is going on "internally " since in one instance
you'd get a LOT more hits than the other.
Assuming a somewhat normal dataset where some records have "The quick
brown fox" and more records have "The" in them, I'd suggest that it's
probably more work to return more results than it is simply to find
them, so #1 wins out. On 10,000 lines of "The quick brown fox" where the
returned values would be equal, I'd wager on the second regex being
faster (equal time spent returning values, less time spent searching).
Chris.
"Chris Shepherd" <ch**@nospam.ch sh.cawrote in message
news:%2******** *******@TK2MSFT NGP03.phx.gbl.. .
Chuck B wrote:
>Yes I will - but then that won't tell me much about what went on internally.
I was hoping that someone with knowledge of the Regex engine could help me understand why one was better than the other.
Thanks for your help. Oh wait... nm... ;)
Trying it for yourself and seeing the results is a perfectly reasonable
answer.
I have to disagree here. Trying it for myself will give me numbers but not
understanding which is the untimate goal.
While it may not allow you to recreate the source code, it certainly will
tell you which way is more efficient. Because your original regexes were
completely unequal, it's hard to say whether you could actually learn
anything about what is going on "internally " since in one instance you'd
get a LOT more hits than the other.
Assuming a somewhat normal dataset where some records have "The quick
brown fox" and more records have "The" in them, I'd suggest that it's
probably more work to return more results than it is simply to find them,
so #1 wins out. On 10,000 lines of "The quick brown fox" where the
returned values would be equal, I'd wager on the second regex being faster
(equal time spent returning values, less time spent searching).
The fault here is mine for not explaining adequately what I wanted.
In the case above I'm looking for the special case where there is 1 match
per string for either Regex. For instance; the result of running both
examples above against "09/26/07 The quick brown fox ran away."
The date would probably take just as long for each regex. However, it seems
like it might be more efficient with the static characters to search for a
longer string than a shorter one (assuming that there was no match embedded
inside of another match). The reason for the increase is that the pointer
pointing to the head of the search would move a greater length after a
successful match.
Thanks Kevin.
I ran a test with both 10,000 iterations of the test string - "9/19/09 This
is a test. This is only a test of the quick brown fox. If this had been a
real quick brown fox it would have eaten yer toes." with the Regex
"\d+/\d+/\d+.*real.*" and "\d+/\d+/\d+.*this had been a real.*" and it turns
out that there's about a 3 microsecond difference with the shorter
expression being faster.
I'm guessing that rules that involve escape characters like \d, \w would
take longer to match due to the fact that there are more candidates to sort
thru.
"Kevin Spencer" <un**********@n othinks.comwrot e in message
news:eO******** ******@TK2MSFTN GP03.phx.gbl...
The Regex engine works by looping through the string one character at a
time, applying the rules of the test to each fragment. In some cases, it
may backtrack, depending on the rules. Therefore, the more rules there are
in the regular expression, the longer it will take. The rules in the
regular expression are defined by the characters in it.
--
HTH,
Kevin Spencer
Microsoft MVP
DSI PrintManager, Miradyne Component Libraries: http://www.miradyne.net
"Chuck B" <ch****@shc1.co mwrote in message
news:Ou******** ******@TK2MSFTN GP05.phx.gbl...
>In a C# Regex expression which would be faster when run against say 10,000 strings:
Regex(@"\d+/\d+/\d+ The quick brown fox.*");
or
Regex(@"\d+/\d+/\d+ The.*");
The reason I'm asking is that I'm not sure how Regex works internally and it's not clear why one would be faster than the other.
Hi Chuck,
Actually, it's just a simple matter of more rules in the regular expression.
\d+/\d+/\d+ The quick brown fox.*
\d+/\d+/\d+ The.*
Note that each character represents a rule. "quick brown fox" is actually 15
rules, indicating specific characters that must match. So, the Regex engine
must test each successive character in the target string against each of
these characters/rules to ascertain a match before moving on. With the
wildcard (.*), only the newline character (1 character) must be looked for.
--
HTH,
Kevin Spencer
Microsoft MVP
DSI PrintManager, Miradyne Component Libraries: http://www.miradyne.net
"Chuck B" <ch****@shc1.co mwrote in message
news:OE******** ******@TK2MSFTN GP03.phx.gbl...
Thanks Kevin.
I ran a test with both 10,000 iterations of the test string - "9/19/09
This is a test. This is only a test of the quick brown fox. If this had
been a real quick brown fox it would have eaten yer toes." with the Regex
"\d+/\d+/\d+.*real.*" and "\d+/\d+/\d+.*this had been a real.*" and it
turns out that there's about a 3 microsecond difference with the shorter
expression being faster.
I'm guessing that rules that involve escape characters like \d, \w would
take longer to match due to the fact that there are more candidates to
sort thru.
"Kevin Spencer" <un**********@n othinks.comwrot e in message
news:eO******** ******@TK2MSFTN GP03.phx.gbl...
>The Regex engine works by looping through the string one character at a time, applying the rules of the test to each fragment. In some cases, it may backtrack, depending on the rules. Therefore, the more rules there are in the regular expression, the longer it will take. The rules in the regular expression are defined by the characters in it.
-- HTH,
Kevin Spencer Microsoft MVP
DSI PrintManager, Miradyne Component Libraries: http://www.miradyne.net
"Chuck B" <ch****@shc1.co mwrote in message news:Ou******* *******@TK2MSFT NGP05.phx.gbl.. .
>>In a C# Regex expression which would be faster when run against say 10,000 strings:
Regex(@"\d+/\d+/\d+ The quick brown fox.*");
or
Regex(@"\d+/\d+/\d+ The.*");
The reason I'm asking is that I'm not sure how Regex works internally and it's not clear why one would be faster than the other. This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics |
by: Reinhold Birkenfeld |
last post by:
Hello,
I recently ported a simple utility script to analyze a data file from
Perl to Python that uses regex substitutions, not more complex than
re1 = re.compile(r"\s*<.*>\s*")
re2 = re.compile(r".*\((.*)\).*")
re3 = re.compile(r'^"(.*)"$')
When run without these regex substitutions, the scripts' speed is nearly
|
by: Chance Hopkins |
last post by:
I'm trying to match a set of matches after some initial text:
mytext: "something" "somethingelse" "another thing"
"maybe another"
(?:mytext: )(?<mymatch>{1,1}+{1,1}+)+
I only get the last one "maybe another". I want to get all the values with
quotes as a group, hence the + after the ()'s.
Is this possible?
|
by: John Grandy |
last post by:
Is it advisable to compile a Regex for a massively scalable ASP.NET
web-application ?
How exactly does this work ?
Do you create a separate class library and expose the Regex.Replace() as a
method .....
..... or can you just use the syntax :
|
by: .NET Developer |
last post by:
Hello,
I'm trying to write a RegEx that will find all occurances of a particular
type of HTML anchor <a> element in a big block of HTML. Here are the pattern
requirements - they consist of certain attributes being present, basically:
must start with "<a" followed by 1 or more white-space characters
then, before the closing tag ">" it must contain 3 attributes:
1. class
2. href
|
by: rh |
last post by:
hi all,
take the following 2 c# lines:
1) str = Regex.Replace(str, ".*AAA", "");
2) str = Regex.Replace(str, "^.*AAA", "");
notice that the only difference is that the pattern in line 2 has a
starter
marker (^). if str is large and does not contain the pattern, line 1
takes much much longer to complete than line 2 and cpu goes nearly
full throttle.
| |
by: Kay Schluehr |
last post by:
I have a list of strings ls = and want to create a
regular expression sx from it, such that sx.match(s) yields a SRE_Match
object when s starts with an s_i for one i in . There might
be relations between those strings: s_k.startswith(s_1) -> True or
s_k.endswith(s_1) -> True. An extreme case would be ls = . For this reason SRE_Match should provide the longest
possible match.
Is there a Python module able to create an optimized regex rx...
|
by: Extremest |
last post by:
I have a huge regex setup going on. If I don't do each one by itself
instead of all in one it won't work for. Also would like to know if
there is a faster way tried to use string.replace with all the right
parts in there in one big line and for some reason that did not work
either. Here is my regex's.
static Regex rar = new Regex("\\.part.*",
RegexOptions.IgnoreCase);
static Regex par = new Regex("\\.vol.*",
RegexOptions.IgnoreCase);
|
by: morleyc |
last post by:
Hi, i would like to remove a number of characters from my string (\t
\r \n which are throughout the string), i know regex can do this but i
have no idea how. Any pointers much appreciated.
Chris
|
by: Ravikiran |
last post by:
Hi Friends,
I wanted know about whatt is ment by zero optimization and
sign optimization and its differences....
Thank you...
|
by: jinu1996 |
last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth.
The Art of Business Website Design
Your website is...
|
by: Hystou |
last post by:
Overview:
Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
| |
by: tracyyun |
last post by:
Dear forum friends,
With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
|
by: agi2029 |
last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own....
Now, this would greatly impact the work of software developers. The idea...
|
by: isladogs |
last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM).
In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules.
He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms.
Adolph will...
|
by: conductexam |
last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one.
At the time of converting from word file to html my equations which are in the word document file was convert into image.
Globals.ThisAddIn.Application.ActiveDocument.Select();...
|
by: TSSRALBI |
last post by:
Hello
I'm a network technician in training and I need your help.
I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs.
The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols.
I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
|
by: adsilva |
last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
| |
by: muto222 |
last post by:
How can i add a mobile payment intergratation into php mysql website.
| |