
Regex - Memory performance

Hi,

I have a .NET application that processes thousands of XML nodes each
day, and for each node I run around 30-40 Regex matches to see whether
the node satisfies certain conditions. These Regex matches are made
inside a loop or conditional (for, if, and so on). E.g.

for(int i = 0; i < 10; i++)
{
Regex r = new Regex();
r.Match(..., ...);
}

I assumed that these Regex objects would be reclaimed by the GC. Process
memory kept increasing in Task Manager, so I decided to check with
.NET Memory Profiler. It showed that, after a few iterations, there were
some 90,000 Regex objects in memory, even though many garbage
collections had run (as shown by the profiler). I am not using
compiled Regex. Is there any reason why these objects are not collected?
I worked around the problem using static variables, but I want to get to
the bottom of this so that I understand it better.

Thanks
Jeevan

Nov 17 '05 #1
20 Replies


Sometimes the garbage collector isn't very timely in collecting,
particularly if the CPU is constantly busy. You might try periodically
calling GC.Collect() to force the collection. I've found this to be helpful
in similar situations where the failure of the garbage collector to collect
ended up causing excessive paging.

Pete
Nov 17 '05 #2

Do you mean that, despite the many GC calls, there are 90,000 objects left
after a collection? Are you sure they are Regex instances?

Could you post a sample that illustrates the issue?

Willy.

Nov 17 '05 #3

Yes, this is despite a lot of GC calls. Of course, my program is not making
the GC calls; the .NET Memory Profiler I was using is making them
(I can see that because memory drops, rises, and so on). I don't have
the sample right now because I changed my program to use static
variables, so that the Regex objects no longer pile up in memory.
Here is how it looks now:

Regex 470 26320

The first number is the number of live instances and the second is the live
size (in bytes). The count used to be 90,000 before I made the change.

Nov 17 '05 #4

Hold on: "memory profiler making GC calls" is not true. The memory
profiler does not call GC.Collect; it is another application that runs in
another process and isn't even managed code.
The GC calls you see are "induced" calls, that is, they are triggered by the
CLR because some thresholds are reached on the generational heap.
Also, I don't see why a static variable would reduce memory consumption.
The object the variable points to is allocated on the GC heap, and each
new Regex() creates a new object and returns the reference to the variable,
be it static or local; the number of objects doesn't change with that.
Again, post your code or a sample that illustrates the issue.

Willy.
Nov 17 '05 #5

That was my mistake; they were CLR calls. Here is the part of the code
that I had before I optimized it:

for(int i = 0; i < length; i++)
{
Regex r = new Regex(pattern, RegexOptions.IgnoreCase);
}

The above code runs some thousands of times (this is a long-running
process).
When I saw that there were 90,000 instances of Regex, I modified the
code as follows:

for(int i = 0; i < length; i++)
{
object o = Helper.AvailableRegex[pattern];
if (o == null)
{
o = new Regex(pattern, RegexOptions.IgnoreCase);
Helper.AvailableRegex[pattern] = o;
}
((Regex)o).Match(..);
}

By "optimized" I mean that AvailableRegex is a static Hashtable that lives
for the whole process. It is created only once, and if a Regex for a
pattern is already there, it is reused.
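
The Hashtable lookup described above can also be written with a typed dictionary, which avoids the cast and the null check on object. This is only a minimal sketch, not the poster's actual Helper class: the names RegexCache and Get are hypothetical, it assumes C# 2.0 or later for generics, and the lock is an assumption in case several threads share the cache.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: one Regex instance per distinct pattern string.
static class RegexCache
{
    private static readonly Dictionary<string, Regex> cache =
        new Dictionary<string, Regex>();
    private static readonly object sync = new object();

    public static Regex Get(string pattern)
    {
        lock (sync)
        {
            Regex regex;
            if (!cache.TryGetValue(pattern, out regex))
            {
                // First request for this pattern: build it once and keep it.
                regex = new Regex(pattern, RegexOptions.IgnoreCase);
                cache[pattern] = regex;
            }
            return regex;
        }
    }
}

static class Program
{
    static void Main()
    {
        Regex first = RegexCache.Get(@"^\d+$");
        Regex second = RegexCache.Get(@"^\d+$");
        // Both lookups return the same cached instance.
        Console.WriteLine(ReferenceEquals(first, second));
        Console.WriteLine(first.IsMatch("12345"));
    }
}
```

Note that a cache like this holds every Regex for the lifetime of the process, so it only suits cases where the set of distinct patterns stays small, as the poster says it does here.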

Nov 17 '05 #6

Why don't you use the static methods of the Regex class?

Instead of:
for(int i = 0; i < 10; i++) {
    Regex r = new Regex(patternString, RegexOptions.IgnoreCase);
    r.Match(inputString);
}

Use:
for(int i = 0; i < 10; i++) {
    Regex.Match(inputString, patternString, RegexOptions.IgnoreCase);
}

Hope it helps,
Ludovic Soeur.

Nov 17 '05 #7

> Why don't you use Static methods of Regex class ?

They would create new Regex instances, which is precisely what the OP
is trying to avoid.

Personally, rather than keep a hashtable, I'd just keep specific
regular expressions with appropriate names. Create them up-front, and
then you don't need to litter your code with nullity checks etc.
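
The named-instance approach suggested here might look like the sketch below. The class name NodePatterns, the field names and the patterns themselves are hypothetical placeholders, not taken from the poster's application. Each Regex is built once when the class is first used and then shared by every caller.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical holder for patterns that are known up front.
static class NodePatterns
{
    // Illustrative patterns only; the real application's patterns are unknown.
    public static readonly Regex Amount =
        new Regex(@"^-?\d+(\.\d{2})?$", RegexOptions.IgnoreCase);
    public static readonly Regex IsoDate =
        new Regex(@"^\d{4}-\d{2}-\d{2}$", RegexOptions.IgnoreCase);
}

static class Program
{
    static void Main()
    {
        string[] values = { "19.99", "2005-11-17", "100 USD" };
        foreach (string value in values)
        {
            // No lookup and no null check: the instances already exist.
            Console.WriteLine("{0}: amount={1}, date={2}", value,
                NodePatterns.Amount.IsMatch(value),
                NodePatterns.IsoDate.IsMatch(value));
        }
    }
}
```

This only covers patterns that are fixed at compile time; patterns that are built at run time would still need some kind of lookup.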

Jon

Nov 17 '05 #8

Hi,

Is it necessary for every node to pass through all 30-40 regex matches?

That is, if I understand correctly, it might be safe to exit as soon as the
node matches any pattern. Did I interpret the scenario correctly?

If not, can you elaborate on the scenario so that unnecessary code
execution can be trimmed out?

Does this help?

Kalpesh

Nov 17 '05 #9

As Jon said, even Regex.Match() (though it is static) creates a new
Regex object inside the Match function. I found this out using the
memory profiler. Not every node needs to pass through all 30-40 of them,
but in some cases they might have to, and this is a long-running process.
So, over time, memory increases.

Whatever the solution, a Hashtable (I am using a Hashtable because some
of the Regex patterns are dynamic: I don't know them beforehand, but
they are very few in number) or objects with specific names, this
problem is solved for now.

I am really interested to know why this is happening (if there is a
single reason), that is, why these objects are not being removed from
memory. If this is a random thing that is out of our hands (and so in
the hands of people at Microsoft), then that answers everything.

Nov 17 '05 #10

Please post a complete sample that illustrates the issue. The code snippets
you have posted so far did not show the Hashtable part. Anyway, if you keep
references to your Regex objects they won't get collected; if you release
the references, or if they go out of scope, they will get collected at some
point in time.
You also said "memory is increasing". The questions are: what counter are
you looking at? By what amount is it increasing? Is it only increasing,
without ever decreasing?
Have you ever watched the CLR Gen0, Gen1 and Gen2 performance counters (run
perfmon)? If not, you should start there before you ever run the
memory profiler.
Willy.
Nov 17 '05 #11

Hi Willy,

Please check my third post; it has both code snippets (I am posting them
again for your reference). It is clear from the old implementation that
the Regex object is local to a for loop, so it should have been reclaimed
by the GC. The second snippet shows the Hashtable version. But forget about
the Hashtable; it solved my problem. I am interested in knowing why the
objects were not collected in the first place (in the old code).

What do you mean by watching Gen0, Gen1 and Gen2? The memory profiler shows
when each of those collections runs. Can you explain what you meant?

Old One:
for(int i = 0; i < length; i++)
{
Regex r = new Regex(pattern, RegexOptions.IgnoreCase);
}

New one:

for(int i = 0; i < length; i++)
{
object o = Helper.AvailableRegex[pattern];
if (o == null)
{
o = new Regex(pattern, RegexOptions.IgnoreCase);
Helper.AvailableRegex[pattern] = o;
}
((Regex)o).Match(..);
}

Nov 17 '05 #12

What I'm looking for is a complete sample that illustrates the issue. The two
lines of code you posted are not a complete sample and do not have the
issue you describe; that is, the Regex instances are getting collected.
Gen0, Gen1 and Gen2 are performance counters maintained by the CLR;
you can watch them by starting perfmon.exe.
Select the performance object ".NET CLR Memory" and watch the Gen 0, 1 and 2
heap sizes and their respective collection counters.
For instance, when you run the following code you will see that the Gen0, 1
and 2 heap sizes go up until the GC kicks in to collect the garbage for that
specific generation, after which memory drops and starts climbing again.
You can run this for days; you will never get an OutOfMemoryException.

using System;
using System.Text.RegularExpressions;
class Tester
{
    static void Main()
    {
        string[] tests = {"-42", "19.99", "0.001", "100 USD"};
        for (int i = 0; i < 10000000; i++)
        {
            int count = 0;
            Regex rx = new Regex(@"^-?\d+(\.\d{2})?$", RegexOptions.IgnoreCase);
            foreach (string test in tests)
            {
                if (rx.IsMatch(test))
                    count++;
            }
        }
    }
}
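
The collection counters mentioned above can also be sampled from inside the process, which may help when perfmon is not at hand. The following is a rough sketch rather than a replacement for the performance counters: the class name GcCounterSketch and the iteration count are arbitrary choices, and it assumes .NET 2.0 or later, where GC.CollectionCount is available. It runs the same kind of loop and reports how many collections each generation performed in the meantime.

```csharp
using System;
using System.Text.RegularExpressions;

static class GcCounterSketch
{
    // Allocates short-lived Regex objects and returns, per generation,
    // the number of collections that happened while the loop ran.
    public static int[] Run(int iterations)
    {
        int[] before = new int[3];
        for (int gen = 0; gen < 3; gen++)
            before[gen] = GC.CollectionCount(gen);

        for (int i = 0; i < iterations; i++)
        {
            Regex rx = new Regex(@"^-?\d+(\.\d{2})?$", RegexOptions.IgnoreCase);
            rx.IsMatch("19.99");
        }

        int[] delta = new int[3];
        for (int gen = 0; gen < 3; gen++)
            delta[gen] = GC.CollectionCount(gen) - before[gen];
        return delta;
    }

    static void Main()
    {
        int[] delta = Run(200000);
        Console.WriteLine("Gen0: {0}, Gen1: {1}, Gen2: {2}",
            delta[0], delta[1], delta[2]);
        // Approximate size of the managed heap, without forcing a collection.
        Console.WriteLine("Managed heap bytes: {0}", GC.GetTotalMemory(false));
    }
}
```

The actual counts depend on the runtime version, the GC mode and the machine, so they are best read as relative trends rather than exact figures.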

Willy.
Nov 17 '05 #13

My two lines of code are similar to the actual situation in the program
I am running, and similar to what you have written as example
code. The .NET Memory Profiler I am using shows when Gen0, Gen1
and Gen2 collections kick in, and shows memory going down, etc. But the
memory used by Regex objects is not going down. That's what I was trying
to say. It's as if I am running your example code and the Regex objects
are not being collected.

Nov 17 '05 #14

Did you actually run the code I showed? Did you check the GC counters while
it runs?
If the Regex objects weren't collected you would see memory usage going up
without ever going down, ending in an OutOfMemoryException, but that is not
how the program behaves. Again, forget about the memory profiler; run the
program and watch the performance counters.
Willy.
Nov 17 '05 #15

Willy

One other possibility is that the Regex objects are reaching Gen2 before
being dereferenced, so they won't be picked up until the GC does a Gen2
collection. It won't force a Gen2 collection until it hits the memory
high-water mark (32 MB free, if I recall correctly), as it's cheaper to
just keep allocating memory until something else needs it.

It's confusing behaviour the first time you see it; process memory in
Task Manager just keeps getting higher and higher for no apparent reason.

Regards

Paul

Nov 17 '05 #16

To be sure there was no memory leak, I wrote two simple programs. The first
one keeps a reference to each regex, to be sure there is a memory leak. The
second one is only a loop, which should not have a memory leak.

I profiled the results with .NET Memory Profiler. For each program, I took two
snapshots at the points where you must press a key (the line
System.Console.Read()) and asked the memory profiler to show the differences.
The results were logical: in the first case there is a leak, and in the
second case there is no leak.
------------------------------------------------
First program, which has a memory leak:

using System;
using System.Collections;
using System.Text.RegularExpressions;

class Class1 {
    static void Main() {
        Hashtable h=new Hashtable();
        for(int j=0;j<3;j++) {
            System.Console.WriteLine("Start");
            for(int i=0;i<1000;i++) {
                Regex regex=new Regex("^myPattern "+i+"$",RegexOptions.IgnoreCase);
                Match match=regex.Match("myInput "+i);
                Console.WriteLine("Regex = "+regex.ToString()+"\tMatch = "+match.Success);
                // Storing the regex in the Hashtable keeps it reachable, so it cannot be collected.
                h.Add(j*10000+i,regex);
            }
            System.Console.WriteLine("End");
            System.Console.Read();
        }
    }
}

Results:
System.Text.RegularExpressions.Regex : Delta = 2000
System.String : Delta = 2000
System.Int32 : Delta = 2000
---------------------------------------------------------------
Second program, which has no memory leak:

using System;
using System.Text.RegularExpressions;

class Class1 {
    static void Main() {
        for(int j=0;j<3;j++) {
            System.Console.WriteLine("Start");
            for(int i=0;i<1000;i++) {
                // The regex is only referenced by this local, so it becomes garbage after each iteration.
                Regex regex=new Regex("^myPattern "+i+"$",RegexOptions.IgnoreCase);
                Match match=regex.Match("myInput "+i);
                Console.WriteLine("Regex = "+regex.ToString()+"\tMatch = "+match.Success);
            }
            System.Console.WriteLine("End");
            System.Console.Read();
        }
    }
}

Results:
System.Text.RegularExpressions.Regex : Delta = 0
System.String : Delta = 0
System.Int32 : Delta = 0

-----------------------------------------------------------

The loop is as simple as the one you said you had:
for(int i = 0; i < length; i++) {
    Regex r = new Regex(pattern, RegexOptions.IgnoreCase);
}

My conclusion is that your loop does not leak memory through Regex. You
may be holding a reference somewhere.
Again, post your ENTIRE source code. It's impossible to help you without ALL
of your source. Maybe you think there is no reference to your regex, but we
can find out whether there is one. How do you initialise your regex, how do
you do your match, and so on?

Try my two examples to see whether there is a leak on your computer. If there
is a leak with the second example, it may be because you are not using the
memory profiler properly.
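
One profiler-independent way to probe the same question is a WeakReference, which reports whether an object is still alive without keeping it alive. The sketch below is an assumption-laden illustration, not a definitive test: the class and method names are hypothetical, the Regex is created in a separate non-inlined method so that no local variable in the caller can keep it reachable, and the outcome can still vary with build settings and runtime version.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Text.RegularExpressions;

static class CollectabilitySketch
{
    // Creates and uses a Regex, then returns only a weak reference to it.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static WeakReference CreateAndUse(string pattern, string input)
    {
        Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
        regex.IsMatch(input);
        return new WeakReference(regex);
    }

    // Returns true if the Regex was reclaimed by a forced full collection.
    public static bool IsCollectedAfterUse()
    {
        WeakReference weak = CreateAndUse("^myPattern 1$", "myInput 1");
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        return !weak.IsAlive;
    }

    static void Main()
    {
        Console.WriteLine("Regex collected: " + IsCollectedAfterUse());
    }
}
```

If this reports that the object is still alive, something is holding a strong reference to it, which is the situation the first sample program creates on purpose with its Hashtable.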

Hope it helps,

Ludovic SOEUR.
Nov 17 '05 #17

Paul,

When you run the sample I posted, you'll notice a high collection rate for
all generations. The allocation rate on v1.1 is much higher than the gen0
collection frequency, which means there are a lot of promotions. The same
goes for gen1, so a lot of objects reach gen2 (the number depends on CPU
performance, the type of GC, the OS version, etc.). However, one of the GC
heuristics forces a gen2 collection every x gen1 GC runs, where x varies
depending on the allocation rate measured in the previous cycle. What I
see when I run the sample is one gen2 collection per ~10 gen1 collections
(~1 gen2 collection per second), with the gen2 heap never exceeding 2 MB and
the working set never exceeding 12 MB. You can try it yourself; you'll see
that memory never reaches the high-water mark you mentioned.

Willy.
Nov 17 '05 #18

Since I started this topic, I think it's my responsibility to say that I
am leaving it. The reason is that I cannot post my code (and
I am more than 100% sure that my example accurately describes what I
have in my source code). I fixed my problem (with the Hashtable
method). If everyone thinks there is no problem with Regex, then that's
fine. But I learned one valuable lesson about where to look for memory
growth, if there is any.

Thanks for all your replies.

Nov 17 '05 #19

After several requests for a complete sample that illustrates the issue, you
are telling us that you can't post your code. I posted a complete sample
(essentially the same as your code, as you said) that shows there is no
such issue, at least not when I ran it. Did you actually run this code?
I guess not. Did you look at the performance counters? I guess not.
As long as you can't demonstrate your issue with Regex, I will assume there
is no issue at all.

Willy.

Nov 17 '05 #20

In your previous posts, you were asking about Regex:
> I am really interested (if there is one reasoning) to know why this is
> happening. In the sense, why these objects are not being removed from
> memory. If this is a random thing which is out of our hands (and so in
> the hands of people at Microsoft) then that answers everything.

I posted two samples that do what you said you were doing, to
let you try them and understand where the problem is. You did not reply,
which suggests you did not try them.

To me, the answer is: there is no memory leak with Regex, and I will continue
to think so until someone shows me a leaking example that I can
compile.
If you still think there is one, I, and certainly Willy and others in this
newsgroup, ARE INTERESTED in source code that proves it.

I'm still interested in knowing the outcome of this problem,
Ludovic.
Nov 17 '05 #21
