Bizarre benchmark result -- C# hundreds of times slower than Java?

While asking some Java enthusiasts what they think about C#, I came across
this:

http://www.manageability.org/blog/ar...m_with_cameron

Reportedly, the (essentially) same program in C# is much, much slower than
in Java.

The program is heavy on regexes (which I'm not an expert on), and I am
wondering whether the C# version makes an elementary blunder. Do any experts
want to have a look? See also comp.lang.java.

(Query: Is he compiling the regex once in Java, but every time through the
loop in C#?)


Mar 11 '08 #1


> The program is heavy on regexes (which I'm not an expert on), and I am
> wondering whether the C# version makes an elementary blunder. Do any experts
> want to have a look? See also comp.lang.java.
>
> (Query: Is he compiling the regex once in Java, but every time through the
> loop in C#?)

I think he is compiling the regular expression each time in the loop.
A better benchmark would compile it once and only match inside the
loop. Maybe C# uses a DFA-NFA hybrid (which might explain the large
memory usage that the author of the article reports). Such an engine
can match a regular expression several times faster than a
backtracking implementation, but it compiles the expression more
slowly than a backtracking implementation does.
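
To make the "compile once, match in the loop" point concrete, here is a minimal sketch. It is not the code from the linked benchmark; the class name, method names, pattern and inputs are all made up for illustration. It only contrasts building a Regex inside the loop with building it once outside.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical example, not taken from the benchmark under discussion.
public static class RegexHoistingSketch
{
    // Constructs a new Regex on every iteration (the suspected blunder).
    public static int CountMatchesRebuilding(string pattern, string[] inputs)
    {
        int count = 0;
        foreach (string input in inputs)
        {
            Regex regex = new Regex(pattern);   // pattern is processed again each time
            if (regex.IsMatch(input)) count++;
        }
        return count;
    }

    // Constructs the Regex once and reuses it inside the loop.
    public static int CountMatchesHoisted(string pattern, string[] inputs)
    {
        Regex regex = new Regex(pattern);
        int count = 0;
        foreach (string input in inputs)
        {
            if (regex.IsMatch(input)) count++;
        }
        return count;
    }

    public static void Main()
    {
        // Made-up test data: half the inputs match, half do not.
        string[] inputs = new string[10000];
        for (int i = 0; i < inputs.Length; i++)
            inputs[i] = (i % 2 == 0) ? "GATTACA" : "CCCCCCC";

        Stopwatch sw = Stopwatch.StartNew();
        int rebuilt = CountMatchesRebuilding("GA(T+)ACA", inputs);
        long rebuildMs = sw.ElapsedMilliseconds;

        sw.Restart();
        int hoisted = CountMatchesHoisted("GA(T+)ACA", inputs);
        long hoistedMs = sw.ElapsedMilliseconds;

        Console.WriteLine("rebuilding: {0} matches, {1} ms", rebuilt, rebuildMs);
        Console.WriteLine("hoisted:    {0} matches, {1} ms", hoisted, hoistedMs);
    }
}
```

Both methods return the same count; only the time spent constructing Regex objects differs, and how large that difference is will depend on the pattern and the runtime.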

Another good benchmark would include the use of backreferences, which
forces a regex implementation to use backtracking.
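
For anyone unfamiliar with backreferences, a small sketch of what such a pattern looks like in C#. The pattern and names are invented for illustration and are not from the benchmark: `\1` must match exactly the text that group 1 captured, which a pure DFA cannot express.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical example of a pattern that needs a backreference.
public static class BackreferenceSketch
{
    // (\w+) captures a word; \1 requires the very same text to appear again.
    private static readonly Regex RepeatedWord = new Regex(@"\b(\w+) \1\b");

    public static bool HasRepeatedWord(string text)
    {
        return RepeatedWord.IsMatch(text);
    }

    public static void Main()
    {
        Console.WriteLine(HasRepeatedWord("this is is a test"));   // True
        Console.WriteLine(HasRepeatedWord("this is a test"));      // False
    }
}
```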

FYI, egrep uses a DFA-NFA hybrid.
Mar 13 '08 #2

Thanks for the links! Very helpful and interesting.
Ethan
Plenty, usually caused by the fact that regexes aren't regular expressions
(the theoretical constructs, which always match in linear time). See, e.g.,
http://www.codinghorror.com/blog/archives/000488.html and
http://www.regular-expressions.info/catastrophic.html.
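
As a rough illustration of the kind of pattern the second link describes, the sketch below runs `(x+x+)+y` against a string of x's that contains no y. The helper name, the input length and the 200 ms limit are arbitrary choices for this example. It relies on the Regex constructor overload that takes a match timeout (available from .NET Framework 4.5 on). Whether the attempt actually times out depends on the runtime and its optimizations, so the code handles both outcomes.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for probing a pattern for runaway backtracking.
public static class BacktrackingSketch
{
    // Returns true if the match attempt finished within the timeout,
    // false if the regex engine gave up with a timeout exception.
    public static bool FinishesInTime(string pattern, string input, TimeSpan timeout)
    {
        Regex regex = new Regex(pattern, RegexOptions.None, timeout);
        try
        {
            regex.IsMatch(input);
            return true;
        }
        catch (RegexMatchTimeoutException)
        {
            return false;
        }
    }

    public static void Main()
    {
        string pattern = "(x+x+)+y";
        string input = new string('x', 40);   // no 'y', so no match exists
        bool finished = FinishesInTime(pattern, input, TimeSpan.FromMilliseconds(200));
        Console.WriteLine(finished
            ? "finished within the timeout"
            : "timed out while backtracking");
    }
}
```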
Mar 13 '08 #3

I don't want to be the one making Jesus cry, but I am not sure that my code
is what is doing it. I see that representing DNA as strings is not going to
make the CPU as happy as it could be, but it is not obvious to me how to
represent DNA (and RNA and protein, so I can't use a byte array anymore) as a
numeric array and still get relatively programmer-friendly functionality.
I started yesterday (code below...) and stopped pretty rapidly because I
don't see a way to recreate IndexOf- or Regex-type functionality without a lot
of work! If you have anything more complete, I would be interested!
Thanks,
Ethan

using System;
using System.Collections.Generic;
using System.Text;

namespace TestSequence
{
    // A DNA sequence stored as one DNABase value (one byte) per base.
    public struct DNA
    {
        private DNABase[] _Sequence;

        // Parses a string such as "GATTACA"; characters other than
        // G, A, T and C are skipped.
        public DNA(string sequence)
        {
            List<DNABase> ThisSequence = new List<DNABase>();
            foreach (char thisBase in sequence.ToUpperInvariant())
            {
                DNABase NextBase;
                switch (thisBase)   // char literals, because thisBase is a char
                {
                    case 'G':
                        NextBase = DNABase.G;
                        break;
                    case 'A':
                        NextBase = DNABase.A;
                        break;
                    case 'T':
                        NextBase = DNABase.T;
                        break;
                    case 'C':
                        NextBase = DNABase.C;
                        break;
                    default:
                        continue;   // not a known base: move on to the next character
                }
                ThisSequence.Add(NextBase);
            }
            _Sequence = ThisSequence.ToArray();
        }
    }

    public enum DNABase : byte
    {
        N = 0,
        G = 1,
        A = 2,
        T = 3,
        C = 4
    }
}
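
On the IndexOf part of the question, a naive search over a DNABase[] is only a few lines. The following is a minimal sketch, not a complete or tuned implementation. The class and method names are made up, and the enum is repeated here only so that the sketch compiles on its own.

```csharp
using System;

// Same values as the enum in the code above, repeated so this compiles alone.
public enum DNABase : byte { N = 0, G = 1, A = 2, T = 3, C = 4 }

// Hypothetical helper class, not part of the original code.
public static class DnaSearchSketch
{
    // Naive search: returns the index of the first occurrence of pattern
    // in sequence, or -1 if there is none (mirrors string.IndexOf).
    public static int IndexOf(DNABase[] sequence, DNABase[] pattern)
    {
        if (pattern.Length == 0) return 0;
        for (int start = 0; start <= sequence.Length - pattern.Length; start++)
        {
            int offset = 0;
            while (offset < pattern.Length && sequence[start + offset] == pattern[offset])
                offset++;
            if (offset == pattern.Length) return start;
        }
        return -1;
    }

    public static void Main()
    {
        // GATTACA
        DNABase[] sequence = { DNABase.G, DNABase.A, DNABase.T, DNABase.T,
                               DNABase.A, DNABase.C, DNABase.A };
        // TAC
        DNABase[] pattern = { DNABase.T, DNABase.A, DNABase.C };
        Console.WriteLine(IndexOf(sequence, pattern));   // 3
    }
}
```

Regex-style matching over such an array is a much bigger job; this only covers the exact-substring case.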
And I hope you realise that using strings to represent DNA sequences
except for input/output makes baby Jesus cry. Your programs would be
faster and more maintainable using your own type (probably backed by a
byte[] and some unsafe code). The best way to represent it normally
depends on what you're doing and whether you need to consider SNPs, but
Unicode strings are a bad idea!

Alun Harford
Mar 13 '08 #4

Hello Michael,
> While asking some Java enthusiasts what they think about C#, I came
> across this:
>
> http://www.manageability.org/blog/ar...he_problem_with_cameron
>
> Reportedly, the (essentially) same program in C# is much, much slower
> than in Java.
>
> This is a program that is heavy on regexes (which I'm not an expert
> on) and am wondering if the C# version makes an elementary blunder.
> Do any experts want to have a look? See also comp.lang.java.
>
> (Query: Is he compiling the regex once in Java, but every time through
> the loop in C#?)

Replacing
Regex regexpr = new Regex(matchthis, RegexOptions.Compiled);

with
Regex regexpr = new Regex(matchthis, RegexOptions.None);

made it fly.
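
A likely reason, offered as a sketch and not as a measurement of the benchmark itself: with RegexOptions.Compiled the Regex constructor generates and JIT-compiles IL for the pattern, which costs far more up front than RegexOptions.None and only pays off when the same Regex object is reused for many matches. The class name, patterns and iteration count below are invented for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical timing harness, not the benchmark's code.
public static class CompiledOptionSketch
{
    // Builds `count` Regex objects with the given options, uses each one once,
    // and returns the elapsed milliseconds.
    public static long TimeConstruction(RegexOptions options, int count)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            // A different pattern per iteration, so every Regex is built from scratch.
            Regex regex = new Regex("GAT{" + (i + 1) + "}ACA", options);
            regex.IsMatch("GATTACA");
        }
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        Console.WriteLine("None:     {0} ms", TimeConstruction(RegexOptions.None, 200));
        Console.WriteLine("Compiled: {0} ms", TimeConstruction(RegexOptions.Compiled, 200));
    }
}
```

If a program builds many short-lived Regex objects, the Compiled run would be expected to be noticeably slower. If it builds one Regex and matches millions of times, the balance can tip the other way.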

--
Jesse Houwing
jesse.houwing at sogeti.nl
Mar 14 '08 #5
