I don't want to be the one making Jesus cry, but I am not sure that my code
is what is doing it. I can see that representing DNA as strings is not going to
make the CPU as happy as it could be, but it is not obvious to me how to
represent DNA (and RNA and protein, so I can't use a byte array anymore) as a
numeric array and still get relatively programmer-friendly functionality.
I started yesterday (code below) and stopped pretty quickly because I
don't see a way to recreate IndexOf- or Regex-type functionality without a lot
of work! If you have anything more complete, I would be interested!
Thanks,
Ethan
using System;
using System.Collections.Generic;
using System.Text;

namespace TestSequence
{
    public struct DNA
    {
        private DNABase[] _Sequence;

        public DNA(string sequence)
        {
            List<DNABase> ThisSequence = new List<DNABase>();
            foreach (char thisBase in sequence.ToUpper())
            {
                DNABase NextBase;
                // thisBase is a char, so the case labels must be char literals.
                switch (thisBase)
                {
                    case 'G':
                        NextBase = DNABase.G;
                        break;
                    case 'A':
                        NextBase = DNABase.A;
                        break;
                    case 'T':
                        NextBase = DNABase.T;
                        break;
                    case 'C':
                        NextBase = DNABase.C;
                        break;
                    default:
                        // Skip any character that is not G, A, T or C.
                        continue;
                }
                ThisSequence.Add(NextBase);
            }
            _Sequence = ThisSequence.ToArray();
        }
    }

    public enum DNABase : byte
    {
        N = 0,
        G = 1,
        A = 2,
        T = 3,
        C = 4
    }
}
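
For what it's worth, an IndexOf over the enum array may not need much work. Below is a minimal sketch, assuming a naive linear scan is fast enough. DNASearch and its IndexOf method are hypothetical names, not part of the code above, and the enum is repeated only so that the snippet compiles on its own.

```csharp
using System;

public enum DNABase : byte { N = 0, G = 1, A = 2, T = 3, C = 4 }

// Hypothetical helper class, introduced here for illustration only.
public static class DNASearch
{
    // Returns the index of the first occurrence of pattern in sequence,
    // or -1 if there is none (the same convention as string.IndexOf).
    public static int IndexOf(DNABase[] sequence, DNABase[] pattern)
    {
        if (pattern.Length == 0) return 0; // an empty pattern matches at 0

        // Try every start position where the pattern still fits.
        for (int start = 0; start <= sequence.Length - pattern.Length; start++)
        {
            int offset = 0;
            while (offset < pattern.Length &&
                   sequence[start + offset] == pattern[offset])
            {
                offset++;
            }
            if (offset == pattern.Length) return start; // full match
        }
        return -1;
    }
}

public static class Program
{
    public static void Main()
    {
        // G A T T A C A
        DNABase[] sequence = { DNABase.G, DNABase.A, DNABase.T, DNABase.T,
                               DNABase.A, DNABase.C, DNABase.A };
        // T A C
        DNABase[] pattern = { DNABase.T, DNABase.A, DNABase.C };

        Console.WriteLine(DNASearch.IndexOf(sequence, pattern)); // prints 3
    }
}
```

This does nothing for Regex-type matching, and a long sequence with many searches would probably want something smarter than a naive scan, but it suggests the basic lookups can be rebuilt on a numeric array in a few lines.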
And I hope you realise that using strings to represent DNA sequences
except for input/output makes baby Jesus cry. Your programs would be
faster and more maintainable using your own type (probably backed by a
byte[] and some unsafe code). The best way to represent it normally
depends on what you're doing and whether you need to consider SNPs, but
Unicode strings are a bad idea!
Alun Harford