By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
438,179 Members | 970 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 438,179 IT Pros & Developers. It's quick & easy.

Perl - How to create all possible sequence from a word

P: 23
Hi all,
I have to do a translation of a word into different possible word based on a dictionary, which define differ meaning for each letter in a word.
For example,
The word "VAEFGH" to be translated to various all possible words based on the dictonary that I define here like the one.
@V=('GTT','GTC','GTA','GTG');
@A=('GCT','GCC','GCA','GCG');
@E=('GAA','GAG');
@F=('TTT','TTC');
@G=('GGT','GGC','GGA','GGG');
@H=('CAT','CAC');

Just to say, an example of a translated ouput from the dictionary for the word "VAEFGH" is to be "GTTGCTGAATTTGGTCAT". Please provide me the solution by donating your code or else the algorithm to crack this one out.

Thanks in advance,
B.Nataraj
Oct 20 '08 #1
Share this Question
Share on Google+
14 Replies


nithinpes
Expert 100+
P: 410
Before giving any solution, we would like to know what you have tried so far.
But, just to give you a hint to start with, all you need to do is to use nested foreach() loops for all the required arrays to get a set of possible combinations.

-Nithin
Oct 20 '08 #2

P: 23
Before giving any solution, we would like to know what you have tried so far.
But, just to give you a hint to start with, all you need to do is to use nested foreach() loops for all the required arrays to get a set of possible combinations.

-Nithin
Hi Nithin,
Thank you for your reply. Yes, I have been trying exactly what you have said about like nested foreach() loop, but it really confuse me to implement it as a programm..I am not a regular programmer, so I would be happy with modifying existing code of this sort to my need. Thats why I posted my concern into this forum. More I do understand that doing such a nested loop will consume more computational time so if any advanced programmer had done a heuristic approach for this problem would be much sought out for me here.
Oct 20 '08 #3

Icecrack
Expert 100+
P: 174
Hi Nithin,
Thank you for your reply. Yes, I have been trying exactly what you have said about like nested foreach() loop, but it really confuse me to implement it as a programm..I am not a regular programmer, so I would be happy with modifying existing code of this sort to my need. Thats why I posted my concern into this forum. More I do understand that doing such a nested loop will consume more computational time so if any advanced programmer had done a heuristic approach for this problem would be much sought out for me here.

if you have Searched this forum you would have found something on the lines of what you want.

all i know is i will not attempt to help you unless some code is placed
Oct 20 '08 #4

KevinADC
Expert 2.5K+
P: 4,059
I totally agree with what other people have posted, you need to show some effort before we post some code. That is my personal rule also, 99% of the time. I reserve that last 1% for a question I find interesting enough to violate my own rule.

My approach to solving the problem is not heuristic, but you could say its part of a heuristic approach to discovering other ways to do the same thing that are hopefully more efficient.

I applied very linear logic to solving the problem. Which is almost always how I start. There could be short cuts or better ways to do this. In fact I am almost sure there is but I might not be smart enough to figure them out.

Expand|Select|Wrap|Line Numbers
  1. use strict;
  2. use warnings;
  3. use Data::Dumper;
  4. my %hash = (
  5.    V => ['GTT','GTC','GTA','GTG'],
  6.    A => ['GCT','GCC','GCA','GCG'],
  7.    E => ['GAA','GAG'],
  8.    F => ['TTT','TTC'],
  9.    G => ['GGT','GGC','GGA','GGG'],
  10.    H => ['CAT','CAC'],
  11. );
  12. my $word = 'VAEFGH';
  13. my @order = split //, $word;# used to loop through the letters in sequential order
  14. my %arrays;# initialize a hash 
  15. %arrays = map {$_=> $hash{$_}} split //,$word;#convert the word into a hash of arrays
  16. my @loops;#initialize an array to find how many permutations are possible
  17. foreach my $array (keys %hash) {
  18.    push @loops, scalar @{$hash{$array}};#gets the length of each array in the hash
  19. }
  20. my $i = (sort {$a <=> $b} @loops)[0];#sort the array and get the 0th value which is the number of possible permutations
  21. my @perms;#initialize an array to store the permutations
  22. foreach my $j (0..$i-1) {#start the number of loops we need to get all permutations
  23.    my $perm;#initialize a scalar to build a permutation
  24.    foreach my $letter (@order) {
  25.       $perm .= @{$hash{$letter}}[$j];
  26.    }
  27.    push @perms, $perm;
  28. }
  29. print Dumper \@perms;    
Now lets hear why the results are not correct ;)

Results using Data::Dumper to print the permutations:

Expand|Select|Wrap|Line Numbers
  1. $VAR1 = [
  2.           'GTTGCTGAATTTGGTCAT',
  3.           'GTCGCCGAGTTCGGCCAC'
  4.         ];
I assumed the number of possible permutations is limited by the shortest array of sequences. If that assumption is wrong you can use the longest array as the number of loops you need to make but you will have to test each array to make sure you have not gone past the end of shorter arrays.

If this is school work (which I am sure it is) you should not submit this code as your own work, that would be unethical. Hopefully that is important to you.
Oct 21 '08 #5

KevinADC
Expert 2.5K+
P: 4,059
This version filles in blank spots with '---':

Expand|Select|Wrap|Line Numbers
  1. use strict;
  2. use warnings;
  3. use Data::Dumper;
  4. my %hash = (
  5.    V => ['GTT','GTC','GTA','GTG'],
  6.    A => ['GCT','GCC','GCA','GCG'],
  7.    E => ['GAA','GAG'],
  8.    F => ['TTT','TTC'],
  9.    G => ['GGT','GGC','GGA','GGG'],
  10.    H => ['CAT','CAC'],
  11. );
  12. my $word = 'VAEFGH';
  13. my @order = split //, $word;
  14. my %arrays;
  15. %arrays = map {$_=> $hash{$_}} split //,$word;
  16. my @loops;
  17. foreach my $array (keys %hash) {
  18.    push @loops, scalar @{$hash{$array}};
  19. }
  20. my $i = (sort {$a <=> $b} @loops)[-1];
  21. my @perms;
  22. foreach my $j (0..$i-1) {
  23.    my $perm;
  24.    foreach my $letter (@order) {
  25.       $perm .= @{$hash{$letter}}[$j] || '---';
  26.    }
  27.    push @perms, $perm;
  28. }
  29. print Dumper \@perms;
Change this:

Expand|Select|Wrap|Line Numbers
  1. || '---';
to this to ignore blank spots

Expand|Select|Wrap|Line Numbers
  1.  || '';
Oct 21 '08 #6

P: 23
This version filles in blank spots with '---':

Expand|Select|Wrap|Line Numbers
  1. use strict;
  2. use warnings;
  3. use Data::Dumper;
  4. my %hash = (
  5.    V => ['GTT','GTC','GTA','GTG'],
  6.    A => ['GCT','GCC','GCA','GCG'],
  7.    E => ['GAA','GAG'],
  8.    F => ['TTT','TTC'],
  9.    G => ['GGT','GGC','GGA','GGG'],
  10.    H => ['CAT','CAC'],
  11. );
  12. my $word = 'VAEFGH';
  13. my @order = split //, $word;
  14. my %arrays;
  15. %arrays = map {$_=> $hash{$_}} split //,$word;
  16. my @loops;
  17. foreach my $array (keys %hash) {
  18.    push @loops, scalar @{$hash{$array}};
  19. }
  20. my $i = (sort {$a <=> $b} @loops)[-1];
  21. my @perms;
  22. foreach my $j (0..$i-1) {
  23.    my $perm;
  24.    foreach my $letter (@order) {
  25.       $perm .= @{$hash{$letter}}[$j] || '---';
  26.    }
  27.    push @perms, $perm;
  28. }
  29. print Dumper \@perms;
Change this:

Expand|Select|Wrap|Line Numbers
  1. || '---';
to this to ignore blank spots

Expand|Select|Wrap|Line Numbers
  1.  || '';
Hi Kevin,
Thanks for your help, I just saw your post today morning and thanks for the code and I will check it out and let you know the outcome very soon.

Regards,
B.Nataraj
Oct 21 '08 #7

KevinADC
Expert 2.5K+
P: 4,059
The code I posted does not find all possible combinations for the "word". I guess after reviewing the thread I am not sure thats what you were after anyway. But if you really did want all permutations that would be a different thing and the output would grow exponentially.
Oct 21 '08 #8

Icecrack
Expert 100+
P: 174
The code I posted does not find all possible combinations for the "word". I guess after reviewing the thread I am not sure thats what you were after anyway. But if you really did want all permutations that would be a different thing and the output would grow exponentially.
You should read and not scan i have learn't this the hard way.

:P

and after reading your script thats what i thought it was not what he wanted.
Oct 21 '08 #9

KevinADC
Expert 2.5K+
P: 4,059
You should read and not scan i have learn't this the hard way.

:P

and after reading your script thats what i thought it was not what he wanted.
hehehe.... I do read the threads, I've been posting on forums for more than 10 years and learned long ago not to get into a thread half-baked. If he really wants all premutations that should actually be easier, if more intense, since it will generate a lot more sequences. The unblanaced length of the arrays is what made me think he wants only the possible combinations achieved from looping through the set of arrays once. Maybe not though.
Oct 21 '08 #10

P: 23
The code I posted does not find all possible combinations for the "word". I guess after reviewing the thread I am not sure thats what you were after anyway. But if you really did want all permutations that would be a different thing and the output would grow exponentially.
Hi Kevin,
Yes I want all possible permutation by keeping the original length of the word as constant. The output should not be short in length ( I mean inserting dash or space is not permitted in my problem). This is originally a bioinformatics problem, I did not talk here in the language of bioinformatics since I belive most of you in this forum may not aware of bioinformatics or even molecular biology's pros and cons. But after seeing your intrest in this problem I am bound to explain my problem in bioinformatics term, that is the approach which I am doing is called backtranslation ,I am trying to translate from protein sequence to DNA sequence, The three letter codon (DNA sequence) (eg ATT, GCC etc.,) code for single amino acid , say A, V, E ,F etc., There are more than one codon can represent an indivdual amino acid but the reverse is not true. Hope you can get some idea of my problem now..so could you please suggest modification in your code (Which I belive 70% represented my problem) to achive it as 100% successful code.

Thanks and regards,
B.Nataraj
Oct 21 '08 #11

KevinADC
Expert 2.5K+
P: 4,059
All the possible permutations is simple:

Expand|Select|Wrap|Line Numbers
  1. use strict;
  2. use warnings;
  3. my @V=('GTT','GTC','GTA','GTG');
  4. my @A=('GCT','GCC','GCA','GCG');
  5. my @E=('GAA','GAG');
  6. my @F=('TTT','TTC');
  7. my @G=('GGT','GGC','GGA','GGG');
  8. my @H=('CAT','CAC');
  9. my $i = 1;
  10. foreach my $v (@V) {
  11.    foreach my $a (@A) {
  12.       foreach my $e (@E) {
  13.          foreach my $f (@F) {
  14.             foreach my $g (@G) {
  15.                foreach my $h (@H) {
  16.                   print "$i $v$a$e$f$g$h\n";
  17.                   $i++;
  18.                }
  19.             }
  20.          }
  21.       }
  22.    }
  23. }
  24.  
$i is there just to show how many permutations there are, in this case 512.

You may also want to look into Bioperl which is written exclusively for this type of work.
Oct 21 '08 #12

P: 23
All the possible permutations is simple:

Expand|Select|Wrap|Line Numbers
  1. use strict;
  2. use warnings;
  3. my @V=('GTT','GTC','GTA','GTG');
  4. my @A=('GCT','GCC','GCA','GCG');
  5. my @E=('GAA','GAG');
  6. my @F=('TTT','TTC');
  7. my @G=('GGT','GGC','GGA','GGG');
  8. my @H=('CAT','CAC');
  9. my $i = 1;
  10. foreach my $v (@V) {
  11.    foreach my $a (@A) {
  12.       foreach my $e (@E) {
  13.          foreach my $f (@F) {
  14.             foreach my $g (@G) {
  15.                foreach my $h (@H) {
  16.                   print "$i $v$a$e$f$g$h\n";
  17.                   $i++;
  18.                }
  19.             }
  20.          }
  21.       }
  22.    }
  23. }
  24.  
$i is there just to show how many permutations there are, in this case 512.

You may also want to look into Bioperl which is written exclusively for this type of work.
Hi Kevin,
Great help indeed and thanks ,as I am in the middle of my wet lab work that I do in addition to my computation work, for that reason I am unable to check your code and report you immediately, I do it today evening and will let you know.

Regards,
B.Nataraj
Oct 21 '08 #13

KevinADC
Expert 2.5K+
P: 4,059
I should add that this is a much less interesting problem than I had originally anticipated so I am back to my personal rule. You need to show some effort in solving your coding problems before I will post anymore code. I am no more interested in learning bioinformatics than you are in learning perl programming.

You are welcome,
Kevin(ADC)
Oct 21 '08 #14

P: 23
I should add that this is a much less interesting problem than I had originally anticipated so I am back to my personal rule. You need to show some effort in solving your coding problems before I will post anymore code. I am no more interested in learning bioinformatics than you are in learning perl programming.

You are welcome,
Kevin(ADC)
Hi Kevin,
The code works fine the way I wanted it is to be. I do feel that it's so simple but I had panicked once I saw the problem earlier and more the urge to do fast and quick made me to post here...once again I reiterate that I am a weekend programmer and regular programmer like you it made so simple. Any way thanks a lot for your help

I now have to generalize your code for any possible input protein sequence given as input may be I have to make it to read from a text file by each line and have to convert it into the way the program does it now....Hope I can do it now by at least within three days and I will come back to you with my code if I happened to struck there elsewhere so you wont need to escalate from your personal rule then :-)

Regards,
B.Nataraj
Oct 21 '08 #15

Post your reply

Sign in to post your reply or Sign up for a free account.