473,386 Members | 1,738 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,386 software developers and data experts.

Perl - How to create all possible sequence from a word

Hi all,
I have to do a translation of a word into different possible word based on a dictionary, which define differ meaning for each letter in a word.
For example,
The word "VAEFGH" to be translated to various all possible words based on the dictonary that I define here like the one.
@V=('GTT','GTC','GTA','GTG');
@A=('GCT','GCC','GCA','GCG');
@E=('GAA','GAG');
@F=('TTT','TTC');
@G=('GGT','GGC','GGA','GGG');
@H=('CAT','CAC');

Just to say, an example of a translated ouput from the dictionary for the word "VAEFGH" is to be "GTTGCTGAATTTGGTCAT". Please provide me the solution by donating your code or else the algorithm to crack this one out.

Thanks in advance,
B.Nataraj
Oct 20 '08 #1
14 4099
nithinpes
410 Expert 256MB
Before giving any solution, we would like to know what you have tried so far.
But, just to give you a hint to start with, all you need to do is to use nested foreach() loops for all the required arrays to get a set of possible combinations.

-Nithin
Oct 20 '08 #2
Before giving any solution, we would like to know what you have tried so far.
But, just to give you a hint to start with, all you need to do is to use nested foreach() loops for all the required arrays to get a set of possible combinations.

-Nithin
Hi Nithin,
Thank you for your reply. Yes, I have been trying exactly what you have said about like nested foreach() loop, but it really confuse me to implement it as a programm..I am not a regular programmer, so I would be happy with modifying existing code of this sort to my need. Thats why I posted my concern into this forum. More I do understand that doing such a nested loop will consume more computational time so if any advanced programmer had done a heuristic approach for this problem would be much sought out for me here.
Oct 20 '08 #3
Icecrack
174 Expert 100+
Hi Nithin,
Thank you for your reply. Yes, I have been trying exactly what you have said about like nested foreach() loop, but it really confuse me to implement it as a programm..I am not a regular programmer, so I would be happy with modifying existing code of this sort to my need. Thats why I posted my concern into this forum. More I do understand that doing such a nested loop will consume more computational time so if any advanced programmer had done a heuristic approach for this problem would be much sought out for me here.

if you have Searched this forum you would have found something on the lines of what you want.

all i know is i will not attempt to help you unless some code is placed
Oct 20 '08 #4
KevinADC
4,059 Expert 2GB
I totally agree with what other people have posted, you need to show some effort before we post some code. That is my personal rule also, 99% of the time. I reserve that last 1% for a question I find interesting enough to violate my own rule.

My approach to solving the problem is not heuristic, but you could say its part of a heuristic approach to discovering other ways to do the same thing that are hopefully more efficient.

I applied very linear logic to solving the problem. Which is almost always how I start. There could be short cuts or better ways to do this. In fact I am almost sure there is but I might not be smart enough to figure them out.

Expand|Select|Wrap|Line Numbers
  1. use strict;
  2. use warnings;
  3. use Data::Dumper;
  4. my %hash = (
  5.    V => ['GTT','GTC','GTA','GTG'],
  6.    A => ['GCT','GCC','GCA','GCG'],
  7.    E => ['GAA','GAG'],
  8.    F => ['TTT','TTC'],
  9.    G => ['GGT','GGC','GGA','GGG'],
  10.    H => ['CAT','CAC'],
  11. );
  12. my $word = 'VAEFGH';
  13. my @order = split //, $word;# used to loop through the letters in sequential order
  14. my %arrays;# initialize a hash 
  15. %arrays = map {$_=> $hash{$_}} split //,$word;#convert the word into a hash of arrays
  16. my @loops;#initialize an array to find how many permutations are possible
  17. foreach my $array (keys %hash) {
  18.    push @loops, scalar @{$hash{$array}};#gets the length of each array in the hash
  19. }
  20. my $i = (sort {$a <=> $b} @loops)[0];#sort the array and get the 0th value which is the number of possible permutations
  21. my @perms;#initialize an array to store the permutations
  22. foreach my $j (0..$i-1) {#start the number of loops we need to get all permutations
  23.    my $perm;#initialize a scalar to build a permutation
  24.    foreach my $letter (@order) {
  25.       $perm .= @{$hash{$letter}}[$j];
  26.    }
  27.    push @perms, $perm;
  28. }
  29. print Dumper \@perms;    
Now lets hear why the results are not correct ;)

Results using Data::Dumper to print the permutations:

Expand|Select|Wrap|Line Numbers
  1. $VAR1 = [
  2.           'GTTGCTGAATTTGGTCAT',
  3.           'GTCGCCGAGTTCGGCCAC'
  4.         ];
I assumed the number of possible permutations is limited by the shortest array of sequences. If that assumption is wrong you can use the longest array as the number of loops you need to make but you will have to test each array to make sure you have not gone past the end of shorter arrays.

If this is school work (which I am sure it is) you should not submit this code as your own work, that would be unethical. Hopefully that is important to you.
Oct 21 '08 #5
KevinADC
4,059 Expert 2GB
This version filles in blank spots with '---':

Expand|Select|Wrap|Line Numbers
  1. use strict;
  2. use warnings;
  3. use Data::Dumper;
  4. my %hash = (
  5.    V => ['GTT','GTC','GTA','GTG'],
  6.    A => ['GCT','GCC','GCA','GCG'],
  7.    E => ['GAA','GAG'],
  8.    F => ['TTT','TTC'],
  9.    G => ['GGT','GGC','GGA','GGG'],
  10.    H => ['CAT','CAC'],
  11. );
  12. my $word = 'VAEFGH';
  13. my @order = split //, $word;
  14. my %arrays;
  15. %arrays = map {$_=> $hash{$_}} split //,$word;
  16. my @loops;
  17. foreach my $array (keys %hash) {
  18.    push @loops, scalar @{$hash{$array}};
  19. }
  20. my $i = (sort {$a <=> $b} @loops)[-1];
  21. my @perms;
  22. foreach my $j (0..$i-1) {
  23.    my $perm;
  24.    foreach my $letter (@order) {
  25.       $perm .= @{$hash{$letter}}[$j] || '---';
  26.    }
  27.    push @perms, $perm;
  28. }
  29. print Dumper \@perms;
Change this:

Expand|Select|Wrap|Line Numbers
  1. || '---';
to this to ignore blank spots

Expand|Select|Wrap|Line Numbers
  1.  || '';
Oct 21 '08 #6
This version filles in blank spots with '---':

Expand|Select|Wrap|Line Numbers
  1. use strict;
  2. use warnings;
  3. use Data::Dumper;
  4. my %hash = (
  5.    V => ['GTT','GTC','GTA','GTG'],
  6.    A => ['GCT','GCC','GCA','GCG'],
  7.    E => ['GAA','GAG'],
  8.    F => ['TTT','TTC'],
  9.    G => ['GGT','GGC','GGA','GGG'],
  10.    H => ['CAT','CAC'],
  11. );
  12. my $word = 'VAEFGH';
  13. my @order = split //, $word;
  14. my %arrays;
  15. %arrays = map {$_=> $hash{$_}} split //,$word;
  16. my @loops;
  17. foreach my $array (keys %hash) {
  18.    push @loops, scalar @{$hash{$array}};
  19. }
  20. my $i = (sort {$a <=> $b} @loops)[-1];
  21. my @perms;
  22. foreach my $j (0..$i-1) {
  23.    my $perm;
  24.    foreach my $letter (@order) {
  25.       $perm .= @{$hash{$letter}}[$j] || '---';
  26.    }
  27.    push @perms, $perm;
  28. }
  29. print Dumper \@perms;
Change this:

Expand|Select|Wrap|Line Numbers
  1. || '---';
to this to ignore blank spots

Expand|Select|Wrap|Line Numbers
  1.  || '';
Hi Kevin,
Thanks for your help, I just saw your post today morning and thanks for the code and I will check it out and let you know the outcome very soon.

Regards,
B.Nataraj
Oct 21 '08 #7
KevinADC
4,059 Expert 2GB
The code I posted does not find all possible combinations for the "word". I guess after reviewing the thread I am not sure thats what you were after anyway. But if you really did want all permutations that would be a different thing and the output would grow exponentially.
Oct 21 '08 #8
Icecrack
174 Expert 100+
The code I posted does not find all possible combinations for the "word". I guess after reviewing the thread I am not sure thats what you were after anyway. But if you really did want all permutations that would be a different thing and the output would grow exponentially.
You should read and not scan i have learn't this the hard way.

:P

and after reading your script thats what i thought it was not what he wanted.
Oct 21 '08 #9
KevinADC
4,059 Expert 2GB
You should read and not scan i have learn't this the hard way.

:P

and after reading your script thats what i thought it was not what he wanted.
hehehe.... I do read the threads, I've been posting on forums for more than 10 years and learned long ago not to get into a thread half-baked. If he really wants all premutations that should actually be easier, if more intense, since it will generate a lot more sequences. The unblanaced length of the arrays is what made me think he wants only the possible combinations achieved from looping through the set of arrays once. Maybe not though.
Oct 21 '08 #10
The code I posted does not find all possible combinations for the "word". I guess after reviewing the thread I am not sure thats what you were after anyway. But if you really did want all permutations that would be a different thing and the output would grow exponentially.
Hi Kevin,
Yes I want all possible permutation by keeping the original length of the word as constant. The output should not be short in length ( I mean inserting dash or space is not permitted in my problem). This is originally a bioinformatics problem, I did not talk here in the language of bioinformatics since I belive most of you in this forum may not aware of bioinformatics or even molecular biology's pros and cons. But after seeing your intrest in this problem I am bound to explain my problem in bioinformatics term, that is the approach which I am doing is called backtranslation ,I am trying to translate from protein sequence to DNA sequence, The three letter codon (DNA sequence) (eg ATT, GCC etc.,) code for single amino acid , say A, V, E ,F etc., There are more than one codon can represent an indivdual amino acid but the reverse is not true. Hope you can get some idea of my problem now..so could you please suggest modification in your code (Which I belive 70% represented my problem) to achive it as 100% successful code.

Thanks and regards,
B.Nataraj
Oct 21 '08 #11
KevinADC
4,059 Expert 2GB
All the possible permutations is simple:

Expand|Select|Wrap|Line Numbers
  1. use strict;
  2. use warnings;
  3. my @V=('GTT','GTC','GTA','GTG');
  4. my @A=('GCT','GCC','GCA','GCG');
  5. my @E=('GAA','GAG');
  6. my @F=('TTT','TTC');
  7. my @G=('GGT','GGC','GGA','GGG');
  8. my @H=('CAT','CAC');
  9. my $i = 1;
  10. foreach my $v (@V) {
  11.    foreach my $a (@A) {
  12.       foreach my $e (@E) {
  13.          foreach my $f (@F) {
  14.             foreach my $g (@G) {
  15.                foreach my $h (@H) {
  16.                   print "$i $v$a$e$f$g$h\n";
  17.                   $i++;
  18.                }
  19.             }
  20.          }
  21.       }
  22.    }
  23. }
  24.  
$i is there just to show how many permutations there are, in this case 512.

You may also want to look into Bioperl which is written exclusively for this type of work.
Oct 21 '08 #12
All the possible permutations is simple:

Expand|Select|Wrap|Line Numbers
  1. use strict;
  2. use warnings;
  3. my @V=('GTT','GTC','GTA','GTG');
  4. my @A=('GCT','GCC','GCA','GCG');
  5. my @E=('GAA','GAG');
  6. my @F=('TTT','TTC');
  7. my @G=('GGT','GGC','GGA','GGG');
  8. my @H=('CAT','CAC');
  9. my $i = 1;
  10. foreach my $v (@V) {
  11.    foreach my $a (@A) {
  12.       foreach my $e (@E) {
  13.          foreach my $f (@F) {
  14.             foreach my $g (@G) {
  15.                foreach my $h (@H) {
  16.                   print "$i $v$a$e$f$g$h\n";
  17.                   $i++;
  18.                }
  19.             }
  20.          }
  21.       }
  22.    }
  23. }
  24.  
$i is there just to show how many permutations there are, in this case 512.

You may also want to look into Bioperl which is written exclusively for this type of work.
Hi Kevin,
Great help indeed and thanks ,as I am in the middle of my wet lab work that I do in addition to my computation work, for that reason I am unable to check your code and report you immediately, I do it today evening and will let you know.

Regards,
B.Nataraj
Oct 21 '08 #13
KevinADC
4,059 Expert 2GB
I should add that this is a much less interesting problem than I had originally anticipated so I am back to my personal rule. You need to show some effort in solving your coding problems before I will post anymore code. I am no more interested in learning bioinformatics than you are in learning perl programming.

You are welcome,
Kevin(ADC)
Oct 21 '08 #14
I should add that this is a much less interesting problem than I had originally anticipated so I am back to my personal rule. You need to show some effort in solving your coding problems before I will post anymore code. I am no more interested in learning bioinformatics than you are in learning perl programming.

You are welcome,
Kevin(ADC)
Hi Kevin,
The code works fine the way I wanted it is to be. I do feel that it's so simple but I had panicked once I saw the problem earlier and more the urge to do fast and quick made me to post here...once again I reiterate that I am a weekend programmer and regular programmer like you it made so simple. Any way thanks a lot for your help

I now have to generalize your code for any possible input protein sequence given as input may be I have to make it to read from a text file by each line and have to convert it into the way the program does it now....Hope I can do it now by at least within three days and I will come back to you with my code if I happened to struck there elsewhere so you wont need to escalate from your personal rule then :-)

Regards,
B.Nataraj
Oct 21 '08 #15

Sign in to post your reply or Sign up for a free account.

Similar topics

41
by: Xah Lee | last post by:
here's another interesting algorithmic exercise, again from part of a larger program in the previous series. Here's the original Perl documentation: =pod merge($pairings) takes a list of...
2
by: Alex | last post by:
Basically I want to parse some C++ source code and replace some constants. I've mangaed to read in the file and replace some of the things that need replacing. but I'm having a hard time...
10
by: Hendry Taylor | last post by:
I wrote a perl script and tested it on windows and it works fine. I then moved it onto a solaris machine and now it says that there is a syntax error with the following line of code: $user1 =...
0
by: Kirt Loki Dankmyer | last post by:
So, I download the latest "stable" tar for perl (5.8.7) and try to compile it on the Solaris 8 (SPARC) box that I administrate. I try all sorts of different switches, but I can't get it to compile....
0
by: ryan | last post by:
I've been tasked with consuming a Perl web service that was written by a person in a different department of my company. The problem is it's the guy's first attempt at web services and he doesn't...
1
by: itm | last post by:
I have merged the data from Access. Now I need Word to start a new page every time the sequence number changes. How is this accomplished? Also, is it possible to tell Word to hide duplicates? ...
5
by: Johann C. Rocholl | last post by:
The following is my first attempt at adding a taint feature to Python to prevent os.system() from being called with untrusted input. What do you think of it? # taint.py - Emulate Perl's taint...
2
by: muppetjones | last post by:
I have been receiving a "Floating point exception" from Perl after my program has run several iterations. I have not been able to pin point the location of the error, and I am clueless. Especially as...
10
by: happyse27 | last post by:
Hi All, I got this apache errors(see section A1 and A2 below) when I used a html(see section b below) to activate acctman.pl(see section c below). Section D below is part of the configuration...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.