Bytes IT Community

2 questions for perl text manipulation

P: 14
I've just started programming in Perl and have written a few successful scripts, but I have a quick question about how to do two things.

First, here is a script I wrote recently. It does what it is supposed to do, but it is not quite what I want.

#!/usr/bin/perl

$file_q = "x.txt";

open(FILE, $file_q) || die "nope\n";
while (<FILE>) {
    @line = split(/\s+/, $_);
    if ($line[0] =~ /cere/) {
        push(@wanted_lines, $line[2]);
    }
}

close(FILE);

print "@wanted_lines\n";

Basically, what I need to do is extract the nth character of each line beginning with 'cere' and push it onto an array. I will repeat that for some other strings as well. From there, I need to print only n characters per line, so that I can print, say, 100 cere characters, then 100 a characters, then 100 b characters, in a format similar to this:


cere-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
aaaa-yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
bbbb-zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz

any help is greatly appreciated!
Jul 1 '08 #1
4 Replies


KevinADC
Expert 2.5K+
P: 4,059
Hard to say without seeing your data, but here is something you can maybe chew on:

use strict;
use warnings;

my $file_q = "x.txt";
my @wanted_lines = ();
open(FILE, $file_q) or die "nope: $!\n";
while (<FILE>) {
   if (/^cere/) { # line begins with cere
      push @wanted_lines, substr($_, 5, 100); # up to 100 characters starting at offset 5
   }
}
close(FILE);
print "@wanted_lines\n";

Look up substr() and how to use it.
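
For anyone following along, here is a minimal sketch of how substr() treats offsets and lengths. The sample line is hypothetical and simply follows the name-sequence layout from the first post; it is not taken from the real data file.

```perl
use strict;
use warnings;

# Hypothetical sample line in the "name-sequence" layout from the first post.
my $line = "cere-GCGCATGCAT\n";
chomp $line;                       # drop the trailing newline first

my $name  = substr($line, 0, 4);   # 4 characters starting at offset 0
my $first = substr($line, 5, 3);   # 3 characters starting at offset 5
my $rest  = substr($line, 5);      # no length: everything from offset 5 on
my $long  = substr($line, 5, 100); # length past the end just stops at the end
my $last  = substr($line, -1);     # negative offset counts from the end

print "$name $first $rest $last\n";
```

Offsets start at 0, so offset 5 skips the four-letter name and the separator. Asking for 100 characters when fewer remain returns whatever is there, which is why substr($_,5,100) is safe on short lines.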
Jul 1 '08 #2

P: 14
Basically, the format of my data is like this, but the real file has closer to 10,000 lines.

cere 662376 G
para 662376 C
baya 662376 x
cere 662375 C
para 662375 G
baya 662375 x
cere 662374 G
para 662374 C
baya 662374 x
cere 662373 C
para 662373 A
baya 662373 x
cere 662372 A
para 662372 A
baya 662372 x
cere 662371 T
para 662371 C
baya 662371 x
cere 662370 G
para 662370 G
baya 662370 x
cere 662369 C
para 662369 A
baya 662369 C
cere 662368 A
para 662368 A
baya 662368 A
cere 662367 T
para 662367 C
baya 662367 T
cere 662366 C
para 662366 C
baya 662366 C
cere 662365 G
para 662365 C
baya 662365 G
cere 662364 A
para 662364 G
baya 662364 A
cere 662363 C
para 662363 C
baya 662363 C
cere 662362 G
para 662362 G
baya 662362 G
cere 662361 T
para 662361 T
baya 662361 T
cere 662360 C
para 662360 A
baya 662360 C
cere 662359 A
para 662359 T
baya 662359 A
cere 662358 C
para 662358 G
baya 662358 C

I've been using the substring function, but the main thing is that I want to align all the cere against all the para against all the baya, in a format similar to my first post, while printing only a certain number of characters per line, because 1) it's so long, and 2) I have to do this for many different outputs. The problem I've been having with just the substring function is that it'll list all of the cere points, then all of another, whereas I want them aligned so that I can compare.
Jul 1 '08 #3

KevinADC
Expert 2.5K+
P: 4,059
Just going by the sample data, I wrote this:

use strict;
use warnings;
my %data = ();
my @genes = ();
while (my $line = <DATA>) {
   $line =~ tr/ //d; # remove the spaces
   my ($var1, $var2, $var3) = unpack("A4A6A1", $line); # unpack is very efficient
   push @genes, $var1 unless exists $data{$var1}; # record each name once, to maintain order
   $data{$var1} .= $var3; # append this character to the string stored under the name
}

foreach my $g (@genes) {
   print "$g ", substr($data{$g}, 0, 10), "\n";
}

__DATA__
cere 662376 G
para 662376 C
baya 662376 x
cere 662375 C
para 662375 G
baya 662375 x
cere 662374 G
para 662374 C
baya 662374 x
cere 662373 C
para 662373 A
baya 662373 x
cere 662372 A
para 662372 A
baya 662372 x
cere 662371 T
para 662371 C
baya 662371 x
cere 662370 G
para 662370 G
baya 662370 x
cere 662369 C
para 662369 A
baya 662369 C
cere 662368 A
para 662368 A
baya 662368 A
cere 662367 T
para 662367 C
baya 662367 T
cere 662366 C
para 662366 C
baya 662366 C
cere 662365 G
para 662365 C
baya 662365 G
cere 662364 A
para 662364 G
baya 662364 A
cere 662363 C
para 662363 C
baya 662363 C
cere 662362 G
para 662362 G
baya 662362 G
cere 662361 T
para 662361 T
baya 662361 T
cere 662360 C
para 662360 A
baya 662360 C
cere 662359 A
para 662359 T
baya 662359 A
cere 662358 C
para 662358 G
baya 662358 C
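
The script above prints only the first 10 characters per name. One possible way to get the full aligned layout asked for earlier in the thread, with a fixed number of characters per row, is sketched below. The sub name format_blocks and its arguments are hypothetical, and the three strings are the first ten characters per name from the sample data; treat this as a sketch under those assumptions rather than the manipulation the original poster ended up using.

```perl
use strict;
use warnings;

# Hypothetical helper: lay out several sequences in aligned blocks.
# $seqs  - hash ref mapping a name to its sequence string
# $order - array ref giving the order of the names within each block
# $width - number of sequence characters per row
sub format_blocks {
    my ($seqs, $order, $width) = @_;

    # Find the longest sequence so we know how many blocks are needed.
    my $longest = 0;
    for my $name (@$order) {
        my $len = length($seqs->{$name});
        $longest = $len if $len > $longest;
    }

    my @rows;
    for (my $pos = 0; $pos < $longest; $pos += $width) {
        for my $name (@$order) {
            my $seq = $seqs->{$name};
            next if $pos >= length($seq);   # shorter sequence has run out
            push @rows, "$name-" . substr($seq, $pos, $width);
        }
        push @rows, "";                     # blank line between blocks
    }
    return join("\n", @rows) . "\n";
}

# First ten characters per name, taken from the sample data in this thread.
my %data  = (cere => "GCGCATGCAT", para => "CGCAACGAAC", baya => "xxxxxxxCAT");
my @genes = qw(cere para baya);

print format_blocks(\%data, \@genes, 5);
```

Each block holds one row per name covering the same range of positions, so the characters line up for comparison. With the hash built by the script above, passing a width of 100 would give the 100-characters-per-line layout.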
Jul 1 '08 #4

P: 14
Thanks! I've done a bit more manipulation to get it to do exactly what I want. Your help is greatly appreciated!
Jul 1 '08 #5
