2 questions for perl text manipulation

14 New Member
I've just started programming in Perl and have written a few successful scripts, but I have a quick question on how to do 2 things.

First here is a script that I wrote recently that works for what it is supposed to do, but is not quite what I want.

#!/usr/bin/perl

$file_q = "x.txt";

open(FILE, $file_q)||die "nope\n";
while(<FILE>){

   @line = split(/\s+/, $_);

   if($line[0]=~/cere/){

      push(@wanted_lines,$line[2]);
   }
}

close (FILE);

print "@wanted_lines\n";

Basically, what I need to do is extract the nth character of each line beginning with 'cere' and push it onto an array. I will repeat that for some other strings as well. From there, I need to print only n characters per line, so that I can print, say, 100 cere characters, then 100 a characters, then 100 b characters, in a format similar to this:


cere-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
aaaa-yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
bbbb-zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz

any help is greatly appreciated!
Jul 1 '08 #1
KevinADC
4,059 Recognized Expert Specialist
Hard to say without seeing your data, but here is something you can maybe chew on:

use strict;
use warnings;

my $file_q = "x.txt";
my @wanted_lines = (); # must match the name used below, or strict refuses to compile
open(FILE, $file_q) or die "nope: $!\n";
while(<FILE>){
   if(/^cere/){ # line begins with cere
      # up to 100 characters starting at offset 5, just past "cere "
      push @wanted_lines,substr($_,5,100);
   }
}
close (FILE);
print "@wanted_lines\n";

Look up substr() and how to use it.
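
As a rough illustration of how substr() behaves, here is a minimal sketch. The sample line is an assumption, shaped like the "name position base" data posted later in this thread, and the variable names are only illustrative.

```perl
use strict;
use warnings;

# Assumed sample line in the "name position base" layout.
my $line = "cere 662376 G";

# substr(EXPR, OFFSET, LENGTH): offsets are zero-based.
my $name = substr($line, 0, 4);    # first four characters
my $base = substr($line, -1);      # a negative offset counts from the end

# Asking for more characters than remain returns whatever is left.
my $rest = substr($line, 5, 100);

print "$name|$base|$rest\n";
```

Note that a line read with <FILE> still ends in a newline unless you chomp it first, so substr($_, 5, 100) on a short line will include that newline.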
Jul 1 '08 #2
sessmurda
14 New Member
Basically, my data is formatted like this, but the real file contains closer to 10,000 lines.

cere 662376 G
para 662376 C
baya 662376 x
cere 662375 C
para 662375 G
baya 662375 x
cere 662374 G
para 662374 C
baya 662374 x
cere 662373 C
para 662373 A
baya 662373 x
cere 662372 A
para 662372 A
baya 662372 x
cere 662371 T
para 662371 C
baya 662371 x
cere 662370 G
para 662370 G
baya 662370 x
cere 662369 C
para 662369 A
baya 662369 C
cere 662368 A
para 662368 A
baya 662368 A
cere 662367 T
para 662367 C
baya 662367 T
cere 662366 C
para 662366 C
baya 662366 C
cere 662365 G
para 662365 C
baya 662365 G
cere 662364 A
para 662364 G
baya 662364 A
cere 662363 C
para 662363 C
baya 662363 C
cere 662362 G
para 662362 G
baya 662362 G
cere 662361 T
para 662361 T
baya 662361 T
cere 662360 C
para 662360 A
baya 662360 C
cere 662359 A
para 662359 T
baya 662359 A
cere 662358 C
para 662358 G
baya 662358 C

I've been using the substring function, but the main thing is that I want to align all the cere against all the para against all the baya, in a format similar to my first post, while printing only a certain number of characters per line, because 1) it's so long, and 2) I have to do this for many different outputs. The problem I've been having with the substring function alone is that it lists all of the cere points, then all of another, whereas I want them aligned so that I can compare.
Jul 1 '08 #3
KevinADC
4,059 Recognized Expert Specialist
Just going by the sample data, I wrote this:

use strict;
use warnings;
my %data = ();
my @genes = ();
while (my $line=<DATA>) {
   $line =~ tr/ //d; # remove the spaces
   my ($var1, $var2, $var3) = unpack("A4A6A1",$line); # unpack is very efficient
   push @genes, $var1 unless exists $data{$var1}; # record each name once, in first-seen order
   $data{$var1} .= $var3; # append this line's character to the string for that name
}

foreach my $g (@genes) {
   print "$g ", substr($data{$g},0,10), "\n";
}

__DATA__
cere 662376 G
para 662376 C
baya 662376 x
cere 662375 C
para 662375 G
baya 662375 x
cere 662374 G
para 662374 C
baya 662374 x
cere 662373 C
para 662373 A
baya 662373 x
cere 662372 A
para 662372 A
baya 662372 x
cere 662371 T
para 662371 C
baya 662371 x
cere 662370 G
para 662370 G
baya 662370 x
cere 662369 C
para 662369 A
baya 662369 C
cere 662368 A
para 662368 A
baya 662368 A
cere 662367 T
para 662367 C
baya 662367 T
cere 662366 C
para 662366 C
baya 662366 C
cere 662365 G
para 662365 C
baya 662365 G
cere 662364 A
para 662364 G
baya 662364 A
cere 662363 C
para 662363 C
baya 662363 C
cere 662362 G
para 662362 G
baya 662362 G
cere 662361 T
para 662361 T
baya 662361 T
cere 662360 C
para 662360 A
baya 662360 C
cere 662359 A
para 662359 T
baya 662359 A
cere 662358 C
para 662358 G
baya 662358 C
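
The script above prints only the first 10 characters for each name. To print the whole thing in aligned blocks of n characters per line, as asked in the first post, something like the following might work. This is only a sketch: format_blocks and its arguments are hypothetical names, and the toy input is the first ten characters per name from the sample data. In a real script you would pass in the hash built by the read loop.

```perl
use strict;
use warnings;

# Build one block of output per $width characters, with one row per name,
# so the same columns of every sequence line up under each other.
# %$seq_for maps a name to its concatenated characters; @$order fixes row order.
sub format_blocks {
    my ($seq_for, $order, $width) = @_;
    die "width must be a positive integer\n" unless $width && $width > 0;

    # The longest sequence decides how many blocks are needed.
    my $longest = 0;
    for my $name (@$order) {
        my $len = length($seq_for->{$name});
        $longest = $len if $len > $longest;
    }

    my @blocks;
    for (my $start = 0; $start < $longest; $start += $width) {
        my $block = '';
        for my $name (@$order) {
            my $seq = $seq_for->{$name};
            # substr warns if the offset is past the end, so guard short sequences.
            my $chunk = $start < length($seq) ? substr($seq, $start, $width) : '';
            $block .= "$name-$chunk\n";
        }
        push @blocks, $block;
    }
    return join "\n", @blocks;    # blank line between blocks
}

# Toy input: the first ten characters per name from the sample data.
my %seq_for = (
    cere => 'GCGCATGCAT',
    para => 'CGCAACGAAC',
    baya => 'xxxxxxxCAT',
);
my @order = qw(cere para baya);

print format_blocks(\%seq_for, \@order, 4);
```

With a width of 4 this prints three blocks of three rows each, the last block holding the two leftover characters. For 100 characters per line, pass 100 as the third argument.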
Jul 1 '08 #4
sessmurda
14 New Member
Thanks! I've done a bit more manipulation to get it to do exactly what I want. Your help is greatly appreciated!
Jul 1 '08 #5
