2 questions for perl text manipulation

sessmurda
14 New Member
I've just started programming in Perl and have written a few successful scripts, but I have a quick question about how to do two things.

First, here is a script I wrote recently. It works as written, but it is not quite what I want.

    #!/usr/bin/perl

    $file_q = "x.txt";

    open(FILE, $file_q) || die "nope\n";
    while (<FILE>) {

        @line = split(/\s+/, $_);

        if ($line[0] =~ /cere/) {
            push(@wanted_lines, $line[2]);
        }
    }

    close(FILE);

    print "@wanted_lines\n";

Basically, what I need to do is extract the nth character of each line beginning with 'cere' and push it onto an array, then repeat that for some other strings as well. From there, I need to print only n characters per line, so that I can print, say, 100 'cere' characters, then 100 'a' characters, then 100 'b' characters, in a format similar to this:


cere-xxxxxxxxxxxxxxx xxxxxxxxxxxxxxx xxxxxxxxxxxx
aaaa-yyyyyyyyyyyyyyy yyyyyyyyyyyyyyy yyyyyyyyyyyy
bbbb-zzzzzzzzzzzzzzz zzzzzzzzzzzzzzz zzzzzzzzzzzz

Any help is greatly appreciated!
Jul 1 '08 #1
KevinADC
4,059 Recognized Expert Specialist
Hard to say without seeing your data, but here is something you can chew on:

    use strict;
    use warnings;

    my $file_q = "x.txt";
    my @wanted_lines = ();
    open(FILE, $file_q) or die "nope: $!\n";
    while (<FILE>) {
        chomp; # drop the trailing newline so it is not captured by substr
        if (/^cere/) { # line begins with cere
            # up to 100 characters, starting at offset 5
            push @wanted_lines, substr($_, 5, 100);
        }
    }
    close(FILE);
    print "@wanted_lines\n";

Look up substr() and how to use it.
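
As a quick illustration of substr(): it takes a string, a zero-based offset, and an optional length. The sketch below assumes a line laid out like `cere 662376 G`, which is only a guess at the data, and `char_at` is a made-up helper name for this example, not a built-in.

```perl
use strict;
use warnings;

# Hypothetical helper: return the single character at a zero-based offset,
# or an empty string if the line is too short to have one.
sub char_at {
    my ($line, $offset) = @_;
    return '' if $offset >= length $line;
    return substr($line, $offset, 1);
}

# Assumed sample line: a 4-letter name, a 6-digit position, and one character.
my $line = "cere 662376 G";

print substr($line, 0, 4), "\n";   # the first four characters
print substr($line, 5, 6), "\n";   # six characters starting at offset 5
print char_at($line, 12), "\n";    # the single character at offset 12
print substr($line, -1), "\n";     # a negative offset counts from the end
```

The guard in `char_at` is there because calling substr() with an offset past the end of the string returns undef and warns under `use warnings`.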
Jul 1 '08 #2
sessmurda
14 New Member
Basically, the format of my data is like this, but the real file contains closer to 10,000 lines.

cere 662376 G
para 662376 C
baya 662376 x
cere 662375 C
para 662375 G
baya 662375 x
cere 662374 G
para 662374 C
baya 662374 x
cere 662373 C
para 662373 A
baya 662373 x
cere 662372 A
para 662372 A
baya 662372 x
cere 662371 T
para 662371 C
baya 662371 x
cere 662370 G
para 662370 G
baya 662370 x
cere 662369 C
para 662369 A
baya 662369 C
cere 662368 A
para 662368 A
baya 662368 A
cere 662367 T
para 662367 C
baya 662367 T
cere 662366 C
para 662366 C
baya 662366 C
cere 662365 G
para 662365 C
baya 662365 G
cere 662364 A
para 662364 G
baya 662364 A
cere 662363 C
para 662363 C
baya 662363 C
cere 662362 G
para 662362 G
baya 662362 G
cere 662361 T
para 662361 T
baya 662361 T
cere 662360 C
para 662360 A
baya 662360 C
cere 662359 A
para 662359 T
baya 662359 A
cere 662358 C
para 662358 G
baya 662358 C

I've been using the substr function, but the main thing is that I want to align all the cere values against all the para values and all the baya values, in a format similar to my first post, while only printing a certain number of characters per line, because 1) the output is so long, and 2) I have to do this for many different outputs. The problem I've been having with substr alone is that it lists all of the cere points, then all of the next set, whereas I want them aligned so that I can compare them.
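
For reference, the row layout from the first post (a name, a dash, then space-separated groups of characters) could be sketched roughly like this. The `format_row` helper and the group size of 5 are assumptions made for illustration; the three strings are the third-column characters from the sample data above, collected by hand.

```perl
use strict;
use warnings;

# Hypothetical helper: format one output row as "name-group group group",
# where each group holds at most $group_size characters.
sub format_row {
    my ($name, $seq, $group_size) = @_;
    my @groups = $seq =~ /(.{1,$group_size})/g;   # cut into fixed-size pieces
    return "$name-" . join(' ', @groups);
}

# Third-column characters from the sample data, one string per name.
my %seq = (
    cere => 'GCGCATGCATCGACGTCAC',
    para => 'CGCAACGAACCCGCGTATG',
    baya => 'xxxxxxxCATCGACGTCAC',
);

# Print the rows in a fixed order so the columns line up for comparison.
for my $name (qw(cere para baya)) {
    print format_row($name, $seq{$name}, 5), "\n";
}
```

Because every name is four letters long and every group has the same size, the characters in each row end up in the same column as the corresponding characters in the other rows.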
Jul 1 '08 #3
KevinADC
4,059 Recognized Expert Specialist
Just going by the sample data, I wrote this:

    use strict;
    use warnings;
    my %data = ();
    my @genes = ();
    while (my $line = <DATA>) {
        $line =~ tr/ //d; # remove the spaces
        # unpack is very efficient; the fields are name (4), position (6), character (1)
        my ($var1, $var2, $var3) = unpack("A4A6A1", $line);
        # record each name once, in first-seen order. Can be omitted if order is not important
        push @genes, $var1 unless exists $data{$var1};
        $data{$var1} .= $var3; # append the character to that name's string in the hash
    }

    foreach my $g (@genes) {
        print "$g ", substr($data{$g}, 0, 10), "\n"; # first 10 characters for each name
    }

    __DATA__
    cere 662376 G
    para 662376 C
    baya 662376 x
    cere 662375 C
    para 662375 G
    baya 662375 x
    cere 662374 G
    para 662374 C
    baya 662374 x
    cere 662373 C
    para 662373 A
    baya 662373 x
    cere 662372 A
    para 662372 A
    baya 662372 x
    cere 662371 T
    para 662371 C
    baya 662371 x
    cere 662370 G
    para 662370 G
    baya 662370 x
    cere 662369 C
    para 662369 A
    baya 662369 C
    cere 662368 A
    para 662368 A
    baya 662368 A
    cere 662367 T
    para 662367 C
    baya 662367 T
    cere 662366 C
    para 662366 C
    baya 662366 C
    cere 662365 G
    para 662365 C
    baya 662365 G
    cere 662364 A
    para 662364 G
    baya 662364 A
    cere 662363 C
    para 662363 C
    baya 662363 C
    cere 662362 G
    para 662362 G
    baya 662362 G
    cere 662361 T
    para 662361 T
    baya 662361 T
    cere 662360 C
    para 662360 A
    baya 662360 C
    cere 662359 A
    para 662359 T
    baya 662359 A
    cere 662358 C
    para 662358 G
    baya 662358 C

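
To print the whole of each string rather than just the first 10 characters, one option is to step an offset through the strings and print one row per name for each step. This is only a sketch: `aligned_blocks` is a hypothetical helper, the width of 10 is arbitrary, and the three strings are hard-coded from the sample data so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of an aligned, wrapped layout.
# $names is an array ref giving the row order, $seqs is a hash ref of
# name => string, and $width is the number of characters per row.
sub aligned_blocks {
    my ($names, $seqs, $width) = @_;
    die "width must be at least 1\n" if $width < 1;

    # The longest string decides how many blocks are needed.
    my $longest = 0;
    for my $name (@$names) {
        my $len = length $seqs->{$name};
        $longest = $len if $len > $longest;
    }

    my @lines;
    for (my $offset = 0; $offset < $longest; $offset += $width) {
        for my $name (@$names) {
            my $seq = $seqs->{$name};
            # Guard against strings that end before the current offset.
            my $piece = $offset < length($seq) ? substr($seq, $offset, $width) : '';
            push @lines, "$name-$piece";
        }
        push @lines, '';   # blank line between blocks
    }
    return @lines;
}

# Third-column characters from the sample data, one string per name.
my %data = (
    cere => 'GCGCATGCATCGACGTCAC',
    para => 'CGCAACGAACCCGCGTATG',
    baya => 'xxxxxxxCATCGACGTCAC',
);

print "$_\n" for aligned_blocks([qw(cere para baya)], \%data, 10);
```

Each block shows the same slice of every string, so position n in the cere row sits directly above position n in the para and baya rows.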
Jul 1 '08 #4
sessmurda
14 New Member
Thanks! I've done a bit more manipulation to get it to do exactly what I want. Your help is greatly appreciated!
Jul 1 '08 #5
