 | 
August 6th, 2008, 12:48 AM
| | Newbie | | Join Date: Aug 2008
Posts: 3
| | Erasing or Skipping lines in a data file
Hi there,
I just started programming with Perl and am trying to put together my first little data manipulation program. I am working on a Mac running OS X.
I have a data file with the following header that has been created on a Windows XP machine: Quote:
------ Begin Next Fly'm ------
"Begin Fly'm (Thu Jul 24 01:22:02 2008)"
"Ions Flown Separately, Comp Quality(100)"
"Number of Ions to Fly = 200000"
"Changes:","Mass","Charge","X","Y","Z","KE","Azm", "Elv","Time of Birth"
" ","YES","NO ","NO ","NO ","NO ","YES","YES","YES","NO "
"Ion N","Events","TOF","Mass","X","KE","KE Error"
1,1,0,2.01402,9.53,1,1.28051e-009
1,4,0.725546,2.01402,178,3181.59,0.542952
|
My goal is to get rid of the header and the empty lines to finally have an output file with only the number entries.
I've put together some lines that, IMO, should work, but they don't, and I don't really get why.

    # read file that is given in STDIN
    $dat = shift or die "Need input file with data. \n";

    # open files for read and for write
    open DAT,"< $dat" or die "Cannot read $dat\n";
    open OUT, "> output.txt" or die "Cannot open file!\n";


    # loop for reading input file
    while($dat = <DAT>){

    # skip lines beginning with ", - and empty
    next if $dat =~ /^\s*$|\"|-/;

    # next if $dat =~ /^\"/;
    # s/^-//;
    # s/^"//;
    #s/^\s*//;
    chomp($dat);

    print OUT <DAT>; #print the contents of the input file into the output file
    }

    # close files for reading and writing
    close DAT;
    close OUT;

The next if line is where the skipping/erasing should take place. I tried several different combinations of the command, put m before the expression and /i after it, and even split it into three separate commands that should skip ", - and whitespace:
for just erasing the " and the same for - and whitespace.
My output file looks like this when I run the program: Quote:
"Begin Fly'm (Thu Jul 24 01:22:02 2008)"
"Ions Flown Separately, Comp Quality(100)"
"Number of Ions to Fly = 200000"
"Changes:","Mass","Charge","X","Y","Z","KE","Azm", "Elv","Time of Birth"
" ","YES","NO ","NO ","NO ","NO ","YES","YES","YES","NO "
"Ion N","Events","TOF","Mass","X","KE","KE Error"
1,1,0,2.01402,9.53,1,1.28051e-009
1,4,0.725546,2.01402,178,3181.59,0.542952
There are some empty lines in the beginning and then the usual header minus the very first line.
I even tried to completely erase the header using the commands in lines 16 and 17 of my code (the s/^-//; and s/^"//; substitutions, uncommented), but usually I get the following error for that attempt:
Use of uninitialized value in substitution (s///) at ./foo.pl line 16, <DAT> line 1.
Use of uninitialized value in substitution (s///) at ./foo.pl line 17, <DAT> line 1.
The skipping of lines worked to some extent with a simple little textfile I made up (except for getting rid of the empty lines), but it more or less fails when I try the same on my datafile.
Hopefully someone can help me with this problem.
Thanks a lot.
| 
August 6th, 2008, 12:56 AM
| | Newbie | | Join Date: Aug 2008
Posts: 3
| |
I just saw that the very first line of my data file gets erased whenever I run the program. No idea how this can happen.
| 
August 6th, 2008, 05:30 AM
|  | Expert | | Join Date: Dec 2007 Age: 24
Posts: 365
| | Quote: |
Originally Posted by BibI I just saw that the very first line of my data file gets erased whenever I run the program. No idea how this can happen. |
Modify the while() loop as below:

    while($dat = <DAT>){
        # skip lines beginning with ", - and empty
        next if $dat =~ /(^\s*$)|(^\")|(^-)/;
        print OUT $dat; #print the contents of the input file into the output file
    }

The regex is changed to suit your need (skip empty lines and lines beginning with " or -). The regex you used skips blank lines and also any line that contains " or - anywhere, not just lines that begin with one of them.
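To make the difference between the two patterns concrete, here is a small sketch that runs both against a few lines copied from the data file in the question. The helper names `skipped_unanchored` and `skipped_anchored` are made up for this illustration; they just wrap the two regexes quoted in the thread.

```perl
use strict;
use warnings;

# Sample lines taken from the data file quoted in the question.
my @lines = (
    qq{------ Begin Next Fly'm ------\n},
    qq{"Number of Ions to Fly = 200000"\n},
    qq{\n},
    qq{1,1,0,2.01402,9.53,1,1.28051e-009\n},
    qq{1,4,0.725546,2.01402,178,3181.59,0.542952\n},
);

# Original pattern: only the blank-line branch is anchored,
# so " and - match anywhere in the line.
sub skipped_unanchored { return $_[0] =~ /^\s*$|\"|-/ ? 1 : 0 }

# Pattern from the reply: every branch is anchored to the line start.
sub skipped_anchored { return $_[0] =~ /(^\s*$)|(^\")|(^-)/ ? 1 : 0 }

for my $line (@lines) {
    (my $shown = $line) =~ s/\n\z//;    # drop the newline for display only
    printf "%-45s unanchored=%d anchored=%d\n",
        $shown, skipped_unanchored($line), skipped_anchored($line);
}
```

Note that the row ending in `1.28051e-009` is thrown away by the unanchored pattern because of the minus sign in the exponent, while the anchored pattern keeps it.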
The chomp() line was removed so that each line keeps its trailing newline and the output stays one record per line. If you want all the number lines joined into a single line of output, you can put it back.
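The first line of data was getting removed because of this:

    print OUT <DAT>;

When you have already assigned <DAT> to $dat, using <DAT> again reads further input from the file and prints that instead of the line held in $dat. Because print supplies list context, <DAT> there actually reads all of the remaining lines at once. You should be using $dat in this line.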
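A minimal sketch of that effect, assuming an in-memory string in place of the real data file. The subs `buggy_copy` and `fixed_copy` are hypothetical names for this example, and `push` stands in for `print OUT` because both evaluate the readline operator in list context.

```perl
use strict;
use warnings;

# Read from an in-memory string so the sketch needs no external file.
my $text = "line 1\nline 2\nline 3\nline 4\n";

# Mimics the original loop and returns everything it would have printed.
sub buggy_copy {
    my ($input) = @_;
    open my $in, '<', \$input or die "Cannot open in-memory file: $!";
    my @printed;
    while (my $dat = <$in>) {
        # <$in> in list context returns ALL remaining lines,
        # so the line already stored in $dat is never emitted.
        push @printed, <$in>;
    }
    close $in;
    return @printed;
}

# Corrected loop: emit the line that was just read.
sub fixed_copy {
    my ($input) = @_;
    open my $in, '<', \$input or die "Cannot open in-memory file: $!";
    my @printed;
    while (my $dat = <$in>) {
        push @printed, $dat;
    }
    close $in;
    return @printed;
}

print "buggy:\n", buggy_copy($text);
print "fixed:\n", fixed_copy($text);
```

The buggy version loses the first line and dumps the rest unfiltered in a single pass, which matches the symptom in the question: the header shows up in the output even though the skip test is in the loop.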
| 
August 6th, 2008, 05:32 AM
|  | Expert | | Join Date: Jan 2007
Posts: 3,639
    while(my $dat = <DAT>) {
        next if ($dat =~ /^([-"])|^\s*$/);
        print OUT $dat;
    }
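For completeness, here is one way the whole script could look with the loop above folded in. This is only a sketch under a few assumptions: the sub name `copy_data_rows` is invented for this example, the file is assumed to have Windows (CRLF) line endings because it was created on Windows XP, and lexical filehandles with three-argument open are a style choice rather than something the thread asked for.

```perl
use strict;
use warnings;

# Copy only the data rows from $in to $out; returns the number of rows kept.
sub copy_data_rows {
    my ($in, $out) = @_;
    my $kept = 0;
    while (my $line = <$in>) {
        $line =~ s/\r?\n\z//;        # strip LF or CRLF (assumed Windows file)
        next if $line =~ /^\s*$/;    # skip blank lines
        next if $line =~ /^[-"]/;    # skip header lines starting with - or "
        print {$out} "$line\n";
        $kept++;
    }
    return $kept;
}

# Run as a script only when a file name is given on the command line.
if (@ARGV) {
    my $file = shift @ARGV;
    open my $in,  '<', $file        or die "Cannot read $file: $!\n";
    open my $out, '>', 'output.txt' or die "Cannot write output.txt: $!\n";
    my $kept = copy_data_rows($in, $out);
    close $in;
    close $out or die "Cannot close output.txt: $!\n";
    print "Kept $kept data rows\n";
}
```

Every pattern here is bound to `$line` with `=~`. A bare `s/^-//;` with no binding works on `$_`, not on `$dat`, and since the original loop never sets `$_`, that is where the "Use of uninitialized value in substitution" warnings came from.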
| 
August 6th, 2008, 08:22 PM
| | Newbie | | Join Date: Aug 2008
Posts: 3
| |
Hey thanks very much.
That totally solved my problem. :)