
Parsing data

P: 3
Let me ask for forgiveness right from the get-go. I am a newbie to Perl, I found that it would be a beneficial tool in my line of work, and I am trying to teach myself with online resources.

My question to the community, and I've seen variations on what I am trying to achieve, is the following scenario:

I have a comma-delimited text file (metadata_output.txt) saved in a given directory (e.g. \usr\home\project\dump).

The format has, for argument's sake, 4 fields:

ssnumb, name_last, name_first, ratios



I now want to open the file and have my script go through each entry line by line and print only the 4th field, "ratios", i.e. index [3].

Up to this point I have seen variations on how to do this, such as:

  #!/usr/home/bin/perl
  use strict;
  use warnings;

  open(META, "<\usr\home\project\dump\metadata_output.txt>"

  while(<META>)

  {
       chomp($_);
       my @line = split(/,/, $_);

  print ("line[3]\n");

  }

What I am seeking is a way of:

1) Taking the fields parsed from the original metadata_output.txt file and creating an entirely NEW file (e.g. new_file.txt) with this stripped-out data, to keep my original data untouched.

2) Is it possible, with a sub-query within the same script, to perform MySQL-like functions such as padding new_file.txt's records to conform to a standard pattern?

e.g. my new file would have the following example records for its sole ratio field:

12/12/12
3:3
0:0/1
33:33:34

Is it then possible to have it do something like an RTrim, but have it recognize that it should take only the first one or two digits and stop at the first non-digit character?

Of course, all of this would overwrite new_file.txt, which I would then like to append as a new field in the original metadata_output.txt.

Comments / direction would be appreciated.

Nick
Nov 3 '08 #1
5 Replies


KevinADC
Expert 2.5K+
P: 4,059
Assuming all lines are in the correct format:

  #!/usr/home/bin/perl
  use strict;
  use warnings;

  open(META, '<\usr\home\project\dump\metadata_output.txt') or die "$!";
  while(<META>){
     chomp;
     print +(split(/,/))[3],"\n";
  }
  close META;

Note how I changed the quotes around the file name to single quotes. Your code is probably meant as an example rather than real code, but if you had run it, it would have produced several errors and warnings, because the backslash is Perl's escape character. In a double-quoted string Perl reads \u, \h, \p, \d and \m as escape sequences instead of directory separators: \u uppercases the character that follows, and the others are unrecognized escapes that lose their backslash and trigger a warning under "use warnings". The point is to use forward slashes in Perl code whenever possible, which avoids the problem entirely. Even in a single-quoted string the backslash is still an escape character, though only in front of another backslash or a single quote, so it can cause problems if you are not aware of that. Single-quoted strings have no other escape sequences, but you should still use forward slashes as a general habit.
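
To keep the original data untouched, the fourth field can be written to a separate file instead of the screen. The following is only a sketch under a few assumptions: the sub name extract_field and the sample records are made up for illustration, the three-argument open with lexical filehandles is a swapped-in alternative to the bareword META handle above, and the scratch directory stands in for a real forward-slash path such as /usr/home/project/dump.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: copy one comma-separated field from each line of
# $in_path into $out_path. The input file is only read, never modified.
sub extract_field {
    my ($in_path, $out_path, $index) = @_;

    open(my $in,  '<', $in_path)  or die "Cannot read $in_path: $!";
    open(my $out, '>', $out_path) or die "Cannot write $out_path: $!";

    my $count = 0;
    while (my $line = <$in>) {
        chomp $line;
        my @fields = split /,/, $line;
        next unless defined $fields[$index];   # skip short or blank lines
        print {$out} "$fields[$index]\n";
        $count++;
    }

    close $in;
    close $out or die "Cannot close $out_path: $!";
    return $count;                             # number of lines written
}

# Demo with made-up records in a scratch directory. Note the forward
# slashes in the paths.
my $dir      = tempdir(CLEANUP => 1);
my $in_path  = "$dir/metadata_output.txt";
my $out_path = "$dir/new_file.txt";

open(my $fh, '>', $in_path) or die "Cannot write $in_path: $!";
print {$fh} "111,Smith,Ann,12/12/12\n222,Jones,Bob,3:3\n";
close $fh;

my $written = extract_field($in_path, $out_path, 3);
print "Wrote $written lines to $out_path\n";
```

Opening the output with '>' creates the file or overwrites an existing one; '>>' would append instead.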
Nov 3 '08 #2

KevinADC
Expert 2.5K+
P: 4,059
Another note: don't put parentheses directly after print:

  print ("line[3]\n");

print is a list operator, and when an opening parenthesis follows it directly, Perl treats the parentheses as enclosing all of print's arguments, so anything after the closing parenthesis is not printed. Eventually that will cause problems in your Perl code. I used parentheses in mine, but only because I was printing a list slice returned by split(), where the leading "(" is necessary:

  (split(/,/))[3]

Note the '+' symbol added in front, though. That is the proper way to print something when you must use parentheses just after print; otherwise, don't use parentheses.
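
A small sketch of the print variations discussed here, using a made-up record in the thread's four-field layout. The helper name fourth_field is hypothetical. It also shows that an array element interpolates into a string only when written with its "$" sigil, so "line[3]" without it is printed literally.

```perl
use strict;
use warnings;

# Made-up record in the thread's four-field layout.
my $record = '111,Smith,Ann,44:45:44';
my @line   = split /,/, $record;

# An array element interpolates only with its "$" sigil.
print "$line[3]\n";

# The leading "+" keeps Perl from reading "(" as the start of print's
# argument list, so the slice and the newline are both printed.
print +(split /,/, $record)[3], "\n";

# Hypothetical helper that sidesteps the issue by doing the list slice
# outside of print.
sub fourth_field {
    my ($text) = @_;
    return (split /,/, $text)[3];
}

print fourth_field($record), "\n";
```

All three print statements output 44:45:44.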
Nov 3 '08 #3

P: 3
KevinADC:

Thanks for the prompt response and your patience!

Now that you have shown me the correct way to extract that column into my new text file, how would one go about parsing the integers out of the results?

i.e. if I had

44:45:44
12/11/11
9:9:0

and I wanted the first number parsed out of each line.

I was looking to see if a MySQL query would work with my text output, but the closest thing I can find is RTrim(), and I am stumped on how to specify that I need the first digits (which can be either one or two digits) and want to ignore anything to the right of the first / or : (those symbols are to be ignored as well).

Did I pick the right method by using Perl, or would it be easier/more feasible to do in MySQL?

Regards,

Nick
Nov 4 '08 #4

nithinpes
Expert 100+
P: 410
You can make use of regex to extract the first number:
  my $res=(split(/,/))[3] ;
  print "$1\n" if($res=~/^(\d+)/);

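
The same idea can be wrapped in a small sub and tried against the sample values from the question. This is just a sketch: the name first_number is made up, and the pattern uses \d{1,2} instead of \d+ on the assumption that at most the first two digits are wanted, as the original question describes.

```perl
use strict;
use warnings;

# Hypothetical helper: return the leading one or two digits of a ratio
# string, or undef when the string does not start with a digit.
sub first_number {
    my ($ratio) = @_;
    return $ratio =~ /^(\d{1,2})/ ? $1 : undef;
}

for my $ratio ('44:45:44', '12/11/11', '9:9:0') {
    my $first = first_number($ratio);
    print defined $first ? "$first\n" : "no leading number\n";
}
```

This prints 44, 12 and 9, one per line. With \d+ a value such as 123:4 would yield 123; with \d{1,2} it yields 12.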
Nov 4 '08 #5

KevinADC
Expert 2.5K+
P: 4,059
As far as I know, MySQL or another SQL database can do those things, but Perl can certainly do them easily enough. I humbly suggest you ask MySQL or general SQL questions on an SQL forum, though. I don't even know what RTrim() does myself.

nithinpes has posted a way to use a regexp to get the digits that precede a / or : in a string or line.
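
For the last part of the original question, adding the extracted number as a new field on each record, one possible sketch follows. The sub name append_first_number and the sample records are made up, and the optional \s* assumes a field may carry a space after the comma, as in the field list in the first post. Rewriting metadata_output.txt itself is not shown here; writing the results to a new file first is probably the safer route.

```perl
use strict;
use warnings;

# Hypothetical helper: given one comma-separated record, return the same
# record with the leading number of its fourth field added as a fifth
# field. The new field is left empty when no leading number is found.
sub append_first_number {
    my ($record) = @_;
    my $ratio = (split /,/, $record)[3];
    my $first = '';
    if (defined $ratio && $ratio =~ /^\s*(\d+)/) {
        $first = $1;
    }
    return "$record,$first";
}

# Made-up records; real input would come from the file, line by line.
for my $record ('111,Smith,Ann,44:45:44', '222,Jones,Bob,9:9:0') {
    print append_first_number($record), "\n";
}
```

This prints 111,Smith,Ann,44:45:44,44 and 222,Jones,Bob,9:9:0,9.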
Nov 4 '08 #6
