Perl syntax for sed searches

P: 7
I am aware that Perl has a lot of features that originally came from sed and awk. I have a pattern that I am using like this:

sed -n '/|Y|/p'

I want to do the same thing in Perl and be able to either save that value in some kind of variable or array or potentially write it out to a file.

For the simple case, writing out to a file, I think the syntax is very close to the sed syntax. I would like to get a few recommendations, first, on a few alternative ways to write a similar expression in Perl, then how to do I/O properly.
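
A minimal sketch of the Perl equivalents, not a definitive answer. On the command line, `perl -ne 'print if /\|Y\|/' file` behaves like the sed command above. Inside a script, `grep` can collect the matching lines into an array. The helper name `matching_lines`, the sample records, and the file name `matches.txt` are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines that contain the literal text |Y|.
# The pipes are escaped because | means alternation in a Perl regex.
sub matching_lines {
    my @lines = @_;
    return grep { /\|Y\|/ } @lines;
}

# Usage sketch with made-up in-memory records.
my @records = ("1|Y|keep\n", "2|N|drop\n", "3|Y|keep\n");
my @kept    = matching_lines(@records);
print @kept;

# To write the matches to a file instead (the file name is an assumption):
# open(my $out, '>', 'matches.txt') or die "Cannot open matches.txt: $!";
# print {$out} @kept;
# close($out);
```

Keeping the matches in an array makes it easy to reuse them later in the same script without rereading the file.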

My second question is that I have two files, both with pipe separated data. In the first file, I want to do a large data reduction first, taking the pattern above, and retaining only records containing |Y|. In the second file, I have a field containing an employee number with an A as the first digit. The other larger file contains this same data, but with a lower case a.

The complete exercise, then, is to first reduce the first file to records containing Y in the field, surrounded by the pipe symbol. Next, I need to compare the records that match in the second file to the employee ID field, after making sure it is lower cased in both files. I will need to print out two, possibly three fields within records that match, perhaps fields 1, 3, and 4.
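
The two building blocks in that description, lower-casing the ID and printing selected fields, might look like the sketch below. The helper names `pick_fields` and `normalize_id` are hypothetical, and the `//` operator assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: split a pipe-delimited record and return the
# requested 1-based fields, joined with pipes again.
sub pick_fields {
    my ($line, @wanted) = @_;
    chomp $line;
    my @fields = split /\|/, $line, -1;    # -1 keeps trailing empty fields
    return join '|', map { $fields[$_ - 1] // '' } @wanted;
}

# Hypothetical helper: normalize an employee ID so that A000822 and
# a000822 compare equal.
sub normalize_id {
    my ($id) = @_;
    return lc $id;
}

my $record = "1460151|Duke,John|6021|a000822|SKYWRITER\n";
print pick_fields($record, 1, 3, 4), "\n";    # prints 1460151|6021|a000822
print "same ID\n" if normalize_id('A000822') eq normalize_id('a000822');
```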

Can anyone give me a few key technology snippets on this so I don't keep struggling with it, and I will then apply that technology to my modest sized application, which I am writing in Perl - for speed and portability. I have used Perl before, but I have never become an expert, and it has been years since I used it last. I am confusing myself with pieces of different syntax and making a lot of silly mistakes, therefore I would appreciate some sound advice to set me back on course. I am reading up using a few classics - the Perl Cookbook and Programming Perl, but both are large books and daunting to get through. Until I can digest them, I'd appreciate some pointers to accelerate my learning, and more importantly, get a script in at least a minimally usable form ASAP. Therefore, I appreciate specific tips. I'll get better at it once I have digested the classic resources and actually done more coding to regain the experience.
Aug 4 '08 #1
10 Replies


eWish
Expert 100+
P: 971
Is this school/class work? What have you tried so far?

perldoc.perl.org is the best place to start. They have a wonderful search feature. If you have further problems once you have written your code, then post it here and we will be glad to assist you. Please use the code tags when posting code samples.

--Kevin
Aug 4 '08 #2

KevinADC
Expert 2.5K+
P: 4,059
Ask over at www.unix.com in the Shell Programming and Scripting forum.
Aug 4 '08 #3

P: 7
No, this is not school work, it is for a task on a job. I am a project manager now so my coding skills have become stale.

I'll see if I can find anything helpful at perldoc.perl.org, but I have been hunting around since Friday without much success, plus I have been trying to digest stuff from the Perl Cookbook and Programming Perl, and I have been confusing myself.
Aug 4 '08 #4

P: 7
I asked over at unix.com on Friday and have not had any responses at all.
Aug 4 '08 #5

P: 7
I put the following code in my script and I can now reduce the search on the first file to records containing |Y|.

I still need to compare this reduced file to a second file, taking the second field in the first file and comparing it to the fourth field in the second file, then output matching records into another file using the first, second, and fourth fields from the second file.

Any suggestions on how to do this?

Here is what I came up with to locate the |Y| in the first file:
while (<MINPUT>) {
   if ( /\|Y\|/ ) {
      print MOUTPUT;
      my $line = <SINPUT>;
   }
}
Aug 4 '08 #6

KevinADC
Expert 2.5K+
P: 4,059
Your code does not make sense. You are looping through <MINPUT>, but you are assigning a value to $line from <SINPUT>, and since it is scoped only to that very small block of code it can't be used for anything else.

Also, if the pattern |Y| can appear in another part of the line you might get false matches. Maybe you should split the input line and check the specific column for a 'Y' instead of pattern matching the entire line for a substring.
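
A rough sketch of that column check follows. The helper name `field_is_y` is hypothetical, and column 3 is only an assumption, since the thread does not say which column holds the flag. The sample records are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the given 1-based field of a
# pipe-delimited record is exactly 'Y'.
sub field_is_y {
    my ($line, $field_no) = @_;
    chomp $line;
    my @fields = split /\|/, $line, -1;    # -1 keeps trailing empty fields
    my $value  = $fields[$field_no - 1];
    return defined $value && $value eq 'Y';
}

# The second record has |Y| in column 4, so a whole-line match on /\|Y\|/
# would accept it, while the column check on field 3 rejects it.
my @records = ("A000822|a000822|Y|x\n", "A000900|a000900|N|Y|x\n");
for my $record (@records) {
    print $record if field_is_y($record, 3);
}
```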

Post some sample data you are working with, explain it if necessary.
Aug 4 '08 #7

P: 7
I should take the SINPUT line out of there; it does nothing, but the rest of that loop actually works and it identifies lines containing |Y|, which is a unique pattern. I output the lines containing this match to a file, and I now want to compare that file to another file which has stuff that looks like this:

1460151|Duke,John|6021|a000822|SKYWRITER

It is the a000822 that I want to match in this second file with identical entries in the second field of the first file.

So what I would want to do is open the first file, read a record and set a variable to the second pipe delimited field, then read a record in the second file, examine the fourth field and look for an entry that matches the second field in that first file.

The first file looks like:

A000822|a000822|<lots of other fields that I do not care about - 34 in all>
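
One possible way to do that comparison, offered as a sketch under stated assumptions: instead of rereading the second file for every record of the first, index the first file's second field in a hash and make a single pass over the second file. This swaps the read-and-scan approach described above for a hash lookup. The helper name `match_records` is hypothetical, and every sample record other than the Duke,John line is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: given array references holding the lines of the first
# and second files, return fields 1, 2 and 4 of every second-file record
# whose fourth field matches the second field of some first-file record.
sub match_records {
    my ($first, $second) = @_;

    # Index the first file by its second field, lower-cased.
    my %seen;
    for my $line (@$first) {
        chomp(my $copy = $line);
        my @fields = split /\|/, $copy;
        $seen{ lc $fields[1] } = 1 if defined $fields[1];
    }

    # One pass over the second file, looking up its fourth field.
    my @out;
    for my $line (@$second) {
        chomp(my $copy = $line);
        my @fields = split /\|/, $copy;
        next unless defined $fields[3] && $seen{ lc $fields[3] };
        push @out, join('|', @fields[0, 1, 3]);
    }
    return @out;
}

my @first  = ("A000822|a000822|Y|other fields\n");
my @second = ("1460151|Duke,John|6021|a000822|SKYWRITER\n",
              "1460152|Doe,Jane|6022|a000999|SKYWRITER\n");
print "$_\n" for match_records(\@first, \@second);    # prints 1460151|Duke,John|a000822
```

The hash keeps the work close to one pass per file, which matters if either file is large.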
Aug 4 '08 #8

P: 7
The sed replacement code now looks like this:

while (<MINPUT>) {
   if ( /\|Y\|/ ) {
      print MOUTPUT;
   }
}
Aug 4 '08 #9

KevinADC
Expert 2.5K+
P: 4,059
How big are the files?
Aug 4 '08 #10

P: 7
CLOSED: Program written, thank you very much!
Here is the final version (edited) of what I came up with:

#!/usr/bin/perl

# File: MatchID.pl
# Author: Brian Masinick
# Initial Creation Date: August 1, 2008

# Inputs:
# M input feed file
# S input feed file

# Intermediate outputs:
# M output feed matching |Y|

# Final outputs:
# S output file matching records in M output file with those in S input file.

# Required software: Perl 5.10

use strict;
use warnings;

# Three-argument open keeps the mode separate from the file name;
# $! reports why an open failed.
open (MINPUT,  "<", "M_input_file")  || die("Could not open M_input_file: $!");
open (MOUTPUT, ">", "M_output_file") || die("Could not open M_output_file: $!");
open (SOUTPUT, ">", "S_output_file") || die("Could not open S_output_file: $!");

# This produces the intermediate file, so we have a written record of which set
# of records we are actually processing to produce the final result.

while (<MINPUT>) {
   if ( /\|Y\|/ ) {
      print MOUTPUT;
   }
}

# close takes only the filehandle, not the file name
close (MINPUT);
close (MOUTPUT);

print "M registered users recorded.\n";

open (MOUTPUT, "<", "M_output_file") || die("Could not open M_output_file: $!");

my %moutput;
while (<MOUTPUT>) {
        # split the input line into the @fields array
        my @fields = split /\|/;
        # store the line in the %moutput hash indexed by the second field
        $moutput{$fields[1]} = $_;
}

open (SINPUT, "<", "S_input_file") || die("Could not open S_input_file: $!");

print "Reading S users file.\n";

while (<SINPUT>) {
        # split the input line into the @fields array
        my @fields = split /\|/;
        # if the fourth field exists in the %moutput hash, print the current
        # input line
        if (defined $fields[3] && exists $moutput{$fields[3]}) { print SOUTPUT $_; }
}

print "Processing complete.\n";

close (MOUTPUT);
close (SINPUT);
close (SOUTPUT);

print "Files closed. Program complete.\n";
Aug 5 '08 #11