
Text File Parsing - List Unique Column Values

I am new to the Perl language and would really appreciate it if some of you could show me how to use a regex. For example, this is my log file:

000046571|1000025|CUSTOMER|27-JUN-2007 06:27:59|005|DEFAULT
000046572|1000026|ACTIVATE|16-JUL-2007 12:33:13|013|DEFAULT
000046572|1000026|ACTIVATE|16-JUL-2007 12:33:13|018|MENU

I want to take only the 6th field (DEFAULT) by using the following perl script:

cut -d \| -f6 <filename> | sort | uniq

so that I get only the unique values (DEFAULT, MENU, etc.) from the log file.

When I run the above command, I also get duplicate values, which should not happen. I guess there is a space at the end of every row, so I am not able to get unique values.
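To make the trailing-space guess concrete, here is a minimal Perl sketch. It assumes (hypothetically) that some rows end in a space, or in a carriage return left over from Windows-style line endings; either one makes `DEFAULT` and `DEFAULT ` (or `MENU` and `MENU\r`) different strings, so `sort | uniq` keeps both. The sub name `unique_sixth_field` and the modified sample rows are made up for illustration.

```perl
use strict;
use warnings;

# Sample rows from the post. The trailing space on row 2 and the "\r"
# on row 3 are HYPOTHETICAL, added to imitate the suspected problem.
my @lines = (
    "000046571|1000025|CUSTOMER|27-JUN-2007 06:27:59|005|DEFAULT\n",
    "000046572|1000026|ACTIVATE|16-JUL-2007 12:33:13|013|DEFAULT \n",
    "000046572|1000026|ACTIVATE|16-JUL-2007 12:33:13|018|MENU\r\n",
);

# Return the sorted unique values of the 6th field.
# If $trim is true, trailing whitespace (spaces, tabs, "\r") is stripped first.
sub unique_sixth_field {
    my ($trim, @rows) = @_;
    my %seen;
    for my $row (@rows) {
        my $copy = $row;
        chomp $copy;                    # removes only the trailing "\n"
        $copy =~ s/\s+\z// if $trim;    # also removes " " and "\r"
        my $field = (split /\|/, $copy)[5];
        $seen{$field} = 1 if defined $field;
    }
    return sort keys %seen;
}

my @raw     = unique_sixth_field(0, @lines);
my @cleaned = unique_sixth_field(1, @lines);
printf "without trimming: %d values\n", scalar @raw;      # 3: "DEFAULT", "DEFAULT ", "MENU\r"
printf "with trimming:    %d values\n", scalar @cleaned;  # 2: "DEFAULT", "MENU"
```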

I would really appreciate it if any of you could help me with this.
Aug 13 '07 #1

A couple of things:
- That works fine here (on Solaris 8)! What OS are you on?
- I assume that you're seeing the following output:
Is that right? If not, what output are you seeing?
(I see:

- Also, this has nothing to do with Perl. I'm new here too, so I'm not sure how strict people are, but you may get better help in the Linux / Unix / BSD forum.


Aug 13 '07 #2

Expert Mod 2.5K+
Hello! Just as a note, when you post to the forum, be sure to post the code you have tried so far. That way, we can help you work out any errors or issues you are experiencing.

As for your issue, I am in a code-writing mood today and have whipped up something quick.

You mentioned using a regular expression to pull out the 6th field. Sure, you could do that, but that's kind of like using a hammer to open the door by breaking the window when you have the key in your pocket.

Instead, since your fields are "|" (pipe) delimited, why not read the file line by line in a while loop, split each line, and pull out the 6th field (element [5] of the array)? To me, this is much easier. Save regexes for when you really need them. Of course, don't let me stop you from trying that route; regexes are well worth learning if you haven't already.

Here is the code:

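```perl
use strict;
use warnings;

# Open the log file for reading; stop with a message if that fails.
open(my $fh, '<', 'Text1.txt') or die "Cannot open Text1.txt: $!";

while (my $line = <$fh>)
{
    chomp($line);                     # remove the trailing newline
    my @fields = split(/\|/, $line);  # "|" must be escaped in the pattern
    print("$fields[5]\n");            # element [5] is the 6th field
}

close($fh);
```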
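Since the original question asked about regular expressions, here is a minimal sketch that puts a regex next to `split` for the same job. The sub names `sixth_by_split` and `sixth_by_regex` are hypothetical, and the pattern assumes no field ever contains a quoted or escaped pipe.

```perl
use strict;
use warnings;

# Split on literal pipes and take element [5], the 6th field.
sub sixth_by_split {
    my ($line) = @_;
    return (split /\|/, $line)[5];
}

# Skip five "anything but a pipe, then a pipe" groups, then capture
# everything up to the next pipe (or the end of the line).
sub sixth_by_regex {
    my ($line) = @_;
    return $line =~ /^(?:[^|]*\|){5}([^|]*)/ ? $1 : undef;
}

my $line = '000046572|1000026|ACTIVATE|16-JUL-2007 12:33:13|018|MENU';
print sixth_by_split($line), "\n";   # MENU
print sixth_by_regex($line), "\n";   # MENU
```

Both give the same answer here; the `split` version is shorter and easier to read, which is the point made above.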

Also, your line of code:

```sh
cut -d \| -f6 <filename> | sort | uniq
```
is a line from a shell script, not Perl. Sure, you could put it inside backticks or a system() call, but if you are going to code in Perl, then do so.
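If you do stay in Perl, a rough equivalent of the whole pipeline (not just the `cut` part) might look like the sketch below. It is an approximation under stated assumptions: Perl's default `sort` compares strings character by character, so it can order some values differently from a locale-aware `sort`; unlike `cut`, it skips lines with fewer than six fields; and it reads input only when file names are given on the command line. The sub name `sorted_unique_field6` is hypothetical.

```perl
use strict;
use warnings;

# Rough Perl equivalent of:  cut -d '|' -f6 FILE | sort | uniq
# Returns the distinct 6th-field values in sorted order.
sub sorted_unique_field6 {
    my (@lines) = @_;                      # copy, so chomp never touches the caller's data
    my %count;
    for my $line (@lines) {
        chomp $line;
        my @fields = split /\|/, $line;
        next unless defined $fields[5];    # skip lines with fewer than 6 fields
        $count{ $fields[5] }++;
    }
    return sort keys %count;
}

# Read every file named on the command line, e.g.  perl field6.pl logfile.txt
if (@ARGV) {
    print "$_\n" for sorted_unique_field6(<>);
}
```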


Aug 13 '07 #3

Expert 100+
As has already been stated, this is not truly a Perl issue. However, it can be solved fairly easily with Perl.

The following code prints out only the unique values for the 6th column in the DATA file handle:

```perl
use strict;
use warnings;

my %seen = ();
while (<DATA>) {
    chomp;
    my @columns = split /\|/;    # split $_ on literal pipes
    # $seen{...}++ returns the old count, so each value prints only the first time
    print "$columns[5]\n" if ! $seen{$columns[5]}++;
}

__DATA__
000046571|1000025|CUSTOMER|27-JUN-2007 06:27:59|005|DEFAULT
000046572|1000026|ACTIVATE|16-JUL-2007 12:33:13|013|DEFAULT
000046572|1000026|ACTIVATE|16-JUL-2007 12:33:13|018|MENU
```
This technique is documented here:

perlfaq4 (Data Manipulation): "How can I remove duplicate elements from a list or array?"
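The same `%seen` trick also works on a list you already have in memory, not just on lines read from a file. A minimal sketch, with a hypothetical sub name `uniq_in_order`, that keeps the first occurrence of each value and preserves the original order:

```perl
use strict;
use warnings;

# Keep each element the first time it is seen; later duplicates are dropped.
sub uniq_in_order {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

my @column = ('DEFAULT', 'DEFAULT', 'MENU', 'DEFAULT');   # sample 6th-field values
print join(', ', uniq_in_order(@column)), "\n";           # DEFAULT, MENU
```

Newer versions of List::Util (1.45 and later) also export a `uniq` function that does the same job.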

- Miller
Aug 14 '07 #4
