extract repeating text segments

P: 3
I have an archive file (PIDATA) that contains multiple (>30) segments of text like this:

    Archive[0]: d:\archives\piarch.012  (500MB, Used: 9.0%)
            PIarcfilehead[$Workfile: piarfile.cxx $ $Revision: 114 $]::
              Version: 5 Path: d:\archives\piarch.012
              State: 4 Type: 0 (fixed) Write Flag: 1 Shift Flag: 1
              Record Size: 1024 Count: 512000  Add Rate/Hour: 4118.3
              Offsets: Primary: 25853/128000 Overflow: 491596/512000
                   Start Time: 1-Apr-08 22:02:38
                     End Time: Current Time
                  Backup Time: 2-Apr-08 02:01:07

The program repeats this over and over, naming each segment "Archive[1,2,3..]". I need to extract the bold sections (the Archive, Start Time, and End Time lines) and print them on one line. For example, I'd like THIS:

Archive[0]: d:\archives\piarch.012 (500MB, Used: 9.0%) ..... Start Time: 1-Apr-08 22:02:38 ..... End Time: Current Time

ALL on one line.

Here's my PERL script, but it doesn't seem to work.

    #!/usr/bin/perl
    while(<PIDATA>){

            if (m/Archive.[\d+].*/) {
                    $m1 =~ "$MATCH";
    }
            if (m/Start\sTime.*/) {
                    $m2 =~ "$MATCH";
    }
            if (m/End\sTime.*/) {
                    $m3 =~ "$MATCH";
    }
    print "$m1 \s $m2 \s $m3\n\n";

    }

I tried to loop over the text file, and redirect the output, but the file is empty after running this.

HELP!
Apr 3 '08 #1
5 Replies


nithinpes
Expert 100+
P: 410
The square brackets inside the pattern need to be escaped; otherwise they are treated as a character class. Also, what is the $MATCH that you are trying to match against after matching the desired pattern?
From your description, it sounds like all you need is to print those emphasised lines. Try this:

    while(<PIDATA>){
         print $_ if((/^\s*Archive\[\d+\].*/)||(/^\s*Start\s+Time.*/)||(/^\s*End\s+Time.*/)) ;
    }

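To make the two points above concrete, here is a small sketch (not from the original posts; the helper names are hypothetical). It shows that the unescaped pattern from the question happens to match the header line but also matches unrelated text, and that a match should be stored with `=` (assignment), since `=~` only binds a string to a pattern. Note that `$MATCH` is only defined under `use English`, where it is an alias for `$&`.

```perl
use strict;
use warnings;

# Hypothetical helper names, for illustration only.

# Loose pattern from the question: "." matches any character and
# [\d+] is a character class (one digit or a literal "+").
sub matches_loose  { return $_[0] =~ /Archive.[\d+].*/ ? 1 : 0 }

# Escaped pattern: literal "[", one or more digits, literal "]".
sub matches_strict { return $_[0] =~ /^\s*Archive\[\d+\]/ ? 1 : 0 }

# Keep the matched text by capturing it and assigning with "=".
sub matched_text {
    my ($line) = @_;
    return $line =~ /(Archive\[\d+\].*)/ ? $1 : undef;
}

# Compare both patterns on a real header and on a look-alike string.
for my $s ('Archive[0]: d:\archives\piarch.012', 'ArchiveX7 is not a header') {
    printf "%-40s loose=%d strict=%d\n", $s, matches_loose($s), matches_strict($s);
}
```

The capture group plus `$1` avoids the need for `$&` / `$MATCH` altogether.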
Apr 3 '08 #2

P: 3
I've removed my code and placed yours into the file piarchive.pl.... NO errors, but when I redirect output, the file is empty. Something's wrong...
I run this:

perl piarchive.pl > output

And I get the file "output" but it's empty.
Apr 3 '08 #3

nithinpes
Expert 100+
P: 410
For the given sample data, I got the desired output.

    open(PIDATA,"data.txt") or die "open failed:$!";
    while(<PIDATA>){
     print $_ if((/^\s*Archive\[\d+\].*/)||(/^\s*Start\s+Time.*/)||(/^\s*End\s+Time.*/)) ;
    }

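A likely reason the earlier snippet produced an empty file is that PIDATA was never opened there: reading from an unopened filehandle simply returns nothing, and Perl only complains about it when warnings are enabled. The sketch below is one way to make that failure visible, assuming the dump was saved as data.txt as in the code above; `wanted_lines` is a hypothetical helper name.

```perl
use strict;
use warnings;   # reports reads from unopened filehandles instead of staying silent

# Hypothetical helper: returns the Archive / Start Time / End Time lines of a file.
sub wanted_lines {
    my ($path) = @_;
    # Three-argument open with a lexical handle; die with the OS error on failure.
    open(my $fh, '<', $path) or die "open failed for $path: $!";
    my @hits = grep { /^\s*(?:Archive\[\d+\]|Start\s+Time|End\s+Time)/ } <$fh>;
    close($fh);
    return @hits;
}

# Usage (uncomment once data.txt exists in the current directory):
# print wanted_lines('data.txt');
```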
On the command line, I ran:

    perl archive.pl > C:\\output.txt

The file output.txt contained:

    Archive[0]: d:\archives\piarch.012  (500MB, Used: 9.0%)
                   Start Time: 1-Apr-08 22:02:38
                     End Time: Current Time

If you want this on one line, change

    while(<PIDATA>){
     print $_ if((/^\s*Archive\[\d+\].*/)||(/^\s*Start\s+Time.*/)||(/^\s*End\s+Time.*/)) ;
    }

to:

    while(<PIDATA>){
     chomp;
     print "$_ ..." if((/^\s*Archive\[\d+\].*/)||(/^\s*Start\s+Time.*/)||(/^\s*End\s+Time.*/)) ;
    }

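With more than one archive in the file, the loop above never prints a newline, so every segment ends up on the same line. A minimal sketch of one way to get one line per archive follows; it assumes each segment starts with an "Archive[n]" line and that only the Start Time and End Time lines are wanted. `summarize_archives` and the " ..... " separator are illustrative choices, not something from the original posts.

```perl
use strict;
use warnings;

# Hypothetical helper: turns the lines of a dump into one summary string per archive.
sub summarize_archives {
    my (@lines) = @_;
    my (@summaries, @parts);
    for my $line (@lines) {
        (my $text = $line) =~ s/^\s+|\s+$//g;   # trim indentation and the newline
        $text =~ s/\s+/ /g;                     # collapse runs of whitespace
        if ($text =~ /^Archive\[\d+\]/) {
            # A new segment starts: flush the previous one, if any.
            push @summaries, join(' ..... ', @parts) if @parts;
            @parts = ($text);
        }
        elsif (@parts && $text =~ /^(?:Start|End)\s+Time:/) {
            push @parts, $text;
        }
    }
    push @summaries, join(' ..... ', @parts) if @parts;   # flush the last segment
    return @summaries;
}

# Only read input when a file name is given, e.g.: perl piarchive.pl data.txt > output
if (@ARGV) {
    print "$_\n" for summarize_archives(<>);
}
```

Collecting the parts first and joining them at the next "Archive[n]" line keeps the separator out of the end of each line and puts the newline in one place.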
Apr 4 '08 #4

nithinpes
Expert 100+
P: 410
Alternatively, you can write to the output file from within the script:

    open(PIDATA, "<", "data.txt") or die "open failed: $!";
    # ">" opens output.txt for writing (created or truncated); without it the file is opened read-only
    open(OUT, ">", "output.txt") or die "create failed: $!";
    while(<PIDATA>){
     chomp;
     print OUT "$_ ..." if((/^\s*Archive\[\d+\].*/)||(/^\s*Start\s+Time.*/)||(/^\s*End\s+Time.*/)) ;
    }
    close(OUT) or die "close failed: $!";
    close(PIDATA);

Apr 4 '08 #5

P: 3
Thanks for the help!
Apr 7 '08 #6
