
how to match a paragraph by regexp?

Question posted by: poolboi (Familiar Sight) on May 13th, 2008 06:13 AM
Hey guys,

Another regexp problem, which I'm probably not good at. Here goes.
My text file has this:


Code: ( text )
  /* 1 TB VTP-1 SESSION=01547 USERID STARTED 2008-05-13 09:46:11 */
  I AM A GIRL
  /* 3 SESSION=01547 USERID 2008-05-13 09:46:12 */
  /* 4c COMMAND EXECUTED */
  /* 4c SESSION=01547 USERID=SASTST 2008-05-13 09:46:12 */
  /* 2 TB HLRi VTP-1 SESSION=01547 USERID ENDED 2008-05-13 09:47:12 */
  /* 1 TB VTP-1 SESSION=01547 USERID STARTED 2008-05-13 09:46:11 */
  I AM A BOY
  /* 3 SESSION=01547 USERID 2008-05-13 09:46:12 */
  /* 4c COMMAND EXECUTED */
  /* 2 TB HLRi VTP-1 SESSION=01547 USERID ENDED 2008-05-13 09:47:12 */


As you can see, it's like 2 paragraphs. Each paragraph starts with "1 TB" and ends with "2 TB". How can I match one whole paragraph?

Previously I was using this to match, but it only goes through one line at a time.
How can I make it match a paragraph?

Code: ( text )
  foreach my $data1 (@$data) {
      chomp($data1);

      # replace "word" with whatever you want to match
      if ($data1 =~ /1 TBHLR.+word/) {
          # ...
      }
  }
Last edited by eWish : May 13th, 2008 at 12:55 PM. Reason: Added code tags
Reply #2 posted by: nithinpes (Expert, 234 Posts) on May 13th, 2008 08:48 AM

Since these are not typical paragraphs, one way to do it is to process the array holding the file input into a second array in which each element is a whole paragraph rather than a single line.
Code: ( text )
  open(my $in, '<', 'data.txt') or die "failed: $!";
  my @file = <$in>;
  close($in);

  my (@data, $str);
  foreach (@file) {
      if (/1 TB/) { $str = $_; next; }   ## a line with '1 TB' starts a new paragraph
      unless (/2 TB/) {
          $str .= $_;                    ## append lines until the line with '2 TB' is reached
      }
      else {
          $str .= $_;
          push @data, $str;              ## on reaching '2 TB', push the concatenated string to the array
          undef $str;
      }
  }                                      ## closes the foreach (this brace was missing)

  # print "\n\n$_\n\n" foreach (@data);
  foreach my $data1 (@data) {
      ## $data1 is a whole paragraph now
  }
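
As an alternative to the line-by-line loop, a paragraph can also be captured with a single regexp if the whole file is held in one string. The following is only a minimal sketch: it assumes the file is small enough to slurp into memory, that every paragraph really does start with a "/* 1 TB" line and end with a "/* 2 TB" line, and it inlines a shortened copy of the sample data instead of reading data.txt. The names $text and @paragraphs are made up for the illustration.

```perl
use strict;
use warnings;

# Shortened sample data inlined for illustration; in practice the
# contents of data.txt would be read into $text instead.
my $text = <<'END';
/* 1 TB VTP-1 SESSION=01547 USERID STARTED 2008-05-13 09:46:11 */
I AM A GIRL
/* 2 TB HLRi VTP-1 SESSION=01547 USERID ENDED 2008-05-13 09:47:12 */
/* 1 TB VTP-1 SESSION=01547 USERID STARTED 2008-05-13 09:46:11 */
I AM A BOY
/* 2 TB HLRi VTP-1 SESSION=01547 USERID ENDED 2008-05-13 09:47:12 */
END

# /m lets ^ match at the start of every line, /s lets . match newlines,
# and .*? is non-greedy, so each match stops at the first '2 TB' line.
# With /g in list context, every captured paragraph is returned.
my @paragraphs =
    $text =~ m{(^[ \t]*/\* 1 TB.*?^[ \t]*/\* 2 TB[^\n]*\n?)}msg;

print scalar(@paragraphs), " paragraphs found\n";
print "--- paragraph ---\n$_" foreach @paragraphs;
```

Each element of @paragraphs can then be tested with further regexps, exactly like the elements of @data in the loop version.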

Reply #3 posted by: poolboi (Familiar Sight, 170 Posts) on May 14th, 2008 02:27 AM

Great, thanks!

Just some clarification:

The operator ".=" is an overloaded operator, right? Meaning it appends to your previous data?

And what's the use of putting undef $str?

Correct me if I'm wrong :)

Reply #4 posted by: poolboi (Familiar Sight, 170 Posts) on May 14th, 2008 02:41 AM

Hm... nithinpes,
sorry, I think there could be a problem.
Say I want just the paragraph from "1 TB" to "COMMAND EXECUTED".
I just changed the code to:

Code: ( text )
  foreach (@file) {
      if (/1 TB/) { $str = $_; next; }
      unless (/COMMAND EXECUTED/) {
          $str .= $_;
      } else {
          $str .= $_;
          push @data, $str;
          undef $str;
      }
  }

  print "$_" foreach (@data);


But it still prints everything as before.
It doesn't print just the paragraph from "1 TB" to "COMMAND EXECUTED".

Reply #5 posted by: poolboi (Familiar Sight, 170 Posts) on May 14th, 2008 02:47 AM

Sorry, sorry, I defined something wrongly in the earlier part of my script.
Alright, it's working now.
Thanks :)
You can go back to my previous question on "undef" and ".=".
Many thanks :)

Reply #6 posted by: nithinpes (Expert, 234 Posts) on May 14th, 2008 04:12 AM

The "." operator concatenates strings, and ".=" appends a string to the value a variable already holds. For example:
Code: ( text )
  $a = "Hello";
  $b = " poolboi";
  $c = $a . $b;
  print $c;  ## prints "Hello poolboi"

In this case,
Code: ( text )
  $str .= $_;
is equivalent to
Code: ( text )
  $str = $str . $_;


Regarding 'undef $str': this undefines $str (discarding the previously stored value) in order to avoid concatenating the next paragraph onto the previous one.
But this line is actually redundant in the above code, as the line:
Code: ( text )
  if (/1 TB/) { $str = $_; next; }

reassigns the value from scratch whenever a line with the '1 TB' pattern is seen.
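
The two points above can be tried out in isolation. This is only a small sketch with made-up strings; $str and @data mirror the names in the thread, everything else is illustrative.

```perl
use strict;
use warnings;

my $str = "line one\n";
$str .= "line two\n";     # same as: $str = $str . "line two\n";
print $str;

my @data;
push @data, $str;         # push stores a copy of the current string
undef $str;               # $str is undefined again; the copy in @data is untouched

print defined $str ? "str is defined\n" : "str is undefined\n";
print $data[0];
```

Because push stores a copy, clearing $str afterwards does not affect what is already in @data.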

Reply #7 posted by: poolboi (Familiar Sight, 170 Posts) on May 14th, 2008 06:47 AM

Hm... thanks for the explanation.

I just discovered a problem.
It's similar to the one above.

I've got this info in my text file now:

Code: ( text )
  /* 3 SESSION=01547 USERID=user 2008-05-13 09:46:12 */
  /* 4 ABC:DEFG=12334556,BSERV=T22; */
  /* 4c COMMAND EXECUTED */
  /* 1 TBHLR HLRi VTP-11 SESSION=01548 USERID=user STARTED 2008-05-


When I use this code:
Code: ( text )
  foreach my $data1 (@$data) {
      chomp($data1);
      if ($data1 =~ /4 \D\D\D:/ && $data1 !~ /3 SESSION/ && $data1 !~ /4c COMMAND/) {
          print "$data1\n";
      }
  }

it returns:

/* 4 ABC:DEFG=12334556,BSERV=T22; */

But when I use the paragraphing code...

Code: ( text )
  foreach (@file) {
      if (/3 SESSION/) { $str = $_; next; }
      unless (/4c COMMAND/) {
          $str .= $_;
      } else {
          $str .= $_;
          push @data, $str;
          undef $str;
      }
  }

  foreach $data1 (@data) {
      if ($data1 =~ /4c \D\D\D:/ && $data1 !~ /3 SESSION/ && $data1 !~ /4c COMMAND/) {
          print "$data1\n";
      }
  }

nothing is returned.

Is there a problem with the 2nd code? I thought it was supposed to return the same for both.

Reply #8 posted by: nithinpes (Expert, 234 Posts) on May 14th, 2008 08:53 AM

That's the expected behaviour. In your first code, you are matching against one line at a time. Hence, it returns the line

/* 4 ABC:DEFG=12334556,BSERV=T22; */

which satisfies all three of the conditions that you have put.
But in the second code, you are testing one paragraph at a time. So, for the first iteration, $data1 will be
/* 3 SESSION=01547 USERID=user 2008-05-13 09:46:12 */
/* 4 ABC:DEFG=12334556,BSERV=T22; */
/* 4c COMMAND EXECUTED */

This string contains both "3 SESSION" and "4c COMMAND", so the two negated conditions ($data1 !~ /3 SESSION/ and $data1 !~ /4c COMMAND/) are false and the combined test fails.
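
One way to get the single-line result back while still working paragraph by paragraph is to split each paragraph into its lines again and apply the per-line tests to those. The following is only a sketch under that assumption: $paragraph holds the sample paragraph from this thread and @hits is a made-up name. It uses the pattern /4 \D\D\D:/ from the first snippet; the second snippet tests for /4c \D\D\D:/ instead, which would not match the "4 ABC:" line in this sample either.

```perl
use strict;
use warnings;

# One paragraph as built by the paragraphing loop (sample data from the thread).
my $paragraph = join '',
    "/* 3 SESSION=01547 USERID=user 2008-05-13 09:46:12 */\n",
    " /* 4 ABC:DEFG=12334556,BSERV=T22; */\n",
    " /* 4c COMMAND EXECUTED */\n";

# Split the paragraph back into lines and keep only the lines that
# pass all three single-line tests.
my @hits = grep {
    /4 \D\D\D:/ && !/3 SESSION/ && !/4c COMMAND/
} split /\n/, $paragraph;

print "$_\n" foreach @hits;
```

The negated tests now apply to each line on its own, so a "3 SESSION" line elsewhere in the paragraph no longer blocks the match.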

Last edited by nithinpes : May 14th, 2008 at 08:55 AM. Reason: removed initial quote