how to match a paragraph by regexp?

100+
P: 170
Hey guys,

Another regexp problem, which I'm probably not good at. Here goes.
My text file has this:


/* 1 TB VTP-1 SESSION=01547 USERID STARTED 2008-05-13 09:46:11 */
I AM A GIRL
/* 3 SESSION=01547 USERID 2008-05-13 09:46:12 */
/* 4c COMMAND EXECUTED */
/* 4c SESSION=01547 USERID=SASTST 2008-05-13 09:46:12 */
/* 2 TB HLRi VTP-1 SESSION=01547 USERID ENDED 2008-05-13 09:47:12 */
/* 1 TB VTP-1 SESSION=01547 USERID STARTED 2008-05-13 09:46:11 */
I AM A BOY
/* 3 SESSION=01547 USERID 2008-05-13 09:46:12 */
/* 4c COMMAND EXECUTED */
/* 2 TB HLRi VTP-1 SESSION=01547 USERID ENDED 2008-05-13 09:47:12 */
As you can see, it's like 2 paragraphs. Each paragraph starts with "1 TB" and ends with "2 TB". How can I match one whole paragraph?

Previously I was using this to match, but it only goes through one line at a time.
How can I make it match a paragraph?

foreach $data1 (@$data)
{
    chomp($data1);

    if ($data1 =~ /1 TBHLR.+word/) {   ## I replace "word" with anything I want to match
        ## ... handle the matching line here ...
    }
}
May 13 '08 #1
7 Replies


nithinpes
Expert 100+
P: 410
Since these are not typical paragraphs, one way to do it is to process the array holding the file's lines into a new array in which each element is a whole paragraph rather than a single line.
open(IN, "data.txt") or die "failed: $!";
@file = <IN>;
close(IN);
foreach (@file) {
    if (/1 TB/) { $str = $_; next; }   ## a line with '1 TB' starts a new paragraph
    unless (/2 TB/) {
        $str .= $_;                    ## append lines until the line with '2 TB' is reached
    } else {
        $str .= $_;
        push @data, $str;              ## on reaching '2 TB', push the concatenated string to the array
        undef $str;
    }
}

#print "\n\n$_\n\n" foreach (@data);
foreach $data1 (@data)
{
    ##### $data1 will be a paragraph now
}
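If it helps, the same grouping can be wrapped in a small sub so that the start and end markers become parameters. This is only a sketch under a few assumptions: the name group_blocks and the inlined sample lines are made up for illustration, and unlike the loop above it simply skips any lines that fall outside a start/end pair.

```perl
use strict;
use warnings;

# Group lines into blocks that run from a line matching $start
# through the next line matching $end (both lines included).
# group_blocks is a hypothetical helper name, not part of any module.
sub group_blocks {
    my ($start, $end, @lines) = @_;
    my @blocks;
    my $current;                       # undef while we are outside a block
    for my $line (@lines) {
        if ($line =~ $start) {
            $current = $line;          # a start line begins a fresh block
        }
        elsif (defined $current) {
            $current .= $line;         # inside a block: keep appending
            if ($line =~ $end) {
                push @blocks, $current;
                undef $current;        # block finished, go back to "outside"
            }
        }
    }
    return @blocks;
}

# Sample input modelled on the log in the question (shortened).
my @lines = (
    "/* 1 TB VTP-1 SESSION=01547 USERID STARTED 2008-05-13 09:46:11 */\n",
    "I AM A GIRL\n",
    "/* 2 TB HLRi VTP-1 SESSION=01547 USERID ENDED 2008-05-13 09:47:12 */\n",
    "/* 1 TB VTP-1 SESSION=01547 USERID STARTED 2008-05-13 09:46:11 */\n",
    "I AM A BOY\n",
    "/* 2 TB HLRi VTP-1 SESSION=01547 USERID ENDED 2008-05-13 09:47:12 */\n",
);

my @blocks = group_blocks(qr/1 TB/, qr/2 TB/, @lines);
print scalar(@blocks), " paragraphs found\n";
print "---\n$_" for @blocks;
```

To stop at a different line, pass a different end pattern, for example qr/COMMAND EXECUTED/ instead of qr/2 TB/.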
May 13 '08 #2

100+
P: 170
Great, thanks!

Erm, just some clarification:

The operator ".=" is an overloaded operator, right? Meaning it appends to your previous data?

And what's the use of putting undef $str?

Correct me if I'm wrong.
:)
May 14 '08 #3

100+
P: 170
Hm... nithinpes,
sorry, I think there could be a problem.
Say I just want the paragraph from "1 TB" to "COMMAND EXECUTED".
I just changed it to:

foreach (@file) {
    if (/1 TB/) { $str = $_; next; }
    unless (/COMMAND EXECUTED/) {
        $str .= $_;
    } else {
        $str .= $_;
        push @data, $str;
        undef $str;
    }
}

print "$_" foreach (@data);
But it still prints everything as normal.
It doesn't print just the paragraph from "1 TB" to "COMMAND EXECUTED".
May 14 '08 #4

100+
P: 170
Sorry, sorry, I defined something wrongly in the earlier part of my script.
Alright, it's working.
Thanks :)
You can go back to my previous question on "undef" and ".=".
Many thanks
:)
May 14 '08 #5

nithinpes
Expert 100+
P: 410
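The "." operator concatenates strings, and ".=" is the built-in assignment form of it (not an overloaded operator): it appends the right-hand side to the variable on the left. For example, plain concatenation:
$a = "Hello";
$b = " poolboi";
$c = $a . $b;
print $c;  ## prints "Hello poolboi"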
In this case,
$str .= $_;
is equivalent to
$str = $str . $_;
Regarding 'undef $str': this undefines $str (discards the previously stored value) so that the lines of the next paragraph are not concatenated onto the previous paragraph's string.
But this line is actually redundant in the above code, as the line:
if (/1 TB/) { $str = $_; next; }
reassigns the value from scratch whenever a line with the '1 TB' pattern is seen.
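A quick way to convince yourself of both points is a throwaway script like the one below. It is just a sketch; the variable names and strings are made up for illustration.

```perl
use strict;
use warnings;

my $x = "Hello";
my $y = "Hello";

$x .= " world";          # compound form: append in place
$y = $y . " world";      # long form: concatenate, then assign back

print "same result\n" if $x eq $y;

# Appending to an undefined scalar with .= starts from the empty string;
# .= is one of the operators exempt from the "uninitialized" warning.
my $str;
$str .= "first line\n";
$str .= "second line\n";
print $str;

undef $str;              # $str holds no value again
print "str is now undefined\n" unless defined $str;
```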
May 14 '08 #6

100+
P: 170
Hm.. thanks for the explanation.

Hm.. I just discovered a problem.
It's similar to the one above.

I've got this info in my text file now:

/* 3 SESSION=01547 USERID=user 2008-05-13 09:46:12 */
/* 4 ABC:DEFG=12334556,BSERV=T22; */
/* 4c COMMAND EXECUTED */
/* 1 TBHLR HLRi VTP-11 SESSION=01548 USERID=user STARTED 2008-05-
When I use this code:
foreach $data1 (@$data)
{
    chomp($data1);
    if ($data1 =~ /4 \D\D\D:/ && $data1 !~ /3 SESSION/ && $data1 !~ /4c COMMAND/) {
        print "$data1\n";
    }
}
it returns:

/* 4 ABC:DEFG=12334556,BSERV=T22; */

But when I use the paragraphing code...

foreach (@file) {
    if (/3 SESSION/) { $str = $_; next; }
    unless (/4c COMMAND/) {
        $str .= $_;
    } else {
        $str .= $_;
        push @data, $str;
        undef $str;
    }
}

foreach $data1 (@data)
{
    if ($data1 =~ /4c \D\D\D:/ && $data1 !~ /3 SESSION/ && $data1 !~ /4c COMMAND/) {
        print "$data1\n";
    }
}
nothing is returned.

Is there a problem with the 2nd code? I thought both were supposed to return the same thing.
May 14 '08 #7

nithinpes
Expert 100+
P: 410
That's the expected behaviour. In your first code, you are matching against one line at a time. Hence, it returns the line

/* 4 ABC:DEFG=12334556,BSERV=T22; */

which satisfies all three matching conditions that you have put.
But in the second code, you are taking one paragraph at a time. So, for the first iteration, $data1 will be
/* 3 SESSION=01547 USERID=user 2008-05-13 09:46:12 */
/* 4 ABC:DEFG=12334556,BSERV=T22; */
/* 4c COMMAND EXECUTED */

which doesn't return true for all 3 match conditions: the paragraph contains both "3 SESSION" and "4c COMMAND", so the two !~ tests fail.
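If you still want that one line out of a paragraph, here is a rough sketch of two ways to get it, assuming the paragraph is a single string with newline-separated lines as built above. The sample text is inlined and the variable names are only for illustration.

```perl
use strict;
use warnings;

# One paragraph, as the grouping loop would build it.
my $para = join "",
    "/* 3 SESSION=01547 USERID=user 2008-05-13 09:46:12 */\n",
    "/* 4 ABC:DEFG=12334556,BSERV=T22; */\n",
    "/* 4c COMMAND EXECUTED */\n";

# Against the whole paragraph the negated tests fail, because the text
# does contain "3 SESSION" and "4c COMMAND".
my $whole = ($para =~ /4 \D\D\D:/ && $para !~ /3 SESSION/ && $para !~ /4c COMMAND/) ? 1 : 0;
print "whole-paragraph test: $whole\n";

# Option 1: split the paragraph back into lines and test each line.
my @hits = grep { /4 \D\D\D:/ && !/3 SESSION/ && !/4c COMMAND/ } split /\n/, $para;
print "$_\n" for @hits;

# Option 2: with /m, ^ and $ match at every line boundary, so a single
# regex can capture the matching line from inside the paragraph.
my ($line) = $para =~ /^(.*4 \D\D\D:.*)$/m;
print "$line\n" if defined $line;
```

Both options apply the pattern per line, which is what the first script was doing implicitly.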
May 14 '08 #8
