
Extracting data from raw HTML

Newbie
 
Join Date: Aug 2007
Posts: 3
#1: Aug 21 '07
Hi everybody,
I have generated the links and stored the web pages as text, and I need to extract some fields from that text file using pattern matching.

A portion of my text file is:

    <p><span>Contact Name:</span>Kent Busse</p>

    <p><span>Contact Title:</span>Owner</p>

Here is what I have tried so far, using a parser:

    use HTML::Parser;

    # $lines, TMPFILE, $worksheet, $row and $col are assumed to be set up earlier (not shown)
    use bytes;
    my $parser = HTML::Parser->new(text_h => [ sub { print TMPFILE shift }, "dtext" ]);
    no bytes;
    $parser->parse($lines);    # Parsing HTML files
    $parser->eof;              # Flush any text still buffered by the parser

    my @lines = split /\n/, $lines;    # One element per line of the raw text
    my ($name_tel, $full_content) = ('', '');
    my ($cur_line, $prev_line);
    my @name_tel;
    #my @content = <TMPFILE>;

    foreach my $temp (@lines) {
        # Searching Name and Telephone Number from the parsed text
        #$temp = trim($cur_line);
        $temp =~ s/^\s+//;    # Strip leading whitespace
        $temp =~ s/\s+$//;    # Strip trailing whitespace (including the newline)
        $full_content .= $temp;
    }

    if ($cur_line =~ m{<h1> .+ </h1>}) {
        print $prev_line . ",";
        $name_tel .= $prev_line;
        push(@name_tel, $prev_line);
        $worksheet->write($row, 0, $prev_line);    # Writing content in Excel Sheet
        $col++;
    }

    if ($cur_line =~ m{<p> .+ <br>}) {
        print $prev_line . ",";
        $name_tel .= $prev_line;
        push(@name_tel, $prev_line);
        $worksheet->write($row, 1, $prev_line);    # Writing content in Excel Sheet
        $col++;
    }
etc..

This is what I have written so far. Can anyone guide me on how to proceed further?

pbmods
Site Moderator
 
Join Date: Apr 2007
Location: Texas
Posts: 5,435
#2: Aug 21 '07

re: Extracting data from raw HTML


Heya, deepuceg. Welcome to TSDN!

Please use CODE tags when posting source code. See the REPLY GUIDELINES on the right side of the page next time you post.

Changed thread title to better describe the problem (did you know that threads whose titles contain phrases such as 'need help' actually get FEWER responses?).
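
As for the extraction itself, here is a minimal sketch of one way to pull the fields out of the sample markup with a regular expression. It assumes every field in the file has exactly the shape shown in the question, `<p><span>Label:</span>Value</p>`, with no nested tags inside the value. The name `extract_fields` is a hypothetical helper made up for this illustration, not something from the original script. Regular expressions are fragile against arbitrary HTML, so treat this as a starting point rather than a general solution.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): collect label/value
# pairs from markup shaped like <p><span>Label:</span>Value</p>.
sub extract_fields {
    my ($html) = @_;
    my %fields;

    # $1 captures the label without its trailing colon, $2 the value.
    # The /g flag lets the while loop walk through every match in turn.
    while ($html =~ m{<p>\s*<span>\s*([^<:]+?)\s*:\s*</span>\s*([^<]*?)\s*</p>}gi) {
        $fields{$1} = $2;
    }
    return \%fields;
}

# Sample input copied from the question
my $sample = <<'HTML';
<p><span>Contact Name:</span>Kent Busse</p>

<p><span>Contact Title:</span>Owner</p>
HTML

my $fields = extract_fields($sample);
for my $label (sort keys %$fields) {
    print "$label => $fields->{$label}\n";
}
```

Each value could then be written to the spreadsheet in place of `$prev_line` in the original `$worksheet->write(...)` calls. If the real pages are less regular than the sample, a handler-based approach with HTML::Parser is likely to hold up better than a single pattern.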