Extracting data from raw HTML

Hi everybody,
I have generated the links and stored the web pages as text, and I need to extract some fields from that text file using pattern matching.

A portion of my text file looks like this:

<p><span>Contact Name:</span>Kent Busse</p>

<p><span>Contact Title:</span>Owner</p>
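For lines shaped exactly like the two above, pattern matching alone may be enough. The following is a minimal sketch, assuming every field appears as `<p><span>Label:</span>Value</p>` with no nested tags inside the value; `extract_fields` is a hypothetical helper name, not something from the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: collect "label => value" pairs from markup shaped like
#   <p><span>Contact Name:</span>Kent Busse</p>
sub extract_fields {
    my ($text) = @_;
    my %fields;
    # $1: the label inside <span>...</span>, without the trailing colon
    # $2: the text between </span> and </p>
    while ($text =~ m{<p>\s*<span>\s*([^<:]+?)\s*:\s*</span>\s*([^<]*?)\s*</p>}gi) {
        $fields{$1} = $2;
    }
    return \%fields;
}

my $sample = <<'HTML';
<p><span>Contact Name:</span>Kent Busse</p>

<p><span>Contact Title:</span>Owner</p>
HTML

my $fields = extract_fields($sample);
print "$_ => $fields->{$_}\n" for sort keys %$fields;
# Contact Name => Kent Busse
# Contact Title => Owner
```

A regex like this is brittle: it would likely miss fields whose markup has attributes on the tags or further tags inside the value, which is where a real HTML parser tends to be the safer choice.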
Here is the code I have tried so far, using a parser:

use HTML::Parser;

use bytes;
# Print every decoded text node to TMPFILE (the filehandle is opened elsewhere)
my $parser = HTML::Parser->new(text_h => [ sub { print TMPFILE shift }, "dtext" ]);
no bytes;
$parser->parse($lines);             # Parsing HTML files
my @lines = split /\n/, $lines;     # one element per line of the input
my ($name_tel, @name_tel, $cur_line, $prev_line);
my $full_content = '';
# my @content = <TMPFILE>;

foreach my $temp (@lines) {
    # Trim leading and trailing whitespace, then append to the full content
    $temp =~ s/^\s+//;
    $temp =~ s/\s+$//;
    $full_content .= $temp;
}

# Searching Name and Telephone Number from the parsed text.
# Note: $cur_line and $prev_line are never assigned in this excerpt.
if ($cur_line =~ m{<h1>.+</h1>}) {
    print $prev_line . ",";
    $name_tel .= $prev_line;
    push(@name_tel, $prev_line);
    $worksheet->write($row, 0, $prev_line);    # Writing content in Excel sheet
    $col++;
}

if ($cur_line =~ m{<p>.+<br>}) {
    print $prev_line . ",";
    $name_tel .= $prev_line;
    push(@name_tel, $prev_line);
    $worksheet->write($row, 1, $prev_line);    # Writing content in Excel sheet
    $col++;
}
etc.

This is what I have written so far. Can anyone guide me on how to proceed?
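One possible direction, sketched under assumptions rather than offered as the definitive fix: a `text_h` handler with the `dtext` argspec receives only text, so if the later matching runs on that output, patterns containing tags such as `<h1>` would have nothing to match. Registering `start_h` and `end_h` handlers as well lets the parser itself track when it is inside a `<span>` label. The helper name `parse_fields` is hypothetical, and the markup is assumed to follow the `<p><span>Label:</span>Value</p>` shape of the sample.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: returns a list of [label, value] pairs found in
# <p><span>Label:</span>Value</p> markup.
sub parse_fields {
    my ($html) = @_;
    my (@pairs, $in_span, $label, $value);

    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag) = @_;
            if ($tag eq 'p')    { ($label, $value) = ('', '') }
            if ($tag eq 'span') { $in_span = 1 }
        }, 'tagname' ],
        text_h => [ sub {
            my ($text) = @_;
            return unless defined $label;    # ignore text outside <p>
            if ($in_span) { $label .= $text } else { $value .= $text }
        }, 'dtext' ],
        end_h => [ sub {
            my ($tag) = @_;
            if ($tag eq 'span') { $in_span = 0 }
            if ($tag eq 'p' && defined $label) {
                for ($label, $value) { s/^\s+//; s/\s+$// }   # trim both
                $label =~ s/:$//;                             # drop trailing colon
                push @pairs, [ $label, $value ] if length $label;
                undef $label;
            }
        }, 'tagname' ],
    );
    $parser->parse($html);
    $parser->eof;
    return @pairs;
}

my $html = '<p><span>Contact Name:</span>Kent Busse</p>' . "\n"
         . '<p><span>Contact Title:</span>Owner</p>';
# Print the values as one comma-separated row, similar to the prints above.
print join(',', map { $_->[1] } parse_fields($html)), "\n";
```

Each returned pair could then be passed to `$worksheet->write($row, $col, ...)` in place of `$prev_line`, with the column chosen from the label.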
Aug 21 '07 #1
pbmods
5,821 Expert 4TB
Heya, deepuceg. Welcome to TSDN!

Please use CODE tags when posting source code. See the REPLY GUIDELINES on the right side of the page next time you post.

Changed thread title to better describe the problem (did you know that threads whose titles contain phrases such as 'need help' actually get FEWER responses?).
Aug 21 '07 #2


