Extracting data from HTML using PERL Regex

I have two files, one XML and one HTML, and need to extract data from them based on certain patterns. My XML file is well formatted, so I can read it line by line and search for the data between tags.

if ($line =~ m{\$varvalue</tag1>})

However, my HTML file has some of the worst code I have seen. It looks like this:


<div class="theater">
                                        <h2>

<a href="/showtimes/university-village-3" >**University Village 3**</a></h2>
                                        <div class="address">
                                            <i>**3323 South Hoover Street, Los Angeles CA 90007 | (213) 748-6321**</i>
                                        </div>
                                    </div>


                                              <div class="mtitle">


<a href="/movie/dream-house-2011"  title="Dream House" onmouseover="mB(event, 771204354);"  >**Dream House**</a>
                                                            <span>**(PG-13 , 1 hr. 31 min.)**</span>
                                                        </div>




                                                <div class="times">

                                                                        **1:00 PM,**
                                                                                                        </div>
Oct 16 '11 #1
RonB
You should not use a regex to parse XML or HTML. Use one of the parsers on CPAN instead, such as HTML::Parser.

This list contains several HTML parsers:
http://search.cpan.org/modlist/World_Wide_Web/HTML
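
As a rough sketch of what the parser approach could look like with HTML::Parser (installed from CPAN): the sample markup is trimmed from the question with the bold markers removed, and grouping the text by the class of the enclosing div is only an assumption about how the data should be organized. The names @div_stack and %found are made up for illustration.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Sample markup trimmed from the question (bold markers removed).
my $html = <<'HTML';
<div class="theater">
    <h2>
<a href="/showtimes/university-village-3" >University Village 3</a></h2>
    <div class="address">
        <i>3323 South Hoover Street, Los Angeles CA 90007 | (213) 748-6321</i>
    </div>
</div>
<div class="mtitle">
<a href="/movie/dream-house-2011"  title="Dream House" >Dream House</a>
    <span>(PG-13 , 1 hr. 31 min.)</span>
</div>
<div class="times">
    1:00 PM,
</div>
HTML

my @div_stack;   # class attribute of every <div> that is currently open
my %found;       # div class => list of text chunks seen directly inside it

my $parser = HTML::Parser->new(
    api_version   => 3,
    unbroken_text => 1,   # deliver each run of text as a single chunk
    start_h => [ sub {
        my ($tag, $attr) = @_;
        push @div_stack, ($attr->{class} // '') if $tag eq 'div';
    }, 'tagname, attr' ],
    end_h => [ sub {
        my ($tag) = @_;
        pop @div_stack if $tag eq 'div';
    }, 'tagname' ],
    text_h => [ sub {
        my ($text) = @_;
        $text =~ s/^\s+|\s+$//g;               # ignore layout whitespace
        return unless length($text) && @div_stack;
        push @{ $found{ $div_stack[-1] } }, $text;
    }, 'dtext' ],
);

$parser->parse($html);
$parser->eof;

# Print what was collected for each class of interest.
for my $class (qw(theater address mtitle times)) {
    print "$class: ", join('; ', @{ $found{$class} // [] }), "\n";
}
```

Because the handlers key off tags and attributes instead of line layout, the odd indentation and blank lines in the page do not matter. On the real page you would probably need to start a new record at each theater div; that part is left out here.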
Oct 16 '11 #2
