Newbie...parsing from multiple lines.

I'm trying to get my script to parse a bunch of files and grab the data between the <title></> and <blah></> tags. Yes, yes, I'm parsing HTML with regex; it works, though. :)

The issue I have is that sometimes there is one line, sometimes 30 lines, between <title> and <blah>, so I can't just .+ it all the way. Plus, there are multiple <blah> tags in each file. I'm looking for a way to scan the file for <title>, assign it to $1, then search for every instance of <blah> and assign each to $2 and upwards as necessary, then print to the tab file $1 \t $2 \t $3, etc. Boy, I hope that gibberish made sense, lol. I'm new, so an explanation full of hardcore jargon might not be good for me. Here's what I have so far:

#!/usr/bin/env perl
#fix.py

$dir = 'e:\\tmp';
$outdir = "newfiles";
$tabfile = "tabdata.txt";

### EDIT CAREFULLY BELOW HERE :) ###
open(TAB, ">$dir\\$outdir\\$tabfile");
print TAB ("Item Name\tItem Number\tCost\tAdd\tIn All\n");
open(PARTNUMBER, "$dir\\$outdir\\partnumber.txt");
while (<PARTNUMBER>) {
    chomp;
    $i = $_;
}
close(PARTNUMBER);
print "Opening $dir\n";
opendir(DH,$dir);
while (defined ( my $filename = readdir(DH))) {
    if ($filename =~ m/\.htm/ ) {
        $outfilename=">$dir\\$outdir\\$filename";
        print "Opening $filename\n";
        open(FHI,$filename);
        while (<FHI>) {
            $html .= $_;
        }
        close(FHI);
        while ($html =~ s/<title>(.+?)<\/title>/$1$2$3$4/)
        {
            print TAB ("$1\t$2\t$3\t$i\n");
            open (PARTNUMBER, ">$dir\\$outdir\\partnumber.txt");
            print PARTNUMBER ($i);
            close(PARTNUMBER);
            print "$i matches found in $filename\n";
            print "Saving to $outfilename\n";
            open(FHO, $outfilename);
            print FHO ($html);
            close(FHO);
        }
    }
    $html = '';
}
print "Done\n";

Thanks in advance!
Mar 25 '08 #1
KevinADC
4,059 Expert 2GB
Some sample input and sample output would probably help.
Mar 26 '08 #2
eWish
971 Expert 512MB
Is this the line you are using to capture the data between the title tags?
while ($html =~ s/<title>(.+?)<\/title>/$1$2$3$4/)
The reason I ask is that s/// is the substitution operator: it rewrites $html rather than just matching against it.
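To illustrate the difference, here is a minimal sketch using a made-up one-line sample string (the contents of `$html` below are an assumption, since no sample input was posted). The match operator `m//` captures without changing the string, while `s///` edits it. Note also that a pattern with a single pair of parentheses only ever sets `$1`; `$2`, `$3` and `$4` stay undefined.

```perl
use strict;
use warnings;

# Hypothetical sample input; the real files were not posted.
my $html = "<title>Widget</title><blah>A1</blah>";

# m// only looks: it captures into $1 and leaves $html untouched.
my $title;
if ($html =~ m/<title>(.+?)<\/title>/) {
    $title = $1;
}

# s/// edits: here it replaces the whole <title>...</title> element
# with the captured text, so the tags disappear from the copy.
(my $changed = $html) =~ s/<title>(.+?)<\/title>/$1/;

print "title:   $title\n";
print "changed: $changed\n";
```

After this runs, `$html` still contains its tags, whereas `$changed` has lost the `<title>` markup.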

--Kevin
Mar 26 '08 #3
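
For the original question, one possible approach is sketched below. It rests on several assumptions: the tags are literally `<title>` and `<blah>` as written in the post, each file has a single title, and the sample document is invented for illustration. `extract_fields` is a hypothetical helper name, not part of the posted script. The `/s` modifier lets `.` match newlines, so the text between tags can span one line or thirty, and `/g` in list context returns every `<blah>` capture at once instead of relying on `$2`, `$3` and so on.

```perl
use strict;
use warnings;

# Return the title followed by every <blah> value found in $html.
# extract_fields is a hypothetical helper, not from the original script.
sub extract_fields {
    my ($html) = @_;

    # /s lets . cross line breaks; /i tolerates <TITLE>;
    # .*? stops at the first closing tag.
    my ($title) = $html =~ m/<title>(.*?)<\/title>/si;
    return unless defined $title;

    # In list context, /g returns the capture from every match.
    my @blahs = $html =~ m/<blah>(.*?)<\/blah>/gsi;

    # Collapse whitespace so each value fits in one tab-separated cell.
    for ($title, @blahs) {
        s/\s+/ /g;
        s/^ | $//g;
    }
    return ($title, @blahs);
}

# Invented sample: the title and the second <blah> span several lines.
my $sample = <<'HTML';
<html><head><title>Blue
Widget</title></head>
<body>
<blah>BW-100</blah>
<p>filler</p>
<blah>
  19.99
</blah>
</body></html>
HTML

my $row = join("\t", extract_fields($sample));
print "$row\n";
```

In the posted script, the list returned for each file could be joined with tabs and printed to the `TAB` filehandle once per file, in place of the `s///` loop.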
