I'm trying to get my script to parse a bunch of files and grab data between the <title></> and <blah></> tags. Yes yes, I'm parsing html with regex, it works though. :)
The issue is that sometimes there is one line, sometimes 30 lines, between <title> and <blah>, so I can't just .+ it all the way. Plus there are multiple <blah> tags in each file. I'm looking for a way to scan the file for <title>, assign it to $1, then search for every instance of <blah> and assign those to $2 and upwards as necessary, then print $1 \t $2 \t $3 etc. to the tab file. Boy, I hope that gibberish made sense, lol. I'm new, so an explanation with hardcore jargon might not be good for me. Here's what I have so far:
#!/usr/bin/env perl
#fix.py

$dir = 'e:\\tmp';
$outdir = "newfiles";
$tabfile = "tabdata.txt";

### EDIT CAREFULLY BELOW HERE :) ###
open(TAB, ">$dir\\$outdir\\$tabfile");
print TAB ("Item Name\tItem Number\tCost\tAdd\tIn All\n");
open(PARTNUMBER, "$dir\\$outdir\\partnumber.txt");
while (<PARTNUMBER>) {
    chomp;
    $i = $_;
}
close(PARTNUMBER);
print "Opening $dir\n";
opendir(DH,$dir);
while (defined ( my $filename = readdir(DH))) {
    if ($filename =~ m/\.htm/ ) {
        $outfilename=">$dir\\$outdir\\$filename";
        print "Opening $filename\n";
        open(FHI,$filename);
        while (<FHI>) {
            $html .= $_;
        }
        close(FHI);
        while ($html =~ s/<title>(.+?)<\/title>/$1$2$3$4/)
        {
            print TAB ("$1\t$2\t$3\t$i\n");
            open (PARTNUMBER, ">$dir\\$outdir\\partnumber.txt");
            print PARTNUMBER ($i);
            close(PARTNUMBER);
            print "$i matches found in $filename\n";
            print "Saving to $outfilename\n";
            open(FHO, $outfilename);
            print FHO ($html);
            close(FHO);
        }
    }
    $html = '';
}
print "Done\n";

Thanks in advance!
Some sample input and sample output would probably help.
Is this the line you are using to capture the data between the title tags?
while ($html =~ s/<title>(.+?)<\/title>/$1$2$3$4/)
The reason I ask is that s/// is the substitution operator: it replaces the matched text in $html rather than just capturing it. To match without modifying the string, use m// instead.
--Kevin
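
For what it's worth, here is a minimal sketch of the matching approach, not a drop-in fix for the script above. It assumes the tags really are written as <title>...</title> and <blah>...</blah> (blah being the placeholder name from the question), and extract_fields and $sample are made-up names for illustration. It uses m// with the /s modifier so "." can cross line breaks, and /g in list context to collect every <blah> value without needing $2, $3 and so on.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns the title followed by every <blah> value.
sub extract_fields {
    my ($html) = @_;

    # m// only matches; unlike s///, it leaves $html untouched.
    # /s lets "." match newlines, so a value may span 1 line or 30.
    # /i ignores the case of the tag names.
    my ($title) = $html =~ m{<title>(.*?)</title>}si;

    # In list context, /g returns the capture from every match.
    my @blahs = $html =~ m{<blah>(.*?)</blah>}gsi;

    # Collapse runs of whitespace (including newlines) to one space and
    # trim the ends, so each value stays inside one tab-separated cell.
    for my $field ($title, @blahs) {
        next unless defined $field;
        $field =~ s/\s+/ /g;
        $field =~ s/^ | $//g;
    }

    return (defined $title ? $title : '', @blahs);
}

# Made-up sample input, standing in for the contents of one .htm file.
my $sample = "<html><title>Widget</title>\n"
           . "<blah>A-100</blah>\nfiller\n<blah>9.99\nUSD</blah></html>";

my $row = join("\t", extract_fields($sample));
print "$row\n";    # prints: Widget<TAB>A-100<TAB>9.99 USD
```

In the original script, the print would go to the TAB filehandle instead of STDOUT, once per file, after the whole file has been read into $html. For anything beyond simple, well-behaved pages, a real HTML parser is likely the safer choice.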