Reading text file

I have the following short script that I'm using to clean up the source of a
web page in order to index and search the page:

#!/usr/bin/perl
#striphtml.pl

undef $/;
open FD, "< testfile1.txt" or die $!;

while (<FD>) {
    #s/\r\n//gs;

    #s/^\s+$//;
    s/<.*?>//gs;
    trim();
    print "$_";
}

sub trim {

    my @out = @_ ? @_ : $_;
    $_ = join(' ', split(' ')) for @out;
    return wantarray ? @out : "@out";
}

The problem is that it leaves blank lines in the output, and using chomp
does not clean them up. What am I missing to clean up the lines?

Kevin

Jul 19 '05 #1
This newsgroup is defunct. You will reach more people if you post in
comp.lang.perl.misc instead.

"Kevin B" <ka*****@verizon.net> wrote in message news:<Gl*******************@nwrdny01.gnilink.net>...

> undef $/;

OK, you're slurping the whole file in at once...

> open FD, "< testfile1.txt" or die $!;
>
> while (<FD>) {

There's no real point in a while loop if you're getting the whole file
in one read. Just do

$_ = <FD>;

> s/<.*?>//gs;

This strips out all the tags...

> print "$_";

No need for the quotes. In this case, there's no need for an argument at
all. Just

print;

> The problem is that it leaves blank lines in the output, and using chomp
> does not clean them up. What am I missing to clean up the lines?


Maybe something like
tr/\n//s;
or
s/\n\s*\n/\n/g;
?
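
As a rough sketch of how those suggestions might fit together (untested against the poster's actual file): the helper name strip_and_squeeze and the sample HTML string are made up for illustration and are not from the original script. It assumes the input may have CRLF line endings and that the naive tag regex from the original is good enough for the page in question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# strip_and_squeeze is a hypothetical helper name, not part of the original script.
sub strip_and_squeeze {
    my ($text) = @_;
    $text =~ s/\r\n/\n/g;       # normalize CRLF line endings to LF
    $text =~ s/<.*?>//gs;       # naive tag removal, same regex as the original
    $text =~ s/\n\s*\n/\n/g;    # collapse empty and whitespace-only lines
    $text =~ s/\A\s+//;         # drop whitespace left at the very start
    return $text;
}

# Sample input invented for this example.
my $html = "<html>\n<body>\n  \n<p>Hello</p>\n\n\n<p>World</p>\n</body>\n</html>\n";
print strip_and_squeeze($html);
```

Note that tr/\n//s only squeezes runs of consecutive newlines, so a line holding nothing but spaces or tabs would survive it. The s/\n\s*\n/\n/g form also catches those, which is probably what you want after removing tags from indented HTML.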
Jul 19 '05 #2
