
check html file size

Would anyone like to translate the following Perl script to Python or
Scheme (scsh)?

The script takes an input path and reports all HTML files under it that
are above a certain size (counting inline images).
It also prints a sorted report of the HTML files and their sizes.

(a copy of the script is here:
http://xahlee.org/_scripts/check_file_size.pl
)

Xah
xa*@xahlee.org
http://xahlee.org/
# perl

# Tue Oct 4 14:36:48 PDT 2005
# given a dir, report all html file's size. (counting inline images)
# XahLee.org

use Data::Dumper;
use File::Find;
use File::Basename;

$inpath = '/Users/t/web/mydirectory/';
$sizeLimit = 800 * 1000;

# $inpath = $ARGV[0]; # should give a full path; else the $File::Find::dir won't give full path.
while ($inpath =~ m@^(.+)/$@) { $inpath = $1;} # get rid of trailing slash

die "dir $inpath doesn't exist! $!" unless -e $inpath;
##################################################
# subroutines
# getInlineImg($file_full_path) returns an array that is a list of
# inline images. For example, it may return ('xx.jpg','../image.png')
sub getInlineImg ($) {
    $full_file_name = $_[0];
    @linx = ();
    open (FF, "<$full_file_name") or die "error: can not open $full_file_name $!";
    while (<FF>) {
        @txt_segs = split(m/img/, $_);
        shift @txt_segs;
        for $lin (@txt_segs) {
            if ($lin =~ m@ src\s*=\s*\"([^\"]+)\"@i) { push @linx, $1; }
        }
    }
    close FF;
    return @linx;
}

# linkFullPath($dir,$locallink) returns a string that is the full path
# to the local link. $dir must end with a slash, as the directory part
# returned by fileparse does. For example,
# linkFullPath('/Users/t/public_html/a/b/', '../image/t.png') returns
# '/Users/t/public_html/a/image/t.png'. The returned result will not
# contain double slash or '../' string.
sub linkFullPath($$) {
    $result = $_[0] . $_[1];
    while ($result =~ s@\/\/@\/@) {};
    while ($result =~ s@/[^\/]+\/\.\.@@) {};
    return $result;
}

# listLocalLinks($html_file_full_path) returns an array where each
# element is a full path of local links in the html.
# Note: getlinks() is not defined in this script, and listLocalLinks()
# is never called by it.
sub listLocalLinks($) {
    my $htmlfile = $_[0];

    my ($name, $dir, $suffix) = fileparse($htmlfile, ('\.html') );
    my @aa = getlinks($htmlfile);
    @aa = grep(!m/\#/, @aa);
    @aa = grep (!m/^mailto:/, @aa);
    @aa = grep (!m/^http:/, @aa);

    my @linkedFiles = ();
    foreach my $lix (@aa) { push @linkedFiles, linkFullPath($dir,$lix); }
    return @linkedFiles;
}

# listInlineImg($html_file_full_path) returns an array where each
# element is a full path to inline images in the html.
sub listInlineImg($) {
    my $htmlfile = $_[0];

    my ($name, $dir, $suffix) = fileparse($htmlfile, ('\.html') );
    my @aa = getInlineImg($htmlfile);

    my @result = ();
    foreach my $ele (@aa) { push @result, linkFullPath($dir,$ele); }
    return @result;
}

##################################################
sub checkLink {
    if (
        -T $File::Find::name
        && $File::Find::name =~ m@\.html$@
    ) {
        $total = -s $File::Find::name;
        @h2 = listInlineImg($File::Find::name);
        for my $ln (@h2) { $total += -s $ln; };
        if ( $total > $sizeLimit) { print "problem: file: $File::Find::name, size: $total\n"; }

        push (@result, [$total, $File::Find::name]);
    };
}

find(\&checkLink, $inpath);

@result = sort { $b->[0] <=> $a->[0]} @result;

print Dumper(\@result);
print "done reporting. (any file above size are printed above.)";

__END__

Oct 5 '05 #1
12 Replies



"Xah Lee" <xa*@xahlee.org> wrote in message
news:11**********************@o13g2000cwo.googlegroups.com...
> would anyone like to translate the following perl script to Python or
> Scheme (scsh)?


Even if you weren't an incredibly offensive and petulant poster, what makes
you think anyone would write a script from you?

Matt
Oct 5 '05 #2

On 2005-10-05, Xah Lee <xa*@xahlee.org> wrote:
> would anyone like to translate the following perl script to
> Python or Scheme (scsh)?


Sure. It'll cost you $110/hour with a 2-hour minimum. Where do
I send the invoice?

--
Grant Edwards (grante at visi.com)
Yow! I'll take ROAST BEEF if you're out of LAMB!!
Oct 5 '05 #3

Matt Garrish wrote:
> Even if you weren't an incredibly offensive and petulant poster, what makes
> you think anyone would write a script from you?


Because in addition to being offensive and petulant, he's also an idiot.

--
Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM erikmaxfrancis
There is no fate that cannot be surmounted by scorn.
-- Albert Camus
Oct 5 '05 #4

Xah Lee <xa*@xahlee.org> wrote:
> would anyone like to translate the following perl script to Python or
> Scheme (scsh)?

Yes, I would.
--
Tad McClellan SGML consulting
ta***@augustmail.com Perl programming
Fort Worth, Texas
Oct 5 '05 #5

On Tue, 04 Oct 2005 17:44:02 -0700, Xah Lee wrote:
> would anyone like to translate the following perl script to Python or
> Scheme (scsh)?


Are you fucking seriously fucking expecting some fucking moron to
translate your tech geeking fucking code moronicity? Fucking try writing
it fucking properly in fucking Perl first.

--
I guess everybody's the same: Gotta be good at your job before you can enjoy the rest of your life
-- Cole Trickle

Oct 5 '05 #6

Richard Gration wrote:
> Are you fucking seriously fucking expecting some fucking moron to
> translate your tech geeking fucking code moronicity? Fucking try writing
> it fucking properly in fucking Perl first.


Fucking excuse me?

Fucking maybe you should fucking go fucking fuck your fucking self...

Seriously, Xah might be a troll, but this is just pathetic.

--
We're glad that graduates already know Java,
so we only have to teach them how to program.
somewhere in a German company
(credit to M. Felleisen and M. Sperber)
Oct 5 '05 #7

Richard Gration <ri*****@zync.co.uk> writes:
> Are you fucking seriously fucking expecting some fucking moron to
> translate your tech geeking fucking code moronicity? Fucking try writing
> it fucking properly in fucking Perl first.


Good fucking job! That's the funniest fucking response I've ever fucking seen
to Xah's fucking moronistic fucking nonsense.

Lenny Bruce would be so fucking proud.

sherm--

--
Cocoa programming in Perl: http://camelbones.sourceforge.net
Hire me! My resume: http://www.dot-app.org
Oct 6 '05 #8

Ulrich Hobelmann <u.*********@web.de> writes:
> Richard Gration wrote:
>> Are you fucking seriously fucking expecting some fucking moron to
>> translate your tech geeking fucking code moronicity? Fucking try writing
>> it fucking properly in fucking Perl first.
>
> Fucking excuse me?
>
> Fucking maybe you should fucking go fucking fuck your fucking self...
>
> Seriously, Xah might be a troll, but this is just pathetic.


I'm guessing you didn't get the joke then. I think Richard's response was a
parody of Xah's "style" - a funny parody, at that.

sherm--

--
Cocoa programming in Perl: http://camelbones.sourceforge.net
Hire me! My resume: http://www.dot-app.org
Oct 6 '05 #9

Richard Gration wrote:
> ... fucking ... fucking ... fucking ... fucking ... Fucking ... fucking
> ... fucking


My friend, you can learn to use a far richer vocabulary of
obscenities. If your creative flow is blocked by the fear
that you can't spell more dirty words correctly, you can
dispel this fear with a few evenings of study and preparation.

Amaze your friends! Amuse your enemies! Enrich your
vocabulary! You can learn the joys of cussing seven
times in the same sentence without resorting to repetition!
For extra points, and with suitable study, you can even
learn to write entire paragraphs of _original_ obscenity!

Just imagine how much clearer your point would have been if
you'd called him a jizz-licking dogcock grabber! Why insult
his code with a vague word like "moronicity" when you could
use "steaming pile of entrails" or better yet, "bucket of
fermented ballsweat?" wouldn't that have made your technical
point much clearer?

Now go, and don't attempt obscenity in public again until
you learn how.

Bear
Oct 6 '05 #10

Sherm Pendley wrote:
> I'm guessing you didn't get the joke then. I think Richard's response was a
> parody of Xah's "style" - a funny parody, at that.


If you take all the line noise in Perl as swearing ;)
I suppose I'm lucky I can't read it.

--
We're glad that graduates already know Java,
so we only have to teach them how to program.
somewhere in a German company
(credit to M. Felleisen and M. Sperber)
Oct 6 '05 #11

On Wed, 05 Oct 2005 20:39:18 -0400, Sherm Pendley wrote:
> Richard Gration <ri*****@zync.co.uk> writes:
>> Are you fucking seriously fucking expecting some fucking moron to
>> translate your tech geeking fucking code moronicity? Fucking try writing
>> it fucking properly in fucking Perl first.
>
> Good fucking job! That's the funniest fucking response I've ever fucking seen
> to Xah's fucking moronistic fucking nonsense.


Thanks, Sherm. I knew someone would get it. I think Bear and Ulrich
haven't yet been exposed to Xah "in full effect" ;-) They're probably
denizens of the Scheme group which seems to be a new entry on Xah's "this
newsgroup needs spamming" list ;-)

> Lenny Bruce would be so fucking proud.


LOL
Oct 6 '05 #12

Xah Lee wrote: « would anyone like to translate the following perl
script to Python or Scheme (scsh)?»

Here's the Python version.

# -*- coding: utf-8 -*-
# Python
# Wed Oct 5 15:50:31 PDT 2005
# given a dir, report all html file's size. (counting inline images)
# XahLee.org

import re, os.path, sys

inpath= '/Users/t/web/'

while inpath[-1] == '/': inpath = inpath[0:-1] # get rid of trailing slash

if (not os.path.exists(inpath)):
    print "dir " + inpath + " doesn't exist!"
    sys.exit(1)

##################################################
# subroutines
def getInlineImg(file_full_path):
    '''getInlineImg($file_full_path) returns an array that is a list of
    inline images. For example, it may return ['xx.jpg','../image.png']'''

    FF = open(file_full_path,'rb')
    txt_segs = re.split( r'src', unicode(FF.read(),'utf-8'))
    txt_segs.pop(0)
    FF.close()
    linx=[]
    for linkBlock in txt_segs:
        matchResult = re.search(r'\s*=\s*\"([^\"]+)\"', linkBlock)
        if matchResult: linx.append( matchResult.group(1) )
    return linx

def linkFullPath(dir,locallink):
    '''linkFullPath(dir, locallink) returns a string that is the full
    path to the local link. For example,
    linkFullPath('/Users/t/public_html/a/b', '../image/t.png') returns
    '/Users/t/public_html/a/image/t.png'. The returned result will not
    contain double slash or '../' string.'''
    result = dir + '/' + locallink
    result = re.sub(r'//+', r'/', result)
    while re.search(r'/[^\/]+\/\.\.', result): result = re.sub(r'/[^\/]+\/\.\.', '', result)
    return result

def listInlineImg(htmlfile):
    '''listInlineImg($html_file_full_path) returns an array where each
    element is a full path to inline images in the html.'''
    dir=os.path.dirname(htmlfile)
    imgPaths = getInlineImg(htmlfile)
    result = []
    for aPath in imgPaths:
        result.append(linkFullPath( dir, aPath))
    return result

##################################################
# main

fileSizeList=[]
def checkLink(dummy, dirPath, fileList):
    for fileName in fileList:
        if '.html' == os.path.splitext(fileName)[1] and os.path.isfile(dirPath+'/'+fileName):
            totalSize = os.path.getsize(dirPath+'/'+fileName)
            imagePathList = listInlineImg(dirPath+'/'+fileName)
            for imgPath in imagePathList: totalSize += os.path.getsize(imgPath)
            fileSizeList.append([totalSize, dirPath+'/'+fileName])
os.path.walk(inpath, checkLink, 'dummy')

fileSizeList.sort(key=lambda x:x[0],reverse=True)

for it in fileSizeList: print it
print "done reporting."

-------------------------------------------------
This Python version is a direct translation of the Perl version. They
match pretty much line by line.
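
Because the translation follows the Perl line by line, it keeps the regex-based path cleanup, and since it splits on 'src' rather than 'img' it picks up any src attribute, not only those on img tags. What follows is a minimal Python 3 sketch of the two helpers, assuming the standard-library html.parser and posixpath modules are acceptable replacements for the regexes. The names ImgSrcParser, inline_images and link_full_path are illustrative and do not appear in the original script.

```python
# Python 3 sketch; an alternative to the regex helpers, not the posted code.
from html.parser import HTMLParser
import posixpath


class ImgSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def inline_images(html_text):
    """Return the src values of all <img> tags, in document order."""
    parser = ImgSrcParser()
    parser.feed(html_text)
    parser.close()
    return parser.sources


def link_full_path(directory, local_link):
    """Join a directory and a relative link, collapsing '//' and '..'."""
    return posixpath.normpath(posixpath.join(directory, local_link))
```

With these definitions, link_full_path('/Users/t/public_html/a/b', '../image/t.png') gives '/Users/t/public_html/a/image/t.png', matching the docstring example. Note that posixpath.join discards the directory when the link is itself absolute, which differs from plain string concatenation.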

For both the Python version and the Perl version, see:
http://xahlee.org/perl-python/check_html_size.html
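
One thing the Perl script does that the posted Python translation leaves out is the $sizeLimit check. The posted code is also Python 2 (print statements, unicode(), os.path.walk). A hedged Python 3 sketch of the whole report follows, assuming os.walk as the directory walker and a simple regex for img tags. The names IMG_SRC, page_size and report are made up for illustration, and images that are missing or remote are skipped rather than raising an error, which is a design choice of this sketch and not behaviour of the original.

```python
# Python 3 sketch of the whole report; names and defaults are illustrative.
import os
import re

# Matches src="..." inside an <img ...> tag (double-quoted values only).
IMG_SRC = re.compile(r'<img\b[^>]*?\bsrc\s*=\s*"([^"]+)"', re.IGNORECASE)


def page_size(html_path):
    """Size of the HTML file plus every inline image found on local disk."""
    total = os.path.getsize(html_path)
    with open(html_path, encoding="utf-8", errors="replace") as fh:
        text = fh.read()
    base = os.path.dirname(html_path)
    for src in IMG_SRC.findall(text):
        img = os.path.normpath(os.path.join(base, src))
        if os.path.isfile(img):  # remote URLs and missing files are skipped
            total += os.path.getsize(img)
    return total


def report(root, size_limit=800 * 1000):
    """Return (size, path) pairs, largest first; print pages over the limit."""
    sizes = []
    for dir_path, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(".html"):
                path = os.path.join(dir_path, name)
                total = page_size(path)
                if total > size_limit:
                    print(f"problem: file: {path}, size: {total}")
                sizes.append((total, path))
    sizes.sort(reverse=True)
    return sizes
```

Calling report('/Users/t/web') and printing each returned pair would give the same kind of sorted listing as the scripts above; the default limit mirrors the 800 * 1000 used in the Perl version.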

Would any Lisper provide a Scheme version? I don't think I'll do a
Scheme version anytime soon. Please, Schemers, show us some fanfare.

Xah
xa*@xahlee.org
http://xahlee.org/

Oct 7 '05 #13
