Compress HTML output with PHP?

Hi, I know about
ob_start( 'ob_gzhandler' );
But I'm looking for something that removes all line breaks and extra
whitespace in the html before sending it to the visitor's browser. Is
this possible?

Cheers,
Ciarán

Mar 23 '07 #1
11 Replies


Hi,

This comment has an example of stripping whitespace:

http://ru.php.net/manual/en/function.ob-start.php#71953
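
The linked comment is not reproduced here. As a rough sketch of the general mechanism only: ob_start() accepts a callback that receives the buffered output and returns what should actually be sent. The names squeeze_whitespace() and render_demo() are made up for this illustration, the type declarations assume PHP 7 or later, and the callback is deliberately naive.

```php
<?php
// Hypothetical callback: collapses every run of whitespace to one space.
// Deliberately naive; it does not protect PRE, TEXTAREA or SCRIPT content.
function squeeze_whitespace(string $buffer): string
{
    return preg_replace('/\s+/', ' ', $buffer) ?? $buffer;
}

// Renders a tiny page through the callback and returns the result.
// The outer buffer exists only so the demo can capture what would be sent.
function render_demo(): string
{
    ob_start();                      // outer buffer: captures the final output
    ob_start('squeeze_whitespace');  // inner buffer: filtered when flushed
    echo "<p>\n    Hello,\n    world\n</p>\n";
    ob_end_flush();                  // runs the callback, passes result outward
    return ob_get_clean();           // returns and discards the outer buffer
}

echo render_demo(), "\n";
```

On a real page you would call only ob_start('squeeze_whitespace') before any output; the extra outer buffer is there purely for the demonstration.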

On Mar 22, 9:48 pm, "Ciaran" <cronok...@hotmail.com> wrote:
> Hi, I know about
> ob_start( 'ob_gzhandler' );
> But I'm looking for something that removes all line breaks and extra
> whitespace in the html before sending it to the visitor's browser. Is
> this possible?
>
> Cheers,
> Ciarán

Mar 23 '07 #2

Hmm, yeah looks good. Thanks a lot Peter.

Does anyone know if it's worth it? I mean, is the time spent running
the function less than the time it would take to download the longer
HTML file? Any thoughts on this?

Cheers,
Ciarán

Mar 23 '07 #3

OK, I ran a few tests on my slowest page. Here are my results:

WITHOUT COMPRESSION FUNCTION:::::::::::
Page Size: 523.97 KB
Load Time: 0.9995 seconds
Load Time: 0.8 seconds
Load Time: 0.8095 seconds
Load Time: 0.7091 seconds
Load Time: 0.7223 seconds

WITH COMPRESSION FUNCTION:::::::::::
Page Size: 494.77 KB
Load Time: 0.8448 seconds
Load Time: 0.8307 seconds
Load Time: 0.8307 seconds
Load Time: 0.8444 seconds
Load Time: 0.9014 seconds

AVERAGE LOAD TIME WITHOUT COMPRESSION: 0.80808 seconds
AVERAGE LOAD TIME WITH COMPRESSION: 0.8504 seconds
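
For anyone who wants to check the arithmetic, here is a small sketch that recomputes the averages and the relative changes from the figures as posted above. The helper name mean() and the variable names are made up for this example. With only five runs per case and overlapping ranges, the difference in load time should be read with caution.

```php
<?php
// Load times in seconds, copied from the two lists above.
$without = [0.9995, 0.8, 0.8095, 0.7091, 0.7223];
$with    = [0.8448, 0.8307, 0.8307, 0.8444, 0.9014];

// Arithmetic mean of a list of numbers (hypothetical helper).
function mean(array $values): float
{
    return array_sum($values) / count($values);
}

$avgWithout = mean($without);
$avgWith    = mean($with);

// Relative slowdown of the whitespace-stripped page, in percent.
$slowdownPct = ($avgWith - $avgWithout) / $avgWithout * 100;

// Relative size saving, in percent, from the two page sizes in KB.
$sizeSavedPct = (523.97 - 494.77) / 523.97 * 100;

printf("avg without: %.4f s, avg with: %.4f s\n", $avgWithout, $avgWith);
printf("slowdown: %.1f%%, size saved: %.1f%%\n", $slowdownPct, $sizeSavedPct);
```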

Hope that helps someone!
Cheers,
Ciarán

Mar 23 '07 #4

Dammit, I'm confused! I'm not sure how accurate this info is. When I
downloaded the uncompressed version it was 613 KB, while the compressed
version of the same page was a tiny 48 KB! Surely the download speed of
the page has to be considered?
Anyone?

Mar 23 '07 #5

Ciaran wrote:

> Does anyone know if it's worth it? [compressing HTML] I mean, is the time
> spent running the function less than the time it would take to download
> the longer HTML file? Any thoughts on this?

This comes up over at alt.html all the time. I have seen arguments put
forward where compressing a 50K file to 30K is a good thing *but* that 50K
file links to 500K of images. It's the images that cause the trouble.

--
Richard.
Mar 23 '07 #6

Richard Formby wrote:
> Does anyone know if it's worth it? [compressing HTML] I mean, is the time
> spent running the function less than the time it would take to download
> the longer HTML file? Any thoughts on this?

Not the way Ciaran's attempting to do it. Gzipping HTML as you send it
will lead to a dramatic reduction in file size. (The resultant file will
probably be half the size of the original, or even smaller.) What's more,
the compression is done in well-optimised C code, so it takes very little
time. You can enable it with settings in php.ini (for example
zlib.output_compression) or in Apache (for example mod_deflate), so it
doesn't require any modification to your PHP code base, making it very
easy to toggle on or off as required.

Stripping out redundant whitespace in a file leads to a small reduction in
file size. Depending on how much whitespace there is in the first place,
you might shave off 10% or so from the file size. The stripping is
typically done using PHP and regular expressions, which is slower than the
method above. It generally requires you to make some modifications to your
PHP code. What's more, it's error-prone. Whitespace is significant in some
places (e.g. within PRE, TEXTAREA and SCRIPT elements). Most whitespace
stripping scripts get this wrong in certain places -- getting it right
requires even more careful effort parsing the HTML, and slows the script
down even more.

Zipping HTML content in transit can save significant bandwidth on
mainly textual websites without using much extra CPU time.
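
A rough way to see the difference between the two approaches on your own machine is sketched below. It assumes the zlib extension is loaded (for gzencode() and gzdecode()) and PHP 7 or later. The sample page built by build_sample_page() is hypothetical and very repetitive, so the exact ratios will not match a real site; the naive regex is only there to produce a whitespace-stripped variant for comparison.

```php
<?php
// Builds a hypothetical, indentation-heavy HTML table for the comparison.
function build_sample_page(int $rows): string
{
    $html = "<table>\n";
    for ($i = 1; $i <= $rows; $i++) {
        $html .= "    <tr>\n"
               . "        <td>Row $i</td>\n"
               . "        <td>Value $i</td>\n"
               . "    </tr>\n";
    }
    return $html . "</table>\n";
}

$page     = build_sample_page(500);
$stripped = preg_replace('/>\s+</', '><', $page); // naive whitespace removal
$gzipped  = gzencode($page, 6);                   // roughly what gzip sends
$both     = gzencode($stripped, 6);               // stripped first, then gzip

printf("original:           %6d bytes\n", strlen($page));
printf("whitespace removed: %6d bytes\n", strlen($stripped));
printf("gzipped:            %6d bytes\n", strlen($gzipped));
printf("stripped + gzipped: %6d bytes\n", strlen($both));
```

Comparing the last two numbers shows how much stripping still adds once gzip is already in place, which is the case discussed later in the thread.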

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact
Geek of ~ HTML/SQL/Perl/PHP/Python*/Apache/Linux

* = I'm getting there!
Mar 23 '07 #7

Hi again fellas, thanks for the replies on this. I'm using a
combination of ob_gzhandler and this PHP function to compress my
pages. Depending on the page, I'm getting a small increase in
server-side time and a huge reduction in the size of the outputted HTML
page (and bandwidth!). The problem is, as Toby mentioned, that the
compression function messes up some things. One thing I've noticed is
that some of my JavaScript functions break because of it, so I'm only
adding it on select pages. I love the result, so is there any way to
stop it breaking things, or is there a better way to get the same effect?

Here's the function:

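function compress($buffer){
    // 1) drop tabs/newlines (but not spaces) that follow a ">"
    // 2) drop tabs/newlines (but not spaces) that precede a "<"
    // 3) collapse each remaining run of whitespace to a single character
    $search = array('/\>[^\S ]+/s','/[^\S ]+\</s','/(\s)+/s');
    $replace = array('>','<','\\1');
    // Not aware of PRE, TEXTAREA or SCRIPT, so it can alter their content.
    return preg_replace($search, $replace, $buffer);
}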
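
One likely cause of the JavaScript breakage, shown as a small sketch: the third pattern replaces each run of whitespace with the last character of that run, so a newline followed by indentation becomes a single space. Any statement on the line after a "//" comment then ends up inside the comment. The inline script below is a made-up example, not taken from the poster's pages.

```php
<?php
// The function with the same patterns as posted in the thread.
function compress($buffer)
{
    $search  = array('/\>[^\S ]+/s', '/[^\S ]+\</s', '/(\s)+/s');
    $replace = array('>', '<', '\\1');
    return preg_replace($search, $replace, $buffer);
}

// Hypothetical inline script containing a "//" line comment.
$html = "<script>\n// set up\n  init();\n</script>";

$result = compress($html);
echo $result, "\n";
// prints: <script>// set up init();</script>
// The call to init() is now part of the comment and never runs.
```

Scripts that rely on automatic semicolon insertion can break the same way, since the line breaks they depend on are removed.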

Cheers,
Ciarán

Mar 23 '07 #8

Ciaran wrote:
> Dammit, I'm confused! I'm not sure how accurate this info is. When I
> downloaded the uncompressed version it was 613 KB, while the compressed
> version of the same page was a tiny 48 KB! Surely the download speed of
> the page has to be considered?
> Anyone?

Sorry Ciaran, but your metrics are meaningless unless you can tell us what
you were measuring (hardware at each end, intervening network hardware,
bandwidths, RTT, network latency, request latency...).

I find it very hard to believe that the gz handler would only reduce a 524 KB
HTML or text file to 494 KB. I think your methodology is flawed.

C.
Mar 23 '07 #9

Ciaran wrote:
> The problem is, as Toby mentioned, the compression function messes up
> some things.

I can share a little code with you I suppose... I happen to do exactly the
opposite of what you're describing -- add *more* whitespace to some HTML,
in order to pretty-print it. Obviously, this screws up when you get inside
a PRE or TEXTAREA element, so I made my function smart enough to know when
it's inside one of those.

http://svn.sourceforge.net/viewvc/de...15&view=markup

It's the indent_html() function you're looking for. Obviously, you'll need
to work at it a bit to get it to do what you want, but you should see that
it fairly reliably knows at each point whether or not it's within a "safe
tag".

That said, I'd still advise against your plan. Gzipping your files will be
far more effective, more reliable and easier.
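
For readers who cannot reach the linked code, here is an independent minimal sketch of the "know when you are inside a safe tag" idea. It is not the indent_html() function mentioned above. The name collapse_outside_safe_tags() is made up, PHP 7 or later is assumed, and because it is regex-based it will not cope with malformed markup or with these tag names appearing inside comments or attribute values.

```php
<?php
// Collapses whitespace runs to a single space everywhere except inside
// PRE, TEXTAREA and SCRIPT elements, which are copied through verbatim.
function collapse_outside_safe_tags(string $html): string
{
    $result = preg_replace_callback(
        // Alternative 1: a whole protected element (tag name in group 1).
        // Alternative 2: a run of whitespace anywhere else.
        '~<(pre|textarea|script)\b[^>]*>.*?</\1\s*>|\s+~is',
        function (array $m): string {
            // Group 1 is present only when a protected element matched.
            return !empty($m[1]) ? $m[0] : ' ';
        },
        $html
    );
    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $html;
}

$in = "<div>\n    <p>Hi   there</p>\n    <pre>a\n  b</pre>\n</div>";
echo collapse_outside_safe_tags($in), "\n";
```

Collapsing to one space rather than removing whitespace entirely is a deliberate choice here, since whitespace between inline elements can affect rendering.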

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact
Geek of ~ HTML/SQL/Perl/PHP/Python*/Apache/Linux

* = I'm getting there!
Mar 24 '07 #10

> Sorry Ciaran, but your metrics are meaningless unless you can tell us what
> you were measuring (hardware at each end, intervening network hardware,
> bandwidths, RTT, network latency, request latency...).

I don't see why that matters. All I'm measuring is the change in
speed. The speed itself is not an issue.

> I find it very hard to believe that the gz handler would only reduce a 524 KB
> HTML or text file to 494 KB. I think your methodology is flawed.

Sorry, I don't think I made this clear: I'm already using the gz
handler. The stats I posted are for the 'homemade' compression
function I posted that removes whitespace, applied on top of it.
Mar 24 '07 #11

Thanks for the info Toby - I'll check it out when I get a chance.

> That said, I'd still advise against your plan. Gzipping your files will be
> far more effective, more reliable and easier.

I was actually planning on doing both. I've always been using gzip
compression - I started this thread in the hope I could squeeze a bit
more compression in there.

THE BOTTOM LINE:::::::::::::::::::::::::::::
Using the (temperamental) compression function posted earlier, I'm
getting a reduction of 4.7% in file size, but my server is 5% slower at
throwing the pages together. I guess that means using the function
will save a small amount of bandwidth at the expense of a tiny
increase in page access time. You can make up your own minds whether
that's worth it! ;)

Ciarán

Mar 24 '07 #12
