Compression - if and how

Thomas Mlynarczyk
#1: Aug 8 '08
Hello,

I have a script that generates XML files which are sent to a client app
on request via "new SimpleXMLElement( $sUrl, 0, true )". Both client and
server run PHP 5.2 and the XML file can get as large as about 150 KB.
(The client then generates an HTML page using the data in the XML file
and sends it to the browser. So there are 3 parties involved: User
requests HTML page - server requests XML data from other server - other
server sends back XML data - first server generates HTML page from it
and sends it to the client. Here I am concerned about the communication
between the two servers.)

Now 150 KB seems quite big when transferred over the internet, so I
thought it might be a good idea to compress the XML file (should shrink
down to about 10%). On the other hand: compression and decompression use
resources as well and I assume the transfer from one server to another
happens over a broadband connection anyway, so maybe it's not really
worth the trouble to compress the data?

But if it is indeed advisable to use compression, which would be the
best or most elegant way to go about it? Can I tell the SimpleXMLElement
constructor to send the appropriate headers to accept compression? And
will the decompression then happen automatically? Can I use some
output_handler function to have the data compressed automatically before
sending it out? Or should I rather call all the (de)compressing
functions explicitly? And as there seem to be several options (zlib, gz,
...) - which one should I choose?

Any advice?

Greetings,
Thomas


--
Ce n'est pas parce qu'ils sont nombreux à avoir tort qu'ils ont raison!
(Coluche)

petersprc
#2: Aug 9 '08

I wouldn't bother with it unless this is a significant bottleneck and
is impacting the user experience.

However, if you do think you need compression at some point, you can
do it by encoding the server's response with gzip using mod_gzip,
zlib.output_compression, or ob_gzhandler. Then download with cURL
using gzip as the CURLOPT_ENCODING value. The added CPU is negligible
in most cases.
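
A minimal sketch of the download side, assuming the cURL extension is installed and the server already sends gzip-encoded responses. The helper name fetch_xml_gzip() is made up for illustration and is not part of PHP:

```php
<?php
// Sketch only: fetch_xml_gzip() is a hypothetical helper, not a PHP built-in.
function fetch_xml_gzip($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_ENCODING, 'gzip');     // send "Accept-Encoding: gzip" and decode the reply
    curl_setopt($ch, CURLOPT_FAILONERROR, true);    // treat HTTP status >= 400 as a failure
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Transfer failed: ' . curl_error($ch));
    }
    // cURL has already decompressed the body, so it can be parsed as a plain string.
    return new SimpleXMLElement($body);
}
```

The call would then replace "new SimpleXMLElement( $sUrl, 0, true )" with "fetch_xml_gzip( $sUrl )". If the server ignores the request and sends plain XML, the function still works.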

Regards,

John Peters

Sjoerd
#3: Aug 9 '08

On Aug 8, 10:11 pm, Thomas Mlynarczyk <tho...@mlynarczyk-webdesign.de>
wrote:
Quote:
Now 150 KB seems quite big when transferred over the internet, so I
thought it might be a good idea to compress the XML file (should shrink
down to about 10%).
First of all, compression uses more resources than decompression:

$ time gzip rss20.xml    # time to compress a 300 KB file
real 0m0.266s, user 0m0.236s, sys 0m0.016s

$ time gunzip rss20.xml.gz    # time to decompress the same file
real 0m0.034s, user 0m0.020s, sys 0m0.012s
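
The same comparison can be repeated inside PHP, which leaves out process start-up and disk access. This is only a sketch: the XML is synthetic stand-in data, compression level 6 is an assumption, and gzdecode() needs PHP 5.4 or later. The numbers will differ from machine to machine.

```php
<?php
// Build roughly 150 KB of repetitive XML as a stand-in for the real file (made-up data).
$xml = '<items>';
for ($i = 0; strlen($xml) < 150 * 1024; $i++) {
    $xml .= '<item id="' . $i . '"><title>Entry ' . $i . '</title></item>';
}
$xml .= '</items>';

// Time the compression step.
$start = microtime(true);
$packed = gzencode($xml, 6);
$compressSeconds = microtime(true) - $start;

// Time the decompression step.
$start = microtime(true);
$unpacked = gzdecode($packed);
$decompressSeconds = microtime(true) - $start;

printf("original %d bytes, gzip %d bytes\n", strlen($xml), strlen($packed));
printf("compress %.4f s, decompress %.4f s\n", $compressSeconds, $decompressSeconds);
```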

However, since the data that needs to be transferred becomes much
smaller, some resources are also saved. It may improve the speed of
your application. It may improve it greatly, if the transfer of the
XML is really the bottleneck. It depends on many things so it is hard
to predict what will happen.

The easiest way to do compression is using HTTP compression. It can be
as easy as putting the line 'ob_start("ob_gzhandler");' in your
script, or configuring Apache to compress your page. All modern
browsers support this. However, PHP's fopen() HTTP wrapper and
SimpleXMLElement do not ask for compressed responses on their own.

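I have tested that as follows:
<?php
// Load the feed directly from its URL (third argument: the data is a URL).
$bla = new SimpleXMLElement('http://planet.mozilla.org/rss20.xml', 0, true);
// The HTTP wrapper fills $http_response_header with the response headers.
print_r($http_response_header);
?>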

First of all, new SimpleXMLElement sets $http_response_header, which
makes me think it simply uses fopen() with HTTP wrappers. Secondly,
the headers do not contain "Content-Encoding: gzip" or something
similar.
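
A server normally compresses only when the request carries an "Accept-Encoding" header, which the plain HTTP wrapper does not send. The following sketch sends it through a stream context and decodes the reply by hand. Both function names are made up for illustration, and gzdecode() needs PHP 5.4 or later.

```php
<?php
// Hypothetical helper: decode a body according to the response headers PHP collected.
function decode_http_body($body, array $headers)
{
    foreach ($headers as $header) {
        if (preg_match('/^Content-Encoding:\s*(x-)?gzip\s*$/i', $header)) {
            return gzdecode($body); // gzip container, as used for "Content-Encoding: gzip"
        }
    }
    return $body; // the server sent the data uncompressed
}

// Hypothetical helper: request gzip explicitly, then parse whatever comes back.
function load_xml_with_gzip($url)
{
    $context = stream_context_create(array(
        'http' => array('header' => "Accept-Encoding: gzip\r\n"),
    ));
    $body = file_get_contents($url, false, $context);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . $url);
    }
    // The HTTP wrapper sets $http_response_header in the calling scope.
    $headers = isset($http_response_header) ? $http_response_header : array();
    return new SimpleXMLElement(decode_http_body($body, $headers));
}
```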

cURL seems to support compression, so you may use that.

Another way to compress your file is to use gzcompress() or something
like that on the XML data and send that to the client. However, this
probably requires more steps and it is no longer possible to view
your XML in a browser.
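
A minimal sketch of the explicit variant, with both sides shown in one script. The function names pack_xml() and unpack_xml() and the sample document are made up for illustration; in practice the payload would travel over HTTP between the two servers.

```php
<?php
// Serving side: compress the XML string before sending it.
function pack_xml($xml)
{
    return gzcompress($xml, 6); // zlib format, compression level 6
}

// Receiving side: reverse the step and parse the result.
function unpack_xml($payload)
{
    $xml = gzuncompress($payload);
    if ($xml === false) {
        throw new RuntimeException('Payload is not valid zlib data');
    }
    return new SimpleXMLElement($xml);
}

$payload = pack_xml('<feed><entry>hello</entry></feed>');
$feed = unpack_xml($payload);
echo $feed->entry, "\n"; // prints "hello"
```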
Thomas Mlynarczyk
#4: Aug 9 '08

Thank you for your suggestions.

I think I will do without compression for the moment and add it later if
needed. I suppose data transfer between the servers of professional
hosting providers will be fast enough. Is there such a thing as a
reasonable guess about common transfer rates between professional
providers' servers?

Greetings,
Thomas

--
Ce n'est pas parce qu'ils sont nombreux à avoir tort qu'ils ont raison!
(Coluche)
Jerry Stuckle
#5: Aug 9 '08

Thomas Mlynarczyk wrote:
Quote:
Thank you for your suggestions.
>
I think I will do without compression for the moment and add it later if
needed. I suppose data transfer between the servers of professional
hosting providers will be fast enough. Is there such a thing as a
reasonable guess about common transfer rates between professional
providers' servers?
>
Greetings,
Thomas
>
Multiple gigabits/second across parallel fiber optic lines.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

C. (http://symcbean.blogspot.com/)
#6: Aug 10 '08

On Aug 9, 1:49 am, petersprc <peters...@gmail.com> wrote:
Quote:
I wouldn't bother with it unless this is a significant bottleneck and
is impacting the user experience.
>
However, if you do think you need compression at some point, you can
do it by encoding the server's response with gzip using mod_gzip,
zlib.output_compression, or ob_gzhandler. Then download with cURL
using gzip as the CURLOPT_ENCODING value. The added CPU is negligible
in most cases.
>
Regards,
I beg to disagree. I've never seen an HTTP architecture where the
performance could not be improved by using compression. The
bottlenecks are in network transport times. But when there's a lot of
data waiting to be transferred over the network, in the absence of a
reverse proxy, connections at the webserver stay open for the duration
of the transaction, occupying memory for a lot longer than they need
to. This higher memory usage then has a knock-on effect on processor
usage. So in addition to giving better response at the client end,
every time I've switched on compression at the server end it has
reduced load, improved throughput and increased capacity at the server
end.

Since compression is handled at the HTTP level, transparently to the
payload, there should be no issues in compressing XML output.

C.
Thomas Mlynarczyk
#7: Aug 10 '08

Jerry Stuckle schrieb:

[common transfer rates]
Quote:
Multiple gigabits/second across parallel fiber optic lines.
Thus, taking into account Sjoerd's timing tests, compression would take
longer than transferring the uncompressed file.
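
A rough back-of-the-envelope check of that comparison. It assumes the full link speed is available to one transfer and ignores latency and protocol overhead; the two link speeds are illustrative assumptions, and transfer_seconds() is a made-up helper.

```php
<?php
// Seconds needed to push $bytes over a link of $bitsPerSecond,
// ignoring latency and protocol overhead.
function transfer_seconds($bytes, $bitsPerSecond)
{
    return ($bytes * 8) / $bitsPerSecond;
}

$size = 150 * 1024;            // the 150 KB file from the first post
$gigabit = 1000 * 1000 * 1000; // assumed: 1 Gbit/s usable for this transfer
$slowLine = 768 * 1000;        // assumed: a 768 kbit/s line for contrast

printf("1 Gbit/s:   %.4f s\n", transfer_seconds($size, $gigabit));
printf("768 kbit/s: %.4f s\n", transfer_seconds($size, $slowLine));
```

Sjoerd's gzip figure (about 0.27 s for 300 KB) includes process start-up and file access, so it is an upper bound rather than the cost of compressing inside PHP.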

Greetings,
Thomas

--
Ce n'est pas parce qu'ils sont nombreux à avoir tort qu'ils ont raison!
(Coluche)
Jerry Stuckle
#8: Aug 10 '08

C. (http://symcbean.blogspot.com/) wrote:
Quote:
I beg to disagree. I've never seen an HTTP architecture where the
performance could not be improved by using compression. The
bottlenecks are in network transport times. But when there's a lot of
data waiting to be transferred over the network, in the absence of a
reverse proxy, connections at the webserver stay open for the duration
of the transaction, occupying memory for a lot longer than they need
to. This higher memory usage then has a knock-on effect on processor
usage. So in addition to giving better response at the client end,
every time I've switched on compression at the server end it has
reduced load, improved throughput and increased capacity at the server
end.
>
Since compression is handled at the HTTP level, transparently to the
payload, there should be no issues in compressing XML output.
>
C.
>
Interesting. I've seen just the opposite. When enabling Apache
compression (which is much more efficient than PHP compression) on the
servers at data centers, CPU usage goes up but if there is any
performance change, it's down. I've never seen performance increase. I
guess it could, though, if you're trying to serve from a 768K dsl line,
instead of data center connections.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

Jerry Stuckle
#9: Aug 10 '08

Thomas Mlynarczyk wrote:
Quote:
Jerry Stuckle schrieb:
>
[common transfer rates]
Quote:
>Multiple gigabits/second across parallel fiber optic lines.
>
Thus, taking into account Sjoerd's timing tests, compression would take
longer than transferring the uncompressed file.
>
Greetings,
Thomas
>
True. But that is not the only factor involved. You need to be looking
at the entire path - server to client and back.

But my experience has been that unless you're using slow lines, data
compression adds CPU workload, and I haven't seen it provide any
measurable performance gain. I guess, of course, if you're serving very
large pages, that could change.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

Sjoerd
#10: Aug 11 '08

On Aug 10, 12:14 am, Thomas Mlynarczyk
<tho...@mlynarczyk-webdesign.de> wrote:
Quote:
Is there such a thing as a
reasonable guess about common transfer rates between professional
providers' servers?
Are both servers in the same datacenter? If they are, transfers
between them will be very fast. If you have access to both servers,
you can test the speed between them.
Thomas Mlynarczyk
#11: Aug 11 '08

Sjoerd schrieb:
Quote:
Are both servers in the same datacenter? If they are, transfers
between them will be very fast. If you have access to both servers,
you can test the speed between them.
No, they're not in the same location. Theoretically, I could test the
speed between them, but as the service I am developing is meant to be
used by more or less anyone on their websites, such a test would
probably not be representative.

Greetings,
Thomas

--
Ce n'est pas parce qu'ils sont nombreux à avoir tort qu'ils ont raison!
(Coluche)
Thomas Mlynarczyk
#12: Aug 11 '08

Jerry Stuckle schrieb:
Quote:
But that is not the only factor involved. You need to be looking
at the entire path - server to client and back.
Yes. I was mostly concerned with reducing the load on the server as
there could be many websites in the future using the service I am
developing. The final client receives just a part of the data at a time,
the first server receives the complete data once per session from the
other server, but the other server has several clients to serve.

Greetings,
Thomas

--
Ce n'est pas parce qu'ils sont nombreux à avoir tort qu'ils ont raison!
(Coluche)
Thomas Mlynarczyk
#13: Aug 11 '08

Michael Fesser schrieb:
Quote:
For server-to-server transfer I think I wouldn't use compression either,
since the backbone should be fast enough.
So I am convinced now that I need not bother with compression.

Thanks again to everyone.

Greetings,
Thomas

--
Ce n'est pas parce qu'ils sont nombreux à avoir tort qu'ils ont raison!
(Coluche)
Steve Fantastico
#14: Aug 11 '08

On Aug 10, 7:56 am, "C. (http://symcbean.blogspot.com/)"
<colin.mckin...@gmail.com> wrote:
Quote:
I beg to disagree. I've never seen an HTTP architecture where the
performance could not be improved by using compression. The
If the encoding is pre-cached, then I'd agree. mod_deflate is a
pitiful substitute for mod_gzip because it doesn't cache and has to
recompress for every client.

If there's no opportunity for pre-caching and speed is more important
than bandwidth, don't bother compressing.
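
For output generated by PHP, the pre-caching idea could look roughly like this. It is a sketch under assumptions: cached_gzip() and the cache directory are made up, the cache key is an MD5 hash of the content (which has its own, much smaller cost), and the caller is expected to send "Content-Encoding: gzip" only to clients that asked for it.

```php
<?php
// Hypothetical helper: return gzip data for $xml, compressing only
// when no cached copy of the same content exists yet.
function cached_gzip($xml, $cacheDir)
{
    $file = $cacheDir . '/' . md5($xml) . '.gz'; // cache key derived from the content
    if (is_file($file)) {
        return file_get_contents($file); // reuse the earlier result
    }
    $packed = gzencode($xml, 9); // pay the compression cost once
    file_put_contents($file, $packed, LOCK_EX);
    return $packed;
}
```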

Steve
C. (http://symcbean.blogspot.com/)
#15: Aug 12 '08

On Aug 10, 2:42 pm, Jerry Stuckle <jstuck...@attglobal.net> wrote:
Quote:
Interesting. I've seen just the opposite. When enabling Apache
compression (which is much more efficient than PHP compression) on the
servers at data centers, CPU usage goes up but if there is any
performance change, it's down. I've never seen performance increase. I
guess it could, though, if you're trying to serve from a 768K dsl line,
instead of data center connections.
>
I've done this four times for different environments, each time
throughput increased as a result - two were on E2s, one E1 and the
other on megastream.

Given that it's so trivial to do on the webserver, I guess the answer
is to suck it and see.

C.