detect bytes written on abort

Shailesh Humbad

Here is an advanced PHP question. Can anyone think of a way to detect
the number of bytes written to output when a script is aborted?

I am sending a large file to the client, and I want to record how many
bytes are actually sent. I can detect abort of the script using a
shutdown handler. In the shutdown handler, I tried ob_get_length, but
it returns false. I tried to read the server's log file, but it does
not contain the information until the script fully quits. I tried to
use fwrite to php://output, and then get the bytes written return value.
However, if the script aborts in the middle of the write, then bytes
written is never returned.

The only thing that worked was writing one byte at a time using
fread/fwrite. But this also made the processor load 100% and download
speed very slow (700KB/sec versus 24MB/sec on localhost using wget).

I know I can poll the web server's log file using a background process.
But if I can do everything from within the script, the system becomes
simpler and more responsive.

Below is the PHP 5.1.1 test script. I suspect it cannot be done, but
any advice would be appreciated!

sendfile.php

<?php
ignore_user_abort(false);
set_time_limit(60);
register_shutdown_function("handleShutdown");

$fp = fopen("largefile.html", "rb"); // false if the file cannot be opened
if ($fp !== false) {
    fpassthru($fp); // script is aborted in mid-execution
}

function handleShutdown() {
    global $fp;
    if ($fp !== false) {
        fclose($fp);
    }
    $byteswritten = 0; // how to detect here?
    $shutdownmessage =
        "bytes: ".$byteswritten."\n".
        "status: ".connection_status()."\n".
        "aborted: ".connection_aborted()."\n";

    file_put_contents("shutdownlog.txt", $shutdownmessage);
}

Here is a Windows batch file to run the request:

ECHO Hit Ctrl-C to simulate abort
IF EXIST sendfile.php.1 del sendfile.php.1
wget http://localhost/sendfile.php --tries=1

Jan 6 '06 #1
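
For the log-polling route mentioned above, a background process would have to pull the byte count out of each access-log line. A minimal sketch, assuming the server writes Apache's Common Log Format; the function name clf_bytes_sent is made up for illustration, and the line only appears once the request has finished:

```php
<?php
// Assumption: the access log uses Common Log Format, for example
// 127.0.0.1 - - [06/Jan/2006:12:00:00 -0500] "GET /sendfile.php HTTP/1.0" 200 1048576

/**
 * Return the byte count from one log line, 0 when the size field is "-",
 * or null when the line does not look like Common Log Format.
 */
function clf_bytes_sent($line)
{
    // Quoted request, then a 3-digit status, then the size field.
    if (!preg_match('/"[^"]*" (\d{3}) (\d+|-)/', $line, $m)) {
        return null;
    }
    return $m[2] === '-' ? 0 : (int) $m[2];
}
```

A poller would call this on each new line it reads from the log and add the result to the user's running total.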

Gordon Burditt

> Here is an advanced PHP question. Can anyone think of a way to detect
> the number of bytes written to output when a script is aborted?

What are you going to use that information for?

> I am sending a large file to the client, and I want to record how many
> bytes are actually sent.

Bytes sent != bytes received. Why is there a problem with the
script aborting? Modem drops carrier? Spontaneous Windows reboot?
Browser crash?

> I can detect abort of the script using a
> shutdown handler. In the shutdown handler, I tried ob_get_length, but
> it returns false. I tried to read the server's log file, but it does
> not contain the information until the script fully quits. I tried to
> use fwrite to php://output, and then get the bytes written return value.
> However, if the script aborts in the middle of the write, then bytes
> written is never returned.

If one end of a TCP connection quietly goes away, you might be
able to write 64k more bytes before you realize it has gone away.

> The only thing that worked was writing one byte at a time using
> fread/fwrite. But this also made the processor load 100% and download
> speed very slow (700KB/sec versus 24MB/sec on localhost using wget).

What do you intend to use this for? If you're trying to figure out
where to restart sending the file, it won't be accurate.

Gordon L. Burditt

Jan 6 '06 #2

Shailesh Humbad

Gordon Burditt wrote:

Here is an advanced PHP question. Can anyone think of a way to detect
the number of bytes written to output when a script is aborted?

What are you going to use that information for?
I am sending a large file to the client, and I want to record how many
bytes are actually sent.

Bytes sent != bytes received. Why is there a problem with the
script aborting? Modem drops carrier? Spontaneous Windows reboot?
Browser crash?
I can detect abort of the script using a
shutdown handler. In the shutdown handler, I tried ob_get_length, but
it returns false. I tried to read the server's log file, but it does
not contain the information until the script fully quits. I tried to
use fwrite to php://output, and then get the bytes written return value.
However, if the script aborts in the middle of the write, then bytes
written is never returned.

If one end of a TCP connection quietly goes away, you might be
able to write 64k more bytes before you realize it has gone away.
The only thing that worked was writing one byte at a time using
fread/fwrite. But this also made the processor load 100% and download
speed very slow (700KB/sec versus 24MB/sec on localhost using wget).

What do you intend to use this for? If you're trying to figure out
where to restart sending the file, it won't be accurate.

Gordon L. Burditt

TCP is a reliable transport, meaning that at the application layer, one
always knows exactly how much data the client received, and this is
always equal to how much was successfully sent. I don't care how many
bytes were transferred by TCP in the data link layer.

I don't want to restart sending the file. I also do not care why the
script aborted. Of course, it must be from a network/client abort, not
a server reboot or such, because the script must finish executing. I
only want to be able to track how many bytes were sent to the client,
which equals the value that is eventually written to the server log file.

The reason I need it is because in this system, I want to be able to
show the user how many bytes the server sent them. This will tell them
how much data transfer they have used.

I need the status of bytes sent as soon as possible after the script
completes or aborts. Thanks.

Jan 6 '06 #3

Chung Leong

Shailesh Humbad wrote:

Here is an advanced PHP question. Can anyone think of a way to detect
the number of bytes written to output when a script is aborted?

If you are using Apache and can recompile the SAPI module, you can add
the following function to php_functions.c:

/* {{{ proto integer apache_get_bytes_sent(void)
   Get the number of bytes actually sent */
PHP_FUNCTION(apache_get_bytes_sent)
{
    php_struct *ctx;

    /* per-request context that the SAPI stores in the SAPI globals */
    ctx = SG(server_context);
    /* r is Apache's request record; bytes_sent is its byte counter */
    RETURN_LONG(ctx->r->bytes_sent);
}
/* }}} */

If TCP/IP guaranteed delivery is to be believed, then that should be
the exact number of bytes received (but not necessarily saved) by the
client.

Jan 6 '06 #4
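
If a function like that were compiled in, the script side could guard the call so the same code still runs on a stock build. A small sketch: apache_get_bytes_sent() is the hypothetical function from the patch above and does not exist in standard PHP, and the wrapper name bytes_sent_or_null is also made up here:

```php
<?php
/**
 * Return the server's byte counter when the custom function from the
 * patch is available, or null so the caller can fall back to another
 * method such as reading the access log.
 */
function bytes_sent_or_null()
{
    // apache_get_bytes_sent() is NOT part of stock PHP; it exists only
    // if the SAPI module was rebuilt with the patch shown above.
    if (function_exists('apache_get_bytes_sent')) {
        return apache_get_bytes_sent();
    }
    return null;
}
```

Called from the shutdown handler, this would give the count at abort time without waiting for the log line to appear.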

Gordon Burditt

> TCP is a reliable transport, meaning that at the application layer, one
> always knows exactly how much data the client received, and this is
> always equal to how much was successfully sent.

The above is *NOT* a conventional definition of "reliable transport".
And it's not what TCP tries to implement.

Stdio buffering put on a "reliable transport" as you define it above
makes it unreliable, as a successful fwrite() on a socket may simply
mean that the data has been placed in a buffer on the sender, not
even passed to the OS yet. You also don't know how much data is
buffered by Apache or web proxies. You don't know that the other
end of the TCP connection is on the user's browser.

In a scenario where the communication channel is going to be cut
at some point in time (corresponding to, say, a modem dropping
carrier or network connectivity otherwise going down and staying
down), and no further message traffic is possible, it is impossible
to implement a protocol where the sender and receiver always agree
exactly on the number of bytes received. If you send a packet and
get no answer, you don't know whether the sent packet got lost or
the acknowledgement got lost. You can get the uncertainty down to
one byte by sending single-byte packets all the time. Slow. Wasteful
of bandwidth. Even the Theory of Relativity is relevant here. The
Speed of Data, as well as the Speed of Light, is finite and does
not permit instantaneous communication of information.

> I don't care how many
> bytes were transferred by TCP in the data link layer.
>
> I don't want to restart sending the file. I also do not care why the
> script aborted.

You DO care if the *client* aborted. Just because the browser got the
data from TCP doesn't mean it was safely saved to disk before someone
tripped over the power cord.

> Of course, it must be from a network/client abort, not
> a server reboot or such, because the script must finish executing. I
> only want to be able to track how many bytes were sent to the client,
> which equals the value that is eventually written to the server log file.
>
> The reason I need it is because in this system, I want to be able to
> show the user how many bytes the server sent them. This will tell them
> how much data transfer they have used.

Why would the user care? Unless you're billing them against a quota
or something, which is quite a different problem from being able
to restart a file transfer.

> I need the status of bytes sent as soon as possible after the script
> completes or aborts. Thanks.

It won't happen reliably. You might get something accurate enough
for *quotas*, but not for restarting file transfers. The way things
like FTP do this is get the size of the partially-transferred file
on the client side and start from there.

Gordon L. Burditt

Jan 6 '06 #5

Shailesh Humbad

Chung Leong wrote:

Shailesh Humbad wrote:
Here is an advanced PHP question. Can anyone think of a way to detect
the number of bytes written to output when a script is aborted?

If you are using Apache and can recompile the SAPI module, you can add
the following function to php_functions.c:

/* {{{ proto integer apache_get_bytes_sent(void)
Get the number of bytes actually sent */
PHP_FUNCTION(apache_get_bytes_sent)
{
php_struct *ctx;
ctx = SG(server_context);
RETURN_LONG(ctx->r->bytes_sent);
}
/* }}} */
If TCP/IP guaranteed delivery is to be believed, then that should be
the exact number of bytes received (but not necessarily saved) by the
client.

Here is my hosting provider's configuration:
apache_get_version: Apache/1.3.33 (Debian GNU/Linux)
mod_throttle/3.1.2 mod_ssl/2.8.22 OpenSSL/0.9.7d
php_sapi_name: apache

I am unfamiliar with PHP internals and how they interact with the web
server. Will that only work on Apache 2 and with sapi as
"apache2filter" or "apache2handler"? I don't think I can recompile the
SAPI module, but I could build my own module and have my hosting
provider install it. Is it possible to add such a function in a custom
module? If it is too hard, then I will just go the low-tech route
(polling the log file).

Thanks.

Jan 6 '06 #6

Shailesh Humbad

Gordon Burditt wrote:

TCP is a reliable transport, meaning that at the application layer, one
always knows exactly how much data the client received, and this is
always equal to how much was successfully sent.

The above is *NOT* a conventional definition of "reliable transport".
And it's not what TCP tries to implement.

Stdio buffering put on a "reliable transport" as you define it above
makes it unreliable, as a successful fwrite() on a socket may simply
mean that the data has been placed in a buffer on the sender, not
even passed to the OS yet. You also don't know how much data is
buffered by Apache or web proxies. You don't know that the other
end of the TCP connection is on the user's browser.

In a scenario where the communication channel is going to be cut
at some point in time (corresponding to, say, a modem dropping
carrier or network connectivity otherwise going down and staying
down), and no further message traffic is possible, it is impossible
to implement a protocol where the sender and receiver always agree
exactly on the number of bytes received. If you send a packet and
get no answer, you don't know whether the sent packet got lost or
the acknowledgement got lost. You can get the uncertainty down to
one byte by sending single-byte packets all the time. Slow. Wasteful
of bandwidth. Even the Theory of Relativity is relevant here. The
Speed of Data, as well as the Speed of Light, is finite and does
not permit instantaneous communication of information.
I don't care how many
bytes were transferred by TCP in the data link layer.

I don't want to restart sending the file. I also do not care why the
script aborted.

You DO care if the *client* aborted. Just because the browser got the
data from TCP doesn't mean it was safely saved to disk before someone
tripped over the power cord.
Of course, it must be from a network/client abort, not
a server reboot or such, because the script must finish executing. I
only want to be able to track how many bytes were sent to the client,
which equals the value that is eventually written to the server log file.

The reason I need it is because in this system, I want to be able to
show the user how many bytes the server sent them. This will tell them
how much data transfer they have used.

Why would the user care? Unless you're billing them against a quota
or something, which is quite a different problem from being able
to restart a file transfer.
I need the status of bytes sent as soon as possible after the script
completes or aborts. Thanks.

It won't happen reliably. You might get something accurate enough
for *quotas*, but not for restarting file transfers. The way things
like FTP do this is get the size of the partially-transferred file
on the client side and start from there.

Gordon L. Burditt

I don't care how much data the client actually saved, only how much was
transferred. Yes, my eventual aim is to bill against a quota. To solve
the file restart problem, I can implement HTTP range handling later.

"Reliable Delivery - Once a connection has been established, TCP
guarantees that data is delivered in exactly the same order it was sent,
with no loss, and no duplication. If a failure prevents reliable
delivery, the sender is informed.", Internetworking with TCP/IP Vol.
III, p. 103

Jan 6 '06 #7
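
The HTTP range handling mentioned above starts with parsing the client's Range header. A rough sketch for the single-range case only; the function name parse_byte_range is invented for illustration, and multi-range requests are simply rejected here rather than served:

```php
<?php
/**
 * Parse a single-range header such as "bytes=500-999", "bytes=500-" or
 * "bytes=-500" against a file of $size bytes.
 * Returns array($start, $end) with inclusive offsets, or null when the
 * header is malformed, multi-range, or cannot be satisfied.
 */
function parse_byte_range($header, $size)
{
    if (!preg_match('/^bytes=(\d*)-(\d*)$/', trim($header), $m)) {
        return null;
    }
    if ($m[1] === '' && $m[2] === '') {
        return null; // "bytes=-" carries no information
    }
    if ($m[1] === '') {
        // Suffix form: the last N bytes of the file.
        $length = (int) $m[2];
        if ($length === 0) {
            return null;
        }
        $start = max(0, $size - $length);
        $end = $size - 1;
    } else {
        $start = (int) $m[1];
        // An open or oversized end is clamped to the last byte.
        $end = ($m[2] === '') ? $size - 1 : min((int) $m[2], $size - 1);
    }
    if ($start > $end) {
        return null; // also covers a start at or past the end of the file
    }
    return array($start, $end);
}
```

For quota purposes the script would then count only the bytes between $start and $end that it actually hands to the output, rather than the full file size.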

Gordon Burditt

>>> TCP is a reliable transport, meaning that at the application layer, one
>>> always knows exactly how much data the client received, and this is
>>> always equal to how much was successfully sent.
>>
>> The above is *NOT* a conventional definition of "reliable transport".
>> And it's not what TCP tries to implement.
>>
>> Stdio buffering put on a "reliable transport" as you define it above
>> makes it unreliable, as a successful fwrite() on a socket may simply
>> mean that the data has been placed in a buffer on the sender, not
>> even passed to the OS yet. You also don't know how much data is
>> buffered by Apache or web proxies. You don't know that the other
>> end of the TCP connection is on the user's browser.
>>
>> In a scenario where the communication channel is going to be cut
>> at some point in time (corresponding to, say, a modem dropping
>> carrier or network connectivity otherwise going down and staying
>> down), and no further message traffic is possible, it is impossible
>> to implement a protocol where the sender and receiver always agree
>> exactly on the number of bytes received. If you send a packet and
>> get no answer, you don't know whether the sent packet got lost or
>> the acknowledgement got lost. You can get the uncertainty down to
>> one byte by sending single-byte packets all the time. Slow. Wasteful
>> of bandwidth. Even the Theory of Relativity is relevant here. The
>> Speed of Data, as well as the Speed of Light, is finite and does
>> not permit instantaneous communication of information.
>>
>>> I don't care how many
>>> bytes were transferred by TCP in the data link layer.
>>>
>>> I don't want to restart sending the file. I also do not care why the
>>> script aborted.
>>
>> You DO care if the *client* aborted. Just because the browser got the
>> data from TCP doesn't mean it was safely saved to disk before someone
>> tripped over the power cord.
>>
>>> Of course, it must be from a network/client abort, not
>>> a server reboot or such, because the script must finish executing. I
>>> only want to be able to track how many bytes were sent to the client,
>>> which equals the value that is eventually written to the server log file.
>>>
>>> The reason I need it is because in this system, I want to be able to
>>> show the user how many bytes the server sent them. This will tell them
>>> how much data transfer they have used.
>>
>> Why would the user care? Unless you're billing them against a quota
>> or something, which is quite a different problem from being able
>> to restart a file transfer.
>>
>>> I need the status of bytes sent as soon as possible after the script
>>> completes or aborts. Thanks.
>>
>> It won't happen reliably. You might get something accurate enough
>> for *quotas*, but not for restarting file transfers. The way things
>> like FTP do this is get the size of the partially-transferred file
>> on the client side and start from there.
>>
>> Gordon L. Burditt
>
> I don't care how much data the client actually saved, only how much was
> transferred. Yes, my eventual aim is to bill against a quota.

Why do you care about getting these numbers exact? You don't
seem to care about what is transmitted at the data link layer,
which is probably how your provider will bill YOU if your
agreement with them involves traffic-sensitive costs.

> "Reliable Delivery - Once a connection has been established, TCP
> guarantees that data is delivered in exactly the same order it was sent,
> with no loss, and no duplication. If a failure prevents reliable
> delivery, the sender is informed.", Internetworking with TCP/IP Vol.
> III, p. 103

This says nothing about knowing HOW MUCH was delivered in the case
of a failure. If the session fails, you know not all of it got
delivered. You also know that they didn't get any more than you
sent. When a write() on a socket returns, you don't know that ANY
of it got delivered (yet). A failure may be reported later. Much
later. The above quote does not say "If a failure prevents reliable
delivery, the sender is informed instantaneously with an itemized
report of how much was delivered".

Gordon L. Burditt

Jan 6 '06 #8

Peter Fox

Following on from Shailesh Humbad's message. . .

8><

I don't care how much data the client actually saved, only how much was
transferred. Yes, my eventual aim is to bill against a quota. To solve
the file restart problem, I can implement HTTP range handling later.

"Reliable Delivery - Once a connection has been established, TCP
guarantees that data is delivered in exactly the same order it was sent,
with no loss, and no duplication. If a failure prevents reliable
delivery, the sender is informed.", Internetworking with TCP/IP Vol.
III, p. 103

Gordon, being the sort of person who spots the cracks down which bad
things happen that most people miss, is suggesting that what you sent
will not be the same as what was received unless completed and there
will be buffers of various sorts between your code and your cable to the
Internet and then some. In this he is trying to be helpful because the
effort required to engineer a byte-accurate solution is going to be a
great deal more than to obtain an estimate.

Here is a 'free' solution: If a file transfer fails then charge for
half the full size. Either there are very few failures in practice in
which case the issue is not very important in the scheme of things; or
it happens all the time, in which case this will work out fine
statistically. [But you will lose all your customers for reasons of crap
service.]
--
PETER FOX Not the same since the bolt company screwed up
pe******@eminent.demon.co.uk.not.this.bit.no.html
2 Tees Close, Witham, Essex.
Gravity beer in Essex <http://www.eminent.demon.co.uk>

Jan 6 '06 #9

Shailesh Humbad

Gordon Burditt wrote:

TCP is a reliable transport, meaning that at the application layer, one
always knows exactly how much data the client received, and this is
always equal to how much was successfully sent.
The above is *NOT* a conventional definition of "reliable transport".
And it's not what TCP tries to implement.

Stdio buffering put on a "reliable transport" as you define it above
makes it unreliable, as a successful fwrite() on a socket may simply
mean that the data has been placed in a buffer on the sender, not
even passed to the OS yet. You also don't know how much data is
buffered by Apache or web proxies. You don't know that the other
end of the TCP connection is on the user's browser.

In a scenario where the communication channel is going to be cut
at some point in time (corresponding to, say, a modem dropping
carrier or network connectivity otherwise going down and staying
down), and no further message traffic is possible, it is impossible
to implement a protocol where the sender and receiver always agree
exactly on the number of bytes received. If you send a packet and
get no answer, you don't know whether the sent packet got lost or
the acknowledgement got lost. You can get the uncertainty down to
one byte by sending single-byte packets all the time. Slow. Wasteful
of bandwidth. Even the Theory of Relativity is relevant here. The
Speed of Data, as well as the Speed of Light, is finite and does
not permit instantaneous communication of information.

I don't care how many
bytes were transferred by TCP in the data link layer.
I don't want to restart sending the file. I also do not care why the
script aborted.
You DO care if the *client* aborted. Just because the browser got the
data from TCP doesn't mean it was safely saved to disk before someone
tripped over the power cord.

Of course, it must be from a network/client abort, not
a server reboot or such, because the script must finish executing. I
only want to be able to track how many bytes were sent to the client,
which equals the value that is eventually written to the server log file.

The reason I need it is because in this system, I want to be able to
show the user how many bytes the server sent them. This will tell them
how much data transfer they have used.
Why would the user care? Unless you're billing them against a quota
or something, which is quite a different problem from being able
to restart a file transfer.

I need the status of bytes sent as soon as possible after the script
completes or aborts. Thanks.
It won't happen reliably. You might get something accurate enough
for *quotas*, but not for restarting file transfers. The way things
like FTP do this is get the size of the partially-transferred file
on the client side and start from there.

Gordon L. Burditt

I don't care how much data the client actually saved, only how much was
transferred. Yes, my eventual aim is to bill against a quota.

Why do you care about getting these numbers exact? You don't
seem to care about what is transmitted at the data link layer,
which is probably how your provider will bill YOU if your
agreement with them involves traffic-sensitive costs.
"Reliable Delivery - Once a connection has been established, TCP
guarantees that data is delivered in exactly the same order it was sent,
with no loss, and no duplication. If a failure prevents reliable
delivery, the sender is informed.", Internetworking with TCP/IP Vol.
III, p. 103

This says nothing about knowing HOW MUCH was delivered in the case
of a failure. If the session fails, you know not all of it got
delivered. You also know that they didn't get any more than you
sent. When a write() on a socket returns, you don't know that ANY
of it got delivered (yet). A failure may be reported later. Much
later. The above quote does not say "If a failure prevents reliable
delivery, the sender is informed instantaneously with an itemized
report of how much was delivered".

Gordon L. Burditt

You have good points, but I just don't need that much resolution or
accuracy. The socket will time out in 30 seconds if there is a problem
sending data. The bytes returned by the write call, even if known 30
seconds later, is all I need, and I know somewhere internally in PHP it
is being recorded.

When socket write returns (if being called in blocking-mode), it returns
the number of bytes written successfully to the socket. This is the
number of bytes guaranteed to be delivered to the client's receiving
socket (though the client may not have written it all to disk or other
issues may have occurred). The reason why TCP can know the number of
bytes sent with certainty is because every sent packet is replied to
with an acknowledgment (ACK) packet.

For my purposes, this number is going to be a decent approximation of
actual bandwidth used, and I realize it's not going to be exact. Thanks.

Jan 6 '06 #10

Gordon Burditt

>> Why do you care about getting these numbers exact? You don't
>> seem to care about what is transmitted at the data link layer,
>> which is probably how your provider will bill YOU if your
>> agreement with them involves traffic-sensitive costs.
>>
>>> "Reliable Delivery - Once a connection has been established, TCP
>>> guarantees that data is delivered in exactly the same order it was sent,
>>> with no loss, and no duplication. If a failure prevents reliable
>>> delivery, the sender is informed.", Internetworking with TCP/IP Vol.
>>> III, p. 103
>>
>> This says nothing about knowing HOW MUCH was delivered in the case
>> of a failure. If the session fails, you know not all of it got
>> delivered. You also know that they didn't get any more than you
>> sent. When a write() on a socket returns, you don't know that ANY
>> of it got delivered (yet). A failure may be reported later. Much
>> later. The above quote does not say "If a failure prevents reliable
>> delivery, the sender is informed instantaneously with an itemized
>> report of how much was delivered".
>>
>> Gordon L. Burditt
>
> You have good points, but I just don't need that much resolution or
> accuracy. The socket will time out in 30 seconds if there is a problem
> sending data. The bytes returned by the write call, even if known 30
> seconds later, is all I need, and I know somewhere internally in PHP it
> is being recorded.
>
> When socket write returns (if being called in blocking-mode), it returns
> the number of bytes written successfully to the socket.

This has absolutely nothing to do with the number of bytes received
by the other side. The write() call (*IN BLOCKING MODE*) may return
before even one packet has been assembled to be sent. The write()
call will block if you run out of buffering. If it blocks, and
then returns, it still might not have even tried to send any data
from the last dozen or so write calls prior to the one that just
returned.

> This is the
> number of bytes guaranteed to be delivered to the client's receiving
> socket (though the client may not have written it all to disk or other
> issues may have occurred).

It is guaranteed that that data will be delivered to the client
*EVENTUALLY* or you will *EVENTUALLY* get an error. There is no
guarantee whatever that any of that data has been delivered (or even
attempted to be sent) at the time the write() returns.

> The reason why TCP can know the number of
> bytes sent with certainty is because every sent packet is replied to
> with an acknowledgment (ACK) packet.

And the write() call *IN BLOCKING MODE* does not wait for such an
acknowledgement to be received. It doesn't even have to wait for
even one packet to be sent. It would be horribly inefficient if
you couldn't overlap, say, disk reads and network writes in a
single-threaded process that is sending a file down a socket, so
writes *DO NOT* wait until the client has received the data written.
Not even blocking writes.

> For my purposes, this number is going to be a decent approximation of
> actual bandwidth used, and I realize it's not going to be exact. Thanks.

If you're willing to put up with, say, an extra 32k or 64k sent but
not received when the modem drops carrier, you'll get a decent
approximation. If you're expecting much better than that, you won't.

Gordon L. Burditt

Jan 6 '06 #11
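
The point that a successful write says nothing about what the peer has received can be seen without a network at all. The following is only an illustration, assuming a platform with Unix-domain sockets (a local socket pair stands in for the TCP connection, and the function name bytes_accepted_without_reader is made up): the "client" end never reads, yet the write still reports success because the operating system buffered the data.

```php
<?php
/**
 * Write $n bytes to one end of a connected socket pair whose other end
 * never reads, and return what fwrite() reports.
 */
function bytes_accepted_without_reader($n)
{
    // Assumption: Unix-domain sockets are available; on Windows,
    // STREAM_PF_INET would be needed instead of STREAM_PF_UNIX.
    $pair = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
    if ($pair === false) {
        return false;
    }
    list($sender, $receiver) = $pair;

    // The kernel buffers the data, so the write succeeds at once even
    // though the peer has not consumed a single byte.
    $accepted = fwrite($sender, str_repeat('x', $n));

    fclose($sender);
    fclose($receiver);
    return $accepted;
}

echo bytes_accepted_without_reader(1024), " bytes accepted\n";
```

With a small payload such as 1024 bytes the call should report the full amount as written, even though nothing was ever read on the other side. A real TCP socket behaves the same way until its send buffer fills.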

Shailesh Humbad

Gordon Burditt wrote:

Why do you care about getting these numbers exact? You don't
seem to care about what is transmitted at the data link layer,
which is probably how your provider will bill YOU if your
agreement with them involves traffic-sensitive costs.

"Reliable Delivery - Once a connection has been established, TCP
guarantees that data is delivered in exactly the same order it was sent,
with no loss, and no duplication. If a failure prevents reliable
delivery, the sender is informed.", Internetworking with TCP/IP Vol.
III, p. 103
This says nothing about knowing HOW MUCH was delivered in the case
of a failure. If the session fails, you know not all of it got
delivered. You also know that they didn't get any more than you
sent. When a write() on a socket returns, you don't know that ANY
of it got delivered (yet). A failure may be reported later. Much
later. The above quote does not say "If a failure prevents reliable
delivery, the sender is informed instantaneously with an itemized
report of how much was delivered".

Gordon L. Burditt

You have good points, but I just don't need that much resolution or
accuracy. The socket will time out in 30 seconds if there is a problem
sending data. The bytes returned by the write call, even if known 30
seconds later, is all I need, and I know somewhere internally in PHP it
is being recorded.

When socket write returns (if being called in blocking-mode), it returns
the number of bytes written successfully to the socket.

This has absolutely nothing to do with the number of bytes received
by the other side. The write() call (*IN BLOCKING MODE*) may return
before even one packet has been assembled to be sent. The write()
call will block if you run out of buffering. If it blocks, and
then returns, it still might not have even tried to send any data
from the last dozen or so write calls prior to the one that just
returned.
This is the
number of bytes guaranteed to be delivered to the client's receiving
socket (though the client may not have written it all to disk or other
issues may have occurred).

It is guaranteed that that data will be delivered to the client
*EVENTUALLY* or you will *EVENTUALLY* get an error. There is no
guarantee whatever that any of that data has been delivered (or even
attempted to be sent) at the time the write() returns.
The reason why TCP can know the number of
bytes sent with certainty is because every sent packet is replied to
with an acknowledgment (ACK) packet.

And the write() call *IN BLOCKING MODE* does not wait for such an
acknowledgement to be received. It doesn't even have to wait for
even one packet to be sent. It would be horribly inefficient if
you couldn't overlap, say, disk reads and network writes in a
single-threaded process that is sending a file down a socket, so
writes *DO NOT* wait until the client has received the data written.
Not even blocking writes.
For my purposes, this number is going to be a decent approximation of
actual bandwidth used, and I realize it's not going to be exact. Thanks.

If you're willing to put up with, say, an extra 32k or 64k sent but
not received when the modem drops carrier, you'll get a decent
approximation. If you're expecting much better than that, you won't.

Gordon L. Burditt

Okay, I think I see what you're saying. The "write" actually writes
first into internal buffers of the operating system, and so the number
of bytes it returns is not necessarily the number of bytes delivered.
"Write" only blocks if the internal buffers are full. So the number of
bytes returned by "write" (or "send") is the number of bytes written
into the TCP stack, but not the number of bytes that received ACKs
indicating delivery.

In any case, an extra 32-64K of bandwidth used on failed connections is
acceptable. I'll have to figure out a way to prevent this from becoming
an issue.

Question, could someone take advantage of this by writing a network
driver that requests data, but then doesn't send ACK replies after the
initial request is completed? My server's TCP stack would happily send
out 32-64KB of data, waiting for the ACKs. The nefarious recipient
would receive and save this data, drop the connection, and then request
the next range of bytes. Then, they would be able to download the data
without it registering as bandwidth-used on my server. Is this possible?

Jan 7 '06 #12

Chung Leong

Well, as it's not your own server you can't really modify the binary.
It doesn't sound like you need too accurate a number in any event. I
would just do something like this:

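<?php

ignore_user_abort(true);      // keep running after the client disconnects
$f = fopen($filename, "rb");  // $filename must be set by the caller
$sent = 0;
// Loop on feof() rather than on the chunk itself: a chunk consisting
// of the single character "0" is falsy and would end the loop early.
while (!feof($f)) {
    $chunk = fread($f, 1024);
    if ($chunk === false || $chunk === '') {
        break;
    }
    echo $chunk;
    flush();
    if (!connection_aborted()) {
        $sent += strlen($chunk);
    }
    else {
        /* record failure here */
        break;
    }
}
fclose($f);

?>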

Jan 7 '06 #13

Gordon Burditt

>> It is guaranteed that that data will be delivered to the client

*EVENTUALLY* or you will *EVENTUALLY* get an error. There is no
guarantee whatever that any of that data has been delivered (or even
attempted to be sent) at the time the write() returns.
The reason why TCP can know the number of
bytes sent with certainty is because every sent packet is replied to
with an acknowledgment (ACK) packet.
And the write() call *IN BLOCKING MODE* does not wait for such an
acknowledgement to be received. It doesn't even have to wait for
even one packet to be sent. It would be horribly inefficient if
you couldn't overlap, say, disk reads and network writes in a
single-threaded process that is sending a file down a socket, so
writes *DO NOT* wait until the client has received the data written.
Not even blocking writes.
For my purposes, this number is going to be a decent approximation of
actual bandwidth used, and I realize it's not going to be exact. Thanks.

If you're willing to put up with, say, an extra 32k or 64k sent but
not received when the modem drops carrier, you'll get a decent
approximation. If you're expecting much better than that, you won't.

Gordon L. Burditt

> Okay, I think I see what you're saying. The "write" actually writes
> first into internal buffers of the operating system, and so the number
> of bytes it returns is not necessarily the number of bytes delivered.
> "Write" only blocks if the internal buffers are full. So the number of
> bytes returned by "write" (or "send") is the number of bytes written
> into the TCP stack, but not the number of bytes that received ACKS
> indicating delivery.

Yes.

And remember, it is impossible for the end sending the file to tell
whether a packet it sent, or the ack for it, got dropped. Either
way, it looks just the same. There's always one packet full of
uncertainty, no matter what you do to the protocol. And one byte
packets are horribly inefficient.
> In any case, an extra 32-64K of bandwidth used on failed connections is
> acceptable. I'll have to figure out a way to prevent this from becoming
> an issue.
*HOW* would this "become an issue"? Please explain *WHY* you want
to bill the way you have said you want to.

Someone needs to get better networking facilities so the problem
of aborted file transfers doesn't happen so often?

Experiment some time with this. Download something, then pull the
network cable or phone cable. Look at the difference in bytes saved
vs. what you recorded as bytes sent. Is it consistent (sort of)?
Perhaps you could approximate things with "if it failed, charge
bytes sent minus X".
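
The "bytes sent minus X" rule could look something like this. It is only a sketch: the function name is made up, and the 64 KB default for X is just the upper figure mentioned in this thread, not a measured value. You would pick X from your own experiments.

```php
<?php
// Hypothetical helper: the name and the 64 KB default are illustrative only.
// $bytesSent is what the script counted; $slack is the "X" to subtract
// when the transfer did not complete.
function billableBytes($bytesSent, $aborted, $slack = 65536)
{
    if (!$aborted) {
        return $bytesSent; // completed transfers are charged in full
    }
    // Never charge a negative amount for a transfer that died early.
    return max(0, $bytesSent - $slack);
}

echo billableBytes(5000000, false), "\n"; // 5000000
echo billableBytes(5000000, true), "\n";  // 4934464
echo billableBytes(20000, true), "\n";    // 0
```
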
> Question, could someone take advantage of this by writing a network
> driver that requests data, but then doesn't send ACK replies after the
> initial request is completed?
Unless you're charging, and people are actually PAYING, more than
$5 a *byte* for this stuff, I think you have a completely overblown
idea of how much your data is worth. Perhaps you ought to be
delivering it by limousine accompanied by armed thugs who collect
payment before turning over the data.
> My server's TCP stack would happily send
> out 32-64KB of data, waiting for the ACKs. The nefarious recipient
> would receive and save this data, drop the connection, and then request
> the next range of bytes. Then, they would be able to download the data
> without it registering as bandwidth-used on my server. Is this possible?

Yes. Nobody would do it if you didn't blab on the net that this
was how you were charging. To be practical, someone would have to
arrange for their TCP stack to do it only on PARTICULAR connections
as doing it on all of them would really screw up trying to do
anything on the net. (But I can see how such a hack would be
possible, perhaps with a custom ioctl() call to turn on such weird
treatment.)

Now, why is your data worth going to that effort to steal? You can
also keep counts of failed and successful transfers, and someone
trying the same 5 MB file repeatedly will probably stick out like
a sore thumb in the statistics. Then you look closer and maybe
shut off the account.
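
A sketch of that kind of bookkeeping, assuming a made-up log format (one array per attempt with 'user', 'file' and 'completed' keys) and an arbitrary threshold. None of these names come from a real library.

```php
<?php
// Hypothetical log format: one entry per transfer attempt.
// Returns "user|file" => failure count for pairs over the threshold.
function suspiciousDownloads(array $attempts, $maxFailures = 3)
{
    $failures = array();
    foreach ($attempts as $attempt) {
        if (!$attempt['completed']) {
            $key = $attempt['user'] . '|' . $attempt['file'];
            $failures[$key] = isset($failures[$key]) ? $failures[$key] + 1 : 1;
        }
    }

    $flagged = array();
    foreach ($failures as $key => $count) {
        if ($count > $maxFailures) {
            $flagged[$key] = $count; // worth a closer look by a human
        }
    }
    return $flagged;
}

// Example data: alice fails the same file five times, bob fails once.
$attempts = array();
for ($i = 0; $i < 5; $i++) {
    $attempts[] = array('user' => 'alice', 'file' => 'big.zip', 'completed' => false);
}
$attempts[] = array('user' => 'bob', 'file' => 'big.zip', 'completed' => false);
$attempts[] = array('user' => 'bob', 'file' => 'big.zip', 'completed' => true);

print_r(suspiciousDownloads($attempts));
```

This only flags accounts for review. Whether to shut one off is still a judgment call.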

You cannot request a range of bytes with TCP. You can with HTTP, but
that requires a Range request header, which PHP can see and refuse to
honor if you're worried about this. And the simplest way to deal
with this is to charge by the byte SENT. Then there's no incentive to
do that. In fact, there's an incentive to make connections work
reliably the first time.
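
For the HTTP side, a minimal sketch of spotting a range request from PHP. The helper name is made up. It relies on the web server exposing the client's Range header as $_SERVER['HTTP_RANGE'], which is how request headers normally reach PHP, but check your own setup.

```php
<?php
// Hypothetical helper: pass in $_SERVER when handling a real request.
function hasRangeHeader(array $server)
{
    return isset($server['HTTP_RANGE']) && $server['HTTP_RANGE'] !== '';
}

// Stand-alone demonstration with fake request data.
var_dump(hasRangeHeader(array('HTTP_RANGE' => 'bytes=65536-')));
var_dump(hasRangeHeader(array('REQUEST_METHOD' => 'GET')));

// Usage sketch inside a download script (left as comments so this runs alone):
// header('Accept-Ranges: none');       // advertise that ranges are not supported
// if (hasRangeHeader($_SERVER)) {
//     /* log it, then ignore the range and send the whole file from byte 0 */
// }
```

A server is allowed to ignore Range and answer with the full file, so nothing breaks for ordinary clients. Resumed downloads just start over.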

A lot of people handle this by selling a particular file, at which
time the customer is charged for it, and he then gets as many tries
as he wants to download the file over the next few days (or perhaps
forever). There usually isn't much reason to keep downloading the
file over and over (unless they are having trouble), since it doesn't
get updated. The user has to log in, and posting his username/password
on the internet can be detected by massive use from all over.

Gordon L. Burditt

Jan 9 '06 #14
