Bytes | Software Development & Data Engineering Community

downloading a single file using multiple threads

hi

i'm downloading a single file using multiple threads...
how can i specify a particular range of bytes alone from a single
large file... for example say if i need only bytes ranging from
500000 to 3200000 of a file whose size is say 20MB...
how do i request a download which starts directly at 500000th byte...
thank u
cheers

Mar 28 '07 #1
<ke****************@gmail.com> wrote:
i'm downloading a single file using multiple threads...
how can i specify a particular range of bytes alone from a single
large file... for example say if i need only bytes ranging from
500000 to 3200000 of a file whose size is say 20MB...
how do i request a download which starts directly at 500000th byte...
That completely depends on what protocol you're using. Could you give
us more information?

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Mar 28 '07 #2
<ke****************@gmail.com> wrote in message
news:11**********************@n76g2000hsh.googlegroups.com...
hi

i'm downloading a single file using multiple threads...
how can i specify a particular range of bytes alone from a single
large file... for example say if i need only bytes ranging from
500000 to 3200000 of a file whose size is say 20MB...
how do i request a download which starts directly at 500000th byte...
thank u
cheers

Simple answer: you can't *download* a file in chunks, unless you implement a client/server
protocol for this.
If by *download* you mean *reading* a file from a "fileserver" and saving a copy to the
local filesystem, then you can use the System.IO namespace classes.
The question is - what makes you think you need multiple threads to *download* a file? If
you think you can speed up the download by doing this, then you are wrong. The bottleneck
will always be the network, so if you download the file in one chunk, you'll get the maximum
throughput; introducing multiple threads will actually slow down the whole process.

Willy.

Mar 28 '07 #3
On Mar 28, 10:30 am, "Willy Denoyette [MVP]"
<willy.denoye...@telenet.be> wrote:
<keerthyragavend...@gmail.com> wrote in message

news:11**********************@n76g2000hsh.googlegroups.com...
hi
i'm downloading a single file using multiple threads...
how can i specify a particular range of bytes alone from a single
large file... for example say if i need only bytes ranging from
500000 to 3200000 of a file whose size is say 20MB...
how do i request a download which starts directly at 500000th byte...
thank u
cheers

Simple answer: you can't *download* a file in chunks, unless you implement a client/server
protocol for this.
Note that there are already protocols which *do* support this, including
FTP and HTTP, both optionally as far as the server is concerned.
The question is - what makes you think you need multiple threads to *download* a file? If
you think you can speed-up the download by doing this, then you are wrong. The bottleneck
will always be the network, so if you download the file in one chunk, you'll get the maximum
throughput, introducing multiple threads will actually slow down the whole process.
That depends - some servers may throttle per connection, at which
point it may make sense to have multiple connections (although
somewhat naughty). Also, in a more advanced way, if the same file is
available through multiple mirrors, it may make sense to get different
bits from different mirrors.

This is a fairly common thing to do - a lot of web "download managers"
do it.

Jon

Mar 28 '07 #4
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:11**********************@l77g2000hsb.googlegroups.com...
On Mar 28, 10:30 am, "Willy Denoyette [MVP]"
<willy.denoye...@telenet.be> wrote:
><keerthyragavend...@gmail.com> wrote in message

news:11**********************@n76g2000hsh.googlegroups.com...
hi
i'm downloading a single file using multiple threads...
how can i specify a particular range of bytes alone from a single
large file... for example say if i need only bytes ranging from
500000 to 3200000 of a file whose size is say 20MB...
how do i request a download which starts directly at 500000th byte...
thank u
cheers

Simple answer: you can't *download* a file in chunks, unless you implement a
client/server
protocol for this.

Note that there are already protocols which *do* support this, including
FTP and HTTP, both optionally as far as the server is concerned.
A *single file* download from an FTP or HTTP server? How do you indicate what chunk of the
file you want when, say, using FTP? As far as I know this is part of neither the FTP nor the
HTTP protocol.

>The question is - what makes you think you need multiple threads to *download* a file? If
you think you can speed-up the download by doing this, then you are wrong. The bottleneck
will always be the network, so if you download the file in one chunk, you'll get the
maximum
throughput, introducing multiple threads will actually slow down the whole process.

That depends - some servers may throttle per connection, at which
point it may make sense to have multiple connections (although
somewhat naughty). Also, in a more advanced way, if the same file is
available through multiple mirrors, it may make sense to get different
bits from different mirrors.

This is a fairly common thing to do - a lot of web "download managers"
do it.
Per file? I mean, retrieve a *single file* from multiple servers in chunks? Say the first 10
MB from server A, the next 2 MB from server B, the next 3 MB from server C? Never heard of
something like this.
All I know is that there are download managers that download file A from server A, file B from
server B etc., in case of multi-file downloads.

Willy.

Mar 28 '07 #5
On Wed, 28 Mar 2007 08:51:09 -0700, Willy Denoyette [MVP]
<wi*************@telenet.be> wrote:
A *single file* download from an FTP or HTTP server? How do you indicate
what chunk of the file you want when, say, using FTP? As far as I know
this is part of neither the FTP nor the HTTP protocol.
See, for example, the "Range" field in HTTP.

http://www.w3.org/Protocols/rfc2616/...tml#sec14.35.1

(I'm guessing that the above link may actually answer the question for the
OP)
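To make that concrete for the OP's numbers: the client sends a Range header naming an inclusive byte range, and a server that supports it answers "206 Partial Content". A minimal sketch of the arithmetic (Python just for brevity; the helper names are made up - in .NET the same header can be set via HttpWebRequest.AddRange):

```python
# Sketch: the Range header for bytes 500000..3200000 of a 20 MB file.
# Both endpoints are inclusive per RFC 2616 section 14.35.1.

def build_range_header(first_byte, last_byte):
    """Value for the request's Range header."""
    return "bytes=%d-%d" % (first_byte, last_byte)

def range_length(first_byte, last_byte):
    """How many bytes a 206 Partial Content response should carry."""
    return last_byte - first_byte + 1

# build_range_header(500000, 3200000) -> "bytes=500000-3200000"
# A server that honours the range replies with, e.g.:
#   Content-Range: bytes 500000-3200000/20971520
```

Mind the off-by-one: that range is 2,700,001 bytes, not 2,700,000, because both ends are inclusive.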
Per file? I mean, retrieve a *single file* from multiple servers in
chunks? Say the first 10 MB from server A, the next 2 MB from server B,
the next 3 MB from server C? Never heard of something like this.
Well, it does happen. Most larger servers monitor the client IP address
and refuse additional connections, and with smaller servers doing this
sort of thing is considered anti-social, since it intentionally bypasses
per-client throttling that has been set up. But it's true that there are
"download manager" programs that do exactly what Jon is referring to.
All I know is that there are download managers that download file A from
server A, file B from server B etc., in case of multi-file downloads.
Note that there's nothing to stop a download manager from retrieving
different parts of the same file from multiple servers as well. Assuming
an identical file stored on various mirrors, it doesn't matter which
mirror a given part of the file comes from.

Pete
Mar 28 '07 #6
"Peter Duniho" <Np*********@nnowslpianmk.com> wrote in message
news:op***************@petes-computer.local...
On Wed, 28 Mar 2007 08:51:09 -0700, Willy Denoyette [MVP] <wi*************@telenet.be>
wrote:
>A *single file* download from an FTP or HTTP server? How do you indicate what chunk of
the file you want when, say, using FTP? As far as I know this is part of neither the FTP
nor the HTTP protocol.

See, for example, the "Range" field in HTTP.

http://www.w3.org/Protocols/rfc2616/...tml#sec14.35.1
I know that HTTP 1.1 supports range requests, but this is not exactly my point.
The multi-part requests in HTTP 1.1 are meant to request (for very specific application
purposes) a single part or multiple parts in a single request, but you can't (AFAIK)
request multiple parts in parallel from multiple client threads.
(I'm guessing that the above link may actually answer the question for the OP)
>Per file? I mean, retrieve a *single file* from multiple servers in chunks? Say the
first 10 MB from server A, the next 2 MB from server B, the next 3 MB from server C?
Never heard of something like this.

Well, it does happen. Most larger servers monitor the client IP address and refuse
additional connections, and with smaller servers doing this sort of thing is considered
anti-social, since it intentionally bypasses per-client throttling that has been set up.
But it's true that there are "download manager" programs that do exactly what Jon is
referring to.
>All I know is that there are download managers that download file A from server A, file B
from server B etc., in case of multi-file downloads.

Note that there's nothing to stop a download manager from retrieving different parts of
the same file from multiple servers as well. Assuming an identical file stored on
various mirrors, it doesn't matter which mirror a given part of the file comes from.
That's true, but these will use dedicated protocols, don't they? The clients also should have
multiple NICs installed, connected over segmented LANs and/or routers, to take some speed
advantage of the parallelism.

Willy.

Mar 28 '07 #7
On Wed, 28 Mar 2007 09:59:21 -0700, Willy Denoyette [MVP]
<wi*************@telenet.be> wrote:
I know that HTTP 1.1 supports range requests, but this is not exactly my
point.
The multi-part requests in HTTP 1.1 are meant to request (for very specific
application purposes) a single part or multiple parts in a single
request, but you can't (AFAIK) request multiple parts in parallel from
multiple client threads.
Well, I've never actually tried it myself. So I will refrain from
claiming that I know firsthand that this generally works. However, the
so-called "download managers" all claim to work with HTTP servers. I have
no reason to doubt them, and based on what I know about HTTP (which is an
extremely simple protocol) it seems likely.

There is no theoretical reason why it wouldn't work. There's nothing
about HTTP that requires servers to restrict their communications to a
given client to a single connection, and there's nothing about HTTP that
stipulates that an HTTP server needs to coordinate communications on
independent connections. If on one connection the client asks for the
first megabyte and on a second connection the same client asks for the
second megabyte, then if the server is capable of servicing both requests
at the same time, there's no reason the client can't wind up receiving
both the first and second megabytes in parallel.
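The carving-up described here is plain arithmetic; a sketch (Python for brevity, helper name invented) of splitting a known file size into one contiguous inclusive byte range per connection:

```python
# Sketch: split a file of known size into n contiguous inclusive byte
# ranges, one per connection/thread, with no gaps or overlaps.
# `split_ranges` is a hypothetical helper, not a library function.

def split_ranges(file_size, n_parts):
    """Return [(first_byte, last_byte), ...] covering 0..file_size-1."""
    base, extra = divmod(file_size, n_parts)
    ranges, start = [], 0
    for i in range(n_parts):
        length = base + (1 if i < extra else 0)  # spread the remainder
        ranges.append((start, start + length - 1))
        start += length
    return ranges

# Two connections over a 2 MB file: the first and second megabyte.
# split_ranges(2 * 1024 * 1024, 2) -> [(0, 1048575), (1048576, 2097151)]
```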

Jon has already outlined the scenarios in which doing so would be
helpful. You are absolutely correct that in many cases, the HTTP server
is already providing data as fast as it can, and introducing multiple
connections to the same server will only slow things down.

Likewise, if the HTTP server does throttle each connection, but your
Internet connection is so slow that it's not even as fast as the throttled
connection, multiple connections to the same server will again slow things
down.

And of course if the HTTP server is configured to throttle the transfer
for each connection, it is often also configured to disallow multiple
connections from the same client IP address.

There are lots of situations in which a "download manager" won't help at
all. It's one of the reasons I don't bother with them...their ability to
improve things is greatly overstated IMHO.

But it is true that there are scenarios in which multiple connections
retrieving the same file, either from the same HTTP server or from
multiple mirror servers, can indeed improve throughput. These scenarios
may not be very common, but for some people they occur often enough to
make it worthwhile using software that takes advantage of them.
>Note that there's nothing to stop a download manager from retrieving
different parts of the same file from multiple servers as well.
Assuming an identical file stored on various mirrors, it doesn't
matter which mirror a given part of the file comes from.

That's true, but these will use dedicated protocols, don't they? The
clients also should have multiple NICs installed, connected over
segmented LANs and/or routers, to take some speed advantage of the
parallelism.
It just depends. One well-known example of a dedicated protocol to do
this sort of thing is BitTorrent. And you're right in suggesting that
when this technique is used, it's not always via HTTP. But as you can
tell from the success of BitTorrent, you don't actually need multiple NICs
installed to take advantage of the technique, nor any special hardware at
all.

The goal is to saturate your inbound network connection. If a server is
throttling i/o (or is otherwise restricted) to a level below your inbound
network connection (something that is becoming more and more common as
broadband connections get faster and faster) then having that server send
data on multiple connections (assuming it's not smart enough to detect and
prevent that condition), or having multiple servers each sending different
portions of the same data to the same client, can help saturate that
inbound network connection, minimizing the time it takes to download 100%
of the data requested.
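Putting the thread's pieces together: a download manager splits the file into ranges, fetches each range on its own connection, and writes each chunk at its offset. A toy sketch of that reassembly (Python; `fetch` is a stand-in for a real HTTP range request, and the "server" here is just an in-memory blob):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_download(file_size, n_conns, fetch):
    """Fetch inclusive byte ranges in parallel and reassemble them.
    `fetch(first, last)` must return the bytes of that inclusive range."""
    base, extra = divmod(file_size, n_conns)
    ranges, start = [], 0
    for i in range(n_conns):
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    buf = bytearray(file_size)
    with ThreadPoolExecutor(max_workers=n_conns) as pool:
        futures = {pool.submit(fetch, a, b): (a, b) for a, b in ranges}
        for fut, (a, b) in futures.items():
            buf[a:b + 1] = fut.result()  # each chunk lands at its offset
    return bytes(buf)

# "Server" stand-in: range requests served from an in-memory blob.
blob = bytes(range(256)) * 100
copy = parallel_download(len(blob), 4, lambda a, b: blob[a:b + 1])
```

Whether this is faster than one connection depends entirely on where the bottleneck sits, exactly as discussed above.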

Pete
Mar 28 '07 #8
Willy Denoyette [MVP] <wi*************@telenet.be> wrote:
Simple answer: you can't *download* a file in chunks, unless you implement a
client/server
protocol for this.
Note that there are already protocols which *do* support this, including
FTP and HTTP, both optionally as far as the server is concerned.

A *single file* download from an FTP or HTTP server? How do you
indicate what chunk of the file you want when, say, using FTP? As far
as I know this is part of neither the FTP nor the HTTP protocol.
Content-Range for HTTP
(see http://www.w3.org/Protocols/rfc2616/....html#sec14.16)

RESTART (REST) for FTP I believe (followed by disconnecting after
you've fetched the chunk you want).
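For what it's worth, Python's standard ftplib exposes exactly this: its transfercmd (and retrbinary) take a rest argument that makes the library send REST before RETR. A sketch along those lines - start at an offset, read only the chunk you want, then drop the data connection. Host and path are placeholders, and this is untested against a real server:

```python
import ftplib

def fetch_ftp_range(host, path, first_byte, last_byte):
    """Fetch bytes first_byte..last_byte (inclusive) of one remote file."""
    wanted = last_byte - first_byte + 1
    ftp = ftplib.FTP(host)
    ftp.login()  # anonymous login; real code would pass credentials
    try:
        # rest=first_byte sends "REST <offset>" before "RETR <path>",
        # so the server starts the transfer at that byte.
        conn = ftp.transfercmd("RETR " + path, rest=first_byte)
        chunks, got = [], 0
        while got < wanted:
            block = conn.recv(min(8192, wanted - got))
            if not block:
                break
            chunks.append(block)
            got += len(block)
        conn.close()  # disconnect once we have our chunk, as suggested
    finally:
        ftp.close()
    return b"".join(chunks)
```

Some servers complain when the client drops the data connection early, so real code would also swallow the resulting error on close.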
This is a fairly common thing to do - a lot of web "download managers"
do it.

Per file? I mean, retrieve a *single file* from multiple servers in
chunks? Say the first 10 MB from server A, the next 2 MB from server
B, the next 3 MB from server C? Never heard of something like this.
All I know is that there are download managers that download file A from
server A, file B from server B etc., in case of multi-file downloads.
Nope, there are those which can fetch different bits from different
mirrors. After all, it makes perfect sense to do so if you know they're
the same file.

I used to use one such tool, but I can't remember which off-hand.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Mar 28 '07 #9
Willy Denoyette [MVP] <wi*************@telenet.be> wrote:
See, for example, the "Range" field in HTTP.

http://www.w3.org/Protocols/rfc2616/...tml#sec14.35.1

I know that HTTP 1.1 supports range requests, but this is not exactly my point.
The multi-part requests in HTTP 1.1 are meant to request (for very specific application
purposes) a single part or multiple parts in a single request, but you can't (AFAIK)
request multiple parts in parallel from multiple client threads.
Why not? They'd be multiple requests, each requesting a single part to
form a whole. What's to stop it working?
Note that there's nothing to stop a download manager from
retrieving different parts of the same file from multiple servers
as well. Assuming an identical file stored on various mirrors, it
doesn't matter which mirror a given part of the file comes from.

That's true, but these will use dedicated protocols, don't they? The
clients also should have multiple NICs installed, connected over
segmented LANs and/or routers, to take some speed advantage of the
parallelism.
Why shouldn't it work over multiple HTTP servers (using range headers)?
It depends where the network bottleneck is, of course, but if I can get
from the UK to (say) the US with 1Mbps and to Africa with 1Mbps, in
parallel, it makes sense to fetch at an overall rate of 2Mbps rather
than 1...

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Mar 28 '07 #10
"Peter Duniho" <Np*********@nnowslpianmk.com> wrote in message
news:op***************@petes-computer.local...
On Wed, 28 Mar 2007 09:59:21 -0700, Willy Denoyette [MVP] <wi*************@telenet.be>
wrote:
>I know that HTTP 1.1 supports range requests, but this is not exactly my point.
The multi-part requests in HTTP 1.1 are meant to request (for very specific application
purposes) a single part or multiple parts in a single request, but you can't (AFAIK)
request multiple parts in parallel from multiple client threads.

Well, I've never actually tried it myself. So I will refrain from claiming that I know
firsthand that this generally works. However, the so-called "download managers" all
claim to work with HTTP servers. I have no reason to doubt them, and based on what I
know about HTTP (which is an extremely simple protocol) it seems likely.
I didn't try it either, so I won't claim it doesn't work. We did some work at this level
using DECnet/OSI connecting to shared-nothing clusters running OpenVMS some years ago; we
were beaten by the HW evolution at each step we made, got frustrated, and finally gave up on
doing this in software.
I know that download managers claim to work over HTTP, but that doesn't mean they support
multi-part parallel request handling over the same or multiple connections. I don't even
know if the protocol allows you to issue a new range request when you have a range request
pending (on the same logical connection).

There is no theoretical reason why it wouldn't work. There's nothing about HTTP that
requires servers to restrict their communications to a given client to a single
connection, and there's nothing about HTTP that stipulates that an HTTP server needs to
coordinate communications on independent connections. If on one connection the client
asks for the first megabyte and on a second connection the same client asks for the
second megabyte, then if the server is capable of servicing both requests at the same
time, there's no reason the client can't wind up receiving both the first and second
megabytes in parallel.
Agreed, but what's the advantage in a simple client/server scenario? By simple I mean a
simple PC connected over a dedicated LAN to an HTTP server.

Jon has already outlined the scenarios in which doing so would be helpful. You are
absolutely correct that in many cases, the HTTP server is already providing data as fast
as it can, and introducing multiple connections to the same server will only slow things
down.
Agreed.
Likewise, if the HTTP server does throttle each connection, but your Internet connection
is so slow that it's not even as fast as the throttled connection, multiple connections
to the same server will again slow things down.
Sure.
And of course if the HTTP server is configured to throttle the transfer for each
connection, it is often also configured to disallow multiple connections from the same
client IP address.
They better do ;-)
There are lots of situations in which a "download manager" won't help at all. It's one
of the reasons I don't bother with them...their ability to improve things is greatly
overstated IMHO.

But it is true that there are scenarios in which multiple connections retrieving the same
file, either from the same HTTP server or from multiple mirror servers, can indeed
improve throughput. These scenarios may not be very common, but for some people they
occur often enough to make it worthwhile using software that takes advantage of them.
>>Note that there's nothing to stop a download manager from retrieving different parts
of the same file from multiple servers as well. Assuming an identical file stored on
various mirrors, it doesn't matter which mirror a given part of the file comes from.

That's true, but these will use dedicated protocols, don't they? The clients also should
have multiple NICs installed, connected over segmented LANs and/or routers, to take some
speed advantage of the parallelism.

It just depends. One well-known example of a dedicated protocol to do this sort of thing
is BitTorrent. And you're right in suggesting that when this technique is used, it's not
always via HTTP. But as you can tell from the success of BitTorrent, you don't actually
need multiple NICs installed to take advantage of the technique, nor any special hardware
at all.

The goal is to saturate your inbound network connection. If a server is throttling i/o
(or is otherwise restricted) to a level below your inbound network connection (something
that is becoming more and more common as broadband connections get faster and faster)
then having that server send data on multiple connections (assuming it's not smart enough
to detect and prevent that condition), or having multiple servers each sending different
portions of the same data to the same client, can help saturate that inbound network
connection, minimizing the time it takes to download 100% of the data requested.
Agreed; however, I don't know if this discussion is of any help to the OP. Let's see if he
will come back.
Willy.
Mar 28 '07 #11
Willy Denoyette [MVP] <wi*************@telenet.be> wrote:

<snip>
I know that download managers claim to work over HTTP, but that
doesn't mean they support multi-part parallel request handling over
the same or multiple connections. I don't even know if the protocol
allows you to issue a new range request when you have a range request
pending (on the same logical connection).
I'm not sure what you mean by a "logical" connection, but you can
certainly do it over multiple "real" TCP connections.
There is no theoretical reason why it wouldn't work. There's
nothing about HTTP that requires servers to restrict their
communications to a given client to a single connection, and
there's nothing about HTTP that stipulates that an HTTP server
needs to coordinate communications on independent connections. If
on one connection the client asks for the first megabyte and on a
second connection the same client asks for the second megabyte,
then if the server is capable of servicing both requests at the
same time, there's no reason the client can't wind up receiving
both the first and second megabytes in parallel.

Agreed, but what's the advantage in a simple client/server scenario?
By simple I mean a simple PC connected over a dedicated LAN to an
HTTP server.
Agreed, it's not useful in that case.
And of course if the HTTP server is configured to throttle the
transfer for each connection, it is often also configured to
disallow multiple connections from the same client IP address.

They better do ;-)
Most shouldn't, IMO. If I connect to the BBC and download the home page
which has multiple small images in, it makes sense to use a few
connections to get those images in parallel, given that a lot of the
time will be taken by latency rather than bandwidth.

Disallowing multiple connections from the same IP address to fetch the
same file would make sense, or having a limit on the number of
connections to allow from each IP address.

<snip>

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Mar 28 '07 #12
Thus wrote Willy Denoyette [MVP],
"Peter Duniho" <Np*********@nnowslpianmk.com> wrote in message
news:op***************@petes-computer.local...
>On Wed, 28 Mar 2007 08:51:09 -0700, Willy Denoyette [MVP]
<wi*************@telenet.be> wrote:
>>A *single file* download from an FTP or HTTP server? How do you
indicate what chunk of the file you want when, say, using FTP? As far
as I know this is part of neither the FTP nor the HTTP protocol.
See, for example, the "Range" field in HTTP.

http://www.w3.org/Protocols/rfc2616/...tml#sec14.35.1
I know that HTTP 1.1 supports range requests, but this is not exactly
my point.
The multi-part requests in HTTP 1.1 are meant to request (for very
specific application
purposes) a single part or multiple parts in a single request, but
you can't (AFAIK)
request multiple parts in parallel from multiple client threads.
There's nothing in RFC 2616 that forbids that. The actual limitation is the
maximum number of HTTP connections a client is permitted to establish to
a single host. The spec recommends two.
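That per-host limit is easy to honour client-side; a sketch (Python, names invented) of gating workers with a semaphore so no more than two requests run against one host at a time:

```python
import threading
import time

MAX_CONNS_PER_HOST = 2            # RFC 2616's suggested per-host limit
gate = threading.Semaphore(MAX_CONNS_PER_HOST)
lock = threading.Lock()
active = peak = 0                 # instrumentation only

def worker(i):
    """Stand-in for one range request; only counts concurrency."""
    global active, peak
    with gate:                    # at most two workers inside at once
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)          # pretend to transfer data
        with lock:
            active -= 1

threads = [threading.Thread(target=worker, args=(i,)) for i in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# peak never exceeds MAX_CONNS_PER_HOST, however many workers we start
```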

Cheers,
--
Joerg Jooss
ne********@joergjooss.de
Mar 28 '07 #13
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...
Willy Denoyette [MVP] <wi*************@telenet.be> wrote:

<snip>
>I know that download managers claim to work over HTTP, but that
doesn't mean they support multi-part parallel request handling over
the same or multiple connections. I don't even know if the protocol
allows you to issue a new range request when you have a range request
pending (on the same logical connection).

I'm not sure what you mean by a "logical" connection, but you can
certainly do it over multiple "real" TCP connections.
Sorry, I'm talking about connections in terms of ISO layers: a logical connection is an end-
to-end socket connection, a connection at the session layer (HTTP is operating at this
layer - and it's called a session). I would call a "real connection" a TCP connection, that
is, a connection at the transport layer (connected to a 'port' in TCP/IP terms), say a
connection to www.contoso.com port 80.

Agreed, you can have multiple range requests on separate "real" or "logical" connections
(same or different port, different sockets); however, I'm not sure you can have overlapping
range requests on a *single* HTTP "session".
There is no theoretical reason why it wouldn't work. There's
nothing about HTTP that requires servers to restrict their
communications to a given client to a single connection, and
there's nothing about HTTP that stipulates that an HTTP server
needs to coordinate communications on independent connections. If
on one connection the client asks for the first megabyte and on a
second connection the same client asks for the second megabyte,
then if the server is capable of servicing both requests at the
same time, there's no reason the client can't wind up receiving
both the first and second megabytes in parallel.

Agreed, but what's the advantage in a simple client/server scenario?
By simple I mean a simple PC connected over a dedicated LAN to an
HTTP server.

Agreed, it's not useful in that case.
And of course if the HTTP server is configured to throttle the
transfer for each connection, it is often also configured to
disallow multiple connections from the same client IP address.

They better do ;-)

Most shouldn't, IMO. If I connect to the BBC and download the home page
which has multiple small images in, it makes sense to use a few
connections to get those images in parallel, given that a lot of the
time will be taken by latency rather than bandwidth.
Yep, but IMO the different images will be requested by different overlapped GET requests (not
range requests) over a single connection, or at most two connections as per HTTP 1.1 (see
below).
Disallowing multiple connections from the same IP address to fetch the
same file would make sense, or having a limit on the number of
connections to allow from each IP address.
This limit is advised by HTTP 1.1 (a configurable default for a client obeying the terms of
the protocol) to be a max. of 2 connections to the same server address. We have our backbone
routers configured to impose this maximum, in order to prevent some people from monopolizing
the network resources.

Willy.

Mar 28 '07 #14
"Joerg Jooss" <ne********@joergjooss.de> wrote in message
news:94**************************@msnews.microsoft.com...
Thus wrote Willy Denoyette [MVP],
>"Peter Duniho" <Np*********@nnowslpianmk.com> wrote in message
news:op***************@petes-computer.local...
>>On Wed, 28 Mar 2007 08:51:09 -0700, Willy Denoyette [MVP]
<wi*************@telenet.be> wrote:

A *single file* download from an FTP or HTTP server? How do you
indicate what chunk of the file you want when, say, using FTP? As far
as I know this is part of neither the FTP nor the HTTP protocol.

See, for example, the "Range" field in HTTP.

http://www.w3.org/Protocols/rfc2616/...tml#sec14.35.1
I know that HTTP 1.1 supports range requests, but this is not exactly
my point.
The multi-part requests in HTTP 1.1 are meant to request (for very
specific application
purposes) a single part or multiple parts in a single request, but
you can't (AFAIK)
request multiple parts in parallel from multiple client threads.

There's nothing in RFC 2616 that forbids that. The actual limitation is the maximum number
of HTTP connections a client is permitted to establish to a single host. The spec
recommends two.

Cheers,
--
Joerg Jooss
ne********@joergjooss.de



I'm not talking about parallel connections, I'm talking about "same connection" multiple
"overlapping range" requests. I know you can have parallel requests over two different
connections.

Willy.

Mar 28 '07 #15
Willy Denoyette [MVP] <wi*************@telenet.be> wrote:
I'm not sure what you mean by a "logical" connection, but you can
certainly do it over multiple "real" TCP connections.

Sorry, I'm talking about connections in terms of ISO layers: a
logical connection is an end-to-end socket connection, a connection
at the session layer (HTTP is operating at this layer - and it's
called a session). I would call a "real connection" a TCP
connection, that is, a connection at the transport layer (connected to
a 'port' in TCP/IP terms), say a connection to www.contoso.com port
80.

Agreed, you can have multiple range requests on separate "real" or
"logical" connections (same or different port, different
sockets); however, I'm not sure you can have overlapping range
requests on a *single* HTTP "session".
Agreed, but I'm not sure anyone suggested that as an option.
Most shouldn't, IMO. If I connect to the BBC and download the home page
which has multiple small images in, it makes sense to use a few
connections to get those images in parallel, given that a lot of the
time will be taken by latency rather than bandwidth.

Yep, but IMO the different images will be requested by different
overlapped GET requests (not range requests) over a single connection
or at most two connections as per HTTP 1.1 (see below).
How would you do overlapping get requests on a single connection, out
of interest? That's something I haven't come across (and it sounds
pretty horrendous). Keeping one connection alive for multiple requests,
sure - but I haven't come across overlapping requests on one
connection.
Disallowing multiple connections from the same IP address to fetch the
same file would make sense, or having a limit on the number of
connections to allow from each IP address.

This limit is advised by HTTP 1.1 (a configurable default for a client
obeying the terms of the protocol) to be a max. of 2 connections to
the same server address. We have our backbone routers configured to
impose this maximum, in order to prevent some people from monopolizing
the network resources.
That's reasonable. I wouldn't like to fix it to 1 though, which is what
it sounded like you were suggesting before.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Mar 28 '07 #16
Willy Denoyette [MVP] <wi*************@telenet.be> wrote:
I'm not talking about parallel connections, I'm talking about "same
connection" multiple "overlapping range" requests. I know you can
have parallel requests over two different connections.
But I can't see why you're focusing on "same connection" multiple
requests. There's nothing in the OP's original post to suggest he's
wanting that, and I don't think Peter, Joerg or I have suggested doing
it either.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Mar 28 '07 #17
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...
Willy Denoyette [MVP] <wi*************@telenet.be> wrote:
See, for example, the "Range" field in HTTP.

http://www.w3.org/Protocols/rfc2616/...tml#sec14.35.1

I know that HTTP 1.1 supports range requests, but this is not exactly my point.
The multi-part requests in HTTP 1.1 are meant to request (for very specific application
purposes) a single part or multiple parts in a single request, but you can't (AFAIK)
request multiple parts in parallel from multiple client threads.

Why not? They'd be multiple requests, each requesting a single part to
form a whole. What's to stop it working?
I meant - multiple parallel range requests over the same client/server connection here.
Or...
One client A --- one Server,
two client threads,
sharing the same socket connection,
thread 1 issues asynchronous range request x...
thread 2 issues asynchronous range request y...
server accepts request x and at the same time accepts request y, starts to return data
for x and y.
AFAIK above is not possible, that doesn't mean it *fails* visibly.
IMO the server will choose to handle request x and queue request y, that means he will
return the data for x before he starts to return data for y.
Again, I'm not pretending this is exactly the case, I'll see if I can build a test case.
Note that there's nothing to stop a download manager from
retrieving different parts of the same file from multiple servers
as well. Assuming an identical file stored on various mirrors, it
doesn't matter which mirror a given part of the file comes from.

That's true, but these will use dedicated protocols, don't they? The
clients should also have multiple NICs installed, connected over
segmented LANs and/or routers, to take some speed advantage of the
parallelism.

Why shouldn't it work over multiple HTTP servers (using range headers)?
It depends where the network bottleneck is, of course, but if I can get
from the UK to (say) the US with 1Mbps and to Africa with 1Mbps, in
parallel, it makes sense to fetch at an overall rate of 2Mbps rather
than 1...
Yep, but say you have 1Mbps to the US and 50Kbps to Africa, is your download manager clever
enough to request all chunks from the US instead of waiting for a chunk to arrive from
Africa, or is it clever enough to cancel the request and re-issue the same request over the
US connection? This requires permanent real-time monitoring of the transfer rates, something
that can't reliably be done at the application layer. That's what I meant when I said you
really need sophisticated software and be sure to measure (as always) the real benefits; I
guess (expensive) download managers can do it while others pretend they can, big difference!

Willy.

Mar 28 '07 #18
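For the record, the OP's concrete question (start the download directly at byte 500000) maps onto a single HTTP range request per connection. A minimal C# sketch, assuming an HTTP 1.1 server that honours the Range header; the method and class names are purely illustrative:

```csharp
// One range request on its own connection; a download manager would run
// several of these, each on a separate connection, from worker threads.
using System.IO;
using System.Net;

class RangeDownload
{
    // Fetches bytes [from, to] (inclusive) of the resource at 'url'.
    static byte[] DownloadRange(string url, int from, int to)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.AddRange(from, to);          // sends "Range: bytes=from-to"

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            // 206 Partial Content means the server honoured the range;
            // a plain 200 OK means it ignored it and is sending the whole file.
            if (response.StatusCode != HttpStatusCode.PartialContent)
                throw new WebException("Server did not honour the range request");

            using (Stream stream = response.GetResponseStream())
            using (MemoryStream buffer = new MemoryStream())
            {
                byte[] chunk = new byte[8192];
                int read;
                while ((read = stream.Read(chunk, 0, chunk.Length)) > 0)
                    buffer.Write(chunk, 0, read);
                return buffer.ToArray();
            }
        }
    }
}
```

Note the status-code check: a server that does not support ranges will typically still answer with 200 OK and the full file, so the caller has to be prepared for that case rather than assume a partial body.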
Willy Denoyette [MVP] <wi*************@telenet.be> wrote:
Why not? They'd be multiple requests, each requesting a single part to
form a whole. What's to stop it working?

I meant - multiple parallel range requests over the same client/server connection here.
Or...
One client A --- one Server,
two client threads,
sharing the same socket connection,
And that's the problem - I don't *think* anyone other than you has a
single connection in mind.
thread 1 issues asynchronous range request x...
thread 2 issues asynchronous range request y...
server accepts request x and at the same time accepts request y,
starts to return data for x and y.
AFAIK above is not possible, that doesn't mean it *fails* visibly.
IMO the server will choose to handle request x and queue request y,
that means he will return the data for x before he starts to return
data for y. Again, I'm not pretending this is exactly the case, I'll
see if I can build a test case.
It seems likely - the server probably wouldn't bother reading until
it had finished writing. Of course, if you're really unlucky the two
requests will be sent at the same time and corrupt each other :(
That's true, but these will use dedicated protocols, don't they? The
clients should also have multiple NICs installed, connected over
segmented LANs and/or routers, to take some speed advantage of the
parallelism.
Why shouldn't it work over multiple HTTP servers (using range headers)?
It depends where the network bottleneck is, of course, but if I can get
from the UK to (say) the US with 1Mbps and to Africa with 1Mbps, in
parallel, it makes sense to fetch at an overall rate of 2Mbps rather
than 1...

Yep, but say you have 1Mbps to the US and 50Kbps to Africa, is your
download manager clever enough to request all chunks from the US
instead of waiting for a chunk to arrive from Africa, or is it clever
enough to cancel the request and re-issue the same request over the
US connection? This requires permanent real-time monitoring of the
transfer rates, something that can't reliably be done at the
application layer. That's what I meant when I said you really need
sophisticated software and be sure to measure (as always) the real
benefits; I guess (expensive) download managers can do it while others
pretend they can, big difference!
Well, you'd have to have a certain amount of control of the client
implementation - e.g. not waiting until a whole response was ready
before returning - but I don't think it's that hard to do. I think most
of these download managers try a short sample request once to start
with to work out which servers to bother with at all, and how much to
ask each one to do. I don't know whether they'd go as far as cancelling
a chunk they'd only received partially, but it's possible.

I don't bother with such things these days, but a while ago (on a dial-
up line, even, where you'd think I'd always be able to saturate the
last mile) it was genuinely useful.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Mar 28 '07 #19
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...
Willy Denoyette [MVP] <wi*************@telenet.be> wrote:
>I'm not talking about parallel connections, I'm talking about "same
connection" multiple "overlapping range" requests. I know you can
have parallel requests over two different connections.

But I can't see why you're focusing on "same connection" multiple
requests. There's nothing in the OP's original post to suggest he's
wanting that, and I don't think Peter, Joerg or I have suggested doing
it either.
Wait a minute, I guess your purpose is once again to pick on me, right?
The OP didn't even mention HTTP or FTP, nor did I; it was you who started with this FTP and
HTTP protocol stuff, and it was Peter who started (rightfully) with the HTTP range requests.
All I said, in a reply to Peter, was that AFAIK you can't issue multiple range requests in
parallel over the same connection, so why would I even suggest this? I don't even know
whether the OP isn't simply talking about a file server request using System.IO.

Willy.

Mar 28 '07 #20
Willy Denoyette [MVP] <wi*************@telenet.be> wrote:
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...
Willy Denoyette [MVP] <wi*************@telenet.be> wrote:
I'm not talking about parallel connections, I'm talking about "same
connection" multiple "overlapping range" requests. I know you can
have parallel requests over two different connections.
But I can't see why you're focusing on "same connection" multiple
requests. There's nothing in the OP's original post to suggest he's
wanting that, and I don't think Peter, Joerg or I have suggested doing
it either.

Wait a minute, I guess your purpose is once again to pick on me, right?
Not at all - and I don't recall picking on you particularly in the past
either :)

It's just that when talking about HTTP and FTP I've always been
considering what I think of as the natural way of downloading multiple
chunks in multiple threads - using multiple requests on multiple
connections.

You repeatedly say (correctly, as far as I'm aware) that it can't be
done on a single connection (with HTTP/FTP) but that's a straw man that
no-one's actually suggested - so why keep talking about it?

<snip>
I don't even know whether the OP isn't simply talking about a file
server request using System.IO.
Indeed. It would be nice if the OP would return to the thread :)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Mar 28 '07 #21
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...
Willy Denoyette [MVP] <wi*************@telenet.be> wrote:
I'm not sure what you mean by a "logical" connection, but you can
certainly do it over multiple "real" TCP connections.

Sorry, I'm talking about connections in terms of ISO layers, a
logical connection is an end to end socket connection, a connection
at the session layer (HTTP is operating at this layer - and it's
called a session). I would call a "real connection", a TCP
connection, that is a connection at the transport layer (connected to
a 'port' in TCP/IP terms), say a connection to www.contoso.com port
80.

Agreed, you can have multiple range requests on separate "real
connections" or "logical" (same or different port, different
sockets), however, I'm not sure you can have overlapping range
requests on a *single* HTTP "session".

Agreed, but I'm not sure anyone suggested that as an option.
Most shouldn't, IMO. If I connect to the BBC and download the home page
which has multiple small images in, it makes sense to use a few
connections to get those images in parallel, given that a lot of the
time will be taken by latency rather than bandwidth.

Yep, but IMO the different images will be requested by different
overlapped GET requests (not range requests) over a single connection,
or at most two connections as per HTTP 1.1 (see below).

How would you do overlapping get requests on a single connection, out
of interest? That's something I haven't come across (and it sounds
pretty horrendous). Keeping one connection alive for multiple requests,
sure - but I haven't come across overlapping requests on one
connection.
I didn't suggest that either, I said that AFAIK it's not possible, so it wouldn't make sense
to have multiple threads issuing parallel requests using range requests over the same
session.
Disallowing multiple connections from the same IP address to fetch the
same file would make sense, or having a limit on the number of
connections to allow from each IP address.

This limit is advised by HTTP1.1 (a configurable default for a client
obeying the terms of the protocol) to be a max. of 2 connections to
the same server address. We have our backbone routers configured to
impose this maximum, in order to prevent some people from monopolizing
the network resources.

That's reasonable. I wouldn't like to fix it to 1 though, which is what
it sounded like you were suggesting before.
Where did you see me suggesting this? All I said was (in a reply to Peter and not to the
OP!) that it makes little or no sense to have multiple threads requesting ranges (we are
talking about one single file, right), not over the same connection and not over multiple
connections either as there is the limit of 2 connections as per HTTP1.1.
Willy.


Mar 28 '07 #22
On Wed, 28 Mar 2007 11:16:25 -0700, Willy Denoyette [MVP]
<wi*************@telenet.be> wrote:
[...] I don't even know if the protocol allows you to issue new range
request when you have a range requests pending (on the same logical
connection).
I doubt it does. I haven't tried it myself, but I don't see anything in
the HTTP specification that allows for any parallelism. I don't even see
how it would work, given that the connection is for all intents and
purposes a single stream of bytes.
Agreed, but what's the advantage in a simple client server scenario?
With simple I mean a simple PC connected over a dedicated LAN to an HTTP
server.
I see no advantage at all, in that scenario. Or, alternatively, if for
some reason creating more connections to the server allows for more
bandwidth, then the correct solution in that scenario is actually to fix
the server so that it's not throttling that client on the dedicated
connection.
[...]
>And of course if the HTTP server is configured to throttle the
transfer for each connection, it is often also configured to disallow
multiple connections from the same client IP address.

They better do ;-)
Well, as Jon points out there are situations in which that wouldn't be
desirable. But I agree that in the file download situation, where a
"download manager" would be used, a server that throttles but doesn't
check for multiple connections from a single client is only
half-implemented. :)
Agreed, however, I don't know if this discussion is of any help to the
OP, let's see if he will come back.
Yeah, no kidding. I think we kind of went off on a bit of a tangent...who
knows if any of this part of the thread in any way relates to his original
issue. Guess we'll find out (or not).

He didn't even bother to specify the protocol he was using. However,
assuming he's using HTTP, hopefully the reference to the specification
will point him in the right direction. Note that the original question
doesn't really even presume parallel connections; he's just asking about
resuming a download where it got interrupted previously (well, at least
that's the scenario that seems to make the most sense given his actual
question).

Pete
Mar 28 '07 #23
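If resuming an interrupted download really is the OP's scenario, the only other piece needed is writing each received chunk at its proper byte offset in the local file, so chunks can arrive (or resume) in any order. A hedged C# sketch; names are illustrative:

```csharp
// Writes a chunk that was fetched for byte offset 'start' into the
// right place in the partially-downloaded local file. Seeking past the
// current end of file simply extends it.
using System.IO;

class ChunkWriter
{
    static void WriteChunk(string path, long start, byte[] data)
    {
        using (FileStream file = new FileStream(path, FileMode.OpenOrCreate,
                                                FileAccess.Write, FileShare.Write))
        {
            file.Seek(start, SeekOrigin.Begin);  // position at the chunk's offset
            file.Write(data, 0, data.Length);
        }
    }
}
```

Combined with a range request per chunk, this is essentially all a "resume at byte 500000" download needs: request `Range: bytes=500000-`, then write what arrives starting at offset 500000.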
On Wed, 28 Mar 2007 13:13:07 -0700, Willy Denoyette [MVP]
<wi*************@telenet.be> wrote:
[...]
One client A --- one Server,
two client threads,
sharing the same socket connection,
thread 1 issues asynchronous range request x...
thread 2 issues asynchronous range request y...
server accepts request x and at the same time accepts request y,
starts to return data for x and y.
AFAIK above is not possible, that doesn't mean it *fails* visibly.
IMO the server will choose to handle request x and queue request y, that
means he will return the data for x before he starts to return data for
y.
I think we all agree that that particular scenario won't work, and/or
isn't useful (as you say, even if you have enabled reuse of the
connection, repeated requests for different ranges will simply result in
queuing of the replies).

The only way to get some parallel action is to have multiple connections.
[...]
Yep, but say you have 1Mbps to the US and 50Kbps to Africa, is your
download manager clever enough to request all chunks from the US instead
of waiting for a chunk to arrive from Africa, or is it clever enough to
cancel the request and re-issue the same request over the US connection?
Any "download manager" worthy of the name ought to do that. I'm sure
there are plenty of download managers NOT worthy of the name, but at the
very least I would expect a download manager to break the download into
small enough chunks that if one connection is considerably slower than
another, the faster connection can wind up with the bulk of the effort.

For example, downloading 1MB might be broken into 100K sections. As each
section completes, a new request is issued, meaning that while a slow
server spends a long time on a single 100K section, a faster server gets
to service the rest of the 100K sections.

A more sophisticated program may even be intelligent enough to notice one
connection is faster and start issuing larger requests on that connection,
to avoid wasting too much time on the latency for a single GET. An even
more sophisticated program might even have logic to do as you suggest, and
cancel the slower connection when it becomes apparent that re-issuing the
remaining data from that request on a faster connection will net an
improvement (presumably the data already retrieved would not be discarded).

It's clear that some amount of thought does need to go into implementing a
serviceable "download manager" that actually improves the situation when
downloading files. But I've seen plenty of people just in this newsgroup
with the insight to be able to do this (I'd dare say the three of us
making so much noise in this thread are among them :) ), and I don't think
any of these issues in and of themselves suggest that there's no benefit
to parallel downloads of different sections of the same file.
This requires permanent real-time monitoring of the transfer rates,
something that can't reliably be done at the application layer,
Why do you say that real-time monitoring of the transfer rates can't be
reliably done at the application layer? I agree that you can't get
detailed information about the exact rate of transfer, and especially of
why the speed is fast or slow (a fast connection might have low throughput
due to errors, for example). But at the granularity required to monitor a
download and automatically compensate for slow and fast connections,
sufficient for a user to notice an improvement in download performance,
I'd say at the application layer you can easily get sufficient information.

Given that a download manager is most useful for downloads that take tens
of minutes or even more, just following the averages is plenty sufficient
to significantly improve performance. And of course, the bulk of the
improvement would come simply from having parallel connections, which
doesn't require any monitoring of throughput anyway (yes, with such a
simple approach there would be cases where things didn't get better, or
even got worse, but most of the time there would be a net improvement).
that's what I meant when I said you really need sophisticated software
and be sure to measure (as always) the real benefits, I guess
(expensive) download managers can do it while other pretend they can,
big difference!.
Assuming there's such a thing as an "expensive download manager" (I
wouldn't know, not being in the market for one), then yes...I'd hope they
would at least go to as much trouble as described above. But I think all
that can easily be handled at the application layer.

Pete
Mar 28 '07 #24
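Peter's 100K-section scheme above amounts to a shared work queue: each connection pulls the next section as soon as it finishes one, so a faster connection naturally ends up servicing more of them without any explicit rate monitoring. A rough C# 2.0-style sketch of just the queue (illustrative only, not a full download manager):

```csharp
// Thread-safe queue of section offsets for a fixed-size download.
// Each download thread loops: TryNext -> fetch that section -> repeat.
using System.Collections.Generic;

class SectionQueue
{
    readonly Queue<long> offsets = new Queue<long>();
    readonly object gate = new object();

    public SectionQueue(long fileSize, long sectionSize)
    {
        // e.g. 1MB file with 100K sections -> offsets 0, 100K, 200K, ...
        for (long off = 0; off < fileSize; off += sectionSize)
            offsets.Enqueue(off);
    }

    // Called by an idle download thread; returns false when no work is left.
    public bool TryNext(out long offset)
    {
        lock (gate)
        {
            if (offsets.Count == 0) { offset = -1; return false; }
            offset = offsets.Dequeue();
            return true;
        }
    }
}
```

The more sophisticated behaviours discussed above (growing section sizes on a fast connection, cancelling a stalled request and re-queuing its offset) can all be layered on top of this same structure.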
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...
Willy Denoyette [MVP] <wi*************@telenet.be> wrote:
>"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...
Willy Denoyette [MVP] <wi*************@telenet.be> wrote:
I'm not talking about parallel connections, I'm talking about "same
connection" multiple "overlapping range" requests. I know you can
have parallel requests over two different connections.

But I can't see why you're focusing on "same connection" multiple
requests. There's nothing in the OP's original post to suggest he's
wanting that, and I don't think Peter, Joerg or I have suggested doing
it either.

Wait a minute I guess your purpose is one again to pick on me, right?

Not at all - and I don't recall picking on you particularly in the past
either :)
Yes, you did, I prefer not to go public on this.
It's just that when talking about HTTP and FTP I've always been
considering what I think of as the natural way of downloading multiple
chunks in multiple threads - using multiple requests on multiple
connections.

You repeatedly say (correctly, as far as I'm aware) that it can't be
done on a single connection (with HTTP/FTP) but that's a straw man that
no-one's actually suggested - so why keep talking about it?
I guess that Peter suggested this (excuse Peter if I got this wrong) by saying

See, for example, the "Range" field in HTTP.

http://www.w3.org/Protocols/rfc2616/...tml#sec14.35.1

(I'm guessing that the above link may actually answer the question for the
OP)

well, having multiple threads without the possibility to have overlapped range requests per
connection makes no sense, having multiple threads requesting chunks over 2 connections
(HTTP1.1) makes little sense IMO either, unless you have measured the real benefits.
<snip>
>I don't even know whether the OP isn't simply talking about a file
server request using System.IO.

Indeed. It would be nice if the OP would return to the thread :)
That's the real problem these days: they come up with a vague question, they get some
confusing answers (our mistakes?), finally they get sidetracked (by us) and they never come
back; they are scared off, really.

Willy.

Mar 28 '07 #25
Willy Denoyette [MVP] <wi*************@telenet.be> wrote:
How would you do overlapping get requests on a single connection, out
of interest? That's something I haven't come across (and it sounds
pretty horrendous). Keeping one connection alive for multiple requests,
sure - but I haven't come across overlapping requests on one
connection.

I didn't suggest that either, I said that AFAIK it's not possible, so
it wouldn't make sense to have multiple threads issuing parallel
requests using range requests over the same session.
I know you didn't suggest it - but you keep coming back to it to say it
won't work, when I don't think anyone's said that it *would* work, or
that anyone should try it.
This limit is advised by HTTP1.1 (a configurable default for a client
obeying the terms of the protocol) to be a max. of 2 connections to
the same server address. We have our backbone routers configured to
impose this maximum, in order to prevent some people to monopolize
the network resources.
That's reasonable. I wouldn't like to fix it to 1 though, which is what
it sounded like you were suggesting before.

Where did you see me suggesting this? All I said was (in a reply to
Peter and not to the OP!) that it makes little or no sense to have
multiple threads requesting ranges (we are talking about one single
file, right), not over the same connection and not over multiple
connections either as there is the limit of 2 connections as per
HTTP1.1.
Here (with Peter's text indented with the single angle bracket)

<quote>
And of course if the HTTP server is configured to throttle the
transfer for each connection, it is often also configured to disallow
multiple connections from the same client IP address.
They better do ;-)
</quote>

That suggests that you believe servers with transfer throttling should
disallow multiple connections from the same IP address. To me, 2
connections counts as "multiple". I believe it's perfectly reasonable
to have connection throttling and yet still allow 2 connections from a
single client IP address.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Mar 28 '07 #26
Peter Duniho <Np*********@nnowslpianmk.com> wrote:

<snip>
He didn't even bother to specify the protocol he was using. However,
assuming he's using HTTP, hopefully the reference to the specification
will point him in the right direction. Note that the original question
doesn't really even presume parallel connections; he's just asking about
resuming a download where it got interrupted previously (well, at least
that's the scenario that seems to make the most sense given his actual
question).
It doesn't presume parallel connections, but it *does* talk about
multiple threads which assumes parallelisation in some form or other. I
certainly *hope* he also means parallel connections :)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Mar 28 '07 #27
On Wed, 28 Mar 2007 14:20:52 -0700, Jon Skeet [C# MVP] <sk***@pobox.com>
wrote:
It doesn't presume parallel connections, but it *does* talk about
multiple threads which assumes parallelisation in some form or other. I
certainly *hope* he also means parallel connections :)
I hope so too. But you've been here WAY long enough to know that you get
all kinds of whacky posts here sometimes. There's no way to know for sure
that he doesn't intend to use a single connection from multiple threads.

Too often, I make the incorrect assumption that the person posting knows
the difference between a good idea and a bad one. :) I always hope
that's not the case, but until the person clarifies, it's difficult or
impossible to know for sure.

Pete
Mar 28 '07 #28
Willy Denoyette [MVP] <wi*************@telenet.be> wrote:
Not at all - and I don't recall picking on you particularly in the past
either :)

Yes, you did, I prefer not to go public on this.
Well, my sincere apologies if it felt like that, whichever thread you
mean. I'm certainly not bearing a grudge against you or anything like
that.
You repeatedly say (correctly, as far as I'm aware) that it can't be
done on a single connection (with HTTP/FTP) but that's a straw man that
no-one's actually suggested - so why keep talking about it?

I guess that Peter suggested this (excuse Peter if I got this wrong) by saying

See, for example, the "Range" field in HTTP.

http://www.w3.org/Protocols/rfc2616/...tml#sec14.35.1

(I'm guessing that the above link may actually answer the question for the
OP)

well, having multiple threads without the possibility to have
overlapped range requests per connection makes no sense, having
multiple threads requesting chunks over 2 connections (HTTP1.1) makes
little sense IMO either, unless you have measured the real benefits.
Presumably the OP hasn't measured the benefits as he doesn't know how
to do it yet (and of course we still don't know if it's HTTP!) but I
certainly didn't see any implication of connection-sharing in Peter's
post. I suspect this conversation would have been quite different in
the flesh, with non-verbal signals to help.
Indeed. It would be nice if the OP would return to the thread :)

That's the real problem these days: they come up with a vague
question, they get some confusing answers (our mistakes?), finally
they get sidetracked (by us) and they never come back; they are
scared off, really.
Or, alternatively, come back saying "The problem is solved" without
leaving anything for future readers in terms of how the problem was
solved, or what the problem even was...

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Mar 28 '07 #29
On Wed, 28 Mar 2007 14:13:32 -0700, Willy Denoyette [MVP]
<wi*************@telenet.be> wrote:
[...]
>Not at all - and I don't recall picking on you particularly in the past
either :)

Yes, you did, I prefer not to go public on this.
For what it's worth, I didn't get the impression that anyone was picking
on anyone else.

When such an impression is received by someone, it's also been my
experience that it is almost always actually a case of one person simply
trying to make sure they have the facts straight (and in the process
drilling down into more detail than someone else thinks is necessary or
warranted), or a case of two people who think they are talking about the
same thing turning out to not be talking about the same thing, or both.

Honestly, I have seen precious little of what I'd call "picking on" in
this newsgroup. It's one of the reasons it's one of the few newsgroups I
still bother to read.

That said...
>You repeatedly say (correctly, as far as I'm aware) that it can't be
done on a single connection (with HTTP/FTP) but that's a straw man that
no-one's actually suggested - so why keep talking about it?

I guess that Peter suggested this (excuse Peter if I got this wrong) by
saying

See, for example, the "Range" field in HTTP.

http://www.w3.org/Protocols/rfc2616/...tml#sec14.35.1
You are excused. :)

If you'll look, you'll note that I posted that simply in reply to your
statement about whether it was possible to request a file fragment via
HTTP or FTP. The context had nothing to do (at the time) with numbers of
connections, sessions, etc. It was just a question of whether those
protocols had any way to specify some subset during a download.

In particular, here is the statement to which I replied (and quoted in my
post):

"How do you indicate what chunk of the file you want when say using FTP?
As far as I know this is not part of the FTP neither of HTTP protocol"

Nothing in there about connections or sessions. Just the question of
indicating what chunk of a file you want when using FTP or HTTP.
well, having multiple threads without the possibility to have overlapped
range requests per connection makes no sense, having multiple threads
requesting chunks over 2 connections (HTTP1.1) makes little sense IMO
either, unless you have measured the real benefits.
Well, as has been adequately described already I think, there are
definitely scenarios in which there is a measurable benefit.
><snip>
>>I don't even know whether the OP isn't simply talking about a file
server request using System.IO.

Indeed. It would be nice if the OP would return to the thread :)

That's the real problem these day's, they come up with a vague question
they get some confusing answers (our mistakes?), finally they get side
tracked (by us) and they never come back, they are scared off, really.
Well, hopefully the worthy ones recognize the ambiguity in their question
and post a clarification. :)

But I'd agree that it's our fault when we answer a question before we
really understand it. Hopefully we haven't done too much of that here. :)

Pete
Mar 28 '07 #30
"Peter Duniho" <Np*********@nnowslpianmk.com> wrote in message
news:op***************@petes-computer.local...
On Wed, 28 Mar 2007 13:13:07 -0700, Willy Denoyette [MVP] <wi*************@telenet.be>
wrote:
>[...]
One client A --- one Server,
two client threads,
sharing the same socket connection,
thread 1 issues asynchronous range request x...
thread 2 issues asynchronous range request y...
server accepts request x and at the same time accepts request y, starts to return
data for x and y.
AFAIK above is not possible, that doesn't mean it *fails* visibly.
IMO the server will choose to handle request x and queue request y, that means he will
return the data for x before he starts to return data for y.

I think we all agree that that particular scenario won't work, and/or isn't useful (as
you say, even if you have enabled reuse of the connection, repeated requests for
different ranges will simply result in queuing of the replies).

The only way to get some parallel action is to have multiple connections.
Yep, that's the basic requirement.
>[...]
Yep, but say you have 1Mbps to the US and 50Kbps to Africa, is your download manager
clever enough to request all chunks from the US instead of waiting for a chunk to arrive
from Africa, or is it clever enough to cancel the request and re-issue the same request
over the US connection?

Any "download manager" worthy of the name ought to do that. I'm sure there are plenty of
download managers NOT worthy of the name, but at the very least I would expect a download
manager to break the download into small enough chunks that if one connection is
considerably slower than another, the faster connection can wind up with the bulk of the
effort.
Agreed, and all say they support HTTP, but what does this mean, really? No one measures the
throughput benefits in real time, as it's really hard to measure (over the internet). The
same is true for those magical memory managers and memory defragmenters: they all pretend to
optimize memory access, but once you really start measuring, you'll notice they are
worthless.

For example, downloading 1MB might be broken into 100K sections. As each section
completes, a new request is issued, meaning that while a slow server spends a long time
on a single 100K section, a faster server gets to service the rest of the 100K sections.
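
A rough sketch of that chunking scheme, assuming the OP is using HTTP and .NET's HttpWebRequest (the URL, chunk boundaries and part-file names below are invented for illustration):

```csharp
using System;
using System.IO;
using System.Net;
using System.Threading;

class ParallelRangeDownload
{
    // Downloads one section of the file to its own part-file.
    // A server that supports range requests replies with
    // "206 Partial Content" containing only the requested bytes.
    static void DownloadChunk(string url, int from, int to, string partFile)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
        req.AddRange(from, to);   // emits a "Range: bytes=from-to" header
        using (WebResponse resp = req.GetResponse())
        using (Stream s = resp.GetResponseStream())
        using (FileStream fs = File.Create(partFile))
        {
            byte[] buf = new byte[8192];
            int n;
            while ((n = s.Read(buf, 0, buf.Length)) > 0)
                fs.Write(buf, 0, n);
        }
    }

    static void Main()
    {
        string url = "http://example.com/bigfile.bin";  // invented
        // Two sections on two connections (the HTTP 1.1 per-server
        // limit discussed in this thread); concatenate the part-files
        // in order once both threads have finished.
        Thread t1 = new Thread(delegate() { DownloadChunk(url, 0, 99999, "part0"); });
        Thread t2 = new Thread(delegate() { DownloadChunk(url, 100000, 199999, "part1"); });
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();
    }
}
```

A real download manager would of course loop, handing the next unclaimed 100K section to whichever connection finishes first, which is exactly how the faster server ends up with the bulk of the work.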

A more sophisticated program may even be intelligent enough to notice one connection is
faster and start issuing larger requests on that connection, to avoid wasting too much
time on the latency for a single GET. An even more sophisticated program might even have
logic to do as you suggest, and cancel the slower connection when it becomes apparent
that re-issuing the remaining data from that request on a faster connection will net an
improvement (presumably the data already retrieved would not be discarded).

It's clear that some amount of thought does need to go into implementing a serviceable
"download manager" that actually improves the situation when downloading files. But I've
seen plenty of people just in this newsgroup with the insight to be able to do this (I'd
dare say the three of us making so much noise in this thread are among them :) ), and I
don't think any of these issues in and of themselves suggest that there's no benefit to
parallel downloads of different sections of the same file.
>This requires permanent real-time monitoring of the transfer rates, something that can't
reliably be done at the application layer,

Why do you say that real-time monitoring of the transfer rates can't be reliably done at
the application layer? I agree that you can't get detailed information about the exact
rate of transfer, and especially of why the speed is fast or slow (a fast connection
might have low throughput due to errors, for example). But at the granularity required
to monitor a download and automatically compensate for slow and fast connections,
sufficient for a user to notice an improvement in download performance, I'd say at the
application layer you can easily get sufficient information.
Well, you can't get detailed enough (or the right) information about the transfer rates
on a per-socket-connection basis in user space; you can't even get at the throughput at the
application (port) level. So you need some help from a NIC device driver or a filter driver to
coordinate with the application. Something like this is done with the "load balancing" feature
in a farm or cluster server, but that is at the server side; I'm talking about the client
here.
Sure, you can measure at a reasonable level of precision using a simple client PC, running a
single application you are monitoring in isolation, but this picture changes when you have a
server with multiple NICs, connected over segmented LANs, that serves as a download client
(such beasts do exist). Here it's important to have near-real-time performance figures
about the throughput per application per connection (both physical and logical), so that
you can effectively balance the loads over the NICs and, as such, over the segments.
I know, I'm going a bit far afield to illustrate the point, but it does exist as a product
(what I call expensive) and I know who's using such things.

Given that a download manager is most useful for downloads that take tens of minutes or
even more, just following the averages is plenty sufficient to significantly improve
performance. And of course, the bulk of the improvement would come simply from having
parallel connections, which doesn't require any monitoring of throughput anyway (yes,
with such a simple approach there would be cases where things didn't get better, or even
got worse, but most of the time there would be a net improvement).
>that's what I meant when I said you really need sophisticated software and be sure to
measure (as always) the real benefits. I guess (expensive) download managers can do it
while others pretend they can - big difference!

Assuming there's such a thing as an "expensive download manager" (I wouldn't know, not
being in the market for one), then yes...I'd hope they would at least go to as much
trouble as described above. But I think all that can easily be handled at the
application layer.
They exist, but you can't buy them in a PC store ;-).

Willy.
Mar 28 '07 #31
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...
Willy Denoyette [MVP] <wi*************@telenet.be> wrote:
How would you do overlapping get requests on a single connection, out
of interest? That's something I haven't come across (and it sounds
pretty horrendous). Keeping one connection alive for multiple requests,
sure - but I haven't come across overlapping requests on one
connection.

I didn't suggest that either, I said that AFAIK it's not possible, so
it wouldn't make sense to have multiple threads issuing parallel
requests using range requests over the same session.

I know you didn't suggest it - but you keep coming back to it to say it
won't work, when I don't think anyone's said that it *would* work, or
that anyone should try it.
Well, I'm not coming back, I'm just asking how one could link "multiple threads" (that is,
2 or more) to "multiple range requests" given you have (as per HTTP 1.1) only two
connections possible. But, promised, I'm not coming back on this, ever ;-)

>This limit is advised by HTTP1.1 (a configurable default for a client
obeying the terms of the protocol) to be a max. of 2 connections to
the same server address. We have our backbone routers configured to
impose this maximum, in order to prevent some people to monopolize
the network resources.

That's reasonable. I wouldn't like to fix it to 1 though, which is what
it sounded like you were suggesting before.

Where did you see me suggesting this? All I said was (in a reply to
Peter and not to the OP!) that it makes little or no sense to have
multiple threads requesting ranges (we are talking about one single
file, right), not over the same connection and not over multiple
connections either as there is the limit of 2 connections as per
HTTP1.1.

Here (with Peter's text indented with the single angle bracket)

<quote>
>And of course if the HTTP server is configured to throttle the
transfer for each connection, it is often also configured to disallow
multiple connections from the same client IP address.

They better do ;-)
</quote>
Well, guess I was thinking "they better restrict" instead of "disallow", my bad, sorry.

That suggests that you believe servers with transfer throttling should
disallow multiple connections from the same IP address. To me, 2
connections counts as "multiple". I believe it's perfectly reasonable
to have connection throttling and yet still allow 2 connections from a
single client IP address.
Sorry if I made you think that the server should refuse multiple connections; I'm
suggesting they follow the HTTP 1.1 recommendations by default, and that they allow you to
configure that number for dedicated purposes. Note also that a server (here IIS) doesn't
enforce this rule; it relies on the client to obey HTTP 1.1. If IIS gets multiple connection
requests (>2) from the same client (IP address) it won't refuse a connection; it will
simply handle the first x requests and queue the others depending on a number of
heuristics, with the pending requests subject to timeout restrictions at the client. But
assuming that it will handle tens or hundreds of parallel requests from the same client is
simply wrong.

Willy.

Mar 28 '07 #32
Willy Denoyette [MVP] <wi*************@telenet.be> wrote:
I know you didn't suggest it - but you keep coming back to it to say it
won't work, when I don't think anyone's said that it *would* work, or
that anyone should try it.

Well, I'm not coming back, I'm just asking how one could link "
multiple threads " (that is 2 or more) to "multiple range requests"
given you have (as per HTTP 1.1) only two connections possible. But,
promised, I'm not coming back on this, ever ;-)
2 per server (or more if you find you can get away with it) - as I
said, there's the possibility of using mirrors too.

Admittedly you have to be pretty confident in the mirrors - I'd
certainly want an md5 checksum for the combined file afterwards :)
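
For what it's worth, that final check is cheap to do in .NET with System.Security.Cryptography (a sketch; the combined file name is invented):

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

class ChecksumCheck
{
    // Hex MD5 of a file, for comparing against the published checksum.
    static string Md5Of(string path)
    {
        using (MD5 md5 = MD5.Create())
        using (FileStream fs = File.OpenRead(path))
        {
            byte[] hash = md5.ComputeHash(fs);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    static void Main()
    {
        Console.WriteLine(Md5Of("combined.bin"));  // the reassembled download
    }
}
```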

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Mar 28 '07 #33
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...
Willy Denoyette [MVP] <wi*************@telenet.be> wrote:
Not at all - and I don't recall picking on you particularly in the past
either :)

Yes, you did, I prefer not to go public on this.

Well, my sincere apologies if it felt like that, whichever thread you
mean. I'm certainly not bearing a grudge against you or anything like
that.
No big deal, I guess I'm a little tired, note also that English is not my first (nor my
second) language, I'm not sure whether my "interpretation" is always the right one :-(
You repeatedly say (correctly, as far as I'm aware) that it can't be
done on a single connection (with HTTP/FTP) but that's a straw man that
no-one's actually suggested - so why keep talking about it?

I guess that Peter suggested this (excuse Peter if I got this wrong) by saying

See, for example, the "Range" field in HTTP.

http://www.w3.org/Protocols/rfc2616/...tml#sec14.35.1

(I'm guessing that the above link may actually answer the question for the
OP)

well, having multiple threads without the possibility to have
overlapped range requests per connection makes no sense; having
multiple threads requesting chunks over 2 connections (HTTP 1.1) makes
little sense IMO either, unless you have measured the real benefits.

Presumably the OP hasn't measured the benefits as he doesn't know how
to do it yet (and of course we still don't know if it's HTTP!) but I
certainly didn't see any implication of connection-sharing in Peter's
post. I suspect this conversation would have been quite different in
the flesh, with non-verbal signals to help.
Indeed. It would be nice if the OP would return to the thread :)

That's the real problem these days: they come up with a vague
question, they get some confusing answers (our mistakes?), finally
they get side-tracked (by us) and they never come back; they are
scared off, really.

Or, alternatively, come back saying "The problem is solved" without
leaving anything for future readers in terms of how the problem was
solved, or what the problem even was...
Yes, that's even worse, but I know some get scared off; they start sending private mails (as
per my own experience) because they don't want to (or can't) enter the discussion. Maybe we
(MVPs especially) should pay more attention to staying on topic and not get ourselves
side-tracked - not that it couldn't lead to a good discussion (like this one), but it won't
help the OP, I'm afraid.

Willy.

Mar 28 '07 #34
"Peter Duniho" <Np*********@nnowslpianmk.com> wrote in message
news:op***************@petes-computer.local...
On Wed, 28 Mar 2007 14:13:32 -0700, Willy Denoyette [MVP] <wi*************@telenet.be>
wrote:
>[...]
>>Not at all - and I don't recall picking on you particularly in the past
either :)

Yes, you did, I prefer not to go public on this.

For what it's worth, I didn't get the impression that anyone was picking on anyone else.

When such an impression is received by someone, it's also been my experience that it is
almost always actually a case of one person simply trying to make sure they have the
facts straight (and in the process drilling down into more detail than someone else
thinks is necessary or warranted), or a case of two people who think they are talking
about the same thing turning out to not be talking about the same thing, or both.

Honestly, I have seen precious little of what I'd call "picking on" in this newsgroup.
It's one of the reasons it's one of the few newsgroups I still bother to read.

That said...
>>You repeatedly say (correctly, as far as I'm aware) that it can't be
done on a single connection (with HTTP/FTP) but that's a straw man that
no-one's actually suggested - so why keep talking about it?

I guess that Peter suggested this (excuse Peter if I got this wrong) by saying

See, for example, the "Range" field in HTTP.

http://www.w3.org/Protocols/rfc2616/...tml#sec14.35.1

You are excused. :)
Thanks :-)
If you'll look, you'll note that I posted that simply in reply to your statement about
whether it was possible to request a file fragment via HTTP or FTP. The context had
nothing to do (at the time) with numbers of connections, sessions, etc. It was just a
question of whether those protocols had any way to specify some subset during a download.

In particular, here is the statement to which I replied (and quoted in my post):

"How do you indicate what chunk of the file you want when say using FTP? As far as I know
this is not part of the FTP neither of HTTP protocol"

Nothing in there about connections or sessions. Just the question of indicating what
chunk of a file you want when using FTP or HTTP.
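
To make that concrete: the range request is nothing more than an extra header on an ordinary GET (host and file name invented; the byte numbers are the OP's). A server that supports it answers "206 Partial Content" with just those bytes:

```
GET /bigfile.bin HTTP/1.1
Host: www.example.com
Range: bytes=500000-3200000
```

For FTP, the rough equivalent is the REST (restart) command, which sets the starting offset for the next RETR; there's no standard way to specify an end of range, so the client simply closes the data connection once it has read enough.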
>well, having multiple threads without the possibility to have overlapped range requests
per connection makes no sense; having multiple threads requesting chunks over 2
connections (HTTP1.1) makes little sense IMO either, unless you have measured the real
benefits.

Well, as has been adequately described already I think, there are definitely scenarios in
which there is a measurable benefit.

Well, I was focused on a different scenario - multiple range requests over a single
connection - and my mistake was assuming that you were thinking the same way.

>><snip>

I don't even know whether the OP isn't simply talking about a file
server request using System.IO.

Indeed. It would be nice if the OP would return to the thread :)

That's the real problem these days: they come up with a vague question, they get some
confusing answers (our mistakes?), finally they get side-tracked (by us) and they never
come back; they are scared off, really.

Well, hopefully the worthy ones recognize the ambiguity in their question and post a
clarification. :)

But I'd agree that it's our fault when we answer a question before we really understand
it. Hopefully we haven't done too much of that here. :)
Yep, I guess we are trying too hard to help, instead of asking further questions so we get a
better idea about the issue at hand.

Willy.
Mar 28 '07 #35
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...
Willy Denoyette [MVP] <wi*************@telenet.be> wrote:
I know you didn't suggest it - but you keep coming back to it to say it
won't work, when I don't think anyone's said that it *would* work, or
that anyone should try it.

Well, I'm not coming back, I'm just asking how one could link "
multiple threads " (that is 2 or more) to "multiple range requests"
given you have (as per HTTP 1.1) only two connections possible. But,
promised, I'm not coming back on this, ever ;-)

2 per server (or more if you find you can get away with it) - as I
said, there's the possibility of using mirrors too.

Admittedly you have to be pretty confident in the mirrors - I'd
certainly want an md5 checksum for the combined file afterwards :)
Sure, so do I. I remember I had to download the latest Windows SDK (not from the MSDN
subscribers' site, though) three or four times from a mirror before I got an installable copy.
The problem, however, is that you need to download before you can check, and you need to
restart if the check fails; this is frustrating, especially when the download is a single
image of a couple of GB.

Willy.
Mar 28 '07 #36
