TcpClient in HTTP Request

jin
#1: Nov 17 '05
Hi, I'm trying to use TcpClient to get an HTML file from the net, instead of
using WebClient or WebRequest.

The main part of the source code looks like this:

private void tcpconnect()
{
    tcp = new TcpClient("www.yahoo.com", 80);
    tcp.NoDelay = false;
    tcp.ReceiveTimeout = 60000;
    tcp.ReceiveBufferSize = 25000;
    stream = tcp.GetStream();
    byte[] send = Encoding.ASCII.GetBytes("GET /index.html HTTP/1.0\r\n\r\n");
    stream.Write(send, 0, send.Length);

    byte[] receive = new byte[tcp.ReceiveBufferSize];

    int lastreceive = stream.Read(receive, 0, tcp.ReceiveBufferSize);
    string str = Encoding.ASCII.GetString(receive, 0, tcp.ReceiveBufferSize);

    textBox1.Text = str;
    tcp.Close();
    stream.Close();
}

The problem is that when I run this function, it reads only part of the
HTML source, not all of it. If I call the Read method twice in the function,
it does continue reading. I think there might be some other function that
lets us check whether the HTML file has finished loading, but I don't know
which one it is, and I can't find it in the MSDN library. Could anyone help?

Thanks to all

Johann Blake
#2: Nov 17 '05

The TcpClient is only a communications protocol. It knows nothing of
HTML or even when something has been entirely read. It's up to your app
to figure that out.

To determine whether you have read in all of the HTML, either the
remote server will close the TCP session, in which case you can check
some flag for that (can't remember which one) or if the connection is
not closed, it is implied that you have read all of the data when you
encounter an "</HTML>" tag.

Best Regards
Johann Blake

Brett Romero
#3: Nov 17 '05

Have you tried reading from a smaller webpage on a different site?
Also try increasing the buffer. Using a tool that shows you the traffic is
also helpful, such as MS Fiddler. There are others available as well.

Brett

Ignacio Machin (.NET/C# MVP)
#4: Nov 17 '05

Hi,

Why are you doing this?

You will need to implement the HTTP protocol yourself.

In your case I think the problem is on the receiving end. What if you
change it like this:

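StreamReader reader = new StreamReader( stream );

string line;
// ReadLine() returns null once the server closes the connection
while ( (line=reader.ReadLine())!=null )
    Console.WriteLine( line ); // ReadLine() strips the line break, so write it back

reader.Close();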



cheers,

--
Ignacio Machin,
ignacio.machin AT dot.state.fl.us
Florida Department Of Transportation



"jin" <jin@discussions.microsoft.com> wrote in message
news:AB057DF2-EA99-46C1-9B92-81486E41C541@microsoft.com...[color=blue]
> hi, i'm trying using the tcpClient to get a html file from net, instead of
> using WebClient or WebRequest,
>
> the main part of the source code is like this:
>
> private void tcpconnect()
> {
> tcp=new TcpClient("www.yahoo.com",80);
> tcp.NoDelay=false;
> tcp.ReceiveTimeout=60000;
> tcp.ReceiveBufferSize=25000;
> stream = tcp.GetStream();
> byte[] send = Encoding.ASCII.GetBytes("GET /index.html HTTP/1.0\r\n\r\n");
> stream.Write(send,0,send.Length);
>
> byte[] receive = new byte[tcp.ReceiveBufferSize];
>
> int lastreceive=stream.Read(receive,0,tcp.ReceiveBuffe rSize);
> string str = Encoding.ASCII.GetString(receive,0,tcp.ReceiveBuff erSize);
>
> textBox1.Text=str;
> tcp.Close();
> stream.Close();
> }
>
> but the problem is, while i try to run this function, it didn't read all
> the
> html source code for me, but just a part, i try to run twice of the read
> method in the function, and it does continue reading for me. I think there
> might be some other function which allow us to check whether the html file
> is
> finished loading or not, but i'm dunno which is it and i can't find it
> through the msdn library. Is that anyone could help?
>
> Thank for all[/color]


Joerg Jooss
#5: Nov 17 '05

jin wrote:

> hi, i'm trying using the tcpClient to get a html file from net,
> instead of using WebClient or WebRequest,
>
> the main part of the source code is like this:
>
> private void tcpconnect()
> {
> tcp=new TcpClient("www.yahoo.com",80);
> tcp.NoDelay=false;
> tcp.ReceiveTimeout=60000;
> tcp.ReceiveBufferSize=25000;
> stream = tcp.GetStream();
> byte[] send = Encoding.ASCII.GetBytes("GET /index.html HTTP/1.0\r\n\r\n");
> stream.Write(send,0,send.Length);
>
> byte[] receive = new byte[tcp.ReceiveBufferSize];
>
> int lastreceive=stream.Read(receive,0,tcp.ReceiveBufferSize);
> string str = Encoding.ASCII.GetString(receive,0,tcp.ReceiveBufferSize);
>
> textBox1.Text=str;
> tcp.Close();
> stream.Close();
> }
>
> but the problem is, while i try to run this function, it didn't read
> all the html source code for me, but just a part, i try to run twice
> of the read method in the function, and it does continue reading for
> me. I think there might be some other function which allow us to
> check whether the html file is finished loading or not, but i'm dunno
> which is it and i can't find it through the msdn library. Is that
> anyone could help?

Classic NetworkStream programming error: You just cannot expect to
read all data with a single call to Read(), regardless of the receive
buffer's size. You have to loop until Read() returns 0.
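The loop described above could look roughly like this. It is a minimal sketch, assuming the server closes the connection once the response is complete (which the HTTP/1.0 request in the original post normally causes). StreamHelper and ReadAll are hypothetical names chosen for illustration, not framework members.

```csharp
using System;
using System.IO;
using System.Text;

static class StreamHelper
{
    // Reads from the stream until Read() returns 0 (end of stream),
    // collecting every piece instead of trusting a single Read() call.
    public static byte[] ReadAll(Stream stream)
    {
        byte[] buffer = new byte[4096];
        using (MemoryStream collected = new MemoryStream())
        {
            int bytesRead;
            while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Only the first bytesRead bytes of the buffer are valid.
                collected.Write(buffer, 0, bytesRead);
            }
            return collected.ToArray();
        }
    }
}
```

In the original function this would replace the single Read() call, for example: string str = Encoding.ASCII.GetString(StreamHelper.ReadAll(stream)); Note that the result still contains the status line and headers ahead of the HTML.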


Cheers,
--
http://www.joergjooss.de
mailto:news-reply@joergjooss.de
Chad Z. Hower aka Kudzu
#6: Nov 17 '05

jin <jin@discussions.microsoft.com> wrote in
news:AB057DF2-EA99-46C1-9B92-81486E41C541@microsoft.com:
> hi, i'm trying using the tcpClient to get a html file from net,
> instead of using WebClient or WebRequest,

Why? I would STRONGLY recommend against this. HTTP is NOT a simple protocol. Sure, you might
get a simple GET to work - but then wait until the extras of the protocol kick in: chunked transfers, etc.

> byte[] receive = new byte[tcp.ReceiveBufferSize];
>
> int lastreceive=stream.Read(receive,0,tcp.ReceiveBufferSize);
> string str = Encoding.ASCII.GetString(receive,0,tcp.ReceiveBufferSize);

TCP is split into packets - what you are trying will not work. Even when you solve this - you have
solved only the tip of the iceberg.
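To give a feel for one of the "extras" mentioned above, here is a rough sketch of decoding a chunked response body. It assumes the headers have already been consumed and that the stream is positioned at the first chunk-size line. ChunkedDecoder and its methods are hypothetical names, and the sketch skips trailer headers and most error handling, so it is an illustration rather than a complete implementation.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

static class ChunkedDecoder
{
    // Reads one CRLF-terminated ASCII line from the stream, without the CRLF.
    static string ReadLine(Stream s)
    {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = s.ReadByte()) != -1)
        {
            if (b == '\n') break;
            if (b != '\r') sb.Append((char)b);
        }
        return sb.ToString();
    }

    // Decodes a chunked body: each chunk is "<hex size>[;extensions]CRLF<data>CRLF",
    // and a chunk of size 0 marks the end of the body.
    public static byte[] Decode(Stream body)
    {
        using (MemoryStream output = new MemoryStream())
        {
            while (true)
            {
                string sizeLine = ReadLine(body);
                int semi = sizeLine.IndexOf(';');
                if (semi >= 0) sizeLine = sizeLine.Substring(0, semi); // drop chunk extensions
                int size = int.Parse(sizeLine.Trim(), NumberStyles.HexNumber, CultureInfo.InvariantCulture);
                if (size == 0) break; // last chunk; trailers (if any) are ignored here

                byte[] chunk = new byte[size];
                int offset = 0;
                while (offset < size)
                {
                    // A single Read() may return fewer bytes than requested.
                    int n = body.Read(chunk, offset, size - offset);
                    if (n == 0) throw new EndOfStreamException("Connection closed mid-chunk.");
                    offset += n;
                }
                output.Write(chunk, 0, size);
                ReadLine(body); // consume the CRLF that follows the chunk data
            }
            return output.ToArray();
        }
    }
}
```

Even this small piece needs its own inner read loop, which is part of why a ready-made HTTP client is usually the safer choice.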


--
Chad Z. Hower (a.k.a. Kudzu) - http://www.hower.org/Kudzu/
"Programming is an art form that fights back"

Empower ASP.NET with IntraWeb
http://www.atozed.com/IntraWeb/
Chad Z. Hower aka Kudzu
#7: Nov 17 '05

"Johann Blake" <johannblake@yahoo.com> wrote in
news:1121852438.817575.71720@g44g2000cwa.googlegroups.com:
> To determine whether you have read in all of the HTML, either the
> remote server will close the TCP session, in which case you can check
> some flag for that (can't remember which one) or if the connection is
> not closed, it is implied that you have read all of the data when you
> encounter an "</HTML>" tag.

No - that's totally wrong. Use the HTTP protocol - not the content. HTTP can transport binary files,
XML, and other content. And in HTML, whitespace and other things could follow the </HTML> tag. Stopping
the transfer at that point is a horrible hack that will lead to very poor results.

HTTP 1.1 for example keeps the connection open.
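As a rough illustration of relying on the protocol rather than the content, the sketch below pulls the Content-Length value out of a response's header block, so the client knows how many body bytes to expect even if the connection stays open. HttpFraming and GetContentLength are hypothetical names. The sketch assumes the header block (everything before the first empty line) has already been read as text, and it does not cover responses that use chunked encoding instead of Content-Length.

```csharp
using System;

static class HttpFraming
{
    // Returns the Content-Length value from a raw header block,
    // or -1 when the header is absent or not a valid non-negative number.
    public static long GetContentLength(string headerBlock)
    {
        string[] lines = headerBlock.Split(new string[] { "\r\n" }, StringSplitOptions.None);
        foreach (string line in lines)
        {
            int colon = line.IndexOf(':');
            if (colon <= 0) continue; // status line or empty line

            string name = line.Substring(0, colon).Trim();
            // Header names are case-insensitive.
            if (string.Equals(name, "Content-Length", StringComparison.OrdinalIgnoreCase))
            {
                long length;
                if (long.TryParse(line.Substring(colon + 1).Trim(), out length) && length >= 0)
                    return length;
                return -1;
            }
        }
        return -1;
    }
}
```

When the method returns a length, the client reads exactly that many body bytes (looping over Read() as needed). When it returns -1, the body is delimited some other way, such as chunked encoding or the server closing the connection.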


--
Chad Z. Hower (a.k.a. Kudzu) - http://www.hower.org/Kudzu/
"Programming is an art form that fights back"

Get your ASP.NET in gear with IntraWeb!
http://www.atozed.com/IntraWeb/
Johann Blake
#8: Nov 17 '05

There is nothing wrong with my suggestion. I've done this many times
without any problems. He wants to read HTML using a TcpClient. There's
nothing wrong with that approach. It has its advantages and
disadvantages. The thing he wasn't aware of is that HTML has nothing to
do with the actual TcpClient protocol itself. Anything can be sent with
a TcpClient.

Johann

Chad Z. Hower aka Kudzu
#9: Nov 17 '05

"Johann Blake" <johannblake@yahoo.com> wrote in
news:1121926001.566551.261750@g44g2000cwa.googlegroups.com:
> There is nothing wrong with my suggestion. I've done this many times
> without any problems. He wants to read HTML using a TcpClient. There's
> nothing wrong with that approach. It has its advantages and

Yes there is - he will have trouble later. HTTP is NOT the simple protocol it seems.

It's like using search and replace to transform an XML document. It will work only in the short
term and break very quickly. Implementing 10% of the HTTP protocol is a hack - and very poor style,
guaranteed to break quickly.

The proper solution is to use an HTTP component.

> disadvantages. The thing he wasn't aware of is that HTML has nothing to
> do with the actual TcpClient protocol itself. Anything can be sent with
> a TcpClient.

http://www.indyproject.org/

Believe me, I know how TCP works.


--
Chad Z. Hower (a.k.a. Kudzu) - http://www.hower.org/Kudzu/
"Programming is an art form that fights back"

Make your ASP.NET applications run faster
http://www.atozed.com/IntraWeb/
Wessel Troost
#10: Nov 17 '05

> int lastreceive=stream.Read(receive,0,tcp.ReceiveBufferSize);
> string str =
> Encoding.ASCII.GetString(receive,0,tcp.ReceiveBufferSize);

Call these functions repeatedly (in a loop) until stream.Read() returns 0 to indicate
end-of-stream.

Regards,
Wessel
Cool Guy
#11: Nov 17 '05

Wessel Troost <nothing@like.the.sun> wrote:

>> int lastreceive=stream.Read(receive,0,tcp.ReceiveBufferSize);
>> string str =
>> Encoding.ASCII.GetString(receive,0,tcp.ReceiveBufferSize);
>>
> Call these functions recursively until stream.Read() returns 0 to indicate
> end-of-stream.

But an HTTP/1.1 server might *not* close the connection after sending the
response.

The OP should read the HTTP/1.1 RFC if he wants to implement an HTTP client
at this time.
Wessel Troost
#12: Nov 17 '05

> But an HTTP/1.1 server might *not* close the connection after sending the
> response.
>
You can specify the HTTP version in your request, which would eliminate
this speculative problem?
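A small sketch of what such a request might look like. The Host and Connection: close headers are additions that do not appear in the original post: Host is not required by HTTP/1.0 but many servers that host several sites expect it, and Connection: close simply makes the intent explicit. RequestBuilder and BuildGet are hypothetical names.

```csharp
using System;
using System.Text;

static class RequestBuilder
{
    // Builds a GET request that declares HTTP/1.0 and asks the server to
    // close the connection after the response, so that Read() eventually
    // returns 0 and marks the end of the response.
    public static byte[] BuildGet(string host, string path)
    {
        string request =
            "GET " + path + " HTTP/1.0\r\n" +
            "Host: " + host + "\r\n" +
            "Connection: close\r\n" +
            "\r\n"; // empty line ends the header block
        return Encoding.ASCII.GetBytes(request);
    }
}
```

The returned bytes can be passed to stream.Write() in place of the hard-coded request string. This only addresses when the connection closes; it does not handle redirects, encodings, or the other protocol details discussed in this thread.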

> The OP should read the HTTP/1.1 RFC if he wants to implement an HTTP
> client at this time.

To write a generic client, the OP might have to read the specification.
On the other hand, he might just want to contact a specific server, with a
specific IIS or Apache version, which behaves consistently. In which case
reading the RFC would be like shooting a mouse with a nuclear bomb.

Not that anyone could read an RFC and come up with correct code. It takes
groups of developers years to do that.

Greetings,
Wessel
Chad Z. Hower aka Kudzu
#13: Nov 17 '05

"Wessel Troost" <nothing@like.the.sun> wrote in
news:op.st9mdvqrf3yrl7@asbel:
> You can specify the HTTP version in your request, which would
> eliminate this speculative problem?

Only that one. It won't help you with URL encoding, chunked transfers, or many of the other things
in HTTP 1.0.

> To write a generic client, the OP might have to read the
> specification. On the other hand, he might just want to contact a
> specific server, with a specific IIS or Apache version, which behaves
> consistently. In which case reading the RFC would be like shooting a
> mouse with a nuclear bomb.

No - you've seriously underestimated the task and do not understand HTTP or the RFCs. The RFC
is the consistency. IIS and Apache both conform to it. Various user configurations, proxies and
document types will alter your "observed consistency of a single test URL".

Just because you run one test does not mean it will always respond in that manner.

> Not that anyone could read an RFC and come up with correct code. It
> takes groups of developers years to do that.

Which is why you should use a ready-made HTTP client and not write one unless you intend to do
it correctly. I can teach my wife how to write a Hello World - but saying "Well, it's consistent" and
she doesn't need the rest does not mean I can take a vacation and give her my job.

HTTP looks deceptively simple - and small test programs will work in the SHORT TERM.
However there are many more complexities to HTTP than first appear and such hacks will not
work long term.


--
Chad Z. Hower (a.k.a. Kudzu) - http://www.hower.org/Kudzu/
"Programming is an art form that fights back"

Develop ASP.NET applications easier and in less time:
http://www.atozed.com/IntraWeb/
Cool Guy
#15: Nov 17 '05

Wessel Troost <nothing@like.the.sun> wrote:

>> But an HTTP/1.1 server might *not* close the connection after sending the
>> response.
>>
> You can specify the HTTP version in your request, which would eliminate
> this speculative problem?

Of course. My mistake.