Morgan Cheng wrote:
I happened to surf to
http://www.codeproject.com/cs/internet/Crawler.asp, which claims that
WebRequest.GetResponse() will block other threads calling this function
until WebResponse.Close() is called.
I did some experimentation:
public static void Main(string[] args)
{
    for (int idx = 0; idx < 10; ++idx)
    {
        ThreadPool.QueueUserWorkItem(new WaitCallback(testWeb), idx);
    }
}

private static void testWeb(object idx)
{
    string uri = "http://www.gmail.com";
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
    Console.WriteLine("in thread " + idx);
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    Console.WriteLine(response.ContentType + "; idx = " + (int)idx);
    // response.Close();
}
The code runs with output like below:
in thread 0
in thread 1
text/html; charset=UTF-8; idx=0
text/html; charset=UTF-8; idx=1
in thread 2
in thread 3
in thread 4
in thread 5
in thread 6
in thread 7
in thread 8
in thread 9
"idx" may be other value, but only 2 threads get through GetRespnse()
all the time. It seems other 18 threads are stuck at
HttpWebRequest. GetResponse().
After I un-comment the line "response.Close()", it prints the expected
20 lines. HttpWebResponse must hold on to some resource until it is
closed.
Does an HttpWebResponse instance occupy some resource of which only 2
instances are available? If so, it is a real issue for any application
that needs many WebResponse instances, e.g. a web crawler.
The response stream is left open for you to examine the data returned by
the web response, but the body is only actually downloaded if you need
it (to prevent unneeded data transfers and to make sure you get the
response within a reasonable amount of time).
I'm not sure why the number is two, but it is good practice to keep the
number of concurrent connections you open to one site to a minimum, so
that you don't overload the site in question. The WebClient
automatically makes sure you don't open too many connections.
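For what it's worth, the number two matches the default value of
ServicePointManager.DefaultConnectionLimit in the .NET Framework, which
follows the HTTP/1.1 recommendation of at most two persistent
connections per host. If a crawler genuinely needs more parallel
downloads, it can raise the limit before creating any requests; a
minimal sketch (the value 10 and the URL are arbitrary examples):

```csharp
using System;
using System.Net;

public class ConnectionLimitDemo
{
    public static void Main()
    {
        // The framework default is 2 connections per host; raise it
        // BEFORE creating requests, since a ServicePoint copies the
        // default at the moment it is created.
        ServicePointManager.DefaultConnectionLimit = 10;

        HttpWebRequest request =
            (HttpWebRequest)WebRequest.Create("http://www.gmail.com");

        // The request's ServicePoint picks up the new default.
        Console.WriteLine(request.ServicePoint.ConnectionLimit); // 10
    }
}
```

With the limit raised, more than two calls to GetResponse() against the
same host can complete before any response is closed.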
By the way, you shouldn't just call Close after you've gotten the
web response. If an exception occurs in between, the connection is
likely to remain open for some time, which is not what you want. To
make sure it is closed in time, wrap the response in a using statement:
using System.Threading;
using System.IO;
using System.Net;
using System;

public class TestConsoleApp
{
    public static void Main(string[] args)
    {
        for (int idx = 0; idx < 10; ++idx)
        {
            ThreadPool.QueueUserWorkItem(new WaitCallback(testWeb), idx);
        }
        Console.ReadLine();
    }

    private static void testWeb(object idx)
    {
        string uri = "http://www.gmail.com";
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
        request.KeepAlive = false;
        Console.WriteLine("in thread " + idx);
        using (HttpWebResponse response =
            (HttpWebResponse)request.GetResponse())
        {
            Console.WriteLine(response.ContentType + "; idx = " + (int)idx);
        }
    }
}
The WebResponse is then automatically closed once it goes out of scope.
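A using statement is just syntactic sugar for a try/finally block, so
Dispose (which calls Close) runs even if an exception is thrown inside
the block. A small self-contained sketch; the FakeResponse class here is
hypothetical, standing in for HttpWebResponse so the behavior can be
shown without a network connection:

```csharp
using System;

// Hypothetical stand-in for HttpWebResponse, just to show the pattern.
public class FakeResponse : IDisposable
{
    public bool Closed;
    public void Dispose() { Closed = true; }   // corresponds to Close()
}

public class UsingDemo
{
    public static void Main()
    {
        FakeResponse response = new FakeResponse();
        try
        {
            using (response)
            {
                // simulate the work going wrong half-way through
                throw new InvalidOperationException("simulated failure");
            }
        }
        catch (InvalidOperationException)
        {
            // the exception escaped the using block unchanged...
        }
        Console.WriteLine(response.Closed);    // True: Dispose still ran
    }
}
```

This is exactly why the using version of the test never leaks
connections, even when a request fails.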
You can see that the blocking only happens if you try to open many
connections to the same website. I've altered your test to show this:
using System.Threading;
using System.IO;
using System.Net;
using System;

public class TestConsoleApp
{
    private static string[] _urls = new string[]
    {
        "http://www.gmail.com",
        "http://www.google.com",
        "http://www.google.co.uk",
        "http://www.google.nl",
        "http://www.google.ie",
        "http://www.google.de",
        "http://www.amazon.com",
        "http://www.microsoft.com",
        "http://www.tweakers.net",
        "http://www.cnn.com"
    };

    private static string[] _urlsSame = new string[]
    {
        "http://www.gmail.com",
        "http://www.gmail.com",
        "http://www.gmail.com",
        "http://www.cnn.com",
        "http://www.cnn.com",
        "http://www.cnn.com",
        "http://www.cnn.com",
        "http://www.google.com",
        "http://www.google.com",
        "http://www.google.com"
    };
    public static void Main(string[] args)
    {
        Console.WriteLine("Test A");
        for (int idx = 0; idx < 10; ++idx)
        {
            ThreadPool.QueueUserWorkItem(new
                WaitCallback(testWebWorking), _urls[idx]);
        }
        Console.ReadLine();

        Console.WriteLine("Test B");
        for (int idx = 0; idx < 10; ++idx)
        {
            ThreadPool.QueueUserWorkItem(new
                WaitCallback(testWebFaulty), _urls[idx]);
        }
        Console.ReadLine();

        Console.WriteLine("Test B");
        for (int idx = 0; idx < 10; ++idx)
        {
            ThreadPool.QueueUserWorkItem(new
                WaitCallback(testWebFaulty), _urlsSame[idx]);
        }
        Console.ReadLine();
    }
    private static void testWebWorking(object url)
    {
        string uri = (string)url;
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
        request.KeepAlive = false;
        Console.WriteLine("opening: " + uri);
        using (HttpWebResponse response =
            (HttpWebResponse)request.GetResponse())
        {
            Console.WriteLine(response.ContentType + "; uri = " + uri);
        }
    }

    private static void testWebFaulty(object url)
    {
        string uri = (string)url;
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
        request.KeepAlive = false;
        Console.WriteLine("opening: " + uri);
        HttpWebResponse response = (HttpWebResponse)request.GetResponse();
        Console.WriteLine(response.ContentType + "; uri = " + uri);
    }
}
Test A works regardless of which URIs you feed it.
Test B only works if there are not too many connections to the same
server (the first Test B run will succeed, the second will fail).
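The reason Test A is safe is that the connection limit is tracked per
host: each distinct server gets its own ServicePoint with its own limit,
so ten requests to ten different hosts never queue behind each other.
This can be checked without opening a single connection (the two URLs
are just examples):

```csharp
using System;
using System.Net;

public class ServicePointDemo
{
    public static void Main()
    {
        // FindServicePoint only looks up/creates the per-host entry;
        // it does not connect to the server.
        ServicePoint google =
            ServicePointManager.FindServicePoint(new Uri("http://www.google.com"));
        ServicePoint cnn =
            ServicePointManager.FindServicePoint(new Uri("http://www.cnn.com"));

        // Different hosts map to different ServicePoints, each with
        // its own independent connection limit.
        Console.WriteLine(ReferenceEquals(google, cnn));                  // False
        Console.WriteLine(google.ConnectionLimit == cnn.ConnectionLimit); // True
    }
}
```

So the faulty (never-closed) version only deadlocks once enough requests
pile up on one ServicePoint; spreading them over many hosts merely hides
the leak.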
Jesse Houwing