screen scrape + login

n8
Hi,

I have to do the following and have been racking my brain over
various solutions, none of which has worked well.

I want to use System.Net.WebClient to submit data to a form (log a
user in) and then redirect to the correct article.

Here is the scenario.
If you are not logged into the site, certain articles redirect you
to an shtml login page. The login.shtml page posts to
another URL for authentication and then lets you in. If you have clicked
on an article that requires a login, you are sent to the
login page with an appended URL,
http://www.domainname.com?orq:http:/..._2653091.shtml.
I have tried sending a WebClient request to the URL that the above
login form posts to, but I keep getting Method Not Allowed.

Any ideas?
Nov 18 '05 #1
More info is required, but here is a typical login flow:

1) You request a page with WebClient.
2) You are returned a redirect header to the login page.
3) Your code detects the login redirect, then posts the required form data to
the login page (manually view the login page to find the required form fields
and method).

Note: an ASP.NET login site requires that you actually do a GET to the
login page to get a valid viewstate to post back. Other systems may also
require scraping data from the GET response before doing the actual POST.

4) A successful POST to the login will return a cookie value you must send
on subsequent requests, and a redirect header to the originally requested
page.
-- bruce (sqlwork.com)
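
Steps 2 and 3 above can be sketched roughly as follows. This is an illustration only, not tested against any real site: LoginRedirect, IsLoginRedirect and GetWithoutRedirect are made-up names, and matching the Location header against a login path is an assumption about how the site redirects.

```csharp
using System;
using System.Net;

public static class LoginRedirect
{
    // Hypothetical helper: true when a status code / Location header pair
    // looks like a bounce to the login page.
    public static bool IsLoginRedirect(HttpStatusCode status, string location, string loginPath)
    {
        int code = (int)status;
        bool isRedirect = code == 301 || code == 302 || code == 303 || code == 307;
        return isRedirect
            && !string.IsNullOrEmpty(location)
            && location.IndexOf(loginPath, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Issues a GET without following redirects, so the caller can inspect
    // the status code and the Location header itself.
    public static HttpWebResponse GetWithoutRedirect(string url, CookieContainer cookies)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.AllowAutoRedirect = false; // otherwise the redirect is followed silently
        request.CookieContainer = cookies; // collects any cookies the server sets
        return (HttpWebResponse)request.GetResponse();
    }
}
```

A caller would pass response.StatusCode and response.Headers["Location"] to IsLoginRedirect and, if it returns true, go on to fetch and post the login form.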
Nov 18 '05 #2
I have an example of this here:

http://odetocode.com/Articles/162.aspx

It's basically posting to the login form, getting the cookie back, and
then making sure to send the cookie along when requesting the
protected content.

--
Scott
http://www.OdeToCode.com/blogs/scott/

Nov 18 '05 #3
Scott,
FYI - that was one of the best articles on the subject I ever read.
I was completely stuck on this issue about 6 months ago and I implemented it
straight away using the concepts you presented here.

Excellent work and explanation.
--
Joe Fallon
Nov 18 '05 #4
Thanks, Joe. I appreciate the feedback.

--
Scott
http://www.OdeToCode.com/blogs/scott/

Nov 18 '05 #5
n8
Thanks for the example. I had seen your example earlier and tried
it, but I always get stuck at one particular point. There are two
hidden fields, both called web.fixed_values, that
appear to be something like a view state, but the page is shtml. I am
able to pull down the site, etc., but every time I try to
post my data (with or without the web.fixed_values) I always get the
response Method Not Allowed. Below is the code I am using, along with
the site I am trying to access with my account. Any further help on
this would be greatly appreciated.

// Needs: using System; using System.IO; using System.Net; using System.Web;
private void Page_Load(object sender, System.EventArgs e)
{
    string LOGIN_URL = "http://augustachronicle.com/login.shtml";
    string cookieAge = "31536000";

    // not shown in the original post; placeholder values
    string userName = "my-user-name";
    string password = "my-password";

    try
    {
        // GET the login page so the hidden fields can be read
        HttpWebRequest webRequest = WebRequest.Create(LOGIN_URL) as HttpWebRequest;

        StreamReader responseReader = new StreamReader(
            webRequest.GetResponse().GetResponseStream());

        string responseData = responseReader.ReadToEnd();
        responseReader.Close();

        // get the web fixed values
        string fixedvalue1 = ExtractFixedValues1(responseData);
        string fixedvalue2 = ExtractFixedValues2(responseData);

        // user name and password are URL-encoded like the other values
        string postData = String.Format(
            "web.fixed_values={0}&web.fixed_values={1}&ACTION=Login&USER={2}&PASS={3}&cookie_age={4}",
            fixedvalue1, fixedvalue2,
            HttpUtility.UrlEncode(userName), HttpUtility.UrlEncode(password),
            cookieAge);

        // have a cookie container ready to receive the forms auth cookie
        CookieContainer cookies = new CookieContainer();

        // now post to the login form
        webRequest = WebRequest.Create(LOGIN_URL) as HttpWebRequest;
        webRequest.Method = "POST";
        webRequest.ContentType = "application/x-www-form-urlencoded";
        webRequest.CookieContainer = cookies;

        // write the form values into the request message
        StreamWriter requestWriter = new StreamWriter(webRequest.GetRequestStream());
        requestWriter.Write(postData);
        requestWriter.Close();

        // we don't need the contents of the response, just the cookie it issues
        webRequest.GetResponse().Close();

        // now we can send our cookie along with a request for the protected page
        webRequest = WebRequest.Create(
            "http://augustachronicle.com/stories/112404/usc_FBC--SpurrierProfile.shtml")
            as HttpWebRequest;
        webRequest.CookieContainer = cookies;
        responseReader = new StreamReader(
            webRequest.GetResponse().GetResponseStream());

        // and read the response
        responseData = responseReader.ReadToEnd();
        responseReader.Close();

        Response.Write(responseData);
    }
    catch (Exception ex)
    {
        Response.Write(ex.ToString());
    }
}

// returns the URL-encoded value of the first web.fixed_values field
private string ExtractFixedValues1(string s)
{
    string viewStateNameDelimiter = "web.fixed_values";
    string valueDelimiter = "value=\"";

    int viewStateNamePosition = s.IndexOf(viewStateNameDelimiter);
    int viewStateValuePosition = s.IndexOf(valueDelimiter, viewStateNamePosition);

    int viewStateStartPosition = viewStateValuePosition + valueDelimiter.Length;
    int viewStateEndPosition = s.IndexOf("\"", viewStateStartPosition);

    return HttpUtility.UrlEncode(
        s.Substring(viewStateStartPosition,
            viewStateEndPosition - viewStateStartPosition));
}

// returns the URL-encoded value of the second web.fixed_values field
private string ExtractFixedValues2(string s)
{
    string viewStateNameDelimiter = "web.fixed_values";
    string valueDelimiter = "value=\"";

    // locate the end of the first field's value ...
    int viewStateNamePosition = s.IndexOf(viewStateNameDelimiter);
    int viewStateValuePosition = s.IndexOf(valueDelimiter, viewStateNamePosition);

    int viewStateStartPosition = viewStateValuePosition + valueDelimiter.Length;
    int viewStateEndPosition = s.IndexOf("\"", viewStateStartPosition);

    // ... drop everything up to that point and search again
    string sTemp = s.Remove(0, viewStateEndPosition);

    viewStateNamePosition = sTemp.IndexOf(viewStateNameDelimiter);
    viewStateValuePosition = sTemp.IndexOf(valueDelimiter, viewStateNamePosition);

    viewStateStartPosition = viewStateValuePosition + valueDelimiter.Length;
    viewStateEndPosition = sTemp.IndexOf("\"", viewStateStartPosition);

    return HttpUtility.UrlEncode(
        sTemp.Substring(viewStateStartPosition,
            viewStateEndPosition - viewStateStartPosition));
}
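
One thing that may be worth ruling out, given the first post's remark that login.shtml posts to another URL: the code above sends its POST to login.shtml itself. A 405 Method Not Allowed means the server does not accept that method for that resource, and some servers refuse POST on static .shtml pages. A minimal sketch, assuming the login page contains an ordinary form tag with an action attribute, of reading the real target out of the downloaded HTML (FormAction and Extract are hypothetical names, not part of the code above):

```csharp
using System;
using System.Text.RegularExpressions;

public static class FormAction
{
    // Returns the absolute URL that the first <form> on the page submits to,
    // or null when no action attribute is found.
    public static string Extract(string html, string pageUrl)
    {
        Match m = Regex.Match(
            html,
            "<form\\b[^>]*?\\baction\\s*=\\s*[\"']([^\"']*)[\"']",
            RegexOptions.IgnoreCase);
        if (!m.Success)
        {
            return null;
        }
        // Resolve a relative action against the page it was found on.
        return new Uri(new Uri(pageUrl), m.Groups[1].Value).AbsoluteUri;
    }
}
```

The returned URL could then take the place of LOGIN_URL in the WebRequest.Create call that performs the POST. Whether that clears the 405 for this particular site is untested.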
Nov 18 '05 #6
Everything looks like it is in order, Nathan. I'd examine the HTTP
traffic between your program and the server to make sure it all
matches exactly, even little things like the User-Agent header. I had one
financial site reject HttpWebRequests until I set the UserAgent
property to look just like IE. I guess it was a weak attempt at
preventing screen scraping programs.
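
A minimal sketch of setting that header, assuming an IE 6 style User-Agent string is what the site expects. BrowserLikeRequest is a hypothetical helper name, and the exact string should be copied from whatever a real browser sends to the site:

```csharp
using System;
using System.Net;

public static class BrowserLikeRequest
{
    // Assumed example value: a typical Internet Explorer 6 User-Agent string.
    public const string IeUserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";

    // Creates a request that identifies itself like a browser and shares
    // the given cookie container with the other requests in the session.
    public static HttpWebRequest Create(string url, CookieContainer cookies)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.UserAgent = IeUserAgent;
        request.CookieContainer = cookies;
        return request;
    }
}
```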

--
Scott
http://www.OdeToCode.com/blogs/scott/

Nov 18 '05 #7
n8
Scott,

Thanks for the information. I added a User-Agent to make it look like
IE, but I still get the 405 Method Not Allowed error message. What is
the best way to monitor the HTTP traffic between my application and
the remote site? Are there any tools I can download to show me what
is going back and forth?

Thanks in advance,

n8

Nov 18 '05 #8
You might try a program called httplook. I think it is at
http://www.httplook.com; if not, Google for it.
Nov 18 '05 #9
Also, if you get a fix - please let us know.
Nov 18 '05 #10
One I've used with success is Fiddler.

http://www.fiddlertool.com/fiddler/

--
Scott
http://www.OdeToCode.com/blogs/scott/

Nov 18 '05 #11
n8
Scott,

I loaded the Fiddler tool and traced the HTTP traffic. Everything my
application sends looks no different from what the browser sends when I
go to the site directly. Am I to assume that they have a way of blocking
screen scrapers, and if so, how would I explain this?

Thanks,

n8

Scott Allen <bitmask@[nospam].fred.net> wrote in message news:<mq********************************@4ax.com>...
> One I've used with success is Fiddler.
>
> http://www.fiddlertool.com/fiddler/
>
> --
> Scott
> http://www.OdeToCode.com/blogs/scott/


Nov 18 '05 #12
Hmm - I'm running out of ideas n8.

I know there are sites out there blocking scrapers, but they usually
either block an IP or use client side script and DHTML to try to screw
up programs. If your app is sending the same traffic as the browser
that wouldn't be an issue.

So, my last idea is this:

Last year I had a site that would occasionally reject web requests
from my screen-scraping program. It was in a loop moving through a
paged result set, and I couldn't figure out the random failures. On a
whim I put in a few Thread.Sleep calls to slow the scraper down
between requests, and it never failed again. I'm not sure if they
monitored requests by IP to allow only so many per second or minute or
what, but it was definitely timing related.
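
A rough sketch of that kind of throttling, assuming the scraper walks a list of page URLs. The class name, the fetch delegate, and the delay are placeholders, and the right delay for any given site is a guess:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class ThrottledScraper
{
    // Calls fetch(url) for each URL and sleeps for `delay` between calls.
    // `fetch` stands in for whatever actually downloads the page.
    public static List<string> FetchAll(
        IEnumerable<string> urls, TimeSpan delay, Func<string, string> fetch)
    {
        List<string> pages = new List<string>();
        bool first = true;
        foreach (string url in urls)
        {
            if (!first)
            {
                Thread.Sleep(delay); // pause before every request after the first
            }
            first = false;
            pages.Add(fetch(url));
        }
        return pages;
    }
}
```

For example, ThrottledScraper.FetchAll(pageUrls, TimeSpan.FromSeconds(2), url => client.DownloadString(url)) would space the downloads two seconds apart, where client is a WebClient and pageUrls is your own list.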

I guess the only other thing I'd do is really double check those HTTP
payloads and make sure everything matches: the headers are the same, the
POST data is properly encoded, the cookie is sent, and so on.
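
As a checklist in code form, here is a sketch of the pieces that have to line up. It assumes the login form posts ordinary URL-encoded fields; the class name and the field names in the usage note are hypothetical, so take the real ones from the login page and the Fiddler trace:

```csharp
using System;
using System.Collections.Specialized;
using System.Net;
using System.Text;
using System.Web;

public static class LoginPost
{
    // Joins the fields as name=value pairs, URL-encoding both sides.
    public static string BuildFormBody(NameValueCollection fields)
    {
        StringBuilder body = new StringBuilder();
        foreach (string name in fields.AllKeys)
        {
            if (body.Length > 0) body.Append('&');
            body.Append(HttpUtility.UrlEncode(name));
            body.Append('=');
            body.Append(HttpUtility.UrlEncode(fields[name]));
        }
        return body.ToString();
    }

    // Prepares (but does not send) a POST with the headers a browser would set.
    public static HttpWebRequest CreateLoginRequest(
        string url, string userAgent, CookieContainer cookies)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.UserAgent = userAgent;
        request.CookieContainer = cookies; // reuse one container for every request
        return request;
    }
}
```

To send it, write Encoding.ASCII.GetBytes(body) to request.GetRequestStream() and then call GetResponse(). Reusing the same CookieContainer on the follow-up request is what carries the login cookie along.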

HTH!

--
Scott
http://www.OdeToCode.com/blogs/scott/

On 29 Nov 2004 11:42:48 -0800, na********@yahoo.com (n8) wrote:
>Scott,
>
>I loaded the Fiddler tool and traced the HTTP traffic. Everything my
>application sends looks no different from what the browser sends when I
>go to the site directly. Am I to assume that they have a way of blocking
>screen scrapers, and if so, how would I explain this?
>
>Thanks,
>
>n8


Nov 18 '05 #13
n8
A different approach: since I have been banging my head against the
wall with this one, I thought I would try another. I thought I would
create the cookies that the site requires for the user account on the
fly, and everything would be great. I can create the cookies exactly,
BUT if I set the Domain property, the cookie does not get written; if I
leave the property unset, the cookie gets written for localhost. How do
I get around this so I can set the domain name?

thanks again

n8

Scott Allen <bitmask@[nospam].fred.net> wrote in message news:<0f********************************@4ax.com>...
> Hmm - I'm running out of ideas n8.
>
> I know there are sites out there blocking scrapers, but they usually
> either block an IP or use client side script and DHTML to try to screw
> up programs. If your app is sending the same traffic as the browser
> that wouldn't be an issue.
>
> So, my last idea is this:
>
> Last year I had a site that would occasionally reject web requests
> from my screen-scraping program. It was in a loop moving through a
> paged result set, and I couldn't figure out the random failures. On a
> whim I put in a few Thread.Sleep calls to slow the scraper down
> between requests, and it never failed again. I'm not sure if they
> monitored requests by IP to allow only so many per second or minute or
> what, but it was definitely timing related.
>
> I guess the only other thing I'd do is really double check those HTTP
> payloads and make sure everything matches: the headers are the same, the
> POST data is properly encoded, the cookie is sent, and so on.
>
> HTH!
>
> --
> Scott
> http://www.OdeToCode.com/blogs/scott/


Nov 18 '05 #14
I remember trying a similar approach once, but I believe it is a
security feature that doesn't let us create a cookie for another
domain. The IE ActiveX control wouldn't let me pass cookies in
programmatically at all. Argh.
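
That restriction is about cookies handed to a browser. If your own code is the one requesting the article, a CookieContainer on the HttpWebRequest can carry a cookie for the remote domain. A minimal sketch under that assumption; the class name and the cookie name and value in the usage note are placeholders, and whether the site accepts a hand-made cookie is a separate question:

```csharp
using System;
using System.Net;

public static class ManualCookies
{
    // Builds a cookie jar holding one cookie scoped to the given host.
    public static CookieContainer CreateJar(string name, string value, string host)
    {
        CookieContainer jar = new CookieContainer();
        jar.Add(new Cookie(name, value, "/", host));
        return jar;
    }

    // Prepares (but does not send) a request that uses the jar.
    public static HttpWebRequest CreateRequest(string url, CookieContainer jar)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = jar; // cookies matching the URL's host are sent
        return request;
    }
}
```

For example, ManualCookies.CreateJar("session", "abc123", "www.domainname.com") gives a jar whose cookie goes out with requests to that host and not with requests to localhost. This does nothing for the visitor's browser; it only affects requests made by your code.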

--
Scott
http://www.OdeToCode.com/blogs/scott/

On 30 Nov 2004 07:51:33 -0800, na********@yahoo.com (n8) wrote:
>A different approach: since I have been banging my head against the
>wall with this one, I thought I would try another. I thought I would
>create the cookies that the site requires for the user account on the
>fly, and everything would be great. I can create the cookies exactly,
>BUT if I set the Domain property, the cookie does not get written; if I
>leave the property unset, the cookie gets written for localhost. How do
>I get around this so I can set the domain name?
>
>thanks again
>
>n8


Nov 18 '05 #15
