Bytes | Software Development & Data Engineering Community

screen scrape + login

n8
Hi,

I have to do the following and have been racking my brain with
various solutions that have had not-so-great results.

I want to use System.Net.WebClient to submit data to a form (log a
user in) and then redirect to the correct article.

Here is the scenario.
If you are not logged in to the site, certain articles redirect you
to an shtml login page. The login.shtml page posts to another URL for
authentication and then lets you in. If you have clicked on an article
that requires a login, you are sent to the login page with the article
URL appended,
http://www.domainname.com?orq:http:/..._2653091.shtml.
I have tried sending a WebClient request to the URL that the above
login form posts to, but I keep getting Method Not Allowed.

Any ideas?
Nov 18 '05 #1
More info is required, but here is a typical login sequence:

1) You request a page with WebClient.
2) You are returned a redirect header to the login page.
3) Your code detects the login redirect, then posts the required form data to
the login page (manually view the login page to get the required form fields
and method).

Note: an ASP.NET login site requires that you actually do a GET of the
login page to obtain valid viewstate to post back. Other systems may also
require scraping data from the GET response before doing the actual POST.

4) A successful POST to the login returns a cookie value you must send
on subsequent requests, and a redirect header to the originally requested
page.
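
Steps 1 and 2 can be sketched roughly as follows. This is only an illustration: the class and method names are made up for the example, and the login path ("/login.shtml" in the test values) is whatever the target site actually uses. It turns off automatic redirect handling so the redirect header stays visible to the calling code.

```csharp
using System;
using System.Net;

static class LoginRedirectSketch
{
    // True when the status/Location pair looks like a redirect to the login page.
    // loginPath is an assumption supplied by the caller, e.g. "/login.shtml".
    public static bool IsLoginRedirect(HttpStatusCode status, string location, string loginPath)
    {
        int code = (int)status;
        bool isRedirect = code == 301 || code == 302 || code == 303 || code == 307;
        return isRedirect
            && !string.IsNullOrEmpty(location)
            && location.IndexOf(loginPath, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Requests the page without following redirects automatically,
    // then inspects the status code and Location header itself.
    public static bool RequiresLogin(string pageUrl, string loginPath)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(pageUrl);
        request.AllowAutoRedirect = false;

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            return IsLoginRedirect(response.StatusCode, response.Headers["Location"], loginPath);
        }
    }
}
```

If RequiresLogin returns true, the code would go on to steps 3 and 4 (post the form, keep the cookie).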
-- bruce (sqlwork.com)

Nov 18 '05 #2
I have an example of this here:

http://odetocode.com/Articles/162.aspx

It's basically posting to the login form, getting the cookie back, and
then making sure to send the cookie along when requesting the
protected content.
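
A minimal sketch of the cookie part, assuming nothing about the real site: the host, cookie name and value below are placeholders. The point is only that one CookieContainer instance is attached to every request, so a cookie stored by the login response is sent again with the request for the protected page.

```csharp
using System;
using System.Net;

static class CookieSketch
{
    // Creates a request that reads and stores cookies through the shared container.
    public static HttpWebRequest CreateRequest(string url, CookieContainer cookies)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = cookies;
        return request;
    }

    // Shows which Cookie header the container would send for a given URL.
    public static string CookieHeaderFor(string url, CookieContainer cookies)
    {
        return cookies.GetCookieHeader(new Uri(url));
    }

    // Offline demonstration: the Add call stands in for the Set-Cookie header
    // that a real login response would carry (placeholder name and value).
    public static string Demo()
    {
        CookieContainer cookies = new CookieContainer();
        cookies.Add(new Uri("http://www.example.com/"), new Cookie("session", "abc123", "/"));
        return CookieHeaderFor("http://www.example.com/stories/article.shtml", cookies);
    }
}
```

In real use, CreateRequest would be called once for the login POST and again for the protected page, passing the same container both times.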

--
Scott
http://www.OdeToCode.com/blogs/scott/

On 24 Nov 2004 13:55:23 -0800, na********@yahoo.com (n8) wrote:
>Hi,
>
>I have to do the following and have been racking my brain with
>various solutions that have had not-so-great results.
>
>I want to use System.Net.WebClient to submit data to a form (log a
>user in) and then redirect to the correct article.
>
>Here is the scenario.
>If you are not logged in to the site, certain articles redirect you
>to an shtml login page. The login.shtml page posts to another URL for
>authentication and then lets you in. If you have clicked on an article
>that requires a login, you are sent to the login page with the article
>URL appended,
>http://www.domainname.com?orq:http:/..._2653091.shtml.
>I have tried sending a WebClient request to the URL that the above
>login form posts to, but I keep getting Method Not Allowed.
>
>Any ideas?


Nov 18 '05 #3
Scott,
FYI - that was one of the best articles on the subject I ever read.
I was completely stuck on this issue about 6 months ago and I implemented it
straight away using the concepts you presented here.

Excellent work and explanation.
--
Joe Fallon

Nov 18 '05 #4
Thanks, Joe. I appreciate the feedback.

--
Scott
http://www.OdeToCode.com/blogs/scott/

On Wed, 24 Nov 2004 20:48:24 -0500, "Joe Fallon"
<jf******@nospamtwcny.rr.com> wrote:
>Scott,
>FYI - that was one of the best articles on the subject I ever read.
>I was completely stuck on this issue about 6 months ago and I implemented it
>straight away using the concepts you presented here.
>
>Excellent work and explanation.


Nov 18 '05 #5
n8
Thanks for the example. I had seen your example earlier and had tried
it, but I always reach one particular point that I cannot seem to get
beyond. There are two hidden fields, both called web.fixed_values, that
appear to be something like a view state, but the page is shtml. I
have been able to pull down the site, etc., but every time I try to
post my data (with or without the web.fixed_values) I always get the
response Method Not Allowed. Below is the code I am using, along with
the site I am trying to access with my account. Any further help on
this would be greatly appreciated.

// Note: userName and password are not declared in this snippet; they are
// assumed to be fields defined elsewhere in the page class.
private void Page_Load(object sender, System.EventArgs e)
{
    string LOGIN_URL = "http://augustachronicle.com/login.shtml";
    string cookieAge = "31536000";

    try
    {
        // fetch the login page so the hidden field values can be scraped
        HttpWebRequest webRequest = WebRequest.Create(LOGIN_URL) as HttpWebRequest;

        StreamReader responseReader = new StreamReader(
            webRequest.GetResponse().GetResponseStream());

        string responseData = responseReader.ReadToEnd();
        responseReader.Close();

        // get the web fixed values
        string fixedvalue1 = ExtractFixedValues1(responseData);
        string fixedvalue2 = ExtractFixedValues2(responseData);

        string postData = String.Format(
            "web.fixed_values={0}&web.fixed_values={1}&ACTION=Login&USER={2}&PASS={3}&cookie_age={4}",
            fixedvalue1, fixedvalue2, userName, password, cookieAge);

        // have a cookie container ready to receive the forms auth cookie
        CookieContainer cookies = new CookieContainer();

        // now post to the login form
        webRequest = WebRequest.Create(LOGIN_URL) as HttpWebRequest;
        webRequest.Method = "POST";
        webRequest.ContentType = "application/x-www-form-urlencoded";
        webRequest.CookieContainer = cookies;

        // write the form values into the request message
        StreamWriter requestWriter = new StreamWriter(webRequest.GetRequestStream());
        requestWriter.Write(postData);
        requestWriter.Close();

        // we don't need the contents of the response, just the cookie it issues
        webRequest.GetResponse().Close();

        // now we can send our cookie along with a request for the protected page
        webRequest = WebRequest.Create(
            "http://augustachronicle.com/stories/112404/usc_FBC--SpurrierProfile.shtml")
            as HttpWebRequest;
        webRequest.CookieContainer = cookies;
        responseReader = new StreamReader(webRequest.GetResponse().GetResponseStream());

        // and read the response
        responseData = responseReader.ReadToEnd();
        responseReader.Close();

        Response.Write(responseData);
    }
    catch (Exception ex)
    {
        Response.Write(ex.ToString());
    }
}

// returns the URL-encoded value of the first web.fixed_values field
private string ExtractFixedValues1(string s)
{
    string viewStateNameDelimiter = "web.fixed_values";
    string valueDelimiter = "value=\"";

    int viewStateNamePosition = s.IndexOf(viewStateNameDelimiter);
    int viewStateValuePosition = s.IndexOf(valueDelimiter, viewStateNamePosition);

    int viewStateStartPosition = viewStateValuePosition + valueDelimiter.Length;
    int viewStateEndPosition = s.IndexOf("\"", viewStateStartPosition);

    return HttpUtility.UrlEncodeUnicode(
        s.Substring(viewStateStartPosition,
                    viewStateEndPosition - viewStateStartPosition));
}

// returns the URL-encoded value of the second web.fixed_values field
private string ExtractFixedValues2(string s)
{
    string viewStateNameDelimiter = "web.fixed_values";
    string valueDelimiter = "value=\"";

    // locate the end of the first field's value
    int viewStateNamePosition = s.IndexOf(viewStateNameDelimiter);
    int viewStateValuePosition = s.IndexOf(valueDelimiter, viewStateNamePosition);

    int viewStateStartPosition = viewStateValuePosition + valueDelimiter.Length;
    int viewStateEndPosition = s.IndexOf("\"", viewStateStartPosition);

    // drop everything up to that point, then repeat the search for the second field
    string sTemp = s.Remove(0, viewStateEndPosition);

    viewStateNamePosition = sTemp.IndexOf(viewStateNameDelimiter);
    viewStateValuePosition = sTemp.IndexOf(valueDelimiter, viewStateNamePosition);

    viewStateStartPosition = viewStateValuePosition + valueDelimiter.Length;
    viewStateEndPosition = sTemp.IndexOf("\"", viewStateStartPosition);

    return HttpUtility.UrlEncodeUnicode(
        sTemp.Substring(viewStateStartPosition,
                        viewStateEndPosition - viewStateStartPosition));
}
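
The two extraction methods above differ only in which occurrence they return. A possible alternative, shown purely as a sketch, is a single helper that collects every value for a field name, plus a helper that URL-encodes every name and value when building the POST body. The names FormSketch, ExtractValues and BuildBody are invented for this example. Like the code above, it assumes name="..." comes before value="..." and that both attributes use double quotes; it does not decode HTML entities.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

static class FormSketch
{
    // Returns the value of every <input> whose name equals fieldName, in page order.
    public static List<string> ExtractValues(string html, string fieldName)
    {
        string pattern = "<input[^>]*\\bname=\"" + Regex.Escape(fieldName)
                       + "\"[^>]*\\bvalue=\"([^\"]*)\"";
        List<string> values = new List<string>();
        foreach (Match match in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
        {
            values.Add(match.Groups[1].Value);
        }
        return values;
    }

    // Builds an application/x-www-form-urlencoded body. A list of pairs is used
    // (rather than a dictionary) so the same field name can appear more than once.
    public static string BuildBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        StringBuilder body = new StringBuilder();
        foreach (KeyValuePair<string, string> field in fields)
        {
            if (body.Length > 0) body.Append('&');
            body.Append(Uri.EscapeDataString(field.Key));
            body.Append('=');
            body.Append(Uri.EscapeDataString(field.Value));
        }
        return body.ToString();
    }
}
```

Encoding the user name and password the same way as the hidden values avoids a malformed body when either contains characters such as & or =.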

Nov 18 '05 #6
Everything looks like it is in order, Nathan. I'd examine the HTTP
traffic between your program and the server to make sure it matches
what the browser sends exactly, even little things like the User-Agent
header. I had one financial site reject HttpWebRequests until I set the
UserAgent property to look just like IE. I guess it was a weak attempt
at preventing screen scraping programs.
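
A rough sketch of setting browser-like headers on a request. The User-Agent string and the other header values here are examples only, not values known to work for any particular site; the idea is to copy whatever your own browser sends, as seen in an HTTP trace.

```csharp
using System.Net;

static class BrowserLikeRequest
{
    // Example value only; replace with the exact string your browser sends.
    public const string ExampleUserAgent =
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";

    public static HttpWebRequest Create(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);

        // User-Agent and Accept are set through dedicated properties,
        // not through the Headers collection.
        request.UserAgent = ExampleUserAgent;
        request.Accept = "text/html, */*";

        // Other headers can be added directly.
        request.Headers["Accept-Language"] = "en-us";
        return request;
    }
}
```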

--
Scott
http://www.OdeToCode.com/blogs/scott/

On 27 Nov 2004 12:39:42 -0800, na********@yahoo.com (n8) wrote:
>Thanks for the example. I had seen your example earlier and had tried
>it, but I always reach one particular point that I cannot seem to get
>beyond. There are two hidden fields, both called web.fixed_values, that
>appear to be something like a view state, but the page is shtml. I
>have been able to pull down the site, etc., but every time I try to
>post my data (with or without the web.fixed_values) I always get the
>response Method Not Allowed. Below is the code I am using, along with
>the site I am trying to access with my account. Any further help on
>this would be greatly appreciated.


Nov 18 '05 #7
n8
Scott,

Thanks for the information. I added a User-Agent to make it look like
IE, but I still get the 405 Method Not Allowed error message. What is
the best way to monitor the HTTP traffic between my application and
the remote site? Are there any tools I can download to show me what
is going back and forth?
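
Apart from an external tool, the failed response itself can be inspected from code. The sketch below (names invented for the example) catches the WebException and reports the status code together with the Allow header. The HTTP specification asks servers to list the permitted methods in Allow on a 405, though not every server does, so a missing header is handled too.

```csharp
using System;
using System.Net;

static class MethodNotAllowedSketch
{
    // Turns a failed status into a one-line description. For a 405, the Allow
    // header (when present) lists the methods the URL accepts.
    public static string Describe(HttpStatusCode status, string allowHeader)
    {
        if (status == HttpStatusCode.MethodNotAllowed)
        {
            string allowed = string.IsNullOrEmpty(allowHeader)
                ? "(no Allow header sent)"
                : allowHeader;
            return "405 Method Not Allowed; server allows: " + allowed;
        }
        return ((int)status) + " " + status;
    }

    // Sends the request; returns null on success or a description of the failure.
    public static string TryRequest(HttpWebRequest request)
    {
        try
        {
            request.GetResponse().Close();
            return null;
        }
        catch (WebException ex)
        {
            HttpWebResponse response = ex.Response as HttpWebResponse;
            if (response == null)
            {
                return ex.Status.ToString();   // no HTTP response at all
            }
            using (response)
            {
                return Describe(response.StatusCode, response.Headers["Allow"]);
            }
        }
    }
}
```

If the server reports that only GET is allowed on the URL being posted to, it may be worth comparing that URL with the action attribute of the login form, since the original description says login.shtml posts to another URL.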

Thanks in advance,

n8


Nov 18 '05 #8
You might try a program called httplook. I think it is
http://www.httplook.com if not, google for it...

Nov 18 '05 #9
Also, if you get a fix - please let us know.

Nov 18 '05 #10
One I've used with success is Fiddler.

http://www.fiddlertool.com/fiddler/
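
Traffic from an HttpWebRequest can be routed through Fiddler by setting the request's proxy explicitly. This sketch assumes Fiddler is running on the same machine with its default listening port of 8888; the class and method names are made up for the example.

```csharp
using System.Net;

static class FiddlerProxySketch
{
    // Creates a request whose traffic goes through a local debugging proxy.
    // 127.0.0.1:8888 is Fiddler's default; change it if yours is configured differently.
    public static HttpWebRequest CreateTraced(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Proxy = new WebProxy("127.0.0.1", 8888);
        return request;
    }
}
```

Each request in the login sequence would need to be created this way for the whole exchange to appear in the trace, which can then be compared side by side with a browser session.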

--
Scott
http://www.OdeToCode.com/blogs/scott/

On 28 Nov 2004 13:36:40 -0800, na********@yahoo.com (n8) wrote:
>Scott,
>
>Thanks for the information. I added a User-Agent header to make the
>request look like IE, but I still get the 405 Method Not Allowed
>error. What is the best way to monitor the HTTP traffic between my
>application and the remote site? Are there any tools I can download
>to show me what is going back and forth?
>
>Thanks in advance,
>
>n8


Nov 18 '05 #11
n8
Scott,

I loaded Fiddler and traced the HTTP traffic. Everything my
application sends looks no different from what the browser sends when
I go to the site directly. Am I to assume that they have a way of
blocking screen scrapers and, if so, how would I explain this?

Thanks,

n8


Nov 18 '05 #12
Hmm - I'm running out of ideas, n8.

I know there are sites out there blocking scrapers, but they usually
either block an IP or use client-side script and DHTML to try to screw
up programs. If your app is sending the same traffic as the browser,
that wouldn't be an issue.

So, my last idea is this:

Last year I had a site that would occasionally reject my web request
from a screen scraping program. It was in a loop moving through a
paged result set, and I couldn't figure out the random failures. On a
whim I put in a few Thread.Sleep calls to slow the scraper down
between requests, and it never failed again. I'm not sure if they
monitored requests by IP to only allow so many per second or minute or
what, though it was definitely timing related.
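
The throttling idea can be sketched roughly as follows. This is not the original scraper, only an illustration: FetchAll, the fetchPage delegate and the delay value are assumptions, and the right pause for a given site has to be found by trial.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class ThrottledScraper
{
    // Calls fetchPage for each URL in order, sleeping between requests.
    // fetchPage is a hypothetical stand-in for the real download code.
    public static List<string> FetchAll(
        IList<string> urls,
        Func<string, string> fetchPage,
        int delayMilliseconds)
    {
        List<string> pages = new List<string>();
        for (int i = 0; i < urls.Count; i++)
        {
            // Pause before every request except the first one.
            if (i > 0)
            {
                Thread.Sleep(delayMilliseconds);
            }
            pages.Add(fetchPage(urls[i]));
        }
        return pages;
    }
}
```

A caller might pass something like `url => new WebClient().DownloadString(url)` as fetchPage, with a delay of a second or two between pages.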

I guess the only other thing I'd do is really double-check those HTTP
payloads and make sure everything matches: the headers are the same,
the POST data is properly encoded, the cookie is sent, and so on.
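
One way to take the encoding question out of the picture is to build the POST body in a single place and compare the result with what the browser sends in Fiddler. A sketch, assuming the login form posts ordinary application/x-www-form-urlencoded data; the helper name and the field names in the note below are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

public static class FormEncoder
{
    // Builds an application/x-www-form-urlencoded body from name/value pairs,
    // URL-encoding every name and every value.
    public static string BuildPostBody(
        IEnumerable<KeyValuePair<string, string>> fields)
    {
        StringBuilder body = new StringBuilder();
        foreach (KeyValuePair<string, string> field in fields)
        {
            if (body.Length > 0)
            {
                body.Append('&');
            }
            body.Append(WebUtility.UrlEncode(field.Key));
            body.Append('=');
            body.Append(WebUtility.UrlEncode(field.Value));
        }
        return body.ToString();
    }
}
```

With this helper a space in a value is sent as "+" and an ampersand as "%26", so a value such as "a b&c" cannot be mistaken for two separate fields. The request's ContentType would still need to be set to application/x-www-form-urlencoded by the caller.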

HTH!

--
Scott
http://www.OdeToCode.com/blogs/scott/


Nov 18 '05 #13
n8
A different approach: since I have been racking my head against the
wall with this one, I thought I would try another. I thought I would
create the cookies that the site requires for the user account on the
fly, and everything would be great. I can create the cookies exactly,
BUT if I set the Domain property, the cookie does not get written; if
I leave the property alone, the cookie gets written as localhost. How
do I get around this so I can set the domain name?

Thanks again

n8


Nov 18 '05 #14
I remember trying a similar approach once, but I believe it is a
security feature that doesn't let us create a cookie for another
domain. The IE ActiveX control wouldn't let me pass cookies in
programmatically at all. Argh.
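
That restriction applies to cookies handed to a browser. When the request is made from your own code with HttpWebRequest, a cookie with an explicit domain can be attached through a CookieContainer. A sketch only, assuming the site accepts a cookie built this way; the class and method names are made up, the cookie name and value are placeholders, and the request is built but not sent:

```csharp
using System;
using System.Net;

public static class CookieExample
{
    // Builds (but does not send) a request that carries a hand-made cookie
    // scoped to the target host.
    public static HttpWebRequest CreateRequestWithCookie(
        string url, string name, string value)
    {
        Uri target = new Uri(url);
        CookieContainer cookies = new CookieContainer();

        // The domain is given explicitly; here it is simply the host of the URL.
        cookies.Add(new Cookie(name, value, "/", target.Host));

        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(target);
        request.CookieContainer = cookies;
        return request;
    }
}
```

This only helps when the scraping code fetches the article itself and passes the content on; it does not put a cookie for the other domain into the visitor's browser.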

--
Scott
http://www.OdeToCode.com/blogs/scott/


Nov 18 '05 #15
