Bytes | Software Development & Data Engineering Community

screen scrape + login

n8
Hi,

I have to do the following and have been racking my brain over
various solutions that have given not-so-great results.

I want to use System.Net.WebClient to submit data to a form (log a
user in) and then redirect to the correct article.

Here is the scenario.
If you are not logged into the site, certain articles redirect you
to an .shtml login page. The login.shtml page posts to
another URL for authentication and then lets you in. If you have clicked
on an article that requires a login, you are sent to the
login page with the article URL appended:
http://www.domainname.com?orq:http:/..._2653091.shtml.
I have tried pointing a WebClient request at the URL that the
login form posts to, but I keep getting "Method Not Allowed".

Any Ideas?
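One gap worth knowing about: System.Net.WebClient does not keep cookies between calls on its own, so a login cookie returned by the POST would not be sent with the follow-up article request. A minimal sketch, assuming the site authenticates with a cookie; `CookieAwareWebClient` and `LoginAndFetch` are hypothetical names, and the `USER`/`PASS` field names are borrowed from the form fields that appear later in this thread, so check them against the real form.

```csharp
using System;
using System.Collections.Specialized;
using System.Net;

// Hypothetical helper: a WebClient that shares one CookieContainer across
// every request it makes, so cookies set by the login POST are replayed.
public class CookieAwareWebClient : WebClient
{
    public CookieContainer Cookies { get; } = new CookieContainer();

    // Public wrapper so the created request can be inspected; in normal use
    // WebClient calls GetWebRequest itself from UploadValues/DownloadString.
    public WebRequest CreateRequest(Uri address) => GetWebRequest(address);

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        // Attach the shared container to every HTTP request.
        if (request is HttpWebRequest http)
        {
            http.CookieContainer = Cookies;
        }
        return request;
    }

    // Hypothetical usage: post the login form to the URL in the form's
    // action attribute, then fetch the article with the same client.
    public static string LoginAndFetch(string actionUrl, string articleUrl,
                                       string user, string pass)
    {
        using (var client = new CookieAwareWebClient())
        {
            var form = new NameValueCollection
            {
                { "USER", user },   // field names assumed; check the real form
                { "PASS", pass },
            };
            client.UploadValues(actionUrl, "POST", form);
            return client.DownloadString(articleUrl);
        }
    }
}
```

Overriding `GetWebRequest` keeps the convenient WebClient methods while adding cookie handling underneath them.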
Nov 18 '05 #1
More info is required, but here is a typical login sequence:

1) You request a page with WebClient.
2) You are returned a redirect header pointing to the login page.
3) Your code detects the login redirect, then posts the required form data to
the login page (view the login page manually to find the required form fields
and method).

Note: an ASP.NET login site requires that you actually do a GET of the
login page to obtain a valid viewstate to post back. Other systems may also
require scraping data from the GET response before doing the actual POST.

4) A successful POST to the login page will return a cookie value you must send
on subsequent requests, plus a redirect header to the originally requested
page.
-- bruce (sqlwork.com)
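Steps 1 to 3 above can be sketched in C# roughly as follows. This is a minimal sketch, assuming the site answers with an ordinary 301/302/303/307 redirect; `LoginRedirect`, `GetWithoutRedirect`, and the `loginPath` parameter (for example `/login.shtml`) are hypothetical names. Turning off `AllowAutoRedirect` is what lets the code see the redirect instead of silently following it.

```csharp
using System;
using System.Net;

// Hypothetical helpers for steps 1-3: fetch a page without following
// redirects, then decide whether the redirect points at the login page.
public static class LoginRedirect
{
    // Step 1: request the page but keep the redirect instead of following it.
    // With AllowAutoRedirect off, a 3xx status is returned, not thrown.
    public static HttpWebResponse GetWithoutRedirect(Uri url, CookieContainer jar)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.AllowAutoRedirect = false;
        request.CookieContainer = jar;
        return (HttpWebResponse)request.GetResponse();
    }

    // Steps 2-3: true if the response is a redirect whose Location header
    // (possibly relative) resolves to loginPath, e.g. "/login.shtml".
    public static bool IsLoginRedirect(HttpStatusCode status, Uri requestUri,
                                       string location, string loginPath)
    {
        bool isRedirect = status == HttpStatusCode.Moved              // 301
                       || status == HttpStatusCode.Found              // 302
                       || status == HttpStatusCode.SeeOther           // 303
                       || status == HttpStatusCode.TemporaryRedirect; // 307
        if (!isRedirect || string.IsNullOrEmpty(location))
        {
            return false;
        }
        // Resolve a relative Location against the URL that was requested.
        Uri target = new Uri(requestUri, location);
        return string.Equals(target.AbsolutePath, loginPath,
                             StringComparison.OrdinalIgnoreCase);
    }
}
```

Used together: call `GetWithoutRedirect` on the article URL; if `IsLoginRedirect(response.StatusCode, url, response.Headers["Location"], "/login.shtml")` is true, do the login POST with the same `CookieContainer` and request the article again.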
Nov 18 '05 #2
I have an example of this here:

http://odetocode.com/Articles/162.aspx

It's basically posting to the login form, getting the cookie back, and
then making sure to send the cookie along when requesting the
protected content.

--
Scott
http://www.OdeToCode.com/blogs/scott/
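In HttpWebRequest terms, "sending the cookie along" means attaching the same CookieContainer to every request in the conversation. A minimal sketch, assuming the login form body has already been built and the form's action URL is known; `FormsLogin.PostThenFetch` is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class FormsLogin
{
    // One container for the whole conversation: anything the server sets
    // (a session cookie on the GET, an auth cookie on the POST) is replayed
    // automatically on every later request that uses the same container.
    public static string PostThenFetch(Uri loginAction, string formBody,
                                       Uri protectedPage, CookieContainer jar)
    {
        var post = (HttpWebRequest)WebRequest.Create(loginAction);
        post.Method = "POST";
        post.ContentType = "application/x-www-form-urlencoded";
        post.CookieContainer = jar;
        byte[] bytes = Encoding.ASCII.GetBytes(formBody); // body is already URL-encoded
        post.ContentLength = bytes.Length;
        using (Stream body = post.GetRequestStream())
        {
            body.Write(bytes, 0, bytes.Length);
        }
        // Only the cookies matter here; the container has captured them.
        using (post.GetResponse()) { }

        var get = (HttpWebRequest)WebRequest.Create(protectedPage);
        get.CookieContainer = jar;
        using (var response = get.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```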

Nov 18 '05 #3
Scott,
FYI - that was one of the best articles on the subject I ever read.
I was completely stuck on this issue about 6 months ago and I implemented it
straight away using the concepts you presented here.

Excellent work and explanation.
--
Joe Fallon

Nov 18 '05 #4
Thanks, Joe. I appreciate the feedback.

--
Scott
http://www.OdeToCode.com/blogs/scott/

Nov 18 '05 #5
n8
Thanks for the example. I had seen it earlier and tried it, but I
always get stuck at one particular point. There are two hidden fields,
both called web.fixed_values, that appear to be something like a view
state, even though the page is .shtml. I have been able to pull down
the site, etc., but every time I try to post my data (with or without
the web.fixed_values) I get the response "Method Not Allowed". Below is
the code I am using, along with the site I am trying to access with my
account. Any further help on this would be greatly appreciated.

// Needs: using System; using System.IO; using System.Net; using System.Web;
// userName and password are assumed to be fields of the page class.
private void Page_Load(object sender, System.EventArgs e)
{
    string LOGIN_URL = "http://augustachronicle.com/login.shtml";
    string cookieAge = "31536000";

    // one cookie container for all three requests, so any session cookie
    // set by the initial GET is sent back with the POST
    CookieContainer cookies = new CookieContainer();

    try
    {
        HttpWebRequest webRequest = WebRequest.Create(LOGIN_URL) as HttpWebRequest;
        webRequest.CookieContainer = cookies;

        string responseData;
        using (StreamReader responseReader =
            new StreamReader(webRequest.GetResponse().GetResponseStream()))
        {
            responseData = responseReader.ReadToEnd();
        }

        // get the web fixed values (returned already URL-encoded)
        string fixedvalue1 = ExtractFixedValues1(responseData);
        string fixedvalue2 = ExtractFixedValues2(responseData);

        // the user name and password must be URL-encoded too
        string postData = String.Format(
            "web.fixed_values={0}&web.fixed_values={1}&ACTION=Login&USER={2}&PASS={3}&cookie_age={4}",
            fixedvalue1, fixedvalue2,
            HttpUtility.UrlEncode(userName), HttpUtility.UrlEncode(password),
            cookieAge);

        // now post to the login form
        webRequest = WebRequest.Create(LOGIN_URL) as HttpWebRequest;
        webRequest.Method = "POST";
        webRequest.ContentType = "application/x-www-form-urlencoded";
        webRequest.CookieContainer = cookies;

        // write the form values into the request message
        using (StreamWriter requestWriter =
            new StreamWriter(webRequest.GetRequestStream()))
        {
            requestWriter.Write(postData);
        }

        // we don't need the contents of the response, just the cookie it issues
        webRequest.GetResponse().Close();

        // now we can send our cookie along with a request for the protected page
        webRequest = WebRequest.Create(
            "http://augustachronicle.com/stories/112404/usc_FBC--SpurrierProfile.shtml")
            as HttpWebRequest;
        webRequest.CookieContainer = cookies;

        // and read the response
        using (StreamReader responseReader =
            new StreamReader(webRequest.GetResponse().GetResponseStream()))
        {
            responseData = responseReader.ReadToEnd();
        }

        Response.Write(responseData);
    }
    catch (Exception ex)
    {
        Response.Write(ex.ToString());
    }
}

// returns the value of the first web.fixed_values hidden field, URL-encoded
private string ExtractFixedValues1(string s)
{
    string viewStateNameDelimiter = "web.fixed_values";
    string valueDelimiter = "value=\"";

    int viewStateNamePosition = s.IndexOf(viewStateNameDelimiter);
    int viewStateValuePosition = s.IndexOf(valueDelimiter, viewStateNamePosition);

    int viewStateStartPosition = viewStateValuePosition + valueDelimiter.Length;
    int viewStateEndPosition = s.IndexOf("\"", viewStateStartPosition);

    // UrlEncode rather than UrlEncodeUnicode: the latter writes non-ASCII
    // characters as %uXXXX, which is not standard form encoding
    return HttpUtility.UrlEncode(
        s.Substring(viewStateStartPosition,
            viewStateEndPosition - viewStateStartPosition));
}

// same search, restarted after the end of the first field's value,
// so it returns the value of the second web.fixed_values field
private string ExtractFixedValues2(string s)
{
    string viewStateNameDelimiter = "web.fixed_values";
    string valueDelimiter = "value=\"";

    int viewStateNamePosition = s.IndexOf(viewStateNameDelimiter);
    int viewStateValuePosition = s.IndexOf(valueDelimiter, viewStateNamePosition);

    int viewStateStartPosition = viewStateValuePosition + valueDelimiter.Length;
    int viewStateEndPosition = s.IndexOf("\"", viewStateStartPosition);

    string sTemp = s.Remove(0, viewStateEndPosition);

    viewStateNamePosition = sTemp.IndexOf(viewStateNameDelimiter);
    viewStateValuePosition = sTemp.IndexOf(valueDelimiter, viewStateNamePosition);

    viewStateStartPosition = viewStateValuePosition + valueDelimiter.Length;
    viewStateEndPosition = sTemp.IndexOf("\"", viewStateStartPosition);

    return HttpUtility.UrlEncode(
        sTemp.Substring(viewStateStartPosition,
            viewStateEndPosition - viewStateStartPosition));
}
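Two things in the code above may be worth ruling out (assumptions, not confirmed in this thread). First, it posts to login.shtml itself; an .shtml page is normally served as a static/SSI document, and servers often answer a POST to such a page with 405, so the data probably needs to go to the URL in the form's `action` attribute. Second, both same-named web.fixed_values fields must be sent. A rough sketch of scraping them with regular expressions, assuming double-quoted attributes written in type/name/value order; `LoginForm` and its methods are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

public static class LoginForm
{
    // Very rough scraping: assumes type="hidden", name="..." and value="..."
    // appear in that order with double quotes. An HTML parser is sturdier.
    static readonly Regex HiddenInput = new Regex(
        "<input[^>]*type=\"hidden\"[^>]*name=\"([^\"]*)\"[^>]*value=\"([^\"]*)\"",
        RegexOptions.IgnoreCase);

    static readonly Regex FormAction = new Regex(
        "<form[^>]*action=\"([^\"]*)\"", RegexOptions.IgnoreCase);

    // Every hidden field in document order. A list of pairs (not a
    // dictionary) keeps both copies of a repeated name like web.fixed_values.
    public static List<KeyValuePair<string, string>> HiddenFields(string html) =>
        HiddenInput.Matches(html).Cast<Match>()
            .Select(m => new KeyValuePair<string, string>(
                WebUtility.HtmlDecode(m.Groups[1].Value),
                WebUtility.HtmlDecode(m.Groups[2].Value)))
            .ToList();

    // The URL the browser really posts to, resolved against the page URL;
    // falls back to the page itself when the form has no action attribute.
    public static Uri ActionUrl(string html, Uri pageUrl)
    {
        Match m = FormAction.Match(html);
        return m.Success
            ? new Uri(pageUrl, WebUtility.HtmlDecode(m.Groups[1].Value))
            : pageUrl;
    }

    // application/x-www-form-urlencoded body; each name and value encoded once.
    public static string Encode(IEnumerable<KeyValuePair<string, string>> fields) =>
        string.Join("&", fields.Select(f =>
            WebUtility.UrlEncode(f.Key) + "=" + WebUtility.UrlEncode(f.Value)));
}
```

The hidden values are HTML-decoded before being URL-encoded, since a page may write `&` as `&amp;` inside an attribute.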

Nov 18 '05 #6
Everything looks like it is in order, Nathan. I'd examine the HTTP
traffic between your program and the server to make sure it all
matches exactly, even little things like the User-Agent header. I had one
financial site reject HttpWebRequests until I set the UserAgent
property to look just like IE. I guess it was a weak attempt at
preventing screen-scraping programs.

--
Scott
http://www.OdeToCode.com/blogs/scott/
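Setting UserAgent (and, for good measure, Accept and Referer, since some login handlers check that a POST came from their own form page) looks roughly like this. A minimal sketch; the IE 6 user-agent string is only an example of the format, so copy the exact value your own browser sends, and `BrowserLikeRequest` is a hypothetical name.

```csharp
using System;
using System.Net;

public static class BrowserLikeRequest
{
    // Example IE 6-era user-agent string; replace it with the exact value
    // your browser sends (an HTTP trace shows it).
    public const string IeUserAgent =
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";

    public static HttpWebRequest Create(Uri url, Uri referer, CookieContainer jar)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.UserAgent = IeUserAgent;
        request.Accept = "*/*";
        if (referer != null)
        {
            // The page the browser would have been on, e.g. the login form.
            request.Referer = referer.AbsoluteUri;
        }
        request.CookieContainer = jar;
        return request;
    }
}
```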

Nov 18 '05 #7
n8
Scott,

Thanks for the information. I set the UserAgent to make it look like
IE, but I still get the 405 Method Not Allowed error. What is
the best way to monitor the HTTP traffic between my application and
the remote site? Are there any tools I can download to show me what
is going back and forth?

Thanks in advance,

n8

Nov 18 '05 #8
You might try a program called httplook. I think it is at
http://www.httplook.com; if not, google for it.
Nov 18 '05 #9
Also, if you find a fix, please let us know.

Nov 18 '05 #10
One I've used with success is Fiddler.

http://www.fiddlertool.com/fiddler/

--
Scott
http://www.OdeToCode.com/blogs/scott/
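To make sure Fiddler actually sees your program's requests, you can point HttpWebRequest at it explicitly. A minimal sketch, assuming Fiddler's default listening address of 127.0.0.1:8888; `FiddlerProxy.UseFiddler` is a hypothetical helper. Code running inside ASP.NET may not pick up the proxy settings Fiddler registers for the interactive user, which is one reason to set the proxy by hand while debugging.

```csharp
using System;
using System.Net;

public static class FiddlerProxy
{
    // Route one request through a locally running Fiddler instance.
    // 8888 is assumed to be Fiddler's port; check its options if changed.
    // Remove this call once debugging is done.
    public static void UseFiddler(HttpWebRequest request, int port = 8888)
    {
        request.Proxy = new WebProxy("127.0.0.1", port);
    }
}
```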

On 28 Nov 2004 13:36:40 -0800, na********@yahoo.com (n8) wrote:
Scott,

Thanks for the information. I added a user agent to make it look like
IE, but I still get the 405 Method Not Allowed error. What is the best
way to monitor the HTTP traffic between my application and the remote
site? Are there any tools I can download to show me what is going back
and forth?

Thanks in advance,

n8
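A 405 is about the HTTP method, not the client: the server is saying the URL exists but does not accept that method (here, POST), and the HTTP spec (RFC 7231) requires a 405 response to carry an `Allow` header listing the methods it does accept. That often means the POST is reaching a different URL than the browser's does, which a side-by-side Fiddler comparison of the request line would show. A sketch under those assumptions: `BrowserLikeClient` is hypothetical, and the IE user-agent string is a placeholder to replace with the one your browser actually sends.

```csharp
using System;
using System.Net;

public static class BrowserLikeClient
{
    // Placeholder IE 6-style string; copy the exact User-Agent your browser sends.
    public const string IeUserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";

    // Creates a WebClient that sends browser-like User-Agent and Referer headers.
    public static WebClient Create(string refererUrl)
    {
        var client = new WebClient();
        client.Headers[HttpRequestHeader.UserAgent] = IeUserAgent;
        client.Headers[HttpRequestHeader.Referer] = refererUrl;
        return client;
    }

    // After a failed upload call, returns the Allow header of a 405 response
    // (the methods the URL does accept), or null for any other failure.
    public static string GetAllowedMethods(WebException ex)
    {
        var response = ex.Response as HttpWebResponse;
        if (response == null || response.StatusCode != HttpStatusCode.MethodNotAllowed)
            return null;
        return response.Headers["Allow"];
    }
}
```

Typical use is to wrap the login POST in `try { ... } catch (WebException ex) { ... }` and log `GetAllowedMethods(ex)`; if it lists only `GET` and `HEAD`, the form data is going to the wrong place.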


Nov 18 '05 #11
n8
Scott,

I loaded the Fiddler tool and traced the HTTP traffic. Everything
being sent looks no different than when I go to the site directly.
Should I assume that they have a way of blocking screen scrapers? If
so, how would I explain this?

Thanks,

n8

Nov 18 '05 #12
Hmm, I'm running out of ideas, n8.

I know there are sites out there blocking scrapers, but they usually
either block an IP or use client-side script and DHTML to try to screw
up programs. If your app is sending the same traffic as the browser,
that wouldn't be an issue.

So, my last idea is this:

Last year I had a site that would occasionally reject web requests
from my screen scraping program. It was in a loop moving through a
paged result set, and I couldn't figure out the random failures. On a
whim I put in a few Thread.Sleep calls to slow the scraper down
between requests, and it never failed again. I'm not sure whether they
limited requests per IP to so many per second or minute, but it was
definitely timing-related.
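The `Thread.Sleep` fix amounts to pacing requests. A minimal sketch, assuming a fixed pause between requests is enough; `ThrottledScraper` and the `fetch` delegate are hypothetical, and the delay is something to tune, not a known limit for any particular site.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class ThrottledScraper
{
    // Fetches each URL in order, sleeping between consecutive requests so the
    // scraper doesn't hit the site in a tight loop. 'fetch' is whatever download
    // call you use, e.g. url => client.DownloadString(url).
    public static List<string> FetchAll(IEnumerable<string> urls, Func<string, string> fetch, TimeSpan delay)
    {
        var pages = new List<string>();
        bool first = true;
        foreach (string url in urls)
        {
            if (!first) Thread.Sleep(delay); // pause before every request except the first
            first = false;
            pages.Add(fetch(url));
        }
        return pages;
    }
}
```

For example, `ThrottledScraper.FetchAll(pageUrls, url => client.DownloadString(url), TimeSpan.FromSeconds(2))`, where `pageUrls` and the two-second pause are placeholders.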

I guess the only other thing I'd do is really double-check those HTTP
payloads and make sure everything matches: the headers, the POST data
(properly encoded), the cookie being sent, and so on.
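One way to check the POST encoding is to build the body yourself and compare it with the body Fiddler shows the browser sending. A sketch using `WebUtility.UrlEncode` (UTF-8, spaces become `+`); `FormEncoder` is a hypothetical name. The quoted code uses `HttpUtility.UrlEncodeUnicode`, which emits non-standard `%uXXXX` escapes for non-ASCII characters; for ASCII values such as ViewState it makes no practical difference.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;

public static class FormEncoder
{
    // Joins name/value pairs into an application/x-www-form-urlencoded body,
    // URL-encoding both names and values.
    public static string BuildBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            WebUtility.UrlEncode(f.Key) + "=" + WebUtility.UrlEncode(f.Value ?? "")));
    }
}
```

Any difference from the browser's body in field names, order, or escaping is worth chasing before suspecting the site of blocking scrapers.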

HTH!

--
Scott
http://www.OdeToCode.com/blogs/scott/

On 29 Nov 2004 11:42:48 -0800, na********@yahoo.com (n8) wrote:
Scott,

I loaded the Fiddler tool and traced the HTTP traffic. Everything
being sent looks no different than when I go to the site directly.
Should I assume that they have a way of blocking screen scrapers? If
so, how would I explain this?

Thanks,

n8


Nov 18 '05 #13
n8
A different approach: since I have been racking my head against the
wall with this approach, I thought I would try another. I thought I
would create the cookies the site requires for the user account on the
fly, and everything would be great. I can create the cookies exactly,
BUT if I set or change the Domain property, the cookie does not get
written; if I leave the property out, the cookie gets written for
localhost. How do I get around this so I can set the Domain property?

Thanks again,

n8
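If the cookies are being written with `Response.Cookies`, they go to the browser, and browsers drop any cookie whose domain doesn't match the site that sent it, which fits what n8 describes. For a scraper, the cookies that matter live on the client side instead: a `CookieContainer` accepts a cookie for any domain and sends it on requests to that domain. `WebClient` has no cookie property, so this sketch uses a hypothetical `CookieAwareWebClient` subclass that attaches one container to every request; the domain, cookie name, and value in the example are placeholders to copy from a Fiddler trace.

```csharp
using System;
using System.Net;

// Hypothetical WebClient subclass: attaches one shared CookieContainer to
// every HttpWebRequest the client creates.
public class CookieAwareWebClient : WebClient
{
    public CookieContainer Cookies { get; } = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        if (request is HttpWebRequest http)
            http.CookieContainer = Cookies; // send stored cookies, keep any Set-Cookie replies
        return request;
    }
}
```

For example, `client.Cookies.Add(new Cookie("session", "abc", "/", "www.domainname.com"));` before calling `client.DownloadString(articleUrl)`. Reusing the same client for the login POST and the article request also carries over whatever `Set-Cookie` headers the login returns, which may make hand-built cookies unnecessary.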

Nov 18 '05 #14
I remember trying a similar approach once, but I believe it's a
security feature that doesn't let us create a cookie for another
domain. The IE ActiveX control wouldn't let me pass cookies in
programmatically at all. Argh.

--
Scott
http://www.OdeToCode.com/blogs/scott/

On 30 Nov 2004 07:51:33 -0800, na********@yahoo.com (n8) wrote:
A different approach: since I have been racking my head against the
wall with this approach, I thought I would try another. I thought I
would create the cookies the site requires for the user account on the
fly, and everything would be great. I can create the cookies exactly,
BUT if I set or change the Domain property, the cookie does not get
written; if I leave the property out, the cookie gets written for
localhost. How do I get around this so I can set the Domain property?

Thanks again,

n8


Nov 18 '05 #15
