471,310 Members | 1,156 Online
Bytes | Software Development & Data Engineering Community
Post +

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 471,310 software developers and data experts.

Outbound HTML Authentication

Hi,

I was trying to do a simple web scraping tool, but the network they
use at work does some type of internal authentication before it lets
the request out of the network. As a result I'm getting the '401 -
Authentication Error' from the application.

I know when I use a web browser or other application that it uses the
information from my Windows AD to validate my user before it accesses
a website. I'm constantly getting asked to enter in this info before I
use Firefox, and I assume that IE picks it up automatically.

However I'm not sure how to tell the request that I'm building in my
python script to either use the info in my AD account or enter in my
user/pass automatically.

Anyone know how to do this?

Thanks
Nov 29 '07 #1
2 1071
On Nov 29, 2007 2:22 PM, Mudcat <mn******@gmail.comwrote:
Hi,

I was trying to do a simple web scraping tool, but the network they
use at work does some type of internal authentication before it lets
the request out of the network. As a result I'm getting the '401 -
Authentication Error' from the application.

I know when I use a web browser or other application that it uses the
information from my Windows AD to validate my user before it accesses
a website. I'm constantly getting asked to enter in this info before I
use Firefox, and I assume that IE picks it up automatically.

However I'm not sure how to tell the request that I'm building in my
python script to either use the info in my AD account or enter in my
user/pass automatically.
You can configure a proxy for urllib2, but your proxy probably uses
NTLM authentication which urllib2 doesn't support. Your best bet is to
use a local proxy which understands NTLM.
Nov 29 '07 #2
twill is a simple language for browsing the Web. It's designed for
automated testing of Web sites, but it can be used to interact with
Web sites in a variety of ways. In particular, twill supports form
submission, cookies, redirects, and HTTP authentication.

Mudcat wrote:
Hi,

I was trying to do a simple web scraping tool, but the network they
use at work does some type of internal authentication before it lets
the request out of the network. As a result I'm getting the '401 -
Authentication Error' from the application.

I know when I use a web browser or other application that it uses the
information from my Windows AD to validate my user before it accesses
a website. I'm constantly getting asked to enter in this info before I
use Firefox, and I assume that IE picks it up automatically.

However I'm not sure how to tell the request that I'm building in my
python script to either use the info in my AD account or enter in my
user/pass automatically.

Anyone know how to do this?

Thanks

--
Shane Geiger
IT Director
National Council on Economic Education
sg*****@ncee.net | 402-438-8958 | http://www.ncee.net

Leading the Campaign for Economic and Financial Literacy

Nov 29 '07 #3

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

1 post views Thread by John | last post: by
6 posts views Thread by William F. Zachmann | last post: by
1 post views Thread by John Rossitter | last post: by
reply views Thread by kv | last post: by
5 posts views Thread by nick | last post: by
2 posts views Thread by nick | last post: by
25 posts views Thread by bmearns | last post: by
reply views Thread by rosydwin | last post: by

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.