Bytes | Software Development & Data Engineering Community
urlgrabber and ClientForm problem

Hi,
I'm writing a web automation script using ClientForm and urlgrabber. I use urlgrabber because I need HTTP keep-alive, which urllib2 doesn't provide.

The problem is that form.click() returns a urllib2.Request object, which I can't pass to urlgrabber.urlopen().

On the ClientForm web page I see "see HTMLForm.click.__doc__ if you don't have urllib2", but I have no idea what HTMLForm is. I've searched for it on Google and found nothing useful.

Thanks for any help.

The error message is as follows.
Traceback (most recent call last):
  File "wretch4.py", line 53, in ?
    response2 = urlopen(request2)
  File "/var/lib/python-support/python2.4/urlgrabber/grabber.py", line 605, in urlopen
    return default_grabber.urlopen(url, **kwargs)
  File "/var/lib/python-support/python2.4/urlgrabber/grabber.py", line 881, in urlopen
    (url,parts) = opts.urlparser.parse(url, opts)
  File "/var/lib/python-support/python2.4/urlgrabber/grabber.py", line 653, in parse
    parts = urlparse.urlparse(url)
  File "/usr/lib/python2.4/urlparse.py", line 50, in urlparse
    tuple = urlsplit(url, scheme, allow_fragments)
  File "/usr/lib/python2.4/urlparse.py", line 89, in urlsplit
    i = url.find(':')
  File "/usr/lib/python2.4/urllib2.py", line 207, in __getattr__
    raise AttributeError, attr


My code is as follows.
#!/usr/bin/python
from ClientForm import ParseResponse
from urlgrabber import urlopen

url = 'http://www.ggggg.com'

headers = (
        ('User-Agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.10) Gecko/20071115 Iceweasel/2.0.0.10 (Debian-2.0.0.10-0etch1)'),
        ('Accept', 'text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5'),
        ('Accept-Language', 'en-us,en;q=0.5'),
#       ('Accept-Encoding', 'gzip,deflate'),
        ('Accept-Charset', 'ISO-8859-1,utf-8;q=0.7,*;q=0.7'),
        ('Keep-Alive', '300'),
        ('Connection', 'keep-alive'),
        ('Referer', url),
        ('Content-Type', 'application/x-www-form-urlencoded'))

response = urlopen(url, http_headers=headers)
forms = ParseResponse(response, backwards_compat=False)

form = forms[0]
print form

form["passwd"] = "ggggg"

request2 = form.click()
response2 = urlopen(request2)

print response2
Dec 25 '07 #1
Hi,

I don't have an answer for you, sorry; I'm not an expert Python programmer. But I have a problem that I think you may be able to help me with, if you don't mind.

I have used cookielib to log in to a website, but once logged in I can't connect to additional URLs within the site. I believe I have the cookie, headers, and so on set up properly. However, using the 'HTTPAnalyze' program to inspect the headers, I noticed the site uses a keep-alive connection, which is new to me. From your problem statement it sounds like urlgrabber may be the solution for handling keep-alive.

My problem is that although I can use urlgrabber to access the website and read, open, and grab pages, I can't figure out how to log in ('username' and 'passwd'), let alone access additional pages inside the site. I'm stumped, and lots of searching turned up no examples of this online. I was hoping you might help by providing a simple example; I'm a beginner Python programmer/user.

I am using Python 2.5 on Windows XP, with IE7 (if browsing manually).

I appreciate it very much.
Dec 26 '07 #2
hp1980
Hi Mcgrete,
First you have to figure out how the site logs you in; it will be either an HTTP GET or a POST.

Searching for HTTP GET and POST on Google will help, and so will the urlgrabber manual:
http://linux.duke.edu/projects/urlgrabber/help/urlgrabber.grabber.html
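The GET/POST distinction comes down to where the urlencoded form fields travel: in the URL's query string for GET, or in the request body for POST. A small stdlib-only illustration (the field names here are hypothetical; the real names come from the site's login form):

```python
# How login fields travel in GET vs POST (hypothetical field names).
try:
    from urllib import urlencode        # Python 2, as used in this thread
except ImportError:
    from urllib.parse import urlencode  # Python 3

# A list of pairs keeps the encoding order deterministic.
fields = [('username', 'mcgrete'), ('passwd', 'secret')]
encoded = urlencode(fields)  # 'username=mcgrete&passwd=secret'

get_url = 'http://www.example.com/login?' + encoded  # GET: in the URL
post_body = encoded                                  # POST: in the body
```

Watching one manual login with a tool like HTTPAnalyze shows which method the site actually uses, and with which field names.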


P.S.: I have solved the problem I originally asked about. In the end I used the mechanize module; my problem turned out to have nothing to do with keep-alive.
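For what it's worth, mechanize handles cookies and form submission together, so a login usually reduces to a few lines. A sketch, not tested here; the URL, form index, and field names are hypothetical, so inspect the real login page to find yours:

```python
# Sketch of a mechanize login.  The Browser keeps cookies across requests,
# so pages fetched after submit() stay inside the logged-in session.
import mechanize

br = mechanize.Browser()
br.open('http://www.example.com/login')
br.select_form(nr=0)            # pick the first form on the page
br['username'] = 'mcgrete'      # hypothetical field names; check the form
br['passwd'] = 'secret'
response = br.submit()          # POSTs the form; cookies are retained
print response.read()

br.open('http://www.example.com/members')  # later pages reuse the session
```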
Dec 27 '07 #3
