Bytes | Software Development & Data Engineering Community
urlgrabber and ClientForm problem

Hi,
I'm writing a web automation script using ClientForm and urlgrabber. I use urlgrabber because I need HTTP keep-alive, which urllib2 doesn't provide.

The problem is that form.click() returns a urllib2.Request object, which I can't pass to urlgrabber.urlopen().

On the ClientForm web page I see "see HTMLForm.click.__doc__ if you don't have urllib2", but I have no idea what HTMLForm is. I've searched for it on Google and found nothing useful.

Thanks for any help.

The error message is as follows.
Traceback (most recent call last):
  File "wretch4.py", line 53, in ?
    response2 = urlopen(request2)
  File "/var/lib/python-support/python2.4/urlgrabber/grabber.py", line 605, in urlopen
    return default_grabber.urlopen(url, **kwargs)
  File "/var/lib/python-support/python2.4/urlgrabber/grabber.py", line 881, in urlopen
    (url,parts) = opts.urlparser.parse(url, opts)
  File "/var/lib/python-support/python2.4/urlgrabber/grabber.py", line 653, in parse
    parts = urlparse.urlparse(url)
  File "/usr/lib/python2.4/urlparse.py", line 50, in urlparse
    tuple = urlsplit(url, scheme, allow_fragments)
  File "/usr/lib/python2.4/urlparse.py", line 89, in urlsplit
    i = url.find(':')
  File "/usr/lib/python2.4/urllib2.py", line 207, in __getattr__
    raise AttributeError, attr


My code is as follows.
#!/usr/bin/python
from ClientForm import ParseResponse
from urlgrabber import urlopen

url = 'http://www.ggggg.com'

headers = (
        ('User-Agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.10) Gecko/20071115 Iceweasel/2.0.0.10 (Debian-2.0.0.10-0etch1)'),
        ('Accept', 'text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5'),
        ('Accept-Language', 'en-us,en;q=0.5'),
#       ('Accept-Encoding', 'gzip,deflate'),
        ('Accept-Charset', 'ISO-8859-1,utf-8;q=0.7,*;q=0.7'),
        ('Keep-Alive', '300'),
        ('Connection', 'keep-alive'),
        ('Referer', url),
        ('Content-Type', 'application/x-www-form-urlencoded'))

response = urlopen(url, http_headers=headers)
forms = ParseResponse(response, backwards_compat=False)

form = forms[0]
print form

form["passwd"] = "ggggg"

request2 = form.click()
response2 = urlopen(request2)

print response2
Dec 25 '07 #1
Hi,

I don't have an answer for you, sorry; I'm not an expert Python programmer. But I have a problem that I think you may be able to help me with, if you don't mind.

I have used cookielib to log in to a website, but once logged in I can't connect to additional URLs within the site. I believe I have the cookie, headers, and so on set up properly. However, using the 'HTTPAnalyze' program to inspect the headers, I noticed the site uses a keep-alive connection, which is new to me. From your problem statement it sounds like urlgrabber may be the solution for handling keep-alive.

My problem is that although I can use urlgrabber to access the website and read, open, and grab pages, I can't figure out how to log in ('username' and 'passwd'), let alone access additional pages inside the site. I'm stumped, and lots of searching turned up no examples of this online. I was hoping you might help by providing a simple example; I'm a beginner Python programmer/user.

I am using Python 2.5 on Windows XP, with IE7 (if browsing manually).

I appreciate it very much.
Dec 26 '07 #2
hp1980
Hi Mcgrete,
First you have to figure out how the site logs you in; it will be either an HTTP GET or a POST.

Searching for HTTP GET and POST on Google will help, and so will the urlgrabber manual:
http://linux.duke.edu/projects/urlgrabber/help/urlgrabber.grabber.html
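The GET/POST distinction comes down to where the urlencoded form fields travel: in the URL's query string for GET, or in the request body for POST. A small stdlib-only illustration (the field names here are hypothetical; the real names come from the site's login form):

```python
# How login fields travel in GET vs POST (hypothetical field names).
try:
    from urllib import urlencode        # Python 2, as used in this thread
except ImportError:
    from urllib.parse import urlencode  # Python 3

# A list of pairs keeps the encoding order deterministic.
fields = [('username', 'mcgrete'), ('passwd', 'secret')]
encoded = urlencode(fields)  # 'username=mcgrete&passwd=secret'

get_url = 'http://www.example.com/login?' + encoded  # GET: in the URL
post_body = encoded                                  # POST: in the body
```

Watching one manual login with a tool like HTTPAnalyze shows which method the site actually uses, and with which field names.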


P.S.: I have solved the problem I originally asked about. In the end I used the mechanize module; my problem turned out to have nothing to do with keep-alive.
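For what it's worth, mechanize handles cookies and form submission together, so a login usually reduces to a few lines. A sketch, not tested here; the URL, form index, and field names are hypothetical, so inspect the real login page to find yours:

```python
# Sketch of a mechanize login.  The Browser keeps cookies across requests,
# so pages fetched after submit() stay inside the logged-in session.
import mechanize

br = mechanize.Browser()
br.open('http://www.example.com/login')
br.select_form(nr=0)            # pick the first form on the page
br['username'] = 'mcgrete'      # hypothetical field names; check the form
br['passwd'] = 'secret'
response = br.submit()          # POSTs the form; cookies are retained
print response.read()

br.open('http://www.example.com/members')  # later pages reuse the session
```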
Dec 27 '07 #3
