
Re: using urllib2


Okay, I tried to follow that, and it is kinda hard. But since you obviously
know what you are doing, where did you learn this? Or where can I learn
this?
Maric Michaud wrote:

On Friday 27 June 2008 10:43:06, Alexnb wrote:
> I have never used urllib or urllib2. I really have looked online
> for help on this issue, and on mailing lists, but I can't figure out my
> problem because people haven't been helping me, which is why I am here!
> :]
> Okay, so basically I want to be able to submit a word to dictionary.com
> and then get the definitions. However, to start off learning urllib2, I
> just want to do a simple Google search. Before you get mad, what I have
> found on urllib2 hasn't helped me. Anyway, how would you go about doing
> this? No, I did not post the HTML, but if you want, right-click in your
> browser and hit "view source" on the Google homepage. Basically, what I
> want to know is how to submit the values (the search term) and then
> search for that value. Here's what I know:
>
> import urllib2
> response = urllib2.urlopen("http://www.google.com/")
> html = response.read()
> print html
>
> Now I know that all this does is print the source, but that's about all I
> know. I know it may be a lot to ask to have someone show/help me, but I
> really would appreciate it.

This example is for Google. Of course, using pygoogle is easier in this
case, but this is a valid example for the general case:
>>>[207]: import urllib, urllib2

You need to trick the server with an imaginary User-Agent.

>>>[208]: def google_search(terms):
   .....:     return urllib2.urlopen(urllib2.Request(
   .....:         "http://www.google.com/search?" +
   .....:         urllib.urlencode({'hl': 'fr', 'q': terms}),
   .....:         headers={'User-Agent': 'MyNav 1.0 (compatible; MSIE 6.0; Linux'})
   .....:     ).read()
   .....:

>>>[212]: res = google_search("python & co")
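
To see what google_search sends without touching the network, the same request can be built and inspected step by step. This is only a sketch: the helper name build_search_request and its lang parameter are made up for illustration, and the import fallback is there because Python 3 moved these functions into urllib.parse and urllib.request.

```python
try:  # Python 2 names, as used in this thread
    from urllib import urlencode
    from urllib2 import Request
except ImportError:  # Python 3 moved them
    from urllib.parse import urlencode
    from urllib.request import Request


def build_search_request(terms, lang='fr'):
    """Build, but do not send, the GET request that google_search performs.

    Hypothetical helper for illustration; it is not part of urllib2.
    """
    # urlencode escapes each value, so the "&" inside the search terms
    # cannot be mistaken for a separator between parameters.
    query = urlencode([('hl', lang), ('q', terms)])
    url = "http://www.google.com/search?" + query
    # Same User-Agent string as in the example above.
    return Request(url, headers={
        'User-Agent': 'MyNav 1.0 (compatible; MSIE 6.0; Linux'})


req = build_search_request("python & co")
print(req.get_full_url())
```

Passing req to urlopen would then perform the request. The point of the sketch is that submitting a search "value" over GET is nothing more than appending an encoded query string to the URL.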
Now you have the whole HTML response; you'll have to parse it to recover
the data. A quick and dirty try on the Google response page:
>>>[213]: import re
>>>[214]: [re.sub('<.+?>', '', e) for e in re.findall('<h2 class=r>.*?</h2>', res)]
...[229]:
['Python Gallery',
'Coffret Monty Python And Co 3 DVD : La Premi\xe8re folie des Monty ...',
'Re: os x, panther, python &amp; co: msg#00041',
'Re: os x, panther, python &amp; co: msg#00040',
'Cardiff Web Site Design, Professional web site design services ...',
'Python Properties',
'Frees &lt; Programs &lt; Python &lt; Bin-Co',
'Torb: an interface between Tcl and CORBA',
'Royal Python Morphs',
'Python &amp; Co']
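
Note that the titles above still contain raw entities such as &amp;amp; and &amp;lt;. A possible refinement, sketched here on the assumption that the page still wraps each title in an h2 element with class=r (that was the 2008 markup and has very likely changed since), is to decode the entities after stripping the tags. The function name extract_titles and the sample string are invented for illustration.

```python
import re

try:
    from html import unescape  # Python 3.4+
except ImportError:
    from HTMLParser import HTMLParser  # Python 2
    unescape = HTMLParser().unescape

# Same two patterns as the quick and dirty one-liner, precompiled.
TITLE_RE = re.compile(r'<h2 class=r>(.*?)</h2>', re.S)
TAG_RE = re.compile(r'<.+?>', re.S)


def extract_titles(html):
    """Return the text of each <h2 class=r> block, tags removed, entities decoded."""
    return [unescape(TAG_RE.sub('', block)) for block in TITLE_RE.findall(html)]


# Hypothetical fragment standing in for a real response page.
sample = ('<h2 class=r><a href="http://example.com/a">Python <b>Gallery</b></a></h2>'
          '<h2 class=r><a href="http://example.com/b">Python &amp; Co</a></h2>')
print(extract_titles(sample))
```

Regular expressions are fragile on real HTML; for anything beyond a quick experiment, a real HTML parser is usually the safer choice.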
--
_____________

Maric Michaud
--
http://mail.python.org/mailman/listinfo/python-list
--
View this message in context: http://www.nabble.com/using-urllib2-...p18160312.html
Sent from the Python - python-list mailing list archive at Nabble.com.

Jun 27 '08 #1
I stumbled across this a while back: http://www.voidspace.org.uk/python/a.../urllib2.shtml.
It covers quite a bit. The urllib2 module is pretty straightforward
once you've used it a few times. Some of the class naming and whatnot
takes a bit of getting used to (I found that to be the most confusing
bit).
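
For what it's worth, the class names that tend to cause the confusion can be seen side by side without sending anything. This is a rough sketch, not a full tour of the module: the example.com URL and the short User-Agent value are placeholders, and the import fallback covers Python 3, where the same classes live in urllib.request.

```python
try:  # Python 2 name, as in this thread
    import urllib2 as request_mod
except ImportError:  # Python 3: the same classes live in urllib.request
    import urllib.request as request_mod

# Request: a description of one HTTP request (URL, optional body, headers).
req = request_mod.Request("http://www.example.com/",
                          headers={'User-Agent': 'MyNav 1.0'})

# OpenerDirector: the object that actually performs requests. build_opener()
# returns one with the default handlers (HTTP, redirects, errors) installed.
opener = request_mod.build_opener()

# urlopen(req) is roughly a shortcut for calling open(req) on a default
# opener. The call is left commented out so the sketch stays offline.
# response = opener.open(req)
# html = response.read()

print(req.get_method())
print(type(opener).__name__)
```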

Jun 27 '08 #2

I have read that multiple times. It is hard to understand, but it did help a
little. I found a bit of a workaround for now, which is not what I
ultimately want. However, even when I can get to the page I want, let's say
"Http://dictionary.reference.com/browse/cheese", I look in Firebug, an
extension, and see the definition in JavaScript:

<table class="luna-Ent">
<tbody>
<tr>
<td class="dn" valign="top">1.</td>
<td valign="top">the curd of milk separated from the whey and prepared in
many ways as a food. </td>

The problem is that if I use code like this to get the HTML of that
page in Python:

response = urllib2.urlopen("the website....")
html = response.read()
print html

then I get a bunch of stuff, but it doesn't show me the code with the
table that the definition is in. So I am asking: how do I access this
JavaScript? Also, could someone point me to a better reference than the
last one, whether it be a book or anything else? That one really doesn't
tell me much.

Jun 27 '08 #3

