Non-blocking connect BLOCKS

jtd
Hi all,

I'm using asyncore to download a large list of web pages, and I've
noticed dispatcher.connect blocks for some hosts. I was under the
impression that non-blocking sockets do not block on connects, in
addition to reads and writes. My connect code is essentially the same
as the asyncore example:

http://docs.python.org/lib/asyncore-example.html

It seems unlikely that I am the first to encounter this problem. Can
someone explain what's wrong and suggest a remedy?

Rob
Jul 18 '05 #1
> I'm using asyncore to download a large list of web pages, and I've
> noticed dispatcher.connect blocks for some hosts. I was under the
> impression that non-blocking sockets do not block on connects, in
> addition to reads and writes. My connect code is essentially the same
> as the asyncore example:
>
> http://docs.python.org/lib/asyncore-example.html
>
> It seems unlikely that I am the first to encounter this problem. Can
> someone explain what's wrong and suggest a remedy?


Most likely the connect call is doing a DNS lookup, which means your execution
pauses while some other (non-Python) code goes and talks to the DNS server. For
many hosts the lookup will be fast (or even already cached locally, depending
on how your OS is configured), but for others the lookup may require checking
with an upstream DNS server (and in the worst case it'll involve several
upstream queries for a lookup that ultimately fails).

You can eliminate the delay by only passing in IP addresses to connect (it'll
notice that they are IP addresses rather than hostnames, and skip the DNS
lookup). The problem of course is that you then need to somehow resolve the
hostnames to IP addresses yourself. Maintaining a cache of resolved hostnames is a quick hack
to reduce the number of lookups, but it doesn't eliminate them. The only
alternative is to talk to the DNS server yourself - using asyncore of course so
that other connections don't block. IIRC there is some Python code for
creating/unpacking DNS packets and at one time it was even included in the
Python install (like in the Demo folder or something).
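
A rough sketch of the "only pass IP addresses to connect" idea, using plain sockets from the standard library rather than the asyncore example itself. The helper names `is_ip_literal` and `start_connect` are made up for illustration, and the error handling is an assumption. The point is only that anything that is not already an IP literal gets rejected before it can reach connect and trigger a blocking lookup:

```python
import errno
import ipaddress
import os
import socket


def is_ip_literal(host):
    """Return True if host is an IPv4 or IPv6 literal, so no DNS lookup is needed."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def start_connect(ip, port):
    """Begin a non-blocking TCP connect to an already-resolved IP address.

    Hypothetical helper: returns the socket while the connect is still in
    progress; the caller waits for it to become writable in its event loop.
    """
    if not is_ip_literal(ip):
        # A hostname here would make connect do a blocking DNS lookup.
        raise ValueError("resolve %r to an IP address first" % (ip,))

    version = ipaddress.ip_address(ip).version
    family = socket.AF_INET6 if version == 6 else socket.AF_INET
    sock = socket.socket(family, socket.SOCK_STREAM)
    sock.setblocking(False)

    err = sock.connect_ex((ip, port))
    # 0 means already connected; EINPROGRESS/EWOULDBLOCK mean "connect started".
    if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
        sock.close()
        raise OSError(err, os.strerror(err))
    return sock
```

The returned socket would then go into whatever select/poll loop you already have; the resolving itself still has to happen somewhere else.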

If you can find a third-party asynchronous DNS lookup library, then that might
be the way to go. The do-it-yourself approach above can get really messy (lots
of details to manage), but it also works and completely solves the problem, so
basically you have to decide how badly this problem hurts you. If you do go
this route, here are a few hints:

- on Windows you can semi-reliably detect the DNS servers by parsing the output
of 'ipconfig /all' and on Linux you can usually parse /etc/resolv.conf.

- you might also want to parse and honor the values in the /etc/hosts file
(%SystemRoot%\System32\drivers\etc\hosts on Windows)

- you can of course skip the lookup of the hostname 'localhost'

- it might be helpful to cache both the queries that succeed and the ones that
fail, depending on your application, as failed lookups can be really slow.

- I use a simple class to track cached entries:

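    import time

    class AgingMap:
        """Maps key -> (expiry time, value); expired entries read as missing."""

        def __init__(self):
            self.dict = {}

        def Get(self, key):
            try:
                expTime, val = self.dict[key]
                # An expTime of None means the entry never expires
                if expTime is not None and time.time() >= expTime:
                    val = None
            except KeyError:
                # Not found
                val = None
            return val

        def Set(self, key, value, ttlSec):
            # Store an absolute expiry time; None marks a non-expiring entry
            expires = ttlSec
            if ttlSec is not None:
                expires = ttlSec + time.time()
            self.dict[key] = (expires, value)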

This of course grows without bound, but most of the time I don't really care.
You could add a cleanup() method or something that gets called every once in
a while from your main event loop. A ttlSec value of None indicates a
non-expiring entry, e.g. due to an entry from the hosts file.

- IIRC you actually get a time-to-live (TTL) value for each IP returned by the
DNS, but to simplify things I usually store them all using the minimum TTL
value from the whole set (makes the cache simpler).
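
To make a few of the hints above concrete, here is a minimal sketch. All four function names are hypothetical, the parsers take the file contents as a string (so nothing here reads the real files), and the comment handling is simplified. `purge_expired` assumes the same key -> (expiry, value) layout that AgingMap keeps in self.dict:

```python
def parse_resolv_conf(text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in text.splitlines():
        # Simplification: treat '#' and ';' as comment markers anywhere on the line.
        for marker in ("#", ";"):
            line = line.split(marker, 1)[0]
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def parse_hosts(text):
    """Return a dict of lowercased hostname -> IP from hosts-file-style text."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # blank line, comment, or malformed entry
        ip, names = fields[0], fields[1:]
        for name in names:
            # Assumption: the first entry for a name wins.
            table.setdefault(name.lower(), ip)
    return table


def cache_ttl(records):
    """Given (ip, ttl_seconds) pairs for one hostname, return the minimum TTL."""
    return min(ttl for _ip, ttl in records)


def purge_expired(entries, now):
    """Delete expired items from a dict of key -> (expiry_or_None, value)."""
    expired = [key for key, (exp, _val) in entries.items()
               if exp is not None and now >= exp]
    for key in expired:
        del entries[key]
```

Something like `purge_expired(cache.dict, time.time())` could serve as the body of the cleanup() method mentioned above, and hosts-file entries would be stored with a ttlSec of None.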

HTH,
-Dave
Jul 18 '05 #2
jtd
"Dave Brueck" <da**@pythonapocrypha.com> wrote in message news:<ma*************************************@python.org>...
> > I'm using asyncore to download a large list of web pages, and I've
> > noticed dispatcher.connect blocks for some hosts. I was under the
> > impression that non-blocking sockets do not block on connects, in
> > addition to reads and writes. My connect code is essentially the same
> > as the asyncore example:
> >
> > http://docs.python.org/lib/asyncore-example.html
> >
> > It seems unlikely that I am the first to encounter this problem. Can
> > someone explain what's wrong and suggest a remedy?
>
> Most likely the connect call is doing a DNS lookup, which means your execution
> pauses while some other (non-Python) code goes and talks to the DNS server.


Thank you, that was exactly the problem. Instead of messing around
with DNS lookup protocols, I rewrote the program using threads. Now
the lookups take O(1) rather than O(number of threads/user threads)
time :)
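
For anyone landing here later, one way the thread-based approach could look. This is a sketch under assumptions, not the program described above: `resolve` and `resolve_all` are hypothetical names, the pool size of 20 is arbitrary, and only the first IPv4 address is kept. The blocking lookups run in worker threads so a slow DNS server only stalls its own thread:

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def resolve(host):
    """Blocking lookup; returns the first IPv4 address, or None on failure."""
    try:
        return socket.gethostbyname(host)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def resolve_all(hosts, lookup=resolve, workers=20):
    """Resolve many hostnames concurrently; returns a dict of host -> result.

    `lookup` is injectable so the function can be exercised without a network.
    """
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each host with its result.
        return dict(zip(hosts, pool.map(lookup, hosts)))
```

The resulting addresses can then be passed to connect as IP literals, which skips the lookup inside connect entirely.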
Jul 18 '05 #3
