Bytes | Software Development & Data Engineering Community

How to start threads in groups?

My code is not right; can somebody give me a hand? Thanks.

For example, I have 1000 URLs to download, but I want only 5 threads running at a time.
<code>
import threading

# download(), all_url and groupcount() are not shown in the post

def threadTask(url):
    download(url)

threadsAll = []
for url in all_url:
    task = threading.Thread(target=threadTask, args=[url])
    threadsAll.append(task)

for every5task in groupcount(threadsAll, 5):
    for everytask in every5task:
        everytask.start()

    for everytask in every5task:
        everytask.join()

    for everytask in every5task:  # this does not run ok
        while everytask.isAlive():
            pass
</code>
Oct 6 '08 #1
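For reference, the batch-of-five idea in the question can be made to work as below. This is a minimal Python 3 sketch, not the OP's actual program: `fake_download` is a hypothetical stand-in for the unshown `download()` (it only sleeps and records the URL), and `chunks` is a hypothetical replacement for the unshown `groupcount` helper. Since `join()` already blocks until a thread finishes, no `while ...isAlive(): pass` busy-wait is needed.

```python
import threading
import time

MAX_THREADS = 5  # at most this many downloads run at once

done = []                     # URLs that finished, in completion order
done_lock = threading.Lock()  # protects `done` across threads

def fake_download(url):
    """Hypothetical stand-in for download(): sleep briefly, then record url."""
    time.sleep(0.01)
    with done_lock:
        done.append(url)

def chunks(items, size):
    """Yield successive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def download_in_batches(urls, batch_size=MAX_THREADS):
    for batch in chunks(urls, batch_size):
        threads = [threading.Thread(target=fake_download, args=(url,))
                   for url in batch]
        for t in threads:
            t.start()
        # join() blocks until each thread has finished; no busy-wait needed
        for t in threads:
            t.join()

urls = ["http://example.com/%d" % i for i in range(12)]
download_in_batches(urls)
print(len(done))  # prints 12
```

One drawback of batching is that each group of five waits for its slowest download before the next group starts.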
On Mon, 06 Oct 2008 11:24:51 -0300, <bi******@gmail.com> wrote:
> On 6 Oct, 15:24, oyster <lepto.pyt...@gmail.com> wrote:
>> my code is not right, can sb give me a hand? thanx
>>
>> for example, I have 1000 urls to be downloaded, but only 5 thread at
>> one time
> I would restructure my code with something like this (WARNING: the
> following code is ABSOLUTELY UNTESTED and shall be considered only as
> pseudo-code to express my idea of the algorithm (which, also, could be
> wrong :-) ):
Your code creates one thread per url (but never more than MAX_THREADS
alive at the same time). Usually it's more efficient to create all the
MAX_THREADS at once, and continuously feed them with tasks to be done. A
Queue object is the way to synchronize them; from the documentation:

<code>
from Queue import Queue
from threading import Thread

num_worker_threads = 3
list_of_urls = ["http://foo.com", "http://bar.com",
                "http://baz.com", "http://spam.com",
                "http://egg.com",
               ]

def do_work(url):
    from time import sleep
    from random import randrange
    from threading import currentThread
    print "%s downloading %s" % (currentThread().getName(), url)
    sleep(randrange(5))
    print "%s done" % currentThread().getName()

# from this point on, copied almost verbatim from the Queue example
# at the end of http://docs.python.org/library/queue.html

def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.setDaemon(True)
    t.start()

for item in list_of_urls:
    q.put(item)

q.join()       # block until all tasks are done
print "Finished"
</code>
--
Gabriel Genellina

Oct 7 '08 #2
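The code in the post above is Python 2. In Python 3 the `Queue` module is named `queue`, `print` is a function, and the `daemon=True` constructor argument and `threading.current_thread()` are the preferred spellings. The following is a minimal Python 3 sketch of the same fixed-pool design, assuming 5 workers as in the question; `fake_download` is a hypothetical stand-in that only sleeps and records which worker handled each URL.

```python
import queue
import threading
import time

NUM_WORKER_THREADS = 5  # fixed pool: at most 5 downloads at once

results = []                     # (thread name, url) pairs, in completion order
results_lock = threading.Lock()
q = queue.Queue()

def fake_download(url):
    """Hypothetical download stand-in: sleep briefly, record who handled url."""
    time.sleep(0.01)
    with results_lock:
        results.append((threading.current_thread().name, url))

def worker():
    # Each worker lives for the whole run, pulling URLs off the shared queue.
    while True:
        url = q.get()
        fake_download(url)
        q.task_done()

# Start the pool once; daemon threads do not keep the process alive at exit.
for _ in range(NUM_WORKER_THREADS):
    threading.Thread(target=worker, daemon=True).start()

urls = ["http://example.com/%d" % i for i in range(20)]
for url in urls:
    q.put(url)

q.join()  # block until every queued URL has been marked done
print("Finished", len(results))  # prints: Finished 20
```

If a real download can raise, wrap the call in `try`/`finally` so `task_done()` still runs; otherwise `q.join()` would wait forever.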
In message <ma**************************************@python.org>, Gabriel
Genellina wrote:
> Usually it's more efficient to create all the MAX_THREADS at once, and
> continuously feed them with tasks to be done.
Given that the bottleneck is most likely to be the internet connection, I'd
say the "premature optimization is the root of all evil" adage applies
here.
Oct 7 '08 #3
Lawrence D'Oliveiro wrote:
> In message <ma**************************************@python.org>, Gabriel
> Genellina wrote:
>> Usually it's more efficient to create all the MAX_THREADS at once, and
>> continuously feed them with tasks to be done.
>
> Given that the bottleneck is most likely to be the internet connection, I'd
> say the "premature optimization is the root of all evil" adage applies
> here.
There is also the bottleneck of programmer time to understand, write,
and maintain. In this case, 'more efficient' is simpler, and to me,
more efficient of programmer time. Feeding a fixed pool of worker
threads with a Queue() is a standard design that is easy to understand
and one the OP should learn. Re-using tested code is certainly
efficient of programmer time. Managing a variable pool of workers that
die and need to be replaced is more complex (two loops nested within a
loop) and error prone (though learning that alternative is probably not
a bad idea also).

tjr

Oct 7 '08 #4
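For comparison, the "one thread per URL, but never more than MAX_THREADS alive" design can be written with a semaphore instead of hand-managed loops. This is a swapped-in technique (`threading.BoundedSemaphore`), not the code from the earlier reply, and `fake_download` is a hypothetical stand-in; the sketch also records the peak number of simultaneous downloads so the limit can be checked.

```python
import threading
import time

MAX_THREADS = 5
slots = threading.BoundedSemaphore(MAX_THREADS)  # one slot per running download

state_lock = threading.Lock()
running = 0   # downloads in progress right now
peak = 0      # highest value `running` ever reached
finished = []

def fake_download(url):
    """Hypothetical download stand-in that tracks how many run concurrently."""
    global running, peak
    with state_lock:
        running += 1
        peak = max(peak, running)
    time.sleep(0.01)
    with state_lock:
        running -= 1
        finished.append(url)

def run_one(url):
    try:
        fake_download(url)
    finally:
        slots.release()  # free the slot so the main loop can start another thread

urls = ["http://example.com/%d" % i for i in range(30)]
threads = []
for url in urls:
    slots.acquire()  # blocks until one of the running downloads releases its slot
    t = threading.Thread(target=run_one, args=(url,))
    t.start()
    threads.append(t)

for t in threads:
    t.join()

print(len(finished), peak <= MAX_THREADS)  # prints: 30 True
```

Every URL still pays for creating a new thread, and the slot must be released on every path (hence the `try`/`finally`), which is part of the extra bookkeeping compared with a fixed, Queue-fed pool.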
On Tue, 07 Oct 2008 13:25:01 -0300, Terry Reedy <tj*****@udel.edu>
wrote:
> Lawrence D'Oliveiro wrote:
>> In message <ma**************************************@python.org>,
>> Gabriel Genellina wrote:
>>> Usually it's more efficient to create all the MAX_THREADS at once, and
>>> continuously feed them with tasks to be done.
>> Given that the bottleneck is most likely to be the internet
>> connection, I'd say the "premature optimization is the root of all evil"
>> adage applies here.
>
> There is also the bottleneck of programmer time to understand, write,
> and maintain. In this case, 'more efficient' is simpler, and to me,
> more efficient of programmer time. Feeding a fixed pool of worker
> threads with a Queue() is a standard design that is easy to understand
> and one the OP should learn. Re-using tested code is certainly
> efficient of programmer time. Managing a variable pool of workers that
> die and need to be replaced is more complex (two loops nested within a
> loop) and error prone (though learning that alternative is probably not
> a bad idea also).
I'd like to add that debugging a program that continuously creates and
destroys threads is a real PITA.

--
Gabriel Genellina

Oct 7 '08 #5
On 7 Oct, 06:37, "Gabriel Genellina" <gagsl-...@yahoo.com.ar> wrote:
> On Mon, 06 Oct 2008 11:24:51 -0300, <bieff...@gmail.com> wrote:
>> On 6 Oct, 15:24, oyster <lepto.pyt...@gmail.com> wrote:
>>> my code is not right, can sb give me a hand? thanx
>>> for example, I have 1000 urls to be downloaded, but only 5 thread at
>>> one time
>> I would restructure my code with something like this (WARNING: the
>> following code is ABSOLUTELY UNTESTED and shall be considered only as
>> pseudo-code to express my idea of the algorithm (which, also, could be
>> wrong :-) ):
>
> Your code creates one thread per url (but never more than MAX_THREADS
> alive at the same time). Usually it's more efficient to create all the
> MAX_THREADS at once, and continuously feed them with tasks to be done. A
> Queue object is the way to synchronize them; from the documentation:

> [code snipped]

Agreed.
I was trying to do what the OP was trying to do, but in a way that
works.
But keeping the threads alive and feeding them the URLs is definitely
a better design.
And no, I don't think it's 'premature optimization': it is just
cleaner.

Ciao
------
FB
Oct 8 '08 #6
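Python 3.2 and later ship this "keep the threads alive and feed them URLs" design in the standard library as `concurrent.futures.ThreadPoolExecutor`. A minimal sketch for the 1000-URL, 5-thread case from the question, with a hypothetical `fake_download` standing in for a real download function:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fake_download(url):
    """Hypothetical stand-in for a real download; returns a fake payload size."""
    time.sleep(0.001)
    return len(url)

urls = ["http://example.com/%d" % i for i in range(1000)]

# A fixed pool of 5 worker threads; the executor queues the remaining URLs
# internally and hands them to workers as they become free.
with ThreadPoolExecutor(max_workers=5) as pool:
    sizes = list(pool.map(fake_download, urls))  # results come back in input order

print(len(sizes))  # prints 1000
```

Leaving the `with` block waits for all submitted work to finish, so no explicit join is needed.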
In message <ma**************************************@python.org>, Gabriel
Genellina wrote:
> On Tue, 07 Oct 2008 13:25:01 -0300, Terry Reedy <tj*****@udel.edu>
> wrote:
>> Lawrence D'Oliveiro wrote:
>>> In message <ma**************************************@python.org>,
>>> Gabriel Genellina wrote:
>>>
>>>> Usually it's more efficient to create all the MAX_THREADS at once, and
>>>> continuously feed them with tasks to be done.
>>>
>>> Given that the bottleneck is most likely to be the internet
>>> connection, I'd say the "premature optimization is the root of all evil"
>>> adage applies here.
>>
>> Feeding a fixed pool of worker threads with a Queue() is a standard
>> design that is easy to understand and one the OP should learn. Re-using
>> tested code is certainly efficient of programmer time.
>
> I'd like to add that debugging a program that continuously creates and
> destroys threads is a real PITA.
That's God trying to tell you to avoid threads altogether.
Oct 13 '08 #7
On Oct 13, 6:54 am, Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand>
wrote:
> In message <mailman.2151.1223416240.3487.python-l...@python.org>, Gabriel
> Genellina wrote:
>> On Tue, 07 Oct 2008 13:25:01 -0300, Terry Reedy <tjre...@udel.edu>
>> wrote:
>>> Lawrence D'Oliveiro wrote:
>>>> In message <mailman.2088.1223354239.3487.python-l...@python.org>,
>>>> Gabriel Genellina wrote:
>>>>> Usually it's more efficient to create all the MAX_THREADS at once, and
>>>>> continuously feed them with tasks to be done.
>>>> Given that the bottleneck is most likely to be the internet
>>>> connection, I'd say the "premature optimization is the root of all evil"
>>>> adage applies here.
>>> Feeding a fixed pool of worker threads with a Queue() is a standard
>>> design that is easy to understand and one the OP should learn. Re-using
>>> tested code is certainly efficient of programmer time.
>> I'd like to add that debugging a program that continuously creates and
>> destroys threads is a real PITA.
>
> That's God trying to tell you to avoid threads altogether.

Especially in a case like this, which is tailor-made for a trivial
state-machine solution if you really want multiple connections.
Oct 13 '08 #8
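The "state-machine solution" presumably refers to multiplexing many connections in a single thread instead of using threads at all. In modern Python, `asyncio` provides such an event loop; the sketch below substitutes it for a hand-written state machine, caps concurrency at 5 with an `asyncio.Semaphore`, and simulates each download with `asyncio.sleep`. The `fake_download` coroutine is hypothetical; a real version would need an asynchronous HTTP client.

```python
import asyncio

MAX_CONCURRENT = 5

async def fake_download(url, limit, stats):
    """Hypothetical download coroutine; `limit` caps how many run at once."""
    async with limit:
        stats["running"] += 1
        stats["peak"] = max(stats["peak"], stats["running"])
        await asyncio.sleep(0.01)  # stands in for waiting on the network
        stats["running"] -= 1
        return url

async def main(urls):
    limit = asyncio.Semaphore(MAX_CONCURRENT)
    stats = {"running": 0, "peak": 0}
    # All coroutines run in this one thread; the event loop only switches
    # between them at `await` points, so no locks are needed around `stats`.
    done = await asyncio.gather(*(fake_download(u, limit, stats) for u in urls))
    return done, stats["peak"]

urls = ["http://example.com/%d" % i for i in range(20)]
done, peak = asyncio.run(main(urls))
print(len(done), peak)
```

`asyncio.gather` returns results in the same order as the coroutines were passed, regardless of completion order.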
