By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
435,491 Members | 3,132 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 435,491 IT Pros & Developers. It's quick & easy.

Using time.sleep() in 2 threads causes lockup when hyper-threading is enabled

P: n/a
Below are 2 files that isolate the problem. Note, both programs hang
(stop responding) with hyper-threading turned on (a BIOS setting), but
work as expected with hyper-threading turned off.

Note, the Windows task manager shows 2 CPUs on the Performance tab with
hyper-threading is turned on.

Both Python 2.3.5 and 2.4.3 (downloaded from python.org) have this
problem.
The operating system is MS Windows XP Professional.

winmsd.exe shows:
2CPUs: x86 Family 15 Model 4 Stepping 1 GenuineIntel ~3000 MHz
Version: 5.1.2600 Service Pack 2 Build 2600

Could someone with a hyper-threading (or dual/multicore) CPU please
confirm this bug?

Many Thanks

Olaf
# testsleep.py
import threading
import time

class Task(threading.Thread):
def __init__(self, n, t):
threading.Thread.__init__(self)
self.n = n
self.t = t
def run(self):
print 'thread %d started' % self.n
print 'sleep time:', self.t
print time.clock()
print time.clock()
print time.clock()
print
count = 0
printCount = int(10 / self.t)
while True:
start = time.clock()
time.sleep(self.t)
stop = time.clock()
if stop - start > 1.0:
print 'thread', self.n, stop - start

count += 1
if count > printCount:
count = 0
print self.n,

def test():
thread1 = Task(1, 0.01)
thread2 = Task(2, 0.003)
thread1.start()
thread2.start()

test()

------------------------------------------------------------------------

# testsleep2.py
import thread
import time
import sys

def run(n, t):
print 'thread %d started' % n
print 'sleep time:', t
print time.clock()
print time.clock()
print time.clock()
print
count = 0
printCount = int(10 / t)
while True:
start = time.clock()
time.sleep(t)
stop = time.clock()
if stop - start > 1.0:
print 'thread', n, stop - start

count += 1
if count > printCount:
count = 0
print n,

def test():
thread.start_new_thread(run, (1, 0.01))
thread.start_new_thread(run, (2, 0.003))

# Wait until the user presses the enter key.
sys.stdin.read(1)
test()

May 3 '06 #1
Share this Question
Share on Google+
17 Replies


P: n/a
Ol********@gmail.com wrote:
Below are 2 files that isolate the problem. Note, both programs hang
(stop responding) with hyper-threading turned on (a BIOS setting), but
work as expected with hyper-threading turned off.


What do you mean "stop responding"? Not responding when you press
ctrl-c? They stop printing? If you mean stop printing, try
sys.stdout.flush() after each print

May 3 '06 #2

P: n/a
[Ol********@gmail.com]
Below are 2 files that isolate the problem. Note, both programs hang
(stop responding)
What does "stop responding" mean?
with hyper-threading turned on (a BIOS setting), but
work as expected with hyper-threading turned off.

Note, the Windows task manager shows 2 CPUs on the Performance tab with
hyper-threading is turned on.

Both Python 2.3.5 and 2.4.3 (downloaded from python.org) have this
problem.
The operating system is MS Windows XP Professional.

winmsd.exe shows:
2CPUs: x86 Family 15 Model 4 Stepping 1 GenuineIntel ~3000 MHz
Version: 5.1.2600 Service Pack 2 Build 2600

Could someone with a hyper-threading (or dual/multicore) CPU please
confirm this bug?
I don't see anything unexpected (to me) on a WinXP Pro SP2 box with HT
enabled, using 2.4.3, but I'm not sure what "not responding" means to
you.
# testsleep.py
import threading
import time

class Task(threading.Thread):
def __init__(self, n, t):
threading.Thread.__init__(self)
self.n = n
self.t = t
def run(self):
print 'thread %d started' % self.n
print 'sleep time:', self.t
print time.clock()
print time.clock()
print time.clock()
print
count = 0
printCount = int(10 / self.t)
while True:
start = time.clock()
time.sleep(self.t)
stop = time.clock()
if stop - start > 1.0:
print 'thread', self.n, stop - start
You don't _expect_ this print to execute, do you? It should trigger
very rarely (if ever).
count += 1
if count > printCount:
count = 0
print self.n,

def test():
thread1 = Task(1, 0.01)
thread2 = Task(2, 0.003)
thread1.start()
thread2.start()

test()
This is what happened when I ran it, until I got tired of watching it
and killed the job:

C:\Python24>python test1.py
thread 1 started
sleep time: 0.01
7.8499538238e-007
4.42770924877e-005
8.62618455186e-005

thread 2 started
sleep time: 0.003
0.000479349533238
0.000521904282916
0.000563649037359

1 2 1 2 1 2 1 2 1 1 2 1 2 1 2 1 2 1 2 1 1 2 1 2 1
2 1 2 1 2 1 1 2 1 2 1 2 1 2 1 1 2 1 2 1 2 1 2 1 2
1 1 2 1 2 1 2 1 2 1 2 1 1 2 1 2 1 2 1 2 1
# testsleep2.py
import thread
import time
import sys

def run(n, t):
print 'thread %d started' % n
print 'sleep time:', t
print time.clock()
print time.clock()
print time.clock()
print
count = 0
printCount = int(10 / t)
while True:
start = time.clock()
time.sleep(t)
stop = time.clock()
if stop - start > 1.0:
print 'thread', n, stop - start
See above.
count += 1
if count > printCount:
count = 0
print n,

def test():
thread.start_new_thread(run, (1, 0.01))
thread.start_new_thread(run, (2, 0.003))

# Wait until the user presses the enter key.
sys.stdin.read(1)
Do you want someone running this test to hit the ENTER key, or not?

test()


Works much the same for me, except I got bored quicker ;-):

C:\Python24>python test2..py
thread 1 started
sleep time: 0.01
1.14999323533e-006
8.01271757225e-005
thread 2 started
sleep time: 0.003
0.000395865318439
0.000474259857295
0.000559831706872

0.00071061346698

1 2 1 2 1 2 1 2 1 1 2 1 2 1 2 1 2 1 2 1

At that point I hit the ENTER key, and saw:

Unhandled exception in thread started by
Error in sys.excepthook:

Original exception was:
Unhandled exception in thread started by
Error in sys.excepthook:

Original exception was:
That's unfortunate, but not unexpected either. The interpreter
doesn't wait for a `thread` module thread to finish before tearing
itself down, so the threads keep running after Python has torn so much
of itself down that weird execptions occur -- and Python is _so_ torn
down by that point it can't even display a useful error message. The
interpreter does (by default) wait for `threading` module threads to
exit before tearing itself down, so those kinds of useless exit
messages weren't expected in the first test.
May 3 '06 #3

P: n/a
> What do you mean "stop responding"?

Both threads print their thread numbers (either 1 or 2) approximately
every 10 seconds. However, after a while (minutes to hours) both
programs (see above) hang!

Pressing ctrl-c (after the printing stops) causes the threads to "wake
up" from their sleep statement. And since the sleep took more than 1
seconds the thread number and the duration of the sleep is printed to
the screen.

Do you have a hyper-threading/dual/multi core CPU? Did you try this?

Olaf

May 3 '06 #4

P: n/a
Time
1 2 1 2 1 2 1 2 1 1 2 1 2 1 2 1 2 1 2 1 1 2 1 2 1
2 1 2 1 2 1 1 2 1 2 1 2 1 2 1 1 2 1 2 1 2 1 2 1 2
1 1 2 1 2 1 2 1 2 1 2 1 1 2 1 2 1 2 1 2 1


This is exactly what you should see. The problem I see is that after a
while (minutes to hours) the printing of 1s and 2s stops! If you press
ctrl-c at that point the threads will print how many seconds they were
stuck in the sleep statements (proving that the threads were stuck in
the sleep statement until you pressed ctrl-c).

Could you please try again (perhaps let it run overnight)? Many, many
thanks.

Olaf

May 3 '06 #5

P: n/a
Tim
Do you want someone running this test to hit the ENTER key, or not?


The purpose of the "sys.stdin.read(1)" statement is simply to prevent
the main thread from exiting and thus ending the test. And yes, I also
get an exception when I press the enter key.

Olaf

May 3 '06 #6

P: n/a

Ol********@gmail.com wrote:
What do you mean "stop responding"?


Both threads print their thread numbers (either 1 or 2) approximately
every 10 seconds. However, after a while (minutes to hours) both
programs (see above) hang!

Pressing ctrl-c (after the printing stops) causes the threads to "wake
up" from their sleep statement. And since the sleep took more than 1
seconds the thread number and the duration of the sleep is printed to
the screen.

Do you have a hyper-threading/dual/multi core CPU? Did you try this?


I don't have such CPU but I run the first program anyway. It printed

C:\py>th.py
thread 1 started
sleep time: 0.01
3.63174649292e-006
8.43682646817e-005
0.000164825417756

thread 2 started
sleep time: 0.003
0.000675225482568
0.000753447714724
0.00082943502596

1 1 1 2 1 1 1 2 1 1 1 2 1 1 1 1 2 1 1 1 2 1

I got bored and tried to stop it with ctrl-c but it didn't respond and
kept running and printing the numbers. I had to kill it from task
manager.

May 3 '06 #7

P: n/a
Serge
I got bored and tried to stop it with ctrl-c ...


Yes, you have to use the ctrl-break key to stop the first program. And
neither program every hangs on a single core CPU. It also does not
hang on a hyper-threading CPU if hyper-threading is turned off in the
BIOS.

Olaf

May 3 '06 #8

P: n/a
>> What do you mean "stop responding"?

[Ol********@gmail.com]
Both threads print their thread numbers (either 1 or 2) approximately
every 10 seconds. However, after a while (minutes to hours) both
programs (see above) hang!
Where "hang" means they stop printing.
Pressing ctrl-c (after the printing stops) causes the threads to "wake
up" from their sleep statement. And since the sleep took more than 1
seconds the thread number and the duration of the sleep is printed to
the screen.

Do you have a hyper-threading/dual/multi core CPU? Did you try this?


I was using a 3.4 GHz Pentium 4 (single core) with hyper-threading
enabled. I didn't run it for hours ;-)

But supposing I do and see a hang, it's unlikely that will have
anything to do with Python. On Windows, time.sleep() called from any
thread other than the main thread just calls the Win32 API Sleep()
function, after converting the argument to milliseconds. So it may be
more fruitful to recode your test program in C (if there is a bug
here, it's most likely in Microsoft's implementation of its Sleep()
function).
May 3 '06 #9

P: n/a
Tim
I didn't run it for hours ;-)
Please try.

The sleep statement does not return! And this should not happen. The
code above does nothing special or unusual. The problem only occurs if
2 threads use the sleep statement and hyper-threading is enabled.

We discovered this bug perhaps a year ago. The only solution was to
tell our customers to disable hyper-threading (you can imagine they did
not like our "solution" very much). It then took many days of hard
work to isolate the problem down to the code I posted above.
Where "hang" means they stop printing.


Our Python code just stops running (locking up the entire application).
We use Qt for our GUI. We have over a hundred .py files. We use a
total of 4 threads (they get created once and stay running). One
thread uses sockets.

Once the application locks up (getting stuck in a sleep statement) all
comes back to live if I pull the network cable out. So perhaps the
socket thread returns from the sleep statement and other threads return
to live because they were waiting for the socket thread.

Our software runs on both Windows and Linux. We are not sure if the
problem also happens on Linux.

In any case, if someone else can confirm the bug then this is a serious
problem meriting further investigation.

We have searched the Internet far and wide and were not able to find
any information that indicates that someone else has reported a similar
problem (neither Python nor Windows Sleep related).

Thank you for your help.

Olaf

May 4 '06 #10

P: n/a
On 2006-05-04, Ol********@gmail.com <Ol********@gmail.com> wrote:
Please try.

The sleep statement does not return!
Never, or does it just take a long time?
And this should not happen.
Dude, it's MS Windows.

It does all _sorts_ of stuff that it shouldn't.

Having sleep() take orders of magnitude longer than it should
is not an uncommon complaint for MS Windows users. There was a
fairly extensive thread in this group about that problem just a
few weeks ago. IIRC, disabling some other program or service
fixed it for one MS victem.
The code above does nothing special or unusual. The problem
only occurs if 2 threads use the sleep statement and
hyper-threading is enabled.


That part is new. I'm pretty sure other people who complained
about sleep() not returning in a reasonable amount of time saw
it all of the time.

--
Grant Edwards grante Yow! I was giving HAIR
at CUTS to th' SAUCER PEOPLE
visi.com ... I'm CLEAN!!
May 4 '06 #11

P: n/a
Grant
Having sleep() take orders of magnitude longer than it should


I seen a few times where sleep returns after some seconds or even after
tens of seconds (my code above check for that). But most of the time
it gets stuck forever.

Olaf

May 4 '06 #12

P: n/a
In <12*************@corp.supernews.com>, Grant Edwards wrote:
On 2006-05-04, Ol********@gmail.com <Ol********@gmail.com> wrote:
The sleep statement does not return!


Never, or does it just take a long time?
And this should not happen.


Dude, it's MS Windows.

It does all _sorts_ of stuff that it shouldn't.

Having sleep() take orders of magnitude longer than it should
is not an uncommon complaint for MS Windows users. There was a
fairly extensive thread in this group about that problem just a
few weeks ago. IIRC, disabling some other program or service
fixed it for one MS victem.


IIRC it was something like an NTP daemon that caused the clock to "jump"
a little and (Window's) sleep was confused.

Ciao,
Marc 'BlackJack' Rintsch
May 4 '06 #13

P: n/a
Marc
IIRC it was something like an NTP daemon that caused the clock to "jump" a little and (Window's) sleep was confused.


The problem is not a "jump" but a permanet lockup of the sleep
statement.

To all others, is there nobody out there that could run the test code
at the beginning of this post on a hyper-threading CPU?

Olaf

May 5 '06 #14

P: n/a
[Tim Peters]
I didn't run it for hours ;-)

[Ol********@gmail.com] Please try.
OK, I let the first test program run for over 24 hours by now. It
never hung. Overnight, the box did go into sleep mode, but the test
woke itself up after sleep mode ended, and the threads reported they
were sleeping for about 8 hours then.

I did this under a debug build of Python, so that if it did hang I'd
have _some_chance of getting useful info from the VC 7.1 debugger.
It's possible (but unlikely) that using a debug-build Python prevented
a hang that would have occurred had I used a release-build Python.
The sleep statement does not return! And this should not happen. The
code above does nothing special or unusual. The problem only occurs if
2 threads use the sleep statement and hyper-threading is enabled.

We discovered this bug perhaps a year ago. The only solution was to
tell our customers to disable hyper-threading (you can imagine they did
not like our "solution" very much). It then took many days of hard
work to isolate the problem down to the code I posted above.
As before, since Python merely calls the Win32 API Sleep() function,
it's extremely unlikely that the problem is due to Python.

It's quite possible that the problem is due to a tiny timing hole in
MS's implementation of Sleep(). Since I don't have the source code
for that, and disassembling is prohibited by my license ;-), I can't
pursue that.

I've seen software with honest-to-God thread-race bugs that were never
reported across years of heavy production use, until a user tried the
code on a hyper-threaded or multi-core box. Tiny timing holes can be
_extremely_ shy, and running on a box with true, or even just
finer-grained (like HT), concurrency can make them much more likely to
appear. I've never seen a software failure with (eventually) known
cause occur on an HT box that could not have happened on a non-HT box.
The statistics of extremely unlikely events aren't a natural fit to
the unabused human mind ;-)
...
Once the application locks up (getting stuck in a sleep statement) all
comes back to live if I pull the network cable out. So perhaps the
socket thread returns from the sleep statement and other threads return
to live because they were waiting for the socket thread.
That's peculiar. time.sleep() called from a thread other than the
main thread on Windows is non-interruptable (because the Windows
Sleep() function is non-interruptable). time.sleep() called from the
main thread on Windows _is_ interruptable: the main thread uses the
Win32 API WaitForSingleObject() instead, passing a handle to a custom
interrupt event; the _intent_ is so that a time.sleep() in the main
thread can be aborted by a keyboard interrupt. But it shouldn't be
possible for anything other than a keyboard interrupt or a normal
(timeout) return to knock that loose.

So if unplugging the cable knocks things lose, that just points even
stronger at a bug in the Windows kernel.
Our software runs on both Windows and Linux. We are not sure if the
problem also happens on Linux.
Well, I ran your test for a day on my Windows box -- you try running
it for a week on your Linux box ;-)
...
We have searched the Internet far and wide and were not able to find
any information that indicates that someone else has reported a similar
problem (neither Python nor Windows Sleep related).


I'm pretty sure I would have known about it if anyone reported such a
Python bug in the last 15 years. But this is the first time I've
heard it. I don't keep up with Microsoft bug reports at all.
May 5 '06 #15

P: n/a
In <11********************@j33g2000cwa.googlegroups.c om>, OlafMeding
wrote:
IIRC it was something like an NTP daemon that caused the clock to
"jump" a little and (Window's) sleep was confused.


The problem is not a "jump" but a permanet lockup of the sleep statement.


The "jump" of the system clock might confuse the systems `sleep`
implementation.

Ciao,
Marc 'BlackJack' Rintsch
May 5 '06 #16

P: n/a
On 2006-05-05, Marc 'BlackJack' Rintsch <bj****@gmx.net> wrote:
In <11********************@j33g2000cwa.googlegroups.c om>, OlafMeding
wrote:
IIRC it was something like an NTP daemon that caused the clock
to "jump" a little and (Window's) sleep was confused.


The problem is not a "jump" but a permanet lockup of the sleep
statement.


The "jump" of the system clock might confuse the systems
`sleep` implementation.


I wouldn't be a bit surprised if MS write code to wake up a
task when the currentTime == task.wakeupTime instead of when
currentTime >= task.wakeupTime. If the system time jumps past
the "wakeup" point without ever passing through it, the task
sleeps forever.

Microsoft's bungling of simple things like timer ticks is
legend in some circles:

http://www.embedded.com/showArticle....cleID=57703680
http://www.embedded.com/showArticle....leID=159902113

--
Grant Edwards grante Yow! I'm young... I'm
at HEALTHY... I can HIKE
visi.com THRU CAPT GROGAN'S LUMBAR
REGIONS!
May 5 '06 #17

P: n/a
Tim
I did this under a debug build of Python


Perhaps this is the reason why you were not able to reproduce the
problem. Could you try again with a standard build of Python?

I am a bit surprised that nobody else has tried running the short
Python program above on a hyper-threading or dual core / dual processor
system.

Olaf

May 8 '06 #18

This discussion thread is closed

Replies have been disabled for this discussion.