473,801 Members | 2,227 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Thread-safety of dict

It seems to be a commonly held belief that basic dict operations (get,
set, del) are atomic. However, since I know searching the hash table
is a multistep process, I thought I'd check it out for sure.

First, my assumption is that one thread is attempting to get a single
key, while other threads are getting, setting, and deleting different
keys. Obviously you could fail anyway if they were modifying the same
key.

The avenue of attack I'm interested in is when lookdict() calls
PyObject_RichCo mpareBool(). If this calls out to Python code then the
GIL can/will be released. Many builtin types won't release the GIL
though, notably str, so you're safe if they're all you use.

Assuming your dict did get modified, lookdict() then has some checks
to protect itself (which it does by restarting the search). It checks
if mp->ma_table got changed and then it checks if ep->me_key got
changed. So to attack it we need to: 1) have the same table address
it had before, 2) keep the key it was checking when the change
happened the same, 3) cause your target key to move to an entry it
already searched.

Getting the target key to move entries without deleting it is hard.
The only time that happens is when the dict gets resized. It's much
easier to strip the dummies from ma_table when copying from another
table, and since it's normally going to a different size it makes sure
it always has a copy. That means the table address will always change
(violating requirement #1), but there's two catches.

First, ma_smalltable always has the same address. If we could get it
to resize while staying in ma_smalltable we'd have succeeded.
However, various things come together to make that impossible:
* PyDict_MINSIZE is 8
* dictresize() uses a loop that increases newsize (which defaults to
PyDict_MINSIZE) so long as newsize <= minused
* dictresize() is only called by PyDict_SetItem after a new key is
added, but since our target key must have been there the whole time,
it will be (at least) the 2nd key in the dict
* PyDict_SetItem multiplies the number of keys by 4 before passing it
to dictresize(), meaning our 2 becomes 8. However, that's big enough
for dictresize()'s loop to go to the next-bigger size (it's just on
the limit), so we won't get ma_smalltable after all.
It's fragile and undocumented, and it's quite possible future versions
will break it, but for now this part is thread-safe.

The second catch is not thread-safe however. dictresize() always
allocates a new table. However, if you call dictresize() *twice*, it
could get the original table back again. This allows us to meet the 3
conditions I outlined, letting the attack succeed.

So there you have it: if you're using a dict with custom classes (or
anything other than str) across multiple threads, and without locking
it, it's possible (though presumably extremely rare) for a lookup to
fail even through the key was there the entire time.

--
Adam Olsen, aka Rhamphoryncus
Jun 1 '07 #1
6 9761
So there you have it: if you're using a dict with custom classes (or
anything other than str) across multiple threads, and without locking
it, it's possible (though presumably extremely rare) for a lookup to
fail even through the key was there the entire time.
That could be fixed by adding a generation counter to the dictionary,
right? Then an adversary would have to arrange for the generation
counter to roll over for lookdict to not notice that the
dictionary was modified.

Regards,
Martin
Jun 1 '07 #2
So there you have it: if you're using a dict with custom classes (or
anything other than str) across multiple threads, and without locking
it, it's possible (though presumably extremely rare) for a lookup to
fail even through the key was there the entire time.
That could be fixed by adding a generation counter to the dictionary,
right? Then an adversary would have to arrange for the generation
counter to roll over for lookdict to not notice that the
dictionary was modified.

Regards,
Martin
Jun 1 '07 #3
"Adam Olsen" <rh****@gmail.c omwrote:
So there you have it: if you're using a dict with custom classes (or
anything other than str) across multiple threads, and without locking
it, it's possible (though presumably extremely rare) for a lookup to
fail even through the key was there the entire time.
Nice work.

It would be an interesting exercise to demonstrate this in practice, and I
think it should be possible without resorting to threads (by putting
something to simulate what the other thread would do into the __cmp__
method).

I don't understand your reasoning which says it cannot stay in
ma_smalltable: PyDict_SetItem only calls dictresize when at least 2/3 of
the slots are filled. You can have 5 items in the small (8 slot) table and
the dictionary will resize to 32 slots on adding the 6th,the next resize
comes when you add the 22nd item.
Jun 1 '07 #4
On 6/1/07, "Martin v. Löwis" <ma****@v.loewi s.dewrote:
So there you have it: if you're using a dict with custom classes (or
anything other than str) across multiple threads, and without locking
it, it's possible (though presumably extremely rare) for a lookup to
fail even through the key was there the entire time.

That could be fixed by adding a generation counter to the dictionary,
right? Then an adversary would have to arrange for the generation
counter to roll over for lookdict to not notice that the
dictionary was modified.
Yup. Although it'd still be technically possible to roll over the
counter, it's much easier to say how insanely unlikely it is.
Incidentally, only resizing should increment the counter; it's not
necessary to restart lookups for other accesses (and would harm
performance, as well as producing infinite loops in __cmp__ methods
that read from their containing dict.)

It occurs to me now that getting the original ma_table back could do
worse than just a failed lookup: if the size is smaller than before it
would lead to memory corruption and segfaults. That could be fixed
with with a before/after check of ma_mask.

And if you're *really* feeling paranoid you could add reference
counting to ma_table. I doubt anybody cares quite that much though.
;)

--
Adam Olsen, aka Rhamphoryncus
Jun 1 '07 #5
On Jun 1, 3:51 am, Duncan Booth <duncan.bo...@i nvalid.invalidw rote:
"Adam Olsen" <rha...@gmail.c omwrote:
So there you have it: if you're using a dict with custom classes (or
anything other than str) across multiple threads, and without locking
it, it's possible (though presumably extremely rare) for a lookup to
fail even through the key was there the entire time.

Nice work.

It would be an interesting exercise to demonstrate this in practice, and I
think it should be possible without resorting to threads (by putting
something to simulate what the other thread would do into the __cmp__
method).
I had attempted to do so, but I lost interest when I realized I'd have
to manipulate the memory allocator at the same time. ;)

I don't understand your reasoning which says it cannot stay in
ma_smalltable: PyDict_SetItem only calls dictresize when at least 2/3 of
the slots are filled. You can have 5 items in the small (8 slot) table and
the dictionary will resize to 32 slots on adding the 6th,the next resize
comes when you add the 22nd item.
What you're missing is that a slot can be in any of 3 states:
1) Active. A key is here. If the current slot doesn't match
lookdict() will try the next one.
2) Dummy. Used to be a key here, but now it's gone. lookdict() will
try the next one.
3) NULL. Never was a key here. lookdict() will stop.

lookdict() needs the NULL slots to stop searching. As keys are added
and removed from the table it will get filled with dummy slots, so
when the total number of active+dummy slots exceeds 2/3rds it will
trigger a resize (to the same size or even a smaller size!) so as to
clear out all the dummy slots (letting lookups finish sooner).

--
Adam Olsen, aka Rhamphoryncus

Jun 1 '07 #6
On May 31, 9:12 pm, "Adam Olsen" <rha...@gmail.c omwrote:
It seems to be a commonly held belief that basic dict operations (get,
set, del) are atomic.
They are atomic so long as the key does not have a custom __hash__,
__eq__, or __cmp__ method which can trigger arbitrary Python code.
With strings, ints, floats, or tuples of those, the get/set/del step
is atomic (i.e. executed in a single Python opcode).
>>from dis import dis
dis(compile(' del d[k]', 'example', 'exec'))
1 0 LOAD_NAME 0 (d)
3 LOAD_NAME 1 (k)
6 DELETE_SUBSCR
7 LOAD_CONST 0 (None)
10 RETURN_VALUE

The DELETE_SUBSCR step completes the whole action in a single opcode
(unless the object has a custom hash or equality test). Of course,
there is a possibility of a thread switch between the LOAD_NAME and
the DELETE_SUBSCR (in which case another thread could have added or
removed that key).
Raymond

Jun 2 '07 #7

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

14
2183
by: adeger | last post by:
Having trouble with my first forays into threads. Basically, the threads don't seem to be working in parallel (or you might say are blocking). I've boiled my problems to the following short code block and ensuing output. Seems like the output should be all interleaved and of course it's not. Running Python 2.2 from ActiveState on Windows XP (also doesn't work on Windows 2000). Thanks in advance! adeger
4
2897
by: Gilles Leblanc | last post by:
Hi I have started a small project with PyOpenGL. I am wondering what are the options for a GUI. So far I checked PyUI but it has some problems with 3d rendering outside the Windows platform. I know of WxPython but I don't know if I can create a WxPython window, use gl rendering code in it and then put widgets on top of that...
7
2706
by: Ivan | last post by:
Hi I have following problem: I'm creating two threads who are performing some tasks. When one thread finished I would like to restart her again (e.g. new job). Following example demonstrates that. Problem is that when program is started many threads are created (see output section), when only two should be running at any time. Can you please help me to identify me where the problem is? Best regards
4
5436
by: Matthew Groch | last post by:
Hi all, I've got a server that handles a relatively high number of concurrent transactions (on the magnitude of 1000's per second). Client applications establish socket connections with the server. Data is sent and received over these connections using the asynchronous model. The server is currently in beta testing. Sporadically over the course of the day, I'll observe the thread count on the process (via perfmon) start climbing....
5
2832
by: Razzie | last post by:
Hi all, A question from someone on a website got me thinking about this, and I wondered if anyone could explain this. A System.Threading.Timer object is garbage collected if it has no references to it. But what about threads? If I start a new thread that only does 1+1, is it garbage collected after that? If so, do all threads get garbage collected eventually that are 'done'? And if not, why not? Are there references to the thread that...
16
3314
by: droopytoon | last post by:
Hi, I start a new thread (previous one was "thread timing") because I have isolated my problem. It has nothing to do with calling unmanaged C++ code (I removed it in a test application). I have a thread "_itTaskThread" running. The application is listening on a TCP port. It accepts 2 connection from a client. I simulate a crash on the client side (using "Debug->StopDebugging").
9
3078
by: mareal | last post by:
I have noticed how the thread I created just stops running. I have added several exceptions to the thread System.Threading.SynchronizationLockException System.Threading.ThreadAbortException System.Threading.ThreadInterruptedException System.Threading.ThreadStateException to see if I could get more information about why the thread stops running but that code is never executed. Any ideas on how I can debug this?
13
5101
by: Bob Day | last post by:
Using vs2003, vb.net I start a thread, giving it a name before start. Code snippet: 'give each thread a unique name (for later identification) Trunk_Thread.Name = "Trunk_0_Thread" ' allow only 1 thread per line Trunk_Thread.ApartmentState = ApartmentState.STA
7
2698
by: Charles Law | last post by:
My first thought was to call WorkerThread.Suspend but the help cautions against this (for good reason) because the caller has no control over where the thread actually stops, and it might have a lock pending, for example. I want to be able to stop a thread temporarily, and then optionally resume it or stop it for good.
3
35447
by: John Nagle | last post by:
There's no way to set thread priorities within Python, is there? We have some threads that go compute-bound, and would like to reduce their priority slightly so the other operations, like accessing the database and servicing queries, aren't slowed as much. John Nagle
0
9698
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
9558
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
10527
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
10057
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
1
7595
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
6834
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5492
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
5619
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
3
2963
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.