searching a value of a dict (each value is a list)

Seongsu Lee

Hi,

I have a dictionary with million keys. Each value in the
dictionary has a list with up to thousand integers.
Follow is a simple example with 5 keys.

dict = {1: [1, 2, 3, 4, 5],
2: [10, 11, 12],
900000: [100, 101, 102, 103, 104, 105],
900001: [20, 21, 22],
999999: [15, 16, 17, 18, 19]}

I want to find out the key value which has a specific
integer in the list of its value. For example, if I search
104 in the list, 900000 must be returned.

How can I do this with Python? Ideas?

Dec 9 '07 #1

Subscribe Post Reply

2378

Seongsu Lee

On 12¿ù10ÀÏ, ¿ÀÀü1½Ã23ºÐ, Seongsu Lee <se...@senux.comwrote:

Hi,

I have a dictionary with million keys. Each value in the
dictionary has a list with up to thousand integers.
Follow is a simple example with 5 keys.

dict = {1: [1, 2, 3, 4, 5],
2: [10, 11, 12],
900000: [100, 101, 102, 103, 104, 105],
900001: [20, 21, 22],
999999: [15, 16, 17, 18, 19]}

I want to find out the key value which has a specific
integer in the list of its value. For example, if I search
104 in the list, 900000 must be returned.

How can I do this with Python? Ideas?

Hi,

I just let the dict work in bidirectional fashion so that
I can find out what I want by both key and value. A mark
or prefix was needed to distinguish between keys originated
from keys and keys originated from values. (value * (-1))

from pprint import pprint
dict = {1: [1, 2, 3, 4, 5],
2: [10, 11, 12],
900000: [100, 101, 102, 103, 104, 105],
900001: [20, 21, 22],
999999: [15, 16, 17, 18, 19]}
for k, v in dict.items():
for x in v:
dict[x * -1] = k
pprint(dict)

{-105: 900000,
-104: 900000,
-103: 900000,
-102: 900000,
-101: 900000,
-100: 900000,
-22: 900001,
-21: 900001,
-20: 900001,
-19: 999999,
-18: 999999,
-17: 999999,
-16: 999999,
-15: 999999,
-12: 2,
-11: 2,
-10: 2,
-5: 1,
-4: 1,
-3: 1,
-2: 1,
-1: 1,
1: [1, 2, 3, 4, 5],
2: [10, 11, 12],
900000: [100, 101, 102, 103, 104, 105],
900001: [20, 21, 22],
999999: [15, 16, 17, 18, 19]}

What do you think of this? Ideas with less space complexity?

Dec 9 '07 #2

Pablo Ziliani

Seongsu Lee escribió:

Hi,

I have a dictionary with million keys. Each value in the
dictionary has a list with up to thousand integers.
(...)

I want to find out the key value which has a specific
integer in the list of its value.

Sorry if this is unhelpful, but have you considered moving your data
model a proper database?
I ask because unless someone knows of a specific module, I think we are
in DB's authentic realm. Is the fastest solution, probably not just for
this particular operation you are trying to do.

Regards,
Pablo

Dec 9 '07 #3

Seongsu Lee

On 12ì›”10ì¼, ì˜¤ì*„1ì‹œ53ë¶„, Pablo Ziliani <pa...@decode.com.arwrote:

Seongsu Lee escribiÃ³:

Hi,

I have a dictionary with million keys. Each value in the
dictionary has a list with up to thousand integers.
(...)

I want to find out the key value which has a specific
integer in the list of its value.

Sorry if this is unhelpful, but have you considered moving your data
model a proper database?
I ask because unless someone knows of a specific module, I think we are
in DB's authentic realm. Is the fastest solution, probably not just for
this particular operation you are trying to do.

Regards,
Pablo

Hi Pablo,

Thank you for your posting! I wanted to solve the problem within
a given environment, python, and I think it is solved by
a dictionary with bidirectional key. I have posted it and want to
know if other knows more elegant way to do it.

Dec 9 '07 #4

bearophileHUGS

Seongsu Lee:

What do you think of this? Ideas with less space complexity?

You can put the second group of keys in a second dictionary, so you
don't have to mangle them, and it may be a bit faster.

Regarding the space complexity, I don't know how you can reduce it
with Python. Probably you can create a list of long, sort it and use
bisect on it to find the keys. Such longs can be a combination of
shifted integer OR the other integer that is the key of the original
dict. But I don't how much you can gain like this.

Another solution to reduce space is to use a tiny external module
written in C, Pyrex or D. Here follows some simple D code you can
modify a bit to make it work with Pyd (http://pyd.dsource.org/):
import std.traits: ReturnType;
import std.stdio: writefln;

struct TyInt_int {
int el, n;
int opCmp(TyInt_int other) {
if (el == other.el)
return 0;
return (el < other.el) ? -1 : 1;
}
}

int bisect(TyElem, TyData, TyFun)(TyElem[] a, TyData x, TyFun key) {
int lo = 0;
int hi = a.length;
while (lo < hi) {
int mid = (lo + hi) / 2;
if (x < key(a[mid]))
hi = mid;
else
lo = mid + 1;
}
return lo;
}

void main() {
int[][int] int_arr;
int_arr[1] = [1, 2, 3, 4, 5];
int_arr[1] = [10, 11, 12],
int_arr[900000] = [100, 101, 102, 103, 104, 105],
int_arr[900001] = [20, 21, 22],
int_arr[999999] = [15, 16, 17, 18, 19];

int tot_len = 0;
foreach(arr; int_arr)
tot_len += arr.length;

auto data_arr = new TyInt_int[](tot_len);
int i = 0;
foreach(n, arr; int_arr)
foreach(el; arr)
data_arr[i++] = TyInt_int(el, n);

data_arr.sort;
writefln(bisect(data_arr, 100, (TyInt_int ii){return ii.el;}));
}

Bye,
bearophile

Dec 9 '07 #5

Jeremy C B Nicoll

Seongsu Lee <se***@senux.comwrote:

Hi,

I have a dictionary with million keys. Each value in the
dictionary has a list with up to thousand integers.
Follow is a simple example with 5 keys.

dict = {1: [1, 2, 3, 4, 5],
2: [10, 11, 12],
900000: [100, 101, 102, 103, 104, 105],
900001: [20, 21, 22],
999999: [15, 16, 17, 18, 19]}

I want to find out the key value which has a specific
integer in the list of its value. For example, if I search
104 in the list, 900000 must be returned.

Are the integers in the lists unique? I mean, could 104 occur in more than
one list? If it did, would it matter which key was returned?

How can I do this with Python? Ideas?

When I see something like this my natural response is to think that the data
structure is inappropriate for the use it's being put to.

The code someone else posted to reverse the keys is all very well, but
surely hugely wasteful on cpu, maybe storage, and elapsed time.

Even if the dict in this form is needed for some other reason, couldn't the
code that created it also create a reverse index at the same time?

--
Jeremy C B Nicoll - my opinions are my own.

Dec 9 '07 #6

bearophileHUGS

Jeremy C B Nicoll:

The code someone else posted to reverse the keys is all very well, but
surely hugely wasteful on cpu, maybe storage, and elapsed time.

If you are talking about my D code then I know it, the creation of the
first dict has to be skipped, if possible... The code I have posted
must be adapted.

Bye,
bearophile

Dec 9 '07 #7

Jeremy C B Nicoll

be************@lycos.com wrote:

Jeremy C B Nicoll:
The code someone else posted ...

If you are talking about my D code then I know it...

No I meant the code that used python to iterate over the dict and create
zillions of extra keys. I've deleted earlier posts in the thread and wasn't
sure who suggested that.

--
Jeremy C B Nicoll - my opinions are my own.

Dec 10 '07 #8

Zepo Len

I have a dictionary with million keys. Each value in the

dictionary has a list with up to thousand integers.
Follow is a simple example with 5 keys.

dict = {1: [1, 2, 3, 4, 5],
2: [10, 11, 12],
900000: [100, 101, 102, 103, 104, 105],
900001: [20, 21, 22],
999999: [15, 16, 17, 18, 19]}

I want to find out the key value which has a specific
integer in the list of its value. For example, if I search
104 in the list, 900000 must be returned.

How can I do this with Python? Ideas?

def find_key(dict, num):
for k in dict:
if num in dict[k]:
return k

Dec 10 '07 #9

Seongsu Lee

On 12¿ù10ÀÏ, ¿ÀÀü6½Ã49ºÐ, Jeremy C B Nicoll <jer...@omba.demon.co.ukwrote:

Seongsu Lee <se...@senux.comwrote:
Hi,

I have a dictionary with million keys. Each value in the
dictionary has a list with up to thousand integers.
Follow is a simple example with 5 keys.

dict = {1: [1, 2, 3, 4, 5],
2: [10, 11, 12],
900000: [100, 101, 102, 103, 104, 105],
900001: [20, 21, 22],
999999: [15, 16, 17, 18, 19]}

I want to find out the key value which has a specific
integer in the list of its value. For example, if I search
104 in the list, 900000 must be returned.

Are the integers in the lists unique? I mean, could 104 occur in more than
one list? If it did, would it matter which key was returned?

Yes, the intergers in the lists are unique.

How can I do this with Python? Ideas?

When I see something like this my natural response is to think that the data
structure is inappropriate for the use it's being put to.

The code someone else posted to reverse the keys is all very well, but
surely hugely wasteful on cpu, maybe storage, and elapsed time.

Even if the dict in this form is needed for some other reason, couldn't the
code that created it also create a reverse index at the same time?

The reason I use the dict for my data is to speed up the search by
key.

The code could create also a reverse index (a reverse dict) at the
time. But as you said, it waste the space and I wanted to ask someone
who may know some way to reduce the waste of space while searching
fast.

Dec 10 '07 #10

Jeremy C B Nicoll

Seongsu Lee <se***@senux.comwrote:

The reason I use the dict for my data is to speed up the search by key.

Ok, I understand that once the overhead of creating the dict has been done,
getting access to values within it is quick. And taking the time to create
a set of reverse keys speeds up the reverse access.

Rather than scanning the whole dict and creating reverse keys for everything
in it first, there might be an advantage in making search logic test if
there is a reverse key for the negative integer to be searched for, and if
so use it, otherwise scan the dict creating and examining reverse keys until
the integer is found. That way if the integer you're looking for is early
in the dict you'd only create reverse keys for the integers from the start
of the dict until the required one.
We don't know what external(?) process created the data that you've stored
in the dict, nor what you use it for - or more to the point - how often. If
you're going to make just one search of that data then there's little point
in having a fast search after a slow dict creation. On the other hand if
you have many many searches to do the initial overhead might be acceptable.
(I don't know how slow creating the dict would be for a typical example of a
million keys each keying lists of 1-1000 integers.)

The code could create also a reverse index (a reverse dict) at the
time. But as you said, it waste the space and I wanted to ask someone
who may know some way to reduce the waste of space while searching
fast.

Is the dict used by anything else? If the data in it was held in some other
form would that cause your program (or other programs) lots of problems? If
the range of values of the integers being stored is suitable, you might
sensibly use several or many smaller dicts to store all the data (and thus
save time reverse-keying much less of it).

--
Jeremy C B Nicoll - my opinions are my own.

Dec 10 '07 #11

Adonis Vargas

Seongsu Lee wrote:

Hi,

I have a dictionary with million keys. Each value in the
dictionary has a list with up to thousand integers.
Follow is a simple example with 5 keys.

dict = {1: [1, 2, 3, 4, 5],
2: [10, 11, 12],
900000: [100, 101, 102, 103, 104, 105],
900001: [20, 21, 22],
999999: [15, 16, 17, 18, 19]}

I want to find out the key value which has a specific
integer in the list of its value. For example, if I search
104 in the list, 900000 must be returned.

How can I do this with Python? Ideas?

You can try this:

items = {1: [1, 2, 3, 4, 5],
2: [10, 11, 12],
900000: [100, 101, 102, 103, 104, 105],
900001: [20, 21, 22],
999999: [15, 16, 17, 18, 19]}

def findItem(item, dictionary):
for key, value in dictionary.iteritems():
if item in value:
print key, value

findItem(104, items)

This will allow you to work with the existing dataset without needing to
duplicate it. It will print all occurrances.

Also, you should never use reserved words like 'dict' this creates
confusion and can cause Python to misbehave since you are rebinding the
name.

Hope this helps.

Adonis Vargas

Dec 10 '07 #12

bearophileHUGS

Adonis Vargas:

Also, you should never use reserved words like 'dict' this creates
confusion and can cause Python to misbehave since you are rebinding the
name.
Adonis Vargas

After hearing this suggestion for the 300th time, I think it may be
the moment to fix this problem in Python3, and make the Python
compiler issue a syntax error if someone tries to reassign such kind
of words, like dict, set, etc.

Bye,
bearophile

Dec 10 '07 #13

Seongsu Lee

On 12¿ù10ÀÏ, ¿ÀÈÄ12½Ã18ºÐ, Adonis Vargas <adon...@REMOVETHISearthlink.net>
wrote:

Seongsu Lee wrote:
Hi,

I have a dictionary with million keys. Each value in the
dictionary has a list with up to thousand integers.
Follow is a simple example with 5 keys.

dict = {1: [1, 2, 3, 4, 5],
2: [10, 11, 12],
900000: [100, 101, 102, 103, 104, 105],
900001: [20, 21, 22],
999999: [15, 16, 17, 18, 19]}

I want to find out the key value which has a specific
integer in the list of its value. For example, if I search
104 in the list, 900000 must be returned.

How can I do this with Python? Ideas?

You can try this:

items = {1: [1, 2, 3, 4, 5],
2: [10, 11, 12],
900000: [100, 101, 102, 103, 104, 105],
900001: [20, 21, 22],
999999: [15, 16, 17, 18, 19]}

def findItem(item, dictionary):
for key, value in dictionary.iteritems():
if item in value:
print key, value

findItem(104, items)

This will allow you to work with the existing dataset without needing to
duplicate it. It will print all occurrances.

Hi,

Yes, it works. But I think it works in O(n * m), doesn't it?
(n is # of keys in the dictionary and m is # of items in the list.)
So, we need to create a reverse index. (a reverse dictionary) or
need something better at least, I think.

Also, you should never use reserved words like 'dict' this creates
confusion and can cause Python to misbehave since you are rebinding the
name.

Yep. :)

Hope this helps.

Adonis Vargas- µû¿Â ÅØ½ºÆ® ¼û±â±â -

- µû¿Â ÅØ½ºÆ® º¸±â -

Dec 10 '07 #14

MonkeeSage

On Dec 10, 3:50 am, Seongsu Lee <se...@senux.comwrote:

On 12¿ù10ÀÏ, ¿ÀÈÄ12½Ã18ºÐ, Adonis Vargas <adon...@REMOVETHISearthlink.net>
wrote:

Seongsu Lee wrote:
Hi,

I have a dictionary with million keys. Each value in the
dictionary has a list with up to thousand integers.
Follow is a simple example with 5 keys.

dict = {1: [1, 2, 3, 4, 5],
2: [10, 11, 12],
900000: [100, 101, 102, 103, 104, 105],
900001: [20, 21, 22],
999999: [15, 16, 17, 18, 19]}

I want to find out the key value which has a specific
integer in the list of its value. For example, if I search
104 in the list, 900000 must be returned.

How can I do this with Python? Ideas?

You can try this:

items = {1: [1, 2, 3, 4, 5],
2: [10, 11, 12],
900000: [100, 101, 102, 103, 104, 105],
900001: [20, 21, 22],
999999: [15, 16, 17, 18, 19]}

def findItem(item, dictionary):
for key, value in dictionary.iteritems():
if item in value:
print key, value

findItem(104, items)

This will allow you to work with the existing dataset without needing to
duplicate it. It will print all occurrances.

Hi,

Yes, it works. But I think it works in O(n * m), doesn't it?
(n is # of keys in the dictionary and m is # of items in the list.)
So, we need to create a reverse index. (a reverse dictionary) or
need something better at least, I think.

Also, you should never use reserved words like 'dict' this creates
confusion and can cause Python to misbehave since you are rebinding the
name.

Yep. :)

Hope this helps.

Adonis Vargas- µû¿Â ÅØ½ºÆ® ¼û±â±â -

- µû¿Â ÅØ½ºÆ® º¸±â -

If I'm not mistaken, building a reverse dictionary like that will be
O(n*m) because dict/list access is O(n) (ammortized). Somebody correct
me if I'm wrong. In that case, it really depends on how you will use
the dict to see whether you get any benefit from building the reversed
dict. If you want to do several lookups, then the initial overhead
(speed/memory) of building the reversed dict might be worth it so that
you can just run lookups at O(n). But if you only need it once, it is
a waste of time and space to create a reverse dict when your access
time is the same for the lookup as for building the reversed dict.

If you do need more than one lookup, it would also be a good
optimization strategy to build the reverse dict in parallel, as you
execute the first search; that way you can combine the time spent on
building the reverse dict and the lookup, to get a total of O(n*m)
rather than O(n^2*m). The first search is "free" since you need the
reverse dict anyway.

Regards,
Jordan

Dec 10 '07 #15

Neil Cerutti

On 2007-12-10, MonkeeSage <Mo********@gmail.comwrote:

If I'm not mistaken, building a reverse dictionary like that will be
O(n*m) because dict/list access is O(n) (ammortized). Somebody correct
me if I'm wrong. In that case, it really depends on how you will use
the dict to see whether you get any benefit from building the reversed
dict. If you want to do several lookups, then the initial overhead
(speed/memory) of building the reversed dict might be worth it so that
you can just run lookups at O(n).

It also depends on if the dictionary shall be mutated between
reverse lookups.

But if you only need it once, it is a waste of time and space
to create a reverse dict when your access time is the same for
the lookup as for building the reversed dict.

If you do need more than one lookup, it would also be a good
optimization strategy to build the reverse dict in parallel, as
you execute the first search; that way you can combine the time
spent on building the reverse dict and the lookup, to get a
total of O(n*m) rather than O(n^2*m). The first search is
"free" since you need the reverse dict anyway.

It wouldn't be merely an optimization if reverse lookups and
mutations were interleaved.

--
Neil Cerutti
You only get a once-in-a-lifetime opportunity so many times. --Ike Taylor

Dec 10 '07 #16

MonkeeSage

On Dec 10, 8:31 am, Neil Cerutti <horp...@yahoo.comwrote:

On 2007-12-10, MonkeeSage <MonkeeS...@gmail.comwrote:

If I'm not mistaken, building a reverse dictionary like that will be
O(n*m) because dict/list access is O(n) (ammortized). Somebody correct
me if I'm wrong. In that case, it really depends on how you will use
the dict to see whether you get any benefit from building the reversed
dict. If you want to do several lookups, then the initial overhead
(speed/memory) of building the reversed dict might be worth it so that
you can just run lookups at O(n).

It also depends on if the dictionary shall be mutated between
reverse lookups.

But if you only need it once, it is a waste of time and space
to create a reverse dict when your access time is the same for
the lookup as for building the reversed dict.

If you do need more than one lookup, it would also be a good
optimization strategy to build the reverse dict in parallel, as
you execute the first search; that way you can combine the time
spent on building the reverse dict and the lookup, to get a
total of O(n*m) rather than O(n^2*m). The first search is
"free" since you need the reverse dict anyway.

It wouldn't be merely an optimization if reverse lookups and
mutations were interleaved.

--
Neil Cerutti
You only get a once-in-a-lifetime opportunity so many times. --Ike Taylor

Well true, but you enter a whole other level of complexity in that
case...something like Theta(log(n*(m-n))). I might have calculated
that incorrectly, but that just goes to show how complex a lookup
is(!) in such a case.

Regards,
Jordan

Dec 10 '07 #17

MonkeeSage

On Dec 10, 8:31 am, Neil Cerutti <horp...@yahoo.comwrote:

On 2007-12-10, MonkeeSage <MonkeeS...@gmail.comwrote:

If I'm not mistaken, building a reverse dictionary like that will be
O(n*m) because dict/list access is O(n) (ammortized). Somebody correct
me if I'm wrong. In that case, it really depends on how you will use
the dict to see whether you get any benefit from building the reversed
dict. If you want to do several lookups, then the initial overhead
(speed/memory) of building the reversed dict might be worth it so that
you can just run lookups at O(n).

It also depends on if the dictionary shall be mutated between
reverse lookups.

But if you only need it once, it is a waste of time and space
to create a reverse dict when your access time is the same for
the lookup as for building the reversed dict.

If you do need more than one lookup, it would also be a good
optimization strategy to build the reverse dict in parallel, as
you execute the first search; that way you can combine the time
spent on building the reverse dict and the lookup, to get a
total of O(n*m) rather than O(n^2*m). The first search is
"free" since you need the reverse dict anyway.

It wouldn't be merely an optimization if reverse lookups and
mutations were interleaved.

--
Neil Cerutti
You only get a once-in-a-lifetime opportunity so many times. --Ike Taylor

Dec 10 '07 #18

MonkeeSage

On Dec 10, 9:45 am, MonkeeSage <MonkeeS...@gmail.comwrote:

On Dec 10, 8:31 am, Neil Cerutti <horp...@yahoo.comwrote:

On 2007-12-10, MonkeeSage <MonkeeS...@gmail.comwrote:

If I'm not mistaken, building a reverse dictionary like that will be
O(n*m) because dict/list access is O(n) (ammortized). Somebody correct
me if I'm wrong. In that case, it really depends on how you will use
the dict to see whether you get any benefit from building the reversed
dict. If you want to do several lookups, then the initial overhead
(speed/memory) of building the reversed dict might be worth it so that
you can just run lookups at O(n).

It also depends on if the dictionary shall be mutated between
reverse lookups.

But if you only need it once, it is a waste of time and space
to create a reverse dict when your access time is the same for
the lookup as for building the reversed dict.

If you do need more than one lookup, it would also be a good
optimization strategy to build the reverse dict in parallel, as
you execute the first search; that way you can combine the time
spent on building the reverse dict and the lookup, to get a
total of O(n*m) rather than O(n^2*m). The first search is
"free" since you need the reverse dict anyway.

It wouldn't be merely an optimization if reverse lookups and
mutations were interleaved.

--
Neil Cerutti
You only get a once-in-a-lifetime opportunity so many times. --Ike Taylor

Well true, but you enter a whole other level of complexity in that
case...something like Theta(log(n*(m-n))). I might have calculated
that incorrectly, but that just goes to show how complex a lookup
is(!) in such a case.

Regards,
Jordan

Sorry for the double-post...google is being beligerant right now.

Dec 10 '07 #19

bearophileHUGS

Seongsu Lee:

>I have a dictionary with million keys. Each value in the dictionary has a list with up to thousand integers.<

Let's say each integer can be represented with 32 bits (if there are
less numbers then a 3-byte representation may suffice, but this makes
things more complex), that is 2^2 bytes. Let's say there are 2^20 keys
each one associated to 2^10 values. So to represent the values you
need 2^32 bytes. It means 4 GB, so I don't think Python suffices to
store them in RAM, because a Python int object requires quite more
than 4 bytes (only represented inside an array.array it may need just
4 bytes).

So if you can use 128 MB RAM to store such data structure you need to
store data on HD too. You probably can use a lower-level language. On
disk you can keep the reverse index, represented as an array of
records/structs, each of such structures keep two 32-bit numbers (so
such array is 8 GB). Such index is sorted according to the first
element of the struct. The first number is the value of the original
dictionary and the second nuber is its key. Inside the RAM you can
keep another sorted array that "summarizes" your whole data. When you
need a number you can do a binary search on the array in RAM, such
array gives you the position where you can read (with a seek) a little
part of the file (512 bytes may suffice), to perform a little binary
search (if the block is very little a linear scan suffices) on it too
to find the number you need. Note that the summarizing data structure
in RAM may be represented with just a Python dict too, so in the end
you can use Python to solve this problem. You may need a lower-level
language to create the 8 GB file on disk (or create it with Python,
but it may take lot of time. You may sort it with the sort unix
command).

This isn't a complete solution, but I think it may work.

Bye,
bearophile

Dec 10 '07 #20

MonkeeSage

On Dec 10, 1:28 pm, bearophileH...@lycos.com wrote:

Seongsu Lee:

I have a dictionary with million keys. Each value in the dictionary has a list with up to thousand integers.<

Let's say each integer can be represented with 32 bits (if there are
less numbers then a 3-byte representation may suffice, but this makes
things more complex), that is 2^2 bytes. Let's say there are 2^20 keys
each one associated to 2^10 values. So to represent the values you
need 2^32 bytes. It means 4 GB, so I don't think Python suffices to
store them in RAM, because a Python int object requires quite more
than 4 bytes (only represented inside an array.array it may need just
4 bytes).

So if you can use 128 MB RAM to store such data structure you need to
store data on HD too. You probably can use a lower-level language. On
disk you can keep the reverse index, represented as an array of
records/structs, each of such structures keep two 32-bit numbers (so
such array is 8 GB). Such index is sorted according to the first
element of the struct. The first number is the value of the original
dictionary and the second nuber is its key. Inside the RAM you can
keep another sorted array that "summarizes" your whole data. When you
need a number you can do a binary search on the array in RAM, such
array gives you the position where you can read (with a seek) a little
part of the file (512 bytes may suffice), to perform a little binary
search (if the block is very little a linear scan suffices) on it too
to find the number you need. Note that the summarizing data structure
in RAM may be represented with just a Python dict too, so in the end
you can use Python to solve this problem. You may need a lower-level
language to create the 8 GB file on disk (or create it with Python,
but it may take lot of time. You may sort it with the sort unix
command).

This isn't a complete solution, but I think it may work.

Bye,
bearophile

Nice. :)

Regards,
Jordan

Dec 14 '07 #21

searching a value of a dict (each value is a list)

Similar topics