By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
434,786 Members | 1,131 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 434,786 IT Pros & Developers. It's quick & easy.

sparse sets of integers?

P: n/a

Does anyone have a python implementation (or python C/C+ extension) that
would allow me to perform set operations (union, intersection, difference,
number of elements, &c) over very large collections of integers?

Some of the sets may have over 10**11 (probably less than 10**13
though) integers in them, but there will tend to be runs of integers being
included or not included, so there might be 10**5 consecutive integers
included, then 10**4 that are not included, and then another 10**6 that
are.

Is there such code already available?

I wrote something like this in C ages ago, but:

1) I no longer have the code
2) Who wants to work in C if you can do it fast enough in python anyway? :)

Thanks!

Jul 18 '05 #1
Share this Question
Share on Google+
6 Replies


P: n/a
You have sets in Python 2.3x and 2.4x. I don't know if they can handle
your amounts of data, but i guess so.

Jul 18 '05 #2

P: n/a
[runes]
You have sets in Python 2.3x and 2.4x. I don't know if they can handle
your amounts of data, but i guess so.


The original poster (OP) spoke about 10**13 numbers, if I remember
correctly (I did not keep the original message). It all depends on the
real sparsity of the spare sets! :-) If not sparse enough, this might be
a bit inordinate for standard Python sets on most machines, also given
that each Python integer uses a lot more than 4 bytes...

On the other hand, the OP also wrote that the numbers he handles are
grouped in big and compact intervals, separated by big unpopulated
holes. So maybe his data could be represented in Python as a
user-written type hiding a sorted list of intervals, for which it should
be relatively easy to write set functions. That might make the problem
tractable enough, with reasonably speedy processing.

As last resort, the wanted sets could be represented as sorted files of
numbers. Slower, but set operations would also be easy to write.

--
François Pinard http://pinard.progiciels-bpi.ca
Jul 18 '05 #3

P: n/a

"Dan Stromberg" <st******@dcs.nac.uci.edu> wrote in message
news:pa****************************@dcs.nac.uci.ed u...

Does anyone have a python implementation (or python C/C+ extension) that
would allow me to perform set operations (union, intersection,
difference,
number of elements, &c) over very large collections of integers?

Some of the sets may have over 10**11 (probably less than 10**13
though) integers in them, but there will tend to be runs of integers
being
included or not included, so there might be 10**5 consecutive integers
included, then 10**4 that are not included, and then another 10**6 that
are.


I would call this chunky rather than sparse. Or a set (ordered list) of
integer ranges, which I am sure is how you implemented it before. I did
not find anything Googling 'Python integer range set' but would look harder
before coding.

Terry J. Reedy

Jul 18 '05 #4

P: n/a
[Dan Stromberg]
Does anyone have a python implementation (or python C/C+ extension) that
would allow me to perform set operations (union, intersection, difference,
number of elements, &c) over very large collections of integers?

Some of the sets may have over 10**11 (probably less than 10**13
though) integers in them, but there will tend to be runs of integers being
included or not included, so there might be 10**5 consecutive integers
included, then 10**4 that are not included, and then another 10**6 that
are.


This may help:

http://aspn.activestate.com/ASPN/Coo.../Recipe/230113
Raymond Hettinger
Jul 18 '05 #5

P: n/a

"Dan Stromberg" <st******@dcs.nac.uci.edu> wrote in message
news:pa****************************@dcs.nac.uci.ed u...

Does anyone have a python implementation (or python C/C+ extension) that
would allow me to perform set operations (union, intersection,
difference,
number of elements, &c) over very large collections of integers?

Some of the sets may have over 10**11 (probably less than 10**13
though) integers in them, but there will tend to be runs of integers
being
included or not included, so there might be 10**5 consecutive integers
included, then 10**4 that are not included, and then another 10**6 that
are.


I would call this chunky rather than sparse. Or a set (ordered list) of
integer ranges, which I am sure is how you implemented it before. I did
not find anything Googling 'Python integer range set' but would look harder
before coding.

Terry J. Reedy

Jul 18 '05 #6

P: n/a

"Dan Stromberg" <st******@dcs.nac.uci.edu> wrote in message
news:pa****************************@dcs.nac.uci.ed u...

Does anyone have a python implementation (or python C/C+ extension) that
would allow me to perform set operations (union, intersection,
difference,
number of elements, &c) over very large collections of integers?

Some of the sets may have over 10**11 (probably less than 10**13
though) integers in them, but there will tend to be runs of integers
being
included or not included, so there might be 10**5 consecutive integers
included, then 10**4 that are not included, and then another 10**6 that
are.


I would call this chunky rather than sparse. Or a set (ordered list) of
integer ranges, which I am sure is how you implemented it before. I did
not find anything Googling 'Python integer range set' but would look harder
before coding.

Terry J. Reedy

Jul 18 '05 #7

This discussion thread is closed

Replies have been disabled for this discussion.