How to use a 5 or 6 bit integer in Python?


Hello all,

My program uses many millions of integers, and Python is allocating
way too much memory for these. I can't have the performance hit by
using disk, so I figured I'd write a C extension to define a new type.
Problem is, my C knowledge is years old and regardless of my
attempts distutils will not recognise my installation of the MS
compiler.
I am thinking, is there any easier way to use a 5 or 6 bit integer
in python? Even a regular 8-bit would be fine. I only need to
represent either 32 or 64 distinct numbers.

thanks,
Glen

Jul 18 '05 #1
On Fri, 19 Dec 2003 11:42:49 +1100, Glen Wheeler <ad******@tpg.com.au> wrote:

Hello all,

My program uses many millions of integers, and Python is allocating
way too much memory for these. I can't have the performance hit by
using disk, so I figured I'd write a C extension to define a new type.
Problem is, my C knowledge is years old and regardless of my
attempts distutils will not recognise my installation of the MS
compiler.
I am thinking, is there any easier way to use a 5 or 6 bit integer
in python? Even a regular 8-bit would be fine. I only need to
represent either 32 or 64 distinct numbers.

You can store them efficiently in an array, e.g., for unsigned bytes
(or 'b' in place of 'B' for signed):
>>> import array
>>> bytes = array.array('B', range(10))
>>> bytes
array('B', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> bytes[3]
3

We can only speculate on further help, until you tell us what you are doing ;-)

Regards,
Bengt Richter
Jul 18 '05 #2
Glen Wheeler <ad******@tpg.com.au> writes:
My program uses many millions of integers, and Python is allocating
way too much memory for these. I can't have the performance hit by
using disk, so I figured I'd write a C extension to define a new type.
Problem is, my C knowledge is years old and regardless of my
attempts distutils will not recognise my installation of the MS
compiler.
I am thinking, is there any easier way to use a 5 or 6 bit integer
in python? Even a regular 8-bit would be fine. I only need to
represent either 32 or 64 distinct numbers.


Look at the docs for the array module.
Jul 18 '05 #3
Glen Wheeler wrote:
Hello all,

My program uses many millions of integers, and Python is allocating
way too much memory for these. I can't have the performance hit by
using disk, so I figured I'd write a C extension to define a new type.
Problem is, my C knowledge is years old and regardless of my
attempts distutils will not recognise my installation of the MS
compiler.
I am thinking, is there any easier way to use a 5 or 6 bit integer
in python? Even a regular 8-bit would be fine. I only need to
represent either 32 or 64 distinct numbers.


You're trying to solve the wrong problem. Python caches small integers, so
you've only got a very small number of distinct integer objects. This is a
good thing, because each Python object has an eight-byte overhead.
Introducing a new type wouldn't result in any memory savings.

The real overhead comes from the references to those objects. Each
reference is four bytes (on 32 bit computers), and there is no way to
decrease this. Your best option is to use high level objects that contain
many integers internally but don't store them as individual Python
references. The array module is one choice, Numarray is another. If
immutability isn't an issue, you could even use strings (with each character
treated as an eight bit integer).
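For example, a quick sketch of the array route (sizes and names are
illustrative, in Python 2-era style to match the thread):

import array

values = [v % 64 for v in xrange(1000000)]  # a million small ints
packed = array.array('B', values)           # one unsigned byte per value
assert packed[10] == 10
as_string = packed.tostring()               # immutable, hashable snapshot
# len(as_string) == 1000000 bytes, versus ~4MB of list references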
--
Rainer Deyke - ra*****@eldwood.com - http://eldwood.com
Jul 18 '05 #4
On Fri, 19 Dec 2003 11:42:49 +1100, Glen Wheeler wrote:
My program uses many millions of integers, and Python is allocating
way too much memory for these.


How have you measured this? As pointed out by others, Python doesn't
have duplicate occurrences of identical integers; it refers to the same
object multiple times.

Given this, I'm curious how you've measured the memory usage caused by
your program using millions of integers, and why you think reducing the
size of those integers will help significantly.

--
\ "I like to fill my bathtub up with water, then turn the shower |
`\ on and pretend I'm in a submarine that's been hit." -- Steven |
_o__) Wright |
Ben Finney <http://bignose.squidly.org/>
Jul 18 '05 #5
On Fri, 19 Dec 2003 03:17:36 GMT, "Rainer Deyke" <ra*****@eldwood.com>
wrote:
You're trying to solve the wrong problem. Python caches small integers, so
you've only got a very small number of distinct integer objects. This is a
good thing, because each Python object has an eight-byte overhead.
Introducing a new type wouldn't result in any memory savings.

Thanks. I had thought as much, but was not totally sure.
The real overhead comes from the references to those objects. Each
reference is four bytes (on 32 bit computers), and there is no way to
decrease this. Your best option is to use high level objects that contain
many integers internally but don't store them as individual Python
references. The array module is one choice, Numarray is another. If
immutability isn't an issue, you could even use strings (with each character
treated as an eight bit integer).


So the references to each integer are what cause the massive
overhead.
I've read up on Numarray and the array module in the docs, but
didn't see how I could adapt my current program to use these modules
because of how my data is currently organised.
I have one constantly changing dict with many millions of keys
(tuples of ints) which in turn associates itself with a tuple of ints
and two more dicts.
Every single container type contains only integers. Am I correct in
assuming that an array cannot be used as a key for a dictionary,
since arrays are mutable?

Thanks for your help,
Glen

Jul 18 '05 #6
On 19 Dec 2003 13:59:44 +1050, Ben Finney
<bi****************@and-benfinney-does-too.id.au> wrote:
On Fri, 19 Dec 2003 11:42:49 +1100, Glen Wheeler wrote:
My program uses many millions of integers, and Python is allocating
way too much memory for these.


How have you measured this? As pointed out by others, Python doesn't
have duplicate occurrences of identical integers; it refers to the same
object multiple times.

Given this, I'm curious how you've measured the memory usage caused by
your program using millions of integers, and why you think reducing the
size of those integers will help significantly.


I realise now that I was wrong in assuming this. However, Python
does use more memory than it needs to store the data I'm
manipulating. This is probably down to the container types,
dictionaries and tuples.
Reducing the amount of memory allocated by any method is my goal
right now, along with increasing the speed at which my program runs.
I'm using one dictionary to store all of the data, with tuples of
ints as keys and a tuple as the data associated with each key. The
data tuples contain a few ints and two more dictionaries.
Can you think of a way to reduce the memory used here?

Thanks,
Glen

Jul 18 '05 #7
On Fri, 19 Dec 2003 14:30:23 +1100, Glen Wheeler <ad******@tpg.com.au> wrote:
On Fri, 19 Dec 2003 03:17:36 GMT, "Rainer Deyke" <ra*****@eldwood.com>
wrote:
You're trying to solve the wrong problem. Python caches small integers, so
you've only got a very small number of distinct integer objects. This is a
good thing, because each Python object has an eight-byte overhead.
Introducing a new type wouldn't result in any memory savings.


Thanks. I had thought as much, but was not totally sure.
The real overhead comes from the references to those objects. Each
reference is four bytes (on 32 bit computers), and there is no way to
decrease this. Your best option is to use high level objects that contain
many integers internally but don't store them as individual Python
references. The array module is one choice, Numarray is another. If
immutability isn't an issue, you could even use strings (with each character
treated as an eight bit integer).


So the references to each integer are what cause the massive
overhead.
I've read up on Numarray and the array module in the docs, but
didn't see how I could adapt my current program to use these modules
because of how my data is currently organised.
I have one constantly changing dict with many millions of keys
(tuples of ints) which in turn associates itself with a tuple of ints
and two more dicts.
Every single container type contains only integers. Am I correct in
assuming that an array cannot be used as a key for a dictionary,
since arrays are mutable?

Yes, but you might be able to do better than tuples, depending on what
the ordering means and whether the same number can repeat in a tuple, etc.

If we can tease out a little more about your problem, maybe we can help better ;-)
E.g., how do your tuple-keys come into being and how many times are they re-used?
If there is no nesting, you could create a string key just by ''.join([chr(el) for el in tup])
which wouldn't be as compact as a compressed bit string, but would be smaller than the tuple.
If you were lucky, a change could gain you both time and space.
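For instance, a minimal sketch of that key conversion (assuming the
ints stay in 0-255, so chr() is safe):

tup = (3, 17, 60, 5)
key = ''.join([chr(el) for el in tup])    # 4-byte string, not a 4-tuple
d = {}
d[key] = 'some data'
assert tuple([ord(ch) for ch in key]) == tup   # the tuple is recoverable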

I'm curious what your dicts and their int tuples represent in the real world ;-)

Regards,
Bengt Richter
Jul 18 '05 #8
Glen Wheeler wrote:
I have one constantly changing dict with many millions of keys
(tuples of ints) which in turn associates itself with a tuple of ints
and two more dicts.


That makes things a bit harder. I expect that the big dict itself requires
at least sixteen bytes per entry. How big are your tuples? How big are the
dicts in the data tuples? What data do they contain? Are any of the tuples
or dicts shared?

Encoding the key tuples as strings is easy (''.join([chr(x) for x in
the_tuple])). Encoding the ints in the data tuples is just as easy. I'm not
sure if those are enough to solve your problem.
--
Rainer Deyke - ra*****@eldwood.com - http://eldwood.com
Jul 18 '05 #9
At 19/12/2003 14:30, you wrote:
I have one constantly changing dict with many millions of keys
(tuples of ints) which in turn associates itself with a tuple of ints
and two more dicts.


How big are the integers? If 16 bits for each are enough, you could replace
those 2-element tuples with a plain integer: (a,b) -> a<<16 | b
You don't waste space on a tuple object (and its construction time) plus
two references for each dictionary key used, but you eventually create many
more integer instances. And there is a penalty accessing the dictionary,
since you have to compute the key.
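A minimal sketch of that packing (the helper names are mine, purely
illustrative):

def pack(a, b):
    # Assumes 0 <= b < 2**16, so the two halves never overlap.
    return (a << 16) | b

def unpack(key):
    return key >> 16, key & 0xFFFF

d = {}
d[pack(3, 200)] = 'value'
assert pack(3, 200) in d
assert unpack(pack(3, 200)) == (3, 200)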
Gabriel Genellina
Softlab SRL
Jul 18 '05 #10
On Fri, 19 Dec 2003 06:27:59 GMT, "Rainer Deyke" <ra*****@eldwood.com>
wrote:
Glen Wheeler wrote:
I have one constantly changing dict with many millions of keys
(tuples of ints) which in turn associates itself with a tuple of ints
and two more dicts.
That makes things a bit harder. I expect that the big dict itself requires
at least sixteen bytes per entry. How big are your tuples?


The key tuples, for the final calculation, will range from 1 element
to 64 elements, with the majority at around 40-50 elements in size.
How big are the dicts in the data tuples? What data do they contain?
For the first dict: increasingly smaller as the calculation
continues (i.e. as the key for that entry increases). Initially they
will have 64 keys (integers), with each integer having a 6-element
tuple of integers as its value. All integers are from 0-63.
For the second dict: trivially small, with a maximum of 32
entries; it starts empty and averages around 7. Keys are kept as
integers from 0-63 and data ranges from -1 to 30, but the average case
will see the data at 7, as a single integer.
Are any of the tuples or dicts shared?

Unfortunately, no. This would be a major speedup for my program, as
copying the dictionaries and tuples takes not only a lot of memory but
a lot of time too.
But the truth is that I need these to be unique on a key-by-key
basis (context from the big dict), so I can't see any way around it.
Encoding the key tuples as strings is easy (''.join([chr(x) for x in
the_tuple])). Encoding the ints in the data tuples is just as easy. I'm not
sure if those are enough to solve your problem.


I'll be sure to do this. Will it increase the hash speed for the
big dictionary significantly? Will lookup times or memory storage
requirements decrease?

Thanks,
Glen

Jul 18 '05 #11
Glen Wheeler wrote:
I'm using one dictionary to store all of the data, with tuples of
ints as keys and a tuple as the data associated with each key. The
data tuples contain a few ints and two more dictionaries.
Can you think of a way to reduce the memory used here?


What's the length of the key tuples, and are there bounds on the
component ints? Is the number of ints in the data tuples fixed? What's
in the subdictionaries, typically? "millions of 5-bit integers" hints
at the possibility of a radical change of representation, but a radical
change of representation can only be devised if we know what's stable
- the task.

Jul 18 '05 #12
Glen Wheeler wrote:
I'll be sure to do this. Will it increase the hash speed for the
big dictionary significantly? Will lookup times or memory storage
requirements decrease?


Using strings instead of tuples of integers should decrease memory usage. A
character in a string is stored as a single byte, whereas each tuple element
needs four bytes.
--
Rainer Deyke - ra*****@eldwood.com - http://eldwood.com
Jul 18 '05 #13
Glen :

On Fri, 19 Dec 2003 14:30:23 +1100, Glen Wheeler wrote:
I've read up on Numarray and the array module in the docs, but
didn't see how I could adapt my current program to use these modules
because of how my data is currently organised.
You may need to reorganize the data in order to save space!
I have one constantly changing dict with many millions of keys
(tuples of ints) which in turn associates itself with a tuple of ints
and two more dicts.


If you state the actual task and computation you are performing then
people can suggest alternate data structures and algorithms. Simply
describing your current data structure leaves people shooting in the
dark trying to suggest alternate ways of storing the data.

Usually time and space (memory) trade off against each other --
less computation requires more storage, less storage requires more
computation. However, there are cases where a poor choice of data
structure and/or algorithm allows a restructuring that improves both
speed and memory use.

In order to construct and evaluate data structure and algorithm
suitability, a concrete understanding of what task needs to be
performed by the program and what sort of data will be handled is
essential.

Presently you, Glen, are the only one with this knowledge of the task
and problem space. Share that knowledge with the group and the group
will then be capable of better assisting you.

-D

--
A mouse is a device used to point at the xterm you want to type in.
--Kim Alm, a.s.r

www: http://dman13.dyndns.org/~dman/ jabber: dm**@dman13.dyndns.org
Jul 18 '05 #14
On Fri, 19 Dec 2003 14:21:16 +0100, Borcis <bo****@users.ch> wrote:
Glen Wheeler wrote:
I'm using one dictionary to store all of the data, with tuples of
ints as keys and a tuple as the data associated with each key. The
data tuples contain a few ints and two more dictionaries.
Can you think of a way to reduce the memory used here?
What's the length of the key tuples, and are there bounds on the
component ints?


Component ints will either be 0-63 or 0-31. In a single run of the
program, this will not change.
The key tuples range in length from 1-64, depending on what stage
the program is currently at. Every tuple will have the same length at
some point; the greatest possible difference in length between any two
tuples is 1.
Is the number of ints in the data tuples fixed?
Yes, there are always two ints, and another tuple of ints. One int
is arbitrary, and can be in the thousands but not much higher.
Another int is from 0-5. The tuple of ints has fewer than 6 elements and
will contain either 0-63 or 0-31, depending on the parameters given to
the program at run time.
What's in the subdictionaries, typically? "millions of 5-bit integers" hints
at the possibility of a radical change of representation, but a radical
change of representation can only be devised if we know what's stable
- the task.


Well, the subdictionaries' contents are described in another reply
of mine in this same thread. There are indeed many millions of
integers being stored at any one time, or at least many millions of
references to these integers.
You are saying that to save any amount of memory, something in the
large dictionary must be constant? I am sure everything in there is
dynamic in nature, i.e. it will change with each iteration over the keys
in the large dictionary.

--
Glen
Jul 18 '05 #15
On 19 Dec 2003 06:02:23 GMT, bo**@oz.net (Bengt Richter) wrote:
If we can tease out a little more about your problem, maybe we can help better ;-)
E.g., how do your tuple-keys come into being and how many times are they re-used?
If there is no nesting, you could create a string key just by ''.join([chr(el) for el in tup])
which wouldn't be as compact as a compressed bit string, but would be smaller than the tuple.
If you were lucky, a change could gain you both time and space.

I'll be implementing that this weekend, as my employer is expecting
results before Christmas. Damn deadlines.
I'm curious what your dicts and their int tuples represent in the real world ;-)


Well, I'm a researcher for an Australian university (University of
Wollongong) and my current task is to enumerate every possible 6 digit
binary Gray code.
Most recently people have done the 5 digit BGCs, something which I
have also accomplished, but never the 6 digit codes.
For a graphical representation, think of a cube in n-dimensions,
where n represents the number of digits in the code. A binary Gray
code around that cube represents a path starting from any arbitrary
point and then visiting every vertex exactly once.
The applications of such research are enormous, and I'll not go
over them here, but that is my aim. To find the number of ways a 6
dimensional fly could walk a 6d cube visiting every vertex exactly
once.
I've actually been working on this problem for many months now and
have gotten so close. The previous best estimate for calculation time
(with a program written in C, I might add) was 2.8 * 10^15 years.
I've got the thing down to about 10^4 years. If the calculation
becomes tractable on a supercomputer, e.g. Barossa hosted at
www.ac3.com.au, then I'll run it on that.

I hope that's sated your curiosity for now :). If you'd like any
more information, or if anyone else would like to know anything about
this, then I'll be happy to correspond with them. I don't know how
on-topic it is for c.l.py.

--
Glen
Jul 18 '05 #16
Hi,

Glen Wheeler wrote:
Component ints will either be 0-63 or 0-31. In a single run of the
program, this will not change.
The key tuples range in length from 1-64, depending on what stage
the program is currently at. Every tuple will have the same length at
some point; the greatest possible difference in length between any two
tuples is 1.


I'm afraid I'll mix up some of your data structure, but here are my
thoughts:

- pack the key tuples into a string using pack; you may even fit four
integers into three bytes, since the values range from 0-63 and so need
only six bits each (see the sketch below).

- if you need easy access to e.g. the last element of the tuple, keep it
separate. E.g.: key = ('xdef', 12)

- As far as I remember, you wrote something about max. 64 entries. What
about using an array/list here? An array may need a lot less memory
(depending on the number of elements, of course).
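A rough sketch of that six-bit packing (purely illustrative, Python
2-era style to match the thread; values must be 0-63, and for
variable-length tuples you would also need to record the length
somewhere):

def pack6(values):
    # Accumulate 6 bits per value into one big integer.
    bits, nbits = 0, 0
    for v in values:
        bits = (bits << 6) | v
        nbits += 6
    # Pad to a whole number of bytes.
    pad = (-nbits) % 8
    bits <<= pad
    nbits += pad
    # Emit the bytes, most significant first.
    chars = []
    for shift in range(nbits - 8, -8, -8):
        chars.append(chr((bits >> shift) & 0xFF))
    return ''.join(chars)

key = pack6((3, 17, 60, 5))   # 3 bytes instead of a 4-tuple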

Only my 2 cents.

--
Regards
Hartmut Goebel

| Hartmut Goebel | We build the crazy compilers |
| h.******@crazy-compilers.com | Compiler Manufacturer |

Jul 18 '05 #17
On Fri, Dec 19, 2003 at 06:27:59AM +0000, Rainer Deyke wrote:
Encoding the key tuples as strings is easy (''.join([chr(x) for x in
the_tuple])). Encoding the ints in the data tuples is just as easy. I'm not
sure if those are enough to solve your problem.


Using strings is a really, really good idea. Strings are compact and
Python dictionaries are highly optimized for use with string keys with
tricks like caching of the string hash and faster code paths for special
cases like dicts that have only string keys.

There is a much faster way for converting tuples of integers to strings.
Instead of using the Python tuple type use arrays of bytes and then apply
the array object's .tostring() method:
from array import array
the_array = array('b', range(40))
the_string = the_array.tostring()

It's about 35 times faster than ''.join([chr(x) for x in the_array]).

back_to_array = array('b', the_string)


Oren

Jul 18 '05 #18
On Fri, Dec 19, 2003 at 06:26:22PM +1100, Glen Wheeler wrote:
.....
tuple of integers as its value. All integers are from 0-63.
For the second dict: trivially small, with a maximum of 32
entries; it starts empty and averages around 7. Keys are kept as
integers from 0-63 and data ranges from -1 to 30 ...


It's your lucky day... All your integers are from -1 to 63, which fits
nicely in Python's range of "privileged" integers from -1 to 100.
These are handled much faster by using a static table of integer
objects, without the overhead of allocation and deallocation.
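A quick interactive check of that caching (CPython-specific behaviour;
the exact cached range is an implementation detail that varies by
version):

>>> a = 63
>>> b = 62 + 1
>>> a is b          # same cached object, not merely equal
True
>>> big = 10 ** 10
>>> big is 10 ** 10 # large values are allocated fresh each time
False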

Oren

Jul 18 '05 #19
Glen Wheeler <ad******@tpg.com.au> wrote:
Well, I'm a researcher for an Australian university (University of
Wollongong) and my current task is to enumerate every possible 6 digit
binary Gray code.
It is possible to write a function that turns every integer into its
"reflected" form so that the normal enumeration of the integers
corresponds to a gray code (successors that differ only in one bit
position) sequence. This can be done without enumerating them all.

Here's some code from a book by Kreher and Stinson (Combinatorial
Algorithms) which I translated into Python from the pseudo-code in the
book. I made some adaptations and came up with this:

import sets

def unrank(n, r):
    T, b1 = [], 0
    for i in range(n-1, -1, -1):
        b0 = r / (2**i)
        if b0 != b1:
            T.append(n-i-1)
            b1 = b0
        r -= b0 * 2**i
    return T

def rank(n, T):
    S = sets.Set(T)
    r, b = 0, 0
    for i in range(n-1, -1, -1):
        if n-i-1 in S:
            b = 1 - b
        if b == 1:
            r += 2**i
    return r

def reflect(n, r):
    S = sets.Set(unrank(n, r))
    L = ["01"[i in S] for i in range(n)]
    return int("".join(L), 2)

def test():
    n = 2**6
    T = [None]
    for i in range(100):
        U = unrank(n, i)
        print i, reflect(n, i), U
        assert rank(n, U) == i
        assert len(sets.Set(T) ^ sets.Set(U)) == 1
        T = U

if __name__ == '__main__':
    test()
I hope that's sated your curiosity for now :). If you'd like any
more information, or if anyone else would like to know anything about
this, then I'll be happy to correspond with them. I don't know how
on-topic it is for c.l.py.


If it's interesting and provides an excuse for exercising our
programming skills in our favorite language, it's on topic. It's even
on topic if it's outrageously off-topic enough, so take your pick :-)

I guess I must have missed something because if the problem's solution
can be looked up in a book it probably is not what you are looking
for. So why is it not enough to have an indexing function and why do
you need to have all values literally?

Anton

Jul 18 '05 #20
On Sat, 20 Dec 2003 19:38:36 +1100, Glen Wheeler <ad******@tpg.com.au> wrote:
On 19 Dec 2003 06:02:23 GMT, bo**@oz.net (Bengt Richter) wrote:
If we can tease out a little more about your problem, maybe we can help better ;-)
E.g., how do your tuple-keys come into being and how many times are they re-used?
If there is no nesting, you could create a string key just by ''.join([chr(el) for el in tup])
which wouldn't be as compact as a compressed bit string, but would be smaller than the tuple.
If you were lucky, a change could gain you both time and space.


I'll be implementing that this weekend, as my employer is expecting
results before Christmas. Damn deadlines.
I'm curious what your dicts and their int tuples represent in the real world ;-)


Well, I'm a researcher for an Australian university (University of
Wollongong) and my current task is to enumerate every possible 6 digit
binary Gray code.
Most recently people have done the 5 digit BGCs, something which I
have also accomplished, but never the 6 digit codes.
For a graphical representation, think of a cube in n-dimensions,
where n represents the number of digits in the code. A binary Gray
code around that cube represents a path starting from any arbitrary
point and then visiting every vertex exactly once.
The applications of such research are enormous, and I'll not go
over them here, but that is my aim. To find the number of ways a 6
dimensional fly could walk a 6d cube visiting every vertex exactly
once.
I've actually been working on this problem for many months now and
have gotten so close. The previous best estimate for calculation time
(with a program written in C, I might add) was 2.8 * 10^15 years.
I've got the thing down to about 10^4 years. If the calculation
becomes tractable on a supercomputer, e.g. Barossa hosted at
www.ac3.com.au, then I'll run it on that.

I hope that's sated your curiosity for now :). If you'd like any
more information, or if anyone else would like to know anything about
this, then I'll be happy to correspond with them. I don't know how
on-topic it is for c.l.py.

Just to see if I understand your problem statement, does the following (though it's slow)
do what you are trying to do? If so, what would be the point of generating
all those gray code sequences? IOW, how are you going to use them? Assuming my
prog generated all the 6-bit gray code sequences to a monster file, all the 64-number
sequences packed into 64-byte strings end to end, then you would either have to
read that file sequentially, or seek to some point and read 64 bytes to get a
given gray code sequence. If the latter, how about an algorithm that would
do a virtual seek into the whole data as a virtual file, and then generate
what you would read? I.e., there would be no point in storing all that data
if you could easily generate an arbitrary segment of it. Of course, maybe that's
not so easy ;-) But I'm wondering what the use is going to be, and the access pattern,
if you assumed availability in a monster file, virtual or not.

If you really do want to generate all that data, I suppose the loop and recursions
could be fanned out to parallel processing. But first, do I understand the problem?

====< gray.py >=========================================================
# gray.py -- enumerate n-bit gray codes
# 20031220 10:34:27 alpha version by Bengt Richter. No warranties ;-)
#
def enumgraycodes(n):
    """
    Enumerate all possible complete n-bit gray code sequences,
    returning a list of 2**n-length strings whose bytes each
    contain a complete gray code sequence in the ord values.
    """
    visit_mask = 2L**(2**n) - 1
    nbits = [1 << i for i in xrange(n)]
    vbits = [1L << i for i in xrange(2**n)]
    codes = []
    def visit(curr=0, visited=0L, code=''):
        """visit curr, and recursively available successors"""
        visited |= vbits[curr]
        code += chr(curr)
        if visited == visit_mask:
            codes.append(code)
            return
        for nbit in nbits:
            succ = curr ^ nbit
            if vbits[succ] & visited:
                continue
            visit(succ, visited, code)
    for start in xrange(2**n):
        visit(start)
    return codes

if __name__ == '__main__':
    import sys
    codes = enumgraycodes(int(sys.argv[1]))
    for code in codes:
        print ''.join(['%02x' % ord(c) for c in code])
=========================================================================

Example: (even 4 bits generates >3mb and takes a while, so I won't post it ;-)

[10:41] C:\pywk\clp\gray>python gray.py 2
00010302
00020301
01000203
01030200
02030100
02000103
03020001
03010002

[10:41] C:\pywk\clp\gray>python gray.py 3
0001030206070504
0001030206040507
0001030705040602
0001050406070302
0001050406020307
0001050703020604
0002030105040607
0002030105070604
0002030706040501
0002060703010504
0002060405070301
0002060405010307
0004050706020301
0004050103020607
0004050103070602
0004060705010302
0004060203010507
0004060203070501
0100020307060405
0100020307050406
0100020604050703
0100040507060203
0100040507030206
0100040602030705
0103020004050706
0103020004060705
0103020607050400
0103070602000405
0103070504060200
0103070504000206
0105040607030200
0105040002030706
0105040002060703
0105070604000203
0105070302000406
0105070302060400
0203010004050706
[...]
0602000405010307
0706040501000203
0706040501030200
0706040002030105
0706020301000405
0706020301050400
0706020004050103
0705040602030100
0705040602000103
0705040001030206
0705010004060203
0705010302000406
0705010302060400
0703020001050406
0703020604050100
0703020604000105
0703010002060405
0703010504060200
0703010504000206

Regards,
Bengt Richter
Jul 18 '05 #21
On 20 Dec 2003 18:35:31 GMT, bo**@oz.net (Bengt Richter) wrote:
Just to see if I understand your problem statement, does the following (though it's slow)
do what you are trying to do?


Not exactly, although if you ran wc -l on the result and wrote down
the number, it would. I am only interested in the final count.
If so, what would be the point of generating
all those gray code sequences? IOW, how are you going to use them? Assuming my
prog generated all the 6-bit gray code sequences to a monster file, all the 64-number
sequences packed into 64-byte strings end to end, then you would either have to
read that file sequentially, or seek to some point and read 64 bytes to get a
given gray code sequence. If the latter, how about an algorithm that would
do a virtual seek into the whole data as a virtual file, and then generate
what you would read? I.e., there would be no point in storing all that data
if you could easily generate an arbitrary segment of it.
Generating a single code of arbitrary digits is a relatively simple
task; I am interested in generating the total *number*, not every
code. Each of the codes I am generating also has (as one of the
integers in its data tuple) a `multiplicity', which is how many other
Gray codes of that length are symmetrically equivalent.
I am merely generating a very small subset which I can recombine (as
a polynomial, say) and then work out how many there will be in total.
Of course, maybe that's
not so easy ;-) But I'm wondering what the use is going to be, and the access pattern,
if you assumed availability in a monster file, virtual or not.

Incidentally, I did try the method you have below. Almost identical,
I might add! It ran for around 3 weeks, and generated just over a
million 5-digit Gray codes. For some interest, if G(n) represents the
number of n-digit Gray codes then:

G(1) = 2
G(2) = 8 (as you show below)
G(3) = 144
G(4) = 91 392
G(5) = 187 499 658 240 (I can calculate this in under 8 hours)
G(6) = ?? (but it's around 10^24)

So if you allocate each code a minimal number of bits, storing every
5 or 6 digit BGC is a very different and far less tractable task
than the one I am trying to achieve.
If you really do want to generate all that data, I suppose the loop and recursions
could be fanned out to parallel processing. But first, do I understand the problem?

====< gray.py >=========================================================


If instead of printing the successful code, or storing it, you
simply incremented an internal counter, then yes that is exactly the
problem I need to solve. For 6 digits.
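For instance, a counting variant of Bengt's enumgraycodes above (same
recursion, just an internal counter in place of the list; a sketch
only, and of course still hopelessly slow for 6 digits):

def countgraycodes(n):
    """Count complete n-bit gray code sequences instead of storing them."""
    visit_mask = 2L**(2**n) - 1
    nbits = [1 << i for i in xrange(n)]
    vbits = [1L << i for i in xrange(2**n)]
    count = [0]                   # mutable cell; no nonlocal in Python 2
    def visit(curr, visited):
        visited |= vbits[curr]
        if visited == visit_mask:
            count[0] += 1
            return
        for nbit in nbits:
            succ = curr ^ nbit
            if not (vbits[succ] & visited):
                visit(succ, visited)
    for start in xrange(2**n):
        visit(start, 0L)
    return count[0]

print countgraycodes(2)   # 8, matching G(2)
print countgraycodes(3)   # 144, matching G(3)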
My representations and algorithm come from a lot of theory that I
and other associates have proven; I can post it here or to
your e-mail if you would like. I haven't included any of these
tuples-as-strings speedups yet, as I'm working on integrating some new
theory into the program, but comments would always be appreciated.
It would be ironic if you did take me up on this, as I recently
deleted all the comments from my code because I felt they were
cluttering the code itself :).

--
Glen
Jul 18 '05 #22
