459,290 Members | 1,388 Online Need help? Post your question and get tips & solutions from a community of 459,290 IT Pros & Developers. It's quick & easy.

 P: n/a Let's consider a test source code given at the very end of this posting. The question is if Python allows somehow access to the bytes of the representation of a long integer or integer in computers memory? Or does Python hide such implementation details that deep, that there is no way to get down to them? The test code below shows, that extracting bits from an integer value n is faster when using n&0x01 than when using n%2 and I suppose it is because %2 tries to handle the entire integer, where &0x01 processes only the last two bytes of it (I come to this because the speed difference between &0x01 and %2 operations depends on how large the value n is). If it were possible to 'tell' the %2 operation to operate only on one short of the integer number representation there will be probably no difference in speed. Is there a way to do this efficiently in Python like it is possible in C when using pointers and recasting? As I am on Python 2.4.2 and Microsoft Windows, I am interested in details related to this Python version (to limit the scope of the question). Claudio Here the code: import time # i = 2**25964951 - 1 i = 123456789**123 lstBitsModulo = [] start = time.clock() while i: i=i>>1 lstBitsModulo.append(i%2) speedModulo = time.clock()-start # i = 2**25964951 - 1 i = 123456789**123 lstBitsBitwiseAnd = [] start = time.clock() while i: i=i>>1 lstBitsBitwiseAnd.append(i&0x01) speedBitwiseAnd = time.clock()-start print if lstBitsBitwiseAnd == lstBitsModulo : print 'BitwiseAnd = %f '%(speedBitwiseAnd,) print 'Modulo = %f '%(speedModulo,) print # both lists are lists of long integer values print lstBitsBitwiseAnd[0:3] print lstBitsModulo[0:3] Jan 6 '06 #1
15 Replies

 P: n/a > The test code below shows, that extracting bits from an integer value n is faster when using n&0x01 than when using n%2 and I suppose it is because %2 tries to handle the entire integer, where &0x01 processes only the last two bytes of it (I come to this because the speed difference between &0x01 and %2 operations depends on how large the value n is) I doubt that the reason is in & handling less data. The reason is that % is effectively a division, whereas & is a logical operation. Which have always been _way_ faster than divisions. Of course you are right that the bitfiddling _could_ be optimized when one knows that only certain bytes of a number would be of interest. But I seriously doubt that python does any optimization here. However, I don't know for sure. Regards, Diez Jan 6 '06 #2

 P: n/a I don't know of a way to directly access the internal structure of a long, but you can speed up your example. First, is the order of the commands i=i>>1 lstBitsBitwiseAnd.append(i&0x01) what you intend? The first low order bit is discarded because you've done the shift first. And an extra 0 is appended. If you are trying to get the binary representation of a long, try i.__hex__(). This will create a string with the hex representation. Conversion from hex to binary is left as an exercise for the reader. :-) Jan 6 '06 #3

 P: n/a Claudio Grondi wrote: The question is if Python allows somehow access to the bytes of the representation of a long integer or integer in computers memory? Not sure what you mean by "Python", and "allows" here. I allow you :-) To do so, write a C file that is a Python module, and accesses the internal representation of the long. Works super-fast. You can get somewhat faster in Python than your code if you avoid producing new long objects all the time, and do the task in chunks of 30 bits. Regards, Martin import time repeats = [None]*100 start = time.clock() for repeat in repeats: i = 123456789**123 lstBitsBitwiseAnd = [] while i: lstBitsBitwiseAnd.append(i&0x01) i=i>>1 print time.clock()-start start = time.clock() for repeat in repeats: i = 123456789**123 lstBitsBitwiseAnd = [] done = False while i: i1 = int(i & 0x3FFFFFFF) i >>= 30 if i == 0: done = True for k in xrange(30): lstBitsBitwiseAnd.append(i1 & 1) i1 >>= 1 if done and i1==0: # if this is the top word, and no bits are left, # we are done break print time.clock()-start Jan 6 '06 #5

 P: n/a Claudio Grondi writes: The question is if Python allows somehow access to the bytes of the representation of a long integer or integer in computers memory? No it doesn't, and that's a good thing, since the internal representation is a little bit surprising (it stores 15 bits of the long int in each 16-bit word of the representation, to make the arithmetic routines a little simpler). If it were possible to 'tell' the %2 operation to operate only on one short of the integer number representation there will be probably no difference in speed. Is there a way to do this efficiently in Python like it is possible in C when using pointers and recasting? The usual trick to get at the contents of a long is to use hex conversion and maybe the array module (untested): a = '%x' % n if len(a) % 2 == 1: a = '0' + a s = array.array('B', binascii.unhexlify(a)) s now gives you the individual bytes of n. Jan 7 '06 #7

 P: n/a [Claudio Grondi] Let's consider a test source code given at the very end of this posting. The question is if Python allows somehow access to the bytes of the representation of a long integer or integer in computers memory? CPython does not expose its internal representation of longs at the Python level. Or does Python hide such implementation details that deep, that there is no way to get down to them? As above. The test code below shows, that extracting bits from an integer value n is faster when using n&0x01 than when using n%2 and I suppose it is because %2 tries to handle the entire integer, It not only tries, it succeeds ;-) where &0x01 processes only the last two bytes of it If x and y are positive longs, the time required to compute x&y in all recent CPythons is essentially proportional to the number of bits in min(x, y). ... If it were possible to 'tell' the %2 operation to operate only on one short of the integer number representation there will be probably no difference in speed. Is there a way to do this efficiently in Python like it is possible in C when using pointers and recasting? No. As I am on Python 2.4.2 and Microsoft Windows, I am interested in details related to this Python version (to limit the scope of the question). Doesn't really matter: same answers for all recent versions of CPython on all platforms. If you go back far enough, in older versions of CPython the time to compute x&y was proportional to the number of bits in max(x, y) (instead of min(x, y)). Jan 7 '06 #8

 P: n/a Claudio Grondi wrote: You can get somewhat faster in Python than your code if you avoid producing new long objects all the time, and do the task in chunks of 30 bits. It would be nice if you could explain why you consider chunks of 30 bits to be superior e.g. to chunks of 32 bits? With chunks of 32 bits, you might get negative numbers. Then, right-shifting bit-by-bit will never get you to zero. It also happens that the long ints are represented as arrays of 15-bit values, so that the shift by 30 *could* be implemented without shifts in the individual words; however, this optimization is not done. Regards, Martin Jan 7 '06 #11

 P: n/a Claudio Grondi wrote: ... What I am also looking for is a conversion to base 256 (i.e where the full byte is used and the string and the integer have the same actual content if on appropriate endian machine), which would make the bit gmpy supplies that, too: gmpy.binary(x) or x.binary(): returns a portable binary representation (base-256 little endian) of an mpz object, suitable for saving into a file (or db, whatever) -- this string can later be passed as the first argument to function gmpy.mpz (with a second argument with value 256) to reconstruct a copy of the original mpz object. extraction comparable easy and effective as the i.__hex__() based method. I suspect bit-extraction is still going to be faster with getbit and friends, but I'm sure you can measure that for yourself (I'm stuck with a loaner machine these days, and don't currently have gmpy at hand). Alex Jan 7 '06 #12 