
# Conversion of 24bit binary to int

Is there an efficient/fast way in Python to convert binary data from a file (24bit hex(int) big endian) to 32bit int (little endian)? I have seen struct.unpack, but I am unsure how and what Python has to offer.

Idar

The original data format is stored in blocks of 512 words (1536B = 3 bytes/word) in the form:

    Ch1: 1536B (3B*512), the binary (hex) data is big endian
    Ch2: 1536B (3B*512)
    Ch3: 1536B (3B*512)
    and so on

The equivalent C++ program looks like this:

    for(i=0;iMt24 fmt
    pdt=(unsigned char *)(&ar24[k]);
    *pdt =*(a+2);
    *(pdt+1)=*(a+1);
    *(pdt+2)=*(a+0);
    a+=3;
    ar24[k]-=DownloadDataOffset;
    // printf("%d\n",ar24[k]);//this is the number on 32 bit format
    }
    }

Jul 18 '05 #1
11 Replies

Idar wrote:

Is there an efficient/fast way in Python to convert binary data from a file (24bit hex(int) big endian) to 32bit int (little endian)? I have seen struct.unpack, but I am unsure how and what Python has to offer. Idar

I think the question is unclear. You say you've seen struct.unpack. So what then? Don't you think struct.unpack will work? What do you mean you are unsure how and what Python has to offer? The documentation which is on the web site clearly explains how and what struct.unpack has to offer...

Please clarify.

-Peter

Jul 18 '05 #2

If I'm understanding correctly, hex has nothing to do with this and the data is really binary, so what you're looking for is probably:

    data = '\000\001\002'
    temp = struct.unpack( '>I', '\000'+data ) # pad to 4-byte unsigned big-endian integer format
    print temp # is now a regular python integer (in a tuple)
    (258L,)
    print repr(struct.pack( '
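For readers on Python 3, the pad-and-unpack idea in the post above can be sketched as follows. The post is cut off before its second step, so the repacking with `'<I'` here is only an assumption about what was intended, based on the little-endian output the question asks for. `int.from_bytes` is shown as an alternative that needs no padding.

```python
import struct

# One 24-bit big-endian sample, the same example bytes used in the post.
data = b"\x00\x01\x02"

# Pad on the left to 4 bytes and unpack as an unsigned big-endian integer.
(value,) = struct.unpack(">I", b"\x00" + data)

# The same value without padding, via int.from_bytes.
same_value = int.from_bytes(data, byteorder="big")

# Assumed second step: repack as a 32-bit little-endian unsigned integer.
little = struct.pack("<I", value)

print(value, same_value, little)
```

Doing this one word at a time is convenient for checking a few values, but it is likely to be slow for a 6MB file.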

Idar wrote:

Is there an efficient/fast way in Python to convert binary data from a file (24bit hex(int) big endian) to 32bit int (little endian)? I have seen struct.unpack, but I am unsure how and what Python has to offer. Idar

As Peter mentions, you haven't _really_ given enough information about what you need, but here is some code which will do what I _think_ you said you want...

This code assumes that you have a string (named teststr here) in the source format you describe. You can get a string like this in several ways, e.g. by reading from a file object.

This code then reverses every group of 3 characters and inserts a null byte after each group. The result is in a list, which can easily be converted back to a string by ''.join() as shown in the test printout.

I would expect that either the array module or NumPy would work faster with _exactly_ the same technique, but I'm not bored enough to check that out right now.

If this isn't fast enough after using array or NumPy (or after Alex, Tim, et al. get through with it), I would highly recommend Pyrex -- you can do exactly the same sorts of coercions you were doing in your C++ code.

    teststr = ''.join([chr(i) for i in range(128,128+20*3)])
    result = len(teststr) * 4 // 3 * [chr(0)]
    for x in range(3):
        result[2-x::4] = teststr[x::3]
    print repr(''.join(result))

Regards,
Pat

Jul 18 '05 #4
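The extended-slice technique above carries over to Python 3 almost unchanged if a `bytearray` takes the place of the list of characters. This is a minimal sketch; the function name `swap_and_pad` and the length check are additions for illustration, not part of the original post.

```python
def swap_and_pad(src: bytes) -> bytes:
    """Turn each 3-byte big-endian word into a 4-byte little-endian word."""
    if len(src) % 3:
        raise ValueError("input length must be a multiple of 3")
    # bytearray(n) is zero-filled, so every fourth (pad) byte is already 0.
    dst = bytearray(len(src) // 3 * 4)
    for x in range(3):
        # Byte x of every source word lands at byte 2-x of every output word.
        dst[2 - x::4] = src[x::3]
    return bytes(dst)


print(swap_and_pad(b"ABCDEF"))
```

Only three slice assignments are executed regardless of input size, so the per-byte work happens inside the interpreter's C code rather than in a Python loop.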

On Tue, 11 Nov 2003 10:11:05 -0500, Peter Hansen wrote:

Idar wrote:

Is there an efficient/fast way in Python to convert binary data from a file (24bit hex(int) big endian) to 32bit int (little endian)? I have seen struct.unpack, but I am unsure how and what Python has to offer. Idar

I think the question is unclear. You say you've seen struct.unpack. So what then? Don't you think struct.unpack will work? What do you mean you are unsure how and what Python has to offer? The documentation which is on the web site clearly explains how and what struct.unpack has to offer...

It is due to careless reading. The doc says "Standard size and alignment are as follows: no alignment is required for any type (so you have to use pad bytes)..."

It was unclear (at the time of reading) in the sense that I didn't see the above text, there was no example of how to handle odd-byte/padding conversion, and the test program crashed!

But if you know how to convert this format (the file is about 6MB) efficiently, please do give me a hint. The data is stored as binary in the format:

    Ch1: 1536B (512*3B)
    ...
    Ch6: 1536B (512*3B)

Then it is repeated again until the end:

    Ch1: 1536B (512*3B)
    ...
    Ch6: 1536B (512*3B)

Please clarify.

-Peter

Jul 18 '05 #5

On Tue, 11 Nov 2003 10:21:29 -0500, Mike C. Fletcher wrote:

If I'm understanding correctly, hex has nothing to do with this and the data is really binary, so what you're looking for is probably:

Thanks for the hint!! And sorry - I meant binary!

    data = '\000\001\002'
    temp = struct.unpack( '>I', '\000'+data ) # pad to 4-byte unsigned big-endian integer format
    print temp # is now a regular python integer (in a tuple)
    (258L,)
    print repr(struct.pack( '

Thanks for the example!

The format is binary with no formatting characters to indicate the start/end of each block (fixed size). A file is about 6MB (and there are about 300 of them...), so:

    Ch1: 1536B (512*3B) - the 3B are big endian (int)
    ...
    Ch6: 1536B (512*3B)

And then it is repeated till the end (say Y sets of Ch1 (the same for Ch2,3,4,5,6)):

    Ch1,Y: 1536B (512*3B)
    ...
    Ch6,Y: 1536B (512*3B)

And ideally I would like to convert it to this format:

    Ch1: Y*512*4B (normal int with little endian)
    Ch2
    Ch3
    Ch4
    Ch5
    Ch6

And that is the end :)

Idar

This code assumes that you have a string (named teststr here) in the source format you describe. You can get a string like this in several ways, e.g. by reading from a file object.

This code then reverses every group of 3 characters and inserts a null byte after each group. The result is in a list, which can easily be converted back to a string by ''.join() as shown in the test printout.

I would expect that either the array module or NumPy would work faster with _exactly_ the same technique, but I'm not bored enough to check that out right now.

If this isn't fast enough after using array or NumPy (or after Alex, Tim, et al. get through with it), I would highly recommend Pyrex -- you can do exactly the same sorts of coercions you were doing in your C++ code.

    teststr = ''.join([chr(i) for i in range(128,128+20*3)])
    result = len(teststr) * 4 // 3 * [chr(0)]
    for x in range(3):
        result[2-x::4] = teststr[x::3]
    print repr(''.join(result))

Regards,
Pat

Jul 18 '05 #7

Idar wrote:

Thanks for the example!

The format is binary with no formatting characters to indicate the start/end of each block (fixed size). A file is about 6MB (and there are about 300 of them...), so:

    Ch1: 1536B (512*3B) - the 3B are big endian (int)
    ..
    Ch6: 1536B (512*3B)

And then it is repeated till the end (say Y sets of Ch1 (the same for Ch2,3,4,5,6)):

    Ch1,Y: 1536B (512*3B)
    ..
    Ch6,Y: 1536B (512*3B)

And ideally I would like to convert it to this format:

    Ch1: Y*512*4B (normal int with little endian)
    Ch2
    Ch3
    Ch4
    Ch5
    Ch6

And that is the end :)

So, you don't really need to convert binary to int or anything, just shuffle bytes around, right? Your file starts with (e.g.), using a letter for each arbitrary binary byte:

    A B C D E F G H I ...

and you want to output the bytes

    C B A 0 F E D 0 I H G 0 ...

I.e., swap 3 bytes, insert a 0 byte for padding, and proceed (for all Ch1, which is spread out in the original file -- then for all Ch2, and so on). Each file fits comfortably in memory (3MB for input, becoming 4MB for output due to the padding). You can use two instances of array.array('B'), with .read for input and .write for output (just remember .read _appends_ to the array, so make a new empty one for each file you're processing -- the _output_ array you can reuse).

It's LOTS of indexing and single-byte moving, so I doubt the Python native performance will be great. Still, once you've implemented and checked it out you can use psyco or pyrex to optimize it, if needed.

The primitive you need is typically "copy with swapping and padding a block of 1536 input bytes [starting from index SI] to a block of 2048 output bytes" [starting from index SO -- the 0 bytes in the output you'll leave untouched after at first preparing the output array with OA = array.array('B', Y*2048*6*'\0') of course].

That's just (using predefined ranges for speed, no need to remake them every time):

    r512 = xrange(512)

    def doblock(SI, SO, IA, OA, r512=r512):
        ii = SI
        io = SO
        for i in r512:
            OA[io:io+3] = IA[ii+2:ii-1:-1]
            ii += 3
            io += 4

so basically it only remains to compute SI and SO appropriately and loop ditto calling this primitive (or some speeded-up version thereof) 6*Y times for all the blocks in the various channels.

Alex

Jul 18 '05 #8
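The remaining step, computing SI and SO for each of the 6*Y blocks, might look like the Python 3 sketch below. The `convert` function and the constant names are hypothetical and only illustrate the indexing: block number `ch` of set `y` in the input goes to position `y` within channel `ch` in the output. One caution about the primitive as posted: the slice `IA[ii+2:ii-1:-1]` is empty when `ii` is 0, because the stop index becomes -1, so this sketch reverses a forward slice instead.

```python
WORDS = 512            # words per block, as described in the thread
SRC_BLOCK = WORDS * 3  # 1536 input bytes per block
DST_BLOCK = WORDS * 4  # 2048 output bytes per block


def doblock(si, so, ia, oa):
    """Copy one block from ia[si:] to oa[so:], reversing each 3-byte word."""
    for _ in range(WORDS):
        # Reverse a forward slice; this also works when si == 0.
        oa[so:so + 3] = ia[si:si + 3][::-1]
        si += 3
        so += 4  # the fourth output byte stays 0 (padding)


def convert(ia, channels=6):
    """Hypothetical driver: de-interleave the channels and pad every word."""
    sets, leftover = divmod(len(ia), SRC_BLOCK * channels)
    if leftover:
        raise ValueError("input is not a whole number of channel sets")
    oa = bytearray(sets * channels * DST_BLOCK)  # zero-filled output
    for y in range(sets):
        for ch in range(channels):
            si = (y * channels + ch) * SRC_BLOCK  # blocks are interleaved in the input
            so = (ch * sets + y) * DST_BLOCK      # blocks are grouped by channel in the output
            doblock(si, so, ia, oa)
    return bytes(oa)
```

As the post says, this does a lot of small slice operations in a Python loop, so it is a reference for the indexing rather than a fast implementation.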

Alex Martelli wrote:

    r512 = xrange(512)

    def doblock(SI, SO, IA, OA, r512=r512):
        ii = SI
        io = SO
        for i in r512:
            OA[io:io+3] = IA[ii+2:ii-1:-1]
            ii += 3
            io += 4

It's my guess this would be faster using array.array in combination with extended slicing, as per the list example I gave in a previous message, even though I'm still not bored enough to time it :)

(The for loop in my previous example only requires 3 iterations, rather than 512 as in this example.)

Pat

Jul 18 '05 #9

Idar wrote:

Thanks for the example!

The format is binary with no formatting characters to indicate the start/end of each block (fixed size). A file is about 6MB (and there are about 300 of them...), so:

    Ch1: 1536B (512*3B) - the 3B are big endian (int)
    ..
    Ch6: 1536B (512*3B)

And then it is repeated till the end (say Y sets of Ch1 (the same for Ch2,3,4,5,6)):

    Ch1,Y: 1536B (512*3B)
    ..
    Ch6,Y: 1536B (512*3B)

And ideally I would like to convert it to this format:

    Ch1: Y*512*4B (normal int with little endian)
    Ch2
    Ch3
    Ch4
    Ch5
    Ch6

And that is the end :)

Idar

OK, now that I have a beer and a specification, here is some code which (I think) should do what (I think) you are asking for. On my Athlon 2200+ (marketing number) computer, with the source file cached by the OS, it operates at around 10 source megabytes/second. (That should be about 3 minutes plus actual file I/O operations for the 300 6MB files you describe.)

Verifying that it actually produces the data you expect is up to you :)

Regards,
Pat

    import array

    def mungeio(srcfile,dstfile, numchannels=6, blocksize=512):
        """
        This function converts 24 bit RGB into 32 bit BGR0,
        and simultaneously de-interleaves video from multiple
        sources. The parameters are:

        srcfile -- a file object opened with 'rb'
                   (or similar object)
        dstfile -- a file object opened with 'wb'
                   (or similar object)
        numchannels -- the number of interleaved video channels
        blocksize -- the number of pixels per channel on
                     each interleaved block (interleave factor)

        This function reads all the data from srcfile and writes
        it to dstfile. It is up to the caller to close both files.

        The function asserts that the amount of data to be read
        from the source file is an integral multiple of
        blocksize*numchannels*3.

        This function assumes that multiple copies of the data
        will easily fit into RAM, as the target file size is
        6MB for the source files and 8MB for the destination
        files. If this is not a good assumption, it should
        be rearchitected to output to one file per channel,
        and then stitch the output files together at the end.
        """

        srcblocksize = blocksize * 3
        dstblocksize = blocksize * 4

        def mungeblock(src,dstarray=array.array('B',dstblocksize*[0])):
            """
            This function accepts a string representing a single
            source block, and returns a string representing a
            single destination block.
            """
            srcarray = array.array('B',src)
            for i in range(3):
                dstarray[2-i::4] = srcarray[i::3]
            return dstarray.tostring()

        channellist = [[] for i in range(numchannels)]

        while 1:
            for channel in channellist:
                data = srcfile.read(srcblocksize)
                if len(data) != srcblocksize:
                    break
                channel.append(mungeblock(data))
            else:
                continue # (with while statement)
            break # Propagate break from 'for' out of 'while'

        # Check that input file length is valid (no leftovers),
        # and then write the result.
        assert channel is channellist[0] and not len(data)

        dstfile.write(''.join(sum(channellist,[])))

    def mungefile(srcname,dstname):
        """
        Actual I/O done in a separate function so it can
        be more easily unit-tested.
        """
        srcfile = open(srcname,'rb')
        dstfile = open(dstname,'wb')
        mungeio(srcfile,dstfile)
        srcfile.close()
        dstfile.close()

Jul 18 '05 #10
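The code above is written for Python 2 (string-based I/O, `tostring()`). A rough Python 3 sketch of the same read, convert, collect-per-channel flow follows. It is not a line-for-line port: the names `munge_stream` and `convert_block` are made up for this example, it reads one whole set of channel blocks per call instead of one block, and it raises `ValueError` where the original asserts. It also assumes buffered file objects, where `read(n)` returns fewer than `n` bytes only at end of file.

```python
import io


def convert_block(block: bytes) -> bytes:
    """Reverse each 3-byte word and append a zero pad byte."""
    out = bytearray(len(block) // 3 * 4)  # zero-filled
    for i in range(3):
        out[2 - i::4] = block[i::3]
    return bytes(out)


def munge_stream(srcfile, dstfile, numchannels=6, blocksize=512):
    """Read interleaved 24-bit blocks, write 32-bit words grouped by channel."""
    src_block = blocksize * 3
    set_size = src_block * numchannels
    channels = [[] for _ in range(numchannels)]
    while True:
        chunk = srcfile.read(set_size)
        if not chunk:
            break
        if len(chunk) != set_size:
            raise ValueError("input ends with an incomplete set of blocks")
        for ch in range(numchannels):
            block = chunk[ch * src_block:(ch + 1) * src_block]
            channels[ch].append(convert_block(block))
    for blocks in channels:
        dstfile.write(b"".join(blocks))


# Tiny demonstration: 2 channels, 1 word per block, 2 sets.
src = io.BytesIO(b"ABCDEFGHIJKL")
dst = io.BytesIO()
munge_stream(src, dst, numchannels=2, blocksize=1)
print(dst.getvalue())
```

Passing file-like objects keeps the conversion testable without touching the disk, which is the same reason the original separates `mungeio` from `mungefile`.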

I just realized that, according to your spec, it ought to be possible to do the rgb -> bgr0 conversion on the entire file all at one go (no nasty headers or block headers to get in the way:)

So I wrote a somewhat more comprehensible (for one thing, it gets rid of that nasty sum() everybody's been complaining about :), somewhat more memory-intensive version of the program. On my machine it executes at approximately the same speed as the original one I wrote (10 source megabytes/second), but this one might be more amenable to profiling and further point optimizations if necessary.

The barebones (no comments or error-checking) functions are below.

Pat

    import array

    def RgbToBgr0(srcstring):
        srcarray = array.array('B',srcstring)
        dstarray = array.array('B',len(srcstring) * 4 // 3 * chr(0))
        for i in range(3):
            dstarray[2-i::4] = srcarray[i::3]
        return dstarray.tostring()

    def deinterleave(srcstring,numchannels=6,pixelsperblock=512):
        bytesperblock = pixelsperblock*4
        totalblocks = len(srcstring) // bytesperblock
        blocknums = []
        for i in range(numchannels):
            blocknums.extend(range(i,totalblocks,numchannels))
        return ''.join([srcstring[i*bytesperblock:(i+1)*bytesperblock]
                        for i in blocknums])

    def mungefile(srcname,dstname):
        srcfile = open(srcname,'rb')
        dstfile = open(dstname,'wb')
        dstfile.write(deinterleave(RgbToBgr0(srcfile.read())))
        srcfile.close()
        dstfile.close()

Jul 18 '05 #11
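The part of this version that is easiest to get wrong is the order in which blocks are picked up during de-interleaving. The Python 3 sketch below isolates that step on a toy input; `block_order` is a helper name introduced here, and the block size is passed in bytes so the example can use very small blocks.

```python
def block_order(totalblocks, numchannels):
    """List block indices so that all blocks of channel 0 come first, then channel 1, ..."""
    order = []
    for ch in range(numchannels):
        # Channel ch owns every numchannels-th block, starting at block ch.
        order.extend(range(ch, totalblocks, numchannels))
    return order


def deinterleave(data, numchannels, bytesperblock):
    """Regroup fixed-size interleaved blocks by channel."""
    totalblocks = len(data) // bytesperblock
    return b"".join(
        data[i * bytesperblock:(i + 1) * bytesperblock]
        for i in block_order(totalblocks, numchannels)
    )


print(block_order(6, 3))
print(deinterleave(b"aabbccAABBCC", numchannels=3, bytesperblock=2))
```

With three channels and six blocks the order is 0, 3, 1, 4, 2, 5, so the lower-case and upper-case blocks of each channel end up next to each other.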

On Wed, 12 Nov 2003 10:53:17 +0100, rumours say that Idar might have written:

But if you know how to convert this format (the file is about 6MB) efficiently, please do give me a hint. The data is stored as binary in the format:

    Ch1: 1536B (512*3B)
    ..
    Ch6: 1536B (512*3B)

Then it is repeated again until the end:

    Ch1: 1536B (512*3B)
    ..
    Ch6: 1536B (512*3B)

So it's some audio file with 6 channels, right? (I missed the first post)

I would take every chunk of 512*3 bytes, and for every 3 bytes, struct.unpack('i', _3_bytes+'\0')[0] is the 32bit value (assuming Intel's little endianness).

Hope this helps (no, really :)

--
TZOTZIOY, I speak England very best,
Ils sont fous ces Redmontains! --Harddix

Jul 18 '05 #12
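One detail worth checking with this suggestion: appending the zero byte and unpacking as little-endian reads the three bytes in file order, while the original question describes the words as big endian. The Python 3 sketch below compares the two readings on the sample bytes used earlier in the thread. The explicit `'<i'` format is used here instead of the native `'i'` so the result does not depend on the machine, and padding with a zero byte treats the samples as unsigned, which is an assumption since the thread does not say whether they are signed.

```python
import struct

# A 24-bit word stored big endian, as in the question (value 0x000102).
three = b"\x00\x01\x02"

# Bytes taken in file order, padded, read as little-endian.
(as_is,) = struct.unpack("<i", three + b"\x00")

# Bytes reversed first, then padded and read as little-endian.
(swapped,) = struct.unpack("<i", three[::-1] + b"\x00")

print(as_is, swapped)
```

The reversed form gives the same number as the big-endian unpack shown earlier in the thread; the unreversed form gives a different one.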
