Bytes IT Community

Conversion of 24bit binary to int

Is there an efficient/fast way in Python to convert binary data from a file
(24-bit hex(int), big endian) to 32-bit int (little endian)? I have seen
struct.unpack, but I am unsure how and what Python has to offer. Idar

The original data is stored in blocks of 512 words (3 bytes/word, so 1536 B
per block), and the binary data is big endian:
Ch1: 1536B (3B*512)
Ch2: 1536B (3B*512)
Ch3: 1536B (3B*512)
and so on

The equivalent C++ program looks like this:

for (i = 0; i < nchn; i++)
{
    for (k = 0; k < segl; k++)
    {
        ar24[k] = 0;                      // output array = 32-bit int array -> Mt24 fmt
        pdt = (unsigned char *)(&ar24[k]);
        *pdt     = *(a + 2);              // reverse the three big-endian bytes
        *(pdt+1) = *(a + 1);
        *(pdt+2) = *(a + 0);
        a += 3;
        ar24[k] -= DownloadDataOffset;
        // printf("%d\n", ar24[k]);       // this is the number in 32-bit format
    }
}
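For modern readers, the same per-word conversion is a few lines in Python 3. This is a sketch, not from the thread: `int.from_bytes` replaces the pointer tricks, and the C++ `DownloadDataOffset` becomes an `offset` parameter.

```python
def convert_block(raw, offset=0):
    """Convert consecutive big-endian 3-byte words into a list of
    32-bit ints, subtracting an offset, like the C++ inner loop."""
    out = []
    for i in range(0, len(raw), 3):
        # big-endian 24-bit word -> Python int
        out.append(int.from_bytes(raw[i:i+3], 'big') - offset)
    return out

# three 3-byte big-endian words: 1, 256, 65536
print(convert_block(bytes([0, 0, 1, 0, 1, 0, 1, 0, 0])))  # [1, 256, 65536]
```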

Jul 18 '05 #1


Idar wrote:

Is there an efficient/fast way in Python to convert binary data from a file
(24-bit hex(int), big endian) to 32-bit int (little endian)? I have seen
struct.unpack, but I am unsure how and what Python has to offer. Idar


I think the question is unclear. You say you've seen struct.unpack.
So what then? Don't you think struct.unpack will work? What do you
mean you are unsure how and what Python has to offer? The documentation
which is on the web site clearly explains how and what struct.unpack
has to offer...

Please clarify.

-Peter
Jul 18 '05 #2

If I'm understanding correctly, hex has nothing to do with this and the
data is really binary, so what you're looking for is probably:
data = '\000\001\002'
temp = struct.unpack( '>I', '\000'+data )   # pad to 4-byte unsigned big-endian integer format
print temp                                  # now a regular Python integer (in a tuple)
(258L,)
print repr(struct.pack( '<I', *temp ))      # encode in 4-byte unsigned little-endian integer format
'\x02\x01\x00\x00'

There are faster ways if you have a lot of such data (e.g. PIL would
likely have something to manipulate RGB to RGBA images), similarly, you
could use Numpy to add large numbers of rows simultaneously (all 512 if
I understand your description of the data correctly). Without knowing
what type of data is being loaded it's hard to give a better recommendation.
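As a rough sketch of the NumPy route Mike mentions (assuming a modern NumPy is installed; the function name here is ours, not from the thread):

```python
import numpy as np

def be24_to_int32(buf):
    """Vectorized: view a buffer of big-endian 24-bit samples
    as an int32 array, with no per-sample Python loop."""
    b = np.frombuffer(buf, dtype=np.uint8).reshape(-1, 3).astype(np.uint32)
    # combine MSB, middle byte, LSB
    return ((b[:, 0] << 16) | (b[:, 1] << 8) | b[:, 2]).astype(np.int32)

print(be24_to_int32(bytes([0, 0, 1, 0, 1, 0])).tolist())  # [1, 256]
```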

HTH,
Mike
Idar wrote:
Is there an efficient/fast way in Python to convert binary data from
a file (24-bit hex(int), big endian) to 32-bit int (little endian)? I have
seen struct.unpack, but I am unsure how and what Python has to offer.
Idar


....
_______________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://members.rogers.com/mcfletch/


Jul 18 '05 #3

Idar wrote:
Is there an efficient/fast way in Python to convert binary data from a file
(24-bit hex(int), big endian) to 32-bit int (little endian)? I have seen
struct.unpack, but I am unsure how and what Python has to offer. Idar


As Peter mentions, you haven't _really_ given enough information
about what you need, but here is some code which will do what
I _think_ you said you want...

This code assumes that you have a string (named teststr here)
in the source format you describe. You can get a string
like this in several ways, e.g. by reading from a file object.

This code then swaps every 3 characters and inserts a null
byte between every group of three characters.

The result is in a list, which can easily be converted back
to a string by ''.join() as shown in the test printout.

I would expect that either the array module or Numpy would
work faster with _exactly_ the same technique, but I'm
not bored enough to check that out right now.

If this isn't fast enough after using array or NumPy (or
after Alex, Tim, et al. get through with it), I would
highly recommend Pyrex -- you can do exactly the same
sorts of coercions you were doing in your C++ code.
teststr = ''.join([chr(i) for i in range(128, 128 + 20*3)])

result = len(teststr) * 4 // 3 * [chr(0)]
for x in range(3):
    result[2-x::4] = teststr[x::3]

print repr(''.join(result))
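The extended-slice trick above translates directly to Python 3 `bytearray` (a sketch, not part of the original thread):

```python
def swap3_pad(src):
    """Swap every 3 bytes and add a zero pad byte after each group,
    using extended slicing -- only 3 Python-level loop iterations."""
    out = bytearray(len(src) * 4 // 3)   # already zero-filled
    for x in range(3):
        out[2 - x::4] = src[x::3]
    return bytes(out)

print(swap3_pad(bytes([0, 1, 2])).hex())  # 02010000
```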
Regards,
Pat
Jul 18 '05 #4



On Tue, 11 Nov 2003 10:11:05 -0500, Peter Hansen <pe***@engcorp.com> wrote:
Idar wrote:

Is there an efficient/fast way in Python to convert binary data from a
file (24-bit hex(int), big endian) to 32-bit int (little endian)? I have seen
struct.unpack, but I am unsure how and what Python has to offer. Idar
I think the question is unclear. You say you've seen struct.unpack.
So what then? Don't you think struct.unpack will work? What do you
mean you are unsure how and what Python has to offer? The documentation
which is on the web site clearly explains how and what struct.unpack
has to offer...


It is due to slack reading........

The doc says "Standard size and alignment are as follows: no alignment is
required for any type (so you have to use pad bytes)................"

It was unclear (at the time of reading) in the sense that I didn't see the
above text, there was no example of how to handle odd-byte/padding
conversion, and my test program crashed!

But if you know how to convert this format (the file is about 6MB)
efficiently, please do give me a hint. The data is stored in binary with the
format:
Ch1: 1536B (512*3B)
...
Ch6 1536B (512*3B)
Then it is repeated again until end:
Ch1 1536B (512*3B)
...
Ch6 1536B (512*3B)


Please clarify.

-Peter


--
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jul 18 '05 #5

On Tue, 11 Nov 2003 10:21:29 -0500, Mike C. Fletcher <mc******@rogers.com>
wrote:
If I'm understanding correctly, hex has nothing to do with this and the
data is really binary, so what you're looking for is probably:
Thanks for the hint!! And sorry - I meant binary!
data = '\000\001\002'
temp = struct.unpack( '>I', '\000'+data )   # pad to 4-byte unsigned big-endian integer format
print temp                                  # now a regular Python integer (in a tuple)
(258L,)
print repr(struct.pack( '<I', *temp ))      # encode in 4-byte unsigned little-endian integer format
'\x02\x01\x00\x00'

There are faster ways if you have a lot of such data (e.g. PIL would
likely have something to manipulate RGB to RGBA images), similarly, you
could use Numpy to add large numbers of rows simultaneously (all 512 if I
understand your description of the data correctly). Without knowing what
type of data is being loaded it's hard to give a better recommendation.


It is binary, with no formatting characters to indicate the start/end of each
block (fixed size).
A file is about 6MB (and there are about 300 of them...),
Ch1: 1536B (512*3B) - the 3B are big endian (int)
...
Ch6: 1536B (512*3B)
And then it is repeated till the end:
Ch1: 1536B (512*3B)
...
Ch6: 1536B (512*3B)

ciao, idar

HTH,
Mike
Idar wrote:
Is there an efficient/fast way in Python to convert binary data from
a file (24-bit hex(int), big endian) to 32-bit int (little endian)? I have seen
struct.unpack, but I am unsure how and what Python has to offer. Idar


... _______________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://members.rogers.com/mcfletch/



--
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jul 18 '05 #6

Thanks for the example!

The format is binary, with no formatting characters to indicate the start/end of
each block (fixed size).
A file is about 6MB (and there are about 300 of them...), so

Ch1: 1536B (512*3B) - the 3B are big endian (int)
...
Ch6: 1536B (512*3B)
And then it is repeated till the end (say Y sets of Ch1 (the same for
Ch2,3,4,5,6)):
Ch1,Y: 1536B (512*3B)
...
Ch6,Y: 1536B (512*3B)

And ideally I would like to convert it to this format:
Ch1: Y*512*4B (normal int with little endian)
Ch2
Ch3
Ch4
Ch5
Ch6
And that is the end :)
Idar

This code assumes that you have a string (named teststr here)
in the source format you describe. You can get a string
like this in several ways, e.g. by reading from a file object.

This code then swaps every 3 characters and inserts a null
byte between every group of three characters.

The result is in a list, which can easily be converted back
to a string by ''.join() as shown in the test printout.

I would expect that either the array module or Numpy would
work faster with _exactly_ the same technique, but I'm
not bored enough to check that out right now.

If this isn't fast enough after using array or NumPy (or
after Alex, Tim, et al. get through with it), I would
highly recommend Pyrex -- you can do exactly the same
sorts of coercions you were doing in your C++ code.
teststr = ''.join([chr(i) for i in range(128, 128 + 20*3)])

result = len(teststr) * 4 // 3 * [chr(0)]
for x in range(3):
    result[2-x::4] = teststr[x::3]

print repr(''.join(result))
Regards,
Pat


--
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jul 18 '05 #7

Idar wrote:
Thanks for the example!

The format is binary, with no formatting characters to indicate the start/end of
each block (fixed size).
A file is about 6MB (and there are about 300 of them...), so

Ch1: 1536B (512*3B) - the 3B are big endian (int)
..
Ch6: 1536B (512*3B)
And then it is repeated till the end (say Y sets of Ch1 (the same for
Ch2,3,4,5,6)):
Ch1,Y: 1536B (512*3B)
..
Ch6,Y: 1536B (512*3B)

And ideally I would like to convert it to this format:
Ch1: Y*512*4B (normal int with little endian)
Ch2
Ch3
Ch4
Ch5
Ch6
And that is the end :)


So, you don't really need to convert binary to int or anything, just
shuffle bytes around, right? Your file starts with (e.g.), using a
letter for each arbitrary binary byte:

A B C D E F G H I ...

and you want to output the bytes

C B A 0 F E D 0 I H G 0 ...

I.e, swap 3 bytes, insert a 0 byte for padding, and proceed (for all
Ch1, which is spread out in the original file -- then for all Ch2, and
so on). Each file fits comfortably in memory (3MB for input, becoming
4MB for output due to the padding). You can use two instances of
array.array('B'), with .read for input and .write for output (just
remember .read _appends_ to the array, so make a new empty one for
each file you're processing -- the _output_ array you can reuse).

It's LOTS of indexing and single-byte moving, so I doubt the Python
native performance will be great. Still, once you've implemented and
checked it out you can use psyco or pyrex to optimize it, if needed.

The primitive you need is typically "copy with swapping and padding
a block of 1536 input bytes [starting from index SI] to a block of
2048 output bytes" [starting from index SO -- the 0 bytes in the
output you'll leave untouched after at first preparing the output
array with OA = array.array('B', Y*2048*6*'\0') of course].
That's just (using predefined ranges for speed, no need to remake
them every time):

r512 = xrange(512)

def doblock(SI, SO, IA, OA, r512=r512):
    ii = SI
    io = SO
    for i in r512:
        # reversed 3-byte slice; written as [::-1] so it also works
        # when ii == 0 (IA[2:-1:-1] would be an empty slice)
        OA[io:io+3] = IA[ii:ii+3][::-1]
        ii += 3
        io += 4

so basically it only remains to compute SI and SO appropriately
and loop ditto calling this primitive (or some speeded-up version
thereof) 6*Y times for all the blocks in the various channels.
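To complete the sketch, the SI/SO bookkeeping could look like this (a Python 3 adaptation; the index formulas are our reading of the channel layout described above, and the 3-byte reversal is written as `[::-1]` to sidestep the negative-step edge case at `ii == 0`):

```python
import array

def doblock(SI, SO, IA, OA, nsamples=512):
    """Copy one block: each big-endian 3-byte group becomes 3 reversed
    bytes; the fourth (pad) byte is already zero in OA."""
    ii, io = SI, SO
    for _ in range(nsamples):
        OA[io:io + 3] = IA[ii:ii + 3][::-1]
        ii += 3
        io += 4

def convert(IA, nchannels=6, Y=1):
    OA = array.array('B', bytes(Y * 2048 * nchannels))
    for rep in range(Y):                  # repetition within the file
        for ch in range(nchannels):       # channel within a repetition
            SI = (rep * nchannels + ch) * 1536   # input is interleaved
            SO = (ch * Y + rep) * 2048           # output is per channel
            doblock(SI, SO, IA, OA)
    return OA
```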
Alex

Jul 18 '05 #8

Alex Martelli wrote:
r512 = xrange(512)

def doblock(SI, SO, IA, OA, r512=r512):
    ii = SI
    io = SO
    for i in r512:
        OA[io:io+3] = IA[ii:ii+3][::-1]
        ii += 3
        io += 4

It's my guess this would be faster using array.array
in combination with extended slicing, as per the list
example I gave in a previous message, even though I'm
still not bored enough to time it :) (The for loop
in my previous example only requires 3 iterations,
rather than 512 as in this example.)

Pat
Jul 18 '05 #9

Idar wrote:
Thanks for the example!

The format is binary, with no formatting characters to indicate the start/end of
each block (fixed size).
A file is about 6MB (and there are about 300 of them...), so

Ch1: 1536B (512*3B) - the 3B are big endian (int)
..
Ch6: 1536B (512*3B)
And then it is repeated till the end (say Y sets of Ch1 (the same for
Ch2,3,4,5,6)):
Ch1,Y: 1536B (512*3B)
..
Ch6,Y: 1536B (512*3B)

And ideally I would like to convert it to this format:
Ch1: Y*512*4B (normal int with little endian)
Ch2
Ch3
Ch4
Ch5
Ch6
And that is the end :)
Idar


OK, now that I have a beer and a specification, here is some code
which (I think) should do what (I think) you are asking for.
On my Athlon 2200+ (marketing number) computer, with the source
file cached by the OS, it operates at around 10 source megabytes/second.

(That should be about 3 minutes plus actual file I/O operations
for the 300 6MB files you describe.)

Verifying that it actually produces the data you expect is up to you :)

Regards,
Pat
import array

def mungeio(srcfile, dstfile, numchannels=6, blocksize=512):
    """
    This function converts 24 bit RGB into 32 bit BGR0,
    and simultaneously de-interleaves video from multiple
    sources. The parameters are:

        srcfile     -- a file object opened with 'rb'
                       (or similar object)
        dstfile     -- a file object opened with 'wb'
                       (or similar object)
        numchannels -- the number of interleaved video channels
        blocksize   -- the number of pixels per channel in
                       each interleaved block (interleave factor)

    This function reads all the data from srcfile and writes
    it to dstfile. It is up to the caller to close both files.

    The function asserts that the amount of data to be read
    from the source file is an integral multiple of
    blocksize*numchannels*3.

    This function assumes that multiple copies of the data
    will easily fit into RAM, as the target file size is
    6MB for the source files and 8MB for the destination
    files. If this is not a good assumption, it should
    be rearchitected to output to one file per channel,
    and then stitch the output files together at the end.
    """

    srcblocksize = blocksize * 3
    dstblocksize = blocksize * 4

    def mungeblock(src, dstarray=array.array('B', dstblocksize*[0])):
        """
        This function accepts a string representing a single
        source block, and returns a string representing a
        single destination block.
        """
        srcarray = array.array('B', src)
        for i in range(3):
            dstarray[2-i::4] = srcarray[i::3]
        return dstarray.tostring()

    channellist = [[] for i in range(numchannels)]

    while 1:
        for channel in channellist:
            data = srcfile.read(srcblocksize)
            if len(data) != srcblocksize:
                break
            channel.append(mungeblock(data))
        else:
            continue  # (with while statement)
        break  # Propagate break from 'for' out of 'while'

    # Check that input file length is valid (no leftovers),
    # and then write the result.

    assert channel is channellist[0] and not len(data)
    dstfile.write(''.join(sum(channellist, [])))

def mungefile(srcname, dstname):
    """
    Actual I/O done in a separate function so it can
    be more easily unit-tested.
    """
    srcfile = open(srcname, 'rb')
    dstfile = open(dstname, 'wb')
    mungeio(srcfile, dstfile)
    srcfile.close()
    dstfile.close()
Jul 18 '05 #10

I just realized that, according to your spec, it ought to be possible
to do the rgb -> bgr0 conversion on the entire file all at one go
(no nasty headers or block headers to get in the way:)

So I wrote a somewhat more comprehensible (for one thing, it gets rid
of that nasty sum() everybody's been complaining about :), somewhat more
memory-intensive version of the program. On my machine it executes at
approximately the same speed as the original one I wrote (10 source
megabytes/second), but this one might be more amenable to profiling
and further point optimizations if necessary.

The barebones (no comments or error-checking) functions are below.

Pat
import array

def RgbToBgr0(srcstring):
    srcarray = array.array('B', srcstring)
    dstarray = array.array('B', len(srcstring) * 4 // 3 * chr(0))
    for i in range(3):
        dstarray[2-i::4] = srcarray[i::3]
    return dstarray.tostring()

def deinterleave(srcstring, numchannels=6, pixelsperblock=512):
    bytesperblock = pixelsperblock * 4
    totalblocks = len(srcstring) // bytesperblock
    blocknums = []
    for i in range(numchannels):
        blocknums.extend(range(i, totalblocks, numchannels))
    return ''.join([srcstring[i*bytesperblock:(i+1)*bytesperblock]
                    for i in blocknums])

def mungefile(srcname, dstname):
    srcfile = open(srcname, 'rb')
    dstfile = open(dstname, 'wb')
    dstfile.write(deinterleave(RgbToBgr0(srcfile.read())))
    srcfile.close()
    dstfile.close()
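A quick sanity check of this pipeline, translated to Python 3 since the originals are Python 2 (`tostring()` is now `tobytes()`, strings became `bytes`); the lowercase names mark this as our sketch, not the thread's code:

```python
import array

def rgb_to_bgr0(src):
    dst = array.array('B', bytes(len(src) * 4 // 3))
    srcarray = array.array('B', src)
    for i in range(3):
        dst[2 - i::4] = srcarray[i::3]
    return dst.tobytes()

def deinterleave(src, numchannels=6, pixelsperblock=512):
    bytesperblock = pixelsperblock * 4
    totalblocks = len(src) // bytesperblock
    order = []
    for ch in range(numchannels):
        order.extend(range(ch, totalblocks, numchannels))
    return b''.join(src[i*bytesperblock:(i+1)*bytesperblock]
                    for i in order)

# two channels, one sample per block, two repetitions:
raw = bytes([1, 2, 3,   4, 5, 6,      # rep 0: Ch1, Ch2
             7, 8, 9,  10, 11, 12])   # rep 1: Ch1, Ch2
out = deinterleave(rgb_to_bgr0(raw), numchannels=2, pixelsperblock=1)
print(out.hex())  # 0302010009080700060504000c0b0a00
```

Note how the two Ch1 blocks end up adjacent in the output, followed by the two Ch2 blocks, which is the layout Idar asked for.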
Jul 18 '05 #11

On Wed, 12 Nov 2003 10:53:17 +0100, rumours say that Idar
<ip@itk.ntnu.no> might have written:
But if you know how to convert this format (the file is about 6MB)
efficiently, please do give me a hint. The data is stored in binary with the
format:
Ch1: 1536B (512*3B)
..
Ch6 1536B (512*3B)
Then it is repeated again until end:
Ch1 1536B (512*3B)
..
Ch6 1536B (512*3B)


So it's some audio file with 6 channels, right? (I missed the first
post)

I would take every chunk of 512*3 bytes, and for every 3 bytes,
struct.unpack('<i', _3_bytes[::-1] + '\0')[0] is the 32-bit value
(reversing the bytes first, since the source is big endian and
Intel is little endian).
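A sketch of that suggestion in modern Python 3 (hedged: the three bytes must be reversed before padding, because the source words are big endian while `'<I'` reads little endian; `sample24` is our name, not from the thread):

```python
import struct

def sample24(word):
    """One big-endian 3-byte sample -> int, via struct."""
    # reverse to little-endian order, pad to 4 bytes, unpack unsigned
    return struct.unpack('<I', word[::-1] + b'\0')[0]

print(sample24(bytes([0, 1, 2])))  # 258
```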

Hope this helps (no, really :)
--
TZOTZIOY, I speak England very best,
Ils sont fous ces Redmontains! --Harddix
Jul 18 '05 #12
