How to Read Bytes from a file

P: n/a
It seems like this would be easy but I'm drawing a blank.

What I want to do is be able to open any file in binary mode, and read
in one byte (8 bits) at a time and then count the number of 1 bits in
that byte.

I got as far as this but it is giving me strings and I'm not sure how
to accurately get to the byte/bit level.

f1=file('somefile','rb')
while 1:
    abyte=f1.read(1)

Thanks in advance for any help.

-Greg

Mar 1 '07 #1
18 Replies


P: n/a
gr********@gmail.com <gr********@gmail.com> wrote:
It seems like this would be easy but I'm drawing a blank.

What I want to do is be able to open any file in binary mode, and read
in one byte (8 bits) at a time and then count the number of 1 bits in
that byte.

I got as far as this but it is giving me strings and I'm not sure how
to accurately get to the byte/bit level.

f1=file('somefile','rb')
while 1:
    abyte=f1.read(1)
You should probably prepare before the loop a mapping from char to number
of 1 bits in that char:

m = {}
for c in range(256):
    m[c] = countones(c)

and then sum up the values of m[abyte] into a running total (break from
the loop when 'not abyte', i.e. you're reading 0 bytes even though
asking for 1 -- that tells you the file is finished, remember to close
it).

A trivial way to do the countones function:

def countones(x):
    assert x>=0
    c = 0
    while x:
        c += (x&1)
        x >>= 1
    return c

you just don't want to call it too often, whence the previous advice to
call it just 256 times to prep a mapping.

If you download and install gmpy you can use gmpy.popcount as a fast
implementation of countones:-).
Alex
Mar 1 '07 #2
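
Editor's sketch: the advice in the post above (precompute a table, read until an empty result, close the file) might be put together as follows. This is a minimal Python 3 version under stated assumptions, not the poster's code. In Python 3 a file opened with 'rb' returns bytes objects whose elements are already integers, so the table can be indexed directly. The names ONES and count_one_bits and the chunk size are hypothetical, chosen only for illustration.

```python
def countones(x):
    """Count the 1 bits in a non-negative integer (same loop as in the post)."""
    assert x >= 0
    c = 0
    while x:
        c += x & 1
        x >>= 1
    return c

# Precompute the answer once for every possible byte value, 0..255.
ONES = [countones(b) for b in range(256)]

def count_one_bits(path, chunk_size=65536):
    """Return the total number of 1 bits in the file at `path`."""
    total = 0
    with open(path, "rb") as f:      # the with-block closes the file
        while True:
            chunk = f.read(chunk_size)
            if not chunk:            # an empty result means end of file
                break
            # Iterating a bytes object in Python 3 yields ints in 0..255.
            total += sum(ONES[b] for b in chunk)
    return total
```

Reading in chunks rather than one byte at a time keeps the number of read() calls small; the per-byte work is still a single table lookup.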

P: n/a
Alex Martelli wrote:
You should probably prepare before the loop a mapping from char to number
of 1 bits in that char:

m = {}
for c in range(256):
    m[c] = countones(c)
Wouldn't a list be more efficient?

m = [countones(c) for c in xrange(256)]
Mar 1 '07 #3

P: n/a
On Mar 1, 7:52 am, "gregpin...@gmail.com" <gregpin...@gmail.com>
wrote:
It seems like this would be easy but I'm drawing a blank.

What I want to do is be able to open any file in binary mode, and read
in one byte (8 bits) at a time and then count the number of 1 bits in
that byte.

I got as far as this but it is giving me strings and I'm not sure how
to accurately get to the byte/bit level.

f1=file('somefile','rb')
while 1:
    abyte=f1.read(1)
import struct
buf = open('somefile','rb').read()
count1 = lambda x: ((x&1)+(x&2>0)+(x&4>0)+(x&8>0)+(x&16>0)+(x&32>0)+
                    (x&64>0)+(x&128>0))
byteOnes = map(count1,struct.unpack('B'*len(buf),buf))

byteOnes[n] is the number of ones in byte n.

Mar 1 '07 #4

P: n/a
Bart Ogryczak wrote:
On Mar 1, 7:52 am, "gregpin...@gmail.com" <gregpin...@gmail.com>
wrote:
It seems like this would be easy but I'm drawing a blank.

What I want to do is be able to open any file in binary mode, and read
in one byte (8 bits) at a time and then count the number of 1 bits in
that byte.

I got as far as this but it is giving me strings and I'm not sure how
to accurately get to the byte/bit level.

f1=file('somefile','rb')
while 1:
    abyte=f1.read(1)

import struct
buf = open('somefile','rb').read()
count1 = lambda x: ((x&1)+(x&2>0)+(x&4>0)+(x&8>0)+(x&16>0)+(x&32>0)+
                    (x&64>0)+(x&128>0))
byteOnes = map(count1,struct.unpack('B'*len(buf),buf))

byteOnes[n] is the number of ones in byte n.
I guess struct.unpack is not necessary, because:

byteOnes2 = map(count1, (ord(ch) for ch in buf))

seems to do the trick also.

Cheers,
Jussi
Mar 1 '07 #5
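
Editor's sketch: for readers on Python 3 (an assumption; the thread uses Python 2), neither struct.unpack nor ord() is needed, because iterating a bytes object already produces integers. The sample data and the names via_struct, via_iteration and byte_ones below are made up for illustration.

```python
import struct

buf = bytes([0, 1, 3, 255])          # hypothetical sample data, not read from a file

via_struct = list(struct.unpack("B" * len(buf), buf))
via_iteration = list(buf)            # Python 3: iterating bytes yields ints

def count1(x):
    """Number of 1 bits in a byte value 0..255, via its binary string."""
    return bin(x).count("1")

byte_ones = [count1(b) for b in buf]
```

Both routes give the same list of integers, so in Python 3 the plain iteration is the simpler choice.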

P: n/a
Leif K-Brooks <eu*****@ecritters.biz> wrote:
Alex Martelli wrote:
You should probably prepare before the loop a mapping from char to number
of 1 bits in that char:

m = {}
for c in range(256):
    m[c] = countones(c)

Wouldn't a list be more efficient?

m = [countones(c) for c in xrange(256)]
Yes, or an array.array -- actually I meant to use m[chr(c)] above (so
you could use the character you're reading directly to index m, rather
than calling ord(byte) a bazillion times for each byte you're reading),
but if you're using the numbers (as I did before) a list or array is
better.
Alex
Mar 1 '07 #6

P: n/a
On Mar 1, 8:53 am, "Bart Ogryczak" <B.Ogryc...@gmail.com> wrote:
On Mar 1, 7:52 am, "gregpin...@gmail.com" <gregpin...@gmail.com>
wrote:
It seems like this would be easy but I'm drawing a blank.
What I want to do is be able to open any file in binary mode, and read
in one byte (8 bits) at a time and then count the number of 1 bits in
that byte.
I got as far as this but it is giving me strings and I'm not sure how
to accurately get to the byte/bit level.
f1=file('somefile','rb')
while 1:
    abyte=f1.read(1)

import struct
buf = open('somefile','rb').read()
count1 = lambda x: ((x&1)+(x&2>0)+(x&4>0)+(x&8>0)+(x&16>0)+(x&32>0)+
                    (x&64>0)+(x&128>0))
byteOnes = map(count1,struct.unpack('B'*len(buf),buf))

byteOnes[n] is the number of ones in byte n.

This solution looks nice, but how does it work? I'm guessing
struct.unpack will provide me with 8 bit bytes (will this work on any
system?)

How does count1 work exactly?

Thanks for the help.

-Greg

Mar 1 '07 #7

P: n/a
On Mar 2, 12:53 am, "Bart Ogryczak" <B.Ogryc...@gmail.com> wrote:

import struct
buf = open('somefile','rb').read()
count1 = lambda x: ((x&1)+(x&2>0)+(x&4>0)+(x&8>0)+(x&16>0)+(x&32>0)+
                    (x&64>0)+(x&128>0))
byteOnes = map(count1,struct.unpack('B'*len(buf),buf))

byteOnes = map(count1,struct.unpack('%dB'%len(buf),buf))

Mar 1 '07 #8
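
Editor's sketch: the one-line reply above swaps the repeated format string for a repeat count. In the struct module a number in front of a format character repeats it, so '3B' means the same as 'BBB'. The three-byte sample below is hypothetical, and the code is written for Python 3.

```python
import struct

buf = b"\x01\x02\x80"                # hypothetical three-byte sample

repeated = struct.unpack("B" * len(buf), buf)     # format string "BBB"
counted = struct.unpack("%dB" % len(buf), buf)    # format string "3B"
```

The counted form avoids building a format string as long as the file itself.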

P: n/a
On Mar 1, 4:58 pm, "gregpin...@gmail.com" <gregpin...@gmail.com>
wrote:
On Mar 1, 8:53 am, "Bart Ogryczak" <B.Ogryc...@gmail.com> wrote:
On Mar 1, 7:52 am, "gregpin...@gmail.com" <gregpin...@gmail.com>
wrote:
It seems like this would be easy but I'm drawing a blank.
What I want to do is be able to open any file in binary mode, and read
in one byte (8 bits) at a time and then count the number of 1 bits in
that byte.
I got as far as this but it is giving me strings and I'm not sure how
to accurately get to the byte/bit level.
f1=file('somefile','rb')
while 1:
    abyte=f1.read(1)
import struct
buf = open('somefile','rb').read()
count1 = lambda x: ((x&1)+(x&2>0)+(x&4>0)+(x&8>0)+(x&16>0)+(x&32>0)+
                    (x&64>0)+(x&128>0))
byteOnes = map(count1,struct.unpack('B'*len(buf),buf))
byteOnes[n] is the number of ones in byte n.

This solution looks nice, but how does it work? I'm guessing
struct.unpack will provide me with 8 bit bytes

unpack with the 'B' format gives you an int value equivalent to an
unsigned char (1 byte).
(will this work on any system?)
Any system with 8-bit bytes, which would mean any system made after
1965. I'm not aware of any Python implementation for UNIVAC, so I
wouldn't worry ;-)
How does count1 work exactly?
1,2,4,8,16,32,64,128 in binary are
1,10,100,1000,10000,100000,1000000,10000000
x&1 == 1 if x has first bit set to 1
x&2 == 2, so (x&2>0) == True if x has second bit set to 1
.... and so on.
In the context of int, True is interpreted as 1, False as 0.

Mar 1 '07 #9
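
Editor's sketch: the explanation above can be spelled out as a loop over the eight masks. This is an illustrative rewrite of count1 for Python 3, not the poster's lambda; the example value and the name set_masks are assumptions.

```python
MASKS = (1, 2, 4, 8, 16, 32, 64, 128)

def count1(x):
    """Count 1 bits in a byte by testing each of the eight bit masks."""
    total = 0
    for mask in MASKS:
        # x & mask is nonzero exactly when that bit is set in x;
        # the comparison gives True/False, which add up as 1/0.
        total += (x & mask) > 0
    return total

x = 0b10110010                       # 178 = 128 + 32 + 16 + 2
set_masks = [m for m in MASKS if x & m]
```

Here set_masks lists the masks whose bit is set in x, and count1(x) is simply the length of that list.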

P: n/a
On Mar 1, 12:46 pm, "Bart Ogryczak" <B.Ogryc...@gmail.com> wrote:
This solution looks nice, but how does it work? I'm guessing
struct.unpack will provide me with 8 bit bytes

unpack with 'B' format gives you int value equivalent to unsigned char
(1 byte).
(will this work on any system?)

Any system with 8-bit bytes, which would mean any system made after
1965. I'm not aware of any Python implementation for UNIVAC, so I
wouldn't worry ;-)
How does count1 work exactly?

1,2,4,8,16,32,64,128 in binary are
1,10,100,1000,10000,100000,1000000,10000000
x&1 == 1 if x has first bit set to 1
x&2 == 2, so (x&2>0) == True if x has second bit set to 1
... and so on.
In the context of int, True is interpreted as 1, False as 0.
Thanks Bart. That's perfect. The other suggestion was to precompute
count1 for all possible bytes, I guess that's 0-256, right?

Thanks again everyone for the help.

-Greg

Mar 1 '07 #10

P: n/a
<gr********@gmail.com> wrote:
Thanks Bart. That's perfect. The other suggestion was to precompute
count1 for all possible bytes, I guess that's 0-256, right?
0 to 255 inclusive, actually - that is 256 numbers...

The largest number representable in a byte is 255

eight bits, of value 128,64,32,16,8,4,2,1

Their sum is 255...

And then there is zero.

- Hendrik

Mar 2 '07 #11
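
Editor's sketch: the arithmetic in the post above, checked in a few lines of Python 3. The variable names are made up for illustration.

```python
bit_values = [128, 64, 32, 16, 8, 4, 2, 1]

largest = sum(bit_values)            # value of a byte with all eight bits set
all_bytes = range(256)               # 0, 1, ..., 255; the end point is excluded
```

So a precomputed table needs 256 entries, indexed 0 through 255.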

P: n/a
On Mar 1, 7:36 pm, "gregpin...@gmail.com" <gregpin...@gmail.com>
wrote:
On Mar 1, 12:46 pm, "Bart Ogryczak" <B.Ogryc...@gmail.com> wrote:
This solution looks nice, but how does it work? I'm guessing
struct.unpack will provide me with 8 bit bytes
unpack with 'B' format gives you int value equivalent to unsigned char
(1 byte).
(will this work on any system?)
Any system with 8-bit bytes, which would mean any system made after
1965. I'm not aware of any Python implementation for UNIVAC, so I
wouldn't worry ;-)
How does count1 work exactly?
1,2,4,8,16,32,64,128 in binary are
1,10,100,1000,10000,100000,1000000,10000000
x&1 == 1 if x has first bit set to 1
x&2 == 2, so (x&2>0) == True if x has second bit set to 1
... and so on.
In the context of int, True is interpreted as 1, False as 0.

Thanks Bart. That's perfect. The other suggestion was to precompute
count1 for all possible bytes, I guess that's 0-256, right?
0-255 actually. It'd be worth it if accessing a dictionary of
precomputed values were significantly faster than calculating the
lambda, which I doubt. I suspect it actually might be slower.
Mar 2 '07 #12

P: n/a
>>>>> "Bart Ogryczak" <B.********@gmail.com> (BO) wrote:
BO> Any system with 8-bit bytes, which would mean any system made after
BO> 1965. I'm not aware of any Python implementation for UNIVAC, so I
BO> wouldn't worry ;-)
1965? I worked with machines that did not have 8-bit bytes (CDC) until the
beginning of the 80's. :=( In fact, at that time the institution where Guido
worked also had such a machine, but Python came later.
--
Piet van Oostrum <pi**@cs.uu.nl>
URL: http://www.cs.uu.nl/~piet [PGP 8DAE142BE17999C4]
Private email: pi**@vanoostrum.org
Mar 5 '07 #13

P: n/a
On Mar 5, 10:51 am, Piet van Oostrum <p...@cs.uu.nl> wrote:
>>>>> "Bart Ogryczak" <B.Ogryc...@gmail.com> (BO) wrote:
BO> Any system with 8-bit bytes, which would mean any system made after
BO> 1965. I'm not aware of any Python implementation for UNIVAC, so I
BO> wouldn't worry ;-)

1965? I worked with machines that did not have 8-bit bytes (CDC) until the
beginning of the 80's. :=( In fact, at that time the institution where Guido
worked also had such a machine, but Python came later.
Right, I should have written 'designed' not 'made'. UNIVACs were also
produced until the early 1980s. Anyway, I'd call it
paleoinformatics ;-)

Mar 5 '07 #14

P: n/a
On Fri, 02 Mar 2007 08:22:36 -0300, Bart Ogryczak <B.********@gmail.com>
wrote:
On Mar 1, 7:36 pm, "gregpin...@gmail.com" <gregpin...@gmail.com>
wrote:
Thanks Bart. That's perfect. The other suggestion was to precompute
count1 for all possible bytes, I guess that's 0-256, right?

0-255 actually. It'd be worth it if accessing a dictionary of
precomputed values were significantly faster than calculating the
lambda, which I doubt. I suspect it actually might be slower.
Dictionary access is highly optimized in Python. In fact, using a
precomputed dictionary is about 12 times faster:

py> import timeit
py> count1 = lambda x: ((x&1)+(x&2>0)+(x&4>0)+(x&8>0)+(x&16>0)+(x&32>0)+
...     (x&64>0)+(x&128>0))
py> d256 = dict((i, count1(i)) for i in range(256))
py> timeit.Timer("for x in range(256): w = d256[x]",
...     "from __main__ import d256").repeat(number=10000)
[0.54261253874445003, 0.54763468541393934, 0.54499943428564279]
py> timeit.Timer("for x in range(256): w = count1(x)",
...     "from __main__ import count1").repeat(number=10000)
[6.1867963665773118, 6.1967124313285638, 6.1666287195719178]

--
Gabriel Genellina

Mar 6 '07 #15
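
Editor's sketch: a way to repeat the measurement above on Python 3. It is a rough equivalent of the posted session rather than a reproduction; the globals= argument of timeit.timeit stands in for the "from __main__ import ..." setup string, the iteration count is reduced, and the timings will differ by machine and interpreter, so none are shown.

```python
import timeit

def count1(x):
    """Same bit tests as the lambda in the thread, written as a function."""
    return ((x & 1) + (x & 2 > 0) + (x & 4 > 0) + (x & 8 > 0) +
            (x & 16 > 0) + (x & 32 > 0) + (x & 64 > 0) + (x & 128 > 0))

d256 = {i: count1(i) for i in range(256)}

# Hand the timed statements exactly the names they need.
names = {"d256": d256, "count1": count1}
t_dict = timeit.timeit("for x in range(256): w = d256[x]",
                       globals=names, number=1000)
t_func = timeit.timeit("for x in range(256): w = count1(x)",
                       globals=names, number=1000)
print("dict lookup: %.4fs, function call: %.4fs" % (t_dict, t_func))
```

Comparing t_func with t_dict on your own machine shows whether the precomputed table still pays off there.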

P: n/a
"Piet van Oostrum" <p...@cs.uu.nl> wrote:
>>>>> "Bart Ogryczak" <B.********@gmail.com> (BO) wrote:
BO> Any system with 8-bit bytes, which would mean any system made after
BO> 1965. I'm not aware of any Python implementation for UNIVAC, so I
BO> wouldn't worry ;-)

1965? I worked with machines that did not have 8-bit bytes (CDC) until the
beginning of the 80's. :=( In fact, at that time the institution where Guido
worked also had such a machine, but Python came later.
Those behemoths were EXPENSIVE - so it made a lot of sense to keep using
them until the point that it became obvious even to an accountant that the
maintenance cost was no longer worth it...

Would actually not surprise me if there were still a few around, doing
electricity accounts or something.

- Hendrik

Mar 6 '07 #16

P: n/a
"Bart Ogryczak" <B.Ogr...@gmail.com> wrote:

On Mar 5, 10:51 am, Piet van Oostrum <p...@cs.uu.nl> wrote:
>>>>> "Bart Ogryczak" <B.Ogryc...@gmail.com> (BO) wrote:
BO> Any system with 8-bit bytes, which would mean any system made after
BO> 1965. I'm not aware of any Python implementation for UNIVAC, so I
BO> wouldn't worry ;-)
1965? I worked with machines that did not have 8-bit bytes (CDC) until the
beginning of the 80's. :=( In fact, at that time the institution where Guido
worked also had such a machine, but Python came later.

Right, I should have written 'designed' not 'made'. UNIVACs were also
produced until the early 1980s. Anyway, I'd call it
paleoinformatics ;-)
The correct term is: "Data Processing", or DP for short.

- Hendrik

Mar 6 '07 #17

P: n/a
"Gabriel Genellina" <ga*******@yahoo.com.ar> writes:
On Fri, 02 Mar 2007 08:22:36 -0300, Bart Ogryczak
<B.********@gmail.com> wrote:
On Mar 1, 7:36 pm, "gregpin...@gmail.com" <gregpin...@gmail.com>
wrote:
Thanks Bart. That's perfect. The other suggestion was to precompute
count1 for all possible bytes, I guess that's 0-256, right?

0-255 actually. It'd be worth it if accessing a dictionary of
precomputed values were significantly faster than calculating the
lambda, which I doubt. I suspect it actually might be slower.

Dictionary access is highly optimized in Python. In fact, using a
precomputed dictionary is about 12 times faster:
Why use a dictionary and not a list?

Matthias
Mar 6 '07 #18

P: n/a
On Tue, 06 Mar 2007 17:07:45 -0300, Matthias Julius <jn***@julius-net.net>
wrote:
"Gabriel Genellina" <ga*******@yahoo.com.ar> writes:
Dictionary access is highly optimized in Python. In fact, using a
precomputed dictionary is about 12 times faster:

Why use a dictionary and not a list?
Because a dictionary is slower? :)
Using a test similar to the one posted previously, list access is about
15% faster than dictionary access for such a small range.

--
Gabriel Genellina

Mar 7 '07 #19
