473,657 Members | 2,680 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

creating very small types

I was thinking to code the huffman algorithm and trying to compress
something with it, but I've got a problem.
How can I represent for example a char with only 3 bits??
I had a look to the compression modules but I can't understand them much...

Thank you very much
Any good link would be appreciated of course :)
Jul 19 '05 #1
7 1439
On Wed, 27 Apr 2005 22:17:07 +0200, andrea wrote:
I was thinking to code the huffman algorithm and trying to compress
something with it, but I've got a problem.
How can I represent for example a char with only 3 bits??
I had a look to the compression modules but I can't understand them much...

Thank you very much
Any good link would be appreciated of course :)


I think the answer to this question very much depends on why you want to
do this. Easy and fast are pretty opposed in this particular domain in
Python (IMHO anyways, it's an easy bit-bashing language but it's slow
for that), and you're in one of the rare domains where it matters.

The answers strongly vary if you're trying to create something performant,
or just for your learning purposes. Or homework purposes... ;-)

Jul 19 '05 #2
Jeremy Bowers wrote:
On Wed, 27 Apr 2005 22:17:07 +0200, andrea wrote:
I was thinking to code the huffman algorithm and trying to compress
something with it, but I've got a problem.
How can I represent for example a char with only 3 bits??
I had a look to the compression modules but I can't understand them much...

Thank you very much
Any good link would be appreciated of course :)


I think the answer to this question very much depends on why you want to
do this. Easy and fast are pretty opposed in this particular domain in
Python (IMHO anyways, it's an easy bit-bashing language but it's slow
for that), and you're in one of the rare domains where it matters.

The answers strongly vary if you're trying to create something performant,
or just for your learning purposes. Or homework purposes... ;-)

No it's not for homework but for learning purposes...
I understand I can't do it easily in python, but maybe I could define a
new type in C and use it to do those dirty works, what do you think?
Jul 19 '05 #3
andrea wrote:
I was thinking to code the huffman algorithm and trying to compress
something with it, but I've got a problem.
How can I represent for example a char with only 3 bits?? I had a look to the compression modules but I can't understand them much...
.... I understand I can't do it easily in python, but maybe I could define a
new type in C and use it to do those dirty works, what do you think?


Why do you need to create 'very small types'?

You only need actual bit-twiddling when you do the encoding/de-coding right?
If you create an encoding map 'codes' as a dict of strings of '1' and '0',
encoding might look like (untested):
def encode(stream):
outchar = count = 0
for char in stream:
for bit in codes[char]:
(outchar << 1) | (bit == "1")
count +=1
if count ==8:
yield chr(outchar)
outchar = count = 0
if count:
yield chr(outchar)
HTH
Michael

Jul 19 '05 #4
Michael Spencer wrote:
andrea wrote:
I was thinking to code the huffman algorithm and trying to compresssomething with it, but I've got a problem.
How can I represent for example a char with only 3 bits??I had a look to the compression modules but I can't understand
them much...
...
I understand I can't do it easily in python, but maybe I could
define a new type in C and use it to do those dirty works, what do you

think?
Why do you need to create 'very small types'?

You only need actual bit-twiddling when you do the encoding/de-coding right? If you create an encoding map 'codes' as a dict of strings of '1' and '0', encoding might look like (untested):

def encode(stream):
outchar = count = 0
for char in stream:
for bit in codes[char]:
(outchar << 1) | (bit == "1")
count +=1
if count ==8:
yield chr(outchar)
outchar = count = 0
if count:
yield chr(outchar)


I wrote some Huffman compression code a few years ago, with

class BitWriter(objec t):
# writes individual bits to an output stream
def __init__(self, outputStream):
self.__out = outputStream
self.__bitCount = 0 # number of unwritten bits
self.__currentB yte = 0 # buffer for unwritten bits
def write(self, bit):
self.__currentB yte = self.__currentB yte << 1 | bit
self.__bitCount += 1
if self.__bitCount == BYTE_SIZE:
self.__out.writ e(chr(self.__cu rrentByte))
self.__bitCount = 0
self.__currentB yte = 0
def flush(self):
while self.__bitCount > 0:
self.write(0)

class BitReader(objec t):
# reads individual bits from an input stream
def __init__(self, inputStream):
self.__in = inputStream
self.__bits = [] # buffer to hold incoming bits
def readBit(self):
if len(self.__bits ) == 0:
# read the next byte
b = ord(self.__in.r ead(1))
# unpack the bits
self.__bits = [(b & (1<<i)) != 0 for i in range(BYTE_SIZE-1,
-1, -1)]
return self.__bits.pop (0)

Jul 19 '05 #5
On Wed, 27 Apr 2005 15:54:54 -0700, Michael Spencer <ma**@telcopart ners.com> wrote:
andrea wrote:
I was thinking to code the huffman algorithm and trying to compress
something with it, but I've got a problem.
How can I represent for example a char with only 3 bits??I had a look to the compression modules but I can't understand them much...
...
I understand I can't do it easily in python, but maybe I could define a
new type in C and use it to do those dirty works, what do you think?


Why do you need to create 'very small types'?

You only need actual bit-twiddling when you do the encoding/de-coding right?
If you create an encoding map 'codes' as a dict of strings of '1' and '0',
encoding might look like (untested):
def encode(stream):
outchar = count = 0
for char in stream:
for bit in codes[char]:
(outchar << 1) | (bit == "1")
count +=1
if count ==8:
yield chr(outchar)
outchar = count = 0
if count:

outchar <<= 8-count # make last bit field big-endian like others? yield chr(outchar)

I think I prefer little-endian bit streams though, e.g. (just hacked, not tested beyond
what you see and not recommended as efficient either way, esp the decode, but it might
be illustrative of something for Andrea ;-)

----< trivcodec.py >------------------------------------------------
#trivcodec.py
def encode(stream):
outchar = count = 0
for char in stream:
#print '%r code: %r'%(char, codes[char])
bits, width = codes[char]
outchar |= (bits<<count)
count += width
while count >= 8:
yield chr(outchar%0xf f)
outchar >>= 8
count -= 8
if count:
yield chr(outchar)

def decode(stream):
codebuf = count = 0
width = 1
for codebyte in stream:
codebyte = ord(codebyte)
codebuf |= (codebyte<<coun t)
count += 8
if width>count:
continue
while width <= count:
code = (codebuf&((1<<w idth)-1), width)
if code not in chars:
width += 1
continue
yield chars[code]
codebuf >>= width
count -= width
width = 1
width = 1

#trivial encoding dict: a=0 b=01 C=0011 D=0111 E=1011 F=1111
codes = {'a':(0x00,1), 'b':(0x01,2),
'C':(0x3+0x4*0, 4), 'D':(0x3+0x4*1, 4),
'E':(0x3+0x4*2, 4), 'F':(0x3+0x4*3, 4)}
chars = dict([(v,k) for k,v in codes.items()]) # reverse

def test(*tests):
if not tests: tests = ['abaFECbaabbDF']
for charstream in tests:
print
codestream = ''.join(list(en code(charstream )))
print '%r [%s] -(encode)-> %r [%s]' % (
charstream, len(charstream) , codestream, len(codestream) )
recovered = ''.join(list(de code(codestream )))
print '%r [%s] -(decode)-> %r [%s]' % (
codestream, len(codestream) , charstream, len(charstream) )

if __name__ == '__main__':
import sys
test(*sys.argv[1:])
--------------------------------------------------------------------

Result:

[22:02] C:\pywk\clp>py2 4 trivcodec.py

'abaFECbaabbDF' [13] -(encode)-> '\xf2;Q\xf7' [4]
'\xf2;Q\xf7' [4] -(decode)-> 'abaFECbaabbDF' [13]

[22:02] C:\pywk\clp>py2 4 trivcodec.py a b C D E F

'a' [1] -(encode)-> '\x00' [1]
'\x00' [1] -(decode)-> 'a' [1]

'b' [1] -(encode)-> '\x01' [1]
'\x01' [1] -(decode)-> 'b' [1]

'C' [1] -(encode)-> '\x03' [1]
'\x03' [1] -(decode)-> 'C' [1]

'D' [1] -(encode)-> '\x07' [1]
'\x07' [1] -(decode)-> 'D' [1]

'E' [1] -(encode)-> '\x0b' [1]
'\x0b' [1] -(decode)-> 'E' [1]

'F' [1] -(encode)-> '\x0f' [1]
'\x0f' [1] -(decode)-> 'F' [1]

[22:02] C:\pywk\clp>py2 4 trivcodec.py

'abaFECbaabbDF' [13] -(encode)-> '\xf2;Q\xf7' [4]
'\xf2;Q\xf7' [4] -(decode)-> 'abaFECbaabbDF' [13]

[22:02] C:\pywk\clp>py2 4 trivcodec.py aaaaaaaa bbbb CC DD EE FF

'aaaaaaaa' [8] -(encode)-> '\x00' [1]
'\x00' [1] -(decode)-> 'aaaaaaaa' [8]

'bbbb' [4] -(encode)-> 'U' [1]
'U' [1] -(decode)-> 'bbbb' [4]

'CC' [2] -(encode)-> '3' [1]
'3' [1] -(decode)-> 'CC' [2]

'DD' [2] -(encode)-> 'w' [1]
'w' [1] -(decode)-> 'DD' [2]

'EE' [2] -(encode)-> '\xbb' [1]
'\xbb' [1] -(decode)-> 'EE' [2]

'FF' [2] -(encode)-> '\x00' [1]
'\x00' [1] -(decode)-> 'FF' [2]

[22:03] C:\pywk\clp>py2 4 trivcodec.py aaaaaaaabbbbCCD DEEFF

'aaaaaaaabbbbCC DDEEFF' [20] -(encode)-> '\x00U3w\xbb\x0 0' [6]
'\x00U3w\xbb\x0 0' [6] -(decode)-> 'aaaaaaaabbbbCC DDEEFF' [20]
Regards,
Bengt Richter
Jul 19 '05 #6
andrea wrote:
No it's not for homework but for learning purposes...
Bengt wrote: I think I prefer little-endian bit streams though,
good point: should lead to easier decoding via right-shifts

e.g. (just hacked, not tested beyond what you see
Yep, yours looks better. Pretty soon there isn't going to be much learning left
in this for Andrea ;-)

and not recommended as efficient either way, esp the decode,

I expect that decoding by walking the coding tree is cleaner (not sure about
faster).
but it might be illustrative of something for Andrea ;-)


Michael

Jul 19 '05 #7
On Thu, 28 Apr 2005 05:07:34 GMT, bo**@oz.net (Bengt Richter) wrote:

.... some not quite correct code ;-/
(I copy/pasted and created an illusion. My code dict has no EOS, so
I decode pad zero bits as code that a single zero stands for ('a' in this case)
so that was an oversight. I should have extended the coding some more or reserved
perhaps 'F' as EOString, and tested for EOS in the decode stream.
This is revised, sorry.
You can make a 'way more efficient decoder as an exercise ;-)
Hint: if you make a dict of all the 2**width integers as keys where
width is your widest code (4 here for 2**4), then you can do a single
mask of 2**width-1 and look up the translation to (char, codewidth) directly,
according to the codewidth least significant bits, if you make all the states
of the other bits into keys that map to the same (char,codewidth ).
So for example all the even values of out 2**16 would have to map to ('a', 1)
and all the values with 2 LSBs of 01 (of which there are 4) would map to ('b',2)
and so forth. Thus the incrementing of width and trying one thing after
another is not necessary. I think that will work ...

Well, now Andrea has something to do ;-)----< trivcodec.py >------------------------------------------------ #trivcodec.py
EOS = ''
import itertools
def encode(stream):
outchar = count = 0
for char in itertools.chain (stream, [EOS]):
bits, width = codes[char]
outchar |= (bits<<count)
count += width
while count >= 8:
yield chr(outchar&0xf f)
outchar >>= 8
count -= 8
if count:
yield chr(outchar)

def decode(stream):
codebuf = count = 0
width = 1; char = None
for codebyte in stream:
codebyte = ord(codebyte)
codebuf |= (codebyte<<coun t)
count += 8
if width>count:
continue
while width <= count:
code = (codebuf&((1<<w idth)-1), width)
if code not in chars:
width += 1
continue
char = chars[code]
if char == EOS: break
yield char
codebuf >>= width
count -= width
width = 1
if char == EOS: break
width = 1

#trivial encoding dict: a=0 b=01 C=0011 D=0111 E=1011 EOS=1111
codes = {'a':(0x00,1), 'b':(0x01, 2),
'C':(0x3+0x4*0, 4), 'D':(0x3+0x4*1, 4),
'E':(0x3+0x4*2, 4), EOS:(0x3+0x4*3, 4)}
chars = dict([(v,k) for k,v in codes.items()]) # reverse

def test(*tests):
if not tests: tests = ['abaDECbaabbD']
for charstream in tests:
print
codestream = ''.join(list(en code(charstream )))
print '%r [%s] -(encode)-> %r [%s]' % (
charstream, len(charstream) , codestream, len(codestream) )
recovered = ''.join(list(de code(codestream )))
print '%r [%s] -(decode)-> %r [%s]' % (
codestream, len(codestream) , recovered, len(recovered))

if __name__ == '__main__':
import sys
test(*sys.argv[1:])--------------------------------------------------------------------

Result: Not really. Tack enough a's on the recovered chars to account for zero fill
of the last encoded byte ;-/
[22:02] C:\pywk\clp>py2 4 trivcodec.py

'abaFECbaabbDF ' [13] -(encode)-> '\xf2;Q\xf7' [4]
'\xf2;Q\xf7' [4] -(decode)-> 'abaFECbaabbDF' [13]
Fine, because the [4] bytes were totally filled.
[22:02] C:\pywk\clp>py2 4 trivcodec.py a b C D E F

'a' [1] -(encode)-> '\x00' [1]
'\x00' [1] -(decode)-> 'a' [1] Not so fine. There is no character serving as EOS mark.
'b' [1] -(encode)-> '\x01' [1]
'\x01' [1] -(decode)-> 'b' [1]

'C' [1] -(encode)-> '\x03' [1]
'\x03' [1] -(decode)-> 'C' [1]

'D' [1] -(encode)-> '\x07' [1]
'\x07' [1] -(decode)-> 'D' [1]

'E' [1] -(encode)-> '\x0b' [1]
'\x0b' [1] -(decode)-> 'E' [1]

'F' [1] -(encode)-> '\x0f' [1]
'\x0f' [1] -(decode)-> 'F' [1]


That really produced:

[ 8:03] C:\pywk\clp>py2 4 trivcodec.py a b C D E F

'a' [1] -(encode)-> '\x00' [1]
'\x00' [1] -(decode)-> 'aaaaaaaa' [8]

'b' [1] -(encode)-> '\x01' [1]
'\x01' [1] -(decode)-> 'baaaaaa' [7]

'C' [1] -(encode)-> '\x03' [1]
'\x03' [1] -(decode)-> 'Caaaa' [5]

'D' [1] -(encode)-> '\x07' [1]
'\x07' [1] -(decode)-> 'Daaaa' [5]

'E' [1] -(encode)-> '\x0b' [1]
'\x0b' [1] -(decode)-> 'Eaaaa' [5]

'F' [1] -(encode)-> '\x0f' [1]
'\x0f' [1] -(decode)-> 'Faaaa' [5]

So we need to assign a code as the EOS.
And probably translate that to '' as the return char,
and end on that.

So now it does (having given up 1111 for EOS instead of F):

[10:22] C:\pywk\clp>py2 4 trivcodec.py

'abaDECbaabbD' [12] -(encode)-> 'r;Q\xf7' [4]
'r;Q\xf7' [4] -(decode)-> 'abaDECbaabbD' [12]

[10:22] C:\pywk\clp>py2 4 trivcodec.py a b C D E

'a' [1] -(encode)-> '\x1e' [1]
'\x1e' [1] -(decode)-> 'a' [1]

'b' [1] -(encode)-> '=' [1]
'=' [1] -(decode)-> 'b' [1]

'C' [1] -(encode)-> '\xf3' [1]
'\xf3' [1] -(decode)-> 'C' [1]

'D' [1] -(encode)-> '\xf7' [1]
'\xf7' [1] -(decode)-> 'D' [1]

'E' [1] -(encode)-> '\xfb' [1]
'\xfb' [1] -(decode)-> 'E' [1]

Goes to show you, eh? Slice and dice similar lines carefully ;-)

Regards,
Bengt Richter
Jul 19 '05 #8

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
561
by: Altramagnus | last post by:
I have 30 - 40 type of different window. For each type I need about 20 instances of the window. When I try to create them, I get "Error creating window handle" My guess is there is a maximum number of window handle, because if I reduce to about 2 instances of each window, it can run. But not 20 instances of each window. Does anyone know what the problem is? is it really because it exceeds the maximum number of window handle?
2
5964
by: Iain Miller | last post by:
Now this shouldn't be hard but I've been struggling on the best way as to how to do this one for a day or 3 so I thought I'd ask the assembled company..... I'm writing an application that tracks a group of Sales people, the customers they deal with and the business they transact with them. I've got my head around all the tables & some of the basic Query structures OK and am beginning to delve into creating the forms I need to be able...
8
5961
by: Nanda | last post by:
hi, I am trying to generate parameters for the updatecommand at runtime. this.oleDbDeleteCommand1.CommandText=cmdtext; this.oleDbDeleteCommand1.Connection =this.oleDbConnection1; this.oleDbDeleteCommand1.Parameters.Add(new System.Data.OleDb.OleDbParameter("Original_ApplicantName", dataset.Tables.Columns.DataType, 50,
12
3153
by: Mats Lycken | last post by:
Hi, I'm creating a CMS that I would like to be plug-in based with different plugins handling different kinds of content. What I really want is to be able to load/unload plugins on the fly without restarting the application. What I did was to create an AppDomain that loaded the plugins and everything was great, until I tried to pass something else that strings between the domains...
3
1584
by: John Devlon | last post by:
Hi, I would like to create a small application that creates a new PDF-File, using an excisting PDF-file as background template, adding some text and a unique barcode. I would like to create my own type of barcode and not use any excisting types. Can anyone recommend and libs, SDKs or COMs ? thanx
19
19250
by: Tony | last post by:
I'm working on project that plays movies using Windows Media Player and I'm controlling everything with JavaScript. Per the client I only need to support IE 6 or greater which happens to make things a bit easier. What I need to do is create a playlist and play it using JavaScript. I keep on getting close but not close enough to play the dang files. Has anyone done this before and can shed some light on what worked for them?
26
5354
by: nyathancha | last post by:
Hi, How Do I create an instance of a derived class from an instance of a base class, essentially wrapping up an existing base class with some additional functionality. The reason I need this is because I am not always able to control/create all the different constructors the base class has. My problem can be described in code as follows ... /* This is the base class with a whole heap of constructors/functionality*/ public class Animal
169
9053
by: JohnQ | last post by:
(The "C++ Grammer" thread in comp.lang.c++.moderated prompted this post). It would be more than a little bit nice if C++ was much "cleaner" (less complex) so that it wasn't a major world wide untaking to create a toolchain for it. Way back when, there used to be something called "Small C". I wonder if the creator(s) of that would want to embark on creating a nice little Small C++ compiler devoid of C++ language features that make...
11
7165
by: breal | last post by:
I have three lists... for instance a = ; b = ; c = ; I want to take those and end up with all of the combinations they create like the following lists
0
8407
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
8319
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
8739
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
0
8612
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
1
6175
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
5638
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
1
2739
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
1969
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
2
1732
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.