Python Huffman encoding

dot

Hi all, I have written a Python huffman Encoding Module, for my own
amusement. I thought it might be educational/entertaining for other
people, so I've put it on my website and wrote about it a little.

http://gumuz.looze.net/wordpress/ind...fman-encoding/

Your comments are highly appreciated!

cheers,

guyon

Jul 18 '05 #1

Subscribe Post Reply

5327

David Fraser

"@ dot wrote:

Hi all, I have written a Python huffman Encoding Module, for my own
amusement. I thought it might be educational/entertaining for other
people, so I've put it on my website and wrote about it a little.

http://gumuz.looze.net/wordpress/ind...fman-encoding/
Your comments are highly appreciated!

Looks cool. Now if only I had seen this before I implemented my own
huffman decoding for JPEG ... I wonder if your module could be used
instead ...

Jul 18 '05 #2

Paul McGuire

"dot" <""gumuz(\"@)looze(dot)net"> wrote in message
news:41***********************@news.wanadoo.nl...

Hi all, I have written a Python huffman Encoding Module, for my own
amusement. I thought it might be educational/entertaining for other
people, so I've put it on my website and wrote about it a little.

http://gumuz.looze.net/wordpress/ind...fman-encoding/
Your comments are highly appreciated!

cheers,

guyon

Guyon -

Great first step at some self-learning with a non-trivial test program.
Here are some performance speedups that will help you out.

1. Measure, measure, measure!!! Don't just guess where the performance
spots may lie, use time, timeit, or hotspot profiler to help you find your
performance problems. I used a very crude form of profiling, in the same
vein as using printf for debugging. I created a method called elapsed(),
and then just dropped in elapsed("A"), elapsed("B"), etc. commands to help
me identify where the time is going. (Calling elapsed() with no label
resets the timer.)

elapsedLast = 0
def elapsed(label=''):
global elapsedLast
cur = time.time()
if label:
print "%s %.2f sec" % (label,cur-elapsedLast)
elapsedLast = cur
2. Your main encoding hotspot is in the bin2dec() routine. In your
newbie-ness, you have reinvented the int() built-in. Adding the code (to
replace your implementation of bin2dec):

def bin2dec(bin_number):
return int(bin_number,2)

cuts out 75% of your encoding time.

3. Your decoding method has 2 hot spots. One, predictably, is in the
conversion from decimal to binary. Unfortunately, there *is* no builtin for
this (did I read somewhere that this would be added in 2.4?), so instead, I
resorted to the time-honored lookup table approach. You only ever call
dec2bin with numbers from 0 to 255. So I followed your slow implementation
of dec2bin with the following code:

# use slow dec2bin to build lookup dict
binmap = {}
for i in range(256):
binmap[i] = dec2bin(i)
# now redefine dec2bin using lookup dict
def dec2bin(dec_number):
return binmap[dec_number]

This creates a fast lookup table using 256 calls to your slow routine, then
defines a new dec2bin method that just returns the value from the lookup
table. (In Python version 2.4, you could use a memoize decorator to
accomplish this with a single @memoize decorator
statement/declaration/well-what-is-this-thing-anyway? just before your
original dec2bin method - this is a very cool technique worth checking out -
see http://www.python.org/moin/PythonDecoratorLibrary.) This change crushes
out almost *all* the decimal-to-binary time (which in my tests was about 58%
of the decoding time)

Your second hot spot in decoding is the while loop in which you iterate over
the encoded string looking for successively longer keys. With a few minor
changes, I cut out about 48% of the processing time in your loop (or about
19% of the total decoding time).
Orig:
while not end_idx > len(bit_string):
if table.has_key(bit_string[start_idx:end_idx]):
result.append(table[bit_string[start_idx:end_idx]])
start_idx = end_idx
end_idx += 1

if bit_string[start_idx:end_idx] == '': break

Modified:
bit_string_len = len(bit_string)
while not end_idx > bit_string_len:
curkey = bit_string[start_idx:end_idx]
if curkey in table:
result.append(table[curkey])
start_idx = end_idx
end_idx += 1

bit_string does not change length during this loop, so there is no reason to
evaluate len(bit_string) each time. This alone cuts another 5% from the
original decoding time. Then, I removed the curious 'if bitstring[...'
test, which looks like some kind of vestigial debugging code - I'm not sure
I can see any conditions when it will evaluate to true. Removing this drops
another 8%. Finally, I saved the temporary value of the string slice
bit_string[start_idx:end_idx], in case it was a key found in the table
variable - no need to double-compute the slice if the key is in the table -
this carves off another 7%.

Just some general comments on performance:
1. Avoid function calls. These are major performance killers. The worst
part of your original dec2bin was that it called itself recursively.
2. try: except: overhead. Setting up the try/except frames can be
expensive. One of my early attempts at optimizing your while loop was to
use:
try:
result.append(table[bit_string[start_idx:end_idx]])
except KeyError:
pass
else:
start_idx = end_idx
But this made things much slower!
3. Loop invariants and dotted refs. If you have a while-loop, remember that
the while condition is reevaluated every iteration. If you are working your
way through a long string, don't keep calling len(reallyLongString)! Also,
once you start to work more with classes and objects, you'll find that
repeated refs to object attributes within a while loop can be improved by
copying to a local variable. That is,
while countingSomethings():
if numberOfSomethings > self.maxSomethings:
reactAccordingly()
can be performance-improved with:
maxSomethingLimit = self.maxSomethings
while countingSomethings():
if numberOfSomethings > maxSomethingLimit:
reactAccordingly()

Lastly, performance enhancements can be in direct conflict with
maintainability and generally accepted practices - so use them sparingly!!!
and comment liberally!!!

-- Paul

Jul 18 '05 #3

Guyon Morée

Wow Paul!

thanks a lot for your comments! I learned a lot already only by reading
them, I will implement them and see what the speed gains are.

Keep an eye on my blog, I will post an update on it soon!
thanks,
guyon

ps. love this group
"Paul McGuire" <pt***@austin.rr._bogus_.com> wrote in message

http://gumuz.looze.net/wordpress/ind...fman-encoding/
Great first step at some self-learning with a non-trivial test program.
Here are some performance speedups that will help you out.

Jul 18 '05 #4

Similar topics

Python speed-up

by: Guyon Morée | last post by:

Hi all, I am working on a Huffman encoding exercise, but it is kinda slow. This is not a big problem, I do this to educate myself :) So I started profiling the code and the slowdown was...

Python

Using Huffman Compression

by: dirgesh | last post by:

I am having a hard time making a Program in C/C++ that uses the Huffman Compression to compress a file. I have a file "Hello World" That i need to compress. can someone please give me an...

C / C++

Huffman Compression

by: james | last post by:

I have ran into a little snag on an old database application I am converting. Out of 63 tables(all seperate files) , two of them are compressed using Huffman Compression. I have been searching all...

Visual Basic .NET

huffman encoder

by: aarklon | last post by:

Hi all, this is the program which I saw in my colleagues book. the program was run on turbo C++ 3.0 compiler /*beginning of header file huff.h*/ #ifndef _HUFF_H_ #define _HUFF_H_

C / C++

Huffman Encoding / Decoding

by: KWSW | last post by:

The other thread got too long so decided to start a new one. Got my huffman tree working and am now moving onto encoding and decoding my given text file. My idea now is to print out all the leaf...

Java

pure python data compression (zip)

by: Rene Maurer | last post by:

Hallo I wonder if there are any pure python implementations available/known for the zip (or any other) data compression... As far as I know python's zlib uses http://www.zlib.net/, which is...

Python

Huffman Code Decoding Problem

by: recordlovelife | last post by:

So i have written a code to encode a string of a select amount of letters into huffman code. #include <iostream> #include <string> using namespace std; string Huffman(char letter);...

C / C++

Huffman Encoding.

by: Esqulax | last post by:

Hiya guys. im after a bit of guidance to be honest My task: Create a program that generates huffman codewords for a 16 symbol alphabet, the input should be the probabilities of each symbol...

Java

Huffman encoding in C

by: coucchy | last post by:

Hello everyone, i'm writing a C program that will read a text file and then encode the characters using huffman algorithm. So far i've read the files, built a tree, and now I'm building the strings...

C / C++

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Merging data from multiple Excel files

by: ryjfgjl | last post by:

In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...

Data Management

Migrating Website to Cloud - Emmanuel Katto

by: emmanuelkatto | last post by:

Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel

General

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

General

AI Job Threat for Devs

by: agi2029 | last post by:

Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

Career Advice