By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
434,797 Members | 1,262 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 434,797 IT Pros & Developers. It's quick & easy.

Python Huffman encoding

P: n/a
dot
Hi all, I have written a Python huffman Encoding Module, for my own
amusement. I thought it might be educational/entertaining for other
people, so I've put it on my website and wrote about it a little.

http://gumuz.looze.net/wordpress/ind...fman-encoding/

Your comments are highly appreciated!

cheers,

guyon
Jul 18 '05 #1
Share this Question
Share on Google+
3 Replies


P: n/a
"@ dot wrote:
Hi all, I have written a Python huffman Encoding Module, for my own
amusement. I thought it might be educational/entertaining for other
people, so I've put it on my website and wrote about it a little.

http://gumuz.looze.net/wordpress/ind...fman-encoding/
Your comments are highly appreciated!


Looks cool. Now if only I had seen this before I implemented my own
huffman decoding for JPEG ... I wonder if your module could be used
instead ...
Jul 18 '05 #2

P: n/a
"dot" <""gumuz(\"@)looze(dot)net"> wrote in message
news:41***********************@news.wanadoo.nl...
Hi all, I have written a Python huffman Encoding Module, for my own
amusement. I thought it might be educational/entertaining for other
people, so I've put it on my website and wrote about it a little.

http://gumuz.looze.net/wordpress/ind...fman-encoding/
Your comments are highly appreciated!

cheers,

guyon


Guyon -

Great first step at some self-learning with a non-trivial test program.
Here are some performance speedups that will help you out.

1. Measure, measure, measure!!! Don't just guess where the performance
spots may lie, use time, timeit, or hotspot profiler to help you find your
performance problems. I used a very crude form of profiling, in the same
vein as using printf for debugging. I created a method called elapsed(),
and then just dropped in elapsed("A"), elapsed("B"), etc. commands to help
me identify where the time is going. (Calling elapsed() with no label
resets the timer.)

elapsedLast = 0
def elapsed(label=''):
global elapsedLast
cur = time.time()
if label:
print "%s %.2f sec" % (label,cur-elapsedLast)
elapsedLast = cur
2. Your main encoding hotspot is in the bin2dec() routine. In your
newbie-ness, you have reinvented the int() built-in. Adding the code (to
replace your implementation of bin2dec):

def bin2dec(bin_number):
return int(bin_number,2)

cuts out 75% of your encoding time.

3. Your decoding method has 2 hot spots. One, predictably, is in the
conversion from decimal to binary. Unfortunately, there *is* no builtin for
this (did I read somewhere that this would be added in 2.4?), so instead, I
resorted to the time-honored lookup table approach. You only ever call
dec2bin with numbers from 0 to 255. So I followed your slow implementation
of dec2bin with the following code:

# use slow dec2bin to build lookup dict
binmap = {}
for i in range(256):
binmap[i] = dec2bin(i)
# now redefine dec2bin using lookup dict
def dec2bin(dec_number):
return binmap[dec_number]

This creates a fast lookup table using 256 calls to your slow routine, then
defines a new dec2bin method that just returns the value from the lookup
table. (In Python version 2.4, you could use a memoize decorator to
accomplish this with a single @memoize decorator
statement/declaration/well-what-is-this-thing-anyway? just before your
original dec2bin method - this is a very cool technique worth checking out -
see http://www.python.org/moin/PythonDecoratorLibrary.) This change crushes
out almost *all* the decimal-to-binary time (which in my tests was about 58%
of the decoding time)

Your second hot spot in decoding is the while loop in which you iterate over
the encoded string looking for successively longer keys. With a few minor
changes, I cut out about 48% of the processing time in your loop (or about
19% of the total decoding time).
Orig:
while not end_idx > len(bit_string):
if table.has_key(bit_string[start_idx:end_idx]):
result.append(table[bit_string[start_idx:end_idx]])
start_idx = end_idx
end_idx += 1

if bit_string[start_idx:end_idx] == '': break

Modified:
bit_string_len = len(bit_string)
while not end_idx > bit_string_len:
curkey = bit_string[start_idx:end_idx]
if curkey in table:
result.append(table[curkey])
start_idx = end_idx
end_idx += 1

bit_string does not change length during this loop, so there is no reason to
evaluate len(bit_string) each time. This alone cuts another 5% from the
original decoding time. Then, I removed the curious 'if bitstring[...'
test, which looks like some kind of vestigial debugging code - I'm not sure
I can see any conditions when it will evaluate to true. Removing this drops
another 8%. Finally, I saved the temporary value of the string slice
bit_string[start_idx:end_idx], in case it was a key found in the table
variable - no need to double-compute the slice if the key is in the table -
this carves off another 7%.

Just some general comments on performance:
1. Avoid function calls. These are major performance killers. The worst
part of your original dec2bin was that it called itself recursively.
2. try: except: overhead. Setting up the try/except frames can be
expensive. One of my early attempts at optimizing your while loop was to
use:
try:
result.append(table[bit_string[start_idx:end_idx]])
except KeyError:
pass
else:
start_idx = end_idx
But this made things much slower!
3. Loop invariants and dotted refs. If you have a while-loop, remember that
the while condition is reevaluated every iteration. If you are working your
way through a long string, don't keep calling len(reallyLongString)! Also,
once you start to work more with classes and objects, you'll find that
repeated refs to object attributes within a while loop can be improved by
copying to a local variable. That is,
while countingSomethings():
if numberOfSomethings > self.maxSomethings:
reactAccordingly()
can be performance-improved with:
maxSomethingLimit = self.maxSomethings
while countingSomethings():
if numberOfSomethings > maxSomethingLimit:
reactAccordingly()

Lastly, performance enhancements can be in direct conflict with
maintainability and generally accepted practices - so use them sparingly!!!
and comment liberally!!!

-- Paul

Jul 18 '05 #3

P: n/a
Wow Paul!

thanks a lot for your comments! I learned a lot already only by reading
them, I will implement them and see what the speed gains are.

Keep an eye on my blog, I will post an update on it soon!
thanks,
guyon

ps. love this group
"Paul McGuire" <pt***@austin.rr._bogus_.com> wrote in message
http://gumuz.looze.net/wordpress/ind...fman-encoding/
Great first step at some self-learning with a non-trivial test program.
Here are some performance speedups that will help you out.

Jul 18 '05 #4

This discussion thread is closed

Replies have been disabled for this discussion.