473,698 Members | 2,261 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Huffman Encoding / Decoding

72 New Member
The other thread got too long so decided to start a new one.
Got my huffman tree working and am now moving onto encoding and decoding my given text file.

My idea now is to print out all the leaf nodes into a string and split them up using tokens to get my ascii value and the huffman code and store into a collection for reference when encoding.

Just got some questions on this:

1. Is there anyway to look through the huffman tree for the ascii value since they are all now stuck in one node?

2. I was thinking of putting the codes into a map with the huffman code as the key and the ascii value as the map value since it will be sorted by the huffman code which is also determined by the frequency of the ascii value appearing. Thus less time is need to traverse the whole map. Would this be efficient or is there a better way?

Thanks :)
Aug 5 '07 #1
11 6783
JosAH
11,448 Recognized Expert MVP
The other thread got too long so decided to start a new one.
Got my huffman tree working and am now moving onto encoding and decoding my given text file.

My idea now is to print out all the leaf nodes into a string and split them up using tokens to get my ascii value and the huffman code and store into a collection for reference when encoding.

Just got some questions on this:

1. Is there anyway to look through the huffman tree for the ascii value since they are all now stuck in one node?

2. I was thinking of putting the codes into a map with the huffman code as the key and the ascii value as the map value since it will be sorted by the huffman code which is also determined by the frequency of the ascii value appearing. Thus less time is need to traverse the whole map. Would this be efficient or is there a better way?

Thanks :)
I don't know why you're talking about ASCII values. The leaf nodes represent a
character (*any* Unicode character) which is associated with a Huffman code.

A Huffman code for a character simply is a sequence of bits. Instead of printing
the character to the output, you print that bit sequence. That's the essence of
Huffman encoding; it's all about data compression.

You need a mapping from those Unicode characters to the bit sequences for
the compression or encoding part.

When you decompress such a file you should read bits from a bit sequence and
check whether or not you still have that bit sequence associated with a Unicode
character.

For a zero bit read, you take the route of a left child; go right when a 1 bit is
read. The moment you reach a leaf node you can emit the associated character.

kind regards,

Jos
Aug 5 '07 #2
KWSW
72 New Member
About the ascii part, the assignment just requires the encoding of a text file thats why I use ascii value. Also the requirement is to print the huffman code as string instead of bits(not too sure why also).

Still not too sure about what you mean by
For a zero bit read, you take the route of a left child; go right when a 1 bit is
read. The moment you reach a leaf node you can emit the associated character.
For now I just created a huffmancode object with the ascii value and its huffman code representation and put them all into an array sorrting with the shortest code first. Thus for compression I just search the ascii value and get the huffman code for it.

Was thinking of doing it the same way but I guess its gonna be very inefficient...
Aug 5 '07 #3
JosAH
11,448 Recognized Expert MVP
About the ascii part, the assignment just requires the encoding of a text file thats why I use ascii value. Also the requirement is to print the huffman code as string instead of bits(not too sure why also).

Still not too sure about what you mean by

For a zero bit read, you take the route of a left child; go right when a 1 bit is
read. The moment you reach a leaf node you can emit the associated character.
For now I just created a huffmancode object with the ascii value and its huffman code representation and put them all into an array sorrting with the shortest code first. Thus for compression I just search the ascii value and get the huffman code for it.
Was thinking of doing it the same way but I guess its gonna be very inefficient...
Suppose you have the following little Huffman tree:
Expand|Select|Wrap|Line Numbers
  1.       .
  2.      / \
  3.     /\  C
  4.    A  B
  5.  
So the encoding is:

A: 00
B: 01
C: 1

Suppose you're reading the folling bits 010011, start at the root of the tree:

0: go left; 1: go right; you're at a leaf, a B was read. Start at the root again;
0: go left; 0: go right; you're at a leaf, an A was read. Start at the root again;
1: go right; you're at a leaf, a C was read. Start at the root again;
1: go right; you're at a lead, a C was read. Start at the root again;
the end of the bit stream was read so you have read BACC

You have to keep that tree somewhere if you want to decode efficiently.
The encoding part can be done with a Map<Character, String> in your case.

kind regards,

Jos
Aug 5 '07 #4
KWSW
72 New Member
Hi josAH,

I happen to find ur set of huffman and would like to know it means by your add bit function.

Expand|Select|Wrap|Line Numbers
  1. protected void add(int bit) 
  2.         code&= (1<<len)-1;
  3.         code<<= 1; 
  4.         code|=  bit;
  5.         len++;
  6. }
I understand len++ is to add one to the code length but do not understand the rest. It seems my method of just adding "0" to the left and "1" to the right doesn't give me prefix codes....

edit:

ok i found out these out

&= "bitwise AND and assign"
<<= "shift bits left and assign"
|= "bitwise OR and assign"
<< "shift bits left"
Aug 5 '07 #5
JosAH
11,448 Recognized Expert MVP
Hi josAH,

I happen to find ur set of huffman and would like to know it means by your add bit function.

Expand|Select|Wrap|Line Numbers
  1. protected void add(int bit) 
  2.         code&= (1<<len)-1;
  3.         code<<= 1; 
  4.         code|=  bit;
  5.         len++;
  6. }
Where did you find that code? Not to worry though, I don't claim any copyright.

kind regards,

Jos
Aug 5 '07 #6
KWSW
72 New Member
Where did you find that code? Not to worry though, I don't claim any copyright.

kind regards,

Jos
Found it over at the java forums and the user name was the same as urs...

anyway I decided to use the method that prints the bit into string for me to compare and this is what i got


Expand|Select|Wrap|Line Numbers
  1. c 5 00000 0
  2. b 6 010000 10000
  3. w 6 110000 110000
  4. r 4 1000 1000
  5. e 3 100 100
  6. u 5 00010 10
  7. l 5 10010 10010
  8. i 4 1010 1010
  9. s 4 0110 110
  10. S 9 000001110 1110
  11. T 10 0100001110 100001110
  12. I 10 1100001110 1100001110
  13.  
  14.  8 10001110 10001110
  15. k 9 001001110 1001110
  16. j 9 101001110 101001110
  17. G 10 0011001110 11001110
  18. W 10 1011001110 1011001110
  19. : 10 0111001110 111001110
  20. F 10 1111001110 1111001110
  21. , 6 101110 101110
  22. d 5 11110 11110
  23. a 4 0001 1
  24. n 4 1001 1001
  25. o 4 0101 101
  26. g 6 001101 1101
  27. p 6 101101 101101
  28. m 6 011101 11101
  29. M 11 00000111101 111101
  30. z 11 10000111101 10000111101
  31. V 12 001000111101 1000111101
  32. O 12 101000111101 101000111101
  33. ( 13 0011000111101 11000111101
  34. ) 13 1011000111101 1011000111101
  35. ' 13 0111000111101 111000111101
  36. - 13 1111000111101 1111000111101
  37. N 10 0100111101 100111101
  38. ; 10 1100111101 1100111101
  39. x 10 0010111101 10111101
  40. K 13 0001010111101 1010111101
  41. [ 13 1001010111101 1001010111101
  42. ] 13 0101010111101 101010111101
  43. E 13 1101010111101 1101010111101
  44. D 11 11010111101 11010111101
  45. H 9 110111101 110111101
  46. v 7 1111101 1111101
  47. t 4 0011 11
  48. . 8 00001011 1011
  49. A 11 00010001011 10001011
  50. P 11 10010001011 10010001011
  51. q 11 01010001011 1010001011
  52. B 11 11010001011 11010001011
  53. C 10 0110001011 110001011
  54. U 12 001110001011 1110001011
  55. R 12 101110001011 101110001011
  56. J 12 011110001011 11110001011
  57. L 13 0111110001011 111110001011
  58. Y 13 1111110001011 1111110001011
  59. y 7 1001011 1001011
  60. f 6 101011 101011
  61. h 5 11011 11011
  62.   3 111 111
where the first is the character, the 2nd is my huffmancode in string and the 3rd part is the bits after using integer.toBinar yString(int).

I could just modify my code by storing the huffman code in bits and using the integer.toBinar yString(int) to write to file for my assignment requirements but than I will not be learning anything...
Aug 5 '07 #7
JosAH
11,448 Recognized Expert MVP
I could just modify my code by storing the huffman code in bits and using the integer.toBinar yString(int) to write to file for my assignment requirements but than I will not be learning anything...
It seems to look alright to me: the most frequently used letters are associated
with the shortest Huffman codes. As you can see, if the leftmost bit(s) of the
Huffman codes are 0, you can't tell the length of the code when you'd store it
as a simple int. You have to keep track of the bitlength of the code as I did in
my version.

Reading and writing bits from/to a stream is an exercise by itself.

kind regards,

Jos
Aug 5 '07 #8
KWSW
72 New Member
It seems to look alright to me: the most frequently used letters are associated
with the shortest Huffman codes. As you can see, if the leftmost bit(s) of the
Huffman codes are 0, you can't tell the length of the code when you'd store it
as a simple int. You have to keep track of the bitlength of the code as I did in
my version.

Reading and writing bits from/to a stream is an exercise by itself.

kind regards,

Jos

So this part has something to do with writing bits to the stream?

Expand|Select|Wrap|Line Numbers
  1. protected void add(int bit) 
  2. {                                                                 
  3.    code &= (1<<len)-1;
  4.    code <<= 1;
  5.    code |=  bit;
  6.    len++;
  7. }
Aug 5 '07 #9
JosAH
11,448 Recognized Expert MVP
Just still clueless on this part:

Expand|Select|Wrap|Line Numbers
  1. protected void add(int bit) 
  2. {                                                                 
  3.    code &= (1<<len)-1;
  4.    code <<= 1;
  5.    code |=  bit;
  6.    len++;
  7. }
It's just bit fiddling (I'm good at that ;-) 'code' is the bit sequence so far and 'len'
is the length of the code so far. The parameter 'bit' is either one or zero. This
little method shifts in that bit on the right side of the code and the length is
incremented. The first line 'code&= (1<<len)-1' isn't actually needed but it makes
sure that only 'len' bits in the code are used before the new bit is shifted in.
Check your Java manuals for those bit manipulating operators.

I could've written 'code= code*2+bit' but I think my mind was in a bit fiddling mood
at that time ;-)

kind regards,

Jos
Aug 5 '07 #10

Sign in to post your reply or Sign up for a free account.

Similar topics

3
5388
by: dot | last post by:
Hi all, I have written a Python huffman Encoding Module, for my own amusement. I thought it might be educational/entertaining for other people, so I've put it on my website and wrote about it a little. http://gumuz.looze.net/wordpress/index.php/archives/2004/11/25/huffman-encoding/ Your comments are highly appreciated! cheers,
9
23705
by: Mark | last post by:
I've run a few simple tests looking at how query string encoding/decoding gets handled in asp.net, and it seems like the situation is even messier than it was in asp... Can't say I think much of the "improvements", but maybe someone here can point me in the right direction... First, it looks like asp.net will automatically read and recognize query strings encoded in utf8 and 16-bit unicode, only the latter is some mutant, non-standard...
15
3686
by: aarklon | last post by:
Hi all, this is the program which I saw in my colleagues book. the program was run on turbo C++ 3.0 compiler /*beginning of header file huff.h*/ #ifndef _HUFF_H_ #define _HUFF_H_
37
3371
by: Zhiv Kurilka | last post by:
Hi, I have a text file with following content: "((^)|(.* +))§§§§§§§§" if I read it with: k=System.IO.StreamReader( "file.txt",System.Text.Encoding.ASCII); k.readtotheend()
5
8054
by: recordlovelife | last post by:
So i have written a code to encode a string of a select amount of letters into huffman code. #include <iostream> #include <string> using namespace std; string Huffman(char letter); string Huffman_word(string word);
0
1820
by: Michele | last post by:
Hi there, I'm using a python script in conjunction with a JPype, to run java classes. So, here's the code: from jpype import * import os import random import math import sys
0
5384
by: coucchy | last post by:
Hello everyone, i'm writing a C program that will read a text file and then encode the characters using huffman algorithm. So far i've read the files, built a tree, and now I'm building the strings to be used in encoding, Where i'm having problem is my BuildStrings function, instead of printing the 0s and 1s I appended to the array it outputs a series of 0s and 1s. Below is the BuildStrings function and the main function. /*This funtion...
3
4290
by: mviuk | last post by:
Hi, I'm looking for a system which detaches the decoding process from the encoding process. That is, I would like a system for encoding data, but even if both the encoded data and the encoding process is known I would like it to be possible to decode the data with a seperate algorithm. I realise I didn't explain that very well so I'll say what I want to use it for. On a website I want to store details and for them to be encoded, since this...
0
9169
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
9030
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
8899
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
8871
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
7738
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
6528
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
5861
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
1
3052
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
3
2007
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.