473,388 Members | 1,322 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,388 software developers and data experts.

On to huffman...

72
Thanks to the help over at http://www.thescripts.com/forum/thread686347.html on the character count program, I can now move onto getting the huffman code done. Unfortunately I am still rather new to Java and would again need help on getting started.

Not gonna ask for the whole code here cos its against the rules and its not gonna help me learn the language. I know that if I use C++, I can use the heap container which is present in the STL but for java is there something like that?

Went to try to google "java heap container" but all I got was some out of memory error...

Will still carry on searching and I hope someone will point me in the right direction.

Thanks In Advance :)
Aug 4 '07 #1
17 2377
JosAH
11,448 Expert 8TB
Thanks to the help over at http://www.thescripts.com/forum/thread686347.html on the character count program, I can now move onto getting the huffman code done. Unfortunately I am still rather new to Java and would again need help on getting started.

Not gonna ask for the whole code here cos its against the rules and its not gonna help me learn the language. I know that if I use C++, I can use the heap container which is present in the STL but for java is there something like that?

Went to try to google "java heap container" but all I got was some out of memory error...

Will still carry on searching and I hope someone will point me in the right direction.

Thanks In Advance :)
If you think of a Huffman tree there are two types of nodes: simple character
nodes; they are the leaves of the tree, and 'internal' nodes which are the combined
nodes of the Huffman tree.

Both nodes can return their frequency: character (leaf) nodes simply return the
frequence of the character they represent and the combined nodes return the
sum of the frequencies of their child nodes.

A node X is smaller than a node Y if X's freqency is smaller than Y's frequency.
This makes those nodes ideal to store them in a SortedSet (TreeSet).
The sorted set is the equivalent of your heap structure.

Building a Huffman tree out of such a SortedSet is breeze: simply combine
the two nodes with the lowest frequency, remove them from the set and add the
newly created combined node to the set again. Keep going until there's only
one node left in the set. That'll be the root of the Huffman tree.

An abstract node class and two extending classes (character node and a
combined node) are all that is needed.

kind regards,

Jos
Aug 4 '07 #2
KWSW
72
If you think of a Huffman tree there are two types of nodes: simple character
nodes; they are the leaves of the tree, and 'internal' nodes which are the combined
nodes of the Huffman tree.

Both nodes can return their frequency: character (leaf) nodes simply return the
frequence of the character they represent and the combined nodes return the
sum of the frequencies of their child nodes.

A node X is smaller than a node Y if X's freqency is smaller than Y's frequency.
This makes those nodes ideal to store them in a SortedSet (TreeSet).
The sorted set is the equivalent of your heap structure.

Building a Huffman tree out of such a SortedSet is breeze: simply combine
the two nodes with the lowest frequency, remove them from the set and add the
newly created combined node to the set again. Keep going until there's only
one node left in the set. That'll be the root of the Huffman tree.

An abstract node class and two extending classes (character node and a
combined node) are all that is needed.

kind regards,

Jos
Thanks for the explaination... will go and try it out... think I have a rough idea how now...

Many thanks again :)
Aug 4 '07 #3
KWSW
72
question about compareTo()

so in this case, I would then need to use that to compare two nodes for sorting for the sortedSet.

would it look something like this?

Expand|Select|Wrap|Line Numbers
  1. public int commpareTo(Object obj)
  2.     {        
  3.         node aNode = (node)obj;
  4.         return this.getFeq() - aNode.getFeq();
  5.     }
sorry still in the planning stage thats why I dun have the full code yet...
Aug 4 '07 #4
JosAH
11,448 Expert 8TB
question about compareTo()

so in this case, I would then need to use that to compare two nodes for sorting for the sortedSet.

would it look something like this?

Expand|Select|Wrap|Line Numbers
  1. public int commpareTo(Object obj)
  2.     {        
  3.         node aNode = (node)obj;
  4.         return this.getFeq() - aNode.getFeq();
  5.     }
sorry still in the planning stage thats why I dun have the full code yet...
No need to apologize; that compareTo method is about right but beware that
you never want the return value of that method to be zero. You can get around
that by comparing the huffman code for the nodes built so far when the
frequencies compare equal. You'll bump into that later; keep my remark in mind ;-)

kind regards,

Jos
Aug 4 '07 #5
KWSW
72
No need to apologize; that compareTo method is about right but beware that
you never want the return value of that method to be zero. You can get around
that by comparing the huffman code for the nodes built so far when the
frequencies compare equal. You'll bump into that later; keep my remark in mind ;-)

kind regards,

Jos

yup just ran into that problem. WIll try out the comparing of the huffman code length.

Hmm... any advise on how I can set the code of each node?
Aug 4 '07 #6
JosAH
11,448 Expert 8TB
yup just ran into that problem. WIll try out the comparing of the huffman code length.

Hmm... any advise on how I can set the code of each node?
Comparing on the code length won't help: two nodes can have the same frequency
and the same code length; you have to compare on the code itself or some
other artificial criteria; as long as no two nodes compare equal in a consistent way.

The actual comparison doesn't matter much; when the nodes are taken out
of the set to be combined, a new node is entered to the set afterwards and
the two individual nodes never will occur in that set anymore by themselves.

When you combine two nodes, prepend a 0 to the code of the left node and
prepend a 1 to the code of the right node. You can append too, it depends on
how you build up those codes and in what order you want to read the bits.

In both cases you have to keep track of the length of the code (which equals
the depth of the the node in the tree).

kind regards,

Jos
Aug 4 '07 #7
KWSW
72
hmm I keep getting missing nodes... is it something to do with the compareTo() not being able to tell which is bigger?

Expand|Select|Wrap|Line Numbers
  1. public int compareTo(Object obj)
  2.     {        
  3.         Node aNode = (Node)obj;        
  4.  
  5.         if(this.getFeq() == aNode.getFeq())
  6.         {
  7.             return this.codeLen - aNode.codeLen;
  8.         }
  9.         else
  10.         {
  11.             return this.feq - aNode.feq;
  12.         }
  13.  
  14.     }
my huffman algo
Expand|Select|Wrap|Line Numbers
  1. while (t.size() > 1)
  2.         {
  3.             Node left = (Node)t.first();
  4.             t.remove(left);
  5.             Node right = (Node)t.first();
  6.             t.remove(right);
  7.  
  8.             t.add(new Branch(left,right));
  9.  
  10.             count++;
  11.         }
Aug 4 '07 #8
JosAH
11,448 Expert 8TB
hmm I keep getting missing nodes... is it something to do with the compareTo() not being able to tell which is bigger?

Expand|Select|Wrap|Line Numbers
  1.     return this.codeLen - aNode.codeLen;
  2.  
As I wrote above: don't compare the code lenghts; compare the actual codes
themselves. If the frequencies equal it is very well possible that the code lengths
compare equal as well; don't compare those lengths.

kind regards,

Jos
Aug 4 '07 #9
KWSW
72
Ok I have run out of ideas..
Here is my code

Expand|Select|Wrap|Line Numbers
  1. import java.util.*;
  2. import java.io.*;
  3.  
  4. abstract class Node implements Comparable
  5. {        
  6.     // For CompareTo() to use
  7.     protected int x = 0;
  8.  
  9.  
  10.     // The code to represent the symbol
  11.     protected String code = "";
  12.  
  13.     // The fequency of the symbol that this node represents
  14.     protected int feq;    
  15.  
  16.     // Return fequency
  17.     public int getFeq()
  18.     {
  19.         return feq;
  20.     }
  21.  
  22.     public void increaseFeq()
  23.     {
  24.         feq++;
  25.     }                  
  26.  
  27.     // Constructor that takes in the fequency
  28.     public Node(int feq, int x)
  29.     {
  30.         this.feq = feq;
  31.         this.x = x;
  32.     }
  33.  
  34.     abstract protected void addCode(int b);
  35.  
  36.  
  37.     public int compareTo(Object obj)
  38.     {        
  39.         Node aNode = (Node)obj;                
  40.  
  41.         if(this.getFeq() == aNode.getFeq())
  42.         {            
  43.             return (this.x+this.feq+this.code.length()) - (aNode.x+this.feq+this.code.length());
  44.         }
  45.         else
  46.         {
  47.             return this.feq - aNode.feq;
  48.         }
  49.  
  50.     }
  51.  
  52. }
  53.  
  54. class Leaf extends Node
  55. {
  56.     private int ascii;
  57.  
  58.     public void addCode(int i)
  59.     {
  60.        code += i;       
  61.     }
  62.  
  63.     public Leaf(int ascii, int feq)
  64.     {
  65.         super(feq,ascii);
  66.         this.ascii = ascii;        
  67.     }        
  68.  
  69.     public String toString() 
  70.     { 
  71.         return "\n["+ascii+":"+code+","+feq+","+code.length()+"]"; 
  72.     }
  73. }
  74.  
  75. class Branch extends Node
  76. {
  77.     private Node left;
  78.     private Node right;
  79.  
  80.     public Branch(Node left, Node right) 
  81.         {
  82.         super(left.getFeq()+right.getFeq(),left.getFeq()+right.getFeq());
  83.                 this.left = left;
  84.                 this.right = right;
  85.  
  86.                 left.addCode(0);        
  87.                 right.addCode(1);
  88.     }
  89.  
  90.     protected void addCode(int i) 
  91.         { 
  92.             left.addCode(i); 
  93.             right.addCode(i); 
  94.         }
  95.  
  96.     public String toString()
  97.         {
  98.             return "Total: "+ (left.getFeq()+right.getFeq())+ left + right;
  99.         }
  100. }
  101.  
  102. class Tree
  103. {
  104.     Map<Integer,Integer> m = new HashMap<Integer,Integer>();
  105.  
  106.     public void add(int i)
  107.     {
  108.         int count;
  109.  
  110.         if(m.containsKey(i))
  111.         {
  112.             count = m.get(i);
  113.             count++;
  114.             m.put(i,count);
  115.         }
  116.         else
  117.         {
  118.             m.put(i,1);
  119.         }
  120.     }
  121.  
  122.     public void create()
  123.     {
  124.         ArrayList al = new ArrayList();                
  125.  
  126.         for(int i : m.keySet()) 
  127.         {            
  128.             Leaf l = new Leaf(i,m.get(i));
  129.             al.add(l);            
  130.         }                                                
  131.  
  132.         TreeSet t = new TreeSet();        
  133.  
  134.         for(int i = 0; i < al.size(); i++)
  135.         {
  136.             t.add(al.get(i));            
  137.         }
  138.  
  139.         do
  140.         {
  141.             Node left = (Node)t.first(); 
  142.             t.remove(left);
  143.  
  144.             Node right = (Node)t.first(); 
  145.             t.remove(right);
  146.  
  147.             t.add(new Branch(left, right));
  148.  
  149.             System.out.println(t.size());
  150.  
  151.         }while(t.size() > 1);
  152.  
  153.         System.out.println(t.first());
  154.  
  155.     }
  156.  
  157. }
I tried adding everything I can think off to the compareTo()...

my input file just has "123456789"
but my output is just 3 nodes.... please help... been at it for about 4 hours...
TIA
Aug 4 '07 #10
JosAH
11,448 Expert 8TB
Your class hierarchy is (almost?) identical to mine, so I guess that's quite alright
(he said arrogantly ;-) but your compareTo() method still is incorrect, i.e. you're
still trying to compare something using the *length* of the code. That's not correct.
Here's my version (ripped straight from the code):

Expand|Select|Wrap|Line Numbers
  1.     public int compareTo(Object obj) {
  2.         HuffmanNode that= (HuffmanNode)obj;
  3.         if (this.freq == that.freq)
  4.             return this.code-that.code;
  5.  
  6.         return this.freq-that.freq;
  7.     }
  8.  
the 'code' member represents the actual Huffman code constructed so far.
The 'freq' member is (as you can guess) the frequency count of the node.

kind regards,

Jos
Aug 4 '07 #11
KWSW
72
Your class hierarchy is (almost?) identical to mine, so I guess that's quite alright
(he said arrogantly ;-) but your comparyTo() method still is incorrect, i.e. you're
still trying to compare something using the *length* of the code. That's not correct.
Here's my version (ripped straight from the code):

Expand|Select|Wrap|Line Numbers
  1.     public int compareTo(Object obj) {
  2.         HuffmanNode that= (HuffmanNode)obj;
  3.         if (this.freq == that.freq)
  4.             return this.code-that.code;
  5.  
  6.         return this.freq-that.freq;
  7.     }
  8.  
the 'code' member represents the actual Huffman code constructed so far.
The 'freq' member is (as you can guess) the frequency count of the node.

kind regards,

Jos
Much appreciated... will give it a try... :)
Aug 4 '07 #12
KWSW
72
Much appreciated... will give it a try... :)
hmm it seems that I cant do that as my code is a string type...
Aug 4 '07 #13
JosAH
11,448 Expert 8TB
Much appreciated... will give it a try... :)
Actually it's not much of a problem: it's just that the SortedSet can't store
non-unique keys: it can't deal with different nodes having just equal frequencies,
and are thus considered equal.

If the frequencies of two disjunct nodes differ, everything is fine. Just in case
those freqencies are equal you have to define, what they call, your own PTO
(Partial Total Ordering). *Any* ordering will do as long as it is consistent.

Being the lazy bastard that I am I simply picked the 'code' (which is always
unique) member variable but you can also assign a unique sequence number
to every node you create: 0, 1, 2, 3, ... and compare those:

Expand|Select|Wrap|Line Numbers
  1. public int compareTo(Object obj) {
  2.    HuffmanNode that= (HuffmanNode)obj;
  3.    if (this.freq == that.freq)
  4.       return this.seqno-that.seqno;
  5.    else
  6.       return this.freq-that.freq;
  7. }
  8.  
This little trick (as well as my lazy little trick) makes *every* node unique so the
SortedSet can store all of them, i.e. there are no two nodes X and Y for which
X == Y holds. They're either different because of their different frequencies, and
if they're not, they're still different because of some other criteria such as sequence
numbers of because of what I did: compare the unique Huffman codes so far.

kind regards,

Jos
Aug 4 '07 #14
JosAH
11,448 Expert 8TB
hmm it seems that I cant do that as my code is a string type...
Use your imagination:

Expand|Select|Wrap|Line Numbers
  1. ...
  2. return this.code.compareTo(that.code);
  3. ...
  4.  
assuming that you're updating your 'code' variable on the fly while you're constructing
that Huffman tree.

kind regards,

Jos
Aug 4 '07 #15
KWSW
72
thanks for all the help today... gonna give it one more shot than get some sleep can carry on tmr... :)
Aug 4 '07 #16
KWSW
72
thanks for all the help today... gonna give it one more shot than get some sleep can carry on tmr... :)
ok this is rather interesting but i added a static var(0) and another one x which is = the static var++.

somehow it works... was playing on the fact that every time a node is created the value of x would increase from the base value of the static var. than just compare the 2 node's value of x and somehow it works...

dunno how it works but thought i just share it here if anyone else is having the same problem as me. would be nice if someone explains it to me as well...

now to figure out how to encode but its time to sleep... nights and many thanks again to JosAH :)
Aug 4 '07 #17
JosAH
11,448 Expert 8TB
ok this is rather interesting but i added a static var(0) and another one x which is = the static var++.

somehow it works... was playing on the fact that every time a node is created the value of x would increase from the base value of the static var. than just compare the 2 node's value of x and somehow it works...

dunno how it works but thought i just share it here if anyone else is having the same problem as me. would be nice if someone explains it to me as well...

now to figure out how to encode but its time to sleep... nights and many thanks again to JosAH :)
You're welcome of course; as I wrote above: if the frequencies differ all is fine;
if they don't you need another (consistent) way to make those nodes different.
Here's a small example: A(1) B(2) B(3) B(4) C(5) C(6) D(7) E(8) etc.

The A, B, C, etc represent the frequencies; where they are equal that serial
number (your static counter) makes the difference for the nodes, so no two
nodes are considered equal and that is just what you need for that set.

kind regards,

Jos
Aug 4 '07 #18

Sign in to post your reply or Sign up for a free account.

Similar topics

3
by: dot | last post by:
Hi all, I have written a Python huffman Encoding Module, for my own amusement. I thought it might be educational/entertaining for other people, so I've put it on my website and wrote about it a...
8
by: dirgesh | last post by:
I am having a hard time making a Program in C/C++ that uses the Huffman Compression to compress a file. I have a file "Hello World" That i need to compress. can someone please give me an...
8
by: james | last post by:
I have ran into a little snag on an old database application I am converting. Out of 63 tables(all seperate files) , two of them are compressed using Huffman Compression. I have been searching all...
15
by: aarklon | last post by:
Hi all, this is the program which I saw in my colleagues book. the program was run on turbo C++ 3.0 compiler /*beginning of header file huff.h*/ #ifndef _HUFF_H_ #define _HUFF_H_
4
by: A_StClaire_ | last post by:
hi all, I'm trying to implement Huffman coding on letters of a string. I've written functions to define the initial leaf nodes (letters and their frequencies) and to sort them. I've also put...
3
by: Udhay | last post by:
Sir, I am udhay. I am a student and i am working on Audio compression for my exam. I want to know how does the compression take place in audio?? Can anyone suggest a link for the novice to...
11
by: KWSW | last post by:
The other thread got too long so decided to start a new one. Got my huffman tree working and am now moving onto encoding and decoding my given text file. My idea now is to print out all the leaf...
5
by: recordlovelife | last post by:
So i have written a code to encode a string of a select amount of letters into huffman code. #include <iostream> #include <string> using namespace std; string Huffman(char letter);...
1
by: Esqulax | last post by:
Hiya guys. im after a bit of guidance to be honest My task: Create a program that generates huffman codewords for a 16 symbol alphabet, the input should be the probabilities of each symbol...
0
by: coucchy | last post by:
Hello everyone, i'm writing a C program that will read a text file and then encode the characters using huffman algorithm. So far i've read the files, built a tree, and now I'm building the strings...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.