Bytes | Software Development & Data Engineering Community
CountWords Assn.

slapsh0t11
16 New Member
I would greatly appreciate it if any one of you kind souls could take some time to help me out with an interesting bug in my program. I have tried many times to find the source of the problem unsuccessfully and believe that a second set of eyes will do wonders. Thanks in advance for reading my post.

*********************************

So, after quite some work, I was able to get this program to run (I am a bit of a Java novice). However, the output is not what I am looking for. Here's the assignment for background info:

LAB ASSIGNMENT A19.3

CountWords

Background:

1. This lab assignment will count the occurrences of words in a text file. Here are some special cases that you must take into account:
- Hyphenated-words w/out space = 1 word
- Hyphenated - words w/ space = 2 words
- Apostrophes in words = 1 word



2. You are encouraged to use a combination of all the programming tools you have learned so far, such as:

Data Structures: Array classes, String class, ArrayList class
Algorithms: sorting, searches, text file processing

Assignment:

1. Your instructor will provide you with a data file (such as test.txt, Lincoln.txt, or dream.txt) to analyze. Parse the file and print out the following statistical results:

– Total number of unique words used in the file.
– Total number of words in a file.
– The top 30 words which occur the most frequently, sorted in descending order by count.

For example:

1 103 the
2 97 of
3 59 to
4 43 and
5 36 a

6 32 be
7 32 we
8 26 will
9 24 that
10 21 is

... rest of top 30 words ...

Number of words used = 525
Total # of words = 1577

Now, time for my code:

wordCounter.java:
import java.util.*;
import java.io.*;

public class wordCounter
{
    private String inFileName;
    private int i;
    private ArrayList <String> sortedWords = new ArrayList <String> ();
    private ArrayList <String> uniqueWords = new ArrayList <String> ();
    private ArrayList <Word> indivCount = new ArrayList <Word> ();

    public wordCounter(String fn)
    {
        inFileName = fn;
    }

    public void readData(ArrayList <String> fileWords)
    {
        Scanner in;
        try
        {
            in = new Scanner(new File(inFileName));
            int i = 0;
            while(in.hasNext())
            {
                fileWords.add(in.next().toLowerCase());
                i++;
            }
        }
        catch(IOException x)
        {
            System.out.println("Error: " + x.getMessage());
        }
    }

    public void sortList(ArrayList <String> a)
    {
        for(int position = 0; position < a.size(); position++)
        {
            String key = a.get(position);

            while(position > 0 && a.get(position - 1).compareTo(key) > 0)
            {
                a.set(position, a.get(position - 1));
                position--;
            }

            a.set(position, key);
        }
        sortedWords = a;
    }

    public int findUnique(ArrayList <String> fileWords)
    {
        uniqueWords = fileWords;

        while(i < uniqueWords.size() - 1)
        {
            if(uniqueWords.get(i).compareTo(uniqueWords.get(i+1)) == 0)
            {
                uniqueWords.remove(i+1);
            }
            else
            {
                i++;
            }
        }
        return uniqueWords.size();
    }

    public int returnWordTotal(ArrayList <String> a)
    {
        return a.size();
    }

    public void top30()
    {
        indivCount.add(new Word(sortedWords.get(0), 1));

        for(int x = 0; x < sortedWords.size() - 1; x++)
        {
            if(sortedWords.get(x).compareTo(sortedWords.get(x+1)) == 0)
            {
                int count = indivCount.get(x).getCount();
                indivCount.get(x).setCount(count++);
            }
            else
            {
                indivCount.add(new Word((sortedWords.get(x)), 1));
            }

            //indivCount.add(new Word(sortedWords.get(x).getWord(), (sortedWords.get(x).getCount() + 1)));
        }
    }

    public void Sort()
    {
        mergeSort(indivCount, 0, indivCount.size() - 1);
    }

    private void merge(ArrayList <Word> a, int first, int mid, int last)
    {
        //same as in QuadSortComparableProject
        //use a temporary array and then put back into original
        int i = first;
        int j = 1 + mid;
        ArrayList <Word> temp = new ArrayList <Word> ();

        while(i <= mid && j <= last)
        {
            if(a.get(i).compareTo(a.get(j)) < 0)
            {
                temp.add(a.get(i));
                i++;
            }
            else
            {
                temp.add(a.get(j));
                j++;
            }
        }

        if(i > mid)
        {
            for(int x = j; x <= last; x++)
            {
                temp.add(a.get(x));
            }
        }
        else if(j > last)
        {
            for(int y = i; y <= mid; y++)
            {
                temp.add(a.get(y));
            }
        }

        for(int q = 0; q < temp.size(); q++)
        {
            a.set(first + q, temp.get(q));
        }
    }

    public void mergeSort(ArrayList <Word> a, int first, int last)
    {
        //same as in QuadSortComparableProject
        if(first != last)
        {
            int mid = (first + last)/2;
            mergeSort(a, first, mid);
            mergeSort(a, mid + 1, last);
            merge(a, first, mid, last);
        }
    }


    public void displayWord()
    {
        System.out.printf("%8s", "Count");
        System.out.printf("%15s", "Word");
        System.out.println("");
        for(int i = 0; i < 30; i++)  //30 used instead of: indivCount.size()
        {
            System.out.print(i+1);
            System.out.printf("%8s", ((Word)indivCount.get(i)).getCount());
            System.out.printf("%14s", ((Word)indivCount.get(i)).getWord());
            System.out.println("");
            if((i+1)%5 == 0)
            {
                System.out.println("");
            }
        }
    }

}




Now, Word.java:

public class Word implements Comparable <Word>
{
    private String myWord;
    private int myCount; //word occurrences

    public Word(String word, int count)
    {
        myWord = word;
        myCount = count;
    }

    public int getCount()
    {
        return myCount;
    }

    public void setCount(int count)
    {
        myCount = count;
    }

    public String getWord()
    {
        return myWord;
    }

    public void setWord(String word)
    {
        myWord = word;
    }

    public int compareTo(Word other)
    {
        if(myCount > other.myCount)
        {
            return 1;
        }
        else if(myCount < other.myCount)
        {
            return -1;
        }
        else
        {
            return 0;
        }
    }

}


And finally, my tester file, wordCounterTester.java:

import java.util.ArrayList;


public class wordCounterTester
{
    private static ArrayList <String> fileWords = new ArrayList <String> ();

    public static void main(String[] args)
    {
        wordCounter myCounter = new wordCounter("dream.txt");
        myCounter.readData(fileWords);
        myCounter.sortList(fileWords);
        System.out.println("Total # of words in file: " + myCounter.returnWordTotal(fileWords));
        System.out.println("Total # of unique words in file: " + myCounter.findUnique(fileWords));
        myCounter.top30();
        myCounter.Sort();
        myCounter.displayWord();
    }

}




Here is a sample output for MLK Jr's "I have a dream" speech (dream.txt):

Total # of words in file: 1580
Total # of unique words in file: 587
Count Word
1 1 you
2 1 york.
3 1 york
4 1 years
5 1 wrote

6 1 wrongful
7 1 would
8 1 work
9 1 words
10 1 withering

11 1 with
12 1 winds
13 1 will
14 1 whose
15 1 who

16 1 white
17 1 whirlwinds
18 1 which
19 1 where
20 1 when

21 1 were
22 1 we
23 1 waters
24 1 was
25 1 warm

26 1 wallow
27 1 walk,
28 1 walk
29 1 vote.
30 1 vote




Obviously, the program doesn't print the top 30 recurring words. It seems to print the last 30 unique words in alphabetical order. This is NOT right!! I have traced through my code many times and see no reason that the output should be wrong. I want it to look like the sample output in the assn. at the top of this post. Attached is the txt file I used.

SO: if anyone can take the time to help me sort this out, I would much appreciate any guidance. All I need is another set of eyes to help me identify the problem.

THANKS IN ADVANCE.
Attached Files
File Type: txt dream.txt (8.7 KB, 414 views)
Nov 11 '09 #1
wizardry
201 New Member
We really should not help with homework assignments; please note this for future posts! However, look at what you're calling to print in wordCounterTester.java: you're calling the unique method...
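
To make that hint a little more concrete: in Java, assigning one ArrayList variable to another copies the reference, not the list. The following is only a minimal sketch, assuming that (as in the posted code) the duplicate-removal step works on the very same list that the counting step reads afterwards. The names AliasDemo and dedupeSorted are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AliasDemo {
    // Removes adjacent duplicates in place, in the style of findUnique in the post.
    static int dedupeSorted(List<String> words) {
        int i = 0;
        while (i < words.size() - 1) {
            if (words.get(i).equals(words.get(i + 1))) {
                words.remove(i + 1);
            } else {
                i++;
            }
        }
        return words.size();
    }

    public static void main(String[] args) {
        List<String> sorted = new ArrayList<>(
                Arrays.asList("a", "a", "of", "the", "the", "the"));

        List<String> alias = sorted;                  // same list under a second name
        List<String> copy = new ArrayList<>(sorted);  // independent copy

        dedupeSorted(alias);
        System.out.println(sorted); // [a, of, the] -> the duplicates are gone here too
        System.out.println(copy);   // [a, a, of, the, the, the]
    }
}
```

If that is what happens in the posted program, every word would reach top30 with only one occurrence left, which would be consistent with a count of 1 on every row of the output.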
Nov 11 '09 #2
slapsh0t11
16 New Member
NOTE: this is not a homework assignment for a grade. It is simply a problem my teacher has given my class as optional review. So, helping me figure out where I went wrong would allow me to further my currently limited knowledge of Java.

Again, if anyone is willing to help me find the error in my program that leads to this incorrect output, that would be much appreciated! Thanks again.

I have traced through my code numerous times, and it seems to me that the output should be correct. I believe a new set of eyes is all that is necessary to help me solve this OPTIONAL problem.
Nov 12 '09 #3
NeoPa
32,556 Recognized Expert Moderator MVP
With homework assignments it is actually ok to help when the member has posted what they have already tried. We are always very keen to help anybody learn. What we want to avoid is posting something where the teacher (or professor or whatever) feels that we have done the assignment (or part of it) for them.

Hints and general advice, for instance on debugging techniques that are usable in various circumstances, are not a problem. Care should be taken, of course, not simply to hand out answers.

Please check your PMs though slapsh0t, as PMing experts directly is certainly not so acceptable.
Nov 12 '09 #4
Frinavale
9,735 Recognized Expert Moderator Expert
I don't see how NeoPa's answer was the "best answer" considering that it has nothing to do with the original problem or question. So, I have reset the best answer.

@slapsh0t11

It's very hard to read through so much code to try and figure out where you're going wrong. It's a lot easier for experts and members to look at a specific line of code or function rather than making them look through an entire application. In the future, try to reduce the question's size. Locate what you think the source of the problem is, and only post code that is relevant.

That being said...
Have you considered using a HashTable to solve your problem?

I know that your assignment requirements list what you're allowed to use, but you could easily implement a quick HashTable class using classes listed in the assignment.

You could create a new Key/Value pair for each word that you find and store it in the HashTable...if the key already exists for the word then add 1 to the Value.

If you don't know what a HashTable is then you should look it up :)

It's fairly similar to what you have really but you wouldn't be using a "Word Class".
You'd just use a HashTable with the keys being the words in the file and the values being the count of the words in the file.

It would look something like (pseudo-code):
  • If theHashTable.Keys collection contains the word then:
    • Get the Value
    • Add One to the value
    • Store the value back in the hash table at the key (the word)
  • If theHashTable.Keys collection does not contain the word then:
    • Add a new key/value pair to the HashTable:
      • the key being the word, the value being "1" (because there has only been one found so far).
  • Get the next word and Loop
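
The pseudo-code above could be sketched in Java roughly as follows. This is only an illustration: it swaps in java.util.HashMap for the older Hashtable class (either would do here), it assumes the words have already been read and lower-cased, and the names WordTally and countWords are made up.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WordTally {
    // Builds a word -> count table from words that are already lower-cased.
    static Map<String, Integer> countWords(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            if (counts.containsKey(word)) {
                // Key exists: get the value, add one, store it back.
                counts.put(word, counts.get(word) + 1);
            } else {
                // First time this word is seen: the count starts at 1.
                counts.put(word, 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "dream", "of", "the", "dream", "the");
        Map<String, Integer> counts = countWords(words);
        System.out.println(counts.get("the")); // 3
        System.out.println(counts.size());     // 3 different words
    }
}
```

With a table like this, counts.size() is the number of different words in the file, and the top 30 could be found by sorting the entries by value in descending order.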

Now when you want to find out which words are unique you'd just loop through the hash table keys and check the value for each key...if the value is "1" then you know that it's unique.

If you don't want to use a HashTable then you should at least be using an ArrayList of Word objects (as opposed to an ArrayList of String objects). What good is an ArrayList of String Objects to you anyways?

You would populate this ArrayList in the readData method. The catch here is to only create a new Word object for each Unique word that you find.

So you'd do something like (again pseudo-code):
  • Get the next word in the file
  • If ArrayList of Word Objects contains this word then:
    • Get the Word Object for the word from the ArrayList
    • Retrieve the current count for the word (using the getCount() method)
    • Add One to the current count value
    • Store the new value back in the Word (using the setCount() method)
  • If the ArrayList of Word Objects does not contain the word then:
    • Create a new Word Object for the word.
    • Store the Word Object in the ArrayList of Word Objects
  • Loop...
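
Those steps might look something like the sketch below. It is not a drop-in replacement for the posted classes: the nested Word class is a trimmed-down stand-in for the one in the question, and WordListTally, find and record are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WordListTally {
    // Trimmed-down stand-in for the Word class posted in the question.
    static class Word {
        private final String myWord;
        private int myCount;

        Word(String word, int count) { myWord = word; myCount = count; }
        String getWord() { return myWord; }
        int getCount() { return myCount; }
        void setCount(int count) { myCount = count; }
    }

    // Returns the Word entry for 'text', or null if it has not been seen yet.
    static Word find(List<Word> tally, String text) {
        for (Word w : tally) {
            if (w.getWord().equals(text)) {
                return w;
            }
        }
        return null;
    }

    // Records one occurrence of 'text' in the tally.
    static void record(List<Word> tally, String text) {
        Word existing = find(tally, text);
        if (existing != null) {
            existing.setCount(existing.getCount() + 1);
        } else {
            tally.add(new Word(text, 1));
        }
    }

    public static void main(String[] args) {
        List<Word> tally = new ArrayList<>();
        for (String text : Arrays.asList("we", "will", "we", "be", "we")) {
            record(tally, text);
        }
        for (Word w : tally) {
            System.out.println(w.getCount() + " " + w.getWord());
        }
        // Prints:
        // 3 we
        // 1 will
        // 1 be
    }
}
```

Note that the new count is written as getCount() + 1. Passing count++ to setCount would hand over the old value, because a post-increment expression evaluates to the value before the increment.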

While you're looping you should be checking for the special conditions that your assignment outlines (that word space hyphen space word requirement is a little weird) and removing any punctuation that may be attached to words (I would think the word "walk" and the word "walk," would be considered the same word...but then again that word<space>-<space>word thing is weird...so check your requirements)
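
One possible way to handle those special cases is sketched below, under a few assumptions: the text is plain ASCII, tokens are split on whitespace (as Scanner.next() does), and punctuation is stripped only from the two ends of a token, so hyphens and apostrophes inside a word survive while a lone "-" disappears. TokenCleaner, clean and cleanAll are made-up names, and whether a trailing apostrophe should be kept is something to check against the requirements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TokenCleaner {
    // Lower-cases a token and strips non-alphanumeric characters from both ends only.
    static String clean(String token) {
        return token.toLowerCase().replaceAll("^[^a-z0-9]+|[^a-z0-9]+$", "");
    }

    // Cleans whitespace-separated tokens, dropping any that end up empty.
    static List<String> cleanAll(List<String> tokens) {
        List<String> words = new ArrayList<>();
        for (String token : tokens) {
            String word = clean(token);
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "Walk,", "walk", "well-known", "word", "-", "word", "don't");
        System.out.println(cleanAll(tokens));
        // [walk, walk, well-known, word, word, don't]
    }
}
```

Here "well-known" stays one word, "word - word" becomes two words, and "Walk," and "walk" end up as the same word.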



-Frinny
Nov 13 '09 #5
