Implementing inverted index in Java

I have been trying to implement an inverted index in Java for a few days, but I cannot get it to work. The terms and term frequencies come out fine, but I am unable to retrieve the document IDs. I don't understand how to use two TreeMaps, or how to wrap one TreeMap inside another.

I am attaching the code here.
import java.util.*;
import java.io.*;

public class invertindex {

    public static void main(String[] args) {
        TreeMap<String, Integer> t1 = new TreeMap<String, Integer>();
        // TreeMap<String, TreeSet> t2 = new TreeMap<String, TreeSet>();
        readFile(t1);
        // print(t1);
    }

    // Returns the count stored for word, or 0 if the word has not been seen yet.
    public static int getWord(String word, TreeMap<String, Integer> t1) {
        if (t1.containsKey(word)) {
            return t1.get(word);
        } else {
            return 0;
        }
    }

    // Counts the term frequencies of each document in turn and prints them.
    public static void readFile(TreeMap<String, Integer> t1) {
        String word;
        Integer count;
        String Docs[] = {"words.txt", "words2.txt", "words3.txt", "words4.txt"};
        try {
            // The loop body needs braces; without them only t1.clear() is repeated
            // and x is out of scope for the statements that follow.
            for (int x = 0; x < Docs.length; x++) {
                // Start every document with an empty map so counts are per document.
                t1.clear();

                File f = new File(Docs[x]);
                BufferedReader br = new BufferedReader(new FileReader(f));

                String str = "";
                while ((str = br.readLine()) != null) {
                    // Split each line on spaces, commas, periods and hyphens.
                    StringTokenizer stk = new StringTokenizer(str, " ,.-");
                    while (stk.hasMoreTokens()) {
                        word = stk.nextToken();
                        word = word.toLowerCase();

                        count = getWord(word, t1) + 1;
                        t1.put(word, count);
                    }
                }
                br.close();

                print(t1);
            }
        } catch (Exception e) {
            System.err.println(e);
            return;
        }
    }

    public static void print(TreeMap<String, Integer> t1) {
        System.out.println("(Term, TermFrequency)");
        System.out.println("--------------------");

        for (String word : t1.keySet()) {
            System.out.printf("(%s,%d);", word, t1.get(word));
        }
        // End the line so the next document's header starts on its own line.
        System.out.println();
    }
}
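The "one TreeMap inside another" idea from the question could look roughly like the sketch below: the outer map goes from a term to an inner map, and the inner map goes from a document ID to the term's frequency in that document. This is only a minimal sketch under some assumptions: the class name `InvertedIndexSketch` and the helper `addDocument` are made up for illustration, the file name is used as the document ID, and the text is passed in as a string rather than read from the files so the example stays self-contained.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical class name, for illustration only.
public class InvertedIndexSketch {

    // index: term -> (document ID -> term frequency in that document).
    // addDocument is a hypothetical helper, not part of the original code.
    public static void addDocument(TreeMap<String, TreeMap<String, Integer>> index,
                                   String docId, String text) {
        // Same delimiters as the original code: space, comma, period, hyphen.
        StringTokenizer stk = new StringTokenizer(text, " ,.-");
        while (stk.hasMoreTokens()) {
            String word = stk.nextToken().toLowerCase();

            // Look up the inner map for this term, creating it on first sight.
            TreeMap<String, Integer> postings = index.get(word);
            if (postings == null) {
                postings = new TreeMap<String, Integer>();
                index.put(word, postings);
            }

            // Increment this document's count for the term.
            Integer count = postings.get(docId);
            postings.put(docId, count == null ? 1 : count + 1);
        }
    }

    public static void main(String[] args) {
        TreeMap<String, TreeMap<String, Integer>> index =
                new TreeMap<String, TreeMap<String, Integer>>();

        // Assumption: the file name doubles as the document ID.
        addDocument(index, "words.txt", "The cat sat. The end.");
        addDocument(index, "words2.txt", "A cat-nap");

        // Each line shows a term followed by its documents and frequencies.
        for (Map.Entry<String, TreeMap<String, Integer>> e : index.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

With this layout the index must not be cleared between documents, since every document adds to the same outer map. The document IDs for a term are then the keys of its inner map, for example `index.get("cat").keySet()`. If only the IDs are needed and not the per-document counts, a `TreeMap<String, TreeSet<String>>`, as in the commented-out `t2` line, would probably be enough.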
Aug 19 '13 #1
