473,382 Members | 1,290 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,382 software developers and data experts.

inverted index, postings lists, and linked lists

2
I was reading about the use of inverted indexes and I was trying to build a simple example to better understand how they work (I am most familar with PHP/MySQL). However, I am still unclear what the table structure would look like.

As far as I understand, there is a table called "dictionary" that has a column with each term being indexed (e.g., cat, dog, rat) and another with a reference to a posting list for each term. Fuurther that the posting list points to specific docIDs for each document where the term appears and that it is structured as a linked list (e.g., value/next-pointer chains).

- I assume each term/posting list would be its own table, correct?
- I am very confused about what columns would be in the posting list table. I know docID is one but what else, why? A next-pointer column? why?

I am leaving out term-frequency and position-in-doc stuff to simplify the example.
Feb 29 '12 #1
2 4581
Luuk
1,047 Expert 1GB
After reading a bit about 'inverted indexes' here:
http://en.wikipedia.org/wiki/Inverted_index

That page gives a good example, to understand how it works.

Overy posting has an index attached to it like in the example:
Expand|Select|Wrap|Line Numbers
  1. "it":     {(0, 0), (0, 3), (1, 2), (2, 0)} 
i Understand the word "it" is stored, and the list "{(0, 0), (0, 3), (1, 2), (2, 0)} " is linked to that word.

So a table could look like this:
Expand|Select|Wrap|Line Numbers
  1. CREATE TABLE `NewTable` (
  2. `word`  varchar(20) NOT NULL ,
  3. `docnr`  mediumint NOT NULL ,
  4. `position`  mediumint NOT NULL ,
  5. PRIMARY KEY (`word`, `docnr`, `position`)
  6. )
  7. ;
Because for every 'word' you should store in which document it is found ('docnr'), and at what position ('position')
Mar 31 '12 #2
Luuk
1,047 Expert 1GB
since you want to know about inverted-indexes, and someone already create a project about it:
http://code.google.com/p/inverted-index/
Mar 31 '12 #3

Sign in to post your reply or Sign up for a free account.

Similar topics

2
by: Kakarot | last post by:
I'm gona be very honest here, I suck at programming, *especially* at C++. It's funny because I actually like the idea of programming ... normally what I like I'm atleast decent at. But C++ is a...
7
by: Chris Ritchey | last post by:
Hmmm I might scare people away from this one just by the title, or draw people in with a chalange :) I'm writting this program in c++, however I'm using char* instead of the string class, I am...
2
by: Skywise | last post by:
I am fairly new to linked lists. I am trying to write a class using linked lists. It seems to work fine, but I need to know if I have any resource leaks in it because I plan on using this class...
11
by: hana1 | last post by:
Hello guys, I just have a quick question about C++. I am moving from Java to C++. And as many of Java users know, all linked list and trees are already implemented and you just have to instantiate...
3
by: s_subbarayan | last post by:
Dear all, 1)In one of our implementation for an application we are supposed to collate two linked lists.The actual problem is like this: There are two singularly linked lists, the final output...
6
by: paudirac | last post by:
Hi, I need to maintain N linked lists where N is determined at runtime. The linked lists are defined as follows, struct linked_list { int data; struct linked_list next; }
3
by: Little | last post by:
Could someone tell me what I am doing wrong here about declaring mutiple double linked lists. This is what the information is for the project and the code wil be below that. Thank your soo much for...
5
by: vd12005 | last post by:
Hello, While playing to write an inverted index (see: http://en.wikipedia.org/wiki/Inverted_index), i run out of memory with a classic dict, (i have thousand of documents and millions of terms,...
51
by: Joerg Schoen | last post by:
Hi folks! Everyone knows how to sort arrays (e. g. quicksort, heapsort etc.) For linked lists, mergesort is the typical choice. While I was looking for a optimized implementation of mergesort...
23
by: Just Another Victim of the Ambient Morality | last post by:
I'm looking for a linked list implementation. Something iterable with constant time insertion anywhere in the list. I was wondering if deque() is the class to use or if there's something else. ...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
0
by: ryjfgjl | last post by:
In our work, we often need to import Excel data into databases (such as MySQL, SQL Server, Oracle) for data analysis and processing. Usually, we use database tools like Navicat or the Excel import...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.