Text retrieval systems - 5: the BookMark implementation

Greetings,

Introduction

At this moment we have a TextProcessor, a LibraryBuilder and the Library
itself. As you read last week, a Library can produce pieces of text in a
simple way. We also briefly mentioned the BookMark, which represents a single
paragraph of text, but we haven't seen its implementation yet. That is the
topic of this week's article part.

BookMark implementation

Since BookMarks are so strongly coupled with a Library, I decided to implement
a BookMark as a nested non-static (inner) class. This lets the implementation
use all of the Library, including its private parts. Creatively, I named this
class BookMarkImpl. Because it is a nested class, it is defined inside the
definition of the Library class like this:

```java
public class Library {
   // ...
   private class BookMarkImpl implements BookMark {
      // ...
   }
}
```
Observe that this is a private class, i.e. nothing outside the Library class
knows anything at all about it. The rest of the world only knows the BookMark
interface; only the Library itself knows about this class.
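To see the pattern in isolation, here is a minimal sketch. TinyLibrary, its getBookMark() factory and the two-method BookMark interface are hypothetical simplifications (the real Library stores compressed paragraphs and the real interface declares more methods); the point is only that callers receive the interface, while the private inner class reads the enclosing object's private array through TinyLibrary.this:

```java
// Cut-down, hypothetical version of the BookMark interface: the real
// interface declares many more methods.
interface BookMark {
    int getParagraphNumber();
    String getParagraph();
}

// Hypothetical, stripped-down library that keeps its paragraphs as plain strings.
class TinyLibrary {

    private final String[] paragraphs;   // private: invisible outside TinyLibrary

    TinyLibrary(String... paragraphs) { this.paragraphs = paragraphs.clone(); }

    // Hypothetical factory method: callers only ever see the BookMark interface.
    public BookMark getBookMark(int p) { return new BookMarkImpl(p); }

    // Private inner class: tied to the TinyLibrary instance that created it
    // and allowed to read that instance's private fields.
    private class BookMarkImpl implements BookMark {

        private final int p;

        private BookMarkImpl(int p) { this.p = p; }

        @Override
        public int getParagraphNumber() { return p; }

        @Override
        public String getParagraph() {
            // TinyLibrary.this is the enclosing instance; its private array is accessible here
            return TinyLibrary.this.paragraphs[p];
        }
    }
}
```

Outside TinyLibrary, new TinyLibrary("a", "b").getBookMark(1).getParagraph() returns "b", but a declaration such as TinyLibrary.BookMarkImpl x; does not compile, because the class is private.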

Also note that I didn't make this class Serializable. A non-static nested
class holds a hidden reference to its enclosing Library object, so if you
serialize an instance of this little class, you serialize the entire Library
along with it, which is not what I want. It would be as if you asked for a
little banana and it came with an entire gorilla attached to it.
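The hidden reference is easy to observe. The sketch below uses hypothetical Gorilla and banana classes (they are not part of the Library code) and plain java.io serialization to compare an inner class with a static nested class. The inner banana calls Gorilla.this, so it certainly keeps its reference to the enclosing object:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical "gorilla": a big Serializable enclosing object.
class Gorilla implements Serializable {

    private final int[] weight = new int[100_000];   // about 400 KB when serialized

    // Inner (non-static) class: every instance holds a hidden reference
    // to the Gorilla that created it.
    class InnerBanana implements Serializable {
        private int p = 42;

        // Uses the enclosing instance, so the hidden reference is kept.
        int gorillaSize() { return Gorilla.this.weight.length; }
    }

    // Static nested class: no hidden reference to any Gorilla.
    static class StaticBanana implements Serializable {
        private int p = 42;
    }
}

class SerializedSize {

    // Returns the number of bytes Java serialization produces for obj.
    static int of(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }
}
```

Serializing the inner banana drags along the whole int array of its Gorilla (more than 400,000 bytes), whereas the static nested banana stays tiny. If the enclosing class were not Serializable at all, writing the inner object would fail with a NotSerializableException instead.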

The BookMarkImpl class has only one private int member, the paragraph number p.
Here is the first part of the definition:

```java
private class BookMarkImpl implements BookMark {

    private final int p;   // the paragraph number

    private BookMarkImpl(int p) {

        // reject paragraph numbers outside 0 <= p < paragraphs.length
        if (p < 0 || p >= paragraphs.length)
            throw new IndexOutOfBoundsException(
            "paragraph: !(0 <= " + p + " < " + paragraphs.length + ")");

        this.p = p;
    }
    // ... (the remaining methods are shown below)
```
Even the constructor of this class is private, to make sure that only the
Library can construct a BookMarkImpl directly. As you saw in the previous
article part, the Library can hand out BookMarks on request of the outside
world, so the requested paragraph number may be invalid. That's why the
BookMarkImpl class protects itself against such errors: it throws an
IndexOutOfBoundsException if the parameter is not a valid paragraph number.
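For comparison, here is a small hypothetical sketch of the same range check next to java.util.Objects.checkIndex() (Java 9 and later), which performs the identical 0 <= p < length test and also throws an IndexOutOfBoundsException, with a JDK-generated message. ParagraphRange and its methods are illustrative names only:

```java
import java.util.Objects;

// Hypothetical stand-in for the Library's paragraph count and its range check.
class ParagraphRange {

    private final int length;   // number of paragraphs

    ParagraphRange(int length) { this.length = length; }

    // Hand-written check, in the style of the BookMarkImpl constructor.
    int checkManually(int p) {
        if (p < 0 || p >= length)
            throw new IndexOutOfBoundsException(
                "paragraph: !(0 <= " + p + " < " + length + ")");
        return p;
    }

    // Equivalent check with the JDK helper; returns p when it is valid.
    int checkWithJdk(int p) {
        return Objects.checkIndex(p, length);
    }
}
```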

The BookMarkImpl implements a small convenience method for its own use:

```java
// Returns the Library instance that encloses this BookMarkImpl.
private Library getLibrary() { return Library.this; }
```
It uses this little method in its equals() method:

```java
@Override
public boolean equals(Object obj) {

    // instanceof is false for null, so no separate null check is needed
    if (!(obj instanceof BookMarkImpl)) return false;

    BookMarkImpl that = (BookMarkImpl)obj;

    // equal means: the same Library object and the same paragraph number
    return this.getLibrary() == that.getLibrary() &&
           this.p            == that.p;
}
```
Two BookMarkImpls are considered equal if they refer to the same paragraph
stored in the same Library.

Of course, a hashCode() method accompanies the equals() method. It is very simple:

```java
@Override
public int hashCode() { return p; }
```
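What else could it return? Equal BookMarkImpls always have the same paragraph
number and therefore the same hash code, as the equals()/hashCode() contract
requires. Together, these two small methods make the class suitable as a key
in a HashMap or as an element of a HashSet.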
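A minimal sketch of how these rules behave in a HashSet. MarkLibrary, Mark and MarkImpl are hypothetical stand-ins that copy only the equals() and hashCode() logic shown above:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical minimal interface standing in for BookMark.
interface Mark {
    int paragraph();
}

// Hypothetical library that can only hand out marks.
class MarkLibrary {

    Mark getMark(int p) { return new MarkImpl(p); }

    private class MarkImpl implements Mark {

        private final int p;

        private MarkImpl(int p) { this.p = p; }

        private MarkLibrary getLibrary() { return MarkLibrary.this; }

        @Override
        public int paragraph() { return p; }

        // Same rules as BookMarkImpl.equals(): same library object, same paragraph.
        @Override
        public boolean equals(Object obj) {
            if (!(obj instanceof MarkImpl)) return false;
            MarkImpl that = (MarkImpl) obj;
            return this.getLibrary() == that.getLibrary() && this.p == that.p;
        }

        // Equal marks share p, so equal marks always get equal hash codes.
        @Override
        public int hashCode() { return p; }
    }
}

class MarkDemo {
    public static void main(String[] args) {
        MarkLibrary a = new MarkLibrary();
        MarkLibrary b = new MarkLibrary();
        Set<Mark> marks = new HashSet<>();
        marks.add(a.getMark(7));
        marks.add(a.getMark(7));   // equal to the first mark: the set ignores it
        marks.add(b.getMark(7));   // same paragraph, other library: a new element
        System.out.println(marks.size());   // prints 2
    }
}
```

The two marks for paragraph 7 of the same library collapse into one set element, while the mark for paragraph 7 of the other library is kept separately, even though all three have hash code 7; equal hash codes for unequal objects are allowed by the contract.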

Given that single paragraph number, a BookMarkImpl can return the absolute
group, book, chapter and of course the paragraph number itself. Here is how
it is done:

```java
public int getGroupNumber() {

    int c = Library.this.getIndex(chapters, p);  // chapter containing paragraph p
    int b = Library.this.getIndex(books, c);     // book containing chapter c

    return Library.this.getIndex(groups, b);     // group containing book b
}

public int getBookNumber() {

    int c = Library.this.getIndex(chapters, p);

    return Library.this.getIndex(books, c);
}

public int getChapterNumber() {

    return Library.this.getIndex(chapters, p);
}

public int getParagraphNumber() {

    return p;
}
```
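To make the chain of lookups concrete, here is a worked example with made-up numbers. It assumes that every section object (a hypothetical Section class here) records the absolute number of its first child, in ascending order, and it implements the lookup as a binary search for the last section that starts at or before n. This is a sketch of the idea, not necessarily the Library's own getIndex():

```java
// Hypothetical section (group, book or chapter): it only remembers the
// absolute number of its first child (first book, chapter or paragraph).
class Section {
    private final int index;
    Section(int index) { this.index = index; }
    int getIndex() { return index; }
}

class Numbering {

    // Returns the largest i with sections[i].getIndex() <= n, i.e. the section
    // containing child number n. Assumes a non-empty array, ascending start
    // numbers and sections[0].getIndex() == 0.
    static int getIndex(Section[] sections, int n) {
        int lo = 0, hi = sections.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;   // round up so that lo always advances
            if (sections[mid].getIndex() <= n) lo = mid;
            else hi = mid - 1;
        }
        return lo;
    }

    // Convenience: builds sections from their first-child numbers.
    static Section[] starts(int... firstChildren) {
        Section[] s = new Section[firstChildren.length];
        for (int i = 0; i < s.length; i++) s[i] = new Section(firstChildren[i]);
        return s;
    }
}
```

With chapters starting at paragraphs 0, 3, 5 and 9, books starting at chapters 0 and 2, and groups starting at books 0 and 1, paragraph 6 lies in chapter 2, chapter 2 lies in book 1 and book 1 lies in group 1: exactly the chain that getGroupNumber() follows.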
Note the heavy use of the getIndex() method, which is defined in the Library
itself; we saw its definition in the previous article part. A BookMarkImpl
can return not only absolute numbers but also relative ones, e.g. the book
number relative to the group the paragraph belongs to. Here are the methods:

```java
public int getRelativeBook() {

    int b = getBookNumber();

    // groups[g].getIndex() is the absolute number of the first book in group g
    return b - groups[getGroupNumber()].getIndex();
}

public int getRelativeChapter() {

    int c = getChapterNumber();

    return c - books[getBookNumber()].getIndex();
}

public int getRelativeParagraph() {

    return p - chapters[Library.this.getIndex(chapters, p)].getIndex();
}
```
Of course there is no getRelativeGroup() method, because groups are the
top-level organization of a library and they are simply numbered 0, 1, 2, etc.
Observe the use of the getIndex() method again.
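Here is a worked example of the relative numbers, using a made-up layout of four chapters (starting at paragraphs 0, 3, 5 and 9), two books (starting at chapters 0 and 2) and two groups (starting at books 0 and 1). To keep the sketch short, it stores the start numbers in plain int arrays and uses Arrays.binarySearch() for the lookup; it assumes no section is empty, because with repeated values binarySearch() does not say which match it finds:

```java
import java.util.Arrays;

// Hypothetical worked example: plain int arrays of "first child" numbers
// replace the Library's section objects.
class RelativeNumbers {

    static final int[] CHAPTER_STARTS = {0, 3, 5, 9};  // first paragraph of each chapter
    static final int[] BOOK_STARTS    = {0, 2};        // first chapter of each book
    static final int[] GROUP_STARTS   = {0, 1};        // first book of each group

    // Index of the last start <= n. When n is not itself a start value,
    // Arrays.binarySearch returns -(insertionPoint) - 1, so the wanted
    // index is insertionPoint - 1 == -i - 2.
    static int indexOf(int[] starts, int n) {
        int i = Arrays.binarySearch(starts, n);
        return i >= 0 ? i : -i - 2;
    }

    static int relativeParagraph(int p) {
        return p - CHAPTER_STARTS[indexOf(CHAPTER_STARTS, p)];
    }

    static int relativeChapter(int p) {
        int c = indexOf(CHAPTER_STARTS, p);
        return c - BOOK_STARTS[indexOf(BOOK_STARTS, c)];
    }

    static int relativeBook(int p) {
        int b = indexOf(BOOK_STARTS, indexOf(CHAPTER_STARTS, p));
        return b - GROUP_STARTS[indexOf(GROUP_STARTS, b)];
    }
}
```

Paragraph 6 is the second paragraph (relative number 1) of chapter 2, which is the first chapter (relative number 0) of book 1, which in turn is the first book of group 1.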

A BookMarkImpl can also return the text that belongs to its paragraph number;
not just the text for the paragraph itself but also for the chapter, book and
group to which the paragraph belongs:

```java
public String getGroup() {

    return groups[getGroupNumber()].getName();
}

public String getBook() {

    return books[getBookNumber()].getName();
}

public String getChapter() {

    return chapters[getChapterNumber()].getName();
}

public String getParagraph() {

    // decompress() is a Library method that turns stored data back into text
    return Library.this.decompress(paragraphs[p]);
}
```
Note how the decompress() method, defined in the Library class, is used to
produce the actual text of the paragraph. These methods use the previously
defined methods to find the correct absolute numbers, given just that single
paragraph number.

A Library supports notes attached to paragraphs; the user is free to add,
alter or remove any note s/he wants. This is how a BookMarkImpl offers the
needed functionality:

```java
public void putNote(String note) { notesMap.put(p, note); }

public String getNote() { return notesMap.get(p); }
```
It simply uses the notesMap of the enclosing Library object to store or
retrieve the note that belongs to the BookMark's paragraph.
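Because the notes live in the Library and not in the BookMark, every BookMark for the same paragraph sees the same note. The sketch below shows this with a hypothetical NoteLibrary; it assumes notesMap is a Map<Integer, String> (the paragraph number is autoboxed), and putNoteOrRemove() is a hypothetical variation that treats null as a request to delete the entry:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, trimmed-down library that only supports notes.
class NoteLibrary {

    // One map per library, keyed by paragraph number.
    private final Map<Integer, String> notesMap = new HashMap<>();

    NoteMark bookMark(int p) { return new NoteMark(p); }

    int noteCount() { return notesMap.size(); }

    class NoteMark {

        private final int p;

        private NoteMark(int p) { this.p = p; }

        public void putNote(String note) { notesMap.put(p, note); }

        public String getNote() { return notesMap.get(p); }

        // Hypothetical variation: null removes the entry instead of storing null.
        public void putNoteOrRemove(String note) {
            if (note == null) notesMap.remove(p);
            else notesMap.put(p, note);
        }
    }
}
```

With the putNote() shown above, storing null makes getNote() return null, so the note is effectively gone, but the map keeps the key; the variation removes the key as well.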

Finally, the BookMarkImpl implements the toString() method:

```java
@Override
public String toString() {

    return getGroup()             + sep +
           getBook()              + sep +
           getChapter()           + sep +
           getRelativeParagraph() + sep +
           getParagraph();
}
```
This method uses the previous methods to retrieve all the text that belongs to
the paragraph it represents. The sep variable is a field of the Library class
itself; it holds the separator text placed between the group, book, chapter
and paragraph parts. Here is how it is done in the Library class:

```java
private static final String SEP = "\t";   // default separator: a tab
// ...
private String sep = SEP;
// ...
public void setSeparator(String sep) { this.sep = sep; }
public String getSeparator() { return sep; }
```
This is all Java 101: there is a default separator SEP, and the user can get
and set the separator used by the toString() method of the BookMarkImpl
objects. The BookMarkImpl implements all the methods of the BookMark
interface, and that's all the user knows about this nested class: it
implements this interface.
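Here is a short sketch of the same separator mechanism, with the group, book, chapter and paragraph lookups replaced by parameters. The class SeparatorDemo and its format() method are hypothetical; String.join() (Java 8 and later) is used instead of repeated concatenation:

```java
// Hypothetical sketch of the separator mechanism; the values are passed in
// instead of being looked up in a Library.
class SeparatorDemo {

    private static final String SEP = "\t";   // default separator
    private String sep = SEP;

    public void setSeparator(String sep) { this.sep = sep; }
    public String getSeparator() { return sep; }

    // Same field order as BookMarkImpl.toString(): group, book, chapter,
    // relative paragraph number and paragraph text.
    String format(String group, String book, String chapter,
                  int relativeParagraph, String paragraph) {
        return String.join(sep, group, book, chapter,
                           String.valueOf(relativeParagraph), paragraph);
    }
}
```

Because BookMarkImpl.toString() reads sep at the moment it is called, changing the separator on a Library affects every BookMark of that Library, including the ones handed out earlier.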

Concluding remarks

We have seen an implementation of the BookMark interface in this article part.
BookMarks are also used in the next part of this article: the Queries. A Query
implements the fun part of this little project: it lets us query the entire
text in quite a flexible way, using regular expressions and a lot more.

I hope to see you again next week and,

kind regards,

Jos
Aug 5 '07
