Text retrieval systems - 5: the BookMark implementation

Greetings,

Introduction

At this moment we have a TextProcessor, a LibraryBuilder as well as the Library
itself. As you read last week, a Library is capable of producing pieces of text
in a simple way. We also briefly mentioned the BookMark, which represents a
single paragraph of text. We haven't seen its implementation yet; that is
the topic of this week's article part.

BookMark implementation

Since BookMarks are so strongly coupled with a Library I decided to implement
a BookMark as a nested non-static class. This allows for the implementation to
use all of the Library, including the private parts of it. Creatively I named
this class the BookMarkImpl. Because it's a nested class it is defined inside
the definition of the Library class like this:

    public class Library {
        ...
        private class BookMarkImpl implements BookMark {
            ...
        }
    }

Observe that this is a private class, i.e. nobody outside the Library class
knows anything at all about it. Everything else knows only the BookMark
interface; the Library alone knows about the implementing class.
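
As a minimal sketch of this arrangement (the names Mark, Shelf and markFor()
are hypothetical stand-ins, not the actual BookMark API of this article), the
following shows how callers only ever see the interface type while the private
inner class can still read the private fields of its enclosing object:

```java
// Hypothetical stand-in for the BookMark interface.
interface Mark {
    String text();
}

// Hypothetical stand-in for the Library class.
class Shelf {

    private final String[] paragraphs = { "first", "second" };

    // Private inner (non-static nested) class: it cannot be named outside
    // Shelf, yet it is free to read Shelf's private fields.
    private class MarkImpl implements Mark {

        private final int p;

        private MarkImpl(int p) { this.p = p; }

        @Override
        public String text() { return paragraphs[p]; }
    }

    // Callers receive the interface type only.
    Mark markFor(int p) { return new MarkImpl(p); }
}
```

Code outside Shelf can write `Mark m = new Shelf().markFor(1);` and call
`m.text()`, but it has no way to refer to MarkImpl itself.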

Also note that I didn't make this a Serializable class because if you serialize
an instance of this little class you immediately serialize the entire Library
object with it which is not what I want. It would be as if you're asking for
a little banana and it'd come with an entire gorilla attached to it.
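
The banana and gorilla effect can be made visible with a small experiment. This
is only a sketch with hypothetical classes (Gorilla, Banana and the helper
serializedSize() are not part of the Library code): an inner class instance
keeps a hidden reference to its enclosing object, so serializing the small
object writes the big one too.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical outer class holding a lot of data (the "gorilla").
class Gorilla implements Serializable {

    private static final long serialVersionUID = 1L;

    private final byte[] bulk = new byte[100_000];

    // Inner class (the "banana"): it carries a hidden reference to its Gorilla.
    class Banana implements Serializable {

        private static final long serialVersionUID = 1L;

        int size() { return bulk.length; }   // uses the enclosing instance
    }

    // Serializes the object in memory and reports the number of bytes written.
    static int serializedSize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }
}
```

Calling `Gorilla.serializedSize(new Gorilla().new Banana())` reports well over
100,000 bytes, although a Banana has no fields of its own.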

The BookMarkImpl class has only one private int member, the paragraph number p.
Here is the first part of the definition:

    private class BookMarkImpl implements BookMark {

        private int p;

        private BookMarkImpl(int p) {

            // reject numbers outside the Library's paragraphs array
            if (p < 0 || p >= paragraphs.length)
                throw new IndexOutOfBoundsException(
                    "paragraph : !(0 <= "+p+" < "+paragraphs.length+")");

            this.p= p;
        }

Even the constructor of this class is private to make sure that only a Library
object can construct a BookMarkImpl directly. As you saw in the previous article
part, the Library hands out BookMarks to the outside world, so a paragraph
number supplied from outside may be incorrect. That's why the BookMarkImpl class
protects itself against such errors by throwing an IndexOutOfBoundsException if
the parameter does not represent a valid paragraph number.
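
For comparison, here is a sketch of the same range check in two forms. The
class ParagraphCheck and both method names are hypothetical; the first method
mirrors the hand-written test in the constructor, the second relies on
Objects.checkIndex(), which is available in the JDK from Java 9 onwards.

```java
import java.util.Objects;

// Hypothetical helper showing two ways to reject a bad paragraph number.
class ParagraphCheck {

    // Hand-written check in the style of the BookMarkImpl constructor.
    static int checkByHand(int p, int length) {
        if (p < 0 || p >= length)
            throw new IndexOutOfBoundsException(
                "paragraph : !(0 <= " + p + " < " + length + ")");
        return p;
    }

    // Same contract using the JDK's Objects.checkIndex (Java 9 and later).
    static int checkWithJdk(int p, int length) {
        return Objects.checkIndex(p, length);
    }
}
```

Both methods return the number unchanged when it is valid and throw an
IndexOutOfBoundsException otherwise; only the wording of the message differs.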

The BookMarkImpl implements a small convenience method for its own use:

    private Library getLibrary() { return Library.this; }

It uses this little method for its equals() method like this:

    public boolean equals(Object obj) {

        // instanceof is false for null, so no separate null check is needed
        if (!(obj instanceof BookMarkImpl)) return false;

        BookMarkImpl that= (BookMarkImpl)obj;

        // same enclosing Library instance and same paragraph number
        return this.getLibrary() == that.getLibrary() &&
               this.p            == that.p;
    }

Two BookMarkImpls are considered equal if they refer to the same paragraph
stored in the same Library.

Of course a hashCode() method accompanies the equals() method. It is very simple:

    public int hashCode() { return p; }

What else could it return? These two small methods make this class ideal for
storage in a Map or Set.
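
To illustrate how this equality behaves inside a Set, here is a small sketch.
Lib and Marker are hypothetical stand-ins for the Library and its BookMarkImpl;
only the equals() and hashCode() logic is modelled.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the Library; only equality is modelled here.
class Lib {

    class Marker {

        private final int p;

        Marker(int p) { this.p = p; }

        private Lib owner() { return Lib.this; }

        @Override
        public boolean equals(Object obj) {
            if (!(obj instanceof Marker)) return false;
            Marker that = (Marker) obj;
            // Same enclosing instance (identity) and same paragraph number.
            return this.owner() == that.owner() && this.p == that.p;
        }

        @Override
        public int hashCode() { return p; }
    }

    public static void main(String[] args) {
        Lib a = new Lib();
        Lib b = new Lib();

        Set<Marker> marks = new HashSet<>();
        marks.add(a.new Marker(7));
        marks.add(a.new Marker(7));   // equal to the first: not added again
        marks.add(b.new Marker(7));   // same hash code, other library: added

        System.out.println(marks.size());   // prints 2
    }
}
```

Markers from different libraries share a hash code when their paragraph
numbers match. That is allowed by the hashCode() contract: equal objects must
have equal hash codes, but unequal objects may collide.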

Given that single paragraph number, a BookMarkImpl can return the absolute
group, book, chapter and of course the paragraph number itself. Here is how
it is done:

    public int getGroupNumber() {

        int c= Library.this.getIndex(chapters, p);
        int b= Library.this.getIndex(books, c);

        return Library.this.getIndex(groups, b);
    }

    public int getBookNumber() {

        int c= Library.this.getIndex(chapters, p);

        return Library.this.getIndex(books, c);
    }

    public int getChapterNumber() {

        return Library.this.getIndex(chapters, p);
    }

    public int getParagraphNumber() {

        return p;
    }

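The lookups above all go through getIndex(), whose definition is not repeated
in this part. Purely as an illustration, and assuming that each level stores
the number of the first item of every section in strictly ascending order, such
a lookup could be sketched with a binary search. SectionLookup, indexOf() and
the int[] layout are assumptions of this sketch, not the Library's real code.

```java
import java.util.Arrays;

// Hypothetical lookup: starts[i] is the number of the first item in section i,
// strictly ascending with starts[0] == 0. This is an assumed layout.
class SectionLookup {

    // Returns the index of the section that contains the given item.
    static int indexOf(int[] starts, int item) {
        int i = Arrays.binarySearch(starts, item);
        // Exact hit: the item is the first one in section i.
        // Miss: binarySearch returns -(insertionPoint) - 1, and the containing
        // section is the one just before the insertion point.
        return i >= 0 ? i : -i - 2;
    }

    public static void main(String[] args) {
        int[] chapters = { 0, 4, 9 };   // chapter 1 starts at paragraph 4, chapter 2 at 9
        int[] books    = { 0, 2 };      // book 1 starts at chapter 2

        int c = indexOf(chapters, 10);  // paragraph 10 lies in chapter 2
        int b = indexOf(books, c);      // chapter 2 lies in book 1
        System.out.println(c + " " + b);   // prints "2 1"
    }
}
```

Chaining the lookup from paragraph to chapter to book is the same pattern that
getBookNumber() and getGroupNumber() follow.
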
Note the heavy use of the getIndex() method, which is defined in the Library
itself; we saw its definition in the previous article part. Not only can a
BookMarkImpl return absolute numbers, it is also able to return relative
index numbers, e.g. the book number relative to the group that contains the
paragraph. Here are the methods:

    public int getRelativeBook() {

        int b= getBookNumber();

        return b-groups[getGroupNumber()].getIndex();
    }

    public int getRelativeChapter() {

        int c= getChapterNumber();

        return c-books[getBookNumber()].getIndex();
    }

    public int getRelativeParagraph() {

        return p-chapters[Library.this.getIndex(chapters, p)].getIndex();
    }

Of course there is no getRelativeGroup() method because groups are the top level
organization of a library and they are simply numbered 0, 1, 2 ... etc.
Observe the use of the getIndex() method again.
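
A worked example may help with the arithmetic. The sketch below assumes a
hypothetical layout in which an int[] holds the absolute number of the first
item of each section; RelativeNumbers and its methods are illustrative names
only. A relative number is then the absolute number minus the start of the
containing section.

```java
// Hypothetical layout: starts[i] is the absolute number of the first item in section i.
class RelativeNumbers {

    // Index of the section containing the item (linear scan, for brevity).
    static int sectionOf(int[] starts, int item) {
        int s = 0;
        while (s + 1 < starts.length && starts[s + 1] <= item) s++;
        return s;
    }

    // Position of the item counted from the start of its own section.
    static int relative(int[] starts, int item) {
        return item - starts[sectionOf(starts, item)];
    }

    public static void main(String[] args) {
        int[] chapterStarts = { 0, 4, 9 };  // first paragraph of each chapter
        int[] bookStarts    = { 0, 2 };     // first chapter of each book

        int p = 10;
        int c = sectionOf(chapterStarts, p);            // absolute chapter 2
        System.out.println(relative(chapterStarts, p)); // prints 1
        System.out.println(relative(bookStarts, c));    // prints 0
    }
}
```

Paragraph 10 is the second paragraph (relative number 1) of chapter 2, and
chapter 2 is the first chapter (relative number 0) of book 1.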

A BookMarkImpl can also return the text that belongs to its paragraph number:
not just the text of the paragraph itself but also the names of the chapter,
book and group to which the paragraph belongs:

    public String getGroup() {

        return groups[getGroupNumber()].getName();
    }

    public String getBook() {

        return books[getBookNumber()].getName();
    }

    public String getChapter() {

        return chapters[getChapterNumber()].getName();
    }

    public String getParagraph() {

        return Library.this.decompress(paragraphs[p]);
    }

Note how the decompress() method, defined in the Library class, is used to
produce the actual text of the paragraph. The other three methods use the
previously defined number methods to find the correct absolute numbers given
just that single paragraph number.

A library supports 'notations' belonging to paragraphs; the user is free
to add, alter or remove any notation s/he wants. This is how a BookMarkImpl
offers the needed functionality:

    public void putNote(String note) { notesMap.put(p, note); }

    public String getNote() { return notesMap.get(p); }

Both methods simply use the notesMap of the enclosing Library object to set
or return the notation that belongs to the BookMark's paragraph.
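
The notes store can be pictured as a plain map keyed by paragraph number. The
following is a sketch only: the class Notes is hypothetical, and treating a
null note as a removal is an assumption, since the text above does not say how
a notation is removed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical notes store keyed by paragraph number.
class Notes {

    private final Map<Integer, String> notesMap = new HashMap<>();

    // Adds or replaces the note; a null note removes the entry
    // (an assumption of this sketch, not confirmed by the article).
    void putNote(int p, String note) {
        if (note == null) notesMap.remove(p);
        else notesMap.put(p, note);
    }

    // Returns the note for the paragraph, or null if there is none.
    String getNote(int p) { return notesMap.get(p); }
}
```

Because the map lives in the enclosing object and not in the bookmark, every
bookmark for the same paragraph sees the same note.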

Finally, the BookMarkImpl implements the toString() method:

    public String toString() {

        return getGroup()+sep+
               getBook()+sep+
               getChapter()+sep+
               getRelativeParagraph()+sep+
               getParagraph();
    }

This method uses the previous methods to retrieve all the text that belongs to
the paragraph it represents. The 'sep' variable is defined in the Library class
itself and holds the separator text used between the group, book, chapter and
paragraph text. Here is how it is done in the Library class:

    private static final String SEP= "\t";
    ...
    private String sep= SEP;
    ...
    public void setSeparator(String sep) { this.sep= sep; }
    public String getSeparator() { return sep; }

This is all Java 101: as you can see there is a default separator SEP and the
user can get and set the separator used by the toString() method of the
BookMarkImpl objects. The BookMarkImpl defines all the methods of the BookMark
interface and that's all the user knows about this nested class: it implements
this interface.
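
One consequence of keeping the separator in the enclosing object is worth
spelling out. In the sketch below (Catalog and Entry are hypothetical
miniatures of the Library and its BookMarkImpl) the inner class reads the
separator each time toString() is called, so a later setSeparator() call also
changes the output of objects that were created earlier.

```java
// Hypothetical miniature of the separator handling described above.
class Catalog {

    private static final String SEP = "\t";   // default separator

    private String sep = SEP;

    void setSeparator(String sep) { this.sep = sep; }
    String getSeparator() { return sep; }

    // Inner class: reads the current separator of its enclosing Catalog
    // every time toString() is called.
    class Entry {

        private final String book;
        private final String text;

        Entry(String book, String text) { this.book = book; this.text = text; }

        @Override
        public String toString() { return book + sep + text; }
    }
}
```

An Entry created with the default separator prints its two parts with a tab
in between; after `setSeparator(" | ")` the same Entry prints them with
" | " instead.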

Concluding remarks

We have seen an implementation of the BookMark interface in this article part.
BookMarks are also used in the next part of this article: the Queries.
A Query implements the fun part of this little project, i.e. it allows us to
query the entire text in quite a flexible way using regular expressions and
a lot more.

I hope to see you again next week and,

kind regards,

Jos
Aug 5 '07 #1