By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
458,104 Members | 1,590 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 458,104 IT Pros & Developers. It's quick & easy.

Keywords in bibliography database

P: n/a
I am a geologist, and over the years I've accumulated quite a number
of proffesional papers on the subject, in various publications. I
would like to make a database that would help me find the information
I want, based on keywords mentioned in the needed paper.
Since I don't feel like inventing hot water, I thought I'd ask you
people a few questions.

I was planning on making tables with a many-to-many relationship for
papers and keywords and I have no problem with that, I've done it
before. My problem is how searching should be implemented. Obviously,
I want to be able to set multiple keywords as criteria for one search
which would return all the papers that satisfy the criteria (logical
OR).

I thought about a multi-select list box, but that looks a bit clumsy,
since there will be *quite a lot* of keywords and sifting through an
enormously long list box doesn't seem right. Maybe multiple combo
boxes (say, up to 10-ish) with the fill-in-as-you-type feature turned
on, all with the same RowSource (the keyword table) would be better?
How would you guys set up a mechanism to let the user define the
search criteria?

Obviously, I'd have to construct a SELECT SQL statement dinamically. I
know I can put quite a lot of ANDs and ORs in the WHERE part of SQL
SELECT. Is there a more elegant way?

I would also like to have some sort of ranking of results based on how
many of the keywords were found for a particular item, and I need some
advice on that as well. The only way that I can think of (not that
I've done too much thinking, though) would be something like multiple
passes (for each keyword) and then counting the number of occurences
of a particular paper, but that is *way* too inefficient... How is
this usually done?

Authors: as with all technical papers, probably more than one, with
the first cited being the principal author. Again many-to-many
relationship, only I thought this time I'd add a field in the junction
table defining author's rank for a particular paper. Would this be
right?
The problem arises with search results: I would like them to be in the
form of a reference, stating the author(s), title, publication, etc. I
don't really see a way to achieve this through SQL alone, but reports
could come in handy there, I'll have a look. If you have any advice on
that, I'd be very grateful.

Please don't refer me to books, unless they are in electronic form and
free on the Web, since I live in a country where they are not easy to
find.

TIA,

Nenad Loncarevic
Mar 18 '06 #1
Share this Question
Share on Google+
9 Replies


P: n/a
Nenad Loncarevic <na**@afrodita.rcub.bg.ac.yu> wrote in
news:jn********************************@4ax.com:
I am a geologist, and over the years I've accumulated quite a number
of proffesional papers on the subject, in various publications. I
would like to make a database that would help me find the information
I want, based on keywords mentioned in the needed paper.
Since I don't feel like inventing hot water, I thought I'd ask you
people a few questions.

I was planning on making tables with a many-to-many relationship for
papers and keywords and I have no problem with that, I've done it
before. My problem is how searching should be implemented. Obviously,
I want to be able to set multiple keywords as criteria for one search
which would return all the papers that satisfy the criteria (logical
OR).

I thought about a multi-select list box, but that looks a bit clumsy,
since there will be *quite a lot* of keywords and sifting through an
enormously long list box doesn't seem right. Maybe multiple combo
boxes (say, up to 10-ish) with the fill-in-as-you-type feature turned
on, all with the same RowSource (the keyword table) would be better?
How would you guys set up a mechanism to let the user define the
search criteria?

Obviously, I'd have to construct a SELECT SQL statement dinamically. I
know I can put quite a lot of ANDs and ORs in the WHERE part of SQL
SELECT. Is there a more elegant way?

I would also like to have some sort of ranking of results based on how
many of the keywords were found for a particular item, and I need some
advice on that as well. The only way that I can think of (not that
I've done too much thinking, though) would be something like multiple
passes (for each keyword) and then counting the number of occurences
of a particular paper, but that is *way* too inefficient... How is
this usually done?

Authors: as with all technical papers, probably more than one, with
the first cited being the principal author. Again many-to-many
relationship, only I thought this time I'd add a field in the junction
table defining author's rank for a particular paper. Would this be
right?
The problem arises with search results: I would like them to be in the
form of a reference, stating the author(s), title, publication, etc. I
don't really see a way to achieve this through SQL alone, but reports
could come in handy there, I'll have a look. If you have any advice on
that, I'd be very grateful.

Please don't refer me to books, unless they are in electronic form and
free on the Web, since I live in a country where they are not easy to
find.

TIA,

Nenad Loncarevic


My advice is to forget all this work, transfer all your papers to MS-Word
(.doc) files and learn about Indexing Service. It will do all you want,
plus a million times more and (after the learning part) it will be a
million times simpler. It's also net available so you can make your
papers and searches for them available to your colleagues or whomever.

I've seen these used for a nursing service over the net. Nurse comes to
the home, finds he/she has to give a transfusion to a person in a wheel
chair. Wants to be sure of proper procedure, caveats etc. Opens laptop,
searches for transfusion "near" wheelchair. Wham! three documents pop up,
listed by name, author, key words and date.

--
Lyle Fairfield
Mar 18 '06 #2

P: n/a
Table of papers, table of keywords, and a junction table to handle many
keywords for many papers--that all makes perfect sense.

10 different combos where the user can select a keyword is workable if you
want to ensure the user is limited to only selecting known keywords.

If you plan to offer a mix of AND and OR options, you will need to devise a
way to bracket your interface. In case you are not aware:
a AND (b OR c)
is not the same as:
(a AND b) OR c

The IN operator is also useful to avoid heaps of OR operators, e.g.:
Category IN ('cat', 'fish', 'mule')
rather than:
Category = 'cat' OR Category = 'fish' OR Category = 'mule'

If you are not too worried about restricting the user to right keywords, you
could just let them type a bunch of keywords into a text box and parse them
into the WHERE clause like the example code at the end of this reply.

Ranking is a big area on it own. If you want to know which keywords the
match and therfore how many matches, you might need to return matches for
each keyword, combine them, and then deduplicate with count. That could be a
monster UNION ALL query that becomes the source for a Totals query to give
the count. If you actually want more complex ranking, you might end up
writing each keyword's result to a temp table, and running the Totals query
to count and sum the ranking values.

Your structure for author is correct. Unfortunately the SQL language is not
good at combining, so I suspect most of us use the approach suggeted in this
article:
http://www.mvps.org/access/modules/mdl0004.htm
You can call the function in the query (though for some reason the last
couple of versions of Access run user-defined functions more slowly in the
context of queries.)

With authors, there can be a priority of ordering them, i.e. the lead author
is listed first. Your junction table will need to specify the order of the
authors for the publication, and your ranking might want to take that into
account also.

Hope that is of some use to you.
This code illustrates how to parse multiple keywords from a single text box,
based on a space between words:
(Requires Access 2000 or later)

Private Sub txtKeywords_AfterUpdate()
Dim strWhere As String
Dim strWord As String
Dim varKeywords As Variant 'Array of keywords.
Dim i As Integer
Dim lngLen As Long

If IsNull(Me.txtKeywords) Then 'Show all if blank.
If Me.FilterOn Then
Me.FilterOn = False
End If
Else
varKeywords = Split(Me.txtKeywords, " ")
If UBound(varKeywords) >= 99 Then '99 words max.
MsgBox "Too many words."
Else
'Build up the Where string from the array.
For i = LBound(varKeywords) To UBound(varKeywords)
strWord = Trim$(varKeywords(i))
If strWord <> vbNullString Then
strWhere = strWhere & "([Notes] Like ""*" & strWord &
"*"") OR "
End If
Next
lngLen = Len(strWhere) - 4 'Without trailing " OR ".
If lngLen > 0 Then
Me.Filter = Left(strWhere, lngLen)
Me.FilterOn = True
Else
Me.FilterOn = False
End If
End If
End If
End Sub
--
Allen Browne - Microsoft MVP. Perth, Western Australia.
Tips for Access users - http://allenbrowne.com/tips.html
Reply to group, rather than allenbrowne at mvps dot org.

"Nenad Loncarevic" <na**@afrodita.rcub.bg.ac.yu> wrote in message
news:jn********************************@4ax.com...
I am a geologist, and over the years I've accumulated quite a number
of proffesional papers on the subject, in various publications. I
would like to make a database that would help me find the information
I want, based on keywords mentioned in the needed paper.
Since I don't feel like inventing hot water, I thought I'd ask you
people a few questions.

I was planning on making tables with a many-to-many relationship for
papers and keywords and I have no problem with that, I've done it
before. My problem is how searching should be implemented. Obviously,
I want to be able to set multiple keywords as criteria for one search
which would return all the papers that satisfy the criteria (logical
OR).

I thought about a multi-select list box, but that looks a bit clumsy,
since there will be *quite a lot* of keywords and sifting through an
enormously long list box doesn't seem right. Maybe multiple combo
boxes (say, up to 10-ish) with the fill-in-as-you-type feature turned
on, all with the same RowSource (the keyword table) would be better?
How would you guys set up a mechanism to let the user define the
search criteria?

Obviously, I'd have to construct a SELECT SQL statement dinamically. I
know I can put quite a lot of ANDs and ORs in the WHERE part of SQL
SELECT. Is there a more elegant way?

I would also like to have some sort of ranking of results based on how
many of the keywords were found for a particular item, and I need some
advice on that as well. The only way that I can think of (not that
I've done too much thinking, though) would be something like multiple
passes (for each keyword) and then counting the number of occurences
of a particular paper, but that is *way* too inefficient... How is
this usually done?

Authors: as with all technical papers, probably more than one, with
the first cited being the principal author. Again many-to-many
relationship, only I thought this time I'd add a field in the junction
table defining author's rank for a particular paper. Would this be
right?
The problem arises with search results: I would like them to be in the
form of a reference, stating the author(s), title, publication, etc. I
don't really see a way to achieve this through SQL alone, but reports
could come in handy there, I'll have a look. If you have any advice on
that, I'd be very grateful.

Please don't refer me to books, unless they are in electronic form and
free on the Web, since I live in a country where they are not easy to
find.

TIA,

Nenad Loncarevic

Mar 18 '06 #3

P: n/a
I've implemented keyword searches by simply using a large
textbox with the caption "Enter keywords using a comma (,)to
separate each keyword." I've found just typing the keywords
faster than messing with comboboxes, list boxes, etc.

VBA code then parses the string into a quoted and delimited list
used by the in clause
Cim D as string 'Delimiter
Dim P as integer 'Pointer
dim Q as string 'Quote
dim strSQL as string

D= ","
P=1
Q= chr(34)

dim StrInput as string, strOutput as string

P= instr(strinput,"," ) 'Check this: I may have the order
backwards.
Do While P>0
strOutput = Q & left(strInput,P-1) & Q & D
Strinput = mid(strInput,P+1)
P= instr(strinput,"," )
loop

strSQL = "SELECT DocumentID from DocKey where keyword in (" &
stroutput & ")"

Q

Nenad Loncarevic <na**@afrodita.rcub.bg.ac.yu> wrote in
news:jn********************************@4ax.com:
I am a geologist, and over the years I've accumulated quite a
number of proffesional papers on the subject, in various
publications. I would like to make a database that would help
me find the information I want, based on keywords mentioned in
the needed paper. Since I don't feel like inventing hot water,
I thought I'd ask you people a few questions.

I was planning on making tables with a many-to-many
relationship for papers and keywords and I have no problem
with that, I've done it before. My problem is how searching
should be implemented. Obviously, I want to be able to set
multiple keywords as criteria for one search which would
return all the papers that satisfy the criteria (logical OR).

I thought about a multi-select list box, but that looks a bit
clumsy, since there will be *quite a lot* of keywords and
sifting through an enormously long list box doesn't seem
right. Maybe multiple combo boxes (say, up to 10-ish) with the
fill-in-as-you-type feature turned on, all with the same
RowSource (the keyword table) would be better? How would you
guys set up a mechanism to let the user define the search
criteria?

Obviously, I'd have to construct a SELECT SQL statement
dinamically. I know I can put quite a lot of ANDs and ORs in
the WHERE part of SQL SELECT. Is there a more elegant way?

I would also like to have some sort of ranking of results
based on how many of the keywords were found for a particular
item, and I need some advice on that as well. The only way
that I can think of (not that I've done too much thinking,
though) would be something like multiple passes (for each
keyword) and then counting the number of occurences of a
particular paper, but that is *way* too inefficient... How is
this usually done?

Authors: as with all technical papers, probably more than one,
with the first cited being the principal author. Again
many-to-many relationship, only I thought this time I'd add a
field in the junction table defining author's rank for a
particular paper. Would this be right?
The problem arises with search results: I would like them to
be in the form of a reference, stating the author(s), title,
publication, etc. I don't really see a way to achieve this
through SQL alone, but reports could come in handy there, I'll
have a look. If you have any advice on that, I'd be very
grateful.

Please don't refer me to books, unless they are in electronic
form and free on the Web, since I live in a country where they
are not easy to find.

TIA,

Nenad Loncarevic


--
Bob Quintal

PA is y I've altered my email address.
Mar 18 '06 #4

P: n/a
Indexing Service is a lot of work; you have to do virtually nothing
except start it
(http://www.windowsnetworking.com/art...g-Service.html)
and run code like this:

Dim c As ADODB.Connection
Dim r As ADODB.Recordset
Dim s As String
Set c = New ADODB.Connection
With c
.Provider = "MSIDXS"
.Open
.DefaultDatabase = "system"
Set r = c.Execute("SELECT filename, docauthor, dockeywords FROM
Scope()" _
& "WHERE filename LIKE '%.doc'" _
& "AND CONTAINS('promise near pay') " _
& "ORDER BY FileName")
s = r.GetString(adClipString, , vbTab, vbNewLine, "null")
End With
Debug.Print s

In about a billionth of a second it gives me the three word documents I
have with the word promise near the word pay, together with their
creator and any keywords entered; there a a gazillion other values that
can be retrieved as well.

promissorynote.doc Lyle Fairfield Kathy, Swissport
promissorynote2.doc Lyle Fairfield Kathy, CRA
promissorynote3.doc Lyle Fairfield null

Mar 18 '06 #5

P: n/a
Thanks for the advice, Lyle. The problem is that all the papers are on
paper, in geological magazines that date all the way back to 1940's.
Not really suitable for transferring to Word. Or maybe I didn't quite
understand what you meant?
Also, I like playing with Access. So far I've even made two databases
that could almost be considered commercial grade, one for a small firm
selling construction material to handle inventory and one for a small
marketing agency selling ads in various newspapers, that handled
practically everything, newspapers, clients, prices, billing etc. Did
them both for some friends, free of charge, just for the fun of it.
So I consider this as a personal challenge and something I *want* to
do and I think it will be a good learning experience.

Regards,
Nenad

My advice is to forget all this work, transfer all your papers to MS-Word
(.doc) files and learn about Indexing Service. It will do all you want,
plus a million times more and (after the learning part) it will be a
million times simpler. It's also net available so you can make your
papers and searches for them available to your colleagues or whomever.

I've seen these used for a nursing service over the net. Nurse comes to
the home, finds he/she has to give a transfusion to a person in a wheel
chair. Wants to be sure of proper procedure, caveats etc. Opens laptop,
searches for transfusion "near" wheelchair. Wham! three documents pop up,
listed by name, author, key words and date.


Mar 19 '06 #6

P: n/a
On Sat, 18 Mar 2006 21:05:50 +0800, "Allen Browne"
<Al*********@SeeSig.Invalid> wrote:
10 different combos where the user can select a keyword is workable if you
want to ensure the user is limited to only selecting known keywords.
That is exactly what I intend.
The IN operator is also useful to avoid heaps of OR operators, e.g.:
Category IN ('cat', 'fish', 'mule')
rather than:
Category = 'cat' OR Category = 'fish' OR Category = 'mule'
Thanks, I'll surely use it.
Ranking is a big area on it own. If you want to know which keywords the
match and therfore how many matches, you might need to return matches for
each keyword, combine them, and then deduplicate with count. That could be a
monster UNION ALL query that becomes the source for a Totals query to give
the count. If you actually want more complex ranking, you might end up
writing each keyword's result to a temp table, and running the Totals query
to count and sum the ranking values.
This is where you lost me a bit. Could you elaborate on this if it's
not too much trouble? Or (just a suggestion) could you maybe write an
article on this (when you find the time) and put it on your Web site?
I'm sure I'm one of many people that find it extremly useful, along
with your posts in this newsgroup. Your explanations are always
comprehensive, thorough and to the point (this is not sucking up).
Your structure for author is correct. Unfortunately the SQL language is not
good at combining, so I suspect most of us use the approach suggeted in this
article:
http://www.mvps.org/access/modules/mdl0004.htm
You can call the function in the query (though for some reason the last
couple of versions of Access run user-defined functions more slowly in the
context of queries.)
I'll read it for sure. If I have any questions about it I'll ask.
With authors, there can be a priority of ordering them, i.e. the lead author
is listed first. Your junction table will need to specify the order of the
authors for the publication, and your ranking might want to take that into
account also.
The authors are not going to be part of criteria, so I don't expect
that to affect ranking.
Hope that is of some use to you.


It most certainly is, thanks very much.

Nenad
Mar 19 '06 #7

P: n/a
I wonder if time required for analysis and data entry of these articles
would exceed the time to scan them into Word Documents with optical
character recognition technology. For me, once everything gets heated
up and running smoothly I suppose it's 30 seconds a page; I have no
idea how long it takes with more advanced hardware and software and
someone with less clumsy fingers (almost everyone) operating it.

Regardless, if you want to play ... enjoy!

Mar 19 '06 #8

P: n/a
The concept of the UNION query it is to build a SELECT for each keyword
being used, and return it in a monster query that has duplicates of the
article, e.g.:

SELECT ArticleID, Keyword FROM tblArticleKeyword WHERE Keyword = 'cat'
UNION ALL
SELECT ArticleID, Keyword FROM tblArticleKeyword WHERE Keyword = 'fish'
UNION ALL
SELECT ArticleID, Keyword FROM tblArticleKeyword WHERE Keyword = 'mule';

You can then create another query that counts how often an article occurs in
the results:
SELECT ArticleID Count(Keyword) AS Ranking
FROM Query1
GROUP BY ArticleID
ORDER BY Count(Keyword) DESC;

For a more elaborate ranking, you could assign a weighting to each keyword.

While I have not used Lyle's suggestion, I quite like the concept of putting
to work the algorithm built into the operating system instead of having to
code your own. It might not be possible for you if you cannot get the
documents into the computer, or there might be other problems with the
approach (such as the way the operating system approaches known file types
and fails to handle zip files as expected), but it could be worth
investigating.

--
Allen Browne - Microsoft MVP. Perth, Western Australia.
Tips for Access users - http://allenbrowne.com/tips.html
Reply to group, rather than allenbrowne at mvps dot org.

"Nenad Loncarevic" <na**@afrodita.rcub.bg.ac.yu> wrote in message
news:v2********************************@4ax.com...
On Sat, 18 Mar 2006 21:05:50 +0800, "Allen Browne"
<Al*********@SeeSig.Invalid> wrote:
10 different combos where the user can select a keyword is workable if you
want to ensure the user is limited to only selecting known keywords.


That is exactly what I intend.
The IN operator is also useful to avoid heaps of OR operators, e.g.:
Category IN ('cat', 'fish', 'mule')
rather than:
Category = 'cat' OR Category = 'fish' OR Category = 'mule'


Thanks, I'll surely use it.
Ranking is a big area on it own. If you want to know which keywords the
match and therfore how many matches, you might need to return matches for
each keyword, combine them, and then deduplicate with count. That could be
a
monster UNION ALL query that becomes the source for a Totals query to give
the count. If you actually want more complex ranking, you might end up
writing each keyword's result to a temp table, and running the Totals
query
to count and sum the ranking values.


This is where you lost me a bit. Could you elaborate on this if it's
not too much trouble? Or (just a suggestion) could you maybe write an
article on this (when you find the time) and put it on your Web site?
I'm sure I'm one of many people that find it extremly useful, along
with your posts in this newsgroup. Your explanations are always
comprehensive, thorough and to the point (this is not sucking up).
Your structure for author is correct. Unfortunately the SQL language is
not
good at combining, so I suspect most of us use the approach suggeted in
this
article:
http://www.mvps.org/access/modules/mdl0004.htm
You can call the function in the query (though for some reason the last
couple of versions of Access run user-defined functions more slowly in the
context of queries.)


I'll read it for sure. If I have any questions about it I'll ask.
With authors, there can be a priority of ordering them, i.e. the lead
author
is listed first. Your junction table will need to specify the order of the
authors for the publication, and your ranking might want to take that into
account also.


The authors are not going to be part of criteria, so I don't expect
that to affect ranking.
Hope that is of some use to you.


It most certainly is, thanks very much.

Nenad

Mar 20 '06 #9

P: n/a
On Mon, 20 Mar 2006 08:29:57 +0800, "Allen Browne"
<Al*********@SeeSig.Invalid> wrote:
The concept of the UNION query it is to build a SELECT for each keyword
being used, and return it in a monster query that has duplicates of the
article, e.g.:

SELECT ArticleID, Keyword FROM tblArticleKeyword WHERE Keyword = 'cat'
UNION ALL
SELECT ArticleID, Keyword FROM tblArticleKeyword WHERE Keyword = 'fish'
UNION ALL
SELECT ArticleID, Keyword FROM tblArticleKeyword WHERE Keyword = 'mule';

You can then create another query that counts how often an article occurs in
the results:
SELECT ArticleID Count(Keyword) AS Ranking
FROM Query1
GROUP BY ArticleID
ORDER BY Count(Keyword) DESC;
Thanks, Allen, I get it now. It's pretty much what I had in mind when
I wrote about multiple passes, one for each keyword, and then counting
the number of occurences. I hoped there was a more elegant way,
perhaps built into Access, that I didn't know of.
While I have not used Lyle's suggestion, I quite like the concept of putting
to work the algorithm built into the operating system instead of having to
code your own. It might not be possible for you if you cannot get the
documents into the computer, or there might be other problems with the
approach (such as the way the operating system approaches known file types
and fails to handle zip files as expected), but it could be worth
investigating.


I don't think it's feaslible. This is something for personal use, and
I don't feel like investing in some fancy equipment. I do have a
scanner, but the majority of papers are in cyrillic (I live in
Serbia). By the time I get everything up and running I'd probably go
nuts. With the average of 10 pages per one paper, it would take far
less time to type in the title and other data than to scan. Also,
there's the question of storage space. Geology is all about maps and
other images, so it would take up a lot of disk space, and like I said
it's for personal use. I'd probably need a RAID or something for the
thousands of pages (literaly) that would have to be scanned and
stored... I had the idea of doing that while I worked for the State
Geological Survey, but noone would listen back then.

Thanks anyway.

Nenad
Mar 20 '06 #10

This discussion thread is closed

Replies have been disabled for this discussion.