Full Text File Search with Indexing Service on Windows (cont.)

Here's the rest of the tutorial I started earlier:

Aside from text within a document, Indexing Service lets you search on
meta information stored in the files. For example, MusicArtist and
MusicAlbum let you find MP3 and other music files based on the singer
and album name; DocAuthor lets you find Office documents created by a
certain user; DocAppName lets you find files created by a particular
program, and so on.

Indexing Service uses plug-ins known as iFilters to extract information
from files it indexes. A default installation of Windows has iFilters
for many common file formats like HTML, Word, PowerPoint, and Excel.
You can extend Indexing Service's capability by installing additional
iFilters. Many are listed at http://www.ifilter.org/, with support
available for PDF, Photoshop, ZIP, Visio, Open Office, and others.

In the previous example, we used CONTAINS(Contents, '$keyword') to
search for a particular key word. Only files containing that exact word
would be returned. If $keyword is 'date,' then Indexing Service would
find those files with the word "date" but not those containing 'dates.'
To relax the criteria somewhat, we can use the FORMSOF (INFLECTIONAL,
<word>) construct. Example:

$dir = 'C:\\htdocs';
$keyword = 'FORMSOF (INFLECTIONAL, date)';
$sql = "SELECT filename, size, path
FROM SCOPE('DEEP TRAVERSAL OF \"$dir\"')
WHERE CONTAINS(Contents, '$keyword')";
$res = oledb_query($sql, $link);

Now Indexing Service will look for all the inflected forms of the word:
date, dates, dating, dated, etc. If the word specified is "good," then
it'd look for good, better, best, and well.
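
If the word comes from user input, it may be worth wrapping the FORMSOF construct in a small helper. The following is only a sketch: inflectional_term() is a hypothetical name, not part of the OLE-DB extension, and the letters-and-digits restriction is an assumption chosen to keep the quoted CONTAINS() string intact.

```php
<?php
// Hypothetical helper (not part of the oledb extension): wraps a single
// word in FORMSOF (INFLECTIONAL, ...) for use inside CONTAINS().
function inflectional_term(string $word): string
{
    // Assumption: accept only letters and digits so that user input
    // cannot break out of the quoted CONTAINS() argument.
    if (!preg_match('/^[\p{L}\p{N}]+$/u', $word)) {
        throw new InvalidArgumentException("Unsupported search word: $word");
    }
    return "FORMSOF (INFLECTIONAL, $word)";
}

// Build the same query as in the example above.
$dir = 'C:\\htdocs';
$keyword = inflectional_term('date');
$sql = "SELECT filename, size, path
FROM SCOPE('DEEP TRAVERSAL OF \"$dir\"')
WHERE CONTAINS(Contents, '$keyword')";
echo $sql, "\n";
```

The resulting $sql string can then be passed to oledb_query() exactly as before.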

To search on a partial word, we use the * sign:

$keyword = ' "kn*" ';

The double-quotation marks indicate a wild-card search. The above
pattern means any word starting with "kn" is considered a match.

Indexing Service also supports the <field> LIKE '%pattern%' and
<field> = 'value' SQL expressions. They are best avoided, however,
as they can be incredibly slow: matching against the value of a field
often means reading from the files.

To sort the results, we add an ORDER BY clause:

$dir = 'C:\\htdocs';
$keyword = 'FORMSOF (INFLECTIONAL, good)';
$sql = "SELECT filename, size, path
FROM SCOPE('DEEP TRAVERSAL OF \"$dir\"')
WHERE CONTAINS(Contents, '$keyword')
ORDER BY size DESC";
$res = oledb_query($sql, $link);

The above example lists the files found from the biggest to the
smallest. "ORDER BY write DESC" would list the more recently modified
files first, while "ORDER BY create DESC" lists first the ones more
recently created. You can, of course, also use these file attributes as
search criteria.
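
When the sort column is chosen by the user (say, from a drop-down on a search page), one option is to check it against a fixed list before putting it into the query. This is a minimal sketch: order_by_clause() is a hypothetical helper, and the list holds only the columns mentioned in this tutorial.

```php
<?php
// Hypothetical helper: builds an ORDER BY clause from a whitelisted column.
function order_by_clause(string $column, bool $descending = true): string
{
    // Only the properties used in this tutorial; extend as needed.
    $allowed = ['filename', 'size', 'write', 'create'];
    if (!in_array($column, $allowed, true)) {
        throw new InvalidArgumentException("Cannot sort by: $column");
    }
    // Omitting DESC gives the ascending order.
    return $descending ? "ORDER BY $column DESC" : "ORDER BY $column";
}

// Most recently modified files first.
echo order_by_clause('write'), "\n";
// Smallest files first.
echo order_by_clause('size', false), "\n";
```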

Thus far we have been searching on the computer's default catalog. If
searching will be done only in a particular folder, it's worthwhile to
create a separate catalog. You can do this in the Computer Management
console. To search a different catalog through OLE-DB, you specify the
catalog name in the connection string as the data source:

$link = oledb_open("Provider=MSIDXS; Data Source=web_cat");

Finally, what if you want to search files residing on a network server?
While it's possible to index a network drive, it's not terribly
efficient. Instead, you'd want to enable Indexing Service on that
computer and perform the search there.

To search a remote catalog, we prepend the SCOPE() statement with the
computer name and the catalog name:

$dir = '\\\\fileserver\\projects';
$keyword = 'FORMSOF (INFLECTIONAL, bad)';
$sql = "SELECT filename, size, path
FROM fileserver.System..SCOPE('DEEP TRAVERSAL OF \"$dir\"')
WHERE CONTAINS(Contents, '$keyword')";
$res = oledb_query($sql, $link);

Note that the double period is not a typo. Windows Authentication is
used to determine what files are visible. For the code above to work,
the web server has to run as a user on the network.
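
The computer.catalog..SCOPE() prefix is easy to mistype, so it can help to build it in one place. The sketch below is only an illustration: remote_scope() is a hypothetical helper, and the character restrictions on the names are an assumption, not a rule of Indexing Service.

```php
<?php
// Hypothetical helper: builds the FROM target for a remote catalog in the
// form computer.catalog..SCOPE(...), as in the example above.
function remote_scope(string $computer, string $catalog, string $dir): string
{
    // Assumption: names are limited to letters, digits, "_" and "-".
    foreach ([$computer, $catalog] as $name) {
        if (!preg_match('/^[A-Za-z0-9_-]+$/', $name)) {
            throw new InvalidArgumentException("Unexpected name: $name");
        }
    }
    // A quote in the path would end the SCOPE() argument early.
    if (strpbrk($dir, "'\"") !== false) {
        throw new InvalidArgumentException('Path must not contain quotes');
    }
    return "{$computer}.{$catalog}..SCOPE('DEEP TRAVERSAL OF \"{$dir}\"')";
}

// In a single-quoted PHP string, each literal backslash is written as \\.
$dir = '\\\\fileserver\\projects';
$sql = "SELECT filename, size, path
FROM " . remote_scope('fileserver', 'System', $dir) . "
WHERE CONTAINS(Contents, 'FORMSOF (INFLECTIONAL, bad)')";
echo $sql, "\n";
```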

Aug 22 '06 #1
I just came across this and it is spectacular. It works great and
makes using the indexing service to handle the heavy lifting of
searching a breeze. Thank you.

Is there anywhere to find more advanced examples, like boolean searches,
use of wildcard characters, or searching across multiple file
attributes?

Mike

Chung Leong wrote:
Here's the rest of the tutorial I started earlier:
....

Aug 23 '06 #2

su*******@yahoo.com wrote:
I just came across this and it is spectacular. It works great and
makes using the indexing service to handle the heavy lifting of
searching a breeze. Thank you.

Is there anywhere to find more advanced examples like boolean searches,
use of wildcard characters, or searching across multiple file
attributes.

I'm not really an expert in Indexing Service. Here's something I just
came across:

http://www.unis.no/Search/ixqLang_UNIS.Htm

The query string described in the document goes into CONTAINS()
statement. I realize now that what I said about the double quoted
strings was wrong. It's used for searching multiple words in a sequence
(i.e. a sentence). You can use the prefix* syntax without the double
quotes.

To specify multiple criteria, you just join them together in the
WHERE clause as you would when querying a database.

Example:

SELECT path, filename, size, write
FROM SCOPE()
WHERE CONTAINS(contents, 'love AND NOT sex')
AND size > 10240
AND write > '01-01-2006'
ORDER BY size

The statement above looks for files containing 'love' but not 'sex',
that are larger than 10K and modified some time this year, and lists
them from the smallest to biggest.

To do a wildcard match against the filename, you use the LIKE
'%pattern%' syntax.

Example:

SELECT path, filename, size, write
FROM SCOPE()
WHERE filename LIKE '%.mp3'

This statement looks for files with the .mp3 extension.
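
In PHP, the extension condition could be generated along these lines. This is just a sketch: extension_condition() is a hypothetical helper, and limiting the extension to letters and digits is an assumption made to keep the quoted pattern safe.

```php
<?php
// Hypothetical helper: builds a "filename LIKE '%.ext'" condition.
function extension_condition(string $extension): string
{
    // Accept both "mp3" and ".mp3".
    $extension = ltrim($extension, '.');
    // Assumption: extensions consist of letters and digits only.
    if (!preg_match('/^[A-Za-z0-9]+$/', $extension)) {
        throw new InvalidArgumentException("Unexpected extension: $extension");
    }
    return "filename LIKE '%.$extension'";
}

$sql = "SELECT path, filename, size, write
FROM SCOPE()
WHERE " . extension_condition('mp3');
echo $sql, "\n";
```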

Aug 24 '06 #3
Thanks for the reply and leads. After I posted I was thinking about
the queries and realized about the WHERE ... AND ... thing. Also
looking at how MS implements it in their search dialog helped me
understand what was going on.

Thanks again.

Mike

Aug 25 '06 #4