Full Text File Search with Indexing Service on Windows (cont.)

Here's the rest of the tutorial I started earlier:

Aside from text within a document, Indexing Service lets you search on
meta information stored in the files. For example, MusicArtist and
MusicAlbum let you find MP3 and other music files based on the singer
and album name; DocAuthor lets you find Office documents created by a
certain user; DocAppName lets you find files created by a particular
program, and so on.
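
As a rough sketch of how such a property might be queried, the snippet below builds a SQL string that compares a property against an exact value. The helper name build_property_query() is hypothetical and not part of the tutorial, and escaping a single quote by doubling it is an assumption carried over from standard SQL. The sketch only builds the string, which would then be passed to oledb_query() like any other query.

```php
<?php
// Hypothetical helper (not part of the tutorial): builds a query that
// matches a metadata property such as DocAuthor against an exact value.
function build_property_query(string $property, string $value): string
{
    // Accept only plain identifiers so the property name cannot inject SQL.
    if (!preg_match('/^[A-Za-z][A-Za-z0-9_]*$/', $property)) {
        throw new InvalidArgumentException("Invalid property name: $property");
    }
    // Assumption: a single quote inside a literal is escaped by doubling it.
    $value = str_replace("'", "''", $value);
    return "SELECT filename, size, path FROM SCOPE() WHERE $property = '$value'";
}

// Print the generated SQL; in real use it would go to oledb_query().
echo build_property_query('DocAuthor', "Pat O'Neil"), "\n";
```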

Indexing Service uses plug-ins known as iFilters to extract information
from files it indexes. A default installation of Windows has iFilters
for many common file formats like HTML, Word, PowerPoint, and Excel.
You can extend Indexing Service's capability by installing additional
iFilters. Many are listed at http://www.ifilter.org/, with support
available for PDF, Photoshop, ZIP, Visio, Open Office, and others.

In the previous example, we used CONTAINS(Contents, '$keyword') to
search for a particular keyword. Only files containing that exact word
are returned. If $keyword is 'date', Indexing Service finds files
containing the word "date" but not those containing only "dates".
To relax the criteria somewhat, we can use the FORMSOF (INFLECTIONAL,
<word>) construct. Example:

$dir = 'C:\\htdocs';
$keyword = 'FORMSOF (INFLECTIONAL, date)';
$sql = "SELECT filename, size, path
FROM SCOPE('DEEP TRAVERSAL OF \"$dir\"')
WHERE CONTAINS(Contents, '$keyword')";
$res = oledb_query($sql, $link);

Now Indexing Service will look for all the inflected forms of the word:
date, dates, dating, dated, etc. If the word specified is "good," then
it'd look for good, better, best, and well.

To search on a partial word, we use the * sign:

$keyword = ' "kn*" ';

The double-quotation marks indicate a wild-card search. The above
pattern means any word starting with "kn" is considered a match.

Indexing Service also supports the <field> LIKE '%pattern%' and
<field> = 'value' SQL expressions. They are best avoided, however,
as they can be incredibly slow: matching against the value of a field
often means reading from the files.

To sort the results, we add an ORDER BY clause:

$dir = 'C:\\htdocs';
$keyword = 'FORMSOF (INFLECTIONAL, good)';
$sql = "SELECT filename, size, path
FROM SCOPE('DEEP TRAVERSAL OF \"$dir\"')
WHERE CONTAINS(Contents, '$keyword')
ORDER BY size DESC";
$res = oledb_query($sql, $link);

The above example lists the files found from the biggest to the
smallest. "ORDER BY write DESC" would list the more recently modified
files first, while "ORDER BY create DESC" lists the more recently
created ones first. You can, of course, also use these file attributes
as search criteria.
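
If the sort column comes from user input, it may be worth restricting it to known names before it reaches the SQL string. The following is only a sketch: build_search_sql() is a hypothetical helper, and the whitelist simply contains the three columns mentioned above.

```php
<?php
// Hypothetical helper (not from the tutorial): assembles the full-text
// query and appends an optional, whitelisted ORDER BY clause.
function build_search_sql(string $dir, string $keyword, string $sortBy = '', bool $descending = true): string
{
    // Only the sort columns mentioned in the tutorial are accepted.
    $allowed = ['size', 'write', 'create'];

    $sql = "SELECT filename, size, path"
         . " FROM SCOPE('DEEP TRAVERSAL OF \"$dir\"')"
         . " WHERE CONTAINS(Contents, '$keyword')";

    if ($sortBy !== '') {
        if (!in_array($sortBy, $allowed, true)) {
            throw new InvalidArgumentException("Unsupported sort column: $sortBy");
        }
        $sql .= " ORDER BY $sortBy " . ($descending ? 'DESC' : 'ASC');
    }
    return $sql;
}

// Most recently modified files first; the string would go to oledb_query().
echo build_search_sql('C:\\htdocs', 'FORMSOF (INFLECTIONAL, good)', 'write'), "\n";
```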

Thus far we have been searching the computer's default catalog. If
searching will be done only in a particular folder, it's worthwhile to
create a separate catalog. You can do this in the Computer Management
console. To search a different catalog through OLE-DB, you specify the
catalog name in the connection string as the data source:

$link = oledb_open("Provider=MSIDXS; Data Source=web_cat");

Finally, what if you want to search files residing on a network server?
While it's possible to index a network drive, it's not terribly
efficient. Instead, you'd want to enable Indexing Service on that
computer and perform the search there.

To search a remote catalog, we prefix SCOPE() with the computer name
and the catalog name:

$dir = '\\\\fileserver\\projects';
$keyword = 'FORMSOF (INFLECTIONAL, bad)';
$sql = "SELECT filename, size, path
FROM fileserver.System..SCOPE('DEEP TRAVERSAL OF \"$dir\"')
WHERE CONTAINS(Contents, '$keyword')";
$res = oledb_query($sql, $link);

Note that the double period is not a typo. Windows Authentication is
used to determine what files are visible. For the code above to work,
the web server has to run under a user account that is valid on the
network.

Aug 22 '06 #1
I just came across this and it is spectacular. It works great and
makes using the indexing service to handle the heavy lifting of
searching a breeze. Thank you.

Is there anywhere to find more advanced examples, like boolean searches,
use of wildcard characters, or searching across multiple file
attributes?

Mike

Chung Leong wrote:
> Here's the rest of the tutorial I started earlier:
> ....

Aug 23 '06 #2

su*******@yahoo.com wrote:
> I just came across this and it is spectacular. It works great and
> makes using the indexing service to handle the heavy lifting of
> searching a breeze. Thank you.
>
> Is there anywhere to find more advanced examples like boolean searches,
> use of wildcard characters, or searching across multiple file
> attributes.

I'm not really an expert in Indexing Service. Here's something I just
came across:

http://www.unis.no/Search/ixqLang_UNIS.Htm

The query string described in the document goes into the CONTAINS()
predicate. I realize now that what I said about the double-quoted
strings was wrong. They're used for searching for multiple words in a
sequence (i.e. a phrase). You can use the prefix* syntax without the
double quotes.

To specify multiple criteria, you just join them together in the
WHERE clause as you would when querying a database.

Example:

SELECT path, filename, size, write
FROM SCOPE()
WHERE CONTAINS(contents, 'love AND NOT sex')
AND size > 10240
AND write > '01-01-2006'
ORDER BY size

The statement above looks for files containing 'love' but not 'sex',
that are larger than 10K and modified some time this year, and lists
them from the smallest to biggest.

To do a wildcard match against the filename, you use the LIKE
'%pattern%' syntax.

Example:

SELECT path, filename, size, write
FROM SCOPE()
WHERE filename LIKE '%.mp3'

This statement looks for files with the .mp3 extension.

Aug 24 '06 #3
Thanks for the reply and leads. After I posted I was thinking about
the queries and realized about the WHERE ... AND ... thing. Also
looking at how MS implements it in their search dialog helped me
understand what was going on.

Thanks again.

Mike

Aug 25 '06 #4
