By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
440,262 Members | 1,161 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 440,262 IT Pros & Developers. It's quick & easy.

INDEX possible for reverse wildcards?

P: n/a
If I use _reverse_ wildcard search will it always result in a table
scan? Is it possible to get the DB (Oracle or SQL server) to use
indexes when doing reverse wildcard match?

let's say I have:

table email_address (id int, email varchar)

with the following entries

2, www.%shoes.%
3, w%.super%shoes.%
4, %webbox.somecopany.com

select id from email_address where 'www.superdupershoes.com' like
email;

this returns 2,3

But the query always results in a table scan even if I add an index to
email. What kind of index can I employ in this situation?
Please note that this is a _reverse_ search, the opposite of what's
normally done, i.e. select from email_address where email like
'www.%shoes.com'.

Thanks!

- Robert
Jul 20 '05 #1
Share this Question
Share on Google+
9 Replies


P: n/a
Robert Brown wrote:
If I use _reverse_ wildcard search will it always result in a table
scan?
I think so. How else do you do it? You can't generate a list of all
possible wildcards that the string might match, or anything like that.

Basically, the only way I can answer the question mentally is to look
at each wildcard pattern and see whether the 'www.superdupershoes.com'
string matches it - and that's what the optimizers are doing too. So,
the whole table scan is necessary.
Is it possible to get the DB (Oracle or SQL server) to use
indexes when doing reverse wildcard match?

let's say I have:

table email_address (id int, email varchar)

with the following entries

2, www.%shoes.%
3, w%.super%shoes.%
4, %webbox.somecopany.com

select id from email_address where 'www.superdupershoes.com' like
email;

this returns 2,3

But the query always results in a table scan even if I add an index to
email. What kind of index can I employ in this situation?
None that I can think of.
Please note that this is a _reverse_ search, the opposite of what's
normally done, i.e. select from email_address where email like
'www.%shoes.com'.


The example was a good idea - it explains clearly what you're after.

--
Jonathan Leffler #include <disclaimer.h>
Email: jl******@earthlink.net, jl******@us.ibm.com
Guardian of DBD::Informix v2003.04 -- http://dbi.perl.org/

Jul 20 '05 #2

P: n/a

"Robert Brown" <ro*************@yahoo.com> wrote in message
news:24**************************@posting.google.c om...
If I use _reverse_ wildcard search will it always result in a table
scan? Is it possible to get the DB (Oracle or SQL server) to use
indexes when doing reverse wildcard match?

let's say I have:

table email_address (id int, email varchar)

with the following entries

2, www.%shoes.%
3, w%.super%shoes.%
4, %webbox.somecopany.com
Assuming that you have wildcard at the end of the string (which is the only
case where normal wildcard index scan can be employed; -- well, with one
tiuny exception http://otn.oracle.com/oramag/code/tips2002/100602.html) you
table can be reorganized into

table email_address (id int, email_begin_range varchar, email_end_range
varchar)

with the entries like this

2, www.shoes., 'www.shoes.'||char(255)
....
select id from email_address where 'www.superdupershoes.com' like
email;


And your query becomes

select id from email_address where 'www.shoes.com' between
email_begin_range varchar and email_end_range

Finding a set of intervals covering a point is hard. You can employ R-Tree
index or Bitmapped index, but it would be never as efficient as B-Tree index
range scan in the "reverse" case. Plus, having wildcards in the middle and
the of the string would certainly compromise even this idea.
Jul 20 '05 #3

P: n/a
Jonathan Leffler <jl******@earthlink.net> wrote:
Robert Brown wrote:
If I use _reverse_ wildcard search will it always result in a table
scan?


I think so. How else do you do it? You can't generate a list of all
possible wildcards that the string might match, or anything like that.


Well, you would only need to look at index entries that start with 'w',
'%', or '_'. And if the first character is a 'w' or a '_', you only need
to look at ones where the second is 'w', '%', or '_'. And so on. Kind of
like a skip scan. (or course, in the example given, this wouldn't gain you
anything, as they all either start with 'w' or a wild-card.)

Note that I am not saying that Oracle actually does do this, I'm simply
saying that it would be conceptually possible for it to do so, without
requiring any changes to the index structure.

table email_address (id int, email varchar)

with the following entries

2, www.%shoes.%
3, w%.super%shoes.%
4, %webbox.somecopany.com

select id from email_address where 'www.superdupershoes.com' like
email;


Xho

--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB
Jul 20 '05 #4

P: n/a
<ct****@hotmail.com> wrote in message
news:20*******************@newsreader.com...
Well, you would only need to look at index entries that start with 'w',
'%', or '_'. And if the first character is a 'w' or a '_', you only need
to look at ones where the second is 'w', '%', or '_'.
And if the first character happens to be '%', then?
And so on. Kind of
like a skip scan. (or course, in the example given, this wouldn't gain you anything, as they all either start with 'w' or a wild-card.)


Jul 20 '05 #5

P: n/a
"Mikito Harakiri" <mi*********@iahu.com> wrote:
<ct****@hotmail.com> wrote in message
news:20*******************@newsreader.com...
Well, you would only need to look at index entries that start with 'w',
'%', or '_'. And if the first character is a 'w' or a '_', you only
need to look at ones where the second is 'w', '%', or '_'.
And if the first character happens to be '%', then?


Then you check the rest of it to see if it matches. (What did you expect?)

Xho
And so on. Kind of
like a skip scan. (or course, in the example given, this wouldn't gain

you
anything, as they all either start with 'w' or a wild-card.)


--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB
Jul 20 '05 #6

P: n/a

<ct****@hotmail.com> wrote in message
news:20*******************@newsreader.com...
"Mikito Harakiri" <mi*********@iahu.com> wrote:
<ct****@hotmail.com> wrote in message
news:20*******************@newsreader.com...
Well, you would only need to look at index entries that start with 'w', '%', or '_'. And if the first character is a 'w' or a '_', you only
need to look at ones where the second is 'w', '%', or '_'.


And if the first character happens to be '%', then?


Then you check the rest of it to see if it matches. (What did you

expect?)

Trigger happy posting, sorry. The answer was in the second part of your
original message.
Jul 20 '05 #7

P: n/a
After takin a swig o' Arrakan spice grog, ct****@hotmail.com belched out:
"Mikito Harakiri" <mi*********@iahu.com> wrote:
<ct****@hotmail.com> wrote in message
news:20*******************@newsreader.com...
> Well, you would only need to look at index entries that start with 'w',
> '%', or '_'. And if the first character is a 'w' or a '_', you only
> need to look at ones where the second is 'w', '%', or '_'.


And if the first character happens to be '%', then?


Then you check the rest of it to see if it matches. (What did you expect?)


I think you're missing the point of the question.

The question is whether or not one can construct an index that makes
it useful to search parts of the interior of a string.

Supposing we have the search criteria:

select * from some_table where name like 'Bro%'; -- 1.

That could be (character set dependent) transformed into the query:

select * from some_table where name >= 'Bro' and name <= 'Brp'; -- 2.

which wouldmake good use of a btree index on NAME to cut down the
number of entries searched to a dull roar.

In contrast that sort of transformation cannot be done with

select * from some_table where name like '%owne'; -- 3.

Unless there's some Rather Clever sort of index involved, the best you
can usually do with that is to do a sequential scan across the table,
looking for matches.

There is actually a pretty easy way of getting a suitable Clever Index
for that scenario. Some databases support "functional indices," and
you might set up an index based on some function of the name. Let's
say:

create index reverse_name on some_table(reverse(name));

In that case, the clever thing would be to transform Query #3 into...

select * from some_table where reverse(name) >= 'enwo'
reverse(name) <= 'enwp';

which could use the "reverse_name" index to do this quickly.

Regrettably, that falls apart if we propose a query like #4...

select * from some_table where name like '%rown%';

Neither of the "obvious" b-tree indices can help that query.

That being said, there do exist indexing schemes specific to text
search that might even be helpful for queries based on extensive
wildcarding and/or regular expressions. Patricia Tries or Arrays can
be used for such purposes, allowing searches to in effect start in the
"interior" of the search criterion.

See _Information Retrieval_ by Frakes and Baeza-Yates [Prentice Hall]
for more details.

This is the sort of extension that "text-oriented" database systems
add in.
--
output = reverse("gro.mca" "@" "enworbbc")
http://cbbrowne.com/info/advocacy.html
Rules of the Evil Overlord #171. "I will not locate a base in a
volcano, cave, or any other location where it would be ridiculously
easy to bypass security by rappelling down from above."
<http://www.eviloverlord.com/>
Jul 20 '05 #8

P: n/a
Christopher Browne <cb******@acm.org> wrote:
After takin a swig o' Arrakan spice grog, ct****@hotmail.com belched out:
"Mikito Harakiri" <mi*********@iahu.com> wrote:
<ct****@hotmail.com> wrote in message
news:20*******************@newsreader.com...
> Well, you would only need to look at index entries that start with
> 'w', '%', or '_'. And if the first character is a 'w' or a '_', you
> only need to look at ones where the second is 'w', '%', or '_'.

And if the first character happens to be '%', then?
Then you check the rest of it to see if it matches. (What did you
expect?)


I think you're missing the point of the question.

The question is whether or not one can construct an index that makes
it useful to search parts of the interior of a string.


Actually, the original question was rather Oracle *does* do it, not whether
it could do it. When someone suggested that Oracle could not do it with
their current index structure, I suggested they could at least get *some*
benefit from it without changing the physical index structure, only
changing some of the code that accesses it.


Supposing we have the search criteria:

select * from some_table where name like 'Bro%'; -- 1.
But that is what the original poster specifically excluded. He wanted
it the other way around:

select * from some_table where 'Brown' like name;

Where the wildcards reside in the column, not in the bind value.

That could be (character set dependent) transformed into the query:

select * from some_table where name >= 'Bro' and name <= 'Brp'; -- 2.

which wouldmake good use of a btree index on NAME to cut down the
number of entries searched to a dull roar.
Yes, and in the case under discussion, my example could be transformed
to something like:

select * from some_table where 'Brown' like name and substr(name,1,1) in
('B','_','%').

Where the "substr... in ..." part could (at least theoretically) be
supported by the index. Of course for maximum benefit you would have to
have to code the access method itself, not just some query re-write,
because you could apply it recursively to each successive letter, up until
a % is found, at which point you would have to return to the brute force
method (or some even more clever method)


In contrast that sort of transformation cannot be done with

select * from some_table where name like '%owne'; -- 3.

Unless there's some Rather Clever sort of index involved, the best you
can usually do with that is to do a sequential scan across the table,
looking for matches.


Meanwhile, in the situation we were actually talking about, with the
pattern in the column, not in the bind variable---Sure, you still have to
scan the patterns that do start with %, but there may be an aweful lot of
patterns that don't start with % (or with _, or with the first letter of
the query), and hence can be ruled out using the index, without ever
scanning them.

Xho

--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB
Jul 20 '05 #9

P: n/a
x
**** Post for FREE via your newsreader at post.usenet.com ****
"Robert Brown" <ro*************@yahoo.com> wrote in message
news:24**************************@posting.google.c om...
If I use _reverse_ wildcard search will it always result in a table
scan? Is it possible to get the DB (Oracle or SQL server) to use
indexes when doing reverse wildcard match?

let's say I have:

table email_address (id int, email varchar)

with the following entries

2, www.%shoes.%
3, w%.super%shoes.%
4, %webbox.somecopany.com

select id from email_address where 'www.superdupershoes.com' like
email;

this returns 2,3

But the query always results in a table scan even if I add an index to
email. What kind of index can I employ in this situation?
Please note that this is a _reverse_ search, the opposite of what's
normally done, i.e. select from email_address where email like
'www.%shoes.com'.


It is possible to define an order on patterns ?


-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
*** Usenet.com - The #1 Usenet Newsgroup Service on The Planet! ***
http://www.usenet.com
Unlimited Download - 19 Seperate Servers - 90,000 groups - Uncensored
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Jul 20 '05 #10

This discussion thread is closed

Replies have been disabled for this discussion.