473,382 Members | 1,512 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,382 software developers and data experts.

INDEX possible for reverse wildcards?

If I use _reverse_ wildcard search will it always result in a table
scan? Is it possible to get the DB (Oracle or SQL server) to use
indexes when doing reverse wildcard match?

let's say I have:

table email_address (id int, email varchar)

with the following entries

2, www.%shoes.%
3, w%.super%shoes.%
4, %webbox.somecopany.com

select id from email_address where 'www.superdupershoes.com' like
email;

this returns 2,3

But the query always results in a table scan even if I add an index to
email. What kind of index can I employ in this situation?
Please note that this is a _reverse_ search, the opposite of what's
normally done, i.e. select from email_address where email like
'www.%shoes.com'.

Thanks!

- Robert
Jul 20 '05 #1
9 3956
Robert Brown wrote:
If I use _reverse_ wildcard search will it always result in a table
scan?
I think so. How else do you do it? You can't generate a list of all
possible wildcards that the string might match, or anything like that.

Basically, the only way I can answer the question mentally is to look
at each wildcard pattern and see whether the 'www.superdupershoes.com'
string matches it - and that's what the optimizers are doing too. So,
the whole table scan is necessary.
Is it possible to get the DB (Oracle or SQL server) to use
indexes when doing reverse wildcard match?

let's say I have:

table email_address (id int, email varchar)

with the following entries

2, www.%shoes.%
3, w%.super%shoes.%
4, %webbox.somecopany.com

select id from email_address where 'www.superdupershoes.com' like
email;

this returns 2,3

But the query always results in a table scan even if I add an index to
email. What kind of index can I employ in this situation?
None that I can think of.
Please note that this is a _reverse_ search, the opposite of what's
normally done, i.e. select from email_address where email like
'www.%shoes.com'.


The example was a good idea - it explains clearly what you're after.

--
Jonathan Leffler #include <disclaimer.h>
Email: jl******@earthlink.net, jl******@us.ibm.com
Guardian of DBD::Informix v2003.04 -- http://dbi.perl.org/

Jul 20 '05 #2

"Robert Brown" <ro*************@yahoo.com> wrote in message
news:24**************************@posting.google.c om...
If I use _reverse_ wildcard search will it always result in a table
scan? Is it possible to get the DB (Oracle or SQL server) to use
indexes when doing reverse wildcard match?

let's say I have:

table email_address (id int, email varchar)

with the following entries

2, www.%shoes.%
3, w%.super%shoes.%
4, %webbox.somecopany.com
Assuming that you have wildcard at the end of the string (which is the only
case where normal wildcard index scan can be employed; -- well, with one
tiuny exception http://otn.oracle.com/oramag/code/tips2002/100602.html) you
table can be reorganized into

table email_address (id int, email_begin_range varchar, email_end_range
varchar)

with the entries like this

2, www.shoes., 'www.shoes.'||char(255)
....
select id from email_address where 'www.superdupershoes.com' like
email;


And your query becomes

select id from email_address where 'www.shoes.com' between
email_begin_range varchar and email_end_range

Finding a set of intervals covering a point is hard. You can employ R-Tree
index or Bitmapped index, but it would be never as efficient as B-Tree index
range scan in the "reverse" case. Plus, having wildcards in the middle and
the of the string would certainly compromise even this idea.
Jul 20 '05 #3
Jonathan Leffler <jl******@earthlink.net> wrote:
Robert Brown wrote:
If I use _reverse_ wildcard search will it always result in a table
scan?


I think so. How else do you do it? You can't generate a list of all
possible wildcards that the string might match, or anything like that.


Well, you would only need to look at index entries that start with 'w',
'%', or '_'. And if the first character is a 'w' or a '_', you only need
to look at ones where the second is 'w', '%', or '_'. And so on. Kind of
like a skip scan. (or course, in the example given, this wouldn't gain you
anything, as they all either start with 'w' or a wild-card.)

Note that I am not saying that Oracle actually does do this, I'm simply
saying that it would be conceptually possible for it to do so, without
requiring any changes to the index structure.

table email_address (id int, email varchar)

with the following entries

2, www.%shoes.%
3, w%.super%shoes.%
4, %webbox.somecopany.com

select id from email_address where 'www.superdupershoes.com' like
email;


Xho

--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB
Jul 20 '05 #4
<ct****@hotmail.com> wrote in message
news:20*******************@newsreader.com...
Well, you would only need to look at index entries that start with 'w',
'%', or '_'. And if the first character is a 'w' or a '_', you only need
to look at ones where the second is 'w', '%', or '_'.
And if the first character happens to be '%', then?
And so on. Kind of
like a skip scan. (or course, in the example given, this wouldn't gain you anything, as they all either start with 'w' or a wild-card.)


Jul 20 '05 #5
"Mikito Harakiri" <mi*********@iahu.com> wrote:
<ct****@hotmail.com> wrote in message
news:20*******************@newsreader.com...
Well, you would only need to look at index entries that start with 'w',
'%', or '_'. And if the first character is a 'w' or a '_', you only
need to look at ones where the second is 'w', '%', or '_'.
And if the first character happens to be '%', then?


Then you check the rest of it to see if it matches. (What did you expect?)

Xho
And so on. Kind of
like a skip scan. (or course, in the example given, this wouldn't gain

you
anything, as they all either start with 'w' or a wild-card.)


--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB
Jul 20 '05 #6

<ct****@hotmail.com> wrote in message
news:20*******************@newsreader.com...
"Mikito Harakiri" <mi*********@iahu.com> wrote:
<ct****@hotmail.com> wrote in message
news:20*******************@newsreader.com...
Well, you would only need to look at index entries that start with 'w', '%', or '_'. And if the first character is a 'w' or a '_', you only
need to look at ones where the second is 'w', '%', or '_'.


And if the first character happens to be '%', then?


Then you check the rest of it to see if it matches. (What did you

expect?)

Trigger happy posting, sorry. The answer was in the second part of your
original message.
Jul 20 '05 #7
After takin a swig o' Arrakan spice grog, ct****@hotmail.com belched out:
"Mikito Harakiri" <mi*********@iahu.com> wrote:
<ct****@hotmail.com> wrote in message
news:20*******************@newsreader.com...
> Well, you would only need to look at index entries that start with 'w',
> '%', or '_'. And if the first character is a 'w' or a '_', you only
> need to look at ones where the second is 'w', '%', or '_'.


And if the first character happens to be '%', then?


Then you check the rest of it to see if it matches. (What did you expect?)


I think you're missing the point of the question.

The question is whether or not one can construct an index that makes
it useful to search parts of the interior of a string.

Supposing we have the search criteria:

select * from some_table where name like 'Bro%'; -- 1.

That could be (character set dependent) transformed into the query:

select * from some_table where name >= 'Bro' and name <= 'Brp'; -- 2.

which wouldmake good use of a btree index on NAME to cut down the
number of entries searched to a dull roar.

In contrast that sort of transformation cannot be done with

select * from some_table where name like '%owne'; -- 3.

Unless there's some Rather Clever sort of index involved, the best you
can usually do with that is to do a sequential scan across the table,
looking for matches.

There is actually a pretty easy way of getting a suitable Clever Index
for that scenario. Some databases support "functional indices," and
you might set up an index based on some function of the name. Let's
say:

create index reverse_name on some_table(reverse(name));

In that case, the clever thing would be to transform Query #3 into...

select * from some_table where reverse(name) >= 'enwo'
reverse(name) <= 'enwp';

which could use the "reverse_name" index to do this quickly.

Regrettably, that falls apart if we propose a query like #4...

select * from some_table where name like '%rown%';

Neither of the "obvious" b-tree indices can help that query.

That being said, there do exist indexing schemes specific to text
search that might even be helpful for queries based on extensive
wildcarding and/or regular expressions. Patricia Tries or Arrays can
be used for such purposes, allowing searches to in effect start in the
"interior" of the search criterion.

See _Information Retrieval_ by Frakes and Baeza-Yates [Prentice Hall]
for more details.

This is the sort of extension that "text-oriented" database systems
add in.
--
output = reverse("gro.mca" "@" "enworbbc")
http://cbbrowne.com/info/advocacy.html
Rules of the Evil Overlord #171. "I will not locate a base in a
volcano, cave, or any other location where it would be ridiculously
easy to bypass security by rappelling down from above."
<http://www.eviloverlord.com/>
Jul 20 '05 #8
Christopher Browne <cb******@acm.org> wrote:
After takin a swig o' Arrakan spice grog, ct****@hotmail.com belched out:
"Mikito Harakiri" <mi*********@iahu.com> wrote:
<ct****@hotmail.com> wrote in message
news:20*******************@newsreader.com...
> Well, you would only need to look at index entries that start with
> 'w', '%', or '_'. And if the first character is a 'w' or a '_', you
> only need to look at ones where the second is 'w', '%', or '_'.

And if the first character happens to be '%', then?
Then you check the rest of it to see if it matches. (What did you
expect?)


I think you're missing the point of the question.

The question is whether or not one can construct an index that makes
it useful to search parts of the interior of a string.


Actually, the original question was rather Oracle *does* do it, not whether
it could do it. When someone suggested that Oracle could not do it with
their current index structure, I suggested they could at least get *some*
benefit from it without changing the physical index structure, only
changing some of the code that accesses it.


Supposing we have the search criteria:

select * from some_table where name like 'Bro%'; -- 1.
But that is what the original poster specifically excluded. He wanted
it the other way around:

select * from some_table where 'Brown' like name;

Where the wildcards reside in the column, not in the bind value.

That could be (character set dependent) transformed into the query:

select * from some_table where name >= 'Bro' and name <= 'Brp'; -- 2.

which wouldmake good use of a btree index on NAME to cut down the
number of entries searched to a dull roar.
Yes, and in the case under discussion, my example could be transformed
to something like:

select * from some_table where 'Brown' like name and substr(name,1,1) in
('B','_','%').

Where the "substr... in ..." part could (at least theoretically) be
supported by the index. Of course for maximum benefit you would have to
have to code the access method itself, not just some query re-write,
because you could apply it recursively to each successive letter, up until
a % is found, at which point you would have to return to the brute force
method (or some even more clever method)


In contrast that sort of transformation cannot be done with

select * from some_table where name like '%owne'; -- 3.

Unless there's some Rather Clever sort of index involved, the best you
can usually do with that is to do a sequential scan across the table,
looking for matches.


Meanwhile, in the situation we were actually talking about, with the
pattern in the column, not in the bind variable---Sure, you still have to
scan the patterns that do start with %, but there may be an aweful lot of
patterns that don't start with % (or with _, or with the first letter of
the query), and hence can be ruled out using the index, without ever
scanning them.

Xho

--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB
Jul 20 '05 #9
x
**** Post for FREE via your newsreader at post.usenet.com ****
"Robert Brown" <ro*************@yahoo.com> wrote in message
news:24**************************@posting.google.c om...
If I use _reverse_ wildcard search will it always result in a table
scan? Is it possible to get the DB (Oracle or SQL server) to use
indexes when doing reverse wildcard match?

let's say I have:

table email_address (id int, email varchar)

with the following entries

2, www.%shoes.%
3, w%.super%shoes.%
4, %webbox.somecopany.com

select id from email_address where 'www.superdupershoes.com' like
email;

this returns 2,3

But the query always results in a table scan even if I add an index to
email. What kind of index can I employ in this situation?
Please note that this is a _reverse_ search, the opposite of what's
normally done, i.e. select from email_address where email like
'www.%shoes.com'.


It is possible to define an order on patterns ?


-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
*** Usenet.com - The #1 Usenet Newsgroup Service on The Planet! ***
http://www.usenet.com
Unlimited Download - 19 Seperate Servers - 90,000 groups - Uncensored
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Jul 20 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

25
by: sql_server_2000_user | last post by:
Hi, I have a table with about 305 million rows, and a composite primary key that consists of an ascending int and an ascending varchar(18), which is typically of length 13. Even if all the keys...
1
by: greg.kujawa | last post by:
I am passing along an Index Server 2.0 query though URL strings calling the ASP. I am searching for records by a person's last name and their phone number. Something like: ...
17
by: Dima Tkach | last post by:
Hi, everybody! I just ran into a weird problem on 7.3.4. Here is a simple testcase: rapidb=# create table nametab (name text); CREATE TABLE rapidb=# create index name_idx on nametab(name);...
10
by: Alvaro Puente | last post by:
Hi all! Do any of you know if wildcards are accepted when calling rename() function? Thanks/Alvaro
6
by: Andrew Robinson | last post by:
given a Hashtable and an Index, how do I retrieve the key? Hashtable h = new Hashtable(); h.Add("red", 3); h.Add("blue", 99); h.Add("green", 33); how do I get the key at index 2? h.Keys...
0
by: Jay.Pemberton | last post by:
I have stored procedure that does the following select into Indexing Services. SELECT LEFT(FILENAME, 21) AS Filename, LEFT(REVERSE(SUBSTRING(REVERSE(DIRECTORY), 1, CHARINDEX('\',...
2
by: natG | last post by:
On a 3 column table, the PK consists of all 3 columns. I have queries on the first two columns, as well as on the last two columns. (Select * where column3=x and column2=y.) I was hoping that...
4
by: p175 | last post by:
What are the advantages / disadvantages of having primary keys in lieu of ordinary indexes ? If there are no foreign relationships defined, is it better to have a two column index allowing reverse...
4
by: ronc85 | last post by:
I'm using ASP.NET 2.0 and C#. With the FileUpload control is it possible to set Wildcards (e.g., *.xls and *.doc) so that when the user clicks the Browse button it defaults to Excel and/or Word...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
0
by: ryjfgjl | last post by:
In our work, we often need to import Excel data into databases (such as MySQL, SQL Server, Oracle) for data analysis and processing. Usually, we use database tools like Navicat or the Excel import...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.