By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
454,233 Members | 1,500 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 454,233 IT Pros & Developers. It's quick & easy.

UCASE really slows queries down

P: n/a
I'm relatively new to DB2 and have noticed a problem with my queries
when doing a like with ucase.

Since queries need to be case insensitive, I have to do a ucase on the
column(s) being searched on.

eg. select count(*) from blah where ucase(column1) like
ucase('%whatever%)

Such a query takes about a minute to run which is crazy. If I get rid
of the ucase on the column I'm searching, it only takes about a second
or two to run. I'm obviously missing something here.

The table I am querying contains about 1.6 million rows, 104 columns
(all of which have an index), and I've done a re-org of the indexes and
the ran the runstats from control center. That is about all I know
about how to tweak things in DB2.

Oct 13 '06 #1
Share this Question
Share on Google+
16 Replies


P: n/a

shawno wrote:
I'm relatively new to DB2 and have noticed a problem with my queries
when doing a like with ucase.

Since queries need to be case insensitive, I have to do a ucase on the
column(s) being searched on.

eg. select count(*) from blah where ucase(column1) like
ucase('%whatever%)

Such a query takes about a minute to run which is crazy. If I get rid
of the ucase on the column I'm searching, it only takes about a second
or two to run. I'm obviously missing something here.

The table I am querying contains about 1.6 million rows, 104 columns
(all of which have an index), and I've done a re-org of the indexes and
the ran the runstats from control center. That is about all I know
about how to tweak things in DB2.
can't u give just like this

select count(*) from blah where column1 like ucase('%whatever%')

Oct 13 '06 #2

P: n/a
In article <11**********************@h48g2000cwc.googlegroups .com>,
st*******@gmail.com says...
I'm relatively new to DB2 and have noticed a problem with my queries
when doing a like with ucase.

Since queries need to be case insensitive, I have to do a ucase on the
column(s) being searched on.

eg. select count(*) from blah where ucase(column1) like
ucase('%whatever%)

Such a query takes about a minute to run which is crazy. If I get rid
of the ucase on the column I'm searching, it only takes about a second
or two to run. I'm obviously missing something here.

The table I am querying contains about 1.6 million rows, 104 columns
(all of which have an index), and I've done a re-org of the indexes and
the ran the runstats from control center. That is about all I know
about how to tweak things in DB2.

Offcourse it's slowing down your queries. You want to compare with a
string that's not in the index, so a tablescan is needed to find all
matching values.
You could add a generated column to the table which contains the ucase
value of that column and create an index on it. The optimizer will
automatically use this index.
Oct 13 '06 #3

P: n/a
can't u give just like this
>
select count(*) from blah where column1 like ucase('%whatever%')
Unfortunately not because the search needs to be case insensitive and
the columns are mixed case.
eg. a search of 'John' with the above code would not return rows with
'john', 'JOHN', etc... and it needs to.

At this point, my only option is to convert all the columns in the
table to uppercase to avoid having to do the UCASE, but this is going
to be a major pain as their are over a hundred columns that I would
have to do this for.

Oct 13 '06 #4

P: n/a
While stranded on information super highway shawno wrote:
I'm relatively new to DB2 and have noticed a problem with my queries
when doing a like with ucase.

Since queries need to be case insensitive, I have to do a ucase on the
column(s) being searched on.

eg. select count(*) from blah where ucase(column1) like
ucase('%whatever%)
You are executing ucase() twice for every row in result set, this could
make query run slow. You can cut that into half by providing the string
in uppercase (e.g. '%WHATEVER%'). You can further improve performance
if you create another column that contains data from column in uppercase,
you insert/update command should add data in uppercase to this column.

>
Such a query takes about a minute to run which is crazy. If I get rid
of the ucase on the column I'm searching, it only takes about a second
or two to run. I'm obviously missing something here.

The table I am querying contains about 1.6 million rows, 104 columns
(all of which have an index), and I've done a re-org of the indexes and
the ran the runstats from control center. That is about all I know
about how to tweak things in DB2.
--
Hemant Shah /"\ ASCII ribbon campaign
E-mail: No************@xnet.com \ / ---------------------
X against HTML mail
TO REPLY, REMOVE NoJunkMail / \ and postings
FROM MY E-MAIL ADDRESS.
-----------------[DO NOT SEND UNSOLICITED BULK E-MAIL]------------------
I haven't lost my mind, Above opinions are mine only.
it's backed up on tape somewhere. Others can have their own.
Oct 13 '06 #5

P: n/a
Gert van der Kooij wrote:
Offcourse it's slowing down your queries. You want to compare with a
string that's not in the index, so a tablescan is needed to find all
matching values.
You could add a generated column to the table which contains the ucase
value of that column and create an index on it. The optimizer will
automatically use this index.
Thanks for the reply, that would explain the slow query time if it
needs to do a full scan each time. So you are saying I would have to
add another column for each column in my table that could contain mixed
text, and then when I do and update/insert just insert a UCASE version
of the column into the duplicate column? I guess this would work, but
it would mean almost doubling the number of columns in my table(s), so
I would go from approximately 100 to 200 columns. Aside from the
amount of time it would take me to re-create the tables and copy over
all the existing data (which is alot), are there any other performance
issues with doing this?

Oct 13 '06 #6

P: n/a
"Gert van der Kooij" <no****@invalid.nlwrote in message
news:MP************************@news.xs4all.nl...
In article <11**********************@h48g2000cwc.googlegroups .com>,
st*******@gmail.com says...
>I'm relatively new to DB2 and have noticed a problem with my queries
when doing a like with ucase.

Since queries need to be case insensitive, I have to do a ucase on the
column(s) being searched on.

eg. select count(*) from blah where ucase(column1) like
ucase('%whatever%)

Such a query takes about a minute to run which is crazy. If I get rid
of the ucase on the column I'm searching, it only takes about a second
or two to run. I'm obviously missing something here.

The table I am querying contains about 1.6 million rows, 104 columns
(all of which have an index), and I've done a re-org of the indexes and
the ran the runstats from control center. That is about all I know
about how to tweak things in DB2.


Offcourse it's slowing down your queries. You want to compare with a
string that's not in the index, so a tablescan is needed to find all
matching values.
You could add a generated column to the table which contains the ucase
value of that column and create an index on it. The optimizer will
automatically use this index.
What do you think about the wildcard search ? Seems to me that even an index
wouldn't solve the problem. Also in the other mail he says that he is using
a lot of columns on this way. I think db2 always decides to do a table scan
if you are searching on more than one column. Maybe the Net Search Extender
is a better choice for this kind of queries ?

Even the maintenance of the indexes should be quite heavy. (I don't know if
this system have many UID statements).
Oct 13 '06 #7

P: n/a
The design certainly looks "odd" (in other usenet groups stronger words
would be chosen by some ;-)
100 indexes on a table just isn't right.
Also I'm surprised that arbitrary searches are being done on arbitrary
columns. That indicates a rather atypical design.
How long are these columns any way?

I think it would be appropriate to discuss the schema first instead of
searching for a quick fix.
I know that is not the desired answer, but it is the right answer.

Cheers
Serge
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab

IOD Conference
http://www.ibm.com/software/data/ond...ness/conf2006/
Oct 13 '06 #8

P: n/a
While stranded on information super highway shawno wrote:
>can't u give just like this

select count(*) from blah where column1 like ucase('%whatever%')

Unfortunately not because the search needs to be case insensitive and
the columns are mixed case.
eg. a search of 'John' with the above code would not return rows with
'john', 'JOHN', etc... and it needs to.

At this point, my only option is to convert all the columns in the
table to uppercase to avoid having to do the UCASE, but this is going
to be a major pain as their are over a hundred columns that I would
have to do this for.
How about using Net Search Extender?

--
Hemant Shah /"\ ASCII ribbon campaign
E-mail: No************@xnet.com \ / ---------------------
X against HTML mail
TO REPLY, REMOVE NoJunkMail / \ and postings
FROM MY E-MAIL ADDRESS.
-----------------[DO NOT SEND UNSOLICITED BULK E-MAIL]------------------
I haven't lost my mind, Above opinions are mine only.
it's backed up on tape somewhere. Others can have their own.
Oct 13 '06 #9

P: n/a
Paul Peters wrote:
I think db2 always decides to do a table scan
if you are searching on more than one column.
No, that's not correct. The DB2 optimizer will create many different plans
where the access is a table scan, an index scan, multiple index scans
combined with a merge etc. The costs for each plan are estimated and the
cheapest plan is chosen.

The only real answer to the OP's question can be given if we see the access
plans. Everything else is just wild guessing.

--
Knut Stolze
DB2 Information Integration Development
IBM Germany
Oct 13 '06 #10

P: n/a
Odd is an understatement for sure, but unfortunately this is what you
get when a java developer is forced to make such decisions with little
to no DB2 experience and limited DBA experience. There is no DBA on
our project, we just recently migrated to DB2 from SQL server (forced
to do so), and nobody has any clue how about performance and design
from a DB2 optimization perspective, so we're just kinda guessing at
how things should be done.
As for the arbitrary ad-hoc searches, that is how the app was initially
designed and delivered so there is no control over that. The user is
allowed to search on any of the 100 fields in the table and each query
has to be a like. Usually the user will search on one field, but they
can search on multiple if they so desire. So given that scenario, I'm
open to suggestions other than creating an index for each field. Keep
in mind that there are actually multiple tables with 100+ fields so in
total there will be 1000+ indexes in total for the database. There are
also reports that do a count on each column for various reasons, so
again the need for an index.
All of the data is read-only, and the app is just for searching. The
data is input on an annual basis by loading from a number of csv files.

So what else did you want to know about the schema?
Serge Rielau wrote:
The design certainly looks "odd" (in other usenet groups stronger words
would be chosen by some ;-)
100 indexes on a table just isn't right.
Also I'm surprised that arbitrary searches are being done on arbitrary
columns. That indicates a rather atypical design.
How long are these columns any way?

I think it would be appropriate to discuss the schema first instead of
searching for a quick fix.
I know that is not the desired answer, but it is the right answer.

Cheers
Serge
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab

IOD Conference
http://www.ibm.com/software/data/ond...ness/conf2006/
Oct 13 '06 #11

P: n/a
I'd be a lot more interested in the application that REQUIRES? indexes
on each of 106 columns. Insert performance must be terrible and, if some
of the indexes have low cardinality, they won't be used at all.

You may end up with better overall performance and a lot less disk space
by disposing of poor (and rarely used) indexes, adding pre-generated
uppercase columns for common search columns and not using indexes on the
mixed case columns. You'll need to explain variations of your queries to
see what is necessary to have the optimizer choose to use your upper
case indexes. (Don't forget to try different optimization levels.) Worst
case will be a rewrite of the queries to select the mixed case columns
while using the upper case columns in the WHERE clauses.

Phil Sherman

shawno wrote:
I'm relatively new to DB2 and have noticed a problem with my queries
when doing a like with ucase.

Since queries need to be case insensitive, I have to do a ucase on the
column(s) being searched on.

eg. select count(*) from blah where ucase(column1) like
ucase('%whatever%)

Such a query takes about a minute to run which is crazy. If I get rid
of the ucase on the column I'm searching, it only takes about a second
or two to run. I'm obviously missing something here.

The table I am querying contains about 1.6 million rows, 104 columns
(all of which have an index), and I've done a re-org of the indexes and
the ran the runstats from control center. That is about all I know
about how to tweak things in DB2.
Oct 13 '06 #12

P: n/a
What is the content of the data?
Words? Free Text? Codes?
If it's free text then LIKE will hardly be an appropriate predicates.
I assume that on SQL Server you used a case-insensitive database.
So since obviously case doesn't matter what speaks against uppercasing
(or lowercasing) the data on the disk?
Cheers
Serge
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab

IOD Conference
http://www.ibm.com/software/data/ond...ness/conf2006/
Oct 13 '06 #13

P: n/a
The data is a mixed bag of words and codes. I can't get too specific,
but its basically a flat file that is received from multiple countries,
all of which may have their own standards on how they input their data
which we have no control over. So one file might be mixed case,
another not, one file might have '0123ABC' as a code for one column and
another country might use something completely different or maybe they
don't used that column so it is NULL, hence most columns are nullable
varchar in the table.

Yes, the SQL server was case-insensitive for searching, i.e.
A search for 'John' returned every row with those four letters
regardless of case.

So DB2 must provide the same functionality. As far as I am aware,
correct me if I'm wrong, the only way to do this with DB2 is with UCASE
and LIKE.

Serge Rielau wrote:
What is the content of the data?
Words? Free Text? Codes?
If it's free text then LIKE will hardly be an appropriate predicates.
I assume that on SQL Server you used a case-insensitive database.
So since obviously case doesn't matter what speaks against uppercasing
(or lowercasing) the data on the disk?
Cheers
Serge
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab

IOD Conference
http://www.ibm.com/software/data/ond...ness/conf2006/
Oct 13 '06 #14

P: n/a
shawno wrote:
The data is a mixed bag of words and codes. I can't get too specific,
but its basically a flat file that is received from multiple countries,
all of which may have their own standards on how they input their data
which we have no control over. So one file might be mixed case,
another not, one file might have '0123ABC' as a code for one column and
another country might use something completely different or maybe they
don't used that column so it is NULL, hence most columns are nullable
varchar in the table.

Yes, the SQL server was case-insensitive for searching, i.e.
A search for 'John' returned every row with those four letters
regardless of case.

So DB2 must provide the same functionality. As far as I am aware,
correct me if I'm wrong, the only way to do this with DB2 is with UCASE
and LIKE.
Yes. DB2 supports generated columns to speed up such queries. This
typically works because only select columns have this requirement (like
names).
Now you stated what you have to hanlde, but what is typical?
If I have a code: aBc12k do users REALLY search for '%C12%'.
If you use a collation which sorts "A next to a" then you could do a
case insensitive search by looking for:
column BETWEEN LOWER(:var) AND UPPER(:var)
because
'aBc12k' BETWEEN 'abc12k' AND 'ABC12K'
That would be very efficient.
You can also use that for LIKE var || '%'
'aBc12k' BETWEEN 'ab ' AND 'ABZZZZ' AND UCASE('aBc12k') LIKE 'AB%'
The BETWEEN will box in the search space and like will do the final
filter.

Cheers
Serge
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab

IOD Conference
http://www.ibm.com/software/data/ond...ness/conf2006/
Oct 13 '06 #15

P: n/a
Amplifying the point made by Gert van der Kooij in post five of this
thread, here's a (possibly) helpful link:

http://www-128.ibm.com/developerwork...3adamache.html

--Jeff

Phil Sherman wrote:
I'd be a lot more interested in the application that REQUIRES? indexes
on each of 106 columns. Insert performance must be terrible and, if some
of the indexes have low cardinality, they won't be used at all.

You may end up with better overall performance and a lot less disk space
by disposing of poor (and rarely used) indexes, adding pre-generated
uppercase columns for common search columns and not using indexes on the
mixed case columns. You'll need to explain variations of your queries to
see what is necessary to have the optimizer choose to use your upper
case indexes. (Don't forget to try different optimization levels.) Worst
case will be a rewrite of the queries to select the mixed case columns
while using the upper case columns in the WHERE clauses.

Phil Sherman

shawno wrote:
I'm relatively new to DB2 and have noticed a problem with my queries
when doing a like with ucase.

Since queries need to be case insensitive, I have to do a ucase on the
column(s) being searched on.

eg. select count(*) from blah where ucase(column1) like
ucase('%whatever%)

Such a query takes about a minute to run which is crazy. If I get rid
of the ucase on the column I'm searching, it only takes about a second
or two to run. I'm obviously missing something here.

The table I am querying contains about 1.6 million rows, 104 columns
(all of which have an index), and I've done a re-org of the indexes and
the ran the runstats from control center. That is about all I know
about how to tweak things in DB2.
Oct 13 '06 #16

P: n/a
>At this point, my only option is to convert all the columns in the table to uppercase to avoid having to do the UCASE, but this is going to be a major pain as their are over a hundred columns that I would have to do this for. <,

Yes, but it is a major pain only once. Would rather have thousands and
thousands of minor pains forever. And while you are doing this,
remember to a CHECK (<col= UCASE(<col>)) to the tables, so that you
will not have to do this again.

Oct 14 '06 #17

This discussion thread is closed

Replies have been disabled for this discussion.