Timings... sometimes there are almost too many ways to do the same thing.
The only significant findings I see from all the below timings is:
1) Integer math is generally fastest, naturally. Bigint math isn't much
slower, for integers that all fit within an integer.
2) Converting float to varchar is relatively slow, and should be avoided if
possible. Converting from integer to varchar or varchar to int is several
times faster.
3) Most significantly, and less obvious, CASE expr WHEN .... recomputes expr
for each WHEN condition, unfortunately, and is the same speed (or perhaps
slightly slower) as listing WHEN expr = value for each condition. Perhaps an
indexed computed column (somehow materialized) would be advisable when
possible to avoid repeated computations in CASE..WHEN expressions (if that
helps..).
Note that if you divide by COUNT(*), most timings below are below one
microsecond per row, so this all may not be very significant in most
applications, unless you do frequent aggregations of some sort.
COUNT(*) FROM [my_sf_table] = 477446 rows
The result from each query = either 47527 or 47527.0
Platform: Athlon 2000 XP w/512MB RAM, table seems to be cached in RAM, SQL
2000, all queries run at least 3 times and minimum timings shown (msec).
SRP is a REAL (4 bytes)
Fastest ones are near the end.
CPU SQL
(ms)
-- Convert to varchar (implicitly) and compare right two digits
-- (original version -- no I didn't write it)
4546 select sum(case right(srp,2)
when '99' then 1 when '49' then 1 else 0 end)
from sf
-- Use LIKE for a single comparison instead of two, much faster
-- Note that the big speedup indicates that
-- CASE expr WHEN y then a WHEN z then b .
-- recalculates expr for each WHEN clause
2023 select sum(case when srp like '%[49]9' then 1 else 0 end)
from sf
-- Floating point method of taking a modulus (lacking fmod/modf)
2291 select sum(case round(srp - 100e*floor(srp*.01e),0)
when 99 then 1 when 49 then 1 else 0 end)
from sf
-- Round to nearest 50 and compare with 49
1322 select sum(case round(srp-50e*floor(srp*.02e),0)
when 49 then 1 else 0 end)
from sf
-- Divide by 49 by multiplying by (slightly larger than) 1e/49e
811 select sum(floor((cast(srp as integer)%50)*2.04082E-2))
from sf
-- Integer approach without using CASE
731 select sum(coalesce(nullif(sign(cast(srp as
integer)%50-48),-1),0))
from sf
-- My original integer approach
651 select sum(case cast(srp as integer)%100
when 99 then 1 when 49 then 1 else 0 end)
from sf
-- Modulus 50 integer approach without CASE
481 select sum((cast(srp as integer)%50)/49)
from sf
-- Modulus 50 integer approach
460 select sum(case cast(srp as integer)%50
when 49 then 1 else 0 end)
from sf
-- bigint without CASE
531 select sum((cast(srp as bigint)%50)/49)
from sf
-- bigint with CASE
521 select sum(case cast(srp as bigint)%50
when 49 then 1 else 0 end)
from sf
-- get SIGN to return -1 or 0, then add 1
-- much better than the coalesce+nullif approach
500 select sum(sign(cast(srp as integer)%50-49)+1)
from sf
-- SIGN with BIGINT
551 select sum(sign(cast(srp as bigint)%50-49)+1)
from sf
BTW, I know srp should be int to begin with for this to be faster... Okay,
so...
select cast(srp as int) srp into sf from [my_real_sf_table]
720 select sum(case when srp like '%[49]9' then 1 else 0 end) from sf
339 select sum(1+sign(srp%50-49)) from sf
310 select sum(srp%50/49) from sf
300 select sum(case srp%50 when 49 then 1 else 0 end) from sf
What if it were a char(7)?
select cast(cast(srp as integer) as char(7)) srp into sf2 from
[my_sf_table]
801 select sum(case right(rtrim(srp),2) when '49' then 1
when '99' then 1 else 0 end) from sf2
717 select sum(case when srp like '%[49]9' then 1 else 0 end) from sf2
405 select sum(srp%50/49) from sf2
391 select sum(case srp%50 when 49 then 1 else 0 end) from sf2
How about varchar(7)?
drop table sf2
select cast(cast(srp as integer) as varchar(7)) srp into sf2 from
[my_sf_table]
581 select sum(case right(srp,2) when '49' then 1
when '99' then 1 else 0 end) from sf2
569 select sum(case when srp like '%[49]9' then 1 else 0 end) from sf2
LIKE is faster on VARCHAR than on CHAR columns...
Apparently it has to effectively RTRIM the trailing spaces during the LIKE
operation.
Is binary collation any faster?
drop table sf2
select cast(cast(srp as integer) as varchar(7))
COLLATE Latin1_General_BIN srp
into sf2 from tbl_superfile
561 select sum(case right(srp,2) when '49' then 1
when '99' then 1 else 0 end) from sf2
530 select sum(case when srp like '%[49]9' then 1 else 0 end) from sf2
Binary collation comparisons are slightly faster, though it's not a big
difference (with just two characters being compared).
662 select sum(case convert(binary(2),right(srp,2))
when 0x3439 then 1 when 0x3939 then 1 else 0 end) from sf2
-----------
5037 select right(srp,2) srp,count(*) from my_sf_table
group by right(srp,2)
order by right(srp,2)
920 select cast(srp as int)%100 srp,count(*) from my_sf_table
group by cast(srp as int)%100
order by cast(srp as int)%100
---
On the one hand, premature optimization can be a waste of time and energy.
On the other hand, understanding performance implications of various
operations can help write more efficient systems.
In any case, an indexed computed column or one updated on a trigger could
virtually eliminate the need for any of these calculations to be performed,
except upon insertion or update, so maybe my comparisons aren't very
meaningful for most applications, considering we're talking about less than
3 microseconds per row here worst-case.
But the results remind me, some recommend avoiding Identity when it's not
necessary. I find Identity(int,1,1) to be a nice, compact surrogate key that
is useful for quick comparisons, grouping, etc, and so on. Also, it seems
most appropriate as the primary key to all lookup tables in a star schema in
OLAP data warehousing. (?) Of course, in some situations, it's not
appropriate, particularly when having a surrogate key violates data
integrity by allowing duplicates that would not be allowed with a proper
primary key constraint, or when the surrogate key is completely redundant
with (especially a short) single-column unique key value that would be a
better selection as the primary key. With multi-column primary keys, I think
it's sometimes convenient to have a surrogate Identity if only for IN
clauses that reference that identity column (though EXISTS can usually
replace those, so maybe that's a weak excuse for an extra column.)