Timings... sometimes there are almost too many ways to do the same thing.

The only significant findings I see from all the below timings is:

1) Integer math is generally fastest, naturally. Bigint math isn't much

slower, for integers that all fit within an integer.

2) Converting float to varchar is relatively slow, and should be avoided if

possible. Converting from integer to varchar or varchar to int is several

times faster.

3) Most significantly, and less obvious, CASE expr WHEN .... recomputes expr

for each WHEN condition, unfortunately, and is the same speed (or perhaps

slightly slower) as listing WHEN expr = value for each condition. Perhaps an

indexed computed column (somehow materialized) would be advisable when

possible to avoid repeated computations in CASE..WHEN expressions (if that

helps..).

Note that if you divide by COUNT(*), most timings below are below one

microsecond per row, so this all may not be very significant in most

applications, unless you do frequent aggregations of some sort.

COUNT(*) FROM [my_sf_table] = 477446 rows

The result from each query = either 47527 or 47527.0

Platform: Athlon 2000 XP w/512MB RAM, table seems to be cached in RAM, SQL

2000, all queries run at least 3 times and minimum timings shown (msec).

SRP is a REAL (4 bytes)

Fastest ones are near the end.

CPU SQL

(ms)

-- Convert to varchar (implicitly) and compare right two digits

-- (original version -- no I didn't write it)

4546 select sum(case right(srp,2)

when '99' then 1 when '49' then 1 else 0 end)

from sf

-- Use LIKE for a single comparison instead of two, much faster

-- Note that the big speedup indicates that

-- CASE expr WHEN y then a WHEN z then b .

-- recalculates expr for each WHEN clause

2023 select sum(case when srp like '%[49]9' then 1 else 0 end)

from sf

-- Floating point method of taking a modulus (lacking fmod/modf)

2291 select sum(case round(srp - 100e*floor(srp*.01e),0)

when 99 then 1 when 49 then 1 else 0 end)

from sf

-- Round to nearest 50 and compare with 49

1322 select sum(case round(srp-50e*floor(srp*.02e),0)

when 49 then 1 else 0 end)

from sf

-- Divide by 49 by multiplying by (slightly larger than) 1e/49e

811 select sum(floor((cast(srp as integer)%50)*2.04082E-2))

from sf

-- Integer approach without using CASE

731 select sum(coalesce(nullif(sign(cast(srp as

integer)%50-48),-1),0))

from sf

-- My original integer approach

651 select sum(case cast(srp as integer)%100

when 99 then 1 when 49 then 1 else 0 end)

from sf

-- Modulus 50 integer approach without CASE

481 select sum((cast(srp as integer)%50)/49)

from sf

-- Modulus 50 integer approach

460 select sum(case cast(srp as integer)%50

when 49 then 1 else 0 end)

from sf

-- bigint without CASE

531 select sum((cast(srp as bigint)%50)/49)

from sf

-- bigint with CASE

521 select sum(case cast(srp as bigint)%50

when 49 then 1 else 0 end)

from sf

-- get SIGN to return -1 or 0, then add 1

-- much better than the coalesce+nullif approach

500 select sum(sign(cast(srp as integer)%50-49)+1)

from sf

-- SIGN with BIGINT

551 select sum(sign(cast(srp as bigint)%50-49)+1)

from sf

BTW, I know srp should be int to begin with for this to be faster... Okay,

so...

select cast(srp as int) srp into sf from [my_real_sf_table]

720 select sum(case when srp like '%[49]9' then 1 else 0 end) from sf

339 select sum(1+sign(srp%50-49)) from sf

310 select sum(srp%50/49) from sf

300 select sum(case srp%50 when 49 then 1 else 0 end) from sf

What if it were a char(7)?

select cast(cast(srp as integer) as char(7)) srp into sf2 from

[my_sf_table]

801 select sum(case right(rtrim(srp),2) when '49' then 1

when '99' then 1 else 0 end) from sf2

717 select sum(case when srp like '%[49]9' then 1 else 0 end) from sf2

405 select sum(srp%50/49) from sf2

391 select sum(case srp%50 when 49 then 1 else 0 end) from sf2

How about varchar(7)?

drop table sf2

select cast(cast(srp as integer) as varchar(7)) srp into sf2 from

[my_sf_table]

581 select sum(case right(srp,2) when '49' then 1

when '99' then 1 else 0 end) from sf2

569 select sum(case when srp like '%[49]9' then 1 else 0 end) from sf2

LIKE is faster on VARCHAR than on CHAR columns...

Apparently it has to effectively RTRIM the trailing spaces during the LIKE

operation.

Is binary collation any faster?

drop table sf2

select cast(cast(srp as integer) as varchar(7))

COLLATE Latin1_General_BIN srp

into sf2 from tbl_superfile

561 select sum(case right(srp,2) when '49' then 1

when '99' then 1 else 0 end) from sf2

530 select sum(case when srp like '%[49]9' then 1 else 0 end) from sf2

Binary collation comparisons are slightly faster, though it's not a big

difference (with just two characters being compared).

662 select sum(case convert(binary(2),right(srp,2))

when 0x3439 then 1 when 0x3939 then 1 else 0 end) from sf2

-----------

5037 select right(srp,2) srp,count(*) from my_sf_table

group by right(srp,2)

order by right(srp,2)

920 select cast(srp as int)%100 srp,count(*) from my_sf_table

group by cast(srp as int)%100

order by cast(srp as int)%100

---

On the one hand, premature optimization can be a waste of time and energy.

On the other hand, understanding performance implications of various

operations can help write more efficient systems.

In any case, an indexed computed column or one updated on a trigger could

virtually eliminate the need for any of these calculations to be performed,

except upon insertion or update, so maybe my comparisons aren't very

meaningful for most applications, considering we're talking about less than

3 microseconds per row here worst-case.

But the results remind me, some recommend avoiding Identity when it's not

necessary. I find Identity(int,1,1) to be a nice, compact surrogate key that

is useful for quick comparisons, grouping, etc, and so on. Also, it seems

most appropriate as the primary key to all lookup tables in a star schema in

OLAP data warehousing. (?) Of course, in some situations, it's not

appropriate, particularly when having a surrogate key violates data

integrity by allowing duplicates that would not be allowed with a proper

primary key constraint, or when the surrogate key is completely redundant

with (especially a short) single-column unique key value that would be a

better selection as the primary key. With multi-column primary keys, I think

it's sometimes convenient to have a surrogate Identity if only for IN

clauses that reference that identity column (though EXISTS can usually

replace those, so maybe that's a weak excuse for an extra column.)