By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
438,018 Members | 930 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 438,018 IT Pros & Developers. It's quick & easy.

help with logic

P: n/a
Hello all,

This is probably not related to db2 but due to fabulous responses i
got earlier for my questions, i am convinced that this forum is full
of smart people.

Here is question:

I have data rows say count of 3000 which i got from sql. i call this
universe.
Now i need samples out of this universe. say my sample size is 45.
this is for audit and samples are random rows picked from universe. i
can use random functions but business people does not want it that
random. so i divide universe by sample and start from first row of
universe skipping universe/sample rows and created a sample. There are
more complications to it but not relevant here. My questions is if
there is some statistical/mathematical formula or way to do this
same thing say pick rows from large dataset which are equally
distributed ( i might not be using the right term "equally
distributed").

Regards,
udbadmin

Mar 21 '07 #1
Share this Question
Share on Google+
8 Replies


P: n/a
db2admin wrote:
Hello all,

This is probably not related to db2 but due to fabulous responses i
got earlier for my questions, i am convinced that this forum is full
of smart people.

Here is question:

I have data rows say count of 3000 which i got from sql. i call this
universe.
Now i need samples out of this universe. say my sample size is 45.
this is for audit and samples are random rows picked from universe. i
can use random functions but business people does not want it that
random. so i divide universe by sample and start from first row of
universe skipping universe/sample rows and created a sample. There are
more complications to it but not relevant here. My questions is if
there is some statistical/mathematical formula or way to do this
same thing say pick rows from large dataset which are equally
distributed ( i might not be using the right term "equally
distributed").
I believe you want this:

SELECT ...
FROM ( SELECT ..., row_number() over(...) AS rn
FROM ... ) AS t
WHERE rn = universe/sample

Note that you will have to provide an ordering criterion in the OVER(...)
clause. Otherwise, you get an arbitrary order/numbering, and then you
could use RAND() as well:

SELECT ...
FROM ( SELECT ..., RAND() AS rnd
FROM ... ) AS n
ORDER BY rnd
FETCH FIRST sample ROWS ONLY

--
Knut Stolze
DB2 z/OS Utilities Development
IBM Germany
Mar 21 '07 #2

P: n/a
db2admin wrote:
Hello all,

This is probably not related to db2 but due to fabulous responses i
got earlier for my questions, i am convinced that this forum is full
of smart people.

Here is question:

I have data rows say count of 3000 which i got from sql. i call this
universe.
Now i need samples out of this universe. say my sample size is 45.
this is for audit and samples are random rows picked from universe. i
can use random functions but business people does not want it that
random. so i divide universe by sample and start from first row of
universe skipping universe/sample rows and created a sample. There are
more complications to it but not relevant here. My questions is if
there is some statistical/mathematical formula or way to do this
same thing say pick rows from large dataset which are equally
distributed ( i might not be using the right term "equally
distributed").
In addition to Knuts response, this might fit your needs:

select * from T tablesample system (x)

or

select * from T TABLESAMPLE BERNOULLI(x)
See: http://tinyurl.com/2sd77b
/Lennart

Mar 21 '07 #3

P: n/a
Knut Stolze wrote:
[...]
I believe you want this:

SELECT ...
FROM ( SELECT ..., row_number() over(...) AS rn
FROM ... ) AS t
WHERE rn = universe/sample
But this will only return 1 row, should it not be something like:

WHERE mod(a*rn,b) = 0
/Lennart

Note that you will have to provide an ordering criterion in the OVER(...)
clause. Otherwise, you get an arbitrary order/numbering, and then you
could use RAND() as well:

SELECT ...
FROM ( SELECT ..., RAND() AS rnd
FROM ... ) AS n
ORDER BY rnd
FETCH FIRST sample ROWS ONLY
Mar 21 '07 #4

P: n/a
Lennart wrote:
[...]
>
See: http://tinyurl.com/2sd77b
Beats me why the url didnt work. Here is another shot:

http://publib.boulder.ibm.com/infoce...28%31%29%22%20

/Lennart

Mar 21 '07 #5

P: n/a
Lennart wrote:
Knut Stolze wrote:
[...]
>I believe you want this:

SELECT ...
FROM ( SELECT ..., row_number() over(...) AS rn
FROM ... ) AS t
WHERE rn = universe/sample

But this will only return 1 row, should it not be something like:

WHERE mod(a*rn,b) = 0
Yes, of course...

--
Knut Stolze
DB2 z/OS Utilities Development
IBM Germany
Mar 22 '07 #6

P: n/a
On Mar 22, 5:14 am, Knut Stolze <sto...@de.ibm.comwrote:
Lennart wrote:
Knut Stolze wrote:
[...]
I believe you want this:
SELECT ...
FROM ( SELECT ..., row_number() over(...) AS rn
FROM ... ) AS t
WHERE rn = universe/sample
But this will only return 1 row, should it not be something like:
WHERE mod(a*rn,b) = 0

Yes, of course...

--
Knut Stolze
DB2 z/OS Utilities Development
IBM Germany
Thanks for all this help
I get error message when i try to use tablesample BERNOULLI function

SQL0270N Function not supported (Reason code = "43"). SQLSTATE=42997

here is output from db2level

DB21085I Instance "ABCD" uses "64" bits and DB2 code release
"SQL08025"
with level identifier "03060106".
Informational tokens are "DB2 v8.1.1.112", "s060429", "U807381", and
FixPak
"12".
Product is installed at "/usr/opt/db2_08_01".

when i look at help
reason code 43 is

43 Please reissue the runstats command and set the unsupported
option off.

How to reissue runstats command with unsupported option off

Regards,
udbadmin

Mar 22 '07 #7

P: n/a
On Mar 22, 2:17 pm, "db2admin" <jag...@gmail.comwrote:
On Mar 22, 5:14 am, Knut Stolze <sto...@de.ibm.comwrote:
Lennart wrote:
Knut Stolze wrote:
[...]
>I believe you want this:
>SELECT ...
>FROM ( SELECT ..., row_number() over(...) AS rn
> FROM ... ) AS t
>WHERE rn = universe/sample
But this will only return 1 row, should it not be something like:
WHERE mod(a*rn,b) = 0
Yes, of course...
--
Knut Stolze
DB2 z/OS Utilities Development
IBM Germany

Thanks for all this help
I get error message when i try to use tablesample BERNOULLI function

SQL0270N Function not supported (Reason code = "43"). SQLSTATE=42997

here is output from db2level

DB21085I Instance "ABCD" uses "64" bits and DB2 code release
"SQL08025"
with level identifier "03060106".
Informational tokens are "DB2 v8.1.1.112", "s060429", "U807381", and
FixPak
"12".
Product is installed at "/usr/opt/db2_08_01".

when i look at help
reason code 43 is

43 Please reissue the runstats command and set the unsupported
option off.

How to reissue runstats command with unsupported option off

Regards,
udbadmin
I figured it out
I was trying to use tablesample BERNOULLI function on another
subselect and db2 cries for that
table should be solid table like real table, MQT or DTT
i put contents of subselect in DTT and used tablesample on DTT. I
worked fine

thanks to everyone for great help

regards,
jagdip singh

Mar 22 '07 #8

P: n/a
Here is an implementation of the additive congruential method of
generating values in pseudo-random order and is due to Roy Hann of
Rational Commerce Limited, a CA-Ingres consulting firm. It is based
on a shift-register and an XOR-gate, and it has its origins in
cryptography. While there are other ways to do this, this code is
nice because:

1) The algorithm can be written in C or another low level language for
speed. But math is fairly simple even in base ten.

2) The algorithm tends to generate successive values that are
(usually) "far apart", which is handy for improving the performance of
tree indexes. You will tend to put data on separate physical data
pages in storage.

3) The algorithm does not cycle until it has generated every possible
value, so we don't have to worry about duplicates. Just count how
many calls have been made to the generator.

4) The algorithm produces uniformly distributed values, which is a
nice mathematical property to have. It also does not include zero.

Generalizing the algorithm to arbitrary binary word sizes, and
therefore longer number sequences, is not as easy as you might think.
Finding the "tap" positions where bits are extracted for feedback
varies according to the word-size in an extremely non-obvious way.
Choosing incorrect tap positions results in an incomplete and usually
very short cycle, which is unusable. If you want the details and tap
positions for words of one to 100 bits, see E. J. Watson, "Primitive
Polynomials (Mod 2)", Mathematics of Computation, v.16, 1962, p.
368-369. Here is code for a 31-bit integer, which you can use:

see the details at:

http://www.rationalcommerce.com/reso...surrogates.htm

UPDATE generator31
SET keyval
= keyval/2 + MOD(MOD(keyval, 2) + MOD(keyval/2, 2), 2) * 8;

Mar 23 '07 #9

This discussion thread is closed

Replies have been disabled for this discussion.