By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
443,670 Members | 1,526 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 443,670 IT Pros & Developers. It's quick & easy.

DB2 vs MySQL - performance on large tables

P: n/a
Hi all,

I am running a database containing large datasets:

frames: 20 thousand rows,
coordinates: 170 million row.

The database has been implemented with:

IBM DB2 v8.1
MySQL v3.23.54

using similar DDL and same indexes. And I have run a SQL joining two tables:

select c.x,c.y,c.z from coordinate as c,frame as f
where fid<1000 and f.tid=1 and f.id=c.fid

The query operates on 3400 billion possible joints and returns about 8.5 million records. It seems MySQL outperforms DB2 well ahead:

DB2: 7 minutes
MySQL: 3 minutes

MySQL is running with default settings while DB2:

Buffer pool: 500 MB (1GB system memory)
More seetings see: previous message titled -
"Improve query performance on large table"

Should I assume MySQL does run much faster than DB2 on large tables? Could anyone advice? BTW, one thing I should also mention that returned data formats:

DB2: 1.279200e+01 -1.596900e+01 -3.979500e+01
MySQL: 12.792 -15.969 -39.795

(x,y,z are defined as REAL in both databases)

I doubt this could make the performance difference.
Best regards,

Bing


Nov 12 '05
Share this Question
Share on Google+
57 Replies


P: n/a
Yes, just confirm that the OS is

Linux 2.4.9-31enterprise #1 SMP vie jun 7 12:56:14 PDT 2002 i686 unknown

Some hardware info below:-

CPU info:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 11
model name : Intel(R) Pentium(R) III CPU family 1400MHz
stepping : 1
cpu MHz : 1396.496
cache size : 512 KB

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 11
model name : Intel(R) Pentium(R) III CPU family 1400MHz
stepping : 1
cpu MHz : 1396.496
cache size : 512 KB

Mem info:

Mem: 1052528640 990863360 61665280 462897152 128512000 295145472
Swap: 945864704 344354816 601509888
MemTotal: 1027860 kB
MemFree: 60220 kB
MemShared: 452048 kB
Buffers: 125500 kB
Cached: 39164 kB
SwapCached: 249064 kB
Active: 550596 kB

I haven't upgraded to RH9 since it doesn't offically supported DB2.

Thanks,

Bing

Blair Adamache wrote:
The operating system being used it almost certainly Linux.

Daniel Morgan wrote:
... I don't know what operating system is being used here, one can't
even guarantee that a single file laid down on a disk will be
contiguous except within a raw
partition.



Nov 12 '05 #51

P: n/a
Database Guy wrote:
Perhaps Bing could try timing the following query, to test your theory
that i/o throughput is a problem and that fetching 20,000/sec can be
improved on:

select x, y, z from coordinate where fid < 1000
and (x + y) = (z * -1) - 1;
Currently under rebuilding indexes. Will test the sql later.
The purpose of the second, added line (non-indexable) is to reduce the
resultset. If the query runs much quicker than 10 minutes then it
indicates the extent to which fetch rate is the bottleneck.

Bing, if you read this then could you also let us know many rows your
10 minute query returned:

select x, y, z from coordinate where fid < 1000


The above SQL returnes 8.5 million rows (that's about 500MB in file), 5% of data in the table.

One strange thing if I run this:

SELECT C.X,C.Y,C.Z FROM COORDINATE AS C, FRAME AS F
WHERE C.FID=F.ID AND F.ID<1000

It returns the same results but in 7 minutes! I don't understand this. Please enlight. BTW, I use:

$date; db2 "SELECT ...." > output.file; date

Many thanks,

Bing


Nov 12 '05 #52

P: n/a
"Bing Wu" <bi**@biop.ox.ac.uk> wrote in message
news:bp**********@news.ox.ac.uk...
Yes, just confirm that the OS is

Linux 2.4.9-31enterprise #1 SMP vie jun 7 12:56:14 PDT 2002 i686 unknown

Some hardware info below:-

CPU info:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 11
model name : Intel(R) Pentium(R) III CPU family 1400MHz
stepping : 1
cpu MHz : 1396.496
cache size : 512 KB

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 11
model name : Intel(R) Pentium(R) III CPU family 1400MHz
stepping : 1
cpu MHz : 1396.496
cache size : 512 KB

Mem info:

Mem: 1052528640 990863360 61665280 462897152 128512000 295145472
Swap: 945864704 344354816 601509888
MemTotal: 1027860 kB
MemFree: 60220 kB
MemShared: 452048 kB
Buffers: 125500 kB
Cached: 39164 kB
SwapCached: 249064 kB
Active: 550596 kB

I haven't upgraded to RH9 since it doesn't offically supported DB2.

Thanks,

Bing

Bing,

You may already be doing this, but just in case you are not, here are some
further suggestions:

Since you have 2 processors, you should try to encourage DB2 intra-partition
parallelism (if you are not already doing it). This can be done even without
DB2 ESE with DPF. The best way to do this is to create 2 (or multiple of
thereof) containers in a DMS tablespace. Then make sure that degree of
parallelism parameter is set to 2 or ANY.

This would work even better if you could create the containers on separate
physical drives. When you create the containers with the Control Center, it
will help you configure the extent size and prefetch size in an optimal
configuration. Basically, the prefetch size should be a multiple of the
extent size.
Nov 12 '05 #53

P: n/a
Mark A wrote:
Since you have 2 processors, you should try to encourage DB2 intra-partition
parallelism (if you are not already doing it). This can be done even without
DB2 ESE with DPF. The best way to do this is to create 2 (or multiple of
thereof) containers in a DMS tablespace. Then make sure that degree of
parallelism parameter is set to 2 or ANY.

Thanks for the info. Just confirm that the parallelism parameter has been set to 2.

$ db2 get dbm configuration | grep parall

Maximum query degree of parallelism (MAX_QUERYDEGREE) = 2
Enable intra-partition parallelism (INTRA_PARALLEL) = YES

The DB2 is ESE 8.1.3 Patch 4.
This would work even better if you could create the containers on separate
physical drives. When you create the containers with the Control Center, it
will help you configure the extent size and prefetch size in an optimal
configuration. Basically, the prefetch size should be a multiple of the
extent size.


In my case, the server has only one hard drive at the mo. The DB2 optimised:

DFT_EXTENT_SZ = 32
DFT_PREFETCH_SZ = 32
(page size: 4KB)

Many thanks,

Bing
Nov 12 '05 #54

P: n/a
Database Guy wrote:
Perhaps Bing could try timing the following query, to test your theory
that i/o throughput is a problem and that fetching 20,000/sec can be
improved on:

select x, y, z from coordinate where fid < 1000
and (x + y) = (z * -1) - 1;
The SQL completed in just 22 seconds:

Tue Nov 18 17:56:27 GMT 2003

X Y Z
------------------------ ------------------------ ------------------------
-9.31600E+000 +3.30400E+000 +5.01200E+000
-3.14100E+000 +3.27600E+000 -1.13500E+000
+5.08800E+000 +4.52000E+000 -1.06080E+001
-1.24380E+001 +1.80490E+001 -6.61100E+000
-7.53300E+000 +2.36800E+000 +4.16500E+000
+6.69900E+000 -5.14100E+000 -2.55800E+000
-4.95500E+000 +1.48270E+001 -1.08720E+001
+1.59270E+001 -5.64900E+000 -1.12780E+001
-1.40620E+001 +1.65760E+001 -3.51400E+000
+1.11800E+000 -1.15960E+001 +9.47800E+000
+7.34000E+000 +2.82100E+000 -1.11610E+001
+3.46800E+000 +3.53200E+000 -8.00000E+000
+6.26100E+000 +7.68700E+000 -1.49480E+001
-4.98700E+000 +8.91000E+000 -4.92300E+000
+1.21160E+001 +4.85600E+000 -1.79720E+001
-1.58230E+001 +9.37200E+000 +5.45100E+000
-8.39700E+000 -5.00200E+000 +1.23990E+001
+1.18110E+001 +4.91800E+000 -1.77290E+001
+1.08710E+001 -8.07400E+000 -3.79700E+000
+3.70300E+000 +1.52000E+001 -1.99030E+001
+1.66220E+001 -5.71000E+000 -1.19120E+001
+4.61500E+000 +9.30000E+000 -1.49150E+001
+1.45170E+001 -1.02910E+001 -5.22600E+000
+7.60300E+000 -1.56300E+001 +7.02700E+000
+6.84300E+000 +6.03200E+000 -1.38750E+001
-9.42900E+000 +2.74500E+000 +5.68400E+000
-3.78400E+000 +7.61800E+000 -4.83400E+000
+6.33700E+000 +7.54500E+000 -1.48820E+001
-1.01420E+001 +1.88600E+001 -9.71800E+000
-1.47090E+001 +2.67000E+000 +1.10390E+001
+7.58700E+000 +6.17000E+000 -1.47570E+001
-1.80590E+001 +8.09400E+000 +8.96500E+000

32 record(s) selected.

Tue Nov 18 17:56:49 GMT 2003

The purpose of the second, added line (non-indexable) is to reduce the
resultset. If the query runs much quicker than 10 minutes then it
indicates the extent to which fetch rate is the bottleneck.


Does this means I need to increase the fetch rate? Can you advice?

DFT_EXTENT_SZ = 32
DFT_PREFETCH_SZ = 32
(page size: 4KB)

Many thanks,

Bing

Nov 12 '05 #55

P: n/a
Serge,

Serge Rielau wrote:
Let's try the following experiment:
SELECT COUNT(*) FROM (SELECT
case WHEN x <> y then cast(raise_error('70000', 'dummy') end,
case WHEN x <> y then cast(raise_error('70000', 'dummy') end FROM
COORDINATE AS C, FRAME AS F WHERE F.TID=1 AND C.FID=F.ID AND F.ID<1000
AND F.ID>0 ) AS T ORDER BY C.FID,C.AID;
vs.
SELECT
case WHEN x <> y then cast(raise_error('70000', 'dummy') end,
case WHEN x <> y then cast(raise_error('70000', 'dummy') end FROM
COORDINATE AS C, FRAME AS F WHERE F.TID=1 AND C.FID=F.ID AND F.ID<1000
AND F.ID>0 ORDER BY C.FID,C.AID;


It seems db2 doesn't like "end":

SQL0104N An unexpected token "end" was found following "or('70000', 'dummy')".
Expected tokens may include: "AS <cast_target> )". SQLSTATE=42601

Cheers,

Bing
Nov 12 '05 #56

P: n/a
oops, .... cast(raise_error(...) AS INT) end

Cheers
Serge
--
Serge Rielau
DB2 SQL Compiler Development
IBM Toronto Lab

Nov 12 '05 #57

P: n/a
Bing Wu <bi**@biop.ox.ac.uk> wrote in message news:<bp**********@news.ox.ac.uk>...
Database Guy wrote:
Perhaps Bing could try timing the following query, to test your theory
that i/o throughput is a problem and that fetching 20,000/sec can be
improved on:

select x, y, z from coordinate where fid < 1000
and (x + y) = (z * -1) - 1;


Currently under rebuilding indexes. Will test the sql later.
The purpose of the second, added line (non-indexable) is to reduce the
resultset. If the query runs much quicker than 10 minutes then it
indicates the extent to which fetch rate is the bottleneck.

Bing, if you read this then could you also let us know many rows your
10 minute query returned:

select x, y, z from coordinate where fid < 1000


The above SQL returnes 8.5 million rows (that's about 500MB in file), 5% of data in the table.

One strange thing if I run this:

SELECT C.X,C.Y,C.Z FROM COORDINATE AS C, FRAME AS F
WHERE C.FID=F.ID AND F.ID<1000

It returns the same results but in 7 minutes! I don't understand this. Please enlight. BTW, I use:

$date; db2 "SELECT ...." > output.file; date


Bing,

Didn't realise that the plain select returned the same number of rows
as the query containing the join - although in hindsight this was
obvious from your other posts.

Your explain is for a slightly different query to the original shown,
which makes things a bit harder - it shows a multi-column index that
might possibly allow your other query to be satisfied by index access
only (can't tell without DDL).

To be honest I'm pretty confused. Your data volumes for coordinate
sound like they must be in the region of 1300000 4K pages. To scan
that would take 20 minutes at the assumed default DB2 optimizer
transferrate of 9ms. Maybe your "hidden" index also contains the Z
column, so that index-only can be used, or maybe your disk may be
higher spec. Really you also need to explain the plain select to check
what that is doing.

The query containing the join and with 4 columns returned is quite
likely to be using indexes to avoid the need to table scan the entire
coordinate table, and instead scan just the required ranges.
Potentially this could be much faster.

I still think you will be bottlenecking on output to client, but feel
no certainty about what's going on given the discrepancies between
some of your posts - moving target hard to hit. Maybe you could
clarify a bit?
DG
Nov 12 '05 #58

57 Replies

This discussion thread is closed

Replies have been disabled for this discussion.