DB2 vs MySQL - performance on large tables

Hi all,

I am running a database containing large datasets:

frames: 20 thousand rows
coordinates: 170 million rows

The database has been implemented with:

IBM DB2 v8.1
MySQL v3.23.54

using similar DDL and the same indexes. I have run a SQL query joining the two tables:

select c.x,c.y,c.z from coordinate as c,frame as f
where fid<1000 and f.tid=1 and f.id=c.fid

The join space is roughly 3,400 billion possible row pairs (20 thousand x 170 million), and the query returns about 8.5 million records. It seems MySQL outperforms DB2 by a wide margin:

DB2: 7 minutes
MySQL: 3 minutes

MySQL is running with default settings, while DB2 has:

Buffer pool: 500 MB (1 GB system memory)
For more settings, see the previous message titled
"Improve query performance on large table"

Should I assume MySQL really does run much faster than DB2 on large tables? Could anyone advise? BTW, one thing I should also mention is the format of the returned data:

DB2: 1.279200e+01 -1.596900e+01 -3.979500e+01
MySQL: 12.792 -15.969 -39.795

(x,y,z are defined as REAL in both databases)

I doubt this could account for the performance difference.
Best regards,

Bing


Nov 12 '05
Yes, just to confirm, the OS is

Linux 2.4.9-31enterprise #1 SMP vie jun 7 12:56:14 PDT 2002 i686 unknown

Some hardware info below:-

CPU info:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 11
model name : Intel(R) Pentium(R) III CPU family 1400MHz
stepping : 1
cpu MHz : 1396.496
cache size : 512 KB

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 11
model name : Intel(R) Pentium(R) III CPU family 1400MHz
stepping : 1
cpu MHz : 1396.496
cache size : 512 KB

Mem info:

Mem: 1052528640 990863360 61665280 462897152 128512000 295145472
Swap: 945864704 344354816 601509888
MemTotal: 1027860 kB
MemFree: 60220 kB
MemShared: 452048 kB
Buffers: 125500 kB
Cached: 39164 kB
SwapCached: 249064 kB
Active: 550596 kB

I haven't upgraded to RH9 since DB2 isn't officially supported on it.

Thanks,

Bing

Blair Adamache wrote:
The operating system being used is almost certainly Linux.

Daniel Morgan wrote:
... I don't know what operating system is being used here; one can't
even guarantee that a single file laid down on a disk will be
contiguous, except within a raw partition.



Nov 12 '05 #51
Database Guy wrote:
Perhaps Bing could try timing the following query, to test your theory
that i/o throughput is a problem and that fetching 20,000/sec can be
improved on:

select x, y, z from coordinate where fid < 1000
and (x + y) = (z * -1) - 1;
Currently rebuilding indexes. I will test the SQL later.
The purpose of the second, added line (non-indexable) is to reduce the
resultset. If the query runs much quicker than 10 minutes then it
indicates the extent to which fetch rate is the bottleneck.

Bing, if you read this then could you also let us know how many rows your
10 minute query returned:

select x, y, z from coordinate where fid < 1000


The above SQL returns 8.5 million rows (about 500 MB in a file), 5% of the data in the table.

One strange thing: if I run this:

SELECT C.X,C.Y,C.Z FROM COORDINATE AS C, FRAME AS F
WHERE C.FID=F.ID AND F.ID<1000

It returns the same results but in 7 minutes! I don't understand this. Please enlighten me. BTW, I time queries with:

$ date; db2 "SELECT ...." > output.file; date
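A simpler timing alternative, assuming the shell provides a time built-in, would be:

time db2 "SELECT ...." > output.file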

Many thanks,

Bing


Nov 12 '05 #52
"Bing Wu" <bi**@biop.ox.a c.uk> wrote in message
news:bp******** **@news.ox.ac.u k...
Yes, just to confirm, the OS is

Linux 2.4.9-31enterprise #1 SMP vie jun 7 12:56:14 PDT 2002 i686 unknown

[CPU and memory details snipped -- see post #51 above]

Thanks,

Bing

Bing,

You may already be doing this, but just in case you are not, here are some
further suggestions:

Since you have 2 processors, you should try to encourage DB2 intra-partition
parallelism. This can be done even without DB2 ESE with DPF. The best way is
to create 2 (or a multiple thereof) containers in a DMS tablespace. Then make
sure the degree-of-parallelism parameter is set to 2 or ANY.

This would work even better if you could create the containers on separate
physical drives. When you create the containers with the Control Center, it
will help you configure the extent size and prefetch size in an optimal
configuration. Basically, the prefetch size should be a multiple of the
extent size.
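For example (purely illustrative -- the container paths, sizes, and database name below are placeholders, not your actual setup):

CREATE TABLESPACE coord_ts
  MANAGED BY DATABASE
  USING (FILE '/disk1/cont0' 250000,
         FILE '/disk2/cont1' 250000)
  EXTENTSIZE 32
  PREFETCHSIZE 64;

db2 UPDATE DBM CFG USING INTRA_PARALLEL YES
db2 UPDATE DBM CFG USING MAX_QUERYDEGREE 2
db2 UPDATE DB CFG FOR yourdb USING DFT_DEGREE ANY

With two containers, setting the prefetch size to twice the extent size lets the prefetchers drive both containers at once.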
Nov 12 '05 #53
Mark A wrote:
Since you have 2 processors, you should try to encourage DB2 intra-partition
parallelism. This can be done even without DB2 ESE with DPF. The best way is
to create 2 (or a multiple thereof) containers in a DMS tablespace. Then make
sure the degree-of-parallelism parameter is set to 2 or ANY.

Thanks for the info. Just to confirm, the parallelism parameter has been set to 2:

$ db2 get dbm configuration | grep parall

Maximum query degree of parallelism (MAX_QUERYDEGREE) = 2
Enable intra-partition parallelism (INTRA_PARALLEL) = YES

The DB2 version is ESE 8.1.3, Patch 4.
This would work even better if you could create the containers on separate
physical drives. When you create the containers with the Control Center, it
will help you configure the extent size and prefetch size in an optimal
configuration. Basically, the prefetch size should be a multiple of the
extent size.


In my case, the server has only one hard drive at the moment. The DB2-optimised settings are:

DFT_EXTENT_SZ = 32
DFT_PREFETCH_SZ = 32
(page size: 4KB)

Many thanks,

Bing
Nov 12 '05 #54
Database Guy wrote:
Perhaps Bing could try timing the following query, to test your theory
that i/o throughput is a problem and that fetching 20,000/sec can be
improved on:

select x, y, z from coordinate where fid < 1000
and (x + y) = (z * -1) - 1;
The SQL completed in just 22 seconds:

Tue Nov 18 17:56:27 GMT 2003

X Y Z
------------------------ ------------------------ ------------------------
-9.31600E+000 +3.30400E+000 +5.01200E+000
-3.14100E+000 +3.27600E+000 -1.13500E+000
+5.08800E+000 +4.52000E+000 -1.06080E+001
-1.24380E+001 +1.80490E+001 -6.61100E+000
-7.53300E+000 +2.36800E+000 +4.16500E+000
+6.69900E+000 -5.14100E+000 -2.55800E+000
-4.95500E+000 +1.48270E+001 -1.08720E+001
+1.59270E+001 -5.64900E+000 -1.12780E+001
-1.40620E+001 +1.65760E+001 -3.51400E+000
+1.11800E+000 -1.15960E+001 +9.47800E+000
+7.34000E+000 +2.82100E+000 -1.11610E+001
+3.46800E+000 +3.53200E+000 -8.00000E+000
+6.26100E+000 +7.68700E+000 -1.49480E+001
-4.98700E+000 +8.91000E+000 -4.92300E+000
+1.21160E+001 +4.85600E+000 -1.79720E+001
-1.58230E+001 +9.37200E+000 +5.45100E+000
-8.39700E+000 -5.00200E+000 +1.23990E+001
+1.18110E+001 +4.91800E+000 -1.77290E+001
+1.08710E+001 -8.07400E+000 -3.79700E+000
+3.70300E+000 +1.52000E+001 -1.99030E+001
+1.66220E+001 -5.71000E+000 -1.19120E+001
+4.61500E+000 +9.30000E+000 -1.49150E+001
+1.45170E+001 -1.02910E+001 -5.22600E+000
+7.60300E+000 -1.56300E+001 +7.02700E+000
+6.84300E+000 +6.03200E+000 -1.38750E+001
-9.42900E+000 +2.74500E+000 +5.68400E+000
-3.78400E+000 +7.61800E+000 -4.83400E+000
+6.33700E+000 +7.54500E+000 -1.48820E+001
-1.01420E+001 +1.88600E+001 -9.71800E+000
-1.47090E+001 +2.67000E+000 +1.10390E+001
+7.58700E+000 +6.17000E+000 -1.47570E+001
-1.80590E+001 +8.09400E+000 +8.96500E+000

32 record(s) selected.

Tue Nov 18 17:56:49 GMT 2003

The purpose of the second, added line (non-indexable) is to reduce the
resultset. If the query runs much quicker than 10 minutes then it
indicates the extent to which fetch rate is the bottleneck.


Does this mean I need to increase the fetch rate? Can you advise?

DFT_EXTENT_SZ = 32
DFT_PREFETCH_SZ = 32
(page size: 4KB)

Many thanks,

Bing

Nov 12 '05 #55
Serge,

Serge Rielau wrote:
Let's try the following experiment:
SELECT COUNT(*) FROM (SELECT
case WHEN x <> y then cast(raise_error('70000', 'dummy') end,
case WHEN x <> y then cast(raise_error('70000', 'dummy') end FROM
COORDINATE AS C, FRAME AS F WHERE F.TID=1 AND C.FID=F.ID AND F.ID<1000
AND F.ID>0 ) AS T ORDER BY C.FID,C.AID;
vs.
SELECT
case WHEN x <> y then cast(raise_error('70000', 'dummy') end,
case WHEN x <> y then cast(raise_error('70000', 'dummy') end FROM
COORDINATE AS C, FRAME AS F WHERE F.TID=1 AND C.FID=F.ID AND F.ID<1000
AND F.ID>0 ORDER BY C.FID,C.AID;


It seems DB2 doesn't like "end":

SQL0104N An unexpected token "end" was found following "or('70000', 'dummy')".
Expected tokens may include: "AS <cast_target> )". SQLSTATE=42601

Cheers,

Bing
Nov 12 '05 #56
oops, .... cast(raise_error(...) AS INT) end
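So the second statement, written out in full, should read:

SELECT
case WHEN x <> y then cast(raise_error('70000', 'dummy') AS INT) end,
case WHEN x <> y then cast(raise_error('70000', 'dummy') AS INT) end
FROM COORDINATE AS C, FRAME AS F
WHERE F.TID=1 AND C.FID=F.ID AND F.ID<1000 AND F.ID>0
ORDER BY C.FID,C.AID;

(the same CAST fix applies inside the COUNT(*) variant)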

Cheers
Serge
--
Serge Rielau
DB2 SQL Compiler Development
IBM Toronto Lab

Nov 12 '05 #57
Bing Wu <bi**@biop.ox.ac.uk> wrote in message news:<bp**********@news.ox.ac.uk>...
[quoted from post #52 above: the plain select of x, y, z where fid < 1000 returns 8.5 million rows in about 10 minutes, while the two-table join returning the same rows completes in 7 minutes]


Bing,

Didn't realise that the plain select returned the same number of rows
as the query containing the join - although in hindsight this was
obvious from your other posts.

Your explain is for a slightly different query than the one originally shown,
which makes things a bit harder - it shows a multi-column index that
might allow your other query to be satisfied by index access
alone (can't tell without the DDL).

To be honest I'm pretty confused. Your data volumes for coordinate
sound like they must be in the region of 1,300,000 4K pages. To scan
that would take 20 minutes at the assumed default DB2 optimizer
transfer rate of 9 ms. Maybe your "hidden" index also contains the Z
column, so that index-only access can be used, or maybe your disk is of a
higher spec. Really you also need to explain the plain select to check
what it is doing.

The query containing the join and with 4 columns returned is quite
likely to be using indexes to avoid the need to table scan the entire
coordinate table, and instead scan just the required ranges.
Potentially this could be much faster.

I still think you will be bottlenecking on output to the client, but I feel
no certainty about what's going on given the discrepancies between
some of your posts - a moving target is hard to hit. Maybe you could
clarify a bit?
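If it helps, one way to get that plan (assuming the explain tables have already been created from ~/sqllib/misc/EXPLAIN.DDL; the database name below is a placeholder) would be roughly:

db2 "EXPLAIN PLAN FOR SELECT x, y, z FROM coordinate WHERE fid < 1000"
db2exfmt -d yourdb -1 -o plain_select.exp

plain_select.exp should then show whether the plain select is doing a full table scan or an index range scan.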
DG
Nov 12 '05 #58
