Bytes | Software Development & Data Engineering Community

DB2 vs MySQL - performance on large tables

Hi all,

I am running a database containing large datasets:

frames: 20 thousand rows,
coordinates: 170 million rows.

The database has been implemented with:

IBM DB2 v8.1
MySQL v3.23.54

using similar DDL and the same indexes. I then ran a SQL query joining the two tables:

select c.x, c.y, c.z
from coordinate as c, frame as f
where fid < 1000 and f.tid = 1 and f.id = c.fid

The join space is about 3,400 billion possible row combinations (20 thousand x 170 million), and the query returns about 8.5 million records. MySQL seems to outperform DB2 by a wide margin:

DB2: 7 minutes
MySQL: 3 minutes

MySQL is running with default settings, while DB2 is tuned:

Buffer pool: 500 MB (1 GB system memory)
For more settings, see my previous message titled
"Improve query performance on large table".

Should I conclude that MySQL runs much faster than DB2 on large tables? Could anyone advise? BTW, one thing I should also mention is the returned data formats:

DB2: 1.279200e+01 -1.596900e+01 -3.979500e+01
MySQL: 12.792 -15.969 -39.795

(x,y,z are defined as REAL in both databases)

I doubt this could account for the performance difference.
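A quick aside on that point: REAL is a 4-byte float in both engines, and the scientific vs. decimal rendering is purely client-side formatting of the same stored value. A minimal Python illustration (the values are hypothetical):

```python
# The same float rendered two ways: the data is identical,
# only the client's display format differs.
x = 12.792

print(f"{x:e}")  # DB2-style scientific notation: 1.279200e+01
print(f"{x:g}")  # MySQL-style compact notation: 12.792
```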
Best regards,

Bing


Nov 12 '05
> If you followed this discussion from the beginning, it was stated that the
> person was scanning and retrieving a very large number of rows in a table
> with over 4 million rows. Sequential prefetch for the table and/or indexes
> would probably be used in an optimal access path. In such cases, table and
> index placement is important, as is extent size, prefetch size, and
> bufferpool page size. The optimum configuration is somewhat different for
> most flavors of RAID, and needs to be done carefully if performance is
> paramount (the guy is complaining about performance).
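For readers who haven't touched these dials: in DB2 they are set when the bufferpool and tablespace are created. A minimal sketch, with purely illustrative names and sizes (not a tuning recommendation):

```sql
-- Illustrative only: a larger page size with explicit extent and
-- prefetch sizing for a DMS tablespace on its own container
CREATE BUFFERPOOL bp8k SIZE 25000 PAGESIZE 8K;

CREATE TABLESPACE data_ts
  PAGESIZE 8K
  MANAGED BY DATABASE USING (FILE '/db2/data_ts.dat' 2000000)
  EXTENTSIZE 32        -- pages allocated to an object per extent
  PREFETCHSIZE 128     -- pages fetched per sequential prefetch request
  BUFFERPOOL bp8k;
```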

I would agree that in a multi-user transaction system (definitely not the system described here), dataset placement is not critical, especially with decent-sized buffer pools.


I would suggest that it isn't important in the described system either.
The assumption that "sequential" means a disk head does one and only one
thing until it is done is nice theory, but it isn't how things work in
the real world. With many operating systems, and I don't know which
operating system is being used here, you can't even guarantee that a
single file laid down on a disk will be contiguous, except within a raw
partition.
--
Daniel Morgan


Maybe you need to do some research on DB2 sequential prefetch for tables and
indexes.
Nov 12 '05 #41
Mark A wrote:
Maybe you need to do some research on DB2 sequential prefetch for tables and
indexes.


Oracle and Informix have lots of things that work in a specific
way in theory, too. But when we take them into the lab and look at
what is actually happening, we find something very different.

Now this may not be true of DB2 on OS/390 or some other
hardware-operating system combination where IBM has complete control.
But it is absolutely true of DB2, or any other RDBMS, on, for example,
the Windows operating system on Intel hardware.

Let me quote from: https://aurora.vcu.edu/db2help/db2d0/frame3.htm#preftch

"Reading several consecutive pages into the buffer pool using a single
I/O operation can greatly reduce the overhead associated with running
your application. In addition, performing multiple I/O operations in
parallel to read in several ranges of pages at the same time can help
reduce the time your application needs to wait for I/O operations to
complete."

All very nice. But what does this have to do with separating tables
and indexes? On a real-world system, the answer is: nothing. That it
is meaningless is something we have proven in the lab over and over
again. I'm not saying prefetch isn't valuable; prefetch was not part
of the conversation. I said that separating tables and indexes serves
no useful purpose.

The only value is in spreading I/O over as many spindles and heads as
possible, and as evenly as possible. And once you have accomplished that,
reading a page of index or a page of table is just that ... reading a
page. No RDBMS and no operating system cares which is which ... the
overhead is the same. And if you stripe multiple physical disks into a
single logical disk ... the page can be anywhere.
--
Daniel Morgan
http://www.outreach.washington.edu/e...ad/oad_crs.asp
http://www.outreach.washington.edu/e...oa/aoa_crs.asp
da******@x.washington.edu
(replace 'x' with a 'u' to reply)

Nov 12 '05 #42
Daniel Morgan wrote:
I said separating tables and indexes serves no useful purpose.

In DB2, putting tables and indexes in separate tablespaces is a
prerequisite for putting them on different drives and in different
bufferpools. I thought that was understood, but maybe not. So your
statement that it serves no useful purpose is incorrect.
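In DDL terms, the separation Mark is describing looks roughly like this (tablespace names are illustrative, and the column list is a guess at Bing's schema):

```sql
-- Illustrative: route table pages and index pages to different
-- tablespaces, which can then sit on different drives/bufferpools
CREATE TABLE coordinate (
  fid INTEGER,
  aid INTEGER,
  x   REAL,
  y   REAL,
  z   REAL
) IN data_ts INDEX IN index_ts;
```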
Nov 12 '05 #43
Mark A wrote:
In DB2, putting tables and indexes in separate tablespaces is a
prerequisite for putting them on different drives and in different
bufferpools. I thought that was understood, but maybe not. So your
statement that it serves no useful purpose is incorrect.


Since when haven't you been able to install DB2 on a RAID 0 drive array
(by which I mean RAID 0, RAID 0+1, and RAID 1+0)?

Assuming DB2 still works on RAID arrays, your statement is not valid.
The relationship between tablespaces and drives is logical, not physical.
--
Daniel Morgan
http://www.outreach.washington.edu/e...ad/oad_crs.asp
http://www.outreach.washington.edu/e...oa/aoa_crs.asp
da******@x.washington.edu
(replace 'x' with a 'u' to reply)

Nov 12 '05 #44
Mark's second point, however, remains valid in principle:
bufferpools can be associated with tablespaces, and hence separating
index and table allows you to favor index pages over normal data pages
in the bufferpool.
Now, I'd like to agree that in the scenario described above this would
likely not matter. And it requires a deep understanding of your system to
optimize properly. (Just like a stick shift requires a deeper understanding
of how your car works :-)
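As a sketch of what Serge means (names and sizes illustrative), a tablespace holding only indexes can be bound to its own bufferpool, so index pages are not squeezed out by large table scans:

```sql
-- Illustrative: dedicate a bufferpool to the index tablespace
CREATE BUFFERPOOL bp_idx SIZE 50000 PAGESIZE 4K;
ALTER TABLESPACE index_ts BUFFERPOOL bp_idx;
```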

Anyway, I would like to propose an experiment, which I will do at a
different spot in this thread.

Cheers
Serge

--
Serge Rielau
DB2 SQL Compiler Development
IBM Toronto Lab

Nov 12 '05 #45
bing,

Let's try the following experiment:
SELECT COUNT(*) FROM (SELECT
  CASE WHEN x <> y THEN CAST(RAISE_ERROR('70000', 'dummy') AS REAL) ELSE x END,
  CASE WHEN x <> y THEN CAST(RAISE_ERROR('70000', 'dummy') AS REAL) ELSE y END
FROM COORDINATE AS C, FRAME AS F
WHERE F.TID=1 AND C.FID=F.ID AND F.ID<1000 AND F.ID>0) AS T;
vs.
SELECT
  CASE WHEN x <> y THEN CAST(RAISE_ERROR('70000', 'dummy') AS REAL) ELSE x END,
  CASE WHEN x <> y THEN CAST(RAISE_ERROR('70000', 'dummy') AS REAL) ELSE y END
FROM COORDINATE AS C, FRAME AS F
WHERE F.TID=1 AND C.FID=F.ID AND F.ID<1000 AND F.ID>0
ORDER BY C.FID, C.AID;

What would this experiment do?
First, the CASE with RAISE_ERROR() ensures DB2 doesn't simplify the query
(it can't drop the RAISE_ERROR()). Be conscious about placing the WHERE in
the inner select!
Now, using the COUNT(*) cuts out all the network traffic, so we'll get a
fair idea of how much work DB2 (the engine) is doing vs. how much is spent
in the client-server communication.
If the cost is in the engine, all those nice proposals about indexes,
bufferpools, etc. may be relevant. If not, they are pointless.
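Serge's COUNT(*) trick can be illustrated outside DB2 as well. The sketch below uses SQLite as a stand-in (table name and sizes are made up) purely to show the measurement idea: time the aggregate, which stays inside the engine, against fetching every row into the client.

```python
import sqlite3
import time

# Build a toy coordinate table; sizes are illustrative only.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE coordinate (fid INTEGER, x REAL, y REAL, z REAL)")
con.executemany(
    "INSERT INTO coordinate VALUES (?, ?, ?, ?)",
    ((i % 1000, i * 0.1, i * 0.2, i * 0.3) for i in range(200_000)),
)

# Engine-only cost: rows are produced and discarded inside the engine.
t0 = time.perf_counter()
(n,) = con.execute(
    "SELECT COUNT(*) FROM (SELECT x, y, z FROM coordinate WHERE fid < 500)"
).fetchone()
engine_time = time.perf_counter() - t0

# Engine + transfer cost: every row is materialized in the client.
t0 = time.perf_counter()
rows = con.execute("SELECT x, y, z FROM coordinate WHERE fid < 500").fetchall()
fetch_time = time.perf_counter() - t0

print(n, len(rows))  # same logical result set both times
print(f"engine-only: {engine_time:.3f}s, full fetch: {fetch_time:.3f}s")
```

If the two timings are close, the engine dominates; if the full fetch is much slower, the time is going into shipping rows, and index/bufferpool tuning won't help.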
The one dial that hasn't been mentioned yet is the communication buffer.
You should already get BLOCKING (i.e., prefetching of the result set into
the comm buffer); now we need to dial up the buffer itself.
I think it's the DB2COMM special register, likely a client-side thing
(never trust the SQL compiler guy ;-)

Cheers
Serge
--
Serge Rielau
DB2 SQL Compiler Development
IBM Toronto Lab

Nov 12 '05 #46
"Mark A" <ma@switchboard.net> wrote in message news:<gk**************@news.uswest.net>...
Sequential prefetch for the table and/or indexes
would probably be used in an optimal access path. In such cases, table and
index placement is important, as is extent size, prefetch size, and
bufferpool page size.


I think Bing stated he's running on a single disk. He also stated that
DB2 is "slow" (10 minutes) when doing a simple select returning many
rows. So I still feel 20,000 rows/sec is all he's likely to get.
There's no point making DB2 run the query faster internally if the
coordinator/client interaction can't spool the results out fast
enough.

Perhaps Bing could try timing the following query, to test your theory
that i/o throughput is a problem and that fetching 20,000/sec can be
improved on:

select x, y, z from coordinate where fid < 1000
and (x + y) = (z * -1) - 1;

The purpose of the second, added line (non-indexable) is to reduce the
result set. If the query runs much quicker than 10 minutes, it
indicates the extent to which fetch rate is the bottleneck.

Bing, if you read this, could you also let us know how many rows your
10-minute query returned:

select x, y, z from coordinate where fid < 1000
DG
Nov 12 '05 #47
In article <bp**********@hanover.torolab.ibm.com>, sr*****@ca.eye-bee-m.com says...
The one dial that hasn't been mentioned yet is the communication buffer.
You should already get BLOCKING (i.e prefetching of the resultset into
the comm buffer) Now we need to dial up the buffer itself.
I think it's the DB2COMM special register, likely a client side thing
(never trust the SQL compiler guy ;-)


You're right :)
It's the DBM CFG parameter RQRIOBLK (max requester I/O block size).
By default it's 32767; the maximum is 65535.

But it is only used when connecting from a remote client.
I don't know enough about the internal memory sizes for local
clients, but matching the RQRIOBLK size with the OPTIMIZE FOR n ROWS
clause on a SELECT improves performance for remote clients.
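As a sketch, turning that dial from the command line would look roughly like this (values illustrative; an instance restart is assumed to be needed for the DBM CFG change to take effect, and it only matters for remote clients):

```shell
# Illustrative: raise the requester I/O block size to its maximum
db2 "UPDATE DBM CFG USING RQRIOBLK 65535"
db2stop
db2start

# ...and hint the expected result-set size on the query itself
db2 "SELECT x, y, z FROM coordinate WHERE fid < 1000 OPTIMIZE FOR 100000 ROWS"
```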
Nov 12 '05 #48
The operating system being used is almost certainly Linux.

Daniel Morgan wrote:
... I don't know what
operating system is being used here, one can't even guarantee that a
single file laid down on a disk will be contiguous except within a raw
partition.


Nov 12 '05 #49
Blair Adamache wrote:
The operating system being used is almost certainly Linux.

Daniel Morgan wrote:
... I don't know what operating system is being used here, one can't
even guarantee that a single file laid down on a disk will be
contiguous except within a raw
partition.


I can't speak to a generic, much less any specific, Linux version. But
some operating systems are designed to attempt optimization by
spreading a single file across a disk, even when it is a single file
on a brand-new disk. It may not affect you, but it's something to consider.
--
Daniel Morgan
http://www.outreach.washington.edu/e...ad/oad_crs.asp
http://www.outreach.washington.edu/e...oa/aoa_crs.asp
da******@x.washington.edu
(replace 'x' with a 'u' to reply)

Nov 12 '05 #50
