
MySQL millions-row OLTP dimensioning

Hi everybody,

I'm evaluating the possibility of using MySQL 5.0 as the database backend
for an application we are building, and I'd like rough hardware-requirement
estimates and/or real-world experience from people who are already running
a similarly dimensioned database/application.

As a rough estimate, the database has around 80 tables; 4-5 of these are
expected to grow up to 5,000,000 records, another 30% will be in the
100,000 range, while the remaining will be in the 10,000 range. The
biggest table (unfortunately one of the million-row tables) has around
70 fields, but the remaining tables are definitely smaller.
The application is a Java web intranet application that will serve
around 500 users (growing up to 1000). Almost all queries are pretty
simple: no recursion, no subqueries, aggregate-value queries on small
ad hoc tables, WHERE conditions on indexed fields and in many cases just
on primary keys. The only complication in our queries is sometimes the
number of joins, which can get up to a 5-table join; in some situations the
data for a page is retrieved with a single query on a million-row table
plus 3-4 queries on the other smaller tables, or a query on a million-row
table joined with another 3-4 smaller tables.

At the moment we think we could make it with a dual 3.0 GHz Xeon
with 4GB RAM and 320GB of RAID5 storage.

What do you think? I'd like to hear real-world experience with similarly
sized MySQL databases, or suggestions on how to make a better hardware
estimate.

Nov 2 '06 #1
"Federico" <fr***@nexus.itwrote:
>
I'm evaluating the possibility of using MySQL 5.0 as a database backend
for an application we are doing and I'd like to have hardware
requirements rough estimates and/or real world experience from people
that are already running similar dimensioned database/application.
[description snipped]

Insufficient data. Do you need advanced features like referential
constraints, row-level locks or transactions? You said OLTP. What's the
read/write ratio? Do you expect to see read/write concurrency on some
tables? (This is to decide which storage engine to use.)

Regarding JOINs - you should go for MySQL 5.x. JOIN handling has some
flaws in 4.x and earlier. With proper indexing JOINs on big tables
should not be critical.
> At the moment we think we could make it with a dual 3.0 GHz Xeon
> with 4GB RAM and 320GB of RAID5 storage.
*NEVER* use RAID5 for a database. RAID5 yields poor performance for
short writes (that is, writes that are shorter than the RAID block
size). If you want redundancy (and you do), go for RAID10.

The memory requirements depend on the chosen storage engine. MyISAM
uses memory slightly more efficiently but misses some features. MyISAM may
also cause severe performance degradation due to locking conflicts. If you
choose MyISAM, add up the sizes of the index files of the active tables:
this is your key_buffer size. Multiply by 2: this is the total memory
needed. Guessing which indexes count as "active" is difficult;
in most cases you can do well with just 10-20% of all indexes.
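For example, on MySQL 5.0 you can get a rough total from
information_schema ('mydb' below is a placeholder for your schema name):

    -- Rough total MyISAM index size, in MB:
    SELECT SUM(index_length) / 1024 / 1024 AS index_mb
    FROM information_schema.TABLES
    WHERE table_schema = 'mydb'
      AND engine = 'MyISAM';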

If you use InnoDB you should calculate approx. twice the memory.
And twice the disk space too.
Regarding real-world experience: some years ago I designed a good part
of a medium-sized web portal (jjj.wbocvybg.qr). The database behind the
12 web servers was MySQL 3.23.x with MyISAM tables: 30GB+ of data in some
100 tables, the biggest table having 35 million rows. Quite similar to your setup.
We ran it on a Sun Enterprise V480 with 4x 1GHz USIII CPUs and 8GB RAM.

Solaris was much better than Linux 2.4 for multithreaded workloads back
in those days. Today I would go for Linux 2.6 on Opteron hardware.
HTH, XL
--
Axel Schwenke, Senior Software Developer, MySQL AB

Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/
MySQL User Forums: http://forums.mysql.com/
Nov 2 '06 #2
> Insufficient data. Do you need advanced features like referential
> constraints, row-level locks or transactions? You said OLTP. What's the
> read/write ratio? Do you expect to see read/write concurrency on some
> tables? (This is to decide which storage engine to use.)
You are right, I missed these crucial points (it's also difficult right
now to give more than rough estimates for some of these things).

We need support for transactions and won't do any explicit row-level
locking; we will use the InnoDB storage engine and MySQL 5.0. No referential
constraints. There will be processes concurrently doing
SELECT/INSERT/UPDATE on tables (is this what you mean by read/write
concurrency?). The read/write ratio will probably be more read than write:
I would say around 100 or even 1000 for some tables, and 10 or even
less for some other tables, including some million-record tables, but
these are just rough estimates.

I think each page will require a reasonable number of queries to be
composed (usually not more than 5, often just a couple), and we can also
use an application-layer cache for the most used queries.

Actually, there are some cases in which we always have to do 2 queries
even if we return all the data with one query, because we need result paging:
we do one query to retrieve a fixed number of rows (with a LIMIT constraint
on the query) and another query to COUNT the rows matching that query.
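
To illustrate, a typical paging pair looks like this (a rough sketch;
the orders table and its columns are made-up names):

    -- Fetch one page of 20 rows:
    SELECT id, created_at, total
    FROM orders
    WHERE customer_id = 42
    ORDER BY created_at DESC
    LIMIT 20 OFFSET 0;

    -- Count all rows matching the same condition:
    SELECT COUNT(*)
    FROM orders
    WHERE customer_id = 42;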

> Regarding real-world experience: some years ago I designed a good part
> of a medium-sized web portal (jjj.wbocvybg.qr). The database behind the
> 12 web servers was MySQL 3.23.x with MyISAM tables: 30GB+ of data in some
> 100 tables, the biggest table having 35 million rows. Quite similar to your setup.
> We ran it on a Sun Enterprise V480 with 4x 1GHz USIII CPUs and 8GB RAM.
>
> Solaris was much better than Linux 2.4 for multithreaded workloads back
> in those days. Today I would go for Linux 2.6 on Opteron hardware.

Wow, I wonder how you could have just 30GB of data for 100 tables with one
table of 35 million rows. We plan to use SLES 10, which has a Linux 2.6
kernel and EM64T processor support.
How many concurrent users could you have at most, and how many queries
did you actually send to the 35-million-row table?
Thanks a lot for your really valuable suggestions!

Nov 2 '06 #3
"Federico" <fr***@nexus.itwrote:
We need support for transactions and won't do any explicit row level
lock, will use innodb storage engine and MySQL 5.0. No referential
constraints, there will be processes concurrently making
SELECT/INSERT/UPDATE on tables (is this that you mean with read/write
concurency?) the read/write ratio will probably be more read than write
I would say on a 100 or even 1000 ratio for some tables and 10 or even
less for some other tables including some million records table but
these are just rough estimates.
Well, if you're targeting InnoDB already, the read/write ratio is not so
important any more. Due to its more granular locking (compared to
MyISAM), InnoDB does well with concurrent reads/writes. It will use
some memory to keep row backups, though.
> Actually, there are some cases in which we always have to do 2 queries
> even if we return all the data with one query, because we need result paging:
> we do one query to retrieve a fixed number of rows (with a LIMIT constraint
> on the query) and another query to COUNT the rows matching that query.
You can do that in one query. See SQL_CALC_FOUND_ROWS.
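In a sketch (reusing the hypothetical orders table from your example):

    -- Fetch one page and let MySQL count all matching rows as a side effect:
    SELECT SQL_CALC_FOUND_ROWS id, created_at, total
    FROM orders
    WHERE customer_id = 42
    ORDER BY created_at DESC
    LIMIT 20 OFFSET 0;

    -- Retrieve the full count computed by the previous statement:
    SELECT FOUND_ROWS();

Note that FOUND_ROWS() is per-connection, so both statements must run on
the same connection.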
> Wow, I wonder how you could have just 30GB of data for 100 tables with one
> table of 35 million rows.
Luckily this table had rather small records. IIRC it was 6GB of data
and 2GB of index. By far the biggest table.
> We plan to use SLES 10, which has a Linux 2.6 kernel and EM64T processor
> support.
Good. You should definitely go 64 bit.
> How many concurrent users could you have at most, and how many queries
> did you actually send to the 35-million-row table?
Dunno. I never counted queries per table. Regarding users: the web servers
were each configured to allow up to 100 active requests, which totals a
maximum of 1200 possible clients. In real life there were seldom more than
300 concurrent connections to the database. If you deal with that many
clients you should be careful to close every connection as soon as you
don't need it any more. With Java you should look for a suitable
connection-pooling solution.

For the curious: here are some sample statistics for the database server:
http://24days.de/~schwenke/asing/example/solaris/

(Disk md4 holds the binary log for the replication to the fallback SQL
node; disk ssd5 holds the table space. ssd5 was located on an external
storage box connected via 1GBit Fibre Channel.
Since this is MySQL 3.23 there is no query cache, hence no cached queries.
Each web server holds a replica of some quasi-static tables; there were
an additional ~500 queries/sec distributed across the web servers.)
XL
--
Axel Schwenke, Senior Software Developer, MySQL AB

Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/
MySQL User Forums: http://forums.mysql.com/
Nov 2 '06 #4
>> Actually, there are some cases in which we always have to do 2 queries
>> even if we return all the data with one query, because we need result paging:
>> we do one query to retrieve a fixed number of rows (with a LIMIT constraint
>> on the query) and another query to COUNT the rows matching that query.
>
> You can do that in one query. See SQL_CALC_FOUND_ROWS.
This is very interesting, exactly what I was looking for, even though I
don't know if Hibernate will allow me to use anything similar...

>> Wow, I wonder how you could have just 30GB of data for 100 tables with one
>> table of 35 million rows.
>
> Luckily this table had rather small records. IIRC it was 6GB of data
> and 2GB of index. By far the biggest table.
IIRC?
We will have some bigger tables, but all queries are indexed and almost
all indexed fields are only a few bytes (INT or TIMESTAMP); there are
almost no indexed VARCHARs.
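
Just to give an idea, a typical index on these tables looks something like
this (table and column names are invented for illustration):

    -- Hypothetical table: the indexed columns are only a few bytes each
    -- (INT = 4 bytes, TIMESTAMP = 4 bytes), so index entries stay small.
    CREATE TABLE order_events (
        id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
        order_id   INT UNSIGNED NOT NULL,
        created_at TIMESTAMP NOT NULL,
        note       VARCHAR(255),
        PRIMARY KEY (id),
        KEY idx_order_created (order_id, created_at)
    ) ENGINE=InnoDB;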

>> How many concurrent users could you have at most, and how many queries
>> did you actually send to the 35-million-row table?
>
> If you deal with that many clients you should be careful to close every
> connection as soon as you don't need it any more. With Java you should
> look for a suitable connection-pooling solution.
We are already using Tomcat JNDI MySQL connection pooling and have
template classes that automatically release the connection as soon as
they have finished with it.
> For the curious: here are some sample statistics for the database server:
> http://24days.de/~schwenke/asing/example/solaris/
Very interesting. Do you think a dedicated Gigabit Ethernet connection
between the 2 servers would give pretty much the same performance? (I mean,
I know Fibre Channel is theoretically better, but maybe it's not needed in
my case.)
> Since this is MySQL 3.23 there is no query cache, hence no cached queries.
I don't know whether to use the query cache or not: as I told you, we can
use Hibernate caching, which would cache directly in RAM or on disk without
even going to the database for positive hits.
On the other hand, I don't know how much we really need caching, as
most users will issue different queries returning different results.
E.g. does it make much sense to cache an orders table when each user
retrieves just his own orders?

Even though I know there are many more things to consider, the web
application you ran on that server makes me think we are not that wrong
with our hardware configuration. Maybe we just need more RAM: right now we
have just 4GB and might increase to 8 or 16GB, and yes, I know that with
databases RAM is never enough :-)

Do you know of any good profiling/sizing tool? My problem is that I
don't have the populated database yet, and it's difficult to do realistic
profiling with only a little data. I had a look at DBMonster; it looks
pretty good, but it seems to be kind of an abandoned project. Is there any
similar or better tool around, even a commercial one?

Nov 2 '06 #5


