
MySQL millions row OLTP dimensioning

Hi everybody,

I'm evaluating the possibility of using MySQL 5.0 as the database backend
for an application we are building, and I'd like rough hardware
requirement estimates and/or real-world experience from people
already running a similarly dimensioned database/application.

As a rough estimate, the database has around 80 tables. 4-5 of these are
expected to grow to about 5,000,000 records, another 30% will be in the
100,000 range, and the rest in the 10,000 range. The biggest table
(unfortunately one of the million-row tables) has around 70 fields,
but the rest are definitely smaller.
The application is a Java intranet web application that will serve
around 500 users (growing to 1000). Almost all queries are pretty
simple: no recursion, no subqueries, aggregate queries on small
ad hoc tables, WHERE conditions on indexed fields and in many cases just
on primary keys. The only complication in our queries is sometimes the
number of joins, which can reach 5 tables. In some situations the
data for a page is retrieved with a single query on a million-row table
plus 3-4 queries on other, smaller tables, or with a query on a million-row
table joined to another 3-4 smaller tables.
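
To give an idea, here is a made-up sketch of the kind of query I mean (table and column names are invented, not our real schema): one big table joined to a few small lookup tables, filtering on indexed fields.

-- orders stands in for one of the ~5,000,000-row tables;
-- the other three are small lookup tables joined on their primary keys
SELECT o.id, o.created_at, c.name, s.label, w.city
FROM orders o
JOIN customers c ON c.id = o.customer_id
JOIN statuses s ON s.id = o.status_id
JOIN warehouses w ON w.id = o.warehouse_id
WHERE o.customer_id = 42
  AND o.created_at >= '2006-10-01'
ORDER BY o.created_at DESC
LIMIT 50;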

At the moment we think we could make it with a dual 3.0GHz Xeon machine
with 4GB RAM and 320GB of RAID5 storage.

What do you think? I'd like to hear real-world experience with similarly
sized MySQL databases, or suggestions on how to arrive at a better
hardware estimate.

Nov 2 '06 #1
"Federico" <fr***@nexus.it wrote:
I'm evaluating the possibility of using MySQL 5.0 as the database backend
for an application we are building, and I'd like rough hardware
requirement estimates and/or real-world experience from people
already running a similarly dimensioned database/application.
[description snipped]

Insufficient data. Do you need advanced features like referential
constraints, row-level locks or transactions? You said OLTP. What's the
read/write ratio? Do you expect to see read/write concurrency on some
tables? (This is needed to decide which storage engine to use.)

Regarding JOINs - you should go for MySQL 5.x. JOIN handling has some
flaws in 4.x and earlier. With proper indexing JOINs on big tables
should not be critical.
At the moment we think we could make it with a dual 3.0GHz Xeon machine
with 4GB RAM and 320GB of RAID5 storage.
*NEVER* use RAID5 for a database. RAID5 yields poor performance for
short writes (that is: writes shorter than the RAID block size).
If you want redundancy (and you do), go for RAID10.

The memory requirements depend on the chosen storage engine. MyISAM
uses memory slightly more efficiently but misses some features. MyISAM
may also cause severe performance degradation due to locking conflicts.
If you choose MyISAM, add up the sizes of the index files of the active
tables. This is your key_buffer size. Multiply by 2. This is the total
memory needed. Guessing which indexes count as "active" is difficult.
In most cases you can do well with just 10-20% of all indexes.

If you use InnoDB you should calculate approx. twice the memory.
And twice the disk space too.
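
If you want to check those numbers on a running MySQL 5.0 instance, here is a rough sketch using INFORMATION_SCHEMA ('your_db' is a placeholder for your schema name):

SELECT ENGINE,
       SUM(DATA_LENGTH)  / 1024 / 1024 AS data_mb,
       SUM(INDEX_LENGTH) / 1024 / 1024 AS index_mb
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'your_db'
GROUP BY ENGINE;

-- for MyISAM, index_mb is an upper bound for key_buffer_size;
-- for InnoDB, data_mb + index_mb approximates the table space on disk
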
Regarding real-world experience: some years ago I designed a good part
of a medium-sized web portal (jjj.wbocvybg.qr). The database behind the
12 web servers was MySQL 3.23.x with MyISAM tables: 30GB+ of data in some
100 tables, the biggest table 35 million rows. Quite similar to your setup.
We ran it on a Sun Enterprise V480, 4x 1GHz UltraSPARC III CPUs, 8GB RAM.

Solaris was much better than Linux 2.4 for multithreaded stuff back in
those days. Today I would go for Linux 2.6 on Opteron hardware.
HTH, XL
--
Axel Schwenke, Senior Software Developer, MySQL AB

Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/
MySQL User Forums: http://forums.mysql.com/
Nov 2 '06 #2
Insufficient data. Do you need advanced features like referential
constraints, row-level locks or transactions? You said OLTP. What's the
read/write ratio? Do you expect to see read/write concurrency on some
tables? (This is needed to decide which storage engine to use.)
You are right, I missed these crucial points (also because it's
difficult right now to give more than rough estimates for some of these
things).

We need support for transactions and won't do any explicit row-level
locking; we will use the InnoDB storage engine and MySQL 5.0. No
referential constraints. There will be processes concurrently issuing
SELECT/INSERT/UPDATE on tables (is this what you mean by read/write
concurrency?). The read/write ratio will probably be skewed towards
reads: I would say 100:1 or even 1000:1 for some tables, and 10:1 or
less for some others, including some of the million-record tables, but
these are just rough estimates.

I think each page will require a reasonable number of queries to
compose (usually not more than 5, often just a couple), and we can also
use an application-layer cache for the most frequently used queries.

Actually, there are some cases in which we always have to do 2 queries
even when we return all the data with one query, because we need result
paging: one query with a LIMIT clause to retrieve a fixed number of
rows, and another query to COUNT the rows matching that query.
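
For example (an invented orders table, just to show the pattern we use today):

-- fetch page 3, 20 rows per page
SELECT id, customer_id, created_at
FROM orders
WHERE customer_id = 42
ORDER BY created_at DESC
LIMIT 40, 20;

-- second round trip only to get the total for the pager
SELECT COUNT(*)
FROM orders
WHERE customer_id = 42;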

Regarding real-world experience: some years ago I designed a good part
of a medium-sized web portal (jjj.wbocvybg.qr). The database behind the
12 web servers was MySQL 3.23.x with MyISAM tables: 30GB+ of data in some
100 tables, the biggest table 35 million rows. Quite similar to your setup.
We ran it on a Sun Enterprise V480, 4x 1GHz UltraSPARC III CPUs, 8GB RAM.

Solaris was much better than Linux 2.4 for multithreaded stuff back in
those days. Today I would go for Linux 2.6 on Opteron hardware.

Wow, I'm amazed you could have just 30GB of data for 100 tables with one
table of 35 million rows. We plan to use SLES10, which has a Linux 2.6
kernel and EM64T processor support.
How many concurrent users did you have at most, and how many queries
actually went to the 35-million-row table?
Thanks a lot for your really valuable suggestions!

Nov 2 '06 #3
"Federico" <fr***@nexus.it wrote:
We need support for transactions and won't do any explicit row-level
locking; we will use the InnoDB storage engine and MySQL 5.0. No
referential constraints. There will be processes concurrently issuing
SELECT/INSERT/UPDATE on tables (is this what you mean by read/write
concurrency?). The read/write ratio will probably be skewed towards
reads: I would say 100:1 or even 1000:1 for some tables, and 10:1 or
less for some others, including some of the million-record tables, but
these are just rough estimates.
Well, if you are targeting InnoDB already, the read/write ratio is not
so important any more. Due to its more granular locking (compared to
MyISAM), InnoDB does well with concurrent reads/writes. It will use
some memory to keep row backups, though.
Actually, there are some cases in which we always have to do 2 queries
even when we return all the data with one query, because we need result
paging: one query with a LIMIT clause to retrieve a fixed number of
rows, and another query to COUNT the rows matching that query.
You can do that in one query. See SQL_CALC_FOUND_ROWS.
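
Something like this (same hypothetical orders table and paging as in your example):

SELECT SQL_CALC_FOUND_ROWS id, customer_id, created_at
FROM orders
WHERE customer_id = 42
ORDER BY created_at DESC
LIMIT 40, 20;

-- returns the number of rows the previous SELECT would have
-- matched without the LIMIT clause
SELECT FOUND_ROWS();
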
Wow, I'm amazed you could have just 30GB of data for 100 tables with one
table of 35 million rows.
Luckily this table had rather small records. IIRC it was 6GB of data
and 2GB of index. By far the biggest table.
We plan to use SLES10, which has a Linux 2.6 kernel and EM64T processor support.
Good. You should definitely go 64 bit.
How many concurrent users did you have at most, and how many queries
actually went to the 35-million-row table?
Dunno. I never counted queries per table. Regarding users: the web
servers were configured to allow up to 100 active requests each. That
totals a maximum of 1200 possible clients. In real life there were
seldom more than 300 concurrent connections to the database. If you
deal with that many clients you should be careful to close every
connection as soon as you don't need it any more. With Java you should
look for a suitable connection pooling method.

For the curious: here is a sample statistics for the database server
http://24days.de/~schwenke/asing/example/solaris/

(Disk md4 holds the binary log for the replication to the fallback SQL
node; disk ssd5 holds the table space. ssd5 was located on an external
storage box connected via 1GBit fibre channel. Since this is MySQL 3.23
there is no query cache, hence no cached queries. Each web server holds
a replica of some quasi-static tables; there were an additional
~500 queries/sec distributed across the web servers.)
XL
--
Axel Schwenke, Senior Software Developer, MySQL AB

Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/
MySQL User Forums: http://forums.mysql.com/
Nov 2 '06 #4
Actually, there are some cases in which we always have to do 2 queries
even when we return all the data with one query, because we need result
paging: one query with a LIMIT clause to retrieve a fixed number of
rows, and another query to COUNT the rows matching that query.

You can do that in one query. See SQL_CALC_FOUND_ROWS.
This is very interesting, exactly what I was looking for. Even though I
don't know whether Hibernate will let me use anything like that...

Wow, I'm amazed you could have just 30GB of data for 100 tables with one
table of 35 million rows.

Luckily this table had rather small records. IIRC it was 6GB of data
and 2GB of index. By far the biggest table.
IIRC?
We will have some bigger tables, but all queries are indexed and almost
all indexed fields are only a few bytes (INT or TIMESTAMP); there are
almost no indexed VARCHARs.
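
For example, a typical index of ours would look like this (invented names again):

-- composite index on two narrow columns (4-byte INT + 4-byte TIMESTAMP)
CREATE INDEX idx_orders_customer_date ON orders (customer_id, created_at);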

How many concurrent users did you have at most, and how many queries
actually went to the 35-million-row table?

If you deal with that many clients you should be careful to close every
connection as soon as you don't need it any more. With Java you should
look for a suitable connection pooling method.
We are already using Tomcat JNDI MySQL connection pooling and have
template classes that automatically release the connection as soon as
they are finished with it.
For the curious: here is a sample statistics for the database server
http://24days.de/~schwenke/asing/example/solaris/
Very interesting. Do you think a dedicated Gigabit Ethernet connection
between the 2 servers would give pretty much the same performance? (I
mean, I know fibre channel is theoretically better, but maybe it's not
needed in my case.)
Since this is MySQL 3.23 there is no query cache, hence no cached queries.
I don't know whether to use the query cache or not: as I told you, we
can use Hibernate caching, which caches directly in RAM or on disk
without even going to the database on a hit.
On the other hand, I don't know how much we really need caching, as
most users will issue different queries returning different results.
E.g. does it make much sense to cache queries on an orders table when
each user retrieves just his own orders?
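
(As I understand it, any write to a table also invalidates every cached result for that table, which would make the cache even less useful on a frequently updated orders table.) If we enable it anyway, I guess we could at least keep such per-user queries out of the cache, something like this (hypothetical table again):

-- SQL_NO_CACHE tells MySQL not to store this result in the query cache;
-- useful when every user's WHERE clause is different and hits would be rare
SELECT SQL_NO_CACHE id, created_at, total
FROM orders
WHERE customer_id = 42;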

Even though I know there are many more things to consider, the web
application you had on that server makes me think we are not that far
off with our hardware configuration. Maybe we just need more RAM: now
we have just 4GB and might increase to 8 or 16GB, and yes, I know that
with databases RAM is never enough :-)

Do you know of any good profiling/sizing tool? My problem is that I
don't have the populated database yet, and it's difficult to do
realistic profiling with only a little data. I had a look at DBMonster,
which looks pretty good, but it seems to be kind of an abandoned
project. Is there any similar or better tool around, even a commercial one?
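
In the meantime I was thinking of just inflating a small seed dataset by hand, something like this (invented table once more):

-- cross-joining a seed table with itself multiplies its row count;
-- repeating this a few times turns thousands of rows into millions
INSERT INTO orders (customer_id, created_at, total)
SELECT a.customer_id, a.created_at, a.total
FROM orders a, orders b
LIMIT 1000000;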

Nov 2 '06 #5
