
MySQL millions-row OLTP dimensioning

Hi everybody,

I'm evaluating the possibility of using MySQL 5.0 as the database backend
for an application we are building, and I'd like rough hardware-requirement
estimates and/or real-world experience from people who are already running
a similarly dimensioned database/application.

As a rough estimate, the database has around 80 tables; 4-5 of these are
expected to grow up to 5,000,000 records, another 30% will be in the
100,000 range, while the remaining will be in the 10,000 range. The
biggest table (unfortunately one of the million-row tables) has around
70 fields, but the remaining tables are definitely smaller.
The application is a Java web intranet application that will serve
around 500 users (growing up to 1000). Almost all queries are pretty
simple: no recursion, no subqueries, aggregate-value queries on small
ad hoc tables, WHERE conditions on indexed fields and in many cases just
on primary keys. The only complication in our queries is sometimes the
number of joins, which can get up to a 5-table join; in some situations the
data for a page is retrieved with a single query on a million-row table
plus 3-4 queries on the other smaller tables, or a query on a million-row
table joined with another 3-4 smaller tables.

At the moment we think we could make it with a dual 3.0 GHz Xeon
with 4GB RAM and 320GB of RAID5 storage.

What do you think? I'd like to hear real-world experience with similarly
sized MySQL databases, or suggestions on how to make a better hardware
estimate.

Nov 2 '06 #1
"Federico" <fr***@nexus.itwrote:
>
I'm evaluating the possibility of using MySQL 5.0 as a database backend
for an application we are doing and I'd like to have hardware
requirements rough estimates and/or real world experience from people
that are already running similar dimensioned database/application.
[description snipped]

Insufficient data. Do you need advanced features like referential
constraints, row-level locks or transactions? You said OLTP. What's the
read/write ratio? Do you expect to see read/write concurrency on some
tables? (This is to decide which storage engine to use.)

Regarding JOINs - you should go for MySQL 5.x. JOIN handling has some
flaws in 4.x and earlier. With proper indexing JOINs on big tables
should not be critical.
> At the moment we think we could make it with a dual 3.0 GHz Xeon
> with 4GB RAM and 320GB of RAID5 storage.
*NEVER* use RAID5 for a database. RAID5 yields poor performance for
short writes (that is, writes that are shorter than the RAID block
size). If you want redundancy (and you do), go for RAID10.

The memory requirements depend on the chosen storage engine. MyISAM
uses memory slightly more efficiently but misses some features. MyISAM may
also cause severe performance degradation due to locking conflicts. If you
choose MyISAM, add up the sizes of the index files of the active tables:
this is your key_buffer size. Multiply by 2: this is the total memory
needed. Guessing which indexes count as "active" is difficult;
in most cases you can do well with just 10-20% of all indexes.
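For example, on MySQL 5.0 you can get a rough total from
information_schema ('mydb' below is a placeholder for your schema name):

    -- Rough total MyISAM index size, in MB:
    SELECT SUM(index_length) / 1024 / 1024 AS index_mb
    FROM information_schema.TABLES
    WHERE table_schema = 'mydb'
      AND engine = 'MyISAM';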

If you use InnoDB you should calculate approx. twice the memory.
And twice the disk space too.
Regarding real-world experience: some years ago I designed a good part
of a medium-sized web portal (jjj.wbocvybg.qr). The database behind the
12 web servers was MySQL 3.23.x with MyISAM tables: 30GB+ of data in some
100 tables, the biggest table having 35 million rows. Quite similar to your setup.
We ran it on a Sun Enterprise V480 with 4x 1GHz USIII CPUs and 8GB RAM.

Solaris was much better than Linux 2.4 for multithreaded workloads back
in those days. Today I would go for Linux 2.6 on Opteron hardware.
HTH, XL
--
Axel Schwenke, Senior Software Developer, MySQL AB

Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/
MySQL User Forums: http://forums.mysql.com/
Nov 2 '06 #2
> Insufficient data. Do you need advanced features like referential
> constraints, row-level locks or transactions? You said OLTP. What's the
> read/write ratio? Do you expect to see read/write concurrency on some
> tables? (This is to decide which storage engine to use.)
You are right, I missed these crucial points (it's also difficult right
now to give more than rough estimates for some of these things).

We need support for transactions and won't do any explicit row-level
locking; we will use the InnoDB storage engine and MySQL 5.0. No referential
constraints. There will be processes concurrently doing
SELECT/INSERT/UPDATE on tables (is this what you mean by read/write
concurrency?). The read/write ratio will probably be more read than write:
I would say around 100 or even 1000 for some tables, and 10 or even
less for some other tables, including some million-record tables, but
these are just rough estimates.

I think each page will require a reasonable number of queries to be
composed (usually not more than 5, often just a couple), and we can also
use an application-layer cache for the most used queries.

Actually, there are some cases in which we always have to do 2 queries
even if we return all the data with one query, because we need result paging:
we do one query to retrieve a fixed number of rows (with a LIMIT constraint
on the query) and another query to COUNT the rows matching that query.
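
To illustrate, a typical paging pair looks like this (a rough sketch;
the orders table and its columns are made-up names):

    -- Fetch one page of 20 rows:
    SELECT id, created_at, total
    FROM orders
    WHERE customer_id = 42
    ORDER BY created_at DESC
    LIMIT 20 OFFSET 0;

    -- Count all rows matching the same condition:
    SELECT COUNT(*)
    FROM orders
    WHERE customer_id = 42;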

> Regarding real-world experience: some years ago I designed a good part
> of a medium-sized web portal (jjj.wbocvybg.qr). The database behind the
> 12 web servers was MySQL 3.23.x with MyISAM tables: 30GB+ of data in some
> 100 tables, the biggest table having 35 million rows. Quite similar to your setup.
> We ran it on a Sun Enterprise V480 with 4x 1GHz USIII CPUs and 8GB RAM.
>
> Solaris was much better than Linux 2.4 for multithreaded workloads back
> in those days. Today I would go for Linux 2.6 on Opteron hardware.

Wow, I wonder how you could have just 30GB of data for 100 tables with one
table of 35 million rows. We plan to use SLES 10, which has a Linux 2.6
kernel and EM64T processor support.
How many concurrent users could you have at most, and how many queries
did you actually send to the 35-million-row table?
Thanks a lot for your really valuable suggestions!

Nov 2 '06 #3
"Federico" <fr***@nexus.itwrote:
We need support for transactions and won't do any explicit row level
lock, will use innodb storage engine and MySQL 5.0. No referential
constraints, there will be processes concurrently making
SELECT/INSERT/UPDATE on tables (is this that you mean with read/write
concurency?) the read/write ratio will probably be more read than write
I would say on a 100 or even 1000 ratio for some tables and 10 or even
less for some other tables including some million records table but
these are just rough estimates.
Well, if you're targeting InnoDB already, the read/write ratio is not so
important any more. Due to its more granular locking (compared to
MyISAM), InnoDB does well with concurrent reads/writes. It will use
some memory to keep row backups, though.
> Actually, there are some cases in which we always have to do 2 queries
> even if we return all the data with one query, because we need result paging:
> we do one query to retrieve a fixed number of rows (with a LIMIT constraint
> on the query) and another query to COUNT the rows matching that query.
You can do that in one query. See SQL_CALC_FOUND_ROWS.
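In a sketch (reusing the hypothetical orders table from your example):

    -- Fetch one page and let MySQL count all matching rows as a side effect:
    SELECT SQL_CALC_FOUND_ROWS id, created_at, total
    FROM orders
    WHERE customer_id = 42
    ORDER BY created_at DESC
    LIMIT 20 OFFSET 0;

    -- Retrieve the full count computed by the previous statement:
    SELECT FOUND_ROWS();

Note that FOUND_ROWS() is per-connection, so both statements must run on
the same connection.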
> Wow, I wonder how you could have just 30GB of data for 100 tables with one
> table of 35 million rows.
Luckily this table had rather small records. IIRC it was 6GB of data
and 2GB of index. By far the biggest table.
> We plan to use SLES 10, which has a Linux 2.6 kernel and EM64T processor
> support.
Good. You should definitely go 64 bit.
> How many concurrent users could you have at most, and how many queries
> did you actually send to the 35-million-row table?
Dunno. I never counted queries per table. Regarding users: the web servers
were each configured to allow up to 100 active requests, which totals a
maximum of 1200 possible clients. In real life there were seldom more than
300 concurrent connections to the database. If you deal with that many
clients you should be careful to close every connection as soon as you
don't need it any more. With Java you should look for a suitable
connection-pooling solution.

For the curious: here are some sample statistics for the database server:
http://24days.de/~schwenke/asing/example/solaris/

(Disk md4 holds the binary log for the replication to the fallback SQL
node; disk ssd5 holds the table space. ssd5 was located on an external
storage box connected via 1GBit Fibre Channel.
Since this is MySQL 3.23 there is no query cache, hence no cached queries.
Each web server holds a replica of some quasi-static tables; there were
an additional ~500 queries/sec distributed across the web servers.)
XL
--
Axel Schwenke, Senior Software Developer, MySQL AB

Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/
MySQL User Forums: http://forums.mysql.com/
Nov 2 '06 #4
>> Actually, there are some cases in which we always have to do 2 queries
>> even if we return all the data with one query, because we need result paging:
>> we do one query to retrieve a fixed number of rows (with a LIMIT constraint
>> on the query) and another query to COUNT the rows matching that query.
>
> You can do that in one query. See SQL_CALC_FOUND_ROWS.
This is very interesting, exactly what I was looking for, even though I
don't know if Hibernate will allow me to use anything similar...

>> Wow, I wonder how you could have just 30GB of data for 100 tables with one
>> table of 35 million rows.
>
> Luckily this table had rather small records. IIRC it was 6GB of data
> and 2GB of index. By far the biggest table.
IIRC?
We will have some bigger tables, but all queries are indexed and almost
all indexed fields are only a few bytes (INT or TIMESTAMP); there are
almost no indexed VARCHARs.
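
Just to give an idea, a typical index on these tables looks something like
this (table and column names are invented for illustration):

    -- Hypothetical table: the indexed columns are only a few bytes each
    -- (INT = 4 bytes, TIMESTAMP = 4 bytes), so index entries stay small.
    CREATE TABLE order_events (
        id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
        order_id   INT UNSIGNED NOT NULL,
        created_at TIMESTAMP NOT NULL,
        note       VARCHAR(255),
        PRIMARY KEY (id),
        KEY idx_order_created (order_id, created_at)
    ) ENGINE=InnoDB;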

>> How many concurrent users could you have at most, and how many queries
>> did you actually send to the 35-million-row table?
>
> If you deal with that many clients you should be careful to close every
> connection as soon as you don't need it any more. With Java you should
> look for a suitable connection-pooling solution.
We are already using Tomcat JNDI MySQL connection pooling and have
template classes that automatically release the connection as soon as
they have finished with it.
> For the curious: here are some sample statistics for the database server:
> http://24days.de/~schwenke/asing/example/solaris/
Very interesting. Do you think a dedicated Gigabit Ethernet connection
between the 2 servers would give pretty much the same performance? (I mean,
I know Fibre Channel is theoretically better, but maybe it's not needed in
my case.)
> Since this is MySQL 3.23 there is no query cache, hence no cached queries.
I don't know whether to use the query cache or not: as I told you, we can
use Hibernate caching, which would cache directly in RAM or on disk without
even going to the database for positive hits.
On the other hand, I don't know how much we really need caching, as
most users will issue different queries returning different results.
E.g. does it make much sense to cache an orders table when each user
retrieves just his own orders?

Even though I know there are many more things to consider, the web
application you ran on that server makes me think we are not that wrong
with our hardware configuration. Maybe we just need more RAM: right now we
have just 4GB and might increase to 8 or 16GB, and yes, I know that with
databases RAM is never enough :-)

Do you know of any good profiling/sizing tool? My problem is that I
don't have the populated database yet, and it's difficult to do realistic
profiling with only a little data. I had a look at DBMonster; it looks
pretty good, but it seems to be kind of an abandoned project. Is there any
similar or better tool around, even a commercial one?

Nov 2 '06 #5


