Connect with Expertise | Find Experts, Get Answers, Share Insights

Comparison of Database performance

 
Join Date: Nov 2009
Posts: 13
#1: Jan 6 '10
Hi all,

I happen to propose a DBMS, the requirement is that we have 1.6million rows in DB having 15 simple columns and at worse we need to come compare 10,000 rows(trasactions) one by one with this 1.6 figure, only 5 columns are compared and filter out some transactions.

and we need get back the results within two days. We are using Java on most probably some linux flavour. Assuming indexing e.t.c are applied

So, what would you suggest that which database we use. can MYSQL or Postgre handle this type of transaction processing. And can we i get calculations on transaction throughput and performance with MySQL and PostgreSQL from some link.

Thanx

dgreenhouse's Avatar
E
C
 
Join Date: May 2008
Location: San Francisco
Posts: 154
#2: Jan 13 '10

re: Comparison of Database performance


Well that's 80 Billion comparisons, so it will take awhile
1.6M X 10,000 X 5 = 80,000,000,000

If you have a machine already setup, I'd just create some dummy data and 'let her rip!' :-)

i.e.

1- Create two tables
a- Table A - 15 columns
b- Table B - 5-15 columns

2- Generate some dummy data in table A & B

3- Create a comparison algorithm

4- Have timing code embedded in the script

5- PostgreSQL would be my first choice for this as you'd need to segment the results if MySQL is used.
Postgres Database specs
Limit Value
Maximum Database Size Unlimited
Maximum Table Size 32 TB
Maximum Row Size 1.6 TB
Maximum Field Size 1 GB
Maximum Rows per Table Unlimited
Maximum Columns per Table 250 - 1600 depending on column types
Maximum Indexes per Table Unlimited

6- If you have a beefy enough machine, you could possibly create a comparison matrix table from a single select statement using Postgres or some other large db engine, but you'd probably end up with connection timeouts.

Regardless, this will run for awhile.

Finally, you should be able to do a rough estimate off of a single comparison unit.
i.e.
1- set up timing (in microseconds)
2- compare five columns
3- update timing and output timing results
4- Multiple this time X 16,000,000,000
5- It won't take into consideration the operating system, disk access, and other network issues that will arise during a full run, but it'll give you a rough estimate of what you're in for.
Reply