Huge Data

Hi,

I use PostgreSQL 7.4 to store a huge amount of data, for example 7
million rows in one table. But when I run the query "select count(*)
from table;", it returns after about 120 seconds. Is this normal for
such a huge table? Are there any methods to speed up the query? The
table has an integer primary key and some other indexes on other
columns.

The hardware is a PIII 800 MHz processor, 512 MB RAM, and an IDE hard
disk drive.

-sezai


Nov 22 '05 #1
On Wednesday 14 January 2004 11:11, Sezai YILMAZ wrote:
Hi,

I use PostgreSQL 7.4 to store a huge amount of data, for example 7
million rows in one table. But when I run the query "select count(*)
from table;", it returns after about 120 seconds. Is this normal for
such a huge table? Are there any methods to speed up the query? The
table has an integer primary key and some other indexes on other
columns.


PG uses MVCC to manage concurrency. A downside of this is that to verify the
exact number of rows in a table you have to visit them all.

There's plenty on this in the archives, and probably the FAQ too.

What are you using the count() for?

--
Richard Huxton
Archonet Ltd


Nov 22 '05 #2
Richard Huxton wrote:
On Wednesday 14 January 2004 11:11, Sezai YILMAZ wrote:

Hi,

I use PostgreSQL 7.4 to store a huge amount of data, for example 7
million rows in one table. But when I run the query "select count(*)
from table;", it returns after about 120 seconds. Is this normal for
such a huge table? Are there any methods to speed up the query? The
table has an integer primary key and some other indexes on other
columns.


PG uses MVCC to manage concurrency. A downside of this is that to verify the
exact number of rows in a table you have to visit them all.

There's plenty on this in the archives, and probably the FAQ too.

What are you using the count() for?

I use count() for some statistics, just to show how many records have
been collected so far.

-sezai


Nov 22 '05 #3
Richard Huxton wrote:
PG uses MVCC to manage concurrency. A downside of this is that to verify the
exact number of rows in a table you have to visit them all.

There's plenty on this in the archives, and probably the FAQ too.

What are you using the count() for?


select logid, agentid, logbody from log where logid=3000000;

this query also returns after about 120 seconds. The table log has about
7 million records, and logid is the primary key of the log table. Why is
this one also so slow?

-sezai

Nov 22 '05 #4
On Wednesday 14 January 2004 17:57, Sezai YILMAZ wrote:
Richard Huxton wrote:
What are you using the count() for?


I use count() for some statistics, just to show how many records have
been collected so far.


Rather than doing count(*), you should either cache the count in
application memory, or analyze often and use the following:

select reltuples from pg_class where relname = 'foo';

This will give you an approximate count. I believe it should suffice for
your needs.
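
For example, assuming the log table from this thread (the bigint cast
just rounds the stored estimate to a whole number):

analyze log;
select reltuples::bigint as approx_rows from pg_class where relname = 'log';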

HTH

Shridhar

Nov 22 '05 #5
Have you run 'vacuum analyze log;'? Also I believe that in Oracle count(1) used to be quicker than count(*).
Matthew
----- Original Message -----
From: Sezai YILMAZ
To: Richard Huxton
Cc: pg***********@postgresql.org
Sent: Wednesday, January 14, 2004 12:39 PM
Subject: Re: [GENERAL] Huge Data
Richard Huxton wrote:
PG uses MVCC to manage concurrency. A downside of this is that to verify the
exact number of rows in a table you have to visit them all.

There's plenty on this in the archives, and probably the FAQ too.

What are you using the count() for?



select logid, agentid, logbody from log where logid=3000000;

this query also returns after about 120 seconds. The table log has about
7 million records, and logid is the primary key of the log table. Why is
this one also so slow?

-sezai


Nov 22 '05 #6
On Wednesday 14 January 2004 18:22, Matthew Lunnon wrote:
select logid, agentid, logbody from log where logid=3000000;

this query also returns after about 120 seconds. The table log has about
7 million records, and logid is the primary key of the log table. Why is
this one also so slow?


How about

select logid, agentid, logbody from log where logid='3000000';

or

select logid, agentid, logbody from log where logid=3000000::int8;

Basically you need to typecast the constant to match the column's type.
Then it will use the index.

I am not sure about the first form, though. I recommend you use the
latter form.

Shridhar

Nov 22 '05 #7
On Wednesday 14 January 2004 12:39, Sezai YILMAZ wrote:

select logid, agentid, logbody from log where logid=3000000;


At a guess, because logid is bigint, whereas 3000000 is taken to be an
integer. Try ... where logid = 3000000::bigint;

This is in the FAQ too I think, and is certainly in the archives.

Other things you might come across:
SELECT max() involves a sequential scan just like count(); you can rewrite it
as SELECT target_column FROM my_table ORDER BY target_column DESC LIMIT 1

The default config values are very conservative. You will definitely want to
tune them for performance. See the articles here for a good introduction:
http://www.varlena.com/varlena/Gener...bits/index.php
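
As an illustration only, here are some 7.4-era settings you might raise on
a 512 MB machine (these particular numbers are guesses, not recommendations
from this list; test against your own workload):

# postgresql.conf
shared_buffers = 8192          # number of 8 kB buffers, i.e. 64 MB
sort_mem = 8192                # kB available per sort operation
effective_cache_size = 32768   # 8 kB pages assumed cached by the OS, i.e. 256 MB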

The VACUUM command is used to reclaim unused space, and the ANALYZE command to
regenerate statistics. It's worth reading up on both.

You can use EXPLAIN ANALYSE <query here> to see the plan that PG uses. I think
there's a discussion of it at http://techdocs.postgresql.org/
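
For example, to see why the typecast matters, compare the plans for the two
forms of the earlier query (on 7.4 the first should show a Seq Scan and the
second an Index Scan, though the exact plans depend on your data):

explain analyse select logid, agentid, logbody from log where logid = 3000000;
explain analyse select logid, agentid, logbody from log where logid = 3000000::bigint;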

--
Richard Huxton
Archonet Ltd


Nov 22 '05 #8
On Wednesday 14 January 2004 12:27, Sezai YILMAZ wrote:
Richard Huxton wrote:
There's plenty on this in the archives, and probably the FAQ too.

What are you using the count() for?


I use count() for some statistics, just to show how many records have
been collected so far.


If you want an accurate number without scanning the table, you'll need to use
a trigger to keep a count up to date.
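
A minimal sketch of that approach (the row_counts table and function name
are made up for illustration; plpgsql must be installed, and note that every
insert/delete now pays the trigger cost and contends on one counter row):

create table row_counts (tablename text primary key, n bigint);
insert into row_counts values ('log', (select count(*) from log));

create function log_count_trig() returns trigger as '
begin
    if TG_OP = ''INSERT'' then
        update row_counts set n = n + 1 where tablename = ''log'';
    else
        update row_counts set n = n - 1 where tablename = ''log'';
    end if;
    return null;
end;
' language plpgsql;

create trigger log_count after insert or delete on log
    for each row execute procedure log_count_trig();

select n from row_counts where tablename = 'log';  -- fast and exact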

--
Richard Huxton
Archonet Ltd


Nov 22 '05 #9
Shridhar Daithankar wrote:
Rather than doing count(*), you should either cache the count in
application memory, or analyze often and use the following:

select reltuples from pg_class where relname = 'foo';

Thank you very much Shridhar. This one responds immediately. I think I
will use this method for getting the row count. My only complaint is
that it breaks the SQL standard, so the code becomes unportable.

-sezai


Nov 22 '05 #10
If the mentioned solution fits your needs, you could wrap it in a stored
procedure. The PostgreSQL implementation could select from pg_class,
while the same function in another database could execute the select
count(*) on the table.
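
A sketch of the PostgreSQL side of such a wrapper (the name approx_count
is hypothetical; 7.4 SQL functions refer to their arguments as $1):

create function approx_count(text) returns bigint as '
    select reltuples::bigint from pg_class where relname = $1;
' language sql stable;

select approx_count('log');

On another database you would define approx_count with the same signature
and have it run the real select count(*).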

On Wed, 2004-01-14 at 10:25, Sezai YILMAZ wrote:
Shridhar Daithankar wrote:
Rather than doing count(*), you should either cache the count in
application memory, or analyze often and use the following:

select reltuples from pg_class where relname = 'foo';

Thank you very much Shridhar. This one responds immediately. I think I
will use this method for getting the row count. My only complaint is
that it breaks the SQL standard, so the code becomes unportable.

-sezai




Nov 22 '05 #11
Shridhar Daithankar wrote:
On Wednesday 14 January 2004 18:22, Matthew Lunnon wrote:

select logid, agentid, logbody from log where logid=3000000;

this query also returns after about 120 seconds. The table log has about
7 million records, and logid is the primary key of the log table. Why is
this one also so slow?


How about

select logid, agentid, logbody from log where logid='3000000';

Oh my god. It is unbelievable. The result is great. Thanks to everyone
who helped me.

-sezai


Nov 22 '05 #12
On Wednesday 14 January 2004 18:55, Sezai YILMAZ wrote:
Shridhar Daithankar wrote:
Rather than doing count(*), you should either cache the count in
application memory, or analyze often and use the following:

select reltuples from pg_class where relname = 'foo';


Thank you very much Shridhar. This one responds immediately. I think I
will use this method for getting the row count. My only complaint is
that it breaks the SQL standard, so the code becomes unportable.


Well, you could document it somewhere for your reference. It is not that
hard... :-)

And remember, the value you get is just an estimate. You need to analyze the
table often, in line with its update/insert/delete activity, to keep the
estimate reasonably accurate. Vacuuming will also update the estimate.

Shridhar

Nov 22 '05 #13
