473,788 Members | 3,078 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Huge Data

Hi,

I use PostgreSQL 7.4 for storing huge amount of data. For example 7
million rows. But when I run the query "select count(*) from table;", it
results after about 120 seconds. Is this result normal for such a huge
table? Is there any methods for speed up the querying time? The huge
table has integer primary key and some other indexes for other columns.

The hardware is: PIII 800 MHz processor, 512 MB RAM, and IDE hard disk
drive.

-sezai

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 22 '05 #1
12 2327
On Wednesday 14 January 2004 11:11, Sezai YILMAZ wrote:
Hi,

I use PostgreSQL 7.4 for storing huge amount of data. For example 7
million rows. But when I run the query "select count(*) from table;", it
results after about 120 seconds. Is this result normal for such a huge
table? Is there any methods for speed up the querying time? The huge
table has integer primary key and some other indexes for other columns.


PG uses MVCC to manage concurrency. A downside of this is that to verify the
exact number of rows in a table you have to visit them all.

There's plenty on this in the archives, and probably the FAQ too.

What are you using the count() for?

--
Richard Huxton
Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Nov 22 '05 #2
Richard Huxton wrote:
On Wednesday 14 January 2004 11:11, Sezai YILMAZ wrote:

Hi,

I use PostgreSQL 7.4 for storing huge amount of data. For example 7
million rows. But when I run the query "select count(*) from table;", it
results after about 120 seconds. Is this result normal for such a huge
table? Is there any methods for speed up the querying time? The huge
table has integer primary key and some other indexes for other columns.


PG uses MVCC to manage concurrency. A downside of this is that to verify the
exact number of rows in a table you have to visit them all.

There's plenty on this in the archives, and probably the FAQ too.

What are you using the count() for?

I use count() for some statistics. Just to show how many records
collected so far.

-sezai

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 22 '05 #3
Richard Huxton wrote:
PG uses MVCC to manage concurrency. A downside of this is that to verify the
exact number of rows in a table you have to visit them all.

There's plenty on this in the archives, and probably the FAQ too.

What are you using the count() for?


select logid, agentid, logbody from log where logid=3000000;

this query also returns after about 120 seconds. The table log has about
7 million records, and logid is the primary key of log table. What about
that? Why is it too slow?

-sezai
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 22 '05 #4
On Wednesday 14 January 2004 17:57, Sezai YILMAZ wrote:
Richard Huxton wrote:
What are you using the count() for?


I use count() for some statistics. Just to show how many records
collected so far.


Rather than doing count(*), you should either cache the count in application
memory

or analyze often and use following.

'select reltuples from pg_class where relname = 'foo';

This would give you approximate count. I believe it should suffice for your
needs.

HTH

Shridhar
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddres sHere" to ma*******@postg resql.org)

Nov 22 '05 #5
Have you run 'vacuum analyze log;'? Also I believe that in Oracle count(1) used to be quicker than count(*).
Matthew
----- Original Message -----
From: Sezai YILMAZ
To: Richard Huxton
Cc: pg***********@p ostgresql.org
Sent: Wednesday, January 14, 2004 12:39 PM
Subject: Re: [GENERAL] Huge Data
Richard Huxton wrote:
PG uses MVCC to manage concurrency. A downside of this is that to verifythe
exact number of rows in a table you have to visit them all.

There's plenty on this in the archives, and probably the FAQ too.

What are you using the count() for?



select logid, agentid, logbody from log where logid=3000000;

this query also returns after about 120 seconds. The table log has about
7 million records, and logid is the primary key of log table. What about
that? Why is it too slow?

-sezai
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

_______________ _______________ _______________ _______________ _________
This e-mail has been scanned for viruses by MCI's Internet Managed Scanning Services - powered by MessageLabs. For further information visit http://www.mci.com

Nov 22 '05 #6
On Wednesday 14 January 2004 18:22, Matthew Lunnon wrote:
select logid, agentid, logbody from log where logid=3000000;

this query also returns after about 120 seconds. The table log has about
7 million records, and logid is the primary key of log table. What about
that? Why is it too slow?


How about

select logid, agentid, logbody from log where logid='3000000' ;

or

select logid, agentid, logbody from log where logid=3000000:: int4;

Basically you need to typecast the constant. Then it would use the index.

I am not sure of first form of it though. I recommend you use the later form.

Shridhar
---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 22 '05 #7
On Wednesday 14 January 2004 12:39, Sezai YILMAZ wrote:

select logid, agentid, logbody from log where logid=3000000;


At a guess, because logid is bigint, whereas 300000 is taken to be integer.
Try ... where logid = 300000::bigint;

This is in the FAQ too I think, and is certainly in the archives.

Other things you might come across:
SELECT max() involves a sequential scan just like count(), you can rewrite it
as SELECT target_column FROM my_table ORDER BY target_column DESC LIMIT 1

The config values are very conservative. You will definitely want to tune them
for performance. See the articles here for a good introduction:
http://www.varlena.com/varlena/Gener...bits/index.php

The VACUUM command is used to reclaim unused space, and the ANALYZE command to
regenerate statistics. It's worth reading up on both.

You can use EXPLAIN ANALYSE <query here> to see the plan that PG uses. I think
there's a discussion of it at http://techdocs.postgresql.org/

--
Richard Huxton
Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 22 '05 #8
On Wednesday 14 January 2004 12:27, Sezai YILMAZ wrote:
Richard Huxton wrote:
There's plenty on this in the archives, and probably the FAQ too.

What are you using the count() for?


I use count() for some statistics. Just to show how many records
collected so far.


If you want an accurate number without scanning the table, you'll need to use
a trigger to keep a count up to date.

--
Richard Huxton
Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 22 '05 #9
Shridhar Daithankar wrote:
Rather than doing count(*), you should either cache the count in application
memory

or analyze often and use following.

'select reltuples from pg_class where relname = 'foo';

Thank you very much Shridhar. This one is responsive immediately. I
think I will use this method for gathering row count. But I complain to
break SQL standards. The code will become unmovable.

-sezai

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 22 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

15
1969
by: Beda Christoph Hammerschmidt | last post by:
I wat to perform some performance measurements on an XML database. FOr this reason i need some huge XML sample data. The data should be not too structured and a lot of reasonable queries should make sense. Any idea, where i can get this data ??
0
1236
by: hakhan | last post by:
Hello, I need to store huge(+/- 100MB) data. Furthermore, my GUI application must select data portions from these huge data files in order to do some post-processing. I wonder in which format I should put my data in? XML or just a (relational) database? Or should I use an XML database (native or xml-enabled?)??? I am a little bit confused .... If I'd put the data in XML files, then loading the entire XML tree in memory(DOM) would...
3
4303
by: Esger Abbink | last post by:
Hello, it is very possible that this is a well described problem, but I have not been able to find the solution. On two production server (7.2rc2) of ours the data directory is growing to very large sizes while the data that is actually in the db's isnt 1. that large and 2. growing. The databases see a fairly limited/constant use at the moment. The data
5
2448
by: amanatio | last post by:
I have a huge form with many data bound controls on it and 34 tables in database (and of course 34 data adapters and 34 datasets). The form is extremely slow to design (huge delay when I go to code from design mode or vice versa) and to show. I didn't design the form but I WILL redisgn it from scratch. What would you propose me to do? The form now haw a tab control with 7 tabs and about 600 text boxes that are bounded to 34 datasets... ...
6
3807
by: Daniel Walzenbach | last post by:
Hi, I have a web application which sometimes throws an “out of memory” exception. To get an idea what happens I traced some values using performance monitor and got the following values (for one day): \\FFDS24\ASP.NET Applications(_LM_W3SVC_1_Root_ATV2004)\Errors During Execution: 7 \\FFDS24\ASP.NET Apps v1.1.4322(_LM_W3SVC_1_Root_ATV2004)\Compilations
7
1601
by: Peter Hansen | last post by:
Is in any way possible to define a variable as a integer-value which contain about 300 numbers in a row? I have thought about an Override-function for Data-types but I dunno how - but if it is possible that a Data-type which can hold an unlimited numbers of numbers :D exists I would be happy or if it is possible to Override the Dim-function... Hilsen fra Peter
12
5364
by: Jeff Calico | last post by:
I have 2 XML data files that I want to extract data from simultaneously and transform with XSLT to generate a report. The first file is huge and when XSLT builds the DOM tree in memory, it runs out of space. I only need a few branches of elements from the original XML, so I am seeking a recomended way of building a DOM for XSLT of only the elements that I need. I'm writing a Java application that invokes Xalan, and reading up on SAX...
3
3489
by: Gummy | last post by:
Hello, I have an ASPX page on which I place a UserControl 15 times (they only need to be static controls on the page). This UserControl is a set of two listboxes with radiobuttons above the listbox (to select between viewing a code or description in the listbox). There are also left and right arrows that move the selected items between the listboxes. In my Page_Load I assign each UserControl the appropriate DataTable and this...
0
1365
by: ranganadh | last post by:
Dear Group members, I am new to LINQ, pls help on the deeling with huge amount of data with the C# stand Alone application. I have two file, which contains more then 2 lacs lines in every file suppose file1 like ...
16
2045
by: pereges | last post by:
ok so i have written a program in C where I am dealing with huge data(millions and lots of iterations involved) and for some reason the screen tends to freeze and I get no output every time I execute it. However, I have tried to reduce the amount of data and the program runs fine. What could possibly be done to resolve this ?
0
9498
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
10363
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
1
10110
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
9964
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
1
7517
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupr who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
5535
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
4069
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
3670
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
2894
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.