Insert speed question

Hello List,

I'm importing some data from FoxPro to Postgres. There is a table which
contains approximately 4.8 million rows and is about 830 MB in size. I uploaded
it to Postgres using dbf2pg and it worked fine; it took about 10-15 minutes.
Now I'm inserting some data from that table into a brand new table in
PostgreSQL, and for that I'm doing insert into ... select from. The point is
that inserting this data from one table to another table in PostgreSQL took
about 35 minutes. Is that the expected behavior in Postgres?

BTW, both tables have no indices or triggers. My Postgres version is 7.4,
running on a dual Xeon 2.8 with 2 GB of RAM and about 11 GB available on the
partition where Postgres is installed.

Settings in postgresql.conf are:

effective_cache_size = 170000 # typically 8KB each
sort_mem = 131072 # min 64, size in KB
checkpoint_segments = 10
shared_buffers = 63000 # min max_connections*2 or 16, 8KB each
max_fsm_relations = 400 # min 10, fsm is free space map, ~40 bytes
max_fsm_pages = 80000 # min 1000, fsm is free space map,
max_locks_per_transaction = 64 # min 10
tcpip_socket = true
max_connections = 128
Thanks in advance
--
Sincerely,
Josué Maldonado.
"I will stop loving you the day a painter paints on his canvas the sound of
a tear."


Nov 23 '05 #1
Josué Maldonado wrote:
sort_mem = 131072 # min 64, size in KB


128 MB for sort_mem is really a huge amount of
memory considering that it is not system-wide but
per process (under certain operations a
single process can use more than this amount).
Hackers: am I wrong?
Regards
Gaetano Mendola
Nov 23 '05 #2
On Tuesday 01 June 2004 01:35, Josué Maldonado wrote:
Hello List,

I'm importing some data from FoxPro to Postgres. There is a table which
contains approximately 4.8 million rows and is about 830 MB in size. I uploaded
it to Postgres using dbf2pg and it worked fine; it took about 10-15 minutes.
Now I'm inserting some data from that table into a brand new table in
PostgreSQL, and for that I'm doing insert into ... select from. The point is
that inserting this data from one table to another table in PostgreSQL took
about 35 minutes. Is that the expected behavior in Postgres?


Can you generate explain analyze output for the insert into ... select from? Most probably
it is using a seq. scan because you haven't analyzed after inserting 4.8M rows.

Do a vacuum verbose analyze tablename and reattempt the insert into ... select from.
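For illustration, a minimal sketch of that sequence (the table names hisventa and pk2
and the shortened column list are taken from later in this thread, not from the
poster's full statement):

  vacuum verbose analyze hisventa;

  explain analyze
    insert into pk2 (pkd_docto, pkd_es, pkd_qtysold)
    select facnum, es, tqtysold
    from hisventa;

Note that explain analyze actually executes the insert, so run it against a scratch
copy of the target table if you only want the timing.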

You can also read general tuning guide at

http://www.varlena.com/varlena/Gener...bits/index.php

HTH

Shridhar


Nov 23 '05 #3
Gaetano Mendola wrote:
Josué Maldonado wrote:
sort_mem = 131072 # min 64, size in KB

128 MB for sort_mem is really a huge amount of
memory considering that it is not system-wide but
per process (under certain operations a
single process can use more than this amount).
Hackers: am I wrong?


Not a hacker, but you're right. It's the amount of memory *per sort*. Of
course, Josue might have a Terabyte of RAM but it's unlikely.
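For what it's worth, a small sketch of how that plays out in practice (7.4 syntax; the
ORDER BY query is only a hypothetical example of a statement that triggers a large
sort, and the values are illustrative):

  -- postgresql.conf: keep the global default modest
  -- sort_mem = 16384              # 16 MB per sort operation

  -- raise it only in the one session that really needs a big sort:
  SET sort_mem = 131072;           -- 128 MB, this connection only
  SELECT * FROM pkardex ORDER BY pkd_fecha;
  RESET sort_mem;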

--
Richard Huxton
Archonet Ltd


Nov 23 '05 #4
Thanks for your responses,

I did the vacuum but I cannot run the insert again at this moment. Even
though that server is not yet in production, so all the resources should be
dedicated to Postgres, I think I still have some performance issues.

Did some changes to postgresql.conf according the tuning guide:
tcpip_socket = true
max_connections = 28
shared_buffers = 32768 # min max_connections*2 or 16, 8KB each
max_fsm_relations = 500 # min 10, fsm is free space map, ~40 bytes
max_fsm_pages = 80000 # min 1000, fsm is free space map, ~6
max_locks_per_transaction = 64 # min 10
sort_mem = 16384 # min 64, size in KB
vacuum_mem = 419430 # min 1024, size in KB
checkpoint_segments = 10
effective_cache_size = 819200 # typically 8KB each

Shmmax is:
# cat /proc/sys/kernel/shmmax
536870912
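For reference, the shared memory arithmetic behind those two shared_buffers values
(approximate; it ignores the small per-buffer bookkeeping overhead):

  32768 buffers * 8 KB = 262144 KB ~ 256 MB   (current shared_buffers)
  63000 buffers * 8 KB = 504000 KB ~ 492 MB   (previous setting)
  shmmax = 536870912 bytes = 512 MB

so the new setting fits comfortably under shmmax, while the old one was close to the limit.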

A simple query on the 4.8 million row table:

dbmund=# explain analyze select * from pkardex where pkd_procode='8959';
                                 QUERY PLAN
---------------------------------------------------------------------------
 Index Scan using ix_pkardex_procode on pkardex  (cost=0.00..3865.52 rows=991 width=287) (actual time=10.879..100.914 rows=18 loops=1)
   Index Cond: (pkd_procode = '8959'::bpchar)
 Total runtime: 101.057 ms
(3 rows)
A simple query on the 1.2 million row table:

explain analyze select * from pmdoc where pdc_docto='744144';
                                 QUERY PLAN
---------------------------------------------------------------------------
 Index Scan using ix_pmdoc_docto on pmdoc  (cost=0.00..5.20 rows=2 width=206) (actual time=0.081..0.085 rows=1 loops=1)
   Index Cond: (pdc_docto = '744144'::bpchar)
 Total runtime: 0.140 ms
(3 rows)
I would appreciate any comment or suggestion: is a hardware upgrade needed,
or does this seem "normal" for PostgreSQL performance?

Thanks in advance


On 01/06/2004 1:35 AM, Shridhar Daithankar wrote:
On Tuesday 01 June 2004 01:35, Josué Maldonado wrote:
Hello List,

I'm importing some data from FoxPro to Postgres. There is a table which
contains approximately 4.8 million rows and is about 830 MB in size. I uploaded
it to Postgres using dbf2pg and it worked fine; it took about 10-15 minutes.
Now I'm inserting some data from that table into a brand new table in
PostgreSQL, and for that I'm doing insert into ... select from. The point is
that inserting this data from one table to another table in PostgreSQL took
about 35 minutes. Is that the expected behavior in Postgres?

Can you generate explain analyze output for the insert into ... select from? Most probably
it is using a seq. scan because you haven't analyzed after inserting 4.8M rows.

Do a vacuum verbose analyze tablename and reattempt the insert into ... select from.

You can also read general tuning guide at

http://www.varlena.com/varlena/Gener...bits/index.php

HTH

Shridhar


--
Sincerely,
Josué Maldonado.
"All other knowledge is harmful to him who does not possess the science of
goodness." Michel Eyquem de Montaigne, French philosopher and writer.


Nov 23 '05 #5
On Tuesday 01 June 2004 21:42, Josué Maldonado wrote:
Thanks for your responses,

I did the vacuum but I cannot run the insert again at this moment. Even
though that server is not yet in production, so all the resources should be
dedicated to Postgres, I think I still have some performance issues.
I am not sure I understand. You could not insert? Why? Was there any problem
with the database? Can you use typical linux tools such as vmstat/top to locate
the bottleneck?
Did some changes to postgresql.conf according the tuning guide:
tcpip_socket = true
max_connections = 28
shared_buffers = 32768 # min max_connections*2 or 16, 8KB each
max_fsm_relations = 500 # min 10, fsm is free space map, ~40 bytes
max_fsm_pages = 80000 # min 1000, fsm is free space map, ~6
max_locks_per_transaction = 64 # min 10
sort_mem = 16384 # min 64, size in KB
vacuum_mem = 419430 # min 1024, size in KB
checkpoint_segments = 10
effective_cache_size = 819200 # typically 8KB each
OK, I would say the parameters are still slightly oversized but there is no
perfect set of parameters. You still might have to tune it according to your
usual workload.
A simple query on the 4.8 million row table:

dbmund=# explain analyze select * from pkardex where pkd_procode='8959';
                                 QUERY PLAN
---------------------------------------------------------------------------
 Index Scan using ix_pkardex_procode on pkardex  (cost=0.00..3865.52 rows=991 width=287) (actual time=10.879..100.914 rows=18 loops=1)
   Index Cond: (pkd_procode = '8959'::bpchar)
 Total runtime: 101.057 ms
(3 rows)
A simple query on the 1.2 million row table:

explain analyze select * from pmdoc where pdc_docto='744144';
                                 QUERY PLAN
---------------------------------------------------------------------------
 Index Scan using ix_pmdoc_docto on pmdoc  (cost=0.00..5.20 rows=2 width=206) (actual time=0.081..0.085 rows=1 loops=1)
   Index Cond: (pdc_docto = '744144'::bpchar)
 Total runtime: 0.140 ms
(3 rows)
I wouldn't say these timings indicate performance issues. 100 ms is pretty fast, and
so is 0.140 ms.

Note that there is latency involved. No matter how much you tune, it cannot
drop below a certain level. On my last machine (P-III/1GHz with an IDE disk) I
observed it to be 200 ms no matter what I did. But it could handle 70 concurrent
connections with a worst-case latency of 210 ms. (This was long ago, so the numbers
mean little; it is just an illustration.)

This could be different on your setup, but the trend should be roughly the same.
I would appreciate any comment or suggestion: is a hardware upgrade needed,
or does this seem "normal" for PostgreSQL performance?


I would ask the question the other way round: what is the level of performance you
are looking for with your current workload? By how much is this performance
worse than your expectation?

IMO it is essential to set a target for performance tuning, otherwise it
becomes an endless loop with minimal returns.

HTH

Shridhar


Nov 23 '05 #6
Hello Shridhar,

On 02/06/2004 1:16 AM, Shridhar Daithankar wrote:
I am not sure I understand. You could not insert? Why? Was there any problem
with the database? Can you use typical linux tools such as vmstat/top to locate
the bottleneck?

I was unable to run the insert at that moment; after the changes to
postgresql.conf the speed increased. Here is the explain:

dbmund=# explain analyze
dbmund-# insert into pk2
dbmund-# (pkd_stamp,pkd_fecha,
doctipo,'YYY',-1,-1,-1,-1,prod_no,impues,
dbmund(# pkd_docto,pkd_es,pkd_qtysold,pkd_qtyinv,
dbmund(# pkd_unidad,pkd_price,pkd_costo,pkd_saldo,
dbmund(# pkd_procode,pkd_custid,pkd_vendedor,pkd_tipprice,
dbmund(# pkd_totprice,pkd_estanulo,pkd_estmes,pkd_porcomision,
dbmund(# pkd_rutafk,pkd_provcode,pkd_depto,pkd_pk,
dbmund(# pkd_totcost,pkd_doctipo2,pkd_doctipo,fk_autorizacion,
dbmund(# fk_devolucion,pkd_udindexfk,pkd_unidadfk,pkd_prodno,
dbmund(# pkd_gravada,pkd_fraimp,pkd_imsove,pkd_clmayor,
dbmund(# pkd_cajanum,pkd_es)
dbmund-# select fkardex,facfec,facnum,es,tqtysold,
dbmund-# invqty,unidad,fprice,fcost,saldo,
dbmund-# substr(prod_no,8,4),codclie,who_sold,
dbmund-# pre_tipo,fprice*tqtysold,'U',dtos(fkardex),
dbmund-# por_comisi,'XXX',substr(prod_no,1,3),
dbmund-# substr(prod_no,5,2),OID,fcost*tqtysold,
dbmund-# doctipo,'YYY',-1,-1,-1,-1,prod_no,impues,
dbmund-# fra_imp,imsove,clmayor,cajanum,es
dbmund-# from hisventa
dbmund-# ;
ERROR: column "pkd_es" specified more than once
dbmund=# explain analyze
dbmund-# insert into pk2
dbmund-# (pkd_stamp,pkd_fecha,
doctipo,'YYY',-1,-1,-1,-1,prod_no,impues,
dbmund(# pkd_docto,pkd_es,pkd_qtysold,pkd_qtyinv,
dbmund(# pkd_unidad,pkd_price,pkd_costo,pkd_saldo,
dbmund(# pkd_procode,pkd_custid,pkd_vendedor,pkd_tipprice,
dbmund(# pkd_totprice,pkd_estanulo,pkd_estmes,pkd_porcomision,
dbmund(# pkd_rutafk,pkd_provcode,pkd_depto,pkd_pk,
dbmund(# pkd_totcost,pkd_doctipo2,pkd_doctipo,fk_autorizacion,
dbmund(# fk_devolucion,pkd_udindexfk,pkd_unidadfk,pkd_prodno,
dbmund(# pkd_gravada,pkd_fraimp,pkd_imsove,pkd_clmayor,
dbmund(# pkd_cajanum)
dbmund-# select fkardex,facfec,facnum,es,tqtysold,
dbmund-# invqty,unidad,fprice,fcost,saldo,
dbmund-# substr(prod_no,8,4),codclie,who_sold,
dbmund-# pre_tipo,fprice*tqtysold,'U',dtos(fkardex),
dbmund-# por_comisi,'XXX',substr(prod_no,1,3),
dbmund-# substr(prod_no,5,2),OID,fcost*tqtysold,
dbmund-# doctipo,'YYY',-1,-1,-1,-1,prod_no,impues,
dbmund-# fra_imp,imsove,clmayor,cajanum
dbmund-# from hisventa;
                                 QUERY PLAN
---------------------------------------------------------------------------
 Seq Scan on hisventa  (cost=0.00..633607.24 rows=4882546 width=149) (actual time=26.647..363517.935 rows=4882546 loops=1)
 Total runtime: 1042927.167 ms
(2 rows)

dbmund=#

Did some changes to postgresql.conf according the tuning guide:
tcpip_socket = true
max_connections = 28
shared_buffers = 32768 # min max_connections*2 or 16, 8KB each
max_fsm_relations = 500 # min 10, fsm is free space map, ~40 bytes
max_fsm_pages = 80000 # min 1000, fsm is free space map, ~6
max_locks_per_transaction = 64 # min 10
sort_mem = 16384 # min 64, size in KB
vacuum_mem = 419430 # min 1024, size in KB
checkpoint_segments = 10
effective_cache_size = 819200 # typically 8KB each

OK, I would say the parameters are still slightly oversized but there is no
perfect set of parameters. You still might have to tune it according to your
usual workload.


As I said before, the server is not yet in production. The expected
connections are 80-100 on a normal day, and the users' tasks in the system
affect basically the following areas: inventory, sales, customers, banks, and
accounting. I know there is no rule for tuning, but I'd appreciate your
comments about the parameters for such a scenario.
I would ask the question the other way round: what is the level of performance you
are looking for with your current workload? By how much is this performance
worse than your expectation?


Since I have not tested the server with the production workload yet,
maybe my perception of performance is not rightly focused; basically my
expectation is that the database must be faster than the current old legacy
FoxPro system.

Thanks,
--
Sincerely,
Josué Maldonado.
"Monogamy is like being obliged to eat french fries every day."
-- Henry Miller (1891-1980), American writer.


Nov 23 '05 #7
On Wed, Jun 02, 2004 at 08:50:16AM -0600, Josué Maldonado wrote:
dbmund=# explain analyze
dbmund-# insert into pk2
dbmund-# (pkd_stamp,pkd_fecha,
doctipo,'YYY',-1,-1,-1,-1,prod_no,impues,
dbmund(# pkd_docto,pkd_es,pkd_qtysold,pkd_qtyinv,
                   ^^^^^^
dbmund(# pkd_unidad,pkd_price,pkd_costo,pkd_saldo,
dbmund(# pkd_procode,pkd_custid,pkd_vendedor,pkd_tipprice,
dbmund(# pkd_totprice,pkd_estanulo,pkd_estmes,pkd_porcomision,
dbmund(# pkd_rutafk,pkd_provcode,pkd_depto,pkd_pk,
dbmund(# pkd_totcost,pkd_doctipo2,pkd_doctipo,fk_autorizacion,
dbmund(# fk_devolucion,pkd_udindexfk,pkd_unidadfk,pkd_prodno,
dbmund(# pkd_gravada,pkd_fraimp,pkd_imsove,pkd_clmayor,
dbmund(# pkd_cajanum,pkd_es)
                     ^^^^^^
dbmund-# select fkardex,facfec,facnum,es,tqtysold,
dbmund-# invqty,unidad,fprice,fcost,saldo,
dbmund-# substr(prod_no,8,4),codclie,who_sold,
dbmund-# pre_tipo,fprice*tqtysold,'U',dtos(fkardex),
dbmund-# por_comisi,'XXX',substr(prod_no,1,3),
dbmund-# substr(prod_no,5,2),OID,fcost*tqtysold,
dbmund-# doctipo,'YYY',-1,-1,-1,-1,prod_no,impues,
dbmund-# fra_imp,imsove,clmayor,cajanum,es
dbmund-# from hisventa
dbmund-# ;
ERROR:  column "pkd_es" specified more than once
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

So fix your query! Also what do you expect to happen if you put
constants in the column list? This certainly looks like a mistake to
me. Anyway you should really format your query better so you can
understand it and see obvious mistakes.
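As an illustration of the kind of layout that makes such mistakes visible, here is a
shortened, hypothetical version of the same statement (only a subset of the columns
from the thread), with one target column per line paired against its source
expression, so a duplicated column or a stray constant stands out immediately:

  insert into pk2 (
      pkd_stamp,       -- fkardex
      pkd_fecha,       -- facfec
      pkd_docto,       -- facnum
      pkd_es,          -- es   (listed exactly once)
      pkd_qtysold,     -- tqtysold
      pkd_procode      -- substr(prod_no,8,4)
  )
  select
      fkardex,
      facfec,
      facnum,
      es,
      tqtysold,
      substr(prod_no,8,4)
  from hisventa;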
dbmund=# explain analyze
dbmund-# insert into pk2 [...]
                                 QUERY PLAN
---------------------------------------------------------------------------
 Seq Scan on hisventa  (cost=0.00..633607.24 rows=4882546 width=149) (actual time=26.647..363517.935 rows=4882546 loops=1)
 Total runtime: 1042927.167 ms
(2 rows)


So you are inserting 4 million rows. That is a lot of I/O, so no
wonder it takes a long time. I'm not sure whether the time is reasonable or
not, though; 4M rows / 1M ms = 4 rows/ms. Not that bad.
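For a one-off bulk copy like this, one knob is commonly raised temporarily; a hedged
sketch with illustrative values only (not something established in this thread), to be
reverted after the load:

  # postgresql.conf, just for the duration of the bulk load
  checkpoint_segments = 32    # fewer, larger checkpoints while the 4.8M rows are written

and while the statement runs, something like

  vmstat 5

on the server will show whether the box is actually I/O bound (blocks written out in
the 'bo' column) or CPU bound.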

I would ask the question the other way round: what is the level of performance
you are looking for with your current workload? By how much is this performance
worse than your expectation?


Since I have not tested the server with the production workload yet,
maybe my perception of performance is not rightly focused; basically my
expectation is that the database must be faster than the current old legacy
FoxPro system.


If you are going to have a big load, you should at least try to code a
simulation with a big load, doing random queries (not just any query, but the
actual queries you'll get from your system -- for example, if this is a
web-based app you can try to use Siege or something along those lines).
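A very crude sketch of such a simulation using nothing but psql and the shell (the
client count and the file queries.sql are hypothetical; a real test should replay the
application's actual query mix):

  #!/bin/sh
  # launch 50 concurrent clients, each replaying the same set of queries
  for i in $(seq 1 50); do
      psql -d dbmund -f queries.sql >/dev/null 2>&1 &
  done
  wait
  echo "all clients finished"

where queries.sql holds representative statements such as

  select * from pkardex where pkd_procode = '8959';
  select * from pmdoc where pdc_docto = '744144';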

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Do not renounce what you once believed."

Nov 23 '05 #8
Hello Alvaro,

On 02/06/2004 9:58 AM, Alvaro Herrera wrote:
So fix your query! Also what do you expect to happen if you put
constants in the column list? This certainly looks like a mistake to
me. Anyway you should really format your query better so you can
understand it and see obvious mistakes.


I'm sorry, I copied the wrong piece of the clipboard :(
                                 QUERY PLAN
---------------------------------------------------------------------------
 Seq Scan on hisventa  (cost=0.00..633607.24 rows=4882546 width=149) (actual time=26.647..363517.935 rows=4882546 loops=1)
 Total runtime: 1042927.167 ms
(2 rows)

So you are inserting 4 million rows. That is a lot of I/O, so no
wonder it takes a long time. I'm not sure whether the time is reasonable or
not, though; 4M rows / 1M ms = 4 rows/ms. Not that bad.


Agreed, insert time got better.
I would ask the question the other way round: what is the level of performance
you are looking for with your current workload? By how much is this performance
worse than your expectation?


Since I have not tested the server with the production workload yet,
maybe my perception of performance is not rightly focused; basically my
expectation is that the database must be faster than the current old legacy
FoxPro system.

If you are going to have a big load, you should at least try to code a
simulation with a big load, doing random queries (not just any query, but the
actual queries you'll get from your system -- for example, if this is a
web-based app you can try to use Siege or something along those lines).


I'll take your word for it and will run such tests.

--
Sincerely,
Josué Maldonado.
"He who takes criticism badly has something to hide." Helmut Schmidt,
German politician.


Nov 23 '05 #9
