extreme memory use when loading in a lot of data

I have some historic data that I want to analyze. To do this I set up
postgres on a spare box I picked up for cheap, which just lucked into
having tons of RAM (1.5GB). I set up postgres to use 10000 buffers,
and recompiled the kernel to allow a 2GB data size limit per process.
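
Concretely, that setup amounts to something like this (a sketch from
memory, so treat the exact kernel option syntax as approximate; MAXDSIZ
is the FreeBSD 4.x knob for the per-process data segment limit):

    # postgresql.conf
    shared_buffers = 10000    # 10000 x 8kB pages, roughly 80MB

    # kernel config (then rebuild and reboot)
    options MAXDSIZ="(2048UL*1024*1024)"    # 2GB data size limit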

Since this is historical data, I'm actually merging a couple of dumps
that span the time range. I've dealt with eliminating any conflicting
data (i.e., clashing unique keys), but I'm not 100% sure that the foreign
key constraints are all met. Thus, when loading the data from the
second dump, I am leaving the FK triggers on.

Now, this is where my trouble has begun... On importing row 29,796,801
for the first big table, I get this (after 27 hours!):

pg_restore: ERROR: out of memory
DETAIL: Failed on request of size 44.
CONTEXT: COPY msg_recipients, line 29796801: "75312 3434358 sent"
pg_restore: [archiver (db)] error returned by PQendcopy

(I had originally tried this with a 1GB data size limit on the
process, and it died at around row 15 million.)

The curious thing, watching the process size in another window, is
that it shrunk considerably a few minutes before it croaked:
% ps axuw | grep postg
pgsql 26032 6.0 7.9 2186888 123440 ?? D Thu09AM 122:26.10 postmaster: khera vkmlm [local] COPY (postgres)

Note here that we are using nearly 2GB of virtual memory (I'm sure a
bunch of that is stack and the shared memory segment).

A few minutes later I saw this:

% ps axuw | grep postg
pgsql 26032 10.4 5.5 91840 85624 ?? S Thu09AM 123:17.24 postmaster: khera vkmlm [local] idle (postgres)

Note that the process size is down to 91MB. A few minutes later I got
the out-of-memory error. This is very curious, because I don't expect
the process to release memory back to the OS like that.

There are about 157 million records in this table, three columns wide.

FreeBSD 4.10-PRERELEASE, PostgreSQL 7.4.2

So, is there any way to load this in, or do I need to either break it
into chunks (no small feat) or disable the triggers during load and
hope and pray there are no FK violations?

Is there a leak somewhere, or does it just take that much RAM to load
data into a table?
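
(If disabling triggers is the way to go, I assume pg_restore's
--disable-triggers switch on a data-only restore would do it, run as a
superuser so it can suppress the triggers; something like the line
below, with "dumpfile" standing in for my actual dump:

    pg_restore --data-only --disable-triggers --dbname=vkmlm dumpfile

That would skip the per-row FK checks entirely, which is exactly the
"hope and pray" scenario.)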

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D. Khera Communications, Inc.
Internet: kh***@kciLink.com Rockville, MD +1-301-869-4449 x806
AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/


Nov 23 '05 #1

On Fri, 21 May 2004, Vivek Khera wrote:
> I have some historic data that I want to analyze. To do this I set up
> postgres on a spare box I picked up for cheap, which just lucked into
> having tons of RAM (1.5GB). I set up postgres to use 10000 buffers,
> and recompiled the kernel to allow a 2GB data size limit per process.
>
> Since this is historical data, I'm actually merging a couple of dumps
> that span the time range. I've dealt with eliminating any conflicting
> data (i.e., clashing unique keys), but I'm not 100% sure that the foreign
> key constraints are all met. Thus, when loading the data from the
> second dump, I am leaving the FK triggers on.

I'd suggest dropping the constraints, adding the data, and then adding
the constraints again. If you're using 7.4, the speed will be better
for checking the constraint; and if the constraint is not satisfied,
you'll need to remove the offending rows and recreate the constraint,
but that's still better than having to reimport.
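
Something along these lines; the table, column, and constraint names
here are just guesses for illustration, based on your msg_recipients
table:

    ALTER TABLE msg_recipients DROP CONSTRAINT msg_recipients_msg_fk;
    -- ... bulk load the data here ...
    ALTER TABLE msg_recipients
        ADD CONSTRAINT msg_recipients_msg_fk
        FOREIGN KEY (msg_id) REFERENCES messages (msg_id);

Re-adding the constraint checks the whole table in one operation
instead of queueing a trigger firing per inserted row.
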
> Now, this is where my trouble has begun... On importing row 29,796,801
> for the first big table, I get this (after 27 hours!):


I'd wonder if some large portion of the memory is the deferred trigger
queue, which doesn't yet spill over to disk when it gets too large.


Nov 23 '05 #2
Vivek Khera <kh***@kcilink.com> writes:
> Since this is historical data, I'm actually merging a couple of dumps
> that span the time range. I've dealt with eliminating any conflicting
> data (i.e., clashing unique keys), but I'm not 100% sure that the foreign
> key constraints are all met. Thus, when loading the data from the
> second dump, I am leaving the FK triggers on.


I think you'd be better off to drop the FK constraint, import, and
re-add the constraint. The out-of-memory problem is probably due to the
list of deferred trigger firings (one per tuple, or more if you have
multiple FKs to check). Even if you had enough memory, you'd not have
enough patience for all those retail FK checks to occur after the COPY
finishes.

At least in 7.4, adding an FK constraint on an existing table should
produce a better plan than the retail checks involved in adding rows to
a table with an existing FK constraint.
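
If the ADD CONSTRAINT fails, you can locate the offending rows with an
outer join and clean them up before retrying; a sketch, again with
made-up table and column names:

    SELECT r.*
    FROM msg_recipients r
        LEFT JOIN messages m ON m.msg_id = r.msg_id
    WHERE m.msg_id IS NULL;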

regards, tom lane


Nov 23 '05 #3
>> key constraints are all met. Thus, when loading the data from the
>> second dump, I am leaving the FK triggers on.

> I'd suggest dropping the constraints, adding the data, and then adding
> the constraints again. If you're using 7.4, the speed will be better
> for checking the constraint; and if the constraint is not satisfied,
> you'll need to remove the offending rows and recreate the constraint,
> but that's still better than having to reimport.


Thanks to Stephan and Tom for the same suggestion. I'm also glad to
know it is not a leak, just genuinely heavy memory use. The load is
still taking a LOOONG time because it is just a lowly IDE disk (but it
is UDMA100)... 96 hours so far, and the memory usage is steady.


Nov 23 '05 #4
