Bulk import data into table.

----------------------------------------
BACKGROUND:

My company (www.glass.biz) keeps information about products in a
network-shared (Samba) filesystem. Each product has a directory
assigned to it, where our engineers save information for the ERP
system. This information is gathered by walking the whole directory
structure and reading text files. Our engineers work all day
producing more and more information about new products (individual orders).

----------------------------------------
REAL PROBLEM:

Now I want to make the ERP system faster and more reliable by using
PostgreSQL. But one question arises that I can't answer myself, nor
by googling around.

At regular intervals (for example every day, hour, or minute) I will
have to synchronize my database with the information stored in the text
files. So how do I tell Postgres to:

1. insert a row if data about a product doesn't already exist in the table

2. delete a row if data about a product was deleted from the text files

3. update a row if the data has changed
----------------------------------------
MY SOLUTIONS:

1. Start transaction. For each entry in text file do:
-- use SELECT to check if entry exists
-- if entry exists update values
-- if entry doesn't exist insert values
Commit transaction.

In this solution rows that should be deleted are not deleted,
and this is wrong. Furthermore, I do a SELECT for each entry,
and this could take very long, so I think this is not a good
solution.
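For reference, a minimal sketch of one iteration of this approach,
assuming a hypothetical table products(product_id text PRIMARY KEY,
data text):

    BEGIN;

    -- for each entry read from the text files:
    SELECT 1 FROM products WHERE product_id = 'P-1001';

    -- if the SELECT returned a row, update it ...
    UPDATE products SET data = 'new contents'
        WHERE product_id = 'P-1001';

    -- ... otherwise insert it (the branching happens in the
    -- client program, not in SQL)
    INSERT INTO products (product_id, data)
        VALUES ('P-1001', 'new contents');

    COMMIT;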

2. Start transaction. Delete all rows from table.
Insert rows from text file. Commit transaction.

In this solution everything works okay, but I think it is
strange to replace the whole table by encapsulating it in a
transaction. From the documentation I know that Postgres will
keep old rows until a VACUUM is done.
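A minimal sketch of this approach, assuming the same hypothetical
products table and a tab-separated dump of the text-file data at a
hypothetical path:

    BEGIN;

    -- throw away the current contents; concurrent readers keep
    -- seeing the old rows until COMMIT
    DELETE FROM products;

    -- reload everything from the flat file; COPY is much faster
    -- than row-by-row INSERTs for bulk loads
    COPY products (product_id, data) FROM '/tmp/products.tsv';

    COMMIT;

(COPY ... FROM reads a file on the database server and needs elevated
rights; from psql, the client-side \copy command does the same job
without them.)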
----------------------------------------

I would be very grateful if someone experienced would
advise a reasonable solution.

Leszek Dubiel
Jul 19 '05 #1
The world rejoiced as le****@dubiel.pl (Leszek Dubiel) wrote:
> 2. Start transaction. Delete all rows from table.
> Insert rows from text file. Commit transaction.
>
> In this solution everything works okay, but I think it is
> strange to replace the whole table by encapsulating it in a
> transaction. From the documentation I know that Postgres will
> keep old rows until a VACUUM is done.


Option #2 is certainly the simplest way to do this.

In #1, you missed one of the steps:

-- if an entry in the database does not exist in the file, then
delete it from the database

which essentially isn't something you can do "for each entry in the
text file."

It's _way_ more complex to do it row by row, particularly when part of
the logic isn't row-by-row.
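
One set-based way to get all three operations, including the delete, is
to COPY the file into a scratch table first and then diff the two
tables; a sketch, with all names hypothetical:

    BEGIN;

    -- load the current file contents into a scratch table
    CREATE TEMP TABLE staging (LIKE products) ON COMMIT DROP;
    COPY staging (product_id, data) FROM '/tmp/products.tsv';

    -- insert rows that are in the file but not in the table
    INSERT INTO products (product_id, data)
    SELECT s.product_id, s.data FROM staging s
    WHERE NOT EXISTS
        (SELECT 1 FROM products p WHERE p.product_id = s.product_id);

    -- delete rows that are no longer in the file
    DELETE FROM products p
    WHERE NOT EXISTS
        (SELECT 1 FROM staging s WHERE s.product_id = p.product_id);

    -- update rows whose contents changed
    UPDATE products p SET data = s.data
    FROM staging s
    WHERE s.product_id = p.product_id
      AND s.data IS DISTINCT FROM p.data;

    COMMIT;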

You'll want to VACUUM the table in question a little while after each
time it gets replaced, by the way.
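Something along these lines, run outside the reload transaction
(VACUUM can't run inside a transaction block):

    -- reclaim the dead row versions left behind by the bulk DELETE
    -- and refresh the planner statistics for the new contents
    VACUUM ANALYZE products;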
--
let name="cbbrowne" and tld="cbbrowne.com" in String.concat "@" [name;tld];;
http://www.ntlug.org/~cbbrowne/linuxdistributions.html
All ITS machines now have hardware for a new machine instruction --
XOI Execute Operator Immediate.
Please update your programs.
Jul 19 '05 #2
> > 2. Start transaction. Delete all rows from table.
> > Insert rows from text file. Commit transaction.
>
> Option #2 is certainly the simplest way to do this.


Thank you very much. I thought that it was too simple
to be true.

Leszek Dubiel
Jul 19 '05 #3
Clinging to sanity, le****@dubiel.pl (Leszek Dubiel) mumbled into her beard:
> > > 2. Start transaction. Delete all rows from table.
> > > Insert rows from text file. Commit transaction.
> >
> > Option #2 is certainly the simplest way to do this.
>
> Thank you very much. I thought that it was too simple
> to be true.


If the table you keep replacing gets Real Big, this approach will get
steadily Less Nice, as you'll be replacing a whole lot of data on a
regular basis. So that would be bad.

But if, as you say, the data in the data file is always the new,
"authoritative" source of data, then while there may be clever ways to
diminish the amount of work needed to load it, regularly replacing the
NonAuthoritative Data in the database with the Authoritative Data in
the file is surely appropriate.

"Replace all the data" is definitely a "Exercise Brute Force" sort of
method.

When Brute Force works, it works. Sometimes we get query plans that
involve Seq Scans, which are the query equivalent of Brute Force,
because that is, like it or not, the best way to get the answer.
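
You can watch the planner pick that plan directly; for an unfiltered
read of the hypothetical products table, EXPLAIN shows something like
this (the cost numbers will vary):

    EXPLAIN SELECT * FROM products;
    --  Seq Scan on products  (cost=0.00..22.70 rows=1270 width=68)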
--
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://www3.sympatico.ca/cbbrowne/lisp.html
Computers in the future may weigh no more than 1.5 tons. -Popular
Mechanics, forecasting the relentless march of science, 1949
Jul 19 '05 #4
Christopher Browne wrote:
> Clinging to sanity, le****@dubiel.pl (Leszek Dubiel) mumbled into her beard:
> > > > 2. Start transaction. Delete all rows from table.
> > > > Insert rows from text file. Commit transaction.
> > >
> > > Option #2 is certainly the simplest way to do this.
> >
> > Thank you very much. I thought that it was too simple
> > to be true.
>
> If the table you keep replacing gets Real Big, this approach will get
> steadily Less Nice, as you'll be replacing a whole lot of data on a
> regular basis. So that would be bad.
>
> But if, as you say, the data in the data file is always the new,
> "authoritative" source of data, then while there may be clever ways to
> diminish the amount of work needed to load it, regularly replacing the
> NonAuthoritative Data in the database with the Authoritative Data in
> the file is surely appropriate.


You may get away with using the modification times on those files and
keeping track of them in the database. It would be faster, but more
sensitive to errors (a filesystem that reports times incorrectly, for
example), and you would still need to traverse the entire filesystem to
find the files that need updating.
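
A sketch of that bookkeeping, with all names hypothetical: keep one row
per file, have the directory walker load the (path, mtime) pairs it
finds into a scratch table, and re-read only the files whose recorded
time is stale:

    -- one bookkeeping row per product file
    CREATE TABLE file_state (
        path  text PRIMARY KEY,
        mtime timestamp NOT NULL
    );

    -- files that are new or changed since the last pass,
    -- assuming the walker filled a temp table named "walked"
    SELECT w.path
    FROM walked w
    LEFT JOIN file_state f ON f.path = w.path
    WHERE f.path IS NULL OR f.mtime < w.mtime;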

If your OS supports it, you could write something that notifies the
database when a file changes. Something like 'kernel queues' could help
you in that case. You'll probably have to write some library code.

Another solution would be to use a filesystem that itself is in a
database. I know Oracle used to have something like that (I never used
it, though), but there may be open source alternatives. I would love to
see one for PostgreSQL :D

You said "text files"... Ever thought about CVS (or alternatives)?

And then there is the brute force approach mentioned before, of course.
Jul 19 '05 #5
