Bytes | Software Development & Data Engineering Community

populate table with large csv file

have the table "numbercheck"
Attribute | Type | Modifier
-----------+------------+----------
svcnumber | integer | not null
svcqual | varchar(9) |
svcequip | char(1) |
svctroub | varchar(6) |
svcrate | varchar(4) |
svcclass | char(1) |
trailer | varchar(3) |
Index: numbercheck_pkey

also have a csv file
7057211380,Y,,,3,B
7057216800,Y,,,3,B
7057265038,Y,,,3,B
7057370261,Y,,,3,B
7057374613,Y,,,3,B
7057371832,Y,,,3,B
4166336554,Y,,,3,B
4166336863,Y,,,3,B
7057201148,Y,,,3,B

aside from parsing the csv file through a PHP interface, what is the easiest way
to get that csv data imported into the postgres database. thoughts?

thanks

Dave

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 12 '05 #1
Dave [Hawk-Systems] wrote:
aside from parsing the csv file through a PHP interface, what is the easiest way
to get that csv data imported into the postgres database. thoughts?


see COPY:
http://www.postgresql.org/docs/view....=sql-copy.html

Joe

Nov 12 '05 #2
Dave [Hawk-Systems] wrote:
aside from parsing the csv file through a PHP interface, what is the easiest way
to get that csv data imported into the postgres database. thoughts?


Assuming the CSV file data is well formed, use psql and
the COPY command.

In psql, create the table. Then issue command:

copy <tablename> from 'filename' using delimiters ',';
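A quick way to catch malformed lines before COPY rejects the whole load is to scan the file first. Below is a minimal Python sketch of that pre-check; the inline sample and expected field count are assumptions based on the CSV shown above, and in practice you would open the real file instead:

```python
import csv
import io

# Inline stand-in for the real CSV file (spacing normalized from the sample above).
sample = """7057211380,Y,,,3,B
7057216800,Y,,,3,B
7057265038,Y,,,3,B
"""

EXPECTED_FIELDS = 6  # fields per line in the sample; adjust to match your table

# Collect the line numbers of any rows with the wrong number of fields.
bad_lines = [lineno
             for lineno, row in enumerate(csv.reader(io.StringIO(sample)), start=1)
             if len(row) != EXPECTED_FIELDS]

print(bad_lines)  # → [] when every line is well formed
```

If the list comes back empty, COPY should get through the file without tripping on a short or long row.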
--
P. J. "Josh" Rovero Sonalysts, Inc.
Email: ro****@sonalysts.com www.sonalysts.com 215 Parkway North
Work: (860)326-3671 or 442-4355 Waterford CT 06385
***********************************************************************

Nov 12 '05 #3
On Thu, 2003-09-25 at 11:38, Dave [Hawk-Systems] wrote:
have the table "numbercheck"
Attribute | Type | Modifier
-----------+------------+----------
svcnumber | integer | not null
svcqual | varchar(9) |
svcequip | char(1) |
svctroub | varchar(6) |
svcrate | varchar(4) |
svcclass | char(1) |
trailer | varchar(3) |
Index: numbercheck_pkey

also have a csv file
7057211380,Y,,,3,B
7057216800,Y,,,3,B
7057265038,Y,,,3,B
7057370261,Y,,,3,B
7057374613,Y,,,3,B
7057371832,Y,,,3,B
4166336554,Y,,,3,B
4166336863,Y,,,3,B
7057201148,Y,,,3,B

aside from parsing the csv file through a PHP interface, what is the easiest way
to get that csv data imported into the postgres database. thoughts?


No matter what you do, it's going to barf: svcnumber is a 32-bit
integer, and 7,057,211,380 is significantly out of range.

Once you change svcnumber to bigint, the COPY command will easily
suck in the csv file.
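The overflow is easy to verify with quick arithmetic; a small Python sketch of the range check, using the first value from the CSV sample above:

```python
# Bounds of PostgreSQL's integer (32-bit) and bigint (64-bit) types.
INT_MAX = 2**31 - 1       # 2147483647
BIGINT_MAX = 2**63 - 1

svcnumber = 7057211380    # first value from the CSV sample

print(svcnumber > INT_MAX)      # → True: too large for integer, COPY would fail
print(svcnumber <= BIGINT_MAX)  # → True: fits easily in bigint
```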

--
-----------------------------------------------------------------
Ron Johnson, Jr. ro***********@cox.net
Jefferson, LA USA

"Python is executable pseudocode; Perl is executable line noise"

Nov 12 '05 #4
>> aside from parsing the csv file through a PHP interface, what is the
easiest way
to get that csv data imported into the postgres database. thoughts?


Assuming the CSV file data is well formed, use psql and
the COPY command.

In psql, create the table. Then issue command:

copy <tablename> from 'filename' using delimiters ',';


perfect solution that was overlooked.

Unfortunately, processing the 143 MB file, which would result in a database size of
approx 500 MB, takes an eternity. As luck would have it, we can get away with just
dropping to an exec and doing a cat/grep for any data we need... takes 2-3
seconds.
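The exec/cat/grep lookup described above amounts to a linear scan of the flat file; a rough Python equivalent (the file path and lookup key are placeholders for illustration):

```python
def lookup(path, svcnumber):
    """Scan a CSV flat file for the line whose first field is svcnumber.

    Same spirit as `grep '^7057211380,' file.csv`: a plain linear scan,
    no database involved.
    """
    prefix = f"{svcnumber},"
    with open(path) as fh:
        for line in fh:
            if line.startswith(prefix):
                return line.rstrip("\n").split(",")
    return None  # not found
```

For occasional lookups this avoids the load entirely, at the cost of rescanning the 143 MB file on every query.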

the copy command is definitely a keeper, as I am now looking at replacing code
elsewhere with a simpler model using that.

Thanks

Dave


Nov 12 '05 #5
On Fri, 2003-09-26 at 06:58, Dave [Hawk-Systems] wrote:
[snip]
Unfortunately processing the 143mb file which would result in a database size of
approx 500mb takes an eternity. As luck would have it we can get away with just


Something's not right, then. I loaded 30GB in about 8 hours, on
a slow system, with non-optimized IO. Did you drop the indexes
first?
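The "drop the indexes first" advice is about load order: maintaining an index row by row during a bulk load is more work than building it once afterwards. A small sketch of the two orders, using SQLite as a stand-in for Postgres (table and index names invented for illustration):

```python
import sqlite3

def bulk_load(rows, index_first):
    """Load rows into a fresh in-memory table, creating the index either
    before the load (slow order) or after it (recommended order)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE numbercheck (svcnumber INTEGER, svcqual TEXT)")
    if index_first:
        con.execute("CREATE INDEX svcnumber_idx ON numbercheck (svcnumber)")
    con.executemany("INSERT INTO numbercheck VALUES (?, ?)", rows)
    if not index_first:
        con.execute("CREATE INDEX svcnumber_idx ON numbercheck (svcnumber)")
    n = con.execute("SELECT count(*) FROM numbercheck").fetchone()[0]
    con.close()
    return n

rows = [(i, "Y") for i in range(100_000)]
print(bulk_load(rows, index_first=False))  # → 100000
```

Timing the two variants (e.g. with time.perf_counter) should show the index-after order winning, with the gap growing as the row count does.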

--
-----------------------------------------------------------------
Ron Johnson, Jr. ro***********@cox.net
Jefferson, LA USA

"As the night fall does not come at once, neither does
oppression. It is in such twilight that we must all be aware of
change in the air - however slight - lest we become unwitting
victims of the darkness."
Justice William O. Douglas

Nov 12 '05 #6

We are doing some performance testing among various databases (Oracle, MySQL
and Postgres).

One of the queries is showing Postgres lagging quite a bit:

SELECT count(*)
FROM commercial_entity, country, user_account, address_list
LEFT JOIN state_province ON address_list.state_province_id =
  state_province.state_province_id
LEFT JOIN contact_info ON address_list.contact_info_id =
  contact_info.contact_info_id
WHERE address_list.address_type_id = 101
AND commercial_entity.commercial_entity_id =
  address_list.commercial_entity_id
AND address_list.country_id = country.country_id
AND commercial_entity.user_account_id = user_account.user_account_id
AND user_account.user_role_id IN (101, 101);

I ran a "vacuum analyze" after realizing that I had loaded all the data into
the database without redoing the statistics; the query jumped from 19
seconds to 41 seconds _after_ the analyze.

I'd also like to make sure my query is performing correctly - I want the
count of records where the commercial_entity matches user_account,
address_list, and country, with left outer joins from address_list to
state_province and contact_info.

Finally, I read some posts on the shared_buffers; they stated that the
shared_buffers should be set to 1/4 to 1/5 of total memory available. Is
that correct? I give the MySQL/InnoDB buffers about 70% of the 2 gig on the
machine.
Here's the explain (I'm not too familiar with reading a Postgres
explain...):

----------------------------------------------------------------------------
Aggregate (cost=52951.09..52951.09 rows=1 width=116)
  -> Merge Join (cost=52941.61..52950.83 rows=105 width=116)
       Merge Cond: ("outer".country_id = "inner".country_id)
       -> Index Scan using country_pkey on country (cost=0.00..7.54
          rows=231 width=11)
       -> Sort (cost=52941.61..52941.88 rows=105 width=105)
            Sort Key: address_list.country_id
            -> Merge Join (cost=52729.54..52938.07 rows=105 width=105)
                 Merge Cond: ("outer".commercial_entity_id =
                   "inner".commercial_entity_id)
                 -> Sort (cost=8792.01..8792.52 rows=201 width=36)
                      Sort Key: commercial_entity.commercial_entity_id
                      -> Nested Loop (cost=0.00..8784.31 rows=201 width=36)
                           -> Index Scan using usr_acc_usr_role_id_i
                              on user_account (cost=0.00..2403.08 rows=1401 width=12)
                                Index Cond: (user_role_id = 101::numeric)
                           -> Index Scan using comm_ent_usr_acc_id_i
                              on commercial_entity (cost=0.00..4.54 rows=1 width=24)
                                Index Cond:
                                  (commercial_entity.user_account_id = "outer".user_account_id)
                 -> Sort (cost=43937.53..44173.84 rows=94526 width=69)
                      Sort Key: address_list.commercial_entity_id
                      -> Merge Join (cost=29019.03..32585.73 rows=94526 width=69)
                           Merge Cond: ("outer".contact_info_id =
                             "inner".contact_info_id)
                           -> Index Scan using contact_info_pkey on
                              contact_info (cost=0.00..3366.76 rows=56435 width=12)
                           -> Sort (cost=29019.03..29255.34 rows=94526 width=57)
                                Sort Key: address_list.contact_info_id
                                -> Merge Join
                                   (cost=16930.18..18354.55 rows=94526 width=57)
                                     Merge Cond:
                                       ("outer".state_province_id = "inner".state_province_id)
                                     -> Index Scan using
                                        state_province_pkey on state_province (cost=0.00..3.81 rows=67 width=11)
                                     -> Sort
                                        (cost=16930.18..17166.50 rows=94526 width=46)
                                          Sort Key: address_list.state_province_id
                                          -> Seq Scan on
                                             address_list (cost=0.00..6882.52 rows=94526 width=46)
                                               Filter: (address_type_id = 101::numeric)

What's the "Sort (cost...)"?

I noticed that joining the address_list to country was slow; there was no
index on just country_id; there were composite indexes on multiple columns,
so I added one and did a vacuum analyze on the table, and got:

Aggregate (cost=54115.74..54115.74 rows=1 width=116)
  -> Merge Join (cost=54105.91..54115.46 rows=109 width=116)
       Merge Cond: ("outer".country_id = "inner".country_id)
       -> Index Scan using country_pkey on country (cost=0.00..7.54
          rows=231 width=11)
       -> Sort (cost=54105.91..54106.19 rows=110 width=105)
            Sort Key: address_list.country_id
            -> Merge Join (cost=53884.34..54102.18 rows=110 width=105)
                 Merge Cond: ("outer".commercial_entity_id =
                   "inner".commercial_entity_id)
                 -> Sort (cost=8792.01..8792.52 rows=201 width=36)
                      Sort Key: commercial_entity.commercial_entity_id
                      -> Nested Loop (cost=0.00..8784.31 rows=201 width=36)
                           -> Index Scan using usr_acc_usr_role_id_i
                              on user_account (cost=0.00..2403.08 rows=1401 width=12)
                                Index Cond: (user_role_id = 101::numeric)
                           -> Index Scan using comm_ent_usr_acc_id_i
                              on commercial_entity (cost=0.00..4.54 rows=1 width=24)
                                Index Cond:
                                  (commercial_entity.user_account_id = "outer".user_account_id)
                 -> Sort (cost=45092.32..45335.37 rows=97221 width=69)
                      Sort Key: address_list.commercial_entity_id
                      -> Merge Join (cost=29770.81..33338.09 rows=97221 width=69)
                           Merge Cond: ("outer".contact_info_id =
                             "inner".contact_info_id)
                           -> Index Scan using contact_info_pkey on
                              contact_info (cost=0.00..3366.76 rows=56435 width=12)
                           -> Sort (cost=29770.81..30013.86 rows=97221 width=57)
                                Sort Key: address_list.contact_info_id
                                -> Merge Join
                                   (cost=17271.79..18731.55 rows=97221 width=57)
                                     Merge Cond:
                                       ("outer".state_province_id = "inner".state_province_id)
                                     -> Index Scan using
                                        state_province_pkey on state_province (cost=0.00..3.81 rows=67 width=11)
                                     -> Sort
                                        (cost=17271.79..17514.84 rows=97221 width=46)
                                          Sort Key: address_list.state_province_id
                                          -> Seq Scan on
                                             address_list (cost=0.00..6882.52 rows=97221 width=46)
                                               Filter: (address_type_id = 101::numeric)

No difference. Note that all the keys that are used in the joins are
numeric(10)'s, so there shouldn't be any cast-issues.

When you create a primary key on a table, is an index created? (I seem to
remember a message going by stating that an index would be added.)

For comparison, our production Oracle database (running on nearly identical
hardware - the Postgres machine has IDE-RAID-5 and the Oracle machine has
RAID mirroring) takes between 1 and 2 seconds.

I've got one last question, and I really hope responses don't get
sidetracked by it; I see a lot of negative comments towards MySQL, many of
them stating that it's a database layer on top of the file system. Can
someone explain why Postgres is better than MySQL 4.0.14 using InnoDB?
MySQL, on the above query, with one less index (on address_list.country)
takes 0.20 seconds.

David.

Nov 12 '05 #7
On Sat, 27 Sep 2003, David Griffiths wrote:

We are doing some performance testing among various databases (Oracle, MySQL
and Postgres).

One of the queries is showing Postgres lagging quite a bit:

SELECT count(*)
FROM commercial_entity, country, user_account, address_list
LEFT JOIN state_province ON address_list.state_province_id =
  state_province.state_province_id
LEFT JOIN contact_info ON address_list.contact_info_id =
  contact_info.contact_info_id
WHERE address_list.address_type_id = 101
AND commercial_entity.commercial_entity_id =
  address_list.commercial_entity_id
AND address_list.country_id = country.country_id
AND commercial_entity.user_account_id = user_account.user_account_id
AND user_account.user_role_id IN (101, 101);


I guess that this question has been discussed very often - but I cannot
remember why exactly. Is there a pointer to a technical explanation? Has
it something to do with MVCC? But isn't it one of MVCC's benefits that we
can make a consistent online backup without archiving redo logs (Oracle
can't, DB2 can)? Is DB2 slower than Oracle in such cases (count(*)) as
well?

Workaround:
We can sometimes fake a bit to avoid such costly queries and set up a
trigger that calls a function that increases a counter in a separate
counter table. Then we are lightning-fast.
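The trigger-plus-counter-table workaround can be sketched end to end with SQLite standing in for PostgreSQL (the trigger syntax differs from PL/pgSQL trigger functions, and all table and trigger names here are invented, but the idea carries over):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE row_counter (n INTEGER NOT NULL);
INSERT INTO row_counter VALUES (0);

-- keep the counter in sync on every insert and delete
CREATE TRIGGER items_ins AFTER INSERT ON items
BEGIN UPDATE row_counter SET n = n + 1; END;
CREATE TRIGGER items_del AFTER DELETE ON items
BEGIN UPDATE row_counter SET n = n - 1; END;
""")

con.executemany("INSERT INTO items (name) VALUES (?)", [("a",), ("b",), ("c",)])
con.execute("DELETE FROM items WHERE name = 'a'")

# O(1) "count(*)": read the maintained counter instead of scanning the table
print(con.execute("SELECT n FROM row_counter").fetchone()[0])  # → 2
```

The write path pays a small per-row cost to keep the counter current, which is the trade being described above.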

But many users complain about PostgreSQL's poor count(*) performance;
that's true and can be critical when someone wants to replace another
database product with PostgreSQL.


Nov 12 '05 #8
Holger Marzen <ho****@marzen.de> writes:
But many users compain about PostgreSQL's poor count(*) performance,


I don't think that's relevant here. Some other DB's have shortcuts for
determining the total number of rows in a single table, that is they can
do "SELECT COUNT(*) FROM a_table" quickly, but David's query is messy
enough that I can't believe anyone can actually do it without forming
the join result.

What I'd ask for is EXPLAIN ANALYZE output. Usually, if a complex query
is slower than it should be, it's because the planner is picking a bad
plan. So you need to look at how its estimates diverge from reality.
But plain EXPLAIN doesn't show the reality, only the estimates ...

David, could we see EXPLAIN ANALYZE for the query, and also the table
schemas (psql \d displays would do)? Also, please take it to
pgsql-performance, it's not really on-topic for pgsql-general.

regards, tom lane


Nov 12 '05 #9
I guess that this question has been discussed very often - but I cannot
remember why exactly. Is there a pointer to a technical explanation? Has
it something to do with MVCC? But isn't it one of MVCC's benefits that we
can make a consistent online backup without archiving redo logs (Oracle
can't, DB2 can)? Is DB2 slower than Oracle in such cases (count(*)) as
well?

Workaround:
We can sometimes fake a bit to avoid such costly queries and set up a
trigger that calls a function that increases a counter in a separate
counter table. Then we are lightning-fast.

But many users complain about PostgreSQL's poor count(*) performance;
that's true and can be critical when someone wants to replace another
database product with PostgreSQL.


This is but one of many tests we're doing. The count(*) performance is not
the deciding factor. This query was pulled from our production system, and
I've extracted the exact tables and data from the production system to test.

MySQL with MyISAM does in fact cheat on the count(*). InnoDB does not,
however. The "explain" indicates that it's doing the work, and analyzing the
tables dropped the cost of the query from .35 seconds to .20 seconds.

Here's the same query, but selecting data (to test the database's ability to
find a single row quickly):

SELECT current_timestamp;
SELECT company_name, address_1, address_2, address_3, city,
  address_list.state_province_id, state_province_short_desc, country_desc,
  zip_code, address_list.country_id,
  contact_info.email, commercial_entity.user_account_id, phone_num_1,
  phone_num_fax, website, boats_website
FROM commercial_entity, country, user_account,
address_list LEFT JOIN state_province ON address_list.state_province_id =
  state_province.state_province_id
LEFT JOIN contact_info ON address_list.contact_info_id =
  contact_info.contact_info_id
WHERE address_list.address_type_id = 101
AND commercial_entity.commercial_entity_id = 225528
AND commercial_entity.commercial_entity_id =
  address_list.commercial_entity_id
AND address_list.country_id = country.country_id
AND commercial_entity.user_account_id = user_account.user_account_id
AND user_account.user_role_id IN (101, 101);
SELECT current_timestamp;

Postgres takes about 33 seconds to get the row back.

Here's the "EXPLAIN":

Nested Loop (cost=0.00..64570.33 rows=1 width=385)
  -> Nested Loop (cost=0.00..64567.30 rows=1 width=361)
       -> Nested Loop (cost=0.00..64563.97 rows=1 width=349)
            Join Filter: ("outer".commercial_entity_id =
              "inner".commercial_entity_id)
            -> Index Scan using commercial_entity_pkey on
               commercial_entity (cost=0.00..5.05 rows=1 width=94)
                 Index Cond: (commercial_entity_id = 225528::numeric)
            -> Materialize (cost=63343.66..63343.66 rows=97221 width=255)
                 -> Merge Join (cost=0.00..63343.66 rows=97221 width=255)
                      Merge Cond: ("outer".contact_info_id =
                        "inner".contact_info_id)
                      -> Nested Loop (cost=0.00..830457.52 rows=97221 width=222)
                           Join Filter: ("outer".state_province_id =
                             "inner".state_province_id)
                           -> Index Scan using addr_list_ci_id_i on
                              address_list (cost=0.00..586676.65 rows=97221 width=205)
                                Filter: (address_type_id = 101::numeric)
                           -> Seq Scan on state_province
                              (cost=0.00..1.67 rows=67 width=17)
                      -> Index Scan using contact_info_pkey on
                         contact_info (cost=0.00..3366.76 rows=56435 width=33)
       -> Index Scan using user_account_pkey on user_account
          (cost=0.00..3.32 rows=1 width=12)
            Index Cond: ("outer".user_account_id =
              user_account.user_account_id)
            Filter: (user_role_id = 101::numeric)
  -> Index Scan using country_pkey on country (cost=0.00..3.01 rows=1
     width=24)
       Index Cond: ("outer".country_id = country.country_id)
(20 rows)

David.


Nov 12 '05 #10
