Date format for bulk copy

I have a large amount of data that I copy in and out of Sybase very
often. Now I also want to copy this data in and out of postgres. I
have an existing script that creates the entire database(s) from scratch
in Sybase and then uses the Sybase bulk copy tool "bcp" to copy the data
in.

I already did a brute force port of this script to postgres once, but
I'm trying to do it more elegantly now that I know what issues I'm going
to run into. One of them is date formats in the bcp files. Sybase bulk
copies the date fields out in this format:

Mar 4 1973 10:28:00:000AM

Postgresql's COPY (or psql \copy) doesn't like that format. In
particular, it doesn't like the millisecond field at the end. If I
understand the docs correctly, postgres wants the millisecond field to
be preceded by a decimal point instead of a colon. In my brute force
port, I just bulk copied the date fields into temporary tables and then
did a to_timestamp(field, 'Mon DD YYYY HH:MI:SS:MSAM').

That worked, but required a lot of additional logic in my script to
handle the temp tables and conversions. I'd hate to have to keep all
that overhead in there to basically handle a conversion of a colon to a
decimal point.
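
For concreteness, the brute-force pass looked roughly like this -- a
minimal sketch, with table and column names invented for illustration
(delimiter and other COPY options omitted):

    -- Load the Sybase-formatted dates as plain text...
    CREATE TEMP TABLE events_stage (id integer, event_date text);
    \copy events_stage FROM 'events.bcp'

    -- ...then convert while copying into the real table.
    INSERT INTO events (id, event_date)
    SELECT id, to_timestamp(event_date, 'Mon DD YYYY HH:MI:SS:MSAM')
    FROM events_stage;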

So my questions are these:

0) I thought of creating a user-defined data type for this, but it seems
like overkill, especially if I'd have to provide all kinds of helper
functions for things like date incrementation or comparison or
whatever. Am I off track?
1) Are there any tools out there that allow for specifying the field
format of a COPY?
2) If not, is it reasonable or unreasonable to modify the postgresql
source (I'm running Beta 3) to handle a colon as a millisecond
delimiter? (If so, where do I look?)
3) If I did create such a patch, would the postgresql project accept it
into the tree?

I have a similarish problem with another field type. In Sybase it's a
binary format. In postgres it is a binary format (bytea). But Sybase
bcps the data out in ASCII. Sybase recognizes when the target is a
binary field and auto-converts the ASCII back to binary. Postgres
doesn't. Again, I created a temporary table and did a decode(field,
'hex') to the real table. It seems reasonable to expect to be able to
bulk copy ASCII-encoded binary values into binary fields. Probably this
field is best described by a user-defined type....?
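
The staging workaround for the binary column was analogous (again with
invented names):

    -- Load the hex text, then decode it into the real bytea column.
    CREATE TEMP TABLE blobs_stage (id integer, payload text);
    \copy blobs_stage FROM 'blobs.bcp'

    INSERT INTO blobs (id, payload)
    SELECT id, decode(payload, 'hex')
    FROM blobs_stage;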


Nov 23 '05 #1

David Rysdam <dr*****@ll.mit.edu> writes:
> In my brute force port, I just bulk copied the date fields into temporary
> tables and then did a to_timestamp(field, 'Mon DD YYYY HH:MI:SS:MSAM').
> Again, I created a temporary table and did a decode(field, 'hex') to the
> real table.


This is the standard approach. You're rather lucky these are the only
data representation changes you've had to do so far. I fear you'll run into
more and more complex changes over time and trying to avoid the temporary
table will get harder and harder.

If it were me I would consider processing the files in perl. It should be
pretty easy to do both of these modifications very quickly.
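
As a sketch of that idea -- assuming the timestamps always look like the
sample above, with a two-digit hour and a three-digit millisecond field --
a one-line Perl filter might be:

    # Hypothetical one-liner: turn '10:28:00:000AM' into '10:28:00.000AM'
    # before the file ever reaches COPY (filename is made up).
    perl -pe 's/(\d{2}:\d{2}:\d{2}):(\d{3})([AP]M)/$1.$2$3/g' events.bcp > events.fixed
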
If you really want to go with custom C code then you might be able to just
grab the byteain/byteaout functions from src/backend/utils/adt/varlena.c into a
separate module and create new functions with modified names. Load it with
CREATE FUNCTION byteain ... AS 'my_bytea_funcs.so' 'my_byteain';

Or maybe create the function as my_byteain in postgres and then update the
catalog entries somehow. I'm not sure how to do that but it shouldn't be too
hard. And it might make it easier to do the substitution for the data load and
then undo the change afterwards.
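
Spelled out, that CREATE FUNCTION would look something like this -- a
sketch only, with the shared module and function names taken from the
example above (they would have to exist as compiled C code):

    -- Register a C-language function from a shared library. A type
    -- input function takes a cstring and returns the type.
    CREATE FUNCTION my_byteain(cstring)
        RETURNS bytea
        AS 'my_bytea_funcs.so', 'my_byteain'
        LANGUAGE C STRICT;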

Doing the same for timestamp is a bit trickier but you could copy
ParseDateTime from datetime.c as a static function for your module.

Be careful though, test this out thoroughly on a test database. I'm not sure
of all the impacts of altering the in/out functions for data types. I expect
it would break pg_dump, for example. And I would worry about the statistics
tables too.

--
greg

Nov 23 '05 #2
On Wed, Oct 13, 2004 at 10:06:58AM -0400, David Rysdam wrote:
> Sybase bulk copies the date fields out in this format:
>
> Mar 4 1973 10:28:00:000AM
>
> Postgresql's COPY (or psql \copy) doesn't like that format.
You could filter the data through a script that reformats certain
fields, then feed the reformatted data to PostgreSQL. This is
usually a trivial task for Perl, awk, sed, or the like.
> I have a similarish problem with another field type. In Sybase it's a
> binary format. In postgres it is a binary format (bytea). But Sybase
> bcps the data out in ASCII. Sybase recognizes when the target is a
> binary field and auto-converts the ASCII back to binary. Postgres
> doesn't. Again, I created a temporary table and did a decode(field,
> 'hex') to the real table.


Sounds like Sybase is dumping in hex, whereas PostgreSQL expects
octal. If you can't change the dump format, then again, filtering
the data through a script might work.
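
For reference, the octal-escaped form PostgreSQL expects looks like
this -- the example bytes 0xDE 0xAD 0xBE 0xEF, which a hex dump would
write as "deadbeef":

    -- Each \\NNN escape is one byte, given in octal.
    SELECT '\\336\\255\\276\\357'::bytea;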

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/


Nov 23 '05 #3
Michael Fuhr wrote:
> On Wed, Oct 13, 2004 at 10:06:58AM -0400, David Rysdam wrote:
>> Sybase bulk copies the date fields out in this format:
>>
>> Mar 4 1973 10:28:00:000AM
>>
>> Postgresql's COPY (or psql \copy) doesn't like that format.
>
> You could filter the data through a script that reformats certain
> fields, then feed the reformatted data to PostgreSQL. This is
> usually a trivial task for Perl, awk, sed, or the like.

Right, I *can* do this. But then I have to build knowledge into that
script so it can find each of these date fields (there's like 20 of them
across 10 different files) and then update that knowledge each time it
changes. I'm still leaning towards just making postgres accept a ':'
delimiter for milliseconds. Also, how much would a secondary script
slow down the bulk copy, if any?

>> I have a similarish problem with another field type. In Sybase it's a
>> binary format. In postgres it is a binary format (bytea). But Sybase
>> bcps the data out in ASCII. Sybase recognizes when the target is a
>> binary field and auto-converts the ASCII back to binary. Postgres
>> doesn't. Again, I created a temporary table and did a decode(field,
>> 'hex') to the real table.
>
> Sounds like Sybase is dumping in hex, whereas PostgreSQL expects
> octal. If you can't change the dump format, then again, filtering
> the data through a script might work.

Oh, so I can load binary data into PG if it's ASCII-encoded octal? Why
not the user-defined type with associated user-defined input function?


Nov 23 '05 #4
Greg Stark wrote:
David Rysdam <dr*****@ll.mit.edu> writes:
> In my brute force port, I just bulk copied the date fields into temporary
> tables and then did a to_timestamp(field, 'Mon DD YYYY HH:MI:SS:MSAM').
>
> Again, I created a temporary table and did a decode(field, 'hex') to the
> real table.


> This is the standard approach. You're rather lucky these are the only
> data representation changes you've had to do so far. I fear you'll run into
> more and more complex changes over time and trying to avoid the temporary
> table will get harder and harder.

No, I think I'm OK there. These are programmatically-generated values
and I've already been through them all once. Just the millisecond issue
and the hex binary issue AFAIK.
> If it were me I would consider processing the files in perl. It should be
> pretty easy to do both of these modifications very quickly.
Very quick and easy to do one time. A little trickier to handle in an
elegant, maintainable way for the dozens of data reloads I do every
month for GBs of data onto two different server types.
> If you really want to go with custom C code then you might be able to just
> grab the byteain/byteaout functions from src/backend/utils/adt/varlena.c into a
> separate module and create new functions with modified names. Load it with
> CREATE FUNCTION byteain ... AS 'my_bytea_funcs.so' 'my_byteain';

> Or maybe create the function as my_byteain in postgres and then update the
> catalog entries somehow. I'm not sure how to do that but it shouldn't be too
> hard. And it might make it easier to do the substitution for the data load and
> then undo the change afterwards.
Why not create a type and then define the load function to be the
equivalent of "decode('hex')"?
> Doing the same for timestamp is a bit trickier but you could copy
> ParseDateTime from datetime.c as a static function for your module.

> Be careful though, test this out thoroughly on a test database. I'm not sure
> of all the impacts of altering the in/out functions for data types. I expect
> it would break pg_dump, for example. And I would worry about the statistics
> tables too.

This is kind of a hybrid of my suggestions and the problems are a hybrid
as well. :)

1) Just change the timestamp type so that it allows a ':' delimiter for
milliseconds. Potential problems: Other parts of the code won't expect
it. People don't want that.

2) Create a new type. Potential problem: Things like date ranges
probably wouldn't work anymore, since the server wouldn't know it's a
date now.


Nov 23 '05 #5
On Wed, Oct 13, 2004 at 01:32:01PM -0400, David Rysdam wrote:
> Michael Fuhr wrote:
>> You could filter the data through a script that reformats certain
>> fields, then feed the reformatted data to PostgreSQL. This is
>> usually a trivial task for Perl, awk, sed, or the like.
>
> Right, I *can* do this. But then I have to build knowledge into that
> script so it can find each of these date fields (there's like 20 of them
> across 10 different files) and then update that knowledge each time it
> changes.


In your case that's a reasonable argument against filtering the
data with a script. Using a regular expression in the script might
reduce or eliminate the need for some of the logic, but then you'd
run the risk of reformatting data that shouldn't have been touched.
> I'm still leaning towards just making postgres accept a ':'
> delimiter for milliseconds.
Based on your requirements, that might indeed be a better solution.
I'd probably choose to extend PostgreSQL rather than hack what
already exists, though. Doing the latter might break something
else and you have to remember to add the hack every time you upgrade
the server software. That can cause headaches for whoever inherits
the system from you unless it's well-documented.
> Also, how much would a secondary script slow down the bulk copy,
> if any?
Probably some, but perhaps not enough to be significant. I'd expect
the database to be the bottleneck, but I'd have to run tests to say
for certain.
>> Sounds like Sybase is dumping in hex, whereas PostgreSQL expects
>> octal. If you can't change the dump format, then again, filtering
>> the data through a script might work.
>
> Oh, so I can load binary data into PG if it's ASCII-encoded octal?


Yes -- see the "Binary Data Types" documentation:

http://www.postgresql.org/docs/7.4/s...pe-binary.html
> Why not the user-defined type with associated user-defined input function?


If filtering the data is awkward, then that might be a better way
to go.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/


Nov 23 '05 #6
>> Right, I *can* do this. But then I have to build knowledge into that
>> script so it can find each of these date fields (there's like 20 of them
>> across 10 different files) and then update that knowledge each time it
>> changes.
>
> In your case that's a reasonable argument against filtering the
> data with a script. Using a regular expression in the script might
> reduce or eliminate the need for some of the logic, but then you'd
> run the risk of reformatting data that shouldn't have been touched.


Yes, but:

You can have your script make a query in the database to fetch the data
types of the fields and then know which ones are to be transformed and
how. The script would take as arguments a dump file and a
database,schema.table, would read the file and pipe the transformed data
into a psql with a COPY FROM stdin command... could save you a lot of
work, no?

A bonus is that your script can complain if it detects incompatibilities,
and be more fool-proof.
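
The catalog query such a script might run could look like this (the
table name is just an example):

    -- List column names and types for one table, in column order.
    SELECT a.attname,
           format_type(a.atttypid, a.atttypmod) AS coltype
    FROM pg_catalog.pg_attribute a
    JOIN pg_catalog.pg_class c ON c.oid = a.attrelid
    JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
    WHERE n.nspname = 'public'
      AND c.relname = 'events'
      AND a.attnum > 0
      AND NOT a.attisdropped
    ORDER BY a.attnum;

The script would then reformat only the fields whose declared type is
timestamp or bytea before piping the rows to COPY FROM stdin.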


Nov 23 '05 #7
Michael Fuhr wrote:
> On Wed, Oct 13, 2004 at 01:32:01PM -0400, David Rysdam wrote:
>> Michael Fuhr wrote:
>>> You could filter the data through a script that reformats certain
>>> fields, then feed the reformatted data to PostgreSQL. This is
>>> usually a trivial task for Perl, awk, sed, or the like.
>>
>> Right, I *can* do this. But then I have to build knowledge into that
>> script so it can find each of these date fields (there's like 20 of them
>> across 10 different files) and then update that knowledge each time it
>> changes.
>
> In your case that's a reasonable argument against filtering the
> data with a script. Using a regular expression in the script might
> reduce or eliminate the need for some of the logic, but then you'd
> run the risk of reformatting data that shouldn't have been touched.
>
>> I'm still leaning towards just making postgres accept a ':'
>> delimiter for milliseconds.
>
> Based on your requirements, that might indeed be a better solution.
> I'd probably choose to extend PostgreSQL rather than hack what
> already exists, though. Doing the latter might break something
> else and you have to remember to add the hack every time you upgrade
> the server software. That can cause headaches for whoever inherits
> the system from you unless it's well-documented.

By "extend PostgreSQL" do you mean create a custom input_function for
timestamp? Are there docs that give hints for replacing the input
function of an existing type? Someone else replied similarly, but I'm
afraid I'm not familiar enough with PG to decipher it all.
>> Why not the user-defined type with associated user-defined input function?
>
> If filtering the data is awkward, then that might be a better way
> to go.

I think I will, when I get to that point.

Nov 23 '05 #8
On Wed, Oct 13, 2004 at 08:36:50PM +0200, Pierre-Frédéric Caillaud wrote:

> You can have your script make a query in the database to fetch the
> data types of the fields and then know which ones are to be transformed
> and how. The script would take as arguments a dump file and a
> database,schema.table, would read the file and pipe the transformed data
> into a psql with a COPY FROM stdin command... could save you a lot of
> work, no?


That's a good idea, and it would mean not having to hack PostgreSQL
or create custom types or custom input functions for standard types.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/


Nov 23 '05 #9
On Wed, Oct 13, 2004 at 03:37:14PM -0400, David Rysdam wrote:
> Michael Fuhr wrote:
>> I'd probably choose to extend PostgreSQL rather than hack what
>> already exists, though.
>
> By "extend PostgreSQL" do you mean create a custom input_function for
> timestamp? Are there docs that give hints for replacing the input
> function of an existing type? Someone else replied similarly, but I'm
> afraid I'm not familiar enough with PG to decipher it all.


I'd been thinking about your idea of creating a custom type or
providing a custom input function for a standard type. I've
done simple tests where I updated a type's typinput field in
pg_catalog.pg_type and it worked, but I don't know what problems
that might cause. One thing that comes to mind is that I don't
know if pg_dump/pg_dumpall would recognize such a change for a
standard type. The more I think about it, the less I'm convinced
that it's a good idea.
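
For the record, the experiment amounted to something like this --
catalog surgery, not a supported operation, and the replacement
function name is assumed to exist already:

    -- Point bytea's input function at a custom one; note the old
    -- value first so it can be restored after the data load.
    UPDATE pg_catalog.pg_type
       SET typinput = 'my_byteain'::regproc
     WHERE typname = 'bytea';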

I like Pierre-Frédéric Caillaud's suggestion to have a
filter script query the database for field positions and types
and reformat the fields as appropriate. That would allow the
filter to have knowledge of which fields should be reformatted
without having to program that logic explicitly.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/


Nov 23 '05 #10
