By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
438,712 Members | 1,958 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 438,712 IT Pros & Developers. It's quick & easy.

importing data

P: n/a

Hello, perhaps you may have some advice.

The postgresql documentation for COPY FROM INFILE suggests that high-ascii
characters be encoded to a backslash followed by the octal value for the
character.

In addition to this change, to insert a backslash, a double-bacckslash must be
emitted.

After we discovered this we wrote a short filter program to pipe input data
through to encode high-ascii characters correctly as well as escape slashes.
This program can be seen here:
http://www.neverlight.com/~mental/pginput-filter.c
What we're wondering about is, is there perhaps a better or easier way to
handle data like this? Granted we didnt spend a ton of time on google, but
we did search the docs a little before settling on a filter for the sake
of expediency.

--
Mental (Me****@NeverLight.com)

I've been told that I need to warn people about inappropriate content.
So if anything I say or post is inappropriate, dont look at it.

GPG public key: http://www.neverlight.com/pas/Mental.asc
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 22 '05 #1
Share this Question
Share on Google+
3 Replies


P: n/a
Mental <Me****@NeverLight.com> writes:
The postgresql documentation for COPY FROM INFILE suggests that high-ascii
characters be encoded to a backslash followed by the octal value for the
character.


While it's certainly possible to do that, I don't see anyplace in the
current documentation that recommends it. What did you conclude that
from?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 22 '05 #2

P: n/a
On Sat, Jan 17, 2004 at 10:27:09PM -0500, Tom Lane wrote:
Mental <Me****@NeverLight.com> writes:
The postgresql documentation for COPY FROM INFILE suggests that high-ascii
characters be encoded to a backslash followed by the octal value for the
character.


While it's certainly possible to do that, I don't see anyplace in the
current documentation that recommends it. What did you conclude that
from?


http://www.postgresql.org/docs/7.3/static/sql-copy.html indicated:

\digits , Backslash followed by one to three octal digits specifies the
character with that numeric code

We were having trouble with characters that were high ascii encoded.
Perhaps it was how we were connecting to do the import, but we found that
escaping them as so helped. After filtering, data is copied into the
tables like so:

ENCODING='SQL_ASCII'
$FILTER_DATA
psql -U $USER -c "SET CLIENT_ENCODING TO '$ENCODING'; copy $i from
'$DATA_DIR/$i-noheader.tab' NULL as '' " $DB

--
Mental (Me****@NeverLight.com)

I've been told that I need to warn people about inappropriate content.
So if anything I say or post is inappropriate, dont look at it.

GPG public key: http://www.neverlight.com/pas/Mental.asc
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 22 '05 #3

P: n/a
Mental <Me****@NeverLight.com> writes:
We were having trouble with characters that were high ascii encoded.


You probably need to pay attention to your client_encoding setting,
and perhaps also reconsider what database encoding you are using.
If either of these is not SQL_ASCII then it had better be an accurate
description of the character set you are using, else you're in for a
world of hurt :-(. Also, setting client_encoding to SQL_ASCII when the
database encoding is something else does not get you out of having to
respect the encoding setting --- it just prevents any automatic
conversion from happening during I/O. The data you ship had better be
in the database encoding in this case.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 22 '05 #4

This discussion thread is closed

Replies have been disabled for this discussion.