UNICODE problem on 7.4 with COPY

When I try to import data from a Unicode file into PostgreSQL 7.4 under FreeBSD, it appears not to understand the Unicode file format.

To demonstrate, I export a set of integers from MSSQL 2000 into a Unicode file. I copy the file to a FreeBSD box over Samba and try to import it from psql with COPY. It fails. WordPad and Notepad both read the file fine, even after I bounce it via the FreeBSD box (to check that Samba didn't mangle it).

FreeBSD 5.1-RELEASE #0
PostgreSQL 7.4 (downloaded and compiled Fri 28 Nov 2003)
Dual 800 MHz Pentium IIIs

I create a database with encoding = UNICODE.
I create a table:

CREATE TABLE testunicode
(
anum int4
) WITHOUT OIDS;

I then use psql to import the file, which is a single column of integers.

copy testunicode from '/home/toby/itxt/anum.txt';
ERROR: invalid input syntax for integer: "1"
CONTEXT: COPY testunicode, line 1, column anum: "1"

When viewing the file as hex I see:

FF FE 31 00 31 00 32 00 37 00 39 00 30 00 0D 00 0A 00
.  .  1  .  1  .  2  .  7  .  9  .  0  .  .  .  .  .

According to http://www.crispen.org/src/archive/0013.html

FF FE UTF-16/UCS-2, little endian

So, what is going wrong? Why can't I import this very simple unicode file?
I've searched the archives and Google, but to no avail.

By the way, the actual data I want to import is larger and more complex; this little table is just to demonstrate the problem.

Help would be much appreciated.
Toby

Nov 12 '05 #1
Toby Doig wrote:
....
So, what is going wrong? Why can't I import this very simple unicode file?
I've searched the archives and google, but to no avail.

Try converting the file to UTF-8:

iconv -t utf-8 -f utf-16 < unicode-file.txt > utf-8-file.txt
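If iconv is not at hand, the same conversion can be sketched in Python. This is only an illustration, not something from the thread: the function name utf16_to_utf8 and the temporary file names are made up, and it assumes the input starts with a byte-order mark, as the file in the question does.

```python
import os
import tempfile


def utf16_to_utf8(src_path, dst_path):
    """Re-encode a BOM-prefixed UTF-16 text file as UTF-8 without a BOM."""
    # The "utf-16" codec reads the BOM to pick the byte order and drops it.
    # newline="" keeps the original line endings (here CR LF) untouched.
    with open(src_path, "r", encoding="utf-16", newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)


if __name__ == "__main__":
    # Recreate the 18 bytes from the hex dump in the question.
    sample = bytes.fromhex("FF FE 31 00 31 00 32 00 37 00 39 00 30 00 0D 00 0A 00")
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "anum.txt")        # hypothetical file names
    dst = os.path.join(workdir, "anum-utf8.txt")
    with open(src, "wb") as f:
        f.write(sample)
    utf16_to_utf8(src, dst)
    with open(dst, "rb") as f:
        print(f.read())  # b'112790\r\n'
```

The output file holds plain UTF-8 digits with no BOM and no NUL bytes, which is the form COPY is then pointed at instead of the original export.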


Nov 12 '05 #2

Toby Doig wrote:
According to http://www.crispen.org/src/archive/0013.html

FF FE UTF-16/UCS-2, little endian

See also
http://www.unicode.org/unicode/faq/utf_bom.html#22


So, what is going wrong? Why can't I import this very simple unicode file?
I've searched the archives and google, but to no avail.


PostgreSQL only accepts a stream of characters in the given client
encoding, which defaults to UTF-8 when you set up your
database as "unicode". psql does not read the BOM
in files, since it operates on streams rather than files.
I fear the same is true of PostgreSQL's COPY command.
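Since nothing on the PostgreSQL side looks at the BOM, the check has to happen before the data is handed over. A rough Python sketch of such a pre-processing step follows; sniff_bom and to_utf8_stream are made-up names for illustration, not part of psql or PostgreSQL, and the sketch assumes that data without a BOM is already in the client encoding.

```python
import codecs

# Longer signatures come first: the UTF-32 LE BOM (FF FE 00 00)
# begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data):
    """Return (encoding, bom_length) for a known BOM, or (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def to_utf8_stream(data):
    """Strip a leading BOM, if any, and re-encode the rest as UTF-8."""
    encoding, skip = sniff_bom(data)
    if encoding is None:
        # Assumption: no BOM means the bytes are already in the client encoding.
        return data
    return data[skip:].decode(encoding).encode("utf-8")


if __name__ == "__main__":
    # The bytes from the hex dump in the question.
    raw = bytes.fromhex("FF FE 31 00 31 00 32 00 37 00 39 00 30 00 0D 00 0A 00")
    print(sniff_bom(raw))       # ('utf-16-le', 2)
    print(to_utf8_stream(raw))  # b'112790\r\n'
```

One caveat of BOM sniffing in general: a UTF-16 LE file whose first character is U+0000 is indistinguishable from UTF-32 LE by the first four bytes alone. That does not affect the file in question, where FF FE is followed by 31 00.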

I think a patch from you would be appreciated :-)

Regards
Tino
Nov 12 '05 #3
