UNICODE problem on 7.4 with COPY

When I try to import data from a Unicode file into PostgreSQL 7.4 under FreeBSD, it appears not to understand the Unicode file format.

To demonstrate, I export a set of integers from MSSQL 2000 into a Unicode file. I copy the file over Samba to a FreeBSD box and try to import it from psql with COPY. It fails. WordPad and Notepad both read the file OK, even after I bounce the file via the FreeBSD box (to check that Samba didn't munge it).

FreeBSD 5.1-RELEASE #0
PostgreSQL 7.4 (downloaded and compiled Fri 28th Nov 2003)
Dual 800MHz P3's

I create a database with encoding = UNICODE.
I create a table

CREATE TABLE testunicode
(
anum int4
) WITHOUT OIDS;

I then use psql to import the file, which is a single column of integers.

copy testunicode from '/home/toby/itxt/anum.txt';
ERROR: invalid input syntax for integer: "ÿþ1"
CONTEXT: COPY testunicode, line 1, column anum: "ÿþ1"
When viewing the file as hex I see:
FF FE 31 00 31 00 32 00 37 00 39 00 30 00 0D 00 0A 00
ÿ þ 1 . 1 . 2 . 7 . 9 . 0 . . . . .

According to http://www.crispen.org/src/archive/0013.html

FF FE UTF-16/UCS-2, little endian

So, what is going wrong? Why can't I import this very simple Unicode file?
I've searched the archives and Google, but to no avail.

Btw, the actual data I want to import is larger and more complex; this little table is just to demonstrate the problem.

Help would be muchly appreciated.
Toby

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 12 '05 #1
Toby Doig wrote:
....
So, what is going wrong? Why can't I import this very simple unicode file?
I've searched the archives and google, but to no avail.

Try converting the file to UTF-8:

iconv -t utf-8 -f utf-16 < unicode-file.txt > utf-8-file.txt
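
If iconv is not at hand, roughly the same conversion can be sketched in Python. This is only an illustration and not something from the thread: the function name utf16_to_utf8 is made up, and it assumes the input starts with a BOM, as the hex dump in the question shows.

```python
import os
import tempfile
from pathlib import Path


def utf16_to_utf8(src: str, dst: str) -> None:
    """Re-encode a UTF-16 file (with BOM) as UTF-8 without a BOM."""
    raw = Path(src).read_bytes()
    # The generic "utf-16" codec reads the BOM to pick the byte order
    # and drops it from the decoded text.
    text = raw.decode("utf-16")
    # Plain "utf-8" writes no BOM, so the first byte is real data.
    Path(dst).write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    # Demo with the exact bytes from the hex dump in the question.
    sample = bytes.fromhex("FFFE3100310032003700390030000D000A00")
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "anum.txt")
    dst = os.path.join(workdir, "anum-utf8.txt")
    Path(src).write_bytes(sample)
    utf16_to_utf8(src, dst)
    print(Path(dst).read_bytes())  # b'112790\r\n'
```

As with iconv, the Windows line ending (CR LF) is carried over unchanged. The converted file can then be loaded with the same COPY command pointed at the new path.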


Nov 12 '05 #2
Toby Doig wrote:
....
According to http://www.crispen.org/src/archive/0013.html

FF FE UTF-16/UCS-2, little endian
See also
http://www.unicode.org/unicode/faq/utf_bom.html#22


So, what is going wrong? Why can't I import this very simple unicode file?
I've searched the archives and google, but to no avail.


PostgreSQL only accepts a stream of characters in the given client
encoding, which defaults to UTF-8 when you set up your
database as UNICODE. psql does not read the BOM information
in files, since it operates on streams rather than on files.
I fear the same is true for PostgreSQL's COPY command.
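
Since nothing on the PostgreSQL side looks at the BOM, a small check before loading can catch a file like this one. A rough sketch in Python follows; it is not part of PostgreSQL or psql, and the name sniff_bom is invented here. It only inspects the first bytes of the file, so a file without a BOM is reported as unknown rather than guessed at.

```python
import codecs
from typing import Optional

# Longer marks first: the UTF-32 LE mark (FF FE 00 00) begins with the
# UTF-16 LE mark (FF FE), so it has to be tested before it.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(head: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None


if __name__ == "__main__":
    # First four bytes of the file from the question: FF FE 31 00
    print(sniff_bom(bytes.fromhex("FFFE3100")))  # utf-16-le
    # A plain ASCII/UTF-8 file without a BOM
    print(sniff_bom(b"112790\n"))                # None
```

For the file in question this reports utf-16-le, that is, little-endian UTF-16, which is why the server sees ÿþ in front of the first digit and NUL bytes after each character when it expects UTF-8.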

I think a patch from you would be appreciated :-)

Regards
Tino
Nov 12 '05 #3
