
importing data


Hello, perhaps you may have some advice.

The PostgreSQL documentation for COPY FROM INFILE suggests that high-ASCII
characters be encoded as a backslash followed by the octal value of the
character.

In addition, to insert a literal backslash, a doubled backslash must be
emitted.

After we discovered this we wrote a short filter program to pipe the input
data through, encoding high-ASCII characters correctly and escaping
backslashes. This program can be seen here:
http://www.neverlight.com/~mental/pginput-filter.c
What we're wondering is: is there a better or easier way to handle data like
this? Granted, we didn't spend a ton of time on Google, but we did search the
docs a little before settling on a filter for the sake of expediency.
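
For reference, a simplified sketch of the approach (not the program at that
URL, just an illustration of the same idea: double any backslash, rewrite
bytes above 0x7F as \NNN octal escapes, pass everything else through):

/* copy-escape filter sketch: reads raw bytes on stdin, writes COPY-safe
 * text on stdout. */
#include <stdio.h>

int main(void)
{
    int c;

    while ((c = getchar()) != EOF) {
        if (c == '\\')
            fputs("\\\\", stdout);          /* literal backslash becomes \\ */
        else if (c >= 0x80)
            printf("\\%03o", (unsigned) c); /* high-ASCII byte becomes \NNN */
        else
            putchar(c);                     /* plain ASCII passes through */
    }
    return 0;
}

A real filter would also have to consider literal tabs and newlines inside
field values, since those are the COPY column and row delimiters.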

--
Mental (Me****@NeverLight.com)

I've been told that I need to warn people about inappropriate content.
So if anything I say or post is inappropriate, don't look at it.

GPG public key: http://www.neverlight.com/pas/Mental.asc

Nov 22 '05 #1
Mental <Me****@NeverLight.com> writes:
> The PostgreSQL documentation for COPY FROM INFILE suggests that high-ASCII
> characters be encoded as a backslash followed by the octal value of the
> character.


While it's certainly possible to do that, I don't see anyplace in the
current documentation that recommends it. What did you conclude that
from?

regards, tom lane


Nov 22 '05 #2
On Sat, Jan 17, 2004 at 10:27:09PM -0500, Tom Lane wrote:
> Mental <Me****@NeverLight.com> writes:
> > The PostgreSQL documentation for COPY FROM INFILE suggests that high-ASCII
> > characters be encoded as a backslash followed by the octal value of the
> > character.
>
> While it's certainly possible to do that, I don't see anyplace in the
> current documentation that recommends it. What did you conclude that
> from?


http://www.postgresql.org/docs/7.3/static/sql-copy.html indicates:

\digits: Backslash followed by one to three octal digits specifies the
character with that numeric code
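
For example, under that rule a LATIN1 e-acute (byte 0xE9, octal 351) and a
literal backslash would appear in a hypothetical two-column, tab-separated
COPY input file as:

1	caf\351
2	C:\\temp

(row 1, column 2 is "café"; row 2 shows the doubled backslash)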

We were having trouble with characters that were high-ASCII encoded.
Perhaps it was how we were connecting to do the import, but we found that
escaping them this way helped. After filtering, the data is copied into the
tables like so:

ENCODING='SQL_ASCII'
$FILTER_DATA
psql -U $USER -c "SET CLIENT_ENCODING TO '$ENCODING'; copy $i from '$DATA_DIR/$i-noheader.tab' NULL as ''" $DB

--
Mental (Me****@NeverLight.com)


Nov 22 '05 #3
Mental <Me****@NeverLight.com> writes:
> We were having trouble with characters that were high-ASCII encoded.


You probably need to pay attention to your client_encoding setting,
and perhaps also reconsider what database encoding you are using.
If either of these is not SQL_ASCII then it had better be an accurate
description of the character set you are using, else you're in for a
world of hurt :-(. Also, setting client_encoding to SQL_ASCII when the
database encoding is something else does not get you out of having to
respect the encoding setting --- it just prevents any automatic
conversion from happening during I/O. The data you ship had better be
in the database encoding in this case.
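
For anyone driving this from C rather than a shell script, a rough libpq
sketch of checking the database encoding and setting the client encoding
explicitly before loading data could look like the following (the connection
string and the SQL_ASCII choice are placeholders to adapt, not a
recommendation):

/* encoding_check.c: sketch only.  Build with something like
 *   cc encoding_check.c -lpq
 * (add include/library paths for your PostgreSQL installation as needed). */
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=mydb");   /* placeholder conninfo */
    PGresult *res;

    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return EXIT_FAILURE;
    }

    /* Report the encoding the database was created with. */
    res = PQexec(conn, "SELECT getdatabaseencoding()");
    if (PQresultStatus(res) == PGRES_TUPLES_OK)
        printf("database encoding: %s\n", PQgetvalue(res, 0, 0));
    PQclear(res);

    /* Make the client-side encoding explicit; returns 0 on success, -1 on error. */
    if (PQsetClientEncoding(conn, "SQL_ASCII") != 0)
        fprintf(stderr, "could not set client_encoding\n");

    PQfinish(conn);
    return EXIT_SUCCESS;
}

This is just the libpq equivalent of the SET CLIENT_ENCODING call in the
shell script above; as noted, it does not remove the requirement that the
data you send actually match the database encoding.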

regards, tom lane


Nov 22 '05 #4
