By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
438,265 Members | 1,333 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 438,265 IT Pros & Developers. It's quick & easy.

How to load a Unicode file into the database in the same order as the file order

P: n/a

The data file is a simple Unicode file with lines of text. BCP
apparently doesn't guarantee this ordering, and neither does the
import tool. I want to be able to load the data either sequentially or
add line numbering to large Unicode file (1 million lines). I don't
want to deal with another programming language if possible and I
wonder if there's a trick in SQL Server to get this accomplished.

Thanks for any help.

Mark Leary


----== Posted via Newsfeeds.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeeds.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= East/West-Coast Server Farms - Total Privacy via Encryption =---
Jul 23 '05 #1
Share this Question
Share on Google+
15 Replies


P: n/a
no-email wrote:
The data file is a simple Unicode file with lines of text. BCP
apparently doesn't guarantee this ordering, and neither does the
import tool. I want to be able to load the data either sequentially or
add line numbering to large Unicode file (1 million lines). I don't
want to deal with another programming language if possible and I
wonder if there's a trick in SQL Server to get this accomplished.

Thanks for any help.

Mark Leary


Why does the order of the rows inserted into the table matter in your
case? Relational databases don't understand row order. If you need them
sorted in some way after the import, you can create a clustered index on
the table to get the rows ordered in a way that helps your queries
perform better.

In general, I think BCP processes the rows in the file sequentially. But
again, I'm not clear on why this matters.

Could you elaborate on the exact issue you are trying to avoid.
--
David Gugick
Imceda Software
www.imceda.com

Jul 23 '05 #2

P: n/a

"David Gugick" <da***********@imceda.com> wrote:
Why does the order of the rows inserted into the table matter in your
case? Relational databases don't understand row order. If you need them
sorted in some way after the import, you can create a clustered index on
the table to get the rows ordered in a way that helps your queries perform
better.

In general, I think BCP processes the rows in the file sequentially. But
again, I'm not clear on why this matters.

Could you elaborate on the exact issue you are trying to avoid.


I am trying to load a text file sequentially in order to perform text
manipulations using T-SQL that do depend on the exact order. I would be
happy with simply adding a line number to each line of the Unicode text
file, and then loading the file with line number determining the order, but
I want to avoid programming in another language if possible. Eventually the
loaded text would be converted to proper relational tables. This doesn't
have to do with improving performance. Does this help?

Thanks.
Jul 23 '05 #3

P: n/a
no-email wrote:
"David Gugick" <da***********@imceda.com> wrote:
Why does the order of the rows inserted into the table matter in your
case? Relational databases don't understand row order. If you need
them sorted in some way after the import, you can create a clustered
index on the table to get the rows ordered in a way that helps your
queries perform better.

In general, I think BCP processes the rows in the file sequentially.
But again, I'm not clear on why this matters.

Could you elaborate on the exact issue you are trying to avoid.


I am trying to load a text file sequentially in order to perform text
manipulations using T-SQL that do depend on the exact order. I would
be happy with simply adding a line number to each line of the Unicode
text file, and then loading the file with line number determining the
order, but I want to avoid programming in another language if
possible. Eventually the loaded text would be converted to proper
relational tables. This doesn't have to do with improving
performance. Does this help?
Thanks.


Yes. It sounds like you have rows in a specific order that will need to
be processed once on SQL Server. You want to preserve the order of rows
in the file so the rows can be processed in the same order once on SQL
Server.

In order to do this in any relational database, you need a sort key.
There is never a guarantee that a query you run without an ORDER BY will
return rows in the same order in any consistent way.

My understanding is that BCP feeds the rows in the order they appear in
a file. I can't imagine any reason it would or could do it differently.
In that case, you want to insert the data into a table that contains an
IDENTITY column. You can then use that key for your ORDER BY when
processing the rows from whatever process does that.
--
David Gugick
Imceda Software
www.imceda.com

Jul 23 '05 #4

P: n/a
Do you know what order the source file is sorted in? If so, and if the
sort order column(s) are included then you may not need to know the
line number since it is (theoretically anyway) possible to derive that
information from the other data.

If not, then this article has a useful suggestion:

http://www.google.co.uk/groups?selm=...TNGP11.phx.gbl

not sure if that will work with unicode data though.
--
David Portas
SQL Server MVP
--

Jul 23 '05 #5

P: n/a

"David Portas" <RE****************************@acm.org> wrote in
message:
Do you know what order the source file is sorted in? If so, and if the sort order column(s) are included then you may not need to know the
line number since it is (theoretically anyway) possible to derive that information from the other data.
Unfortunately the data file consists of simple lines of text with no
other way to extract potential column information before loading it
into the database.
If not, then this article has a useful suggestion:

http://www.google.co.uk/groups?selm=...TNGP11.phx.gbl
not sure if that will work with unicode data though.


Good suggestion but it does fail with Unicode. Thanks anyway.

----== Posted via Newsfeeds.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeeds.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= East/West-Coast Server Farms - Total Privacy via Encryption =---
Jul 23 '05 #6

P: n/a

"David Gugick" <da***********@imceda.com> wrote:
My understanding is that BCP feeds the rows in the order they appear in a file. I can't imagine any reason it would or could do it differently. In that case, you want to insert the data into a table that contains an IDENTITY column. You can then use that key for your ORDER BY when
processing the rows from whatever process does that.


In general BCP loads the data in the same order as the file but not
always. The ordering sometimes reverses for thousands of rows, or
skips certain rows, but you need to check it carefully to find the
misordering. You can create a table with an identity column and load
the data, but again if the rows are not loaded in the same sequence as
the file this won't matter. You will end up with an ordered table that
is unfortunately not in the same order as the original file.

To be honest I have tried all these suggestions in the past. My
typical solution would be to open the original file in Excel, add a
rownumber column and then save the resulting file as a Unicode file.
This works up until around a maximum of 63,000 rows. You can break a
file into 63,000 row subfiles, but this would be too tedious if you
have row counts approaching a million.

Thanks for the suggestions but I may have to learn some C#.
Unfortunately Visual Basic has problems with Unicode, as I suspect
also C++ and C have similar problems.


----== Posted via Newsfeeds.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeeds.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= East/West-Coast Server Farms - Total Privacy via Encryption =---
Jul 23 '05 #7

P: n/a
What about using the CTS Import Wizard to import the data from the flat
file into a table with an identity.

Where is this data coming from? Is there any way to recreate it with a
counter column included in the output?
--
David Gugick
Imceda Software
www.imceda.com

Jul 23 '05 #8

P: n/a

"David Gugick" <da***********@imceda.com> wrote:
What about using the CTS Import Wizard to import the data from the flat file into a table with an identity.
It's the same problem. The table order generally follows the order in
the file but not always.
Where is this data coming from? Is there any way to recreate it with a counter column included in the output?


It's foreign language dictionary data that cannot be recreated. I
could manipulate the data on the file level but I am trying to avoid
potential problems with Unicode. Once the data gets into SQL Server I
don't have any problems, as long as the table order exactly matches
the file order.

----== Posted via Newsfeeds.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeeds.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= East/West-Coast Server Farms - Total Privacy via Encryption =---
Jul 23 '05 #9

P: n/a
no-email wrote:
It's foreign language dictionary data that cannot be recreated. I
could manipulate the data on the file level but I am trying to avoid
potential problems with Unicode. Once the data gets into SQL Server I
don't have any problems, as long as the table order exactly matches
the file order.


I'm not sure what problems you would have as long as the tool you are
using to edit the data is unicode aware. I use TextEdit for editing
(www.textpad.com) and it has simple replacement expressions.

Assuming you had each row of data on a single line, you could simply do
the following:

1- Add a leading CARRIAGE RETURN to the file
2- Open the Replace dialog
3- Check the Regular Expression option
4- Type "\n" - WITHOUT QUOTES in the Find What entry- means New Line
character
5- Type "\n\i\t" - WITHOUT QUOTES in the Replace With entry - means New
Line + Auto Number + TAB
6- Click Replace All
7 - Remove the leading carriage return in the file
8 - Click FILE SAVE AS and make sure the UNICODE option is selected

You can replace the TAB character with whatever your file requires or
add DOUBLE QUOTES around the Auto Number, etc.

You can download a free trial of TextPad on the web site.

--
David Gugick
Imceda Software
www.imceda.com

Jul 23 '05 #10

P: n/a

"David Gugick" <da***********@imceda.com> wrote in message
news:#Z**************@tk2msftngp13.phx.gbl...
no-email wrote:
It's foreign language dictionary data that cannot be recreated. I
could manipulate the data on the file level but I am trying to avoid potential problems with Unicode. Once the data gets into SQL Server I don't have any problems, as long as the table order exactly matches the file order.
I'm not sure what problems you would have as long as the tool you

are using to edit the data is unicode aware. I use TextEdit for editing
(www.textpad.com) and it has simple replacement expressions.


Let me give it a try. With Word it just locks up after a few minutes
and dies. Notepad is way too slow on the replacements. I have 512 Meg
of memory but I still have problems. I'll try textpad.

Thanks for the suggestion.

Jul 23 '05 #11

P: n/a

"David Gugick" <da***********@imceda.com> wrote in message
news:#Z**************@tk2msftngp13.phx.gbl...
no-email wrote:
It's foreign language dictionary data that cannot be recreated. I
could manipulate the data on the file level but I am trying to avoid potential problems with Unicode. Once the data gets into SQL Server I don't have any problems, as long as the table order exactly matches the file order.
I'm not sure what problems you would have as long as the tool you

are using to edit the data is unicode aware. I use TextEdit for editing
(www.textpad.com) and it has simple replacement expressions.


I tried it but it doesn't properly handle Unicode.

Thanks anyway.
Jul 23 '05 #12

P: n/a
no-email wrote:
I tried it but it doesn't properly handle Unicode.

Thanks anyway.


Textpad does handle unicode. I use it with unicode data all the time.
What problems are you having with it? Just because it doesn't look right
in the editor doesn't mean it's not saving the file properly.

Could you elaborate on the issue you are seeing?
--
David Gugick
Imceda Software
www.imceda.com

Jul 23 '05 #13

P: n/a

"David Gugick" <da***********@imceda.com> wrote in message
news:eB**************@TK2MSFTNGP10.phx.gbl...
no-email wrote:
I tried it but it doesn't properly handle Unicode.

Thanks anyway.
Textpad does handle unicode. I use it with unicode data all the

time. What problems are you having with it? Just because it doesn't look right in the editor doesn't mean it's not saving the file properly.

Could you elaborate on the issue you are seeing?


I open the original Unicode file and save it using UTF-8 encoding.

I then open the file with Textpad using File->Open, with UTF-8
encoding, and I get the following error message:

"WARNING: "filename" contains characters that do not exist in code
page 1253 (ANSI-Greek). They will be converted to the system default
character, if you click OK."

All the Unicode characters are converted to question marks.


----== Posted via Newsfeeds.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeeds.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= East/West-Coast Server Farms - Total Privacy via Encryption =---
Jul 23 '05 #14

P: n/a
David Gugick (da***********@imceda.com) writes:
Textpad does handle unicode. I use it with unicode data all the time.
What problems are you having with it? Just because it doesn't look right
in the editor doesn't mean it's not saving the file properly.


Are you using any version 5 beta?

Textpad 4.7 can read Unicode files, but if there actually is data outside
you ANSI code pages, that data will be mutilated. I have had problems
with as simple things as BKS files (control files for NT backup). If
I edit them with Textpad, NT backup does not like the file after I've
been to it.

See http://www.abaris.se/abaperls/doc/textpad.html for a couple of
links to similar tools. I have not evaulated them with regards to
Unicode, but I have a vague recollection that UltraEdit may cut it.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 23 '05 #15

P: n/a
> In general BCP loads the data in the same order as the file but not
always. The ordering sometimes reverses for thousands of rows, or
skips certain rows, but you need to check it carefully to find the
misordering. You can create a table with an identity column and load
the data, but again if the rows are not loaded in the same sequence as
the file this won't matter. You will end up with an ordered table that
is unfortunately not in the same order as the original file.


Have you tried loading to a table with an identity column with BCP with
batch size set to 1?

Craig
Jul 23 '05 #16

This discussion thread is closed

Replies have been disabled for this discussion.