473,890 Members | 1,355 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

How to load a Unicode file into the database in the same order as the file order


The data file is a simple Unicode file with lines of text. BCP
apparently doesn't guarantee this ordering, and neither does the
import tool. I want to be able to load the data either sequentially or
add line numbering to large Unicode file (1 million lines). I don't
want to deal with another programming language if possible and I
wonder if there's a trick in SQL Server to get this accomplished.

Thanks for any help.

Mark Leary


----== Posted via Newsfeeds.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeeds.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= East/West-Coast Server Farms - Total Privacy via Encryption =---
Jul 23 '05 #1
15 4332
no-email wrote:
The data file is a simple Unicode file with lines of text. BCP
apparently doesn't guarantee this ordering, and neither does the
import tool. I want to be able to load the data either sequentially or
add line numbering to large Unicode file (1 million lines). I don't
want to deal with another programming language if possible and I
wonder if there's a trick in SQL Server to get this accomplished.

Thanks for any help.

Mark Leary


Why does the order of the rows inserted into the table matter in your
case? Relational databases don't understand row order. If you need them
sorted in some way after the import, you can create a clustered index on
the table to get the rows ordered in a way that helps your queries
perform better.

In general, I think BCP processes the rows in the file sequentially. But
again, I'm not clear on why this matters.

Could you elaborate on the exact issue you are trying to avoid.
--
David Gugick
Imceda Software
www.imceda.com

Jul 23 '05 #2

"David Gugick" <da***********@ imceda.com> wrote:
Why does the order of the rows inserted into the table matter in your
case? Relational databases don't understand row order. If you need them
sorted in some way after the import, you can create a clustered index on
the table to get the rows ordered in a way that helps your queries perform
better.

In general, I think BCP processes the rows in the file sequentially. But
again, I'm not clear on why this matters.

Could you elaborate on the exact issue you are trying to avoid.


I am trying to load a text file sequentially in order to perform text
manipulations using T-SQL that do depend on the exact order. I would be
happy with simply adding a line number to each line of the Unicode text
file, and then loading the file with line number determining the order, but
I want to avoid programming in another language if possible. Eventually the
loaded text would be converted to proper relational tables. This doesn't
have to do with improving performance. Does this help?

Thanks.
Jul 23 '05 #3
no-email wrote:
"David Gugick" <da***********@ imceda.com> wrote:
Why does the order of the rows inserted into the table matter in your
case? Relational databases don't understand row order. If you need
them sorted in some way after the import, you can create a clustered
index on the table to get the rows ordered in a way that helps your
queries perform better.

In general, I think BCP processes the rows in the file sequentially.
But again, I'm not clear on why this matters.

Could you elaborate on the exact issue you are trying to avoid.


I am trying to load a text file sequentially in order to perform text
manipulations using T-SQL that do depend on the exact order. I would
be happy with simply adding a line number to each line of the Unicode
text file, and then loading the file with line number determining the
order, but I want to avoid programming in another language if
possible. Eventually the loaded text would be converted to proper
relational tables. This doesn't have to do with improving
performance. Does this help?
Thanks.


Yes. It sounds like you have rows in a specific order that will need to
be processed once on SQL Server. You want to preserve the order of rows
in the file so the rows can be processed in the same order once on SQL
Server.

In order to do this in any relational database, you need a sort key.
There is never a guarantee that a query you run without an ORDER BY will
return rows in the same order in any consistent way.

My understanding is that BCP feeds the rows in the order they appear in
a file. I can't imagine any reason it would or could do it differently.
In that case, you want to insert the data into a table that contains an
IDENTITY column. You can then use that key for your ORDER BY when
processing the rows from whatever process does that.
--
David Gugick
Imceda Software
www.imceda.com

Jul 23 '05 #4
Do you know what order the source file is sorted in? If so, and if the
sort order column(s) are included then you may not need to know the
line number since it is (theoretically anyway) possible to derive that
information from the other data.

If not, then this article has a useful suggestion:

http://www.google.co.uk/groups?selm=...TNGP11.phx.gbl

not sure if that will work with unicode data though.
--
David Portas
SQL Server MVP
--

Jul 23 '05 #5

"David Portas" <RE************ *************** *@acm.org> wrote in
message:
Do you know what order the source file is sorted in? If so, and if the sort order column(s) are included then you may not need to know the
line number since it is (theoretically anyway) possible to derive that information from the other data.
Unfortunately the data file consists of simple lines of text with no
other way to extract potential column information before loading it
into the database.
If not, then this article has a useful suggestion:

http://www.google.co.uk/groups?selm=...TNGP11.phx.gbl
not sure if that will work with unicode data though.


Good suggestion but it does fail with Unicode. Thanks anyway.

----== Posted via Newsfeeds.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeeds.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= East/West-Coast Server Farms - Total Privacy via Encryption =---
Jul 23 '05 #6

"David Gugick" <da***********@ imceda.com> wrote:
My understanding is that BCP feeds the rows in the order they appear in a file. I can't imagine any reason it would or could do it differently. In that case, you want to insert the data into a table that contains an IDENTITY column. You can then use that key for your ORDER BY when
processing the rows from whatever process does that.


In general BCP loads the data in the same order as the file but not
always. The ordering sometimes reverses for thousands of rows, or
skips certain rows, but you need to check it carefully to find the
misordering. You can create a table with an identity column and load
the data, but again if the rows are not loaded in the same sequence as
the file this won't matter. You will end up with an ordered table that
is unfortunately not in the same order as the original file.

To be honest I have tried all these suggestions in the past. My
typical solution would be to open the original file in Excel, add a
rownumber column and then save the resulting file as a Unicode file.
This works up until around a maximum of 63,000 rows. You can break a
file into 63,000 row subfiles, but this would be too tedious if you
have row counts approaching a million.

Thanks for the suggestions but I may have to learn some C#.
Unfortunately Visual Basic has problems with Unicode, as I suspect
also C++ and C have similar problems.


----== Posted via Newsfeeds.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeeds.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= East/West-Coast Server Farms - Total Privacy via Encryption =---
Jul 23 '05 #7
What about using the CTS Import Wizard to import the data from the flat
file into a table with an identity.

Where is this data coming from? Is there any way to recreate it with a
counter column included in the output?
--
David Gugick
Imceda Software
www.imceda.com

Jul 23 '05 #8

"David Gugick" <da***********@ imceda.com> wrote:
What about using the CTS Import Wizard to import the data from the flat file into a table with an identity.
It's the same problem. The table order generally follows the order in
the file but not always.
Where is this data coming from? Is there any way to recreate it with a counter column included in the output?


It's foreign language dictionary data that cannot be recreated. I
could manipulate the data on the file level but I am trying to avoid
potential problems with Unicode. Once the data gets into SQL Server I
don't have any problems, as long as the table order exactly matches
the file order.

----== Posted via Newsfeeds.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeeds.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= East/West-Coast Server Farms - Total Privacy via Encryption =---
Jul 23 '05 #9
no-email wrote:
It's foreign language dictionary data that cannot be recreated. I
could manipulate the data on the file level but I am trying to avoid
potential problems with Unicode. Once the data gets into SQL Server I
don't have any problems, as long as the table order exactly matches
the file order.


I'm not sure what problems you would have as long as the tool you are
using to edit the data is unicode aware. I use TextEdit for editing
(www.textpad.com) and it has simple replacement expressions.

Assuming you had each row of data on a single line, you could simply do
the following:

1- Add a leading CARRIAGE RETURN to the file
2- Open the Replace dialog
3- Check the Regular Expression option
4- Type "\n" - WITHOUT QUOTES in the Find What entry- means New Line
character
5- Type "\n\i\t" - WITHOUT QUOTES in the Replace With entry - means New
Line + Auto Number + TAB
6- Click Replace All
7 - Remove the leading carriage return in the file
8 - Click FILE SAVE AS and make sure the UNICODE option is selected

You can replace the TAB character with whatever your file requires or
add DOUBLE QUOTES around the Auto Number, etc.

You can download a free trial of TextPad on the web site.

--
David Gugick
Imceda Software
www.imceda.com

Jul 23 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

6
18339
by: Spamtrap | last post by:
I only work in Perl occasionaly, and have been searching for a solution for a conversion, and everything I found seems much too complex. All I need to do is take a simple text file and copy it, however some specific lines are in fact in UTF8 as printed garbagy characters and they need to be converted to Unicode, so that the new text file can be imported into a desktop program and into some Word documents. For the moment I would be...
4
2468
by: Majed | last post by:
Hi , all I'm trying to write unicode to a file for another app (not developed with vs2003) to read it. I used StreamWriter with unicode encoding.but I was surprised that the streamwriter adds FFFE to the start of the file,which stopes the other app from reading it!! any idea how to stope it frome doing that,do I have to use another class #####writer that supports unicode? help me Please! Thanks
8
4079
by: Eric Lilja | last post by:
Hello, I had what I thought was normal text-file and I needed to locate a string matching a certain pattern in that file and, if found, replace that string. I thought this would be simple but I had problems getting my algorithm to work and in order to help me find the solution I decided to print each line to screen as I read them. Then, to my surprise, I noticed that there was a space between every character as I outputted the lines to the...
4
6078
by: webdev | last post by:
lo all, some of the questions i'll ask below have most certainly been discussed already, i just hope someone's kind enough to answer them again to help me out.. so i started a python 2.3 script that grabs some web pages from the web, regex parse the data and stores it localy to xml file for further use.. at first i had no problem using python minidom and everything concerning
1
17560
by: anantvrana | last post by:
Hello All, I am trying to read Unicode (Kanji character) data from a text file. When I store unicode data into variable my Kanji character gets messed up. I am using following code Open File1 For Input Access Read As #1 While Not EOF(1)
6
6621
by: John Sidney-Woollett | last post by:
Hi I need to store accented characters in a postgres (7.4) database, and access the data (mostly) using the postgres JDBC driver (from a web app). Does anyone know if: 1) Is there a performance loss using (multibyte) UNICODE vs (single byte) SQL_ASCII/LATINxxx character encoding? (In terms of extra data, and searching/sorting speeds).
10
4076
by: aj | last post by:
DB2 WSUE LUW v8.2 FP4 (aka v8.1 FP11) RHEL AS 4 I am EXPORTing in IXF format from one schema and then LOADing into another schema on another server. The DB modeling tool I am using likes to put the PK columns at the very top of the CREATE TABLE ddl, so the ordering of the columns in the FROM schema is sometimes different from the ordering in the TO schema.
4
3880
by: candide_sh | last post by:
Hello, I created a script with database publishing wizard to convert a SS2005 db into a SS2k db. The script has schema and data and looks very good. When I try to start the created script in SS2k-Query analyzer, I get an error like (in german) Server: Nachr.-Nr. 105, Schweregrad 15, Status 1, Zeile 2 Öffnendes Anführungszeichen vor der Zeichenfolge '䝘퉊䘒 '.
10
13982
by: lamxing | last post by:
Dear all, I've spent a long time to try to get the xmldocument.load method to handle UTF-8 characters, but no luck. Every time it loads a document contains european characters (such as the one below, output from google map API), it always said invalid character at position 229, which I believe is the "" character. Can anyone point me to the right direction of how to load such documents using the xmldocument.load() method, or...
0
9978
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, well explore What is ONU, What Is Router, ONU & Routers main usage, and What is the difference between ONU and Router. Lets take a closer look ! Part I. Meaning of...
0
9822
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
11232
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
10819
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
0
10462
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
9633
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development projectplanning, coding, testing, and deploymentwithout human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
8016
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupr who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
5851
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
6045
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.