How to load a Unicode file into the database in the same order as the file order

The data file is a simple Unicode file with lines of text. BCP
apparently doesn't guarantee this ordering, and neither does the
import tool. I want to be able to load the data either sequentially or
add line numbering to large Unicode file (1 million lines). I don't
want to deal with another programming language if possible and I
wonder if there's a trick in SQL Server to get this accomplished.

Thanks for any help.

Mark Leary

----== Posted via Newsfeeds.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeeds.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= East/West-Coast Server Farms - Total Privacy via Encryption =---

Jul 23 '05 #1

Subscribe Reply

4307

David Gugick

no-email wrote:

The data file is a simple Unicode file with lines of text. BCP
apparently doesn't guarantee this ordering, and neither does the
import tool. I want to be able to load the data either sequentially or
add line numbering to large Unicode file (1 million lines). I don't
want to deal with another programming language if possible and I
wonder if there's a trick in SQL Server to get this accomplished.

Thanks for any help.

Mark Leary

Why does the order of the rows inserted into the table matter in your
case? Relational databases don't understand row order. If you need them
sorted in some way after the import, you can create a clustered index on
the table to get the rows ordered in a way that helps your queries
perform better.

In general, I think BCP processes the rows in the file sequentially. But
again, I'm not clear on why this matters.

Could you elaborate on the exact issue you are trying to avoid.
--
David Gugick
Imceda Software
www.imceda.com

Jul 23 '05 #2

"David Gugick" <da***********@ imceda.com> wrote:

Why does the order of the rows inserted into the table matter in your
case? Relational databases don't understand row order. If you need them
sorted in some way after the import, you can create a clustered index on
the table to get the rows ordered in a way that helps your queries perform
better.

In general, I think BCP processes the rows in the file sequentially. But
again, I'm not clear on why this matters.

Could you elaborate on the exact issue you are trying to avoid.

I am trying to load a text file sequentially in order to perform text
manipulations using T-SQL that do depend on the exact order. I would be
happy with simply adding a line number to each line of the Unicode text
file, and then loading the file with line number determining the order, but
I want to avoid programming in another language if possible. Eventually the
loaded text would be converted to proper relational tables. This doesn't
have to do with improving performance. Does this help?

Thanks.

Jul 23 '05 #3

David Gugick

no-email wrote:

"David Gugick" <da***********@ imceda.com> wrote:
Why does the order of the rows inserted into the table matter in your
case? Relational databases don't understand row order. If you need
them sorted in some way after the import, you can create a clustered
index on the table to get the rows ordered in a way that helps your
queries perform better.

In general, I think BCP processes the rows in the file sequentially.
But again, I'm not clear on why this matters.

Could you elaborate on the exact issue you are trying to avoid.

I am trying to load a text file sequentially in order to perform text
manipulations using T-SQL that do depend on the exact order. I would
be happy with simply adding a line number to each line of the Unicode
text file, and then loading the file with line number determining the
order, but I want to avoid programming in another language if
possible. Eventually the loaded text would be converted to proper
relational tables. This doesn't have to do with improving
performance. Does this help?
Thanks.

Yes. It sounds like you have rows in a specific order that will need to
be processed once on SQL Server. You want to preserve the order of rows
in the file so the rows can be processed in the same order once on SQL
Server.

In order to do this in any relational database, you need a sort key.
There is never a guarantee that a query you run without an ORDER BY will
return rows in the same order in any consistent way.

My understanding is that BCP feeds the rows in the order they appear in
a file. I can't imagine any reason it would or could do it differently.
In that case, you want to insert the data into a table that contains an
IDENTITY column. You can then use that key for your ORDER BY when
processing the rows from whatever process does that.
--
David Gugick
Imceda Software
www.imceda.com

Jul 23 '05 #4

David Portas

Do you know what order the source file is sorted in? If so, and if the
sort order column(s) are included then you may not need to know the
line number since it is (theoretically anyway) possible to derive that
information from the other data.

If not, then this article has a useful suggestion:

http://www.google.co.uk/groups?selm=...TNGP11.phx.gbl

not sure if that will work with unicode data though.
--
David Portas
SQL Server MVP
--

Jul 23 '05 #5

"David Portas" <RE************ *************** *@acm.org> wrote in
message:

Do you know what order the source file is sorted in? If so, and if the sort order column(s) are included then you may not need to know the
line number since it is (theoretically anyway) possible to derive that information from the other data.
Unfortunately the data file consists of simple lines of text with no
other way to extract potential column information before loading it
into the database.
If not, then this article has a useful suggestion:

http://www.google.co.uk/groups?selm=...TNGP11.phx.gbl
not sure if that will work with unicode data though.

Good suggestion but it does fail with Unicode. Thanks anyway.

----== Posted via Newsfeeds.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeeds.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= East/West-Coast Server Farms - Total Privacy via Encryption =---

Jul 23 '05 #6

"David Gugick" <da***********@ imceda.com> wrote:

My understanding is that BCP feeds the rows in the order they appear in a file. I can't imagine any reason it would or could do it differently. In that case, you want to insert the data into a table that contains an IDENTITY column. You can then use that key for your ORDER BY when
processing the rows from whatever process does that.

In general BCP loads the data in the same order as the file but not
always. The ordering sometimes reverses for thousands of rows, or
skips certain rows, but you need to check it carefully to find the
misordering. You can create a table with an identity column and load
the data, but again if the rows are not loaded in the same sequence as
the file this won't matter. You will end up with an ordered table that
is unfortunately not in the same order as the original file.

To be honest I have tried all these suggestions in the past. My
typical solution would be to open the original file in Excel, add a
rownumber column and then save the resulting file as a Unicode file.
This works up until around a maximum of 63,000 rows. You can break a
file into 63,000 row subfiles, but this would be too tedious if you
have row counts approaching a million.

Thanks for the suggestions but I may have to learn some C#.
Unfortunately Visual Basic has problems with Unicode, as I suspect
also C++ and C have similar problems.

----== Posted via Newsfeeds.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeeds.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= East/West-Coast Server Farms - Total Privacy via Encryption =---

Jul 23 '05 #7

David Gugick

What about using the CTS Import Wizard to import the data from the flat
file into a table with an identity.

Where is this data coming from? Is there any way to recreate it with a
counter column included in the output?
--
David Gugick
Imceda Software
www.imceda.com

Jul 23 '05 #8

"David Gugick" <da***********@ imceda.com> wrote:

What about using the CTS Import Wizard to import the data from the flat file into a table with an identity.
It's the same problem. The table order generally follows the order in
the file but not always.
Where is this data coming from? Is there any way to recreate it with a counter column included in the output?

It's foreign language dictionary data that cannot be recreated. I
could manipulate the data on the file level but I am trying to avoid
potential problems with Unicode. Once the data gets into SQL Server I
don't have any problems, as long as the table order exactly matches
the file order.

----== Posted via Newsfeeds.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeeds.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= East/West-Coast Server Farms - Total Privacy via Encryption =---

Jul 23 '05 #9

David Gugick

no-email wrote:

It's foreign language dictionary data that cannot be recreated. I
could manipulate the data on the file level but I am trying to avoid
potential problems with Unicode. Once the data gets into SQL Server I
don't have any problems, as long as the table order exactly matches
the file order.

I'm not sure what problems you would have as long as the tool you are
using to edit the data is unicode aware. I use TextEdit for editing
(www.textpad.com) and it has simple replacement expressions.

Assuming you had each row of data on a single line, you could simply do
the following:

1- Add a leading CARRIAGE RETURN to the file
2- Open the Replace dialog
3- Check the Regular Expression option
4- Type "\n" - WITHOUT QUOTES in the Find What entry- means New Line
character
5- Type "\n\i\t" - WITHOUT QUOTES in the Replace With entry - means New
Line + Auto Number + TAB
6- Click Replace All
7 - Remove the leading carriage return in the file
8 - Click FILE SAVE AS and make sure the UNICODE option is selected

You can replace the TAB character with whatever your file requires or
add DOUBLE QUOTES around the Auto Number, etc.

You can download a free trial of TextPad on the web site.

--
David Gugick
Imceda Software
www.imceda.com

Jul 23 '05 #10

Similar topics

18321

UTF8 to Unicode conversion

by: Spamtrap | last post by:

I only work in Perl occasionaly, and have been searching for a solution for a conversion, and everything I found seems much too complex. All I need to do is take a simple text file and copy it, however some specific lines are in fact in UTF8 as printed garbagy characters and they need to be converted to Unicode, so that the new text file can be imported into a desktop program and into some Word documents. For the moment I would be...

Perl

2454

StreamWriter Adds FFFE at the start of the file with Unicode Encoding!!

by: Majed | last post by:

Hi , all I'm trying to write unicode to a file for another app (not developed with vs2003) to read it. I used StreamWriter with unicode encoding.but I was surprised that the streamwriter adds FFFE to the start of the file,which stopes the other app from reading it!! any idea how to stope it frome doing that,do I have to use another class #####writer that supports unicode? help me Please! Thanks

.NET Framework

4032

Will standard C++ allow me to replace a string in a unicode-encoded text file?

by: Eric Lilja | last post by:

Hello, I had what I thought was normal text-file and I needed to locate a string matching a certain pattern in that file and, if found, replace that string. I thought this would be simple but I had problems getting my algorithm to work and in order to help me find the solution I decided to print each line to screen as I read them. Then, to my surprise, I noticed that there was a space between every character as I outputted the lines to the...

C / C++

6059

minidom xml & non ascii / unicode & files

by: webdev | last post by:

lo all, some of the questions i'll ask below have most certainly been discussed already, i just hope someone's kind enough to answer them again to help me out.. so i started a python 2.3 script that grabs some web pages from the web, regex parse the data and stores it localy to xml file for further use.. at first i had no problem using python minidom and everything concerning

Python

17527

Unicode and VBA

by: anantvrana | last post by:

Hello All, I am trying to read Unicode (Kanji character) data from a text file. When I store unicode data into variable my Kanji character gets messed up. I am using following code Open File1 For Input Access Read As #1 While Not EOF(1)

Microsoft Access / VBA

6606

Unicode vs SQL_ASCII DBs

by: John Sidney-Woollett | last post by:

Hi I need to store accented characters in a postgres (7.4) database, and access the data (mostly) using the postgres JDBC driver (from a web app). Does anyone know if: 1) Is there a performance loss using (multibyte) UNICODE vs (single byte) SQL_ASCII/LATINxxx character encoding? (In terms of extra data, and searching/sorting speeds).

PostgreSQL Database

4037

Is LOAD really this brain damaged?

by: aj | last post by:

DB2 WSUE LUW v8.2 FP4 (aka v8.1 FP11) RHEL AS 4 I am EXPORTing in IXF format from one schema and then LOADing into another schema on another server. The DB modeling tool I am using likes to put the PK columns at the very top of the CREATE TABLE ddl, so the ordering of the columns in the FROM schema is sometimes different from the ordering in the TO schema.

DB2 Database

3873

downgrade SQL2005 to SQL2k - Unicode ANSI issue

by: candide_sh | last post by:

Hello, I created a script with database publishing wizard to convert a SS2005 db into a SS2k db. The script has schema and data and looks very good. When I try to start the created script in SS2k-Query analyzer, I get an error like (in german) Server: Nachr.-Nr. 105, Schweregrad 15, Status 1, Zeile 2 Ã–ffnendes AnfÃ¼hrungszeichen vor der Zeichenfolge 'ä˜ïƒ›í‰Šä˜’ '.

Microsoft SQL Server

13951

Can XmlDocument.Load() method handle unicode characters?

by: lamxing | last post by:

Dear all, I've spent a long time to try to get the xmldocument.load method to handle UTF-8 characters, but no luck. Every time it loads a document contains european characters (such as the one below, output from google map API), it always said invalid character at position 229, which I believe is the "ß" character. Can anyone point me to the right direction of how to load such documents using the xmldocument.load() method, or...

.NET Framework

8310

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...

Windows Server

8826

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...

C / C++

8732

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...

Online Marketing

8503

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...

Windows Server

7330

AI Job Threat for Devs

by: agi2029 | last post by:

Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...

Career Advice

6166

Access Europe - Using VBA to create a class based on a table - Wed 1 May

by: isladogs | last post by:

The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...

Microsoft Access / VBA

5632

Couldn’t get equations in html when convert word .docx file to html file in C#.

by: conductexam | last post by:

I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...

C# / C Sharp

4306

Windows Forms - .Net 8.0

by: adsilva | last post by:

A Windows Forms form does not have the event Unload, like VB6. What one acts like?

Visual Basic .NET

1615

Comprehensive Guide to Website Development in Toronto: Expert Insights from BSMN Consultancy

by: bsmnconsultancy | last post by:

In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

General