The data file is a simple Unicode file with lines of text. BCP
apparently doesn't guarantee this ordering, and neither does the
import tool. I want to be able to load the data either sequentially or
add line numbering to large Unicode file (1 million lines). I don't
want to deal with another programming language if possible and I
wonder if there's a trick in SQL Server to get this accomplished.
Thanks for any help.
Mark Leary
----== Posted via Newsfeeds.Com - Unlimited-Uncensored-Secure Usenet News==---- http://www.newsfeeds.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= East/West-Coast Server Farms - Total Privacy via Encryption =--- 15 4307
no-email wrote: The data file is a simple Unicode file with lines of text. BCP apparently doesn't guarantee this ordering, and neither does the import tool. I want to be able to load the data either sequentially or add line numbering to large Unicode file (1 million lines). I don't want to deal with another programming language if possible and I wonder if there's a trick in SQL Server to get this accomplished.
Thanks for any help.
Mark Leary
Why does the order of the rows inserted into the table matter in your
case? Relational databases don't understand row order. If you need them
sorted in some way after the import, you can create a clustered index on
the table to get the rows ordered in a way that helps your queries
perform better.
In general, I think BCP processes the rows in the file sequentially. But
again, I'm not clear on why this matters.
Could you elaborate on the exact issue you are trying to avoid.
--
David Gugick
Imceda Software www.imceda.com
"David Gugick" <da***********@ imceda.com> wrote: Why does the order of the rows inserted into the table matter in your case? Relational databases don't understand row order. If you need them sorted in some way after the import, you can create a clustered index on the table to get the rows ordered in a way that helps your queries perform better.
In general, I think BCP processes the rows in the file sequentially. But again, I'm not clear on why this matters.
Could you elaborate on the exact issue you are trying to avoid.
I am trying to load a text file sequentially in order to perform text
manipulations using T-SQL that do depend on the exact order. I would be
happy with simply adding a line number to each line of the Unicode text
file, and then loading the file with line number determining the order, but
I want to avoid programming in another language if possible. Eventually the
loaded text would be converted to proper relational tables. This doesn't
have to do with improving performance. Does this help?
Thanks.
no-email wrote: "David Gugick" <da***********@ imceda.com> wrote:
Why does the order of the rows inserted into the table matter in your case? Relational databases don't understand row order. If you need them sorted in some way after the import, you can create a clustered index on the table to get the rows ordered in a way that helps your queries perform better.
In general, I think BCP processes the rows in the file sequentially. But again, I'm not clear on why this matters.
Could you elaborate on the exact issue you are trying to avoid.
I am trying to load a text file sequentially in order to perform text manipulations using T-SQL that do depend on the exact order. I would be happy with simply adding a line number to each line of the Unicode text file, and then loading the file with line number determining the order, but I want to avoid programming in another language if possible. Eventually the loaded text would be converted to proper relational tables. This doesn't have to do with improving performance. Does this help? Thanks.
Yes. It sounds like you have rows in a specific order that will need to
be processed once on SQL Server. You want to preserve the order of rows
in the file so the rows can be processed in the same order once on SQL
Server.
In order to do this in any relational database, you need a sort key.
There is never a guarantee that a query you run without an ORDER BY will
return rows in the same order in any consistent way.
My understanding is that BCP feeds the rows in the order they appear in
a file. I can't imagine any reason it would or could do it differently.
In that case, you want to insert the data into a table that contains an
IDENTITY column. You can then use that key for your ORDER BY when
processing the rows from whatever process does that.
--
David Gugick
Imceda Software www.imceda.com
Do you know what order the source file is sorted in? If so, and if the
sort order column(s) are included then you may not need to know the
line number since it is (theoretically anyway) possible to derive that
information from the other data.
If not, then this article has a useful suggestion: http://www.google.co.uk/groups?selm=...TNGP11.phx.gbl
not sure if that will work with unicode data though.
--
David Portas
SQL Server MVP
--
"David Portas" <RE************ *************** *@acm.org> wrote in
message: Do you know what order the source file is sorted in? If so, and if
the sort order column(s) are included then you may not need to know the line number since it is (theoretically anyway) possible to derive
that information from the other data.
Unfortunately the data file consists of simple lines of text with no
other way to extract potential column information before loading it
into the database.
If not, then this article has a useful suggestion:
http://www.google.co.uk/groups?selm=...TNGP11.phx.gbl not sure if that will work with unicode data though.
Good suggestion but it does fail with Unicode. Thanks anyway.
----== Posted via Newsfeeds.Com - Unlimited-Uncensored-Secure Usenet News==---- http://www.newsfeeds.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= East/West-Coast Server Farms - Total Privacy via Encryption =---
"David Gugick" <da***********@ imceda.com> wrote: My understanding is that BCP feeds the rows in the order they appear
in a file. I can't imagine any reason it would or could do it
differently. In that case, you want to insert the data into a table that contains
an IDENTITY column. You can then use that key for your ORDER BY when processing the rows from whatever process does that.
In general BCP loads the data in the same order as the file but not
always. The ordering sometimes reverses for thousands of rows, or
skips certain rows, but you need to check it carefully to find the
misordering. You can create a table with an identity column and load
the data, but again if the rows are not loaded in the same sequence as
the file this won't matter. You will end up with an ordered table that
is unfortunately not in the same order as the original file.
To be honest I have tried all these suggestions in the past. My
typical solution would be to open the original file in Excel, add a
rownumber column and then save the resulting file as a Unicode file.
This works up until around a maximum of 63,000 rows. You can break a
file into 63,000 row subfiles, but this would be too tedious if you
have row counts approaching a million.
Thanks for the suggestions but I may have to learn some C#.
Unfortunately Visual Basic has problems with Unicode, as I suspect
also C++ and C have similar problems.
----== Posted via Newsfeeds.Com - Unlimited-Uncensored-Secure Usenet News==---- http://www.newsfeeds.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= East/West-Coast Server Farms - Total Privacy via Encryption =---
What about using the CTS Import Wizard to import the data from the flat
file into a table with an identity.
Where is this data coming from? Is there any way to recreate it with a
counter column included in the output?
--
David Gugick
Imceda Software www.imceda.com
"David Gugick" <da***********@ imceda.com> wrote: What about using the CTS Import Wizard to import the data from the
flat file into a table with an identity.
It's the same problem. The table order generally follows the order in
the file but not always.
Where is this data coming from? Is there any way to recreate it with
a counter column included in the output?
It's foreign language dictionary data that cannot be recreated. I
could manipulate the data on the file level but I am trying to avoid
potential problems with Unicode. Once the data gets into SQL Server I
don't have any problems, as long as the table order exactly matches
the file order.
----== Posted via Newsfeeds.Com - Unlimited-Uncensored-Secure Usenet News==---- http://www.newsfeeds.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= East/West-Coast Server Farms - Total Privacy via Encryption =---
no-email wrote: It's foreign language dictionary data that cannot be recreated. I could manipulate the data on the file level but I am trying to avoid potential problems with Unicode. Once the data gets into SQL Server I don't have any problems, as long as the table order exactly matches the file order.
I'm not sure what problems you would have as long as the tool you are
using to edit the data is unicode aware. I use TextEdit for editing
( www.textpad.com) and it has simple replacement expressions.
Assuming you had each row of data on a single line, you could simply do
the following:
1- Add a leading CARRIAGE RETURN to the file
2- Open the Replace dialog
3- Check the Regular Expression option
4- Type "\n" - WITHOUT QUOTES in the Find What entry- means New Line
character
5- Type "\n\i\t" - WITHOUT QUOTES in the Replace With entry - means New
Line + Auto Number + TAB
6- Click Replace All
7 - Remove the leading carriage return in the file
8 - Click FILE SAVE AS and make sure the UNICODE option is selected
You can replace the TAB character with whatever your file requires or
add DOUBLE QUOTES around the Auto Number, etc.
You can download a free trial of TextPad on the web site.
--
David Gugick
Imceda Software www.imceda.com This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics |
by: Spamtrap |
last post by:
I only work in Perl occasionaly, and have been searching for a
solution for a conversion, and everything I found seems much too
complex.
All I need to do is take a simple text file and copy it, however some specific lines are in
fact in UTF8 as printed garbagy characters and they need to be
converted to Unicode, so that the new text file can be imported into a
desktop program and into some Word documents.
For the moment I would be...
|
by: Majed |
last post by:
Hi , all
I'm trying to write unicode to a file for another app (not developed with
vs2003) to read it. I used StreamWriter with unicode encoding.but I was
surprised that the streamwriter adds FFFE to the start of the file,which
stopes the other app from reading it!!
any idea how to stope it frome doing that,do I have to use another class
#####writer that supports unicode?
help me Please!
Thanks
|
by: Eric Lilja |
last post by:
Hello, I had what I thought was normal text-file and I needed to locate a
string matching a certain pattern in that file and, if found, replace that
string. I thought this would be simple but I had problems getting my
algorithm to work and in order to help me find the solution I decided to
print each line to screen as I read them.
Then, to my surprise, I noticed that there was a space between every
character as I outputted the lines to the...
|
by: webdev |
last post by:
lo all,
some of the questions i'll ask below have most certainly been discussed
already, i just hope someone's kind enough to answer them again to help
me out..
so i started a python 2.3 script that grabs some web pages from the web,
regex parse the data and stores it localy to xml file for further use..
at first i had no problem using python minidom and everything concerning
|
by: anantvrana |
last post by:
Hello All,
I am trying to read Unicode (Kanji character) data from a text file.
When I store unicode data into variable my Kanji character gets messed
up.
I am using following code
Open File1 For Input Access Read As #1
While Not EOF(1)
| |
by: John Sidney-Woollett |
last post by:
Hi
I need to store accented characters in a postgres (7.4) database, and
access the data (mostly) using the postgres JDBC driver (from a web app).
Does anyone know if:
1) Is there a performance loss using (multibyte) UNICODE vs (single byte)
SQL_ASCII/LATINxxx character encoding? (In terms of extra data, and
searching/sorting speeds).
|
by: aj |
last post by:
DB2 WSUE LUW v8.2 FP4 (aka v8.1 FP11)
RHEL AS 4
I am EXPORTing in IXF format from one schema and then LOADing into
another schema on another server.
The DB modeling tool I am using likes to put the PK columns at the
very top of the CREATE TABLE ddl, so the ordering of the columns in
the FROM schema is sometimes different from the ordering in the TO schema.
|
by: candide_sh |
last post by:
Hello,
I created a script with database publishing wizard to convert a SS2005
db into a SS2k db. The script has schema and data and looks very good.
When I try to start the created script in SS2k-Query analyzer, I get an
error like (in german)
Server: Nachr.-Nr. 105, Schweregrad 15, Status 1, Zeile 2
Öffnendes Anführungszeichen vor der Zeichenfolge '䝘퉊䘒
'.
|
by: lamxing |
last post by:
Dear all,
I've spent a long time to try to get the xmldocument.load method
to handle UTF-8 characters, but no luck. Every time it loads a
document contains european characters (such as the one below, output
from google map API), it always said invalid character at position
229, which I believe is the "" character.
Can anyone point me to the right direction of how to load such
documents using the xmldocument.load() method, or...
|
by: Hystou |
last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it.
First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
|
by: Oralloy |
last post by:
Hello folks,
I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>".
The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed.
This is as boiled down as I can make it.
Here is my compilation command:
g++-12 -std=c++20 -Wnarrowing bit_field.cpp
Here is the code in...
| |
by: jinu1996 |
last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth.
The Art of Business Website Design
Your website is...
|
by: Hystou |
last post by:
Overview:
Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
|
by: agi2029 |
last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development projectplanning, coding, testing, and deploymentwithout human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own....
Now, this would greatly impact the work of software developers. The idea...
|
by: isladogs |
last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM).
In this session, we are pleased to welcome a new presenter, Adolph Dupr who will be discussing some powerful techniques for using class modules.
He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms.
Adolph will...
|
by: conductexam |
last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one.
At the time of converting from word file to html my equations which are in the word document file was convert into image.
Globals.ThisAddIn.Application.ActiveDocument.Select();...
|
by: adsilva |
last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
| |
by: bsmnconsultancy |
last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...
| |