473,765 Members | 2,021 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Having mega-problems with importation of 20,000-line text file

MLH
Am trying to import 20,000+ lines of text in a file
FTP'd from a UNIX platform to windows via FTP
session in a DOS box. About 2000 records have
multiple lines in them separated by CRLF's.

That's not the idea. Each line was supposed to
be a separate record. I don't know why the 2000
records having MULTIPLE lines of text are so
stubborn. I've run UNIX2DOS utility against the
raw text file - no help. I know that UNIX and MAC
systems standards for end-of-line termination characters
differ. I really thought the UNIX2DOS utility would
do the trick. But each time I import the text file -
same thing.

Here's the SQL that shows me the 2000 problematic
rows:
SELECT Apple.Field1 FROM Apple
WHERE (((Apple.Field1 ) Like "*" & Chr$(13) & "*"));

I've analyzed the text file with a HEX editor and I see
nothing different being used for end-of-line termination
in the problematic rows than in the other lines (&H0d
and &H0a). Why are 2000 records being formed during
the text importation with 2, 3 and perhaps more lines
in a single record? What can I do to stop that from
happening? I really don't feel like doing this manually
in 2000 different places in a 20000+ line text file.
Mar 12 '06 #1
8 2156
MLH
BTW, if I select about 50 rows of the pertinent field
in an open table, copy with CTRL-C and paste into
notepad, I see the problematic row data is wrapped
in dbl-quotes. The non-problematic row data has no
such quote wrapping. Dunno if that's important, but
it is factual.
Mar 12 '06 #2
MLH wrote:
Am trying to import 20,000+ lines of text in a file
FTP'd from a UNIX platform to windows via FTP
session in a DOS box. About 2000 records have
multiple lines in them separated by CRLF's.

That's not the idea. Each line was supposed to
be a separate record. I don't know why the 2000
records having MULTIPLE lines of text are so
stubborn. I've run UNIX2DOS utility against the
raw text file - no help. I know that UNIX and MAC
systems standards for end-of-line termination characters
differ. I really thought the UNIX2DOS utility would
do the trick. But each time I import the text file -
same thing.

Here's the SQL that shows me the 2000 problematic
rows:
SELECT Apple.Field1 FROM Apple
WHERE (((Apple.Field1 ) Like "*" & Chr$(13) & "*"));

I've analyzed the text file with a HEX editor and I see
nothing different being used for end-of-line termination
in the problematic rows than in the other lines (&H0d
and &H0a). Why are 2000 records being formed during
the text importation with 2, 3 and perhaps more lines
in a single record? What can I do to stop that from
happening? I really don't feel like doing this manually
in 2000 different places in a 20000+ line text file.


It's difficult to diagnose what is happening from the information that
you provided, but you might want to attempt your ftp again, and specify
binary mode. FTP will insert returns in ASCII mode.

In absolute worse case, you could read the file in with VBA in character
mode and properly terminate the lines.

--
Randy Harris
tech at promail dot com
I'm pretty sure I know everything that I can remember.
Mar 12 '06 #3
MLH <CR**@NorthStat e.net> wrote in
news:dk******** *************** *********@4ax.c om:
Am trying to import 20,000+ lines of text in a file
FTP'd from a UNIX platform to windows via FTP
session in a DOS box. About 2000 records have
multiple lines in them separated by CRLF's.

That's not the idea. Each line was supposed to
be a separate record. I don't know why the 2000
records having MULTIPLE lines of text are so
stubborn. I've run UNIX2DOS utility against the
raw text file - no help. I know that UNIX and MAC
systems standards for end-of-line termination characters
differ. I really thought the UNIX2DOS utility would
do the trick. But each time I import the text file -
same thing.

Here's the SQL that shows me the 2000 problematic
rows:
SELECT Apple.Field1 FROM Apple
WHERE (((Apple.Field1 ) Like "*" & Chr$(13) & "*"));

I've analyzed the text file with a HEX editor and I see
nothing different being used for end-of-line termination
in the problematic rows than in the other lines (&H0d
and &H0a). Why are 2000 records being formed during
the text importation with 2, 3 and perhaps more lines
in a single record? What can I do to stop that from
happening? I really don't feel like doing this manually
in 2000 different places in a 20000+ line text file.


The rows have the CrLf embedded in the quoted strings. That
means they were placed in the original file.

Try to redo the import specification to set the "Text
Delimiter" value to the double-quote.
--
Bob Quintal

PA is y I've altered my email address.
Mar 12 '06 #4
MLH
>
The rows have the CrLf embedded in the quoted strings. That
means they were placed in the original file. Sorry, what is meant by 'original file'? The file is created
on-the-fly using ls command to do a dir of the unix box
current directory - that listing is what's being written to
the local PC during the process. Strangely, out of 851
root-level dirs jointly containing 24000+ 2nd level dirs,
only about 2000 entries (rows which ultimately become
records) are comprised of multiple CRLF-separated lines
that are enclosed in dbl-quotes. 90% of the lines are not
enclosed in dbl-quotes but DO have carriage returns &
line feeds at the ends of the lines. They have to wrap
somehow.
Try to redo the import specification to set the "Text
Delimiter" value to the double-quote.


Mar 13 '06 #5
MLH
>It's difficult to diagnose what is happening from the information that
you provided, but you might want to attempt your ftp again, and specify
binary mode. FTP will insert returns in ASCII mode.

In absolute worse case, you could read the file in with VBA in character
mode and properly terminate the lines.

xxxxxxxxxxxxxxx xxxxxxxxxxxxxxx xxxxxxxxxxxxxx
Well, I did it the old-fashioned way. I used elbow grease.
If it ever becomes necessary 4 me 2 do this again, it will
remain a problem - I have no idea how to prevent it. Any-
way, the SQL enabled me to single out the crappy stuff,
get rid of the dbl-quotes, parse the lines out properly, and
append the RIGHT stuff back to the table in an orderly
fashion.
Mar 13 '06 #6
Sounds like the problem is in the unix box...

you could fix the text with vba before you import to fix it - remove
dbl qoutes that come directly after vbcrlf and remove vbcrlfs that
proceed quotes. then remove all remaining quotes.

but i'd fix the output from the unix box - or throw it back to the
person who is responsible for it.

Mar 13 '06 #7
MLH <CR**@NorthStat e.net> wrote in
news:9q******** *************** *********@4ax.c om:

The rows have the CrLf embedded in the quoted strings. That
means they were placed in the original file. Sorry, what is meant by 'original file'? The file is created
on-the-fly using ls command to do a dir of the unix box
current directory - that listing is what's being written to
the local PC during the process.


Then it is the directory structure that contains some
descriptors or metadata which contains the double quotes..
Strangely, out of 851 root-level dirs jointly containing 24000+ 2nd level dirs,
only about 2000 entries (rows which ultimately become
records) are comprised of multiple CRLF-separated lines
that are enclosed in dbl-quotes. 90% of the lines are not
enclosed in dbl-quotes but DO have carriage returns &
line feeds at the ends of the lines. They have to wrap
somehow.

Try to redo the import specification to set the "Text
Delimiter" value to the double-quote.


--
Bob Quintal

PA is y I've altered my email address.
Mar 13 '06 #8
MLH wrote:
The rows have the CrLf embedded in the quoted strings. That
means they were placed in the original file.

Sorry, what is meant by 'original file'? The file is created
on-the-fly using ls command to do a dir of the unix box
current directory - that listing is what's being written to
the local PC during the process. Strangely, out of 851
root-level dirs jointly containing 24000+ 2nd level dirs,
only about 2000 entries (rows which ultimately become
records) are comprised of multiple CRLF-separated lines
that are enclosed in dbl-quotes. 90% of the lines are not
enclosed in dbl-quotes but DO have carriage returns &
line feeds at the ends of the lines. They have to wrap
somehow.
Try to redo the import specification to set the "Text
Delimiter" value to the double-quote.


You've got something else going on. The ls command won't put double
quotes around anything. FTP, in ASCII mode, will sometimes however, add
CR-LF every 255 characters.

--
Randy Harris
tech at promail dot com
I'm pretty sure I know everything that I can remember.
Mar 13 '06 #9

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
3271
by: TJ | last post by:
Anyone know how to display process id's (normally found by going to Component Services and switching to applications to 'Status View') in a web page? I'm trying to keep remote users from having to terminal server or use com+ to view the process id's they need for asp debugging (via attaching to a process (remote debugging via dcom and machine debug manager)) Thx in advance,
0
1460
by: Graham Simms | last post by:
I have tried to run a query where the reuslts are restricted by a HAVING clause with 2 conditions, but the second condition seems to be ignored. for example SELECT userid, AVG(position) FROM `table1` WHERE (status='finished') GROUP BY userid HAVING (COUNT(*) >= 4 AND AVG(position) < 5.5) returns exactly the same as
7
3313
by: | last post by:
I fail to understand why that the memory allocated in the void create(int **matrix) does not remain. I passed the address of matrix so shouldn't it still have the allocated memory when it returns to main. The problem i am having is understanding why the printf statement in the code below gives the value. I would have expected it to be 123 which is the value I set it to in the create. Thanx in advance. void create(int **matrix); int...
12
2706
by: Edward Diener | last post by:
Given value class X { public: // Not allowed: X():i(100000),s(10000) { } // Allowed void InitializeDefaults() { i = 100000; s = 10000; } private: int i;
1
1545
by: Vinod | last post by:
Hi, In VC8 project, I am having a struct which is having a char* variable. Now I am creating a 3 elements array object for the struct. I send the base address of the object using VARIANT to a function where I am casting to struct again using reinterpret_cast. Then If I am trying to traverse the array object from the index 0 to 2, it is working fine, If I am going for the index 3rd element, that char* variable
1
3483
by: vanandwiz | last post by:
Please find the code below. SELECT COMPANY.NAME, COUNT(ITEM.ID) FROM ITEM, COMPANY WHERE ITEM.COMPANYID = COMPANY.ID GROUP BY ROLLUP(COMPANY.NAME)
0
1160
by: DR | last post by:
I heard there is some trick to referencing statics in C# CLR stored procedure without having to mark the assembly as unsafe. Does anyone know this? This is usefull as the case of needing a little static shared variable here and there without having to compromise safety in the Sql Server 2005.
3
8142
by: Philia | last post by:
Hello, I'm having problem migrating to postgreSQL from MySQL, some of my queries does not work in postgres. Query(simplified): SELECT t1.bus_stop, t1.time, (case when t1.time >2200 then t1.time + 10000 else t1.time END) AS sort_column FROM routes AS t1 GROUP BY t1.bus_stop, t1.time HAVING sort_column > 2200 ORDER BY sort_column LIMIT 1
2
10414
by: Jack | last post by:
Hi, Performance-wise, is there a difference between these 2 clauses? I assume a large database... 1) With the WHERE clause: Select FLD1, FLD2, FLD3 From myFile Where FLD1 = 1 And FLD2 = 1
7
4315
by: tiptap | last post by:
Hey Guys, I have a huge statement loads of if statements in... and its getting bigger. On closer inspection there is only 3 difference in the select statement. so I thought I could cut the whole thing down to just 1 select statement if I have a conditional Having. I've simplified the IF statement down a bit to give you an idea of what im trying to achieve IF @month <> 0 & @diffFuture = 0 & @showDate <> 0
0
9398
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
10160
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
10007
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
9951
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
8831
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
0
6649
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5421
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
3924
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
3531
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.