Having mega-problems with importation of 20,000-line text file 
March 12th, 2006, 02:55 AM
| | | Having mega-problems with importation of 20,000-line text file
Am trying to import 20,000+ lines of text in a file
FTP'd from a UNIX platform to windows via FTP
session in a DOS box. About 2000 records have
multiple lines in them separated by CRLF's.
That's not the idea. Each line was supposed to
be a separate record. I don't know why the 2000
records having MULTIPLE lines of text are so
stubborn. I've run UNIX2DOS utility against the
raw text file - no help. I know that UNIX and MAC
systems standards for end-of-line termination characters
differ. I really thought the UNIX2DOS utility would
do the trick. But each time I import the text file -
same thing.
Here's the SQL that shows me the 2000 problematic
rows:
SELECT Apple.Field1 FROM Apple
WHERE (((Apple.Field1) Like "*" & Chr$(13) & "*"));
I've analyzed the text file with a HEX editor and I see
nothing different being used for end-of-line termination
in the problematic rows than in the other lines (&H0d
and &H0a). Why are 2000 records being formed during
the text importation with 2, 3 and perhaps more lines
in a single record? What can I do to stop that from
happening? I really don't feel like doing this manually
in 2000 different places in a 20000+ line text file. | 
March 12th, 2006, 03:05 AM
| | | Re: Having mega-problems with importation of 20,000-line text file
BTW, if I select about 50 rows of the pertinent field
in an open table, copy with CTRL-C and paste into
notepad, I see the problematic row data is wrapped
in dbl-quotes. The non-problematic row data has no
such quote wrapping. Dunno if that's important, but
it is factual. | 
March 12th, 2006, 03:15 AM
| | | Re: Having mega-problems with importation of 20,000-line text file
MLH wrote:[color=blue]
> Am trying to import 20,000+ lines of text in a file
> FTP'd from a UNIX platform to windows via FTP
> session in a DOS box. About 2000 records have
> multiple lines in them separated by CRLF's.
>
> That's not the idea. Each line was supposed to
> be a separate record. I don't know why the 2000
> records having MULTIPLE lines of text are so
> stubborn. I've run UNIX2DOS utility against the
> raw text file - no help. I know that UNIX and MAC
> systems standards for end-of-line termination characters
> differ. I really thought the UNIX2DOS utility would
> do the trick. But each time I import the text file -
> same thing.
>
> Here's the SQL that shows me the 2000 problematic
> rows:
> SELECT Apple.Field1 FROM Apple
> WHERE (((Apple.Field1) Like "*" & Chr$(13) & "*"));
>
> I've analyzed the text file with a HEX editor and I see
> nothing different being used for end-of-line termination
> in the problematic rows than in the other lines (&H0d
> and &H0a). Why are 2000 records being formed during
> the text importation with 2, 3 and perhaps more lines
> in a single record? What can I do to stop that from
> happening? I really don't feel like doing this manually
> in 2000 different places in a 20000+ line text file.[/color]
It's difficult to diagnose what is happening from the information that
you provided, but you might want to attempt your ftp again, and specify
binary mode. FTP will insert returns in ASCII mode.
In absolute worse case, you could read the file in with VBA in character
mode and properly terminate the lines.
--
Randy Harris
tech at promail dot com
I'm pretty sure I know everything that I can remember. | 
March 12th, 2006, 01:05 PM
| | | Re: Having mega-problems with importation of 20,000-line text file
MLH <CRCI@NorthState.net> wrote in
news:dk5712lmlim605k2qsq0h99rj4kt0p572b@4ax.com:
[color=blue]
> Am trying to import 20,000+ lines of text in a file
> FTP'd from a UNIX platform to windows via FTP
> session in a DOS box. About 2000 records have
> multiple lines in them separated by CRLF's.
>
> That's not the idea. Each line was supposed to
> be a separate record. I don't know why the 2000
> records having MULTIPLE lines of text are so
> stubborn. I've run UNIX2DOS utility against the
> raw text file - no help. I know that UNIX and MAC
> systems standards for end-of-line termination characters
> differ. I really thought the UNIX2DOS utility would
> do the trick. But each time I import the text file -
> same thing.
>
> Here's the SQL that shows me the 2000 problematic
> rows:
> SELECT Apple.Field1 FROM Apple
> WHERE (((Apple.Field1) Like "*" & Chr$(13) & "*"));
>
> I've analyzed the text file with a HEX editor and I see
> nothing different being used for end-of-line termination
> in the problematic rows than in the other lines (&H0d
> and &H0a). Why are 2000 records being formed during
> the text importation with 2, 3 and perhaps more lines
> in a single record? What can I do to stop that from
> happening? I really don't feel like doing this manually
> in 2000 different places in a 20000+ line text file.[/color]
The rows have the CrLf embedded in the quoted strings. That
means they were placed in the original file.
Try to redo the import specification to set the "Text
Delimiter" value to the double-quote.
--
Bob Quintal
PA is y I've altered my email address. | 
March 13th, 2006, 03:45 AM
| | | Re: Having mega-problems with importation of 20,000-line text file
>[color=blue]
>The rows have the CrLf embedded in the quoted strings. That
>means they were placed in the original file.[/color]
Sorry, what is meant by 'original file'? The file is created
on-the-fly using ls command to do a dir of the unix box
current directory - that listing is what's being written to
the local PC during the process. Strangely, out of 851
root-level dirs jointly containing 24000+ 2nd level dirs,
only about 2000 entries (rows which ultimately become
records) are comprised of multiple CRLF-separated lines
that are enclosed in dbl-quotes. 90% of the lines are not
enclosed in dbl-quotes but DO have carriage returns &
line feeds at the ends of the lines. They have to wrap
somehow.[color=blue]
>
>Try to redo the import specification to set the "Text
>Delimiter" value to the double-quote.[/color] | 
March 13th, 2006, 03:45 AM
| | | Re: Having mega-problems with importation of 20,000-line text file
>It's difficult to diagnose what is happening from the information that[color=blue]
>you provided, but you might want to attempt your ftp again, and specify
>binary mode. FTP will insert returns in ASCII mode.
>
>In absolute worse case, you could read the file in with VBA in character
>mode and properly terminate the lines.[/color]
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Well, I did it the old-fashioned way. I used elbow grease.
If it ever becomes necessary 4 me 2 do this again, it will
remain a problem - I have no idea how to prevent it. Any-
way, the SQL enabled me to single out the crappy stuff,
get rid of the dbl-quotes, parse the lines out properly, and
append the RIGHT stuff back to the table in an orderly
fashion. | 
March 13th, 2006, 10:25 AM
| | | Re: Having mega-problems with importation of 20,000-line text file
Sounds like the problem is in the unix box...
you could fix the text with vba before you import to fix it - remove
dbl qoutes that come directly after vbcrlf and remove vbcrlfs that
proceed quotes. then remove all remaining quotes.
but i'd fix the output from the unix box - or throw it back to the
person who is responsible for it. | 
March 13th, 2006, 11:35 AM
| | | Re: Having mega-problems with importation of 20,000-line text file
MLH <CRCI@NorthState.net> wrote in
news:9qs91294tb27e3lkbldjji7qj0judfv98t@4ax.com:
[color=blue][color=green]
>>
>>The rows have the CrLf embedded in the quoted strings. That
>>means they were placed in the original file.[/color]
> Sorry, what is meant by 'original file'? The file is created
> on-the-fly using ls command to do a dir of the unix box
> current directory - that listing is what's being written to
> the local PC during the process.[/color]
Then it is the directory structure that contains some
descriptors or metadata which contains the double quotes..
Strangely, out of 851[color=blue]
> root-level dirs jointly containing 24000+ 2nd level dirs,
> only about 2000 entries (rows which ultimately become
> records) are comprised of multiple CRLF-separated lines
> that are enclosed in dbl-quotes. 90% of the lines are not
> enclosed in dbl-quotes but DO have carriage returns &
> line feeds at the ends of the lines. They have to wrap
> somehow.[color=green]
>>
>>Try to redo the import specification to set the "Text
>>Delimiter" value to the double-quote.[/color]
>[/color]
--
Bob Quintal
PA is y I've altered my email address. | 
March 13th, 2006, 01:05 PM
| | | Re: Having mega-problems with importation of 20,000-line text file
MLH wrote:[color=blue][color=green]
>> The rows have the CrLf embedded in the quoted strings. That
>> means they were placed in the original file.[/color]
> Sorry, what is meant by 'original file'? The file is created
> on-the-fly using ls command to do a dir of the unix box
> current directory - that listing is what's being written to
> the local PC during the process. Strangely, out of 851
> root-level dirs jointly containing 24000+ 2nd level dirs,
> only about 2000 entries (rows which ultimately become
> records) are comprised of multiple CRLF-separated lines
> that are enclosed in dbl-quotes. 90% of the lines are not
> enclosed in dbl-quotes but DO have carriage returns &
> line feeds at the ends of the lines. They have to wrap
> somehow.[color=green]
>> Try to redo the import specification to set the "Text
>> Delimiter" value to the double-quote.[/color]
>[/color]
You've got something else going on. The ls command won't put double
quotes around anything. FTP, in ASCII mode, will sometimes however, add
CR-LF every 255 characters.
--
Randy Harris
tech at promail dot com
I'm pretty sure I know everything that I can remember. | | Thread Tools | Search this Thread | | | |
Posting Rules
| You may not post new threads You may not post replies You may not post attachments You may not edit your posts HTML code is Off | | | | | | What is Bytes?
We are a network of experts and professionals in IT and software development that help one another with answers to tough questions and share insights.
Get the best answers to your questions from over 220,840 network members.
|