By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
432,474 Members | 966 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 432,474 IT Pros & Developers. It's quick & easy.

Having mega-problems with importation of 20,000-line text file

P: n/a
MLH
Am trying to import 20,000+ lines of text in a file
FTP'd from a UNIX platform to windows via FTP
session in a DOS box. About 2000 records have
multiple lines in them separated by CRLF's.

That's not the idea. Each line was supposed to
be a separate record. I don't know why the 2000
records having MULTIPLE lines of text are so
stubborn. I've run UNIX2DOS utility against the
raw text file - no help. I know that UNIX and MAC
systems standards for end-of-line termination characters
differ. I really thought the UNIX2DOS utility would
do the trick. But each time I import the text file -
same thing.

Here's the SQL that shows me the 2000 problematic
rows:
SELECT Apple.Field1 FROM Apple
WHERE (((Apple.Field1) Like "*" & Chr$(13) & "*"));

I've analyzed the text file with a HEX editor and I see
nothing different being used for end-of-line termination
in the problematic rows than in the other lines (&H0d
and &H0a). Why are 2000 records being formed during
the text importation with 2, 3 and perhaps more lines
in a single record? What can I do to stop that from
happening? I really don't feel like doing this manually
in 2000 different places in a 20000+ line text file.
Mar 12 '06 #1
Share this Question
Share on Google+
8 Replies


P: n/a
MLH
BTW, if I select about 50 rows of the pertinent field
in an open table, copy with CTRL-C and paste into
notepad, I see the problematic row data is wrapped
in dbl-quotes. The non-problematic row data has no
such quote wrapping. Dunno if that's important, but
it is factual.
Mar 12 '06 #2

P: n/a
MLH wrote:
Am trying to import 20,000+ lines of text in a file
FTP'd from a UNIX platform to windows via FTP
session in a DOS box. About 2000 records have
multiple lines in them separated by CRLF's.

That's not the idea. Each line was supposed to
be a separate record. I don't know why the 2000
records having MULTIPLE lines of text are so
stubborn. I've run UNIX2DOS utility against the
raw text file - no help. I know that UNIX and MAC
systems standards for end-of-line termination characters
differ. I really thought the UNIX2DOS utility would
do the trick. But each time I import the text file -
same thing.

Here's the SQL that shows me the 2000 problematic
rows:
SELECT Apple.Field1 FROM Apple
WHERE (((Apple.Field1) Like "*" & Chr$(13) & "*"));

I've analyzed the text file with a HEX editor and I see
nothing different being used for end-of-line termination
in the problematic rows than in the other lines (&H0d
and &H0a). Why are 2000 records being formed during
the text importation with 2, 3 and perhaps more lines
in a single record? What can I do to stop that from
happening? I really don't feel like doing this manually
in 2000 different places in a 20000+ line text file.


It's difficult to diagnose what is happening from the information that
you provided, but you might want to attempt your ftp again, and specify
binary mode. FTP will insert returns in ASCII mode.

In absolute worse case, you could read the file in with VBA in character
mode and properly terminate the lines.

--
Randy Harris
tech at promail dot com
I'm pretty sure I know everything that I can remember.
Mar 12 '06 #3

P: n/a
MLH <CR**@NorthState.net> wrote in
news:dk********************************@4ax.com:
Am trying to import 20,000+ lines of text in a file
FTP'd from a UNIX platform to windows via FTP
session in a DOS box. About 2000 records have
multiple lines in them separated by CRLF's.

That's not the idea. Each line was supposed to
be a separate record. I don't know why the 2000
records having MULTIPLE lines of text are so
stubborn. I've run UNIX2DOS utility against the
raw text file - no help. I know that UNIX and MAC
systems standards for end-of-line termination characters
differ. I really thought the UNIX2DOS utility would
do the trick. But each time I import the text file -
same thing.

Here's the SQL that shows me the 2000 problematic
rows:
SELECT Apple.Field1 FROM Apple
WHERE (((Apple.Field1) Like "*" & Chr$(13) & "*"));

I've analyzed the text file with a HEX editor and I see
nothing different being used for end-of-line termination
in the problematic rows than in the other lines (&H0d
and &H0a). Why are 2000 records being formed during
the text importation with 2, 3 and perhaps more lines
in a single record? What can I do to stop that from
happening? I really don't feel like doing this manually
in 2000 different places in a 20000+ line text file.


The rows have the CrLf embedded in the quoted strings. That
means they were placed in the original file.

Try to redo the import specification to set the "Text
Delimiter" value to the double-quote.
--
Bob Quintal

PA is y I've altered my email address.
Mar 12 '06 #4

P: n/a
MLH
>
The rows have the CrLf embedded in the quoted strings. That
means they were placed in the original file. Sorry, what is meant by 'original file'? The file is created
on-the-fly using ls command to do a dir of the unix box
current directory - that listing is what's being written to
the local PC during the process. Strangely, out of 851
root-level dirs jointly containing 24000+ 2nd level dirs,
only about 2000 entries (rows which ultimately become
records) are comprised of multiple CRLF-separated lines
that are enclosed in dbl-quotes. 90% of the lines are not
enclosed in dbl-quotes but DO have carriage returns &
line feeds at the ends of the lines. They have to wrap
somehow.
Try to redo the import specification to set the "Text
Delimiter" value to the double-quote.


Mar 13 '06 #5

P: n/a
MLH
>It's difficult to diagnose what is happening from the information that
you provided, but you might want to attempt your ftp again, and specify
binary mode. FTP will insert returns in ASCII mode.

In absolute worse case, you could read the file in with VBA in character
mode and properly terminate the lines.

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Well, I did it the old-fashioned way. I used elbow grease.
If it ever becomes necessary 4 me 2 do this again, it will
remain a problem - I have no idea how to prevent it. Any-
way, the SQL enabled me to single out the crappy stuff,
get rid of the dbl-quotes, parse the lines out properly, and
append the RIGHT stuff back to the table in an orderly
fashion.
Mar 13 '06 #6

P: n/a
Sounds like the problem is in the unix box...

you could fix the text with vba before you import to fix it - remove
dbl qoutes that come directly after vbcrlf and remove vbcrlfs that
proceed quotes. then remove all remaining quotes.

but i'd fix the output from the unix box - or throw it back to the
person who is responsible for it.

Mar 13 '06 #7

P: n/a
MLH <CR**@NorthState.net> wrote in
news:9q********************************@4ax.com:

The rows have the CrLf embedded in the quoted strings. That
means they were placed in the original file. Sorry, what is meant by 'original file'? The file is created
on-the-fly using ls command to do a dir of the unix box
current directory - that listing is what's being written to
the local PC during the process.


Then it is the directory structure that contains some
descriptors or metadata which contains the double quotes..
Strangely, out of 851 root-level dirs jointly containing 24000+ 2nd level dirs,
only about 2000 entries (rows which ultimately become
records) are comprised of multiple CRLF-separated lines
that are enclosed in dbl-quotes. 90% of the lines are not
enclosed in dbl-quotes but DO have carriage returns &
line feeds at the ends of the lines. They have to wrap
somehow.

Try to redo the import specification to set the "Text
Delimiter" value to the double-quote.


--
Bob Quintal

PA is y I've altered my email address.
Mar 13 '06 #8

P: n/a
MLH wrote:
The rows have the CrLf embedded in the quoted strings. That
means they were placed in the original file.

Sorry, what is meant by 'original file'? The file is created
on-the-fly using ls command to do a dir of the unix box
current directory - that listing is what's being written to
the local PC during the process. Strangely, out of 851
root-level dirs jointly containing 24000+ 2nd level dirs,
only about 2000 entries (rows which ultimately become
records) are comprised of multiple CRLF-separated lines
that are enclosed in dbl-quotes. 90% of the lines are not
enclosed in dbl-quotes but DO have carriage returns &
line feeds at the ends of the lines. They have to wrap
somehow.
Try to redo the import specification to set the "Text
Delimiter" value to the double-quote.


You've got something else going on. The ls command won't put double
quotes around anything. FTP, in ASCII mode, will sometimes however, add
CR-LF every 255 characters.

--
Randy Harris
tech at promail dot com
I'm pretty sure I know everything that I can remember.
Mar 13 '06 #9

This discussion thread is closed

Replies have been disabled for this discussion.