473,399 Members | 3,106 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,399 software developers and data experts.

Having mega-problems with importation of 20,000-line text file

MLH
Am trying to import 20,000+ lines of text in a file
FTP'd from a UNIX platform to windows via FTP
session in a DOS box. About 2000 records have
multiple lines in them separated by CRLF's.

That's not the idea. Each line was supposed to
be a separate record. I don't know why the 2000
records having MULTIPLE lines of text are so
stubborn. I've run UNIX2DOS utility against the
raw text file - no help. I know that UNIX and MAC
systems standards for end-of-line termination characters
differ. I really thought the UNIX2DOS utility would
do the trick. But each time I import the text file -
same thing.

Here's the SQL that shows me the 2000 problematic
rows:
SELECT Apple.Field1 FROM Apple
WHERE (((Apple.Field1) Like "*" & Chr$(13) & "*"));

I've analyzed the text file with a HEX editor and I see
nothing different being used for end-of-line termination
in the problematic rows than in the other lines (&H0d
and &H0a). Why are 2000 records being formed during
the text importation with 2, 3 and perhaps more lines
in a single record? What can I do to stop that from
happening? I really don't feel like doing this manually
in 2000 different places in a 20000+ line text file.
Mar 12 '06 #1
8 2112
MLH
BTW, if I select about 50 rows of the pertinent field
in an open table, copy with CTRL-C and paste into
notepad, I see the problematic row data is wrapped
in dbl-quotes. The non-problematic row data has no
such quote wrapping. Dunno if that's important, but
it is factual.
Mar 12 '06 #2
MLH wrote:
Am trying to import 20,000+ lines of text in a file
FTP'd from a UNIX platform to windows via FTP
session in a DOS box. About 2000 records have
multiple lines in them separated by CRLF's.

That's not the idea. Each line was supposed to
be a separate record. I don't know why the 2000
records having MULTIPLE lines of text are so
stubborn. I've run UNIX2DOS utility against the
raw text file - no help. I know that UNIX and MAC
systems standards for end-of-line termination characters
differ. I really thought the UNIX2DOS utility would
do the trick. But each time I import the text file -
same thing.

Here's the SQL that shows me the 2000 problematic
rows:
SELECT Apple.Field1 FROM Apple
WHERE (((Apple.Field1) Like "*" & Chr$(13) & "*"));

I've analyzed the text file with a HEX editor and I see
nothing different being used for end-of-line termination
in the problematic rows than in the other lines (&H0d
and &H0a). Why are 2000 records being formed during
the text importation with 2, 3 and perhaps more lines
in a single record? What can I do to stop that from
happening? I really don't feel like doing this manually
in 2000 different places in a 20000+ line text file.


It's difficult to diagnose what is happening from the information that
you provided, but you might want to attempt your ftp again, and specify
binary mode. FTP will insert returns in ASCII mode.

In absolute worse case, you could read the file in with VBA in character
mode and properly terminate the lines.

--
Randy Harris
tech at promail dot com
I'm pretty sure I know everything that I can remember.
Mar 12 '06 #3
MLH <CR**@NorthState.net> wrote in
news:dk********************************@4ax.com:
Am trying to import 20,000+ lines of text in a file
FTP'd from a UNIX platform to windows via FTP
session in a DOS box. About 2000 records have
multiple lines in them separated by CRLF's.

That's not the idea. Each line was supposed to
be a separate record. I don't know why the 2000
records having MULTIPLE lines of text are so
stubborn. I've run UNIX2DOS utility against the
raw text file - no help. I know that UNIX and MAC
systems standards for end-of-line termination characters
differ. I really thought the UNIX2DOS utility would
do the trick. But each time I import the text file -
same thing.

Here's the SQL that shows me the 2000 problematic
rows:
SELECT Apple.Field1 FROM Apple
WHERE (((Apple.Field1) Like "*" & Chr$(13) & "*"));

I've analyzed the text file with a HEX editor and I see
nothing different being used for end-of-line termination
in the problematic rows than in the other lines (&H0d
and &H0a). Why are 2000 records being formed during
the text importation with 2, 3 and perhaps more lines
in a single record? What can I do to stop that from
happening? I really don't feel like doing this manually
in 2000 different places in a 20000+ line text file.


The rows have the CrLf embedded in the quoted strings. That
means they were placed in the original file.

Try to redo the import specification to set the "Text
Delimiter" value to the double-quote.
--
Bob Quintal

PA is y I've altered my email address.
Mar 12 '06 #4
MLH
>
The rows have the CrLf embedded in the quoted strings. That
means they were placed in the original file. Sorry, what is meant by 'original file'? The file is created
on-the-fly using ls command to do a dir of the unix box
current directory - that listing is what's being written to
the local PC during the process. Strangely, out of 851
root-level dirs jointly containing 24000+ 2nd level dirs,
only about 2000 entries (rows which ultimately become
records) are comprised of multiple CRLF-separated lines
that are enclosed in dbl-quotes. 90% of the lines are not
enclosed in dbl-quotes but DO have carriage returns &
line feeds at the ends of the lines. They have to wrap
somehow.
Try to redo the import specification to set the "Text
Delimiter" value to the double-quote.


Mar 13 '06 #5
MLH
>It's difficult to diagnose what is happening from the information that
you provided, but you might want to attempt your ftp again, and specify
binary mode. FTP will insert returns in ASCII mode.

In absolute worse case, you could read the file in with VBA in character
mode and properly terminate the lines.

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Well, I did it the old-fashioned way. I used elbow grease.
If it ever becomes necessary 4 me 2 do this again, it will
remain a problem - I have no idea how to prevent it. Any-
way, the SQL enabled me to single out the crappy stuff,
get rid of the dbl-quotes, parse the lines out properly, and
append the RIGHT stuff back to the table in an orderly
fashion.
Mar 13 '06 #6
Sounds like the problem is in the unix box...

you could fix the text with vba before you import to fix it - remove
dbl qoutes that come directly after vbcrlf and remove vbcrlfs that
proceed quotes. then remove all remaining quotes.

but i'd fix the output from the unix box - or throw it back to the
person who is responsible for it.

Mar 13 '06 #7
MLH <CR**@NorthState.net> wrote in
news:9q********************************@4ax.com:

The rows have the CrLf embedded in the quoted strings. That
means they were placed in the original file. Sorry, what is meant by 'original file'? The file is created
on-the-fly using ls command to do a dir of the unix box
current directory - that listing is what's being written to
the local PC during the process.


Then it is the directory structure that contains some
descriptors or metadata which contains the double quotes..
Strangely, out of 851 root-level dirs jointly containing 24000+ 2nd level dirs,
only about 2000 entries (rows which ultimately become
records) are comprised of multiple CRLF-separated lines
that are enclosed in dbl-quotes. 90% of the lines are not
enclosed in dbl-quotes but DO have carriage returns &
line feeds at the ends of the lines. They have to wrap
somehow.

Try to redo the import specification to set the "Text
Delimiter" value to the double-quote.


--
Bob Quintal

PA is y I've altered my email address.
Mar 13 '06 #8
MLH wrote:
The rows have the CrLf embedded in the quoted strings. That
means they were placed in the original file.

Sorry, what is meant by 'original file'? The file is created
on-the-fly using ls command to do a dir of the unix box
current directory - that listing is what's being written to
the local PC during the process. Strangely, out of 851
root-level dirs jointly containing 24000+ 2nd level dirs,
only about 2000 entries (rows which ultimately become
records) are comprised of multiple CRLF-separated lines
that are enclosed in dbl-quotes. 90% of the lines are not
enclosed in dbl-quotes but DO have carriage returns &
line feeds at the ends of the lines. They have to wrap
somehow.
Try to redo the import specification to set the "Text
Delimiter" value to the double-quote.


You've got something else going on. The ls command won't put double
quotes around anything. FTP, in ASCII mode, will sometimes however, add
CR-LF every 255 characters.

--
Randy Harris
tech at promail dot com
I'm pretty sure I know everything that I can remember.
Mar 13 '06 #9

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
by: TJ | last post by:
Anyone know how to display process id's (normally found by going to Component Services and switching to applications to 'Status View') in a web page? I'm trying to keep remote users from having...
0
by: Graham Simms | last post by:
I have tried to run a query where the reuslts are restricted by a HAVING clause with 2 conditions, but the second condition seems to be ignored. for example SELECT userid, AVG(position) FROM...
7
by: | last post by:
I fail to understand why that the memory allocated in the void create(int **matrix) does not remain. I passed the address of matrix so shouldn't it still have the allocated memory when it returns...
12
by: Edward Diener | last post by:
Given value class X { public: // Not allowed: X():i(100000),s(10000) { } // Allowed void InitializeDefaults() { i = 100000; s = 10000; } private: int i;
1
by: Vinod | last post by:
Hi, In VC8 project, I am having a struct which is having a char* variable. Now I am creating a 3 elements array object for the struct. I send the base address of the object using VARIANT to a...
1
by: vanandwiz | last post by:
Please find the code below. SELECT COMPANY.NAME, COUNT(ITEM.ID) FROM ITEM, COMPANY WHERE ITEM.COMPANYID = COMPANY.ID GROUP BY ROLLUP(COMPANY.NAME)
0
by: DR | last post by:
I heard there is some trick to referencing statics in C# CLR stored procedure without having to mark the assembly as unsafe. Does anyone know this? This is usefull as the case of needing a little...
3
by: Philia | last post by:
Hello, I'm having problem migrating to postgreSQL from MySQL, some of my queries does not work in postgres. Query(simplified): SELECT t1.bus_stop, t1.time, (case when t1.time >2200 then...
2
by: Jack | last post by:
Hi, Performance-wise, is there a difference between these 2 clauses? I assume a large database... 1) With the WHERE clause: Select FLD1, FLD2, FLD3 From myFile Where FLD1 = 1 And FLD2 = 1
7
by: tiptap | last post by:
Hey Guys, I have a huge statement loads of if statements in... and its getting bigger. On closer inspection there is only 3 difference in the select statement. so I thought I could cut the...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.