Reading textfiles line by line

jamait

Hi all,

I m trying to read in a text file into a datatable...

Not sure on how to split up the information though, regex or substrings...?

sample:
Col1 Col2 Col3
Col4
A0012430 REKAL TVÃ„TTMEDEL EKOMAX 0,5L ST 75.9000
A0012550 REKAL TVÃ„TTMEDEL BIOKULÃ–R 20KGST 1727.0000

Notice how the 2nd row has merged col2 and col3. There are no delimeter at
all but the position seem to be the same depending on the text in col2. The
original file
include approx. 5000 lines that I want to update a sql table with. The above
problem
occurs at many positions in the text file.

I have successfully read in data from the text file using this code:

<code>
StreamReader sr = File.OpenText(fileName);
string line = "";
while ((line = sr.ReadLine()) != null)
{
if(line.Length > 17)
{
DataRow dr = m_Data.NewRow();
dr["Col1"]= line.Substring(0, 17).Trim();
dr["Col2"] = ?
dr["Col3"] = ?
dr["Col4"] = ?
m_Data.Rows.Add(dr);
}
}
sr.Close();

Any help appriciated!

/Martin

Jul 21 '05 #1

Subscribe Reply

2747

Patrice

Not sure what you meant by "the position seem to be the same depending on
the text in col2" ?

What if you try to display this file using a fixed width font such as
courier new ? Are all fields aligned ?

To me it looks like this is a fixed width file. Each column uses always the
same range of characters on each line (but it may not be visible immediately
when using a proportional font).

Patrice

--

"jamait" <ja****@discussions.microsoft.com> a écrit dans le message de
news:8E**********************************@microsof t.com...

Hi all,

I m trying to read in a text file into a datatable...

Not sure on how to split up the information though, regex or substrings...?
sample:
Col1 Col2 Col3
Col4
A0012430 REKAL TVÄTTMEDEL EKOMAX 0,5L ST 75.9000
A0012550 REKAL TVÄTTMEDEL BIOKULÖR 20KGST 1727.0000

Notice how the 2nd row has merged col2 and col3. There are no delimeter at
all but the position seem to be the same depending on the text in col2. The original file
include approx. 5000 lines that I want to update a sql table with. The above problem
occurs at many positions in the text file.

I have successfully read in data from the text file using this code:

<code>
StreamReader sr = File.OpenText(fileName);
string line = "";
while ((line = sr.ReadLine()) != null)
{
if(line.Length > 17)
{
DataRow dr = m_Data.NewRow();
dr["Col1"]= line.Substring(0, 17).Trim();
dr["Col2"] = ?
dr["Col3"] = ?
dr["Col4"] = ?
m_Data.Rows.Add(dr);
}
}
sr.Close();

Any help appriciated!

/Martin

Jul 21 '05 #2

Dave

> The original file include approx. 5000 lines

Is this a one-time operation? If you are using SqlServer you can write a simple DTS package to do the transformation or just use
the Import Data command in Enterprise Manager.

--
Dave Sexton
dave@www..jwaonline..com
-----------------------------------------------------------------------
"Patrice" <no****@nowhere.com> wrote in message news:un*************@TK2MSFTNGP12.phx.gbl...

Not sure what you meant by "the position seem to be the same depending on
the text in col2" ?

What if you try to display this file using a fixed width font such as
courier new ? Are all fields aligned ?

To me it looks like this is a fixed width file. Each column uses always the
same range of characters on each line (but it may not be visible immediately
when using a proportional font).

Patrice

--

"jamait" <ja****@discussions.microsoft.com> a écrit dans le message de
news:8E**********************************@microsof t.com...
Hi all,

I m trying to read in a text file into a datatable...

Not sure on how to split up the information though, regex or

substrings...?

sample:
Col1 Col2 Col3
Col4
A0012430 REKAL TVÄTTMEDEL EKOMAX 0,5L ST 75.9000
A0012550 REKAL TVÄTTMEDEL BIOKULÖR 20KGST 1727.0000

Notice how the 2nd row has merged col2 and col3. There are no delimeter at
all but the position seem to be the same depending on the text in col2.

The
original file
include approx. 5000 lines that I want to update a sql table with. The

above
problem
occurs at many positions in the text file.

I have successfully read in data from the text file using this code:

<code>
StreamReader sr = File.OpenText(fileName);
string line = "";
while ((line = sr.ReadLine()) != null)
{
if(line.Length > 17)
{
DataRow dr = m_Data.NewRow();
dr["Col1"]= line.Substring(0, 17).Trim();
dr["Col2"] = ?
dr["Col3"] = ?
dr["Col4"] = ?
m_Data.Rows.Add(dr);
}
}
sr.Close();

Any help appriciated!

/Martin

Jul 21 '05 #3

jamait

Hi again,

It is not a one time operation and what I really want to do is to split each
line into 4 parts...

First bit is a ID field, second field a description, third is the unit and
is in the format of an known char array. The last column is the price of the
product.
Unfortunately when opening this file which is opened as plain text and read
line by line the 2nd and 3rd column somehow merges at some of the rows...
The merging only occurs where the description field is long enough and
probably at the position of the longest description in the file...The unit of
the line is then appended to the line without any delimeter whereas the main
problem.
The other columns in the file are separated by more than 1 white space
between them.

I am thinking of using some kind of regular expression and extracting the
information wanted line by line but not sure how to split the 2nd and 3rd
column.
A0012430 REKAL TVÃ„TTMEDEL EKOMAX 0,5L ST 75.9000
A0012550 REKAL TVÃ„TTMEDEL BIOKULÃ–R 20KGST 1727.0000

Also updated the StreamReader to use the default encoding to display the
swedish characters properly...
Help...

/Martin

StreamReader sr = new StreamReader(file, System.Text.Encoding.Default);
string line = "";
while ((line = sr.ReadLine()) != null)
{
if(line.Length > 17)
{
DataRow dr = m_Data.NewRow();
dr["Col1"]= line.Substring(0, 17).Trim();
dr["Col2"] = ?
dr["Col3"] = ?
dr["Col4"] = ?
m_Data.Rows.Add(dr);
}
}
sr.Close();

"Dave" wrote:
The original file include approx. 5000 lines

Is this a one-time operation? If you are using SqlServer you can write a simple DTS package to do the transformation or just use
the Import Data command in Enterprise Manager.

--
Dave Sexton
dave@www..jwaonline..com
-----------------------------------------------------------------------
"Patrice" <no****@nowhere.com> wrote in message news:un*************@TK2MSFTNGP12.phx.gbl...
Not sure what you meant by "the position seem to be the same depending on
the text in col2" ?

What if you try to display this file using a fixed width font such as
courier new ? Are all fields aligned ?

To me it looks like this is a fixed width file. Each column uses always the
same range of characters on each line (but it may not be visible immediately
when using a proportional font).

Patrice

--

"jamait" <ja****@discussions.microsoft.com> a Ã©crit dans le message de
news:8E**********************************@microsof t.com...
Hi all,

I m trying to read in a text file into a datatable...

Not sure on how to split up the information though, regex or

substrings...?

sample:
Col1 Col2 Col3
Col4
A0012430 REKAL TVÃ„TTMEDEL EKOMAX 0,5L ST 75.9000
A0012550 REKAL TVÃ„TTMEDEL BIOKULÃ–R 20KGST 1727.0000

Notice how the 2nd row has merged col2 and col3. There are no delimeter at
all but the position seem to be the same depending on the text in col2.

The
original file
include approx. 5000 lines that I want to update a sql table with. The

above
problem
occurs at many positions in the text file.

I have successfully read in data from the text file using this code:

<code>
StreamReader sr = File.OpenText(fileName);
string line = "";
while ((line = sr.ReadLine()) != null)
{
if(line.Length > 17)
{
DataRow dr = m_Data.NewRow();
dr["Col1"]= line.Substring(0, 17).Trim();
dr["Col2"] = ?
dr["Col3"] = ?
dr["Col4"] = ?
m_Data.Rows.Add(dr);
}
}
sr.Close();

Any help appriciated!

/Martin

Jul 21 '05 #4

Patrice

Sorry but I'm afraid I still don't catch the exact problem. The two lines
you showed us are using the same format.

There is NO separator. Each field uses a *fixed* width :

Copy the two lines below :
A0012430 REKAL TVÄTTMEDEL EKOMAX 0,5L ST 75.9000
A0012550 REKAL TVÄTTMEDEL BIOKULÖR 20KGST 1727.0000

Paste this into notepad and use the courier new font.

You'll see that 5L and 20KG are starting at the same location (and always 4
characters wide). ST and ST starts at the same location.
The last field doesn't but is right justified in a field that begins at some
unknown location (this is the only problem I see, you'll have to find out
the *fixed* length of the third field so that you can start reading the 4 th
field from the correct position).

Patrice
--

"jamait" <ja****@discussions.microsoft.com> a écrit dans le message de
news:00**********************************@microsof t.com...

Hi again,

It is not a one time operation and what I really want to do is to split each line into 4 parts...

First bit is a ID field, second field a description, third is the unit and
is in the format of an known char array. The last column is the price of the product.
Unfortunately when opening this file which is opened as plain text and read line by line the 2nd and 3rd column somehow merges at some of the rows...
The merging only occurs where the description field is long enough and
probably at the position of the longest description in the file...The unit of the line is then appended to the line without any delimeter whereas the main problem.
The other columns in the file are separated by more than 1 white space
between them.

I am thinking of using some kind of regular expression and extracting the
information wanted line by line but not sure how to split the 2nd and 3rd
column.
A0012430 REKAL TVÄTTMEDEL EKOMAX 0,5L ST 75.9000
A0012550 REKAL TVÄTTMEDEL BIOKULÖR 20KGST 1727.0000

Also updated the StreamReader to use the default encoding to display the
swedish characters properly...
Help...

/Martin
> StreamReader sr = new StreamReader(file, System.Text.Encoding.Default);> string line = "";
> while ((line = sr.ReadLine()) != null)
> {
> if(line.Length > 17)
> {
> DataRow dr = m_Data.NewRow();
> dr["Col1"]= line.Substring(0, 17).Trim();
> dr["Col2"] = ?
> dr["Col3"] = ?
> dr["Col4"] = ?
> m_Data.Rows.Add(dr);
> }
> }
> sr.Close();

"Dave" wrote:
The original file include approx. 5000 lines

Is this a one-time operation? If you are using SqlServer you can write a simple DTS package to do the transformation or just use
the Import Data command in Enterprise Manager.

--
Dave Sexton
dave@www..jwaonline..com
-----------------------------------------------------------------------
"Patrice" <no****@nowhere.com> wrote in message

news:un*************@TK2MSFTNGP12.phx.gbl...
Not sure what you meant by "the position seem to be the same depending on the text in col2" ?

What if you try to display this file using a fixed width font such as
courier new ? Are all fields aligned ?

To me it looks like this is a fixed width file. Each column uses always the same range of characters on each line (but it may not be visible immediately when using a proportional font).

Patrice

--

"jamait" <ja****@discussions.microsoft.com> a écrit dans le message de
news:8E**********************************@microsof t.com...
> Hi all,
>
> I m trying to read in a text file into a datatable...
>
> Not sure on how to split up the information though, regex or
substrings...?
>
> sample:
> Col1 Col2 Col3> Col4
> A0012430 REKAL TVÄTTMEDEL EKOMAX 0,5L ST 75.9000
> A0012550 REKAL TVÄTTMEDEL BIOKULÖR 20KGST 1727.0000
>
> Notice how the 2nd row has merged col2 and col3. There are no delimeter at> all but the position seem to be the same depending on the text in col2. The
> original file
> include approx. 5000 lines that I want to update a sql table with. The above
> problem
> occurs at many positions in the text file.
>
> I have successfully read in data from the text file using this code:
>
> <code>
> StreamReader sr = File.OpenText(fileName);
> string line = "";
> while ((line = sr.ReadLine()) != null)
> {
> if(line.Length > 17)
> {
> DataRow dr = m_Data.NewRow();
> dr["Col1"]= line.Substring(0, 17).Trim();
> dr["Col2"] = ?
> dr["Col3"] = ?
> dr["Col4"] = ?
> m_Data.Rows.Add(dr);
> }
> }
> sr.Close();
>
> Any help appriciated!
>
> /Martin
>

Jul 21 '05 #5

jamait

So by getting the positions for the columns on the first row i can use these
in the following lines?

I think that the column width is set by the max length of the 2nd
column...so this would probably change if the description in any of the rows
is longer. The file is an export from a different software package that I am
not able to change.

Is there a way to replace 2 or more spaces with a separator or is there a
better way to split the columns?

Thanks

/Martin

"Patrice" wrote:

Sorry but I'm afraid I still don't catch the exact problem. The two lines
you showed us are using the same format.

There is NO separator. Each field uses a *fixed* width :

Copy the two lines below :
A0012430 REKAL TVÃ„TTMEDEL EKOMAX 0,5L ST 75.9000
A0012550 REKAL TVÃ„TTMEDEL BIOKULÃ–R 20KGST 1727.0000

Paste this into notepad and use the courier new font.

You'll see that 5L and 20KG are starting at the same location (and always 4
characters wide). ST and ST starts at the same location.
The last field doesn't but is right justified in a field that begins at some
unknown location (this is the only problem I see, you'll have to find out
the *fixed* length of the third field so that you can start reading the 4 th
field from the correct position).

Patrice
--

"jamait" <ja****@discussions.microsoft.com> a Ã©crit dans le message de
news:00**********************************@microsof t.com...
Hi again,

It is not a one time operation and what I really want to do is to split

each
line into 4 parts...

First bit is a ID field, second field a description, third is the unit and
is in the format of an known char array. The last column is the price of

the
product.
Unfortunately when opening this file which is opened as plain text and

read
line by line the 2nd and 3rd column somehow merges at some of the rows...
The merging only occurs where the description field is long enough and
probably at the position of the longest description in the file...The unit

of
the line is then appended to the line without any delimeter whereas the

main
problem.
The other columns in the file are separated by more than 1 white space
between them.

I am thinking of using some kind of regular expression and extracting the
information wanted line by line but not sure how to split the 2nd and 3rd
column.
A0012430 REKAL TVÃ„TTMEDEL EKOMAX 0,5L ST 75.9000
A0012550 REKAL TVÃ„TTMEDEL BIOKULÃ–R 20KGST 1727.0000

Also updated the StreamReader to use the default encoding to display the
swedish characters properly...
Help...

/Martin
>> StreamReader sr = new StreamReader(file, System.Text.Encoding.Default); >> string line = "";
>> while ((line = sr.ReadLine()) != null)
>> {
>> if(line.Length > 17)
>> {
>> DataRow dr = m_Data.NewRow();
>> dr["Col1"]= line.Substring(0, 17).Trim();
>> dr["Col2"] = ?
>> dr["Col3"] = ?
>> dr["Col4"] = ?
>> m_Data.Rows.Add(dr);
>> }
>> }
>> sr.Close();

"Dave" wrote:
> The original file include approx. 5000 lines

Is this a one-time operation? If you are using SqlServer you can write a simple DTS package to do the transformation or just use the Import Data command in Enterprise Manager.

--
Dave Sexton
dave@www..jwaonline..com
-----------------------------------------------------------------------
"Patrice" <no****@nowhere.com> wrote in message news:un*************@TK2MSFTNGP12.phx.gbl... > Not sure what you meant by "the position seem to be the same depending on > the text in col2" ?
>
> What if you try to display this file using a fixed width font such as
> courier new ? Are all fields aligned ?
>
> To me it looks like this is a fixed width file. Each column uses always the > same range of characters on each line (but it may not be visible immediately > when using a proportional font).
>
> Patrice
>
> --
>
> "jamait" <ja****@discussions.microsoft.com> a Ã©crit dans le message de
> news:8E**********************************@microsof t.com...
>> Hi all,
>>
>> I m trying to read in a text file into a datatable...
>>
>> Not sure on how to split up the information though, regex or
> substrings...?
>>
>> sample:
>> Col1 Col2 Col3 >> Col4
>> A0012430 REKAL TVÃ„TTMEDEL EKOMAX 0,5L ST 75.9000
>> A0012550 REKAL TVÃ„TTMEDEL BIOKULÃ–R 20KGST 1727.0000
>>
>> Notice how the 2nd row has merged col2 and col3. There are no delimeter at >> all but the position seem to be the same depending on the text in col2. > The
>> original file
>> include approx. 5000 lines that I want to update a sql table with. The > above
>> problem
>> occurs at many positions in the text file.
>>
>> I have successfully read in data from the text file using this code:
>>
>> <code>
>> StreamReader sr = File.OpenText(fileName);
>> string line = "";
>> while ((line = sr.ReadLine()) != null)
>> {
>> if(line.Length > 17)
>> {
>> DataRow dr = m_Data.NewRow();
>> dr["Col1"]= line.Substring(0, 17).Trim();
>> dr["Col2"] = ?
>> dr["Col3"] = ?
>> dr["Col4"] = ?
>> m_Data.Rows.Add(dr);
>> }
>> }
>> sr.Close();
>>
>> Any help appriciated!
>>
>> /Martin
>>
>
>

Jul 21 '05 #6

Chris Mayers

Hi,

As other people have pointed out, this is a FIXED WIDTH file. Somthing like
this should do the job:

dr["Col1"]= line.Substring(0, 17).Trim();
dr["Col2"] = line.Substring(17,30).Trim();
// May need to adjust the length of Col3 and the starting position of Col4?
Not enough data to be sure.
dr["Col3"] = line.Substring(47,2).Trim();
dr["Col4"] = line.Substring(49).Trim();

If you give me some more sample lines then I can be more specific.

Cheers,

Chris.

Jul 21 '05 #7

Patrice

IMO it will never change. For example if I've got a product who handles a
first field with 10 chars and another with 4 chars, the vendor could decide
to export its data as a text file using 10 chars for first field and 4 chars
for the second fields...

Whatever the data are it will never change and each line will have always 14
chars... There is no need to ever use more characters as the first field
can't have more than 10 and the other can't have more than 4...

IMO this is the way this file works..

This is know as "fixed size" fields files (as the size of each field never
change) opposed to "delimited" in which you have a delimiter between
fields...

Patrice
--

"jamait" <ja****@discussions.microsoft.com> a écrit dans le message de
news:4B**********************************@microsof t.com...

So by getting the positions for the columns on the first row i can use these in the following lines?

I think that the column width is set by the max length of the 2nd
column...so this would probably change if the description in any of the rows is longer. The file is an export from a different software package that I am not able to change.

Is there a way to replace 2 or more spaces with a separator or is there a
better way to split the columns?

Thanks

/Martin

"Patrice" wrote:
Sorry but I'm afraid I still don't catch the exact problem. The two lines
you showed us are using the same format.

There is NO separator. Each field uses a *fixed* width :

Copy the two lines below :
A0012430 REKAL TVÄTTMEDEL EKOMAX 0,5L ST 75.9000
A0012550 REKAL TVÄTTMEDEL BIOKULÖR 20KGST 1727.0000

Paste this into notepad and use the courier new font.

You'll see that 5L and 20KG are starting at the same location (and always 4 characters wide). ST and ST starts at the same location.
The last field doesn't but is right justified in a field that begins at some unknown location (this is the only problem I see, you'll have to find out the *fixed* length of the third field so that you can start reading the 4 th field from the correct position).

Patrice
--

"jamait" <ja****@discussions.microsoft.com> a écrit dans le message de
news:00**********************************@microsof t.com...
Hi again,

It is not a one time operation and what I really want to do is to split
each
line into 4 parts...

First bit is a ID field, second field a description, third is the unit
and is in the format of an known char array. The last column is the price of the
product.
Unfortunately when opening this file which is opened as plain text and

read
line by line the 2nd and 3rd column somehow merges at some of the
rows... The merging only occurs where the description field is long enough and
probably at the position of the longest description in the file...The unit of
the line is then appended to the line without any delimeter whereas
the main
problem.
The other columns in the file are separated by more than 1 white space
between them.

I am thinking of using some kind of regular expression and extracting
the information wanted line by line but not sure how to split the 2nd and 3rd column.
A0012430 REKAL TVÄTTMEDEL EKOMAX 0,5L ST 75.9000
A0012550 REKAL TVÄTTMEDEL BIOKULÖR 20KGST 1727.0000

Also updated the StreamReader to use the default encoding to display the swedish characters properly...
Help...

/Martin

> >> StreamReader sr = new StreamReader(file,

System.Text.Encoding.Default);
> >> string line = "";
> >> while ((line = sr.ReadLine()) != null)
> >> {
> >> if(line.Length > 17)
> >> {
> >> DataRow dr = m_Data.NewRow();
> >> dr["Col1"]= line.Substring(0, 17).Trim();
> >> dr["Col2"] = ?
> >> dr["Col3"] = ?
> >> dr["Col4"] = ?
> >> m_Data.Rows.Add(dr);
> >> }
> >> }
> >> sr.Close();

"Dave" wrote:

> > The original file include approx. 5000 lines
>
> Is this a one-time operation? If you are using SqlServer you can write a simple DTS package to do the transformation or just use
> the Import Data command in Enterprise Manager.
>
> --
> Dave Sexton
> dave@www..jwaonline..com
----------------------------------------------------------------------- > "Patrice" <no****@nowhere.com> wrote in message

news:un*************@TK2MSFTNGP12.phx.gbl...
> > Not sure what you meant by "the position seem to be the same

depending on
> > the text in col2" ?
> >
> > What if you try to display this file using a fixed width font such
as > > courier new ? Are all fields aligned ?
> >
> > To me it looks like this is a fixed width file. Each column uses

always the
> > same range of characters on each line (but it may not be visible

immediately
> > when using a proportional font).
> >
> > Patrice
> >
> > --
> >
> > "jamait" <ja****@discussions.microsoft.com> a écrit dans le message de > > news:8E**********************************@microsof t.com...
> >> Hi all,
> >>
> >> I m trying to read in a text file into a datatable...
> >>
> >> Not sure on how to split up the information though, regex or
> > substrings...?
> >>
> >> sample:
> >> Col1 Col2

Col3
> >> Col4
> >> A0012430 REKAL TVÄTTMEDEL EKOMAX 0,5L ST 75.9000
> >> A0012550 REKAL TVÄTTMEDEL BIOKULÖR 20KGST 1727.0000
> >>
> >> Notice how the 2nd row has merged col2 and col3. There are no

delimeter at
> >> all but the position seem to be the same depending on the text in

col2.
> > The
> >> original file
> >> include approx. 5000 lines that I want to update a sql table with. The
> > above
> >> problem
> >> occurs at many positions in the text file.
> >>
> >> I have successfully read in data from the text file using this

code: > >>
> >> <code>
> >> StreamReader sr = File.OpenText(fileName);
> >> string line = "";
> >> while ((line = sr.ReadLine()) != null)
> >> {
> >> if(line.Length > 17)
> >> {
> >> DataRow dr = m_Data.NewRow();
> >> dr["Col1"]= line.Substring(0, 17).Trim();
> >> dr["Col2"] = ?
> >> dr["Col3"] = ?
> >> dr["Col4"] = ?
> >> m_Data.Rows.Add(dr);
> >> }
> >> }
> >> sr.Close();
> >>
> >> Any help appriciated!
> >>
> >> /Martin
> >>
> >
> >
>
>
>

Jul 21 '05 #8

Reading textfiles line by line

Similar topics