Reading unknown number of strings from a file using C#

Hello All

I have a txt file of strings of different lengths.
I don't know how many strings are in the file.

I have no problem reading the file and sending to the console (as
below).

To store the strings read, in a buffer, I had decided to use an array
of strings.

However I must know the array size in C# unlike in C++.

How do I read the strings (of unknown number) into a buffer?
I will be able to specify a reasonable maximum number of strings in
the file.

using (StreamReader sr = new StreamReader(path))
{
    String line;
    // Buffer index
    //int i=0;

    // Read and display lines from the file until the end of
    // the file is reached.
    while ((line = sr.ReadLine()) != null)
    {
        // Data_Buffer[0] = line;
        //i++;
        Console.WriteLine(line);
    }
}
Many thanks for any help.
Regards

Denis
_____________________
http://www.CentronSolutions.com

Nov 14 '07 #1
9 Replies


Well, you could use a List<string> instead of an array, and just
.Add(line) each time.
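
A minimal sketch of that suggestion, assuming the lines come from any TextReader (a StreamReader over the file, for example). The class name LineLoader and the method name ReadAllLines are made up for illustration and are not from the thread.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LineLoader
{
    // Hypothetical helper: reads every line from the reader into a list
    // that grows as needed, so the line count need not be known up front.
    public static List<string> ReadAllLines(TextReader reader)
    {
        List<string> lines = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            lines.Add(line);
        }
        return lines;
    }
}
```

It could be called as LineLoader.ReadAllLines(sr) inside the existing using block; lines.Count then gives the number of strings read, and lines.ToArray() yields a string[] if an array is really needed afterwards.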

However, if possible you might try to stream them (i.e. read and
process individually / in small batches) instead of all-at-once to
save having them all loaded... but this is not always possible.
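
One way the streaming idea might look is an iterator that hands back one line at a time, so only the current line is held in memory. This is a sketch under that assumption; LineStreamer and ReadLines are illustrative names, not something from the thread.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LineStreamer
{
    // Hypothetical helper: yields lines lazily instead of buffering them all.
    // The reader must stay open for as long as the caller is enumerating.
    public static IEnumerable<string> ReadLines(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
}
```

A caller would write foreach (string line in LineStreamer.ReadLines(sr)) { ... } inside the using block and process each line as it arrives. This only works when each line (or small batch) can be handled without needing the rest of the file.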

Marc
Nov 14 '07 #2

<dg*******@eircom.net> wrote in message
news:11*********************@19g2000hsx.googlegroups.com...
> Hello All
>
> I have a txt file of strings of different lengths.
> I don't know how many strings are in the file.
>
> I have no problem reading the file and sending to the console (as
> below).
>
> To store the strings read, in a buffer, I had decided to use an array
> of strings.
> However I must know the array size in C# unlike in C++.
<Snip>

Could you elaborate on this statement, since at face value it is untrue?
Arrays are a fixed size in C++.

Bill

Nov 14 '07 #3

Hi Guys

Yes Bill sorry. I put that incorrectly.

In C++ I would go

// MAX_FILE_SIZE would give me the maximum number of strings that could
// be in the file
string Buffer[MAX_FILE_SIZE];
int i=0;

while ((line = Read_A_line_From_File()) != null)
{
    Buffer[i] = line;
    i++;
}

Maybe I can do something similar in C#?

Thanks

Denis


Nov 14 '07 #4

You *could* do
string[] buffer = new string[MAX_FILE_SIZE];

But generally a List<string> will be fine and will be much more
memory-efficient for the typical case.
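
For comparison, here is a sketch of the fixed-array route, assuming the caller can supply a reasonable maximum. The names FixedBuffer and ReadUpTo are invented for illustration. The array is allocated at full size up front, which is where the memory cost comes from when the file is much shorter than the maximum, and it is trimmed with Array.Resize once the real count is known.

```csharp
using System;
using System.IO;

static class FixedBuffer
{
    // Hypothetical helper: fills a pre-sized array and stops at maxLines,
    // so an over-long file cannot run past the end of the buffer.
    public static string[] ReadUpTo(TextReader reader, int maxLines)
    {
        string[] buffer = new string[maxLines];
        int count = 0;
        string line;
        while (count < maxLines && (line = reader.ReadLine()) != null)
        {
            buffer[count] = line;
            count++;
        }
        // Shrink the array to the number of lines actually read.
        Array.Resize(ref buffer, count);
        return buffer;
    }
}
```

Note that this version silently ignores any lines beyond maxLines; whether that is acceptable, or should be reported as an error, depends on the application.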

Marc
Nov 14 '07 #5

Thanks Marc

I'll check out both techniques for my own understanding.

I see that

#define MAX_DATA_BUFFER_LENGTH 200

isn't acceptable C#.

What's the alternative?

Denis


Nov 14 '07 #6

const int MAX_DATA_BUFFER_LENGTH = 200;

This must be inside a class, not just out somewhere in the ether.
Typically the class that uses it - otherwise you'll have to mark it as
"internal" or "public", and (from the other classes) refer to it as
SomeClass.MAX_DATA_BUFFER_LENGTH
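
A small sketch of how that might be laid out. The class names DataReader and Consumer are placeholders chosen for illustration only.

```csharp
using System;

class DataReader
{
    // "internal" makes the constant visible to other classes in the same assembly.
    internal const int MAX_DATA_BUFFER_LENGTH = 200;

    public string[] CreateBuffer()
    {
        // Inside the declaring class the name can be used unqualified.
        return new string[MAX_DATA_BUFFER_LENGTH];
    }
}

class Consumer
{
    public int BufferLength()
    {
        // From another class the constant is reached through the class name.
        return DataReader.MAX_DATA_BUFFER_LENGTH;
    }
}
```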

Marc
Nov 14 '07 #7

I have a black-box type class where the array is actually sized one time, for
speed.

1. Estimating the maximum count.
   I first read, say, 30 lines of the original file and find the largest /
   widest row in this small set. The lines you read are just to get a basis
   for a likely largest / widest row.
   a. Along with this test for the largest / widest line, I look for which
      character marks the end of line. CRLF, CR only, and embedded LF with
      CRLF are the most common. CTRL_Z is usually the last byte in the file,
      so I just disregard that.

2. iTmp = (divide / largest row / 2), then create the array (iTmp / filesize).
   What you want to achieve is an array larger than the number of rows you
   have in the file; then, at the end of the array load, resize the array
   down to the actual rows used.

3. Create your byte array at the size of the widest row * 2, create it
   outside of the read loop, and reuse it. While reading, I check the bytes
   read against the current WidestRow, and if it is larger, I then create a
   new byte array at the size of the new widest row * 2.

Using ArrayList.Add(), if you have a large file you might as well take a
vacation, so this is not an option.

I tested the above with a million-plus rows and it's blazing fast.

In my case I only load the row offsets in the array. 1 million row offsets
takes about 6 seconds; megabyte-size files take the blink of an eye. Then I
just use this array of row offsets to move up or down, returning rows as
needed. This is fast also, since I'm only adding, subtracting or updating a
row pointer (long iPosition).
I have a getRow() method that returns the row based on the iPosition.
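
The row-offset idea could be sketched roughly as follows. This is not Dave's class: it swaps his pre-sized array for a List<long>, and it assumes a seekable stream, LF or CRLF line endings (CR-only files are not handled), and UTF-8 text without a byte order mark. The names LineIndex, Count and GetRow are chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

sealed class LineIndex
{
    private readonly Stream stream;
    private readonly List<long> offsets = new List<long>();
    private readonly long length;

    // Scans the stream once and records the byte offset where each row starts.
    public LineIndex(Stream seekableStream)
    {
        stream = seekableStream;
        length = stream.Length;
        stream.Position = 0;
        if (length > 0) offsets.Add(0);

        byte[] chunk = new byte[64 * 1024]; // reused for every read
        long position = 0;
        int read;
        while ((read = stream.Read(chunk, 0, chunk.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                // A new row starts after each LF, unless the LF is the last byte.
                if (chunk[i] == (byte)'\n' && position + i + 1 < length)
                {
                    offsets.Add(position + i + 1);
                }
            }
            position += read;
        }
    }

    public int Count { get { return offsets.Count; } }

    // Seeks to the stored offset and decodes just that row.
    public string GetRow(int row)
    {
        long start = offsets[row];
        long end = (row + 1 < offsets.Count) ? offsets[row + 1] : length;
        byte[] bytes = new byte[(int)(end - start)];
        stream.Position = start;
        int total = 0;
        while (total < bytes.Length)
        {
            int n = stream.Read(bytes, total, bytes.Length - total);
            if (n == 0) break;
            total += n;
        }
        return Encoding.UTF8.GetString(bytes, 0, total).TrimEnd('\r', '\n');
    }
}
```

Only the offsets are kept in memory, so moving up or down a row is an index change plus one seek and one short read. The cost is one full pass over the file to build the index.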

hope this helps
Dave



Nov 14 '07 #8

> using a Arraylist.add() if you have a large file..you mite as well
> take a vacation...so this is not a option

Sorry, but that just simply isn't so (metrics below using
List<string>, but since string isn't boxed they should be quite
comparable). List<T> uses doubling, so it resizes itself very quickly.
By my metrics, the List<T> approach is slower, yes, but only by a
small factor (~5). Given the amount of IO involved, this is nothing -
i.e. storing 1M strings in a List<T> takes 59ms on my lowly laptop.
Maybe you take short vacations... I think all the faff you are doing
(including rewinding the stream) will bring them quite close.
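
To see the doubling for yourself, a sketch like the following counts how often the list's Capacity changes while items are added. GrowthDemo and CountResizes are illustrative names; the exact capacities are an implementation detail of the runtime, so the code reports the count and does not assume specific values.

```csharp
using System;
using System.Collections.Generic;

static class GrowthDemo
{
    // Counts how many times the list's capacity changes (that is, how many
    // times the backing array is reallocated) while adding "size" items.
    public static int CountResizes(int size)
    {
        List<string> list = new List<string>();
        int resizes = 0;
        int lastCapacity = list.Capacity;
        for (int i = 0; i < size; i++)
        {
            list.Add(null);
            if (list.Capacity != lastCapacity)
            {
                resizes++;
                lastCapacity = list.Capacity;
            }
        }
        return resizes;
    }
}
```

Because the capacity grows geometrically, the number of reallocations rises with the logarithm of the item count, not with the count itself. If a reasonable estimate is known in advance, passing it to the List<string>(int capacity) constructor avoids most of them.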

In the test, for *both* cases I am simply storing nulls. Since we are
talking reference types, this is perfectly fine and doesn't affect the
results. The memory *in the list and array* is the same either way.
Again, most of the weight in either approach would be due to memory /
IO requirements for the *actual* data, which does not depend on the
storage mechanism.

output:
{size} {ticks for list} vs {ticks for fixed array}: {multiplier} ({ms for list})
5           37       vs 6:        6.16666666666667  (0)
10000       3118     vs 602:      5.17940199335548  (0)
100000      23339    vs 4039:     5.77841049764793  (6)
1000000     214532   vs 50344:    4.261322103925    (59)
10000000    2606905  vs 472739:   5.51446992949598  (728)

Code:

using System;
using System.Collections.Generic;
using System.Diagnostics;
static class Program {
    static void Main() {
        Test(5); // to get JIT etc
        Test(10000);
        Test(100000);
        Test(1000000);
        Test(10000000);
    }
    static void Test(int size) {
        // Time adding "size" items to a List<string>
        Stopwatch watch = new Stopwatch();
        watch.Start();
        List<string> list = new List<string>();
        for (int i = 0; i < size; i++) {
            list.Add(null);
        }
        watch.Stop();
        long tickList = watch.ElapsedTicks;
        long msList = watch.ElapsedMilliseconds;

        // Time filling a pre-sized array of the same length
        watch = new Stopwatch();
        watch.Start();
        string[] array = new string[size];
        for (int i = 0; i < size; i++) {
            array[i] = null;
        }
        watch.Stop();
        long tickArray = watch.ElapsedTicks;

        Console.WriteLine("{0}\t{1} vs {2}:\t{3} ({4})", size,
            tickList, tickArray,
            (tickList * 1.0) / tickArray, msList);
    }
}
Nov 14 '07 #9

On Nov 14, 3:29 pm, "Analizer1" <dvs_...@sbcglobal.net> wrote:
> i have a Black box type class..where it actually sizes the one time.....for
> speed
<snip>

All of that sounds like a really bad idea compared with the simplicity
of just calling StreamReader.ReadLine() repeatedly and adding the
results into an ArrayList or a List<T>.

I doubt that you can find many examples where the performance
difference is significant, but the *complexity* difference is
significantly in favour of reusing the existing .NET classes.
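
As an illustration of how little code the built-in route can take, File.ReadAllLines in System.IO loads a whole text file into a string[] sized to the actual line count. The wrapper below is only a sketch; BuiltInLoader and Load are made-up names.

```csharp
using System;
using System.IO;

static class BuiltInLoader
{
    // Hypothetical wrapper: lets the framework read the file and size the array.
    public static string[] Load(string path)
    {
        return File.ReadAllLines(path);
    }
}
```

This reads the entire file in one go, so it suits files that comfortably fit in memory.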

Jon

Nov 14 '07 #10
