How do I get string's functionality with the StringBuilder?

Hello,

I have a file with CR/LF-separated text.
string.Trim() and string.Split() came in very handy for processing the content, but
because strings are immutable, memory is managed badly.
StringBuilder is a good alternative for string processing, but it lacks at
least the two methods above.
Not to mention that I would also need EndsWith(), IndexOf(), LastIndexOf(),
and so on.
Does anyone know a workaround, other than writing the code myself to do the
job?
In my opinion, the same methods should be there for StringBuilder too.

Thanks.
May 8 '06 #1
"Dan Aldean" <da*******@yahoo.com> wrote:
I have a file with CR/LF separated text.
string.Trim() and string.Split() came very handy to process the content, but


What about StreamReader?

-- Barry
May 8 '06 #2
How big is your text file, and how often do you manipulate its elements?
Keep in mind that the performance gap becomes significant only if you are
iterating over 25,000 or more strings in a loop.

Usually, using "string" is an appropriate solution.

--
WBR,
Michael Nemtsev :: blog: http://spaces.msn.com/laflour

"At times one remains faithful to a cause only because its opponents do not
cease to be insipid." (c) Friedrich Nietzsche
May 8 '06 #3
Hi,

What do you want to do?

The operations you mention, Split and Trim, create new strings and may have
some impact on performance.
EndsWith, IndexOf, etc. do not change the string at all and create no new
strings, so they have no such impact on performance.

--
Ignacio Machin,
ignacio.machin AT dot.state.fl.us
Florida Department Of Transportation
May 8 '06 #4
Thanks for the reply. Basically, the file is big, and a stream on its own is
not a solution, as I manipulate the content a lot.
May 8 '06 #5
Thanks for the reply, Barry.
I use StreamReader to read, but I need to process the content. For example, I
need to find trailing spaces before a '>' character and remove them.
Also, within the string I need to look for a ':' separator character and remove
the spaces before and after it.
Split() and then Trim() would have helped, but I cannot afford to use
strings, as the file can be big and I have to process every line I read.

"Barry Kelly" <ba***********@gmail.com> wrote in message
news:0n********************************@4ax.com...
"Dan Aldean" <da*******@yahoo.com> wrote:
I have a file with CR/LF separated text.
string.Trim() and string.Split() came very handy to process the content,
but


What about StreamReader?

-- Barry

May 8 '06 #6
Thanks Ignacio.
I have a class that handles this file, which is very big.
I have a method that reads and processes the content of each line.
I use StreamReader to read the lines: myFile.ReadLine()

For example, I need to find trailing spaces before a '>' character and remove
them.
Also, within the string I need to look for a ':' separator character and remove
the spaces before and after it. I need to get the content between two
separators and save it.
Split() and then Trim() would have helped a lot, but with a file this big
strings are not recommended.

May 8 '06 #7
"Dan Aldean" <da*******@yahoo.com> wrote:
Thanks Ignacio.
I have a class that handles this file, which is very big.
I have a method that reads and processes the content of each line
I use StreamReader to read the lines: myFile.ReadLine()

For example I need to find trailing spaces before a '>' character and remove
them.
Also within the string I should look for a character splitter ':' and remove
the spaces before and after it. I need to get the content between two
separators and save it.
Split() and then Trim() would have helped a lot, but with a file this big
strings are not recommended.


If you have seriously long lines, I recommend that you use the
techniques of lexical analysis. Basically:

* Read your strings as System.String from StreamReader.ReadLine().
* Tokenize the strings using manual integer indexing and classify them
according to how you want to modify them.
* Write a loop which sucks in from your tokenizer and builds up a
resulting StringBuilder according to your modification rules.

This change would at least make the algorithm linear with respect to
input line length.
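
A rough sketch of the three steps above, using the rules mentioned earlier in the thread (drop whitespace before '>' and on both sides of ':'). The class and method names (LineCleaner, Clean) are made up for illustration and nothing here comes from the poster's actual code; it only shows the single-pass shape:

```csharp
using System;
using System.Text;

static class LineCleaner
{
    // Hypothetical helper: one left-to-right pass over the line.
    // Drops spaces/tabs immediately before '>' and on both sides of ':'.
    public static string Clean(string line)
    {
        var sb = new StringBuilder(line.Length);
        bool skipSpaces = false; // true right after a ':' has been emitted

        foreach (char c in line)
        {
            if (c == ' ' || c == '\t')
            {
                // Whitespace is buffered unless it directly follows a ':'.
                if (!skipSpaces) sb.Append(c);
                continue;
            }

            if (c == ':' || c == '>')
            {
                // Remove whitespace already buffered before the delimiter.
                int end = sb.Length;
                while (end > 0 && (sb[end - 1] == ' ' || sb[end - 1] == '\t')) end--;
                sb.Length = end;
            }

            sb.Append(c);
            skipSpaces = (c == ':');
        }
        return sb.ToString();
    }
}
```

For example, LineCleaner.Clean("key  :  value   >") returns "key:value>". Each character is appended once and removed at most once, so the work stays proportional to the line length.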

If the lines are very long (i.e. something that's going to really fall
out of the CPU cache), you might consider working with some kind of
pooled char arrays, using array operations to copy ranges, and thus
reduce memory management overhead. That will really help if your strings
are bigger than 85,000 bytes (i.e. roughly 42,500 chars), since in that case
they'll fall into the large object heap and don't get collected until
generation 2 GCs.

To get the benefit from char arrays would mean using
TextReader.ReadBlock() instead of ReadLine(), and breaking into lines in
the tokenizer yourself.
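
To give an idea of what that looks like, here is a minimal sketch that reads with TextReader.ReadBlock() into one reused char array and finds the line breaks itself. BlockScanner and CountLines are hypothetical names, and counting lines merely stands in for real tokenizing:

```csharp
using System;
using System.IO;

static class BlockScanner
{
    // Hypothetical helper: counts lines by scanning a reused char buffer,
    // so no per-line string is ever allocated.
    public static int CountLines(TextReader reader, int bufferSize = 4096)
    {
        char[] buffer = new char[bufferSize]; // reused for every block
        int lines = 0;
        bool sawAnyChar = false;
        char last = '\0';
        int read;

        // ReadBlock returns the number of chars read, 0 at end of input.
        while ((read = reader.ReadBlock(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                if (buffer[i] == '\n') lines++; // LF ends a CR/LF line
                last = buffer[i];
            }
            sawAnyChar = true;
        }

        // Count a final line that has no trailing line feed.
        if (sawAnyChar && last != '\n') lines++;
        return lines;
    }
}
```

A real tokenizer would also have to carry partial tokens across block boundaries, which is where most of the extra complexity comes from.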

-- Barry
May 10 '06 #8
Thanks Barry, I think this will help me a great deal.

May 10 '06 #9
Dan Aldean <da*******@yahoo.com> wrote:
Thanks Ignacio.
I have a class that handles this file, which is very big.
I have a method that reads and processes the content of each line
I use StreamReader to read the lines: myFile.ReadLine()

For example I need to find trailing spaces before a '>' character and remove
them.
Also within the string I should look for a character splitter ':' and remove
the spaces before and after it. I need to get the content between two
separators and save it.
Split() and then Trim() would have helped a lot, but with a file this big
strings are not recommended.


Are you sure you're not misinterpreting advice for a different
situation? It's not advisable to read a file by doing:

    string result = "";

    using (StreamReader reader = ...)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Each += copies the whole accumulated string again.
            result += line;
        }
    }

but that's because the strings involved become large, so copying them
for each iteration becomes a problem.

It's not nearly so bad to keep a StringBuilder to collect any content
(if indeed you need to) and use normal string operations on any one
particular line.
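
As a sketch of that approach (the names PerLineProcessor and Process are invented here, and trimming ':'-separated fields is only an assumed example of the per-line work): ordinary Split() and Trim() are applied to one line at a time, and only the accumulated output goes through a StringBuilder.

```csharp
using System;
using System.IO;
using System.Text;

static class PerLineProcessor
{
    // Hypothetical example: trims each ':'-separated field of every line
    // and collects the results in a single StringBuilder.
    public static string Process(TextReader reader)
    {
        var result = new StringBuilder();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Short-lived strings: only as large as one line.
            string[] fields = line.Split(':');
            for (int i = 0; i < fields.Length; i++)
            {
                if (i > 0) result.Append(':');
                result.Append(fields[i].Trim());
            }
            result.Append('\n'); // explicit LF so the output is predictable
        }
        return result.ToString();
    }
}
```

The temporary strings here never grow beyond one line, so they are cheap for the garbage collector; whether that is fast enough for a given file is something only a profiler can tell.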

Have you tried the simplest solution (using strings) and found it too
slow? Have you profiled it?

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
May 10 '06 #10
Thanks Jon.
I used
private StringBuilder line = ......
line.Append(sourceFile.ReadLine());

Then I iterated through "line" (line[i]) to identify the tokens, trim
whitespace, and build the identifiers.

I only used StringBuilder, no strings. Even though strings are more
flexible (IndexOf, Split, Trim), using them excessively comes at a
price. I do not know in advance how big the input file will be.

So the answer is no, I did not use strings. I don't know how slow it would
be; the immutability was what scared me.

May 10 '06 #11
Dan Aldean <da*******@yahoo.com> wrote:
I used
private StringBuilder line = ......
line.Append(sourceFile.ReadLine());

Then I iterated through "line" (line[i]) to identify the tokens, trim
whitespace, and build the identifiers.

I only used StringBuilder, no strings. Even though strings are more
flexible (IndexOf, Split, Trim), using them excessively comes at a
price. I do not know in advance how big the input file will be.

So the answer is no, I did not use strings. I don't know how slow it would
be; the immutability was what scared me.


Well, are you able to process a line at a time? If so, read the line,
process it, and *then* append it.

That's *definitely* worth trying before you start anything more
complicated (and thus error-prone).

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
May 10 '06 #12
I can process one line at a time. I also need to determine whether the next
line continues the current one.
I will probably need Peek for that.

May 11 '06 #13
Dan Aldean <da*******@yahoo.com> wrote:
I can process one line at a time. I also need to determine if the next line
continues the current one.
Probably I need Peek for that


How often are there continuations? I would suggest keeping a "current
line", and when you read a line, if it's a continuation of the current
line, add it and keep going. If it's not a continuation, process the
"current line", then set the current line to the one you've just read.

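The "current line" idea could look roughly like the sketch below. How a continuation is recognised is not stated anywhere in the thread, so IsContinuation() is a pure assumption (a line starting with a space or tab); ContinuationReader and ReadLogicalLines are likewise made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class ContinuationReader
{
    // ASSUMPTION for illustration only: a line that starts with a space
    // or tab continues the previous line. Replace with the real rule.
    static bool IsContinuation(string line)
    {
        return line.Length > 0 && (line[0] == ' ' || line[0] == '\t');
    }

    // Yields complete logical lines, joining continuations as it goes.
    public static IEnumerable<string> ReadLogicalLines(TextReader reader)
    {
        StringBuilder current = null;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (current != null && IsContinuation(line))
            {
                current.Append(line); // add it and keep going
            }
            else
            {
                // The previous logical line is complete: hand it over.
                if (current != null) yield return current.ToString();
                current = new StringBuilder(line); // start a new current line
            }
        }
        if (current != null) yield return current.ToString(); // flush the last one
    }
}
```

No Peek() is needed with this shape: a logical line is only handed over for processing once the line after it has been read and found not to be a continuation.
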
--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
May 11 '06 #14
There are continuations quite often, but I don't know what kind of line I
have just read until I find its tokens, which might be anywhere in the string.
I might use a second StringBuilder object for the next line until I determine
what type it is.

May 11 '06 #15
