Efficient fixed width string substitution puzzle

Hi,

I'm looking for an efficient way to do this, because I know it will be heavily used :-)

I have a fixed width string and I need to substitute a substring of characters with new values. I
can do this with 2 substring calls, but it will need to rebuild the string just to write a few
characters.

Here is the simple, but inefficient, version:
string s = "0123456789";
string r = "abc"; // Value to substitute

int Offset = 3; // Starting index of substring to change
int Length = 3; // Length of substring

// Replace a substring with one of equal length, based on offset and length:
Console.WriteLine("Substring: " + s.Substring(Offset, Length)); // Displays "345"
Console.WriteLine("Original: [" + s + "]");

s = s.Substring(0, Offset) + r.PadLeft(Length, ' ') + s.Substring(Offset + Length);

Console.WriteLine("Result: [" + s + "]");

This will take the string "0123456789" and replace the characters starting at offset 3 with "abc".
The result is "012abc6789"

I am guaranteeing that the lengths are the same, so in C/C++ I could do something like this with a
memcpy, but that isn't a very friendly way :-)

TIA,

Jami

Nov 16 '05 #1
Use the StringBuilder class; it's optimized for things like this.

Tom Dacon
Dacon Software Consulting

Nov 16 '05 #2
Jami,
>but it will need to rebuild the string just to write a few characters.

Since strings are immutable, you'll always have to create a new string
one way or another. I'd use a StringBuilder or a char[] to reduce the
number of intermediate strings created.
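
A minimal sketch of the char[] route, assuming the replacement always fits inside the original string. The class and method names (CharArrayReplace, ReplaceAt) are made up for illustration and are not from the thread:

```csharp
using System;

static class CharArrayReplace
{
    // Hypothetical helper: overwrites r.Length characters of s, starting at offset.
    public static string ReplaceAt(string s, int offset, string r)
    {
        if (offset < 0 || offset + r.Length > s.Length)
            throw new ArgumentOutOfRangeException(nameof(offset));

        char[] buffer = s.ToCharArray();        // one copy of the original text
        r.CopyTo(0, buffer, offset, r.Length);  // overwrite the field in place
        return new string(buffer);              // one more copy to get a string back
    }
}
```

With the values from the original post, CharArrayReplace.ReplaceAt("0123456789", 3, "abc") gives "012abc6789". The two copies (ToCharArray and new string) are the unavoidable cost of starting and ending with an immutable string.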

Mattias

--
Mattias Sjögren [MVP] mattias @ mvps.org
http://www.msjogren.net/dotnet/ | http://www.dotnetinterop.com
Please reply only to the newsgroup.
Nov 16 '05 #3
Mattias Sjögren <ma********************@mvps.org> wrote:
>> but it will need to rebuild the string just to write a few characters.
>
> Since strings are immutable, you'll always have to create a new string
> one way or another. I'd use a StringBuilder or a char[] to reduce the
> number of intermediate strings created.

Of these, I'd go for the StringBuilder option, creating it with the
right buffer size to start with, and then using:

builder.Append (s, 0, Offset);
builder.Append (r);
builder.Append (s, Offset+Length, s.Length-(Offset+Length));

This should avoid creating any temporary objects other than the builder
itself.
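
Put together as a complete method, the three Append calls might look like the sketch below. The wrapper names (BuilderReplace, ReplaceAt) are hypothetical; only the Append calls come from the snippet above:

```csharp
using System;
using System.Text;

static class BuilderReplace
{
    // Hypothetical wrapper: replaces `length` characters of s at `offset` with r.
    public static string ReplaceAt(string s, int offset, int length, string r)
    {
        // Size the builder for the final result so it never has to grow.
        StringBuilder builder = new StringBuilder(s.Length - length + r.Length);

        builder.Append(s, 0, offset);                                      // text before the field
        builder.Append(r);                                                 // the new field value
        builder.Append(s, offset + length, s.Length - (offset + length));  // text after the field
        return builder.ToString();
    }
}
```

Unlike the char[] approach, this version does not require r to have the same length as the span it replaces.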

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #4
Thanks everyone for their tips. I decided to try all three options and time them to see what it
would take. I tried doing a simple 3-character copy into the middle of the string - similar to what I
would expect during production. With 10M iterations, I had the following results:

// Test results on 2.4GHz P4:
// TestString: 10000000 Iterations in 3.8946357707474 seconds
// TestStringBuilder: 10000000 Iterations in 5.4614639570113 seconds
// TestCharArray: 10000000 Iterations in 2.0478365267094 seconds

Some interesting notes:
1. I needed to copy the StringBuilder back to a string so that I could loop - otherwise I would be
stepping on myself.
2. StringBuilder is slower than string! I presume this is mostly due to the extra copy.
3. As expected, the character array is the fastest.
4. All of these are extremely fast; I don't think the efficiency gains will matter! :-)

I hope this is useful to others. I've included the source below for those interested. Timing was
done with the PerformanceCounter.

Jami
----------------------------------------------------------------------------------

static void TestString(int Count)
{
    string s = "0123456789";
    string r = "abc"; // Value to substitute

    int Offset = 3; // Starting index of substring to change
    int Length = 3; // Length of substring

    for (int Idx = 0; Idx < Count; ++Idx)
    {
        // Replace a substring with one of equal length, based on offset and length:
        s = s.Substring(0, Offset) + r.PadLeft(Length, ' ') + s.Substring(Offset + Length);
        //Console.WriteLine("Result: [" + s + "]");
    }
}

static void TestStringBuilder(int Count)
{
    string s = "0123456789";
    string r = "abc"; // Value to substitute

    int Offset = 3; // Starting index of substring to change
    int Length = 3; // Length of substring

    for (int Idx = 0; Idx < Count; ++Idx)
    {
        // Replace a substring with one of equal length, based on offset and length:
        StringBuilder sb = new StringBuilder(s.Length);
        sb.Append(s, 0, Offset);
        sb.Append(r);
        sb.Append(s, Offset + Length, s.Length - (Offset + Length));
        s = sb.ToString(); // Copy back to a string so the next iteration can reuse s
        //Console.WriteLine("Result: [" + s + "]");
    }
}

static void TestCharArray(int Count)
{
    char[] s = "0123456789".ToCharArray();
    char[] r = "abc".ToCharArray(); // Value to substitute

    int Offset = 3; // Starting index of substring to change

    for (int Idx = 0; Idx < Count; ++Idx)
    {
        // Overwrite the field in place; no string is created here
        r.CopyTo(s, Offset);
    }
}
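
The timing code itself is not included in the post. As a rough sketch of how such numbers could be collected, the harness below uses System.Diagnostics.Stopwatch, which is an assumption on my part and not what produced the results above; the names Timing and Measure are hypothetical:

```csharp
using System;
using System.Diagnostics;

static class Timing
{
    // Hypothetical harness: runs test(count) once and returns the elapsed seconds.
    public static double Measure(string name, Action<int> test, int count)
    {
        Stopwatch watch = Stopwatch.StartNew();
        test(count);
        watch.Stop();

        double seconds = watch.Elapsed.TotalSeconds;
        Console.WriteLine("{0}: {1} Iterations in {2} seconds", name, count, seconds);
        return seconds;
    }
}
```

Assuming the test methods are in scope, a call would look like Timing.Measure("TestString", TestString, 10000000). A single run includes JIT compilation, so running each test once as a warm-up before measuring is probably worthwhile.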

On Tue, 3 Aug 2004 09:31:33 +0100, Jon Skeet [C# MVP] <sk***@pobox.com> wrote:
> Of these, I'd go for the StringBuilder option, creating it with the
> right buffer size to start with, and then using:
>
> builder.Append (s, 0, Offset);
> builder.Append (r);
> builder.Append (s, Offset+Length, s.Length-(Offset+Length));

Nov 16 '05 #5
And one more note :-)

I tried increasing the starting string to 300 characters, so that it would be more like my problem, and
the timing results changed to the following:

TestString: 10000000 Iterations in 11.4631644524653 seconds
TestStringBuilder: 10000000 Iterations in 10.010672026752 seconds
TestCharArray: 10000000 Iterations in 2.07768892415097 seconds

Not surprisingly, the character array moves ahead. It is interesting to see the StringBuilder pass
the string - makes some sense.

Enjoy,

Jami

On Tue, 03 Aug 2004 10:16:04 -0600, Jami Bradley <jb******@isa-og.com> wrote:
> Thanks everyone for their tips. I decided to try all three options and time them to see what it
> would take.

Nov 16 '05 #6
Jami Bradley <jb******@isa-og.com> wrote:
> I decided to try all three options and time them to see what it
> would take.

Your test isn't really fair:

1) You don't end up with a string at the end of the TestCharArray
method, which I thought was the point. Just adding a
string result = new string(s); at the end of the loop makes the
TestCharArray version the slowest on my box.

2) You're only allocating the char array (and copying the original
contents) once in TestCharArray - which is no good unless you know
ahead of time what size all the strings you need to work with will be,
*and* that the "surrounding" string doesn't change between iterations -
and if that's the case, the StringBuilder case can be improved as well,
I suspect. (If it's not the case, you need to call ToCharArray on each
iteration, or use String.CopyTo if the first condition is true but not
the second.)
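
To make both points concrete, here is one way the char array test could be made comparable to the other two. This is only a sketch: it assumes the record size is fixed (so the buffer can be allocated once) but that the surrounding text may change, so it refreshes the buffer with String.CopyTo and produces a string on every iteration. The names FairCharArrayTest and Run are hypothetical:

```csharp
using System;

static class FairCharArrayTest
{
    // Returns the final string so the caller can check the result.
    public static string Run(int count)
    {
        string s = "0123456789";
        string r = "abc"; // Value to substitute
        int offset = 3;   // Starting index of substring to change

        // Allocated once; only valid because every record has the same length.
        char[] buffer = new char[s.Length];

        for (int idx = 0; idx < count; ++idx)
        {
            s.CopyTo(0, buffer, 0, s.Length);       // reload the current record
            r.CopyTo(0, buffer, offset, r.Length);  // overwrite the field
            s = new string(buffer);                 // finish with a string, like the other tests
        }
        return s;
    }
}
```

If the record size is not known ahead of time, the buffer allocation would have to move inside the loop (for example via ToCharArray), which adds a further allocation per iteration.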

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #7
It is true that the tests aren't quite identical.

In my case, I am essentially dealing with data records similar to DBase 2 (fixed width records).

My usage will be a small class which owns a record (string or other type) and has get/set methods
to update pieces of the record by field name. The record is fixed length for each record type,
guaranteed.

I really don't care how the record is stored internally, whether it is a string, a StringBuilder, or a
character array. At the end of the manipulations, I want to grab a copy of the record as a string.
So the typical usage would be 1) create empty fixed length record, 2) set a bunch of fields (about
50 calls), 3) get the record as a string type.

I'm not sure how to improve StringBuilder, because even if I keep the record in a StringBuilder,
I'll need to copy it to make the Append calls.
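
One option that was not part of the tests: StringBuilder has a settable indexer, so a record kept in a StringBuilder can in principle be overwritten in place without any Append calls. A minimal sketch, with hypothetical names (BuilderInPlace, Overwrite) and no claim about how it would perform against the char array:

```csharp
using System;
using System.Text;

static class BuilderInPlace
{
    // Hypothetical helper: overwrites characters of an existing record without rebuilding it.
    public static void Overwrite(StringBuilder record, int offset, string value)
    {
        if (offset < 0 || offset + value.Length > record.Length)
            throw new ArgumentOutOfRangeException(nameof(offset));

        for (int i = 0; i < value.Length; ++i)
            record[offset + i] = value[i]; // indexer set; the builder's length never changes
    }
}
```

For example, after new StringBuilder("0123456789") is passed to Overwrite with offset 3 and value "abc", calling ToString() on it returns "012abc6789".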

Thanks,

Jami

On Tue, 3 Aug 2004 18:16:48 +0100, Jon Skeet [C# MVP] <sk***@pobox.com> wrote:
> Your test isn't really fair:


Nov 16 '05 #8
Jami Bradley <jb******@isa-og.com> wrote:
> I really don't care how the record is stored internally, whether it
> is a string, a StringBuilder, or a character array. At the end of the
> manipulations, I want to grab a copy of the record as a string.

Okay. It sounds like keeping it in a char array is indeed going to be
the fastest way of doing things. If you're going to be doing lots of
manipulations with a single record, it probably doesn't matter if you
create a new char array for each record - if it were a case of millions
of records with a couple of manipulations each, and efficiency were
*really* an issue, you could have kept just one char array and copied
to it at the start of each set of manipulations.
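
As a rough illustration of the record class described above, backed by a char array: every name here (FixedRecord, DefineField, Set, Get) is hypothetical, and the left-padding simply mirrors the PadLeft call in the original post.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical fixed-width record: 1) create empty, 2) set fields by name, 3) read as string.
sealed class FixedRecord
{
    private readonly char[] data;
    // Field name -> (offset, length), stored as Key = offset, Value = length.
    private readonly Dictionary<string, KeyValuePair<int, int>> fields =
        new Dictionary<string, KeyValuePair<int, int>>();

    public FixedRecord(int length)
    {
        data = new string(' ', length).ToCharArray(); // empty record, space filled
    }

    public void DefineField(string name, int offset, int length)
    {
        if (offset < 0 || length < 0 || offset + length > data.Length)
            throw new ArgumentOutOfRangeException(nameof(offset));
        fields[name] = new KeyValuePair<int, int>(offset, length);
    }

    public void Set(string name, string value)
    {
        KeyValuePair<int, int> field = fields[name];
        if (value.Length > field.Value)
            throw new ArgumentException("Value is longer than the field.", nameof(value));

        // Pad to the field width, then overwrite the field in place.
        string padded = value.PadLeft(field.Value, ' ');
        padded.CopyTo(0, data, field.Key, field.Value);
    }

    public string Get(string name)
    {
        KeyValuePair<int, int> field = fields[name];
        return new string(data, field.Key, field.Value);
    }

    // The only full copy of the record happens here.
    public override string ToString()
    {
        return new string(data);
    }
}
```

With this layout, the 50 or so Set calls per record each touch only the characters of one field, and a string is created once at the end via ToString().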

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #9
