You've gotten a couple of really useful replies, IMHO, regarding
optimizing via a cache, as you requested. That said, it seems to me
that unless you have a very specific need to reuse the enumeration in
completely different places in the code, it might be better to just
have the _client_ of the enumerator cache the results as needed.
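For what it's worth, here's a rough sketch of what I mean by having the
client cache the results. The GetValues() here is just a stand-in for
your own method, not your actual code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Demo
{
    // Stand-in for your GetValues(); any lazily-evaluated
    // IEnumerable<T> behaves the same way for this purpose.
    static IEnumerable<KeyValuePair<int, string>> GetValues()
    {
        yield return new KeyValuePair<int, string>(2, "foo");
        yield return new KeyValuePair<int, string>(1, "bar");
    }

    static void Main()
    {
        // The client materializes the enumeration exactly once...
        List<KeyValuePair<int, string>> cached = GetValues().ToList();

        // ...and every later use walks the list; the original
        // enumerator (and thus the search) never runs again.
        int total = cached.Sum(kvp => kvp.Key);
        Console.WriteLine(total);
    }
}
```

The nice thing about this is that the enumerator class stays simple,
and only the clients that actually need to re-enumerate pay the cost
of the copy.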
But even so, I think the basic question has been answered pretty well.
So here are a couple of comments that digress a bit, but which IMHO
should still be useful...
First, the specific code you posted:
[...]
public IEnumerable<KeyValuePair<int, string>> GetValues()
{
    foreach (string someThingToFind in _list)
    {
        string savData = _data;
        int cpt = 0;
        int pos = savData.IndexOf(someThingToFind,
            StringComparison.InvariantCultureIgnoreCase);

        while (savData.Length > 0 && pos >= 0)
        {
            cpt++;
            savData = savData.Substring(pos + someThingToFind.Length);
            pos = savData.IndexOf(someThingToFind,
                StringComparison.InvariantCultureIgnoreCase);
        }

        yield return new KeyValuePair<int, string>(cpt, someThingToFind);
    }
}
I may be missing something, but the above code appears to me to yield
only a single result for each word in your search list.
This may be your intended semantics, but if so it's my opinion that
returning a KeyValuePair isn't appropriate.
Actually, I don't think returning a KeyValuePair is appropriate in any
case. The reason why is that semantically, a KeyValuePair implies a key
that is unique and a value that is associated with that unique key. In
this case, your "key" isn't unique at all as near as I can tell; it's
simply the number of times a specific word showed up in your string.
I understand the temptation to use the KeyValuePair since it _seems_ to
encapsulate the very information you're returning (a pair of related
values). But IMHO it would be much better for you to just define your
own generic struct that does basically the same thing.
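Just as a sketch of what I mean -- the name WordCount is mine, not
anything standard, and you'd obviously pick whatever members suit your
needs:

```csharp
using System;

// A purpose-built pair: no implied key/value semantics, and the
// member names say exactly what each part of the pair is.
public struct WordCount
{
    public readonly string Word;
    public readonly int Count;

    public WordCount(string word, int count)
    {
        Word = word;
        Count = count;
    }

    public override string ToString()
    {
        return Word + ": " + Count;
    }
}

class Demo
{
    static void Main()
    {
        WordCount wc = new WordCount("apple", 3);
        Console.WriteLine(wc);
    }
}
```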
Alternatively, at the very least it's my opinion that the key in the
KeyValuePair should be the string and the count be the value. It seems
much more likely to me that your string is the unique element, rather
than the count.
Why would you care? Well, IMHO the biggest risk is that a KeyValuePair
could be used to construct a Dictionary or similar object directly, and
this would fail as soon as the key -- that is, the integer count --
showed up more than once.
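Just to make that risk concrete, here's a contrived sketch (the data
is made up): two different words that happen to occur the same number
of times produce duplicate "keys", and a ToDictionary() call on the
results blows up:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Demo
{
    static void Main()
    {
        // Two different words, same count -- duplicate "keys".
        var results = new[]
        {
            new KeyValuePair<int, string>(2, "foo"),
            new KeyValuePair<int, string>(2, "bar"),
        };

        try
        {
            // Throws ArgumentException: the key 2 is added twice.
            var dict = results.ToDictionary(kvp => kvp.Key,
                                            kvp => kvp.Value);
        }
        catch (ArgumentException)
        {
            Console.WriteLine("duplicate key");
        }
    }
}
```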
I think there are other semantic reasons to fix this part of the code,
but the fact that there's a genuine potential risk for future uses of
the code is probably the only reason one would need to justify fixing it.
My second comment has to do with the "optimization" aspect itself.
IMHO, caching the results is a "sort of" optimization. That is, yes it
has the potential for improving performance, but it doesn't really
change the efficiency of the algorithm.
I note that this algorithm searches the string once for each word in
your list of words. For a very small number of words, this might not be
too bad, but as the number of words goes up, it can dramatically inflate
the cost of the algorithm.
The idea of caching the results can mitigate that problem somewhat, but
given that the algorithm is basically O(N^2), if you can change the
algorithm to something more efficient, you could wind up getting a HUGE
performance increase even without caching.
It depends somewhat on the number of words, the length of the string,
and the number of times you'd enumerate the results. But assuming that
the number of words is more than the number of times you'd enumerate the
results, an O(N) enumeration executed multiple times will still be
faster than an O(N^2) enumeration executed once and cached for future use.
The exact break-even point would depend a little on the specific
costs and on how many times you need to do the enumeration for a
given list of words. But it seems to me that in any situation where
you are actually concerned about performance, the number of words,
length of string, and/or number of enumerations is relatively high,
and these are exactly the situations where the disparity between an
O(N) algorithm and an O(N^2) algorithm can be very significant.
Of course, if you cache the results _and_ fix the algorithm to be better
than O(N^2), you stand to gain even more.
So, what's an O(N) algorithm to accomplish the same thing? A common one
used for string searches such as yours would be a state machine
implementation, where the nodes of the state graph are connected
according to the letters in the search words.
You traverse the graph by extracting a single character at a time from
the string being searched and using that character to determine which
node to move to; if the character has no matching link to a subsequent
node, you return to the root of the graph. The graph will have
"terminal" nodes representing the end of a search word; if and when you
reach one of those, you know you've found a word.
The exact implementation depends on your needs. The basic
implementation suffices in many cases, but if it is possible for your
search words to overlap, that can be addressed as well.
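For illustration, here's a rough sketch of the basic version I
described. Note the simplification: it restarts at the root after a
mismatch or a hit, so it won't catch overlapping or prefix-nested
search words -- the "failure" links of the Aho-Corasick algorithm are
the standard refinement that addresses that:

```csharp
using System;
using System.Collections.Generic;

class Node
{
    public Dictionary<char, Node> Next = new Dictionary<char, Node>();
    public string Word;  // non-null marks a "terminal" node
}

class Demo
{
    // Build the state graph: one path from the root per search word,
    // with shared prefixes sharing nodes.
    static Node BuildGraph(IEnumerable<string> words)
    {
        Node root = new Node();
        foreach (string w in words)
        {
            Node cur = root;
            foreach (char c in w)
            {
                Node next;
                if (!cur.Next.TryGetValue(c, out next))
                {
                    next = new Node();
                    cur.Next[c] = next;
                }
                cur = next;
            }
            cur.Word = w;
        }
        return root;
    }

    // Single pass over the text: each character either follows a link,
    // restarts a match from the root, or drops back to the root.
    static Dictionary<string, int> CountAll(string text, Node root)
    {
        var counts = new Dictionary<string, int>();
        Node cur = root;
        foreach (char c in text)
        {
            Node next;
            if (cur.Next.TryGetValue(c, out next))
                cur = next;
            else if (root.Next.TryGetValue(c, out next))
                cur = next;           // retry this character at the root
            else
                cur = root;

            if (cur.Word != null)
            {
                int n;
                counts.TryGetValue(cur.Word, out n);
                counts[cur.Word] = n + 1;
                cur = root;           // basic version: restart after a hit
            }
        }
        return counts;
    }

    static void Main()
    {
        Node root = BuildGraph(new[] { "cat", "dog" });
        Dictionary<string, int> counts = CountAll("catdogcatcow", root);
        Console.WriteLine(counts["cat"] + " " + counts["dog"]);
    }
}
```

The point to notice is that CountAll() looks at each character of the
text exactly once, no matter how many words are in the list, whereas
your current code scans the whole string once per word.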
Using a state graph requires a small amount of initialization to take
place, and for relatively small tasks this initialization could cost
more than the overhead of a less-efficient algorithm. But in those
cases, usually you don't care about performance anyway. For larger
tasks where performance is actually a concern, the initialization is
relatively minor compared to the benefits of being able to go through a
string only once while still recovering every single word you're looking
for.
So, that's a long way of saying, if you are really serious about
optimizing the code, what you need is a different algorithm. You can
mitigate the problems of the O(N^2) algorithm you've got somewhat, but
in the end you've still got an O(N^2) algorithm. It will never scale
well, no matter what you do to it.
For what it's worth, there are other forms of search algorithms that are
similarly efficient. I just mention the state graph because IMHO it's
relatively easy to understand _and_ implement. You might want to look
at other search algorithms; pretty much anything designed specifically
for matching search terms to an input string will be better than the
algorithm you've got now, and it's entirely possible you could find
something that would suit your specific needs better than the
general-purpose state graph solution.
Pete