slow linq query

Eps
Hi there,

I am doing the following, this is a List of audio files.

    this.Where(p => p.Album == AnAudioFileObject.Album)
        .Select(s => s.Artist)
        .Distinct()
        .Count() > 1;

The aim is to determine whether AnAudioFileObject is from an album that
has various artists on it or just one artist.

If I load several thousand audio files into the list it becomes very
slow. Can anyone think of a way I could speed this up?

Any help appreciated.

--
Eps
Sep 9 '08 #1
How about

    if (this.Any(p => p.Album == AnAudioFileObject.Album)) { .. }

? :-)

--
With regards
Anders Borum / SphereWorks
Microsoft Certified Professional (.NET MCP)
Sep 9 '08 #2
Eps <ep*@mailinator.com> wrote:
> I am doing the following, this is a List of audio files.
>
>     this.Where(p => p.Album == AnAudioFileObject.Album)
>         .Select(s => s.Artist)
>         .Distinct()
>         .Count() > 1;
>
> The aim is to determine whether AnAudioFileObject is from an album that
> has various artists on it or just one artist.
>
> If I load several thousand audio files into the list it becomes very
> slow. Can anyone think of a way I could speed this up?
>
> Any help appreciated.

Could you give definite figures for "several thousand" and "very slow"?
All of those should be linear operations as far as I'm aware, so it's
possible something else is going on.

Could you post a short but complete program which demonstrates the
problem?

See http://www.pobox.com/~skeet/csharp/complete.html for details of
what I mean by that.

--
Jon Skeet - <sk***@pobox.com>
Web site: http://www.pobox.com/~skeet
Blog: http://www.msmvps.com/jon.skeet
C# in Depth: http://csharpindepth.com
Sep 9 '08 #3
Eps
Anders Borum wrote:
> How about
>
>     if (this.Any(p => p.Album == AnAudioFileObject.Album)) { .. }
>
> ? :-)

Hmmm, do half of it with LINQ and the other half programmatically?

I thought that LINQ and programmatic iteration were roughly as fast as
each other. What's the advantage of mixing the two?

Just after I posted I realized I could do this...

    this.Where(p => p.Album == AnAudioFileObject.Album)
        .Take(2)
        .Select(s => s.Artist)
        .Distinct()
        .Count() > 1;

The Take(2) does seem to have a significant impact on the speed. It's not
ideal, since there are albums with various artists that do feature the
same artist twice (or more times), but for my purposes I think it will be ok.
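
To make that caveat concrete, here is a minimal sketch of the case where Take(2) placed before Distinct() gives the wrong answer. The class name, method names and artist strings are made up for illustration, and the input stands in for the artists of one album after the Where/Select steps.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TakeBeforeDistinctDemo
{
    // The shortcut: only the first two tracks of the album are examined.
    public static bool ByFirstTwoTracks(IEnumerable<string> albumArtists) =>
        albumArtists.Take(2).Distinct().Count() > 1;

    // Reference answer: every distinct artist on the album is counted.
    public static bool ByFullCount(IEnumerable<string> albumArtists) =>
        albumArtists.Distinct().Count() > 1;

    public static void Main()
    {
        // Hypothetical compilation whose first two tracks share an artist.
        var artists = new[] { "Artist A", "Artist A", "Artist B" };

        Console.WriteLine(ByFirstTwoTracks(artists)); // False (misses Artist B)
        Console.WriteLine(ByFullCount(artists));      // True
    }
}
```

So the shortcut is fast only because it ignores everything after the second track, which is exactly where the false negatives come from.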

--
Eps
Sep 9 '08 #4
Eps
Jon Skeet [C# MVP] wrote:
> Could you give definite figures for "several thousand" and "very slow"?
> All of those should be linear operations as far as I'm aware, so it's
> possible something else is going on.
>
> Could you post a short but complete program which demonstrates the
> problem?
>
> See http://www.pobox.com/~skeet/csharp/complete.html for details of
> what I mean by that.

OK, let's say 6000 audio files. Each is an object that will read in data
and metadata directly from the file on the disk, so it's IO bound to a
certain extent. In terms of time I can't be exact, but the time taken to
process a file jumped from about 30 - 40 seconds to over a couple of
minutes.

I think I kind of have a solution now so I won't bother trying to
replicate the problem in a demonstration program.

Thanks for your help though.

--
Eps
Sep 9 '08 #5
Eps <ep*@mailinator.com> wrote:
> I am doing the following, this is a List of audio files.
>
>     this.Where(p => p.Album == AnAudioFileObject.Album)
>         .Select(s => s.Artist)
>         .Distinct()
>         .Count() > 1;
>
> The aim is to determine whether AnAudioFileObject is from an album that
> has various artists on it or just one artist.
>
> If I load several thousand audio files into the list it becomes very
> slow. Can anyone think of a way I could speed this up?

I've just thought of something which could theoretically help. Instead
of calling Count() > 1, call Skip(1).Any(). That way as soon as Distinct()
yields its second element, you can finish. (Note that it has to be Skip(1):
Take(1).Any() would return true as soon as there is a *first* element, so
it only tests for "at least one".)

That way you can deal with *huge* sequences, or even infinite ones (so
long as they aren't an infinite sequence repeating a single element).
For example:

    using System;
    using System.Linq;

    public static class Test
    {
        static void Main()
        {
            var allPositiveInts = Enumerable.Range(0, int.MaxValue);

            // Stops as soon as a second distinct value has been seen.
            bool quick = allPositiveInts.Distinct().Skip(1).Any();
            Console.WriteLine("quick = " + quick);

            // Has to remember every distinct value before it can answer.
            bool slow = allPositiveInts.Distinct().Count() > 1;
            Console.WriteLine("slow = " + slow);
        }
    }

The result is:

quick = True

Unhandled Exception: OutOfMemoryException.

Having said all of this, I strongly suspect that you won't gain much
from this. Try printing out

    this.Where(p => p.Album == AnAudioFileObject.Album)
        .Select(s => s.Artist)
        .Count()

just to see how many artist entries we're talking about as the input to
Distinct() to start with.
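
A rough way to see the early exit on a small input is to count how many source elements each form of the query pulls. This is only a sketch: the Counted wrapper, the Pulled counter and the sample artist names are invented for the illustration, and Skip(1).Any() is used as the "is there a second distinct element?" test.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EarlyExitDemo
{
    // Number of source elements pulled by the most recent query.
    public static int Pulled;

    // Wraps a sequence and counts each element as it is handed out.
    static IEnumerable<T> Counted<T>(IEnumerable<T> source)
    {
        foreach (T item in source)
        {
            Pulled++;
            yield return item;
        }
    }

    // Can stop once Distinct() has produced a second element.
    public static bool MoreThanOneBySkip(IEnumerable<string> artists)
    {
        Pulled = 0;
        return Counted(artists).Distinct().Skip(1).Any();
    }

    // Always walks the whole input before comparing the count.
    public static bool MoreThanOneByCount(IEnumerable<string> artists)
    {
        Pulled = 0;
        return Counted(artists).Distinct().Count() > 1;
    }

    public static void Main()
    {
        var artists = new[] { "A", "A", "B", "C", "C", "D" };

        bool bySkip = MoreThanOneBySkip(artists);
        Console.WriteLine(bySkip + " after pulling " + Pulled + " elements");

        bool byCount = MoreThanOneByCount(artists);
        Console.WriteLine(byCount + " after pulling " + Pulled + " elements");
    }
}
```

Both forms give the same answer; the difference is only in how much of the input they have to read, which matters for very long sequences but is unlikely to be noticeable for a few thousand items.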

--
Jon Skeet - <sk***@pobox.com>
Web site: http://www.pobox.com/~skeet
Blog: http://www.msmvps.com/jon.skeet
C# in Depth: http://csharpindepth.com
Sep 9 '08 #6
Eps <ep*@mailinator.com> wrote:
> OK, let's say 6000 audio files. Each is an object that will read in data
> and metadata directly from the file on the disk, so it's IO bound to a
> certain extent. In terms of time I can't be exact, but the time taken to
> process a file jumped from about 30 - 40 seconds to over a couple of
> minutes.
>
> I think I kind of have a solution now so I won't bother trying to
> replicate the problem in a demonstration program.

I strongly suspect the problem wasn't in the code you showed then -
because that part should be very fast. 6000 entries is nothing. I
suspect if you load them all into memory to start with, the query will
execute pretty much instantly.
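
One way to try that suggestion is sketched below. It rests on an assumption that is not confirmed anywhere in this thread, namely that reading Album or Artist from an audio file object may touch the disk. Under that assumption, the album and artist strings are read once into plain in-memory pairs, and the set of various-artist albums is computed in a single pass instead of once per file. AlbumIndex, VariousArtistAlbums and the sample data are hypothetical names for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AlbumIndex
{
    // Takes (album, artist) pairs that are already in memory and returns
    // the albums credited to more than one distinct artist.
    public static HashSet<string> VariousArtistAlbums(
        IEnumerable<KeyValuePair<string, string>> albumArtistPairs)
    {
        var albums = albumArtistPairs
            .GroupBy(pair => pair.Key, pair => pair.Value)
            .Where(group => group.Distinct().Skip(1).Any())
            .Select(group => group.Key);

        return new HashSet<string>(albums);
    }

    public static void Main()
    {
        // Hypothetical snapshot: Key = album, Value = artist.
        var pairs = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("Compilation", "Artist A"),
            new KeyValuePair<string, string>("Compilation", "Artist B"),
            new KeyValuePair<string, string>("Solo Album", "Artist A"),
            new KeyValuePair<string, string>("Solo Album", "Artist A"),
        };

        HashSet<string> various = VariousArtistAlbums(pairs);
        Console.WriteLine(various.Contains("Compilation")); // True
        Console.WriteLine(various.Contains("Solo Album"));  // False
    }
}
```

After that, checking a single file is a set lookup on its album name. If the per-file query is still slow with everything in memory, the cause is more likely elsewhere.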

--
Jon Skeet - <sk***@pobox.com>
Web site: http://www.pobox.com/~skeet
Blog: http://www.msmvps.com/jon.skeet
C# in Depth: http://csharpindepth.com
Sep 9 '08 #7
Eps <ep*@mailinator.com> wrote:
> Just after I posted I realized I could do this...
>
>     this.Where(p => p.Album == AnAudioFileObject.Album)
>         .Take(2)
>         .Select(s => s.Artist)
>         .Distinct()
>         .Count() > 1;
>
> The Take(2) does seem to have a significant impact on the speed. It's not
> ideal, since there are albums with various artists that do feature the
> same artist twice (or more times), but for my purposes I think it will be ok.

See my other post for an alternative which will be correct but still
fast.

--
Jon Skeet - <sk***@pobox.com>
Web site: http://www.pobox.com/~skeet
Blog: http://www.msmvps.com/jon.skeet
C# in Depth: http://csharpindepth.com
Sep 9 '08 #8
Eps
Jon Skeet [C# MVP] wrote:
> Having said all of this, I strongly suspect that you won't gain much
> from this. Try printing out
>
>     this.Where(p => p.Album == AnAudioFileObject.Album)
>         .Select(s => s.Artist)
>         .Count()
>
> just to see how many artist entries we're talking about as the input to
> Distinct() to start with.

Hmmm, I haven't had a chance to test this yet but I think I know what's
going on.

Where the album is not set (String.IsNullOrEmpty) I put in "Unknown
Album". Obviously the query considers all these audio files (and there
will be a significant number of them) as part of the same album.

I could combine your code with mine, do something like....

    this.Where(p => p.Album == AnAudioFileObject.Album)
        .Select(s => s.Artist)
        .Distinct().Take(10).Any()

This still isn't foolproof, there could be an album with 10 tracks by
one artist and 1 track (or more) by another. But this should be good
enough, I will test it and post the results.

--
Eps
Sep 9 '08 #9
I'm sorry, I misread your original query - sorry about that (I promise I'll
head straight to bed) ;-)

--
With regards
Anders Borum / SphereWorks
Microsoft Certified Professional (.NET MCP)

Sep 9 '08 #10
Eps <ep*@mailinator.com> wrote:
> Hmmm, I haven't had a chance to test this yet but I think I know what's
> going on.
>
> Where the album is not set (String.IsNullOrEmpty) I put in "Unknown
> Album". Obviously the query considers all these audio files (and there
> will be a significant number of them) as part of the same album.

Even so it shouldn't be a problem.

Here's a complete program which generates 100,000 tracks and one album
with 1,000 tracks including 5 artists. It counts those 5 distinct
artists in 67ms on my box. That's pretty clear evidence to me that
something is going on beyond what you're aware of. It would be worth
getting to the bottom of that, as it may bite you later on - right now
you've got (I would imagine) a reasonably straightforward situation to
diagnose. The next symptom of the same problem could be much harder to
track down.

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;
    using System.Text;

    public class Track
    {
        public string Artist { get; set; }
        public Album Album { get; set; }

        // Creates a track with its own album and a random artist name.
        public static Track GenerateRandom()
        {
            return new Track
            {
                Album = new Album(),
                Artist = GenerateRandomString()
            };
        }

        static Random rng = new Random();

        // Builds a random 15-letter upper-case string.
        static string GenerateRandomString()
        {
            StringBuilder builder = new StringBuilder(15);
            for (int i = 0; i < 15; i++)
            {
                builder.Append((char)(rng.Next(26) + 'A'));
            }
            return builder.ToString();
        }
    }

    public class Album
    {
    }

    public static class Test
    {
        static void Main()
        {
            // First generate lots of random tracks
            List<Track> tracks = new List<Track>();
            for (int i = 0; i < 100000; i++)
            {
                tracks.Add(Track.GenerateRandom());
            }

            string[] artists = { "Mike Rutherford", "Phil Collins",
                "Tony Banks", "Peter Gabriel", "Steve Hackett" };

            // Now make 1,000 (or thereabouts) of them belong to one album.
            // Give each of them one of our 5 artists
            Album picked = new Album();
            Random rng = new Random();
            for (int i = 0; i < 1000; i++)
            {
                Track track = tracks[rng.Next(tracks.Count)];
                track.Album = picked;
                track.Artist = artists[rng.Next(artists.Length)];
            }

            Console.WriteLine("Finding the distinct count...");
            Stopwatch sw = Stopwatch.StartNew();
            int count = tracks.Where(track => track.Album == picked)
                              .Select(track => track.Artist)
                              .Distinct()
                              .Count();
            sw.Stop();
            Console.WriteLine("Found {0} distinct artists in {1}ms",
                              count, sw.ElapsedMilliseconds);
        }
    }

> I could combine your code with mine, do something like....
>
>     this.Where(p => p.Album == AnAudioFileObject.Album)
>         .Select(s => s.Artist)
>         .Distinct().Take(10).Any()
>
> This still isn't foolproof, there could be an album with 10 tracks by
> one artist and 1 track (or more) by another. But this should be good
> enough, I will test it and post the results.

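No, it was already foolproof beforehand. You only need Skip(1), because
Skip(1).Any() is the equivalent of Count() > 1. (Take(n).Any(), for any
n of 1 or more, only tells you that there is at least one element.) The
important point is that the operator goes *after* the Distinct(), whereas
you put your Take(2) *before* it.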
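
For reference, the operators discussed in this thread can be compared side by side on a tiny input. This is an illustrative sketch only: the class name, method names and artist strings are made up, and the input stands in for the artists of one album after the Where/Select steps.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OperatorComparison
{
    // True whenever there is at least one artist, whatever n (>= 1) is.
    public static bool TakeThenAny(IEnumerable<string> artists, int n) =>
        artists.Distinct().Take(n).Any();

    // True only when a second distinct artist exists.
    public static bool SkipThenAny(IEnumerable<string> artists) =>
        artists.Distinct().Skip(1).Any();

    // Same answer as SkipThenAny, but reads the whole sequence.
    public static bool CountGreaterThanOne(IEnumerable<string> artists) =>
        artists.Distinct().Count() > 1;

    public static void Main()
    {
        var single = new[] { "A", "A", "A" };
        var mixed = new[] { "A", "B", "A" };

        Console.WriteLine(TakeThenAny(single, 10));      // True
        Console.WriteLine(SkipThenAny(single));          // False
        Console.WriteLine(CountGreaterThanOne(single));  // False
        Console.WriteLine(SkipThenAny(mixed));           // True
        Console.WriteLine(CountGreaterThanOne(mixed));   // True
    }
}
```

In other words, Take(n).Any() after Distinct() answers "is there any artist at all?", while Skip(1).Any() answers "is there more than one?".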

--
Jon Skeet - <sk***@pobox.com>
Web site: http://www.pobox.com/~skeet
Blog: http://www.msmvps.com/jon.skeet
C# in Depth: http://csharpindepth.com
Sep 9 '08 #11
Eps
Jon Skeet [C# MVP] wrote:
> No, it was already foolproof beforehand. You only need Skip(1), because
> Skip(1).Any() is the equivalent of Count() > 1. The important point is
> that the operator goes *after* the Distinct(), whereas you put your
> Take(2) *before* it.

Yep, you're completely right. I am now using the line below to determine
whether a track is from a various-artists album.

    this.Where(p => p.Album == af.Album)
        .Select(s => s.Artist)
        .Distinct()
        .Skip(1)
        .Any();

This seems to be very quick. Thanks for all your help Jon, your examples
are as ever very informative.

--
Eps
Sep 10 '08 #12
Eps <ep*@mailinator.com> wrote:
>> No, it was already foolproof beforehand. You only need Skip(1), because
>> Skip(1).Any() is the equivalent of Count() > 1. The important point is
>> that the operator goes *after* the Distinct(), whereas you put your
>> Take(2) *before* it.
>
> Yep, you're completely right. I am now using the line below to determine
> whether a track is from a various-artists album.
>
>     this.Where(p => p.Album == af.Album)
>         .Select(s => s.Artist)
>         .Distinct()
>         .Skip(1)
>         .Any();
>
> This seems to be very quick. Thanks for all your help Jon, your examples
> are as ever very informative.

Thanks :) I'm still intrigued as to why it was taking a long time in
the first place though. It really, really shouldn't... if you're able
to send me some sample code, I'd be very interested to have a look. I
understand if you can't do that though.

--
Jon Skeet - <sk***@pobox.com>
Web site: http://www.pobox.com/~skeet
Blog: http://www.msmvps.com/jon.skeet
C# in Depth: http://csharpindepth.com
Sep 10 '08 #13
