Connecting Tech Pros Worldwide Forums | Help | Site Map

Need help using STL to sort a vector of strings

Al Newton
Guest
 
Posts: n/a
#1: Jul 19 '05
I want to use STL's sort algorithm to sort a string vector. Some of
the strings are fairly long (300 to 400 chars) and the vector isn't
small (5,000 to 10,000 elements). Naturally, sorting time is long.
All I need sorted are the first 10 characters in the strings.
Limiting the number of characters processed should speed things up
considerably. Is there a way to do 1t?

Thanks for any help ... Al

Fraser Ross
Guest
 
Posts: n/a
#2: Jul 19 '05

re: Need help using STL to sort a vector of strings


> bool pred(const std::string& lhs, const std::string& rhs)[color=blue]
> {
> return lhs.substr(0, sig_chars) < rhs.substr(0, sig_chars);
> }[/color]
bool pred(const std::string& lhs, const std::string& rhs)
{
return std::lexicographical_compare(lhs.begin(), lhs.size()>10 ?
lhs.begin()+10 : lhs.end(), rhs.begin(), rhs.size()>10 ? rhs.begin()+10 :
rhs.end());
}

This would be considerably faster with not having to create substrings.

Fraser.


Al Newton
Guest
 
Posts: n/a
#3: Jul 19 '05

re: Need help using STL to sort a vector of strings


"Victor Bazarov" <v.Abazarov@attAbi.com> wrote in message news:<vil6eprf2vmhdc@corp.supernews.com>...[color=blue]
> "Al Newton" <alnewton1@hotmail.com> wrote...[color=green]
> > I want to use STL's sort algorithm to sort a string vector. Some of
> > the strings are fairly long (300 to 400 chars) and the vector isn't
> > small (5,000 to 10,000 elements). Naturally, sorting time is long.
> > All I need sorted are the first 10 characters in the strings.[/color]
>
> Do you need to sort every string in a vector and then the vector
> as well, or just every string, or just the vector based on the
> first 10 chars in every string?
>
> Example: given the vector
>
> "sadflkjasdlksjdf"
> "sadfslkjlskjdlfkj"
> "lkjdfasdflkas"
>
> sorting the first 10 chars in each string gives
>
> "aaddfjklsslksjdf"
> "adfjkllssskjdlfkj"
> "addffjkllskas"
>
> sorting the first 10 chars, then the entire vector gives
>
>
> "aaddfjklsslksjdf"
> "addffjkllskas"
> "adfjkllssskjdlfkj"
>
> sorting the entire vector using only 10 chars of each line gives
>
> "lkjdfasdflkas"
> "sadflkjasdlksjdf"
> "sadfslkjlskjdlfkj"
>
> So, make up your mind.
>[color=green]
> > Limiting the number of characters processed should speed things up
> > considerably. Is there a way to do 1t?[/color]
>
> Of course. If you've only specified what exactly it is you want
> to do...
>
> Victor[/color]


Sorry about the ambiguity. Given as input the 3-element vector:
987654321A
123456789B
012345678C
I want the sorted output to be:
012345678C
123456789B
987654321A

Hope this clears up things ... Al
Andrew Koenig
Guest
 
Posts: n/a
#4: Jul 19 '05

re: Need help using STL to sort a vector of strings


Al> I apologize for ambiguities in the posting. I hope these were
Al> resolved in my reply to Bazarov. All strings begin with 10
Al> numerics that typically repeat 3 to 8 times in a vector of 10,000
Al> strings, but many numerics are unique within the vector. In all
Al> cases the numerics are followed by alphabetic data of no
Al> consequence. I don't have a quantitative answer as to run time,
Al> but the frown on my manager's face says, "Too long. Way, way too
Al> long."

If the numerics don't repeat much, then you're not going to save much
by restricting the comparisons. In fact, depending on how you do it,
you might even make things slower.

10,000 strings isn't that many on a modern processor. If it's taking
more than a small fraction of a second to sort them, I would be
inclined to look somewhere other than the comparison function for the
cause.

--
Andrew Koenig, ark@acm.org
Al Newton
Guest
 
Posts: n/a
#5: Jul 19 '05

re: Need help using STL to sort a vector of strings


"Mike Wahler" <mkwahler@mkwahler.net> wrote in message news:<bge5dp$d3j$1@slb6.atl.mindspring.net>...[color=blue]
> Al Newton <alnewton1@hotmail.com> wrote in message
> news:4e1067aa.0308010753.28a6fbdd@posting.google.c om...[color=green]
> > I want to use STL's sort algorithm to sort a string vector. Some of
> > the strings are fairly long (300 to 400 chars) and the vector isn't
> > small (5,000 to 10,000 elements). Naturally, sorting time is long.
> > All I need sorted are the first 10 characters in the strings.
> > Limiting the number of characters processed should speed things up
> > considerably. Is there a way to do 1t?
> >
> > Thanks for any help ... Al[/color]
>
> The sort function offers an optional argument, a
> 'predicate' function you can write to specify the
> ordering:
>
> <snip>[/color]

Thank you for the response. It was valuable and sincerely appreciated.

.... Al
Al Newton
Guest
 
Posts: n/a
#6: Jul 19 '05

re: Need help using STL to sort a vector of strings


"Fraser Ross" <fraserATmembers.v21.co.unitedkingdom> wrote in message news:<3f2ad703@news.greennet.net>...[color=blue][color=green]
> > bool pred(const std::string& lhs, const std::string& rhs)
> > {
> > return lhs.substr(0, sig_chars) < rhs.substr(0, sig_chars);
> > }[/color]
> bool pred(const std::string& lhs, const std::string& rhs)
> {
> return std::lexicographical_compare(lhs.begin(), lhs.size()>10 ?
> lhs.begin()+10 : lhs.end(), rhs.begin(), rhs.size()>10 ? rhs.begin()+10 :
> rhs.end());
> }
>
> This would be considerably faster with not having to create substrings.
>
> Fraser.[/color]

You're quite right. 50000 repetitions of the original sort required
2664ms. 50000 repetitions of your modification reduced that to a mere
581ms.

Sincere thanks ... Al
Jerry Coffin
Guest
 
Posts: n/a
#7: Jul 19 '05

re: Need help using STL to sort a vector of strings


In article <4e1067aa.0308010753.28a6fbdd@posting.google.com >, alnewton1
@hotmail.com says...[color=blue]
> I want to use STL's sort algorithm to sort a string vector. Some of
> the strings are fairly long (300 to 400 chars) and the vector isn't
> small (5,000 to 10,000 elements). Naturally, sorting time is long.
> All I need sorted are the first 10 characters in the strings.
> Limiting the number of characters processed should speed things up
> considerably. Is there a way to do 1t?[/color]

I'm not sure about the "naturally sorting time is long" part -- sorting
10,000 strings should not be very slow. Though you'd need to use a
profiler to know, there's also a chance that you're actually copying
entire strings instead of just copying pointers when (for example) the
sort function decides it needs to swap a pair of strings. In this case,
you might gain considerably more by using pointers rather than raw
strings.

To answer your question directly, however, when you call std::sort you
can pass it a comparison routine you want used. Your comparison
function could simply create 10-character sub-strings and compare them,
or it could call strncmp to do the comparison, or (of course) it could
do the comparison itself. Which of these would be fastest is likely to
depend somewhat on how std::string is implemented on your machine.

--
Later,
Jerry.

The universe is a figment of its own imagination.
Closed Thread