By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
445,918 Members | 2,246 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 445,918 IT Pros & Developers. It's quick & easy.

Need help using STL to sort a vector of strings

P: n/a
I want to use STL's sort algorithm to sort a string vector. Some of
the strings are fairly long (300 to 400 chars) and the vector isn't
small (5,000 to 10,000 elements). Naturally, sorting time is long.
All I need sorted are the first 10 characters in the strings.
Limiting the number of characters processed should speed things up
considerably. Is there a way to do 1t?

Thanks for any help ... Al
Jul 19 '05 #1
Share this Question
Share on Google+
6 Replies


P: n/a
> bool pred(const std::string& lhs, const std::string& rhs)
{
return lhs.substr(0, sig_chars) < rhs.substr(0, sig_chars);
}

bool pred(const std::string& lhs, const std::string& rhs)
{
return std::lexicographical_compare(lhs.begin(), lhs.size()>10 ?
lhs.begin()+10 : lhs.end(), rhs.begin(), rhs.size()>10 ? rhs.begin()+10 :
rhs.end());
}

This would be considerably faster with not having to create substrings.

Fraser.
Jul 19 '05 #2

P: n/a
"Victor Bazarov" <v.********@attAbi.com> wrote in message news:<vi************@corp.supernews.com>...
"Al Newton" <al*******@hotmail.com> wrote...
I want to use STL's sort algorithm to sort a string vector. Some of
the strings are fairly long (300 to 400 chars) and the vector isn't
small (5,000 to 10,000 elements). Naturally, sorting time is long.
All I need sorted are the first 10 characters in the strings.


Do you need to sort every string in a vector and then the vector
as well, or just every string, or just the vector based on the
first 10 chars in every string?

Example: given the vector

"sadflkjasdlksjdf"
"sadfslkjlskjdlfkj"
"lkjdfasdflkas"

sorting the first 10 chars in each string gives

"aaddfjklsslksjdf"
"adfjkllssskjdlfkj"
"addffjkllskas"

sorting the first 10 chars, then the entire vector gives
"aaddfjklsslksjdf"
"addffjkllskas"
"adfjkllssskjdlfkj"

sorting the entire vector using only 10 chars of each line gives

"lkjdfasdflkas"
"sadflkjasdlksjdf"
"sadfslkjlskjdlfkj"

So, make up your mind.
Limiting the number of characters processed should speed things up
considerably. Is there a way to do 1t?


Of course. If you've only specified what exactly it is you want
to do...

Victor

Sorry about the ambiguity. Given as input the 3-element vector:
987654321A
123456789B
012345678C
I want the sorted output to be:
012345678C
123456789B
987654321A

Hope this clears up things ... Al
Jul 19 '05 #3

P: n/a
Al> I apologize for ambiguities in the posting. I hope these were
Al> resolved in my reply to Bazarov. All strings begin with 10
Al> numerics that typically repeat 3 to 8 times in a vector of 10,000
Al> strings, but many numerics are unique within the vector. In all
Al> cases the numerics are followed by alphabetic data of no
Al> consequence. I don't have a quantitative answer as to run time,
Al> but the frown on my manager's face says, "Too long. Way, way too
Al> long."

If the numerics don't repeat much, then you're not going to save much
by restricting the comparisons. In fact, depending on how you do it,
you might even make things slower.

10,000 strings isn't that many on a modern processor. If it's taking
more than a small fraction of a second to sort them, I would be
inclined to look somewhere other than the comparison function for the
cause.

--
Andrew Koenig, ar*@acm.org
Jul 19 '05 #4

P: n/a
"Mike Wahler" <mk******@mkwahler.net> wrote in message news:<bg**********@slb6.atl.mindspring.net>...
Al Newton <al*******@hotmail.com> wrote in message
news:4e**************************@posting.google.c om...
I want to use STL's sort algorithm to sort a string vector. Some of
the strings are fairly long (300 to 400 chars) and the vector isn't
small (5,000 to 10,000 elements). Naturally, sorting time is long.
All I need sorted are the first 10 characters in the strings.
Limiting the number of characters processed should speed things up
considerably. Is there a way to do 1t?

Thanks for any help ... Al


The sort function offers an optional argument, a
'predicate' function you can write to specify the
ordering:

<snip>


Thank you for the response. It was valuable and sincerely appreciated.

.... Al
Jul 19 '05 #5

P: n/a
"Fraser Ross" <fraserATmembers.v21.co.unitedkingdom> wrote in message news:<3f******@news.greennet.net>...
bool pred(const std::string& lhs, const std::string& rhs)
{
return lhs.substr(0, sig_chars) < rhs.substr(0, sig_chars);
}

bool pred(const std::string& lhs, const std::string& rhs)
{
return std::lexicographical_compare(lhs.begin(), lhs.size()>10 ?
lhs.begin()+10 : lhs.end(), rhs.begin(), rhs.size()>10 ? rhs.begin()+10 :
rhs.end());
}

This would be considerably faster with not having to create substrings.

Fraser.


You're quite right. 50000 repetitions of the original sort required
2664ms. 50000 repetitions of your modification reduced that to a mere
581ms.

Sincere thanks ... Al
Jul 19 '05 #6

P: n/a
In article <4e**************************@posting.google.com >, alnewton1
@hotmail.com says...
I want to use STL's sort algorithm to sort a string vector. Some of
the strings are fairly long (300 to 400 chars) and the vector isn't
small (5,000 to 10,000 elements). Naturally, sorting time is long.
All I need sorted are the first 10 characters in the strings.
Limiting the number of characters processed should speed things up
considerably. Is there a way to do 1t?


I'm not sure about the "naturally sorting time is long" part -- sorting
10,000 strings should not be very slow. Though you'd need to use a
profiler to know, there's also a chance that you're actually copying
entire strings instead of just copying pointers when (for example) the
sort function decides it needs to swap a pair of strings. In this case,
you might gain considerably more by using pointers rather than raw
strings.

To answer your question directly, however, when you call std::sort you
can pass it a comparison routine you want used. Your comparison
function could simply create 10-character sub-strings and compare them,
or it could call strncmp to do the comparison, or (of course) it could
do the comparison itself. Which of these would be fastest is likely to
depend somewhat on how std::string is implemented on your machine.

--
Later,
Jerry.

The universe is a figment of its own imagination.
Jul 19 '05 #7

This discussion thread is closed

Replies have been disabled for this discussion.