473,396 Members | 2,129 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,396 software developers and data experts.

how to do a "REAL" string compare?

Heres the deal... I have an application where I have a list (as in a Windows
list control, but thats not important) displayed to the user. I sort this
list based on the list controls sort function (again, its not important that
its Windows) which ends up calling a compare function in my code:

int CompareFunc(char* str1, char* str2)
{
}

this function returns -1, 0 or 1 which gets passed on to the internal quick
sort algorithm. No problem, it all works fine.

Now I have a user in which this list displays "multi-part" items. You can
guess where this is headed :), the list ends up like this:

Item (1/100)
Item (11/100)
Item (2/100)

Now while that is a "correct" string sort, its kind of lame. I could force
the user to zero-pad or zero-pad myself, but both seem kind of hokey as I am
either putting requirements on the user or changing his item text. I'd much
rather end up with:

Item (1/100)
Item (2/100)
..
..
..
Item (11/100)

As it should. Now keep in mind that this could end up in dozens of formats,
brackets, parents, dashes, asterisks, etc or any endless supply of cutesy
characters a user might enter. Even the forward slash may not be the part
separator and there may be stuff after the part #s.

I've seen some applications do this in the past, but never saw the source
for them. How can this be sorted properly without requiring the user to
enter it in a very specific format? I could never handle every possible
format in my code. There must be some kind of cool generic way to do this.

Jul 23 '05 #1
11 2943
Nobody wrote:
Heres the deal... I have an application where I have a list (as in a Windows
list control, but thats not important) displayed to the user. I sort this
list based on the list controls sort function (again, its not important that
its Windows) which ends up calling a compare function in my code:

int CompareFunc(char* str1, char* str2)
{
}

this function returns -1, 0 or 1 which gets passed on to the internal quick
sort algorithm. No problem, it all works fine.

Now I have a user in which this list displays "multi-part" items. You can
guess where this is headed :), the list ends up like this:

Item (1/100)
Item (11/100)
Item (2/100)

Now while that is a "correct" string sort, its kind of lame. I could force
the user to zero-pad or zero-pad myself, but both seem kind of hokey as I am
either putting requirements on the user or changing his item text. I'd much
rather end up with:

Item (1/100)
Item (2/100)
.
.
.
Item (11/100)

As it should. Now keep in mind that this could end up in dozens of formats,
brackets, parents, dashes, asterisks, etc or any endless supply of cutesy
characters a user might enter. Even the forward slash may not be the part
separator and there may be stuff after the part #s.

I've seen some applications do this in the past, but never saw the source
for them. How can this be sorted properly without requiring the user to
enter it in a very specific format? I could never handle every possible
format in my code. There must be some kind of cool generic way to do this.


Prefixing single digits with 0 is a fix (I assume this is what you mean
by "zero-pad").

Also prefixing with space does the same job:
#include <string>
#include <algorithm>
#include <iostream>
#include <list>

int main()
{
using namespace std;

list<string> somelist;

somelist.push_back("Item ( 2/100");
somelist.push_back("Item ( 1/100)");
somelist.push_back("Item (11/100)");

somelist.sort();

for(list<string>::const_iterator p= somelist.begin();
p!=somelist.end(); ++p)
cout<<*p<<"\n";
}
C:\c>temp
Item ( 1/100)
Item ( 2/100
Item (11/100)

C:\c>
In summary just make sure all strings share the same length.
I am interested in a cleaner approach myself too.

--
Ioannis Vranos

http://www23.brinkster.com/noicys
Jul 23 '05 #2

"Ioannis Vranos" <iv*@remove.this.grad.com> wrote in message
news:1110659832.315731@athnrd02...
Nobody wrote:
Heres the deal... I have an application where I have a list (as in a
Windows list control, but thats not important) displayed to the user. I
sort this list based on the list controls sort function (again, its not
important that its Windows) which ends up calling a compare function in
my code:

int CompareFunc(char* str1, char* str2)
{
}

this function returns -1, 0 or 1 which gets passed on to the internal
quick sort algorithm. No problem, it all works fine.

Now I have a user in which this list displays "multi-part" items. You can
guess where this is headed :), the list ends up like this:

Item (1/100)
Item (11/100)
Item (2/100)

Now while that is a "correct" string sort, its kind of lame. I could
force the user to zero-pad or zero-pad myself, but both seem kind of
hokey as I am either putting requirements on the user or changing his
item text. I'd much rather end up with:

Item (1/100)
Item (2/100)
.
.
.
Item (11/100)

As it should. Now keep in mind that this could end up in dozens of
formats, brackets, parents, dashes, asterisks, etc or any endless supply
of cutesy characters a user might enter. Even the forward slash may not
be the part separator and there may be stuff after the part #s.

I've seen some applications do this in the past, but never saw the source
for them. How can this be sorted properly without requiring the user to
enter it in a very specific format? I could never handle every possible
format in my code. There must be some kind of cool generic way to do
this.


Prefixing single digits with 0 is a fix (I assume this is what you mean by
"zero-pad").

Also prefixing with space does the same job:
#include <string>
#include <algorithm>
#include <iostream>
#include <list>

int main()
{
using namespace std;

list<string> somelist;

somelist.push_back("Item ( 2/100");
somelist.push_back("Item ( 1/100)");
somelist.push_back("Item (11/100)");

somelist.sort();

for(list<string>::const_iterator p= somelist.begin();
p!=somelist.end(); ++p)
cout<<*p<<"\n";
}
C:\c>temp
Item ( 1/100)
Item ( 2/100
Item (11/100)

C:\c>
In summary just make sure all strings share the same length.
I am interested in a cleaner approach myself too.


Well, as I said, I don't want to change the text the user entered. Space
padding is just as bad as zero padding :). Just finding the part number in
the string could be tough since you can have:

Item #1 of 100
Item 1/100
Item (1/100)
Item 1-100
Item 1 / 100
Item 1/ 100
Item *1/100*
Item <1/100>
Item [1/100]

etc.

You could never handle every case... and what if someone wants to be cute
and do something like:

Item [1/100] -= by Fred =-

or

Item Part # 1-100 [1/100]
Item Part # 1-100 [2/100]

etc.

Some users would follow format requirements, but I guess a huge majority
wouldn't. And a lot of cutesy users would rather put the cutesy crap on the
description then have it properly sorted.

Jul 23 '05 #3
Nobody wrote:
Now I have a user in which this list displays "multi-part" items.
... the list ends up like this:

Item (1/100)
Item (11/100)
Item (2/100)

Now while that is a "correct" string sort, its kind of lame. I could
force the user to zero-pad or zero-pad myself, but both seem kind of
hokey as I am either putting requirements on the user or changing his
item text. I'd much rather end up with:

Item (1/100)
Item (2/100)
.
.
.
Item (11/100)

As it should. source for them. How can this be sorted properly without requiring
the user to enter it in a very specific format? I could never handle
every possible format in my code. There must be some kind of cool
generic way to do this.


There are infinitely many criteria for sorting strings. There's no way to answer
your question without getting more of an idea of the range of possible inputs,
what they represent, and why you think various algorithms are lame.

One thing you might do is count any non-alphanumeric character as a separator,
and then lexicographically sort the resulting sequences with indivual components
ordered numrically. OTOH, maybe this, too, is lame.

Jonathan
Jul 23 '05 #4
"Nobody" <no****@cox.net> wrote in message
news:gOHYd.18095$KK5.2916@fed1read03...
Heres the deal... I have an application where I have a list (as in a
Windows list control, but thats not important) displayed to the user. I
sort this list based on the list controls sort function (again, its not
important that its Windows) which ends up calling a compare function in my
code:

int CompareFunc(char* str1, char* str2)
{
}

this function returns -1, 0 or 1 which gets passed on to the internal
quick sort algorithm. No problem, it all works fine. .... I'd much rather end up with:

Item (1/100)
Item (2/100)
.
Item (11/100) .... I've seen some applications do this in the past, but never saw the source
for them. How can this be sorted properly without requiring the user to
enter it in a very specific format? I could never handle every possible
format in my code. There must be some kind of cool generic way to do this.


What about something like:

int CompareFunc(char const* s1, char const* s2)
{
for(;;) {
unsigned char c1 = *s1++;
unsigned char c2 = *s2++;
if( isdigit(c1) && isdigit(c2) ) {
unsigned long v1 = strtoul(s1-1,&s1,10);
unsigned long v2 = strtoul(s2-1,&s2,10);
if(v1!=v2) return (v1<v2) ? -1 : +1;
continue;
}
else if(c1!=c2) {
...compare single chars as usual...
}
else if( !c1 )
return 0; // reached end of strings
}
}

I hope this helps,
Ivan
Jul 23 '05 #5
What about something like:

int CompareFunc(char const* s1, char const* s2)
{
for(;;) {
unsigned char c1 = *s1++;
unsigned char c2 = *s2++;
if( isdigit(c1) && isdigit(c2) ) {
unsigned long v1 = strtoul(s1-1,&s1,10);
unsigned long v2 = strtoul(s2-1,&s2,10);
if(v1!=v2) return (v1<v2) ? -1 : +1;
continue;
}
else if(c1!=c2) {
...compare single chars as usual...
}
else if( !c1 )
return 0; // reached end of strings
}
}


This would only work if *any* number was assumed to be the part #. If the
description contained any other number, this would fail.
Jul 23 '05 #6

"Jonathan Turkanis" <te******@kangaroologic.com> wrote in message
news:39*************@individual.net...
Nobody wrote:
Now I have a user in which this list displays "multi-part" items.
... the list ends up like this:

Item (1/100)
Item (11/100)
Item (2/100)

Now while that is a "correct" string sort, its kind of lame. I could
force the user to zero-pad or zero-pad myself, but both seem kind of
hokey as I am either putting requirements on the user or changing his
item text. I'd much rather end up with:

Item (1/100)
Item (2/100)
.
.
.
Item (11/100)

As it should.

source for them. How can this be sorted properly without requiring
the user to enter it in a very specific format? I could never handle
every possible format in my code. There must be some kind of cool
generic way to do this.


There are infinitely many criteria for sorting strings. There's no way to
answer
your question without getting more of an idea of the range of possible
inputs,
what they represent, and why you think various algorithms are lame.

One thing you might do is count any non-alphanumeric character as a
separator,
and then lexicographically sort the resulting sequences with indivual
components
ordered numrically. OTOH, maybe this, too, is lame.

Jonathan


LOL, nah, all I said was zero-padding or space-padding is lame because we
dont want to change the original string or force the user to do anything
special (like zero pad the numbers themselves). The description text
represents a file or multiple files that a user uploads. Since these files
can get large, they are often split up into multiple parts.

I guess we could code this for the few most common forms and then add 'em as
we see 'em.
Jul 23 '05 #7
Nobody wrote:
LOL, nah, all I said was zero-padding or space-padding is lame because we
dont want to change the original string or force the user to do anything
special (like zero pad the numbers themselves). The description text
represents a file or multiple files that a user uploads. Since these files
can get large, they are often split up into multiple parts.

I guess we could code this for the few most common forms and then add 'em as
we see 'em.

At first I observed this:

#include <string>
#include <iostream>
#include <list>

int main()
{
using namespace std;

list<string> somelist;

somelist.push_back("Item ( 2/100)");
somelist.push_back("Item ( 1/100)");
somelist.push_back("Item (11/100)");
for(list<string>::const_iterator p= somelist.begin();
p!=somelist.end(); ++p)
{
int sum=0;

for(string::size_type i=0; i<p->size(); ++i)
sum+= p->operator[](i);

cout<<sum<<"\n";
}
}
C:\c>temp
786
785
802

C:\c>
The question is how does it scale.
Anyway let's try to break this into steps. At first, I think such a
problem would arise to strings whose prefix substring is the same.
So the first step would be to check any string up to the first digit (if
existent), and then check the rest strings if they begin in the same way.

Regular expressions can help write minimal code for this, at first you
would check if there is a match for the regular expression "\\d+"
(digit) and then find where in the string a first occurrence of it
exists, and then use the substring up to that digit to check if some
other of the rest strings begin the same way (let's suppose we have the
string ""Item ( 2/100)", we then use the regular expression "^Item (.*"
with ^ denoting the beginning of the string - and $ the end -).
Then if we found a match, we would check the rest digits.

--
Ioannis Vranos

http://www23.brinkster.com/noicys
Jul 23 '05 #8
What about if you looked for the last two numbers in the string (or
just the second to last really) and sorted on that?

The code would be similar to Ivan's, but going backwards in the string.
(I'm gonna assume you can write the actual code, so I won't.)

Jul 23 '05 #9
"Nobody" <no****@cox.net> wrote in message
news:RCJYd.18970$KK5.12712@fed1read03...

int CompareFunc(char const* s1, char const* s2)
{
for(;;) {
unsigned char c1 = *s1++;
unsigned char c2 = *s2++;
if( isdigit(c1) && isdigit(c2) ) {
unsigned long v1 = strtoul(s1-1,&s1,10);
unsigned long v2 = strtoul(s2-1,&s2,10);
if(v1!=v2) return (v1<v2) ? -1 : +1;
continue;
}
else if(c1!=c2) {
...compare single chars as usual...
}
else if( !c1 )
return 0; // reached end of strings
}
}


This would only work if *any* number was assumed to be the part #. If the
description contained any other number, this would fail.


It depends what is considered as success.
Did I miss something, or is this a new requirement that wasn't in
your initial spec? I don't know, and from what I read in your posts,
I wouldn't be able to say what is a part number and what isn't!

If you want some numbers to be sorted by value, and others
lexicographically,
you obviously need to give the comparison function some knowledge about your
field's formatting.
Just add a criterion to conditionally enable the value-based comparison.

For example, if you only want the second-last number in the string
to be specially treated as a (part) number, you could pre-scan each
string (backwards) to identify the start of that number - and do the
value-based comparison only when you reach that point.

Ivan
--
http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form
Jul 23 '05 #10

"Nobody" <no****@cox.net> schrieb im Newsbeitrag
news:RCJYd.18970$KK5.12712@fed1read03...
What about something like:

int CompareFunc(char const* s1, char const* s2)
{
for(;;) {
unsigned char c1 = *s1++;
unsigned char c2 = *s2++;
if( isdigit(c1) && isdigit(c2) ) {
unsigned long v1 = strtoul(s1-1,&s1,10);
unsigned long v2 = strtoul(s2-1,&s2,10);
if(v1!=v2) return (v1<v2) ? -1 : +1;
continue;
}
else if(c1!=c2) {
...compare single chars as usual...
}
else if( !c1 )
return 0; // reached end of strings
}
}


This would only work if *any* number was assumed to be the part #. If the
description contained any other number, this would fail.


As you can easily see there is simply no way to create a sort that handles
all kinds of situations without *any* knowledge. This is impossible by the
nature of sorting, which implicitly requires at least some information ;-)
What you could do for example, is to resort to an associative container with
keys (like a map). The keys are filled with the minimum information that you
try to extract from the values and sorting is based on the keys. This way
you do not impose too many restrictive requirements on your clients but,
naturally, the success depends very much on you "information gathering
scheme".

Cheers
Chris

Jul 23 '05 #11
"Nobody" <no****@cox.net> wrote in message
news:gOHYd.18095$KK5.2916@fed1read03...
Heres the deal... I have an application where I have a list (as in a Windows list control, but thats not important) displayed to the user. I sort this
list based on the list controls sort function (again, its not important that its Windows) which ends up calling a compare function in my code:

int CompareFunc(char* str1, char* str2)
{
}

this function returns -1, 0 or 1 which gets passed on to the internal quick sort algorithm. No problem, it all works fine.
It works fine, but I would expect it to work for 'char*' types, apparently
they are C strings as you say.
Now I have a user in which this list displays "multi-part" items. You can
guess where this is headed :), the list ends up like this:

Item (1/100)
Item (11/100)
Item (2/100)
I think the problem here is treating the data as C strings. These are not
strings but string representations of complex objects. I don't think you
should compare them as strings; because as you note, it doesn't work for
this type of data.
Now while that is a "correct" string sort, its kind of lame. I could force
the user to zero-pad or zero-pad myself, but both seem kind of hokey as I am either putting requirements on the user or changing his item text. I'd much rather end up with:

Item (1/100)
Item (2/100)
.
.
.
Item (11/100)

As it should. Now keep in mind that this could end up in dozens of formats, brackets, parents, dashes, asterisks, etc or any endless supply of cutesy
characters a user might enter. Even the forward slash may not be the part
separator and there may be stuff after the part #s.
It means that the data will be parsed differently in each case. (Boost's
spirit or some other parser might help.)
I've seen some applications do this in the past, but never saw the source
for them. How can this be sorted properly without requiring the user to
enter it in a very specific format?


The data can be sorted properly by first parsing the input, then creating
the objects, and finaly comparing the objects.

Ali

Jul 23 '05 #12

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
by: Silas | last post by:
Hi, I use view to join difference table together for some function. However, when the "real" table fields changed (e.g. add/delete/change field). The view table still use the "old fields". ...
0
by: Simon Verona | last post by:
I have some Windows Forms software that I'm developing that uses a remote server (called using remoting) to provide the business rules and dataaccess. For development purposes the client and...
0
by: David Garamond | last post by:
I want to know how functional indexes are used "in the real world". Here are the common uses: * non-unique index on the first parts of a longish text field (SUBSTRING(field)) to save disk space,...
5
by: engsolnorm | last post by:
I'm playing with a sudoku GUI...just to learn more about python. I've made 81 'cells'...actually small canvases Part of my scheme to write the cells (all 81 of them in the gui) to a file (using...
5
by: playagain | last post by:
Please help me to build a list of examples of stack and queue in real life situation... Conditions: The object concerned must only one object. And the object must be tangible. Example: Queue...
1
by: Tyno Gendo | last post by:
Hi everyone I need to move on a step in my PHP... I know what classes are, both in PHP4 and 5 and I'm aware of "patterns" existing, but what I'm looking for are some real world projects eg....
3
by: Mark Shroyer | last post by:
I guess this sort of falls under the "shameless plug" category, but here it is: Recently I used a custom metaclass in a Python program I've been working on, and I ended up doing a sort of write-up...
71
by: Jack | last post by:
I understand that the standard Python distribution is considered the C-Python. Howerver, the current C-Python is really a combination of C and Python implementation. There are about 2000 Python...
0
by: Ignacio Machin ( .NET/ C# MVP ) | last post by:
The difference between compile & runtime. CreateInstance works at runtime, you can pass ANY string to it (even an incorrect one like "123123123123") and it will compile Only at runtime you will...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.