how to do a "REAL" string compare?

Here's the deal... I have an application that displays a list to the user
(as in a Windows list control, but that's not important). I sort this list
with the list control's sort function (again, it's not important that it's
Windows), which ends up calling a compare function in my code:

int CompareFunc(char* str1, char* str2)
{
}

This function returns -1, 0 or 1, which gets passed on to the internal
quicksort algorithm. No problem, it all works fine.

Now I have a user whose list displays "multi-part" items. You can
guess where this is headed :), the list ends up like this:

Item (1/100)
Item (11/100)
Item (2/100)

Now while that is a "correct" string sort, it's kind of lame. I could force
the user to zero-pad, or zero-pad myself, but both seem kind of hokey as I am
either putting requirements on the user or changing his item text. I'd much
rather end up with:

Item (1/100)
Item (2/100)
..
..
..
Item (11/100)

As it should be. Now keep in mind that this could show up in dozens of
formats: brackets, parens, dashes, asterisks, etc., or any of an endless
supply of cutesy characters a user might enter. Even the forward slash may
not be the part separator, and there may be stuff after the part #s.

I've seen some applications do this in the past, but never saw the source
for them. How can this be sorted properly without requiring the user to
enter it in a very specific format? I could never handle every possible
format in my code. There must be some kind of cool generic way to do this.

Jul 23 '05 #1
Prefixing single digits with 0 is a fix (I assume this is what you mean
by "zero-pad").

Also prefixing with space does the same job:
#include <string>
#include <algorithm>
#include <iostream>
#include <list>

int main()
{
    using namespace std;

    list<string> somelist;

    somelist.push_back("Item ( 2/100)");
    somelist.push_back("Item ( 1/100)");
    somelist.push_back("Item (11/100)");

    somelist.sort();

    for(list<string>::const_iterator p= somelist.begin();
        p!=somelist.end(); ++p)
        cout<<*p<<"\n";
}
C:\c>temp
Item ( 1/100)
Item ( 2/100)
Item (11/100)

C:\c>
In summary, just make sure the numbers are padded to the same width, so that
all the strings have the same length.
I am interested in a cleaner approach myself too.

--
Ioannis Vranos

http://www23.brinkster.com/noicys
Jul 23 '05 #2
Well, as I said, I don't want to change the text the user entered. Space
padding is just as bad as zero padding :). Just finding the part number in
the string could be tough since you can have:

Item #1 of 100
Item 1/100
Item (1/100)
Item 1-100
Item 1 / 100
Item 1/ 100
Item *1/100*
Item <1/100>
Item [1/100]

etc.

You could never handle every case... and what if someone wants to be cute
and do something like:

Item [1/100] -= by Fred =-

or

Item Part # 1-100 [1/100]
Item Part # 1-100 [2/100]

etc.

Some users would follow format requirements, but I guess a huge majority
wouldn't. And a lot of cutesy users would rather put the cutesy crap in the
description than have it properly sorted.

Jul 23 '05 #3
There are infinitely many criteria for sorting strings. There's no way to answer
your question without getting more of an idea of the range of possible inputs,
what they represent, and why you think various algorithms are lame.

One thing you might do is count any non-alphanumeric character as a separator,
and then lexicographically sort the resulting sequences, with individual
components that are numbers ordered numerically. OTOH, maybe this, too, is lame.
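
For what it's worth, the separator idea could look roughly like the sketch
below. It assumes C++11, and the names splitAlnum, compareComponent and
compareTokens are made up for illustration. Two choices here are assumptions,
not part of the suggestion: a component counts as a number only if it is all
digits, and numbers sort before text so that the ordering stays consistent.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Split on every non-alphanumeric character; empty pieces are dropped.
std::vector<std::string> splitAlnum(const std::string& s)
{
    std::vector<std::string> parts;
    std::string cur;
    for (unsigned char ch : s) {
        if (std::isalnum(ch)) {
            cur += static_cast<char>(ch);
        } else if (!cur.empty()) {
            parts.push_back(cur);
            cur.clear();
        }
    }
    if (!cur.empty()) parts.push_back(cur);
    return parts;
}

// True if s is non-empty and consists only of digits.
bool allDigits(const std::string& s)
{
    if (s.empty()) return false;
    for (unsigned char ch : s)
        if (!std::isdigit(ch)) return false;
    return true;
}

// Drop leading zeros so digit runs can be compared by length first.
std::string stripZeros(const std::string& s)
{
    const std::size_t i = s.find_first_not_of('0');
    return i == std::string::npos ? std::string("0") : s.substr(i);
}

// Compare one component: by value if both are numbers, otherwise as text.
int compareComponent(const std::string& a, const std::string& b)
{
    const bool na = allDigits(a), nb = allDigits(b);
    if (na != nb) return na ? -1 : 1;   // assumption: numbers sort before text
    if (na) {
        // No conversion to an integer, so long digit runs cannot overflow.
        const std::string x = stripZeros(a), y = stripZeros(b);
        if (x.size() != y.size()) return x.size() < y.size() ? -1 : 1;
        const int c = x.compare(y);
        return (c > 0) - (c < 0);
    }
    const int c = a.compare(b);
    return (c > 0) - (c < 0);
}

// Returns -1, 0 or 1, like the CompareFunc in the original question.
int compareTokens(const std::string& s1, const std::string& s2)
{
    const std::vector<std::string> a = splitAlnum(s1), b = splitAlnum(s2);
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        const int c = compareComponent(a[i], b[i]);
        if (c != 0) return c;
    }
    if (a.size() != b.size()) return a.size() < b.size() ? -1 : 1;
    return 0;
}
```

Because separators are thrown away, "Item [1/100]" and "Item (1/100)" compare
equal here; a plain string compare could be used as a tie-breaker if that
matters.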

Jonathan
Jul 23 '05 #4
What about something like:

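#include <cctype>   // isdigit
#include <cstdlib>  // strtoul

int CompareFunc(char const* s1, char const* s2)
{
    for(;;) {
        unsigned char c1 = *s1++;
        unsigned char c2 = *s2++;
        if( isdigit(c1) && isdigit(c2) ) {
            // both strings have a number here: compare by value
            char* end; // strtoul takes a char**, not a char const**
            unsigned long v1 = strtoul(s1-1,&end,10); s1 = end;
            unsigned long v2 = strtoul(s2-1,&end,10); s2 = end;
            if(v1!=v2) return (v1<v2) ? -1 : +1;
            continue;
        }
        else if(c1!=c2) {
            return (c1<c2) ? -1 : +1; // compare single chars as usual
        }
        else if( !c1 )
            return 0; // reached end of strings
    }
}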

I hope this helps,
Ivan
Jul 23 '05 #5
This would only work if *any* number was assumed to be the part #. If the
description contained any other number, this would fail.
Jul 23 '05 #6
LOL, nah, all I said was that zero-padding or space-padding is lame because
we don't want to change the original string or force the user to do anything
special (like zero-pad the numbers themselves). The description text
represents a file or multiple files that a user uploads. Since these files
can get large, they are often split up into multiple parts.

I guess we could code this for the few most common forms and then add 'em as
we see 'em.
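
If you do go the "few most common forms" route, a small pattern table keeps
each new form down to one line. This is only a sketch, assuming C++11 and
std::regex; the three patterns and the names partPatterns and findPartNumber
are made up for illustration and cover just a few of the forms listed earlier
in the thread.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical pattern table: each entry captures the part number in group 1.
// Earlier entries win, so the more specific forms go first.
const std::vector<std::regex>& partPatterns()
{
    static const std::vector<std::regex> patterns = {
        std::regex(R"(#\s*(\d+)\s+of\s+\d+)"),  // Item #1 of 100
        std::regex(R"((\d+)\s*/\s*\d+)"),       // Item 1/100, (1/100), 1 / 100
        std::regex(R"((\d+)\s*-\s*\d+)"),       // Item 1-100
    };
    return patterns;
}

// Returns true and sets `part` if any known form is found in `text`.
bool findPartNumber(const std::string& text, unsigned long& part)
{
    for (const std::regex& re : partPatterns()) {
        std::smatch m;
        if (std::regex_search(text, m, re)) {
            part = std::stoul(m[1].str());  // throws on absurdly long numbers
            return true;
        }
    }
    return false;
}
```

The compare function could then order two items by their part numbers when
both have one, and fall back to a plain string compare otherwise. Because the
slash form is tried before the dash form, "Item Part # 1-100 [2/100]" yields
part 2 rather than 1.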
Jul 23 '05 #7
At first I observed this:

#include <string>
#include <iostream>
#include <list>

int main()
{
    using namespace std;

    list<string> somelist;

    somelist.push_back("Item ( 2/100)");
    somelist.push_back("Item ( 1/100)");
    somelist.push_back("Item (11/100)");

    for(list<string>::const_iterator p= somelist.begin();
        p!=somelist.end(); ++p)
    {
        int sum=0;

        for(string::size_type i=0; i<p->size(); ++i)
            sum+= p->operator[](i);

        cout<<sum<<"\n";
    }
}
C:\c>temp
786
785
802

C:\c>
The question is how it scales.
Anyway, let's try to break this into steps. First, I think such a problem
only arises for strings that share the same prefix. So the first step would
be to take each string up to its first digit (if there is one), and then
check whether the remaining strings begin the same way.

Regular expressions can help write minimal code for this. First you would
check whether there is a match for the regular expression "\\d+" (one or
more digits) and find where its first occurrence is in the string. Then you
would use the substring up to that digit to check whether any of the
remaining strings begin the same way (suppose we have the string
"Item ( 2/100)"; we then use the regular expression "^Item \\(.*", with ^
denoting the beginning of the string and $ the end, and with the parenthesis
escaped so that it matches literally).
If we find a match, we then compare the digits that follow.
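
A rough sketch of those steps, assuming a C++11 compiler for std::regex.
The name comparePrefixThenNumber is hypothetical, and instead of building a
"^prefix" expression per string it simply compares the text in front of the
first run of digits, which avoids having to escape characters such as "(".

```cpp
#include <regex>
#include <string>

// Compare by the text before the first digit run, then by that number's
// value; anything else falls back to a plain string comparison.
int comparePrefixThenNumber(const std::string& a, const std::string& b)
{
    static const std::regex digits(R"(\d+)");
    std::smatch ma, mb;
    const bool hasA = std::regex_search(a, ma, digits);
    const bool hasB = std::regex_search(b, mb, digits);
    if (hasA && hasB && ma.prefix().str() == mb.prefix().str()) {
        // Same prefix: order by the numeric value of the first number.
        const unsigned long va = std::stoul(ma[0].str());
        const unsigned long vb = std::stoul(mb[0].str());
        if (va != vb) return va < vb ? -1 : 1;
    }
    const int c = a.compare(b);
    return (c > 0) - (c < 0);
}
```

This only looks at the first number in each string, so it shares the
limitation raised earlier in the thread: a description that contains another
number before the part number will be ordered by that number instead.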

--
Ioannis Vranos

http://www23.brinkster.com/noicys
Jul 23 '05 #8
What about if you looked for the last two numbers in the string (or
just the second to last really) and sorted on that?

The code would be similar to Ivan's, but going backwards in the string.
(I'm gonna assume you can write the actual code, so I won't.)
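
In case it helps, the backwards scan could be sketched as follows. The name
secondToLastNumber is made up for illustration, and the sketch assumes the
part number really is the second-to-last run of digits, which breaks as soon
as someone appends text containing a number.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Walks the string backwards and extracts the second-to-last run of digits.
// Returns false if the string contains fewer than two numbers.
bool secondToLastNumber(const std::string& s, unsigned long& value)
{
    std::size_t i = s.size();
    int runsSeen = 0;
    while (i > 0) {
        if (!std::isdigit(static_cast<unsigned char>(s[i - 1]))) {
            --i;                    // not a digit: keep walking backwards
            continue;
        }
        const std::size_t end = i;  // one past the last digit of this run
        while (i > 0 && std::isdigit(static_cast<unsigned char>(s[i - 1])))
            --i;                    // i ends up on the run's first digit
        if (++runsSeen == 2) {
            value = 0;              // note: wraps around on very long runs
            for (std::size_t k = i; k < end; ++k)
                value = value * 10 + static_cast<unsigned long>(s[k] - '0');
            return true;
        }
    }
    return false;
}
```

For "Item Part # 1-100 [2/100]" this picks out 2, because the scan reaches
the bracketed numbers first.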

Jul 23 '05 #9
It depends on what is considered success.
Did I miss something, or is this a new requirement that wasn't in
your initial spec? I don't know, and from what I read in your posts,
I wouldn't be able to say what is a part number and what isn't!

If you want some numbers to be sorted by value, and others lexicographically,
you obviously need to give the comparison function some knowledge about your
field's formatting.
Just add a criterion to conditionally enable the value-based comparison.

For example, if you only want the second-last number in the string
to be specially treated as a (part) number, you could pre-scan each
string (backwards) to identify the start of that number - and do the
value-based comparison only when you reach that point.
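
One possible shape for such a comparison, as a sketch only. It assumes
C++11, and PartKey, makeKey, sign and ComparePart are hypothetical names.
Rather than switching modes in the middle of a character loop, it splits each
string at the second-to-last number and compares the three pieces in order;
for strings that agree up to the part number the result is the same, and the
ordering stays consistent when they do not.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

struct PartKey {
    std::string before;  // text in front of the part number (all of it if none)
    std::string digits;  // the part number without leading zeros ("" if none)
    std::string after;   // text behind the part number
};

// Pre-scan backwards for the second-to-last run of digits and split there.
PartKey makeKey(const std::string& s)
{
    std::size_t i = s.size();
    int runs = 0;
    while (i > 0) {
        if (!std::isdigit(static_cast<unsigned char>(s[i - 1]))) { --i; continue; }
        const std::size_t end = i;
        while (i > 0 && std::isdigit(static_cast<unsigned char>(s[i - 1]))) --i;
        if (++runs == 2) {
            std::size_t nz = i;
            while (nz + 1 < end && s[nz] == '0') ++nz;  // skip leading zeros
            return PartKey{s.substr(0, i), s.substr(nz, end - nz), s.substr(end)};
        }
    }
    return PartKey{s, std::string(), std::string()};    // no part number found
}

int sign(int c) { return (c > 0) - (c < 0); }

// Returns -1, 0 or 1. Only the part number is compared by value; every
// other number in the string is compared as ordinary text.
int ComparePart(const std::string& s1, const std::string& s2)
{
    const PartKey a = makeKey(s1), b = makeKey(s2);
    if (int c = sign(a.before.compare(b.before))) return c;
    if (a.digits.size() != b.digits.size())             // more digits = larger
        return a.digits.size() < b.digits.size() ? -1 : 1;
    if (int c = sign(a.digits.compare(b.digits))) return c;
    return sign(a.after.compare(b.after));
}
```

With this, "Item (2/100)" sorts before "Item (11/100)", while
"Item 11 (1/100)" still sorts before "Item 2 (1/100)" because the leading
number is not the part number.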

Ivan
--
http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form
Jul 23 '05 #10
