elementary string processing question

Hi everyone,

I have a "simple" question, especially for people familiar with regex.
I need to parse strings that have the form:

1:3::5:9

which indicates the set of integers {1 3 4 5 9}. In other words, I have
a set of numbers separated by ":", where "::" indicates a range from
lo to hi inclusive. It is desirable to error check this string (i.e. it
should start and end with a number, and be composed only of numbers,
"::", and ":"). I'm currently using the Boost C++ library, and I've
worked out some pretty ugly solutions. If anyone has a suggestion, I'd
very much appreciate it. Thanks!
Nov 1 '08 #1
4 Replies


On Nov 1, 4:28 am, tonywh00t <tony.s...@gmail.com> wrote:
> I have a "simple" question, especially for people familiar
> with regex. I need to parse strings that have the form:
> 1:3::5:9
> which indicates the set of integers {1 3 4 5 9}. In other
> words, I have a set of numbers separated by ":", where "::"
> indicates a range from lo to hi inclusive. It is desirable to
> error check this string (i.e. it should start and end with a
> number, and be composed only of numbers, "::", and ":"). I'm
> currently using the Boost C++ library, and I've worked out
> some pretty ugly solutions. If anyone has a suggestion, I'd
> very much appreciate it. Thanks!

I presume that the number of entries in the string may vary;
otherwise, of course, you said it yourself: regex. I'd still
use a regex to validate the string; something like
"^\\d+(:\\d+|::\\d+)*$" should, I think, do the trick. (It would
be really elegant if you could use capture, but capture doesn't
work well within closures, i.e. repeated groups: only the last
match is captured.)
Then I'd simply break the string up into substrings at each ':':

    #include <algorithm>
    #include <string>
    #include <vector>

    // Splits source at every ':'.  A "::" yields an empty field.
    std::vector< std::string >
    parse( std::string const& source )
    {
        typedef std::string::const_iterator TextIter ;
        std::vector< std::string > result ;
        TextIter current = source.begin() ;
        TextIter const end = source.end() ;
        while ( current != end ) {
            TextIter fieldBegin = current ;
            current = std::find( current, end, ':' ) ;
            result.push_back( std::string( fieldBegin, current ) ) ;
            if ( current != end ) {
                ++ current ;    // skip the ':' itself
            }
        }
        return result ;
    }
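
For comparison, the same split can be sketched with std::getline and ':' as the delimiter. This is only an illustration, not part of the code above; the name splitFields is made up here.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits source at every ':' using std::getline.
// Two adjacent colons ("::") produce an empty field between them.
std::vector< std::string >
splitFields( std::string const& source )
{
    std::vector< std::string > fields ;
    std::istringstream in( source ) ;
    std::string field ;
    while ( std::getline( in, field, ':' ) ) {
        fields.push_back( field ) ;
    }
    return fields ;
}
```

For "1:3::5:9", splitFields returns the five fields "1", "3", "", "5" and "9".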

This gives you an array of strings, with an empty string wherever
the input contains "::" (so when you see an empty string, you know
you have a range). So you could do something like:

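    #include <sstream>
    #include <string>

    // Converts a field to int.  The field is assumed to have been
    // validated already, so the extraction is not checked here.
    int
    toInt( std::string const& string )
    {
        std::istringstream cvt( string ) ;
        int result ;
        cvt >> result ;
        return result ;
    }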

    #include <algorithm>
    #include <stdexcept>
    #include <string>
    #include <vector>

    std::vector< int >
    convert( std::vector< std::string > const& source )
    {
        typedef std::vector< std::string >::const_iterator FieldIter ;
        std::vector< int > result ;
        FieldIter current = source.begin() ;
        FieldIter const end = source.end() ;
        while ( current != end ) {
            result.push_back( toInt( *current ) ) ;
            ++ current ;
            if ( current != end && *current == "" ) {
                // Empty field: the previous value starts a range.
                int bottom = result.back() ;
                ++ current ;
                int top = toInt( *current ) ;
                if ( top <= bottom ) {
                    // Any suitable exception type will do here.
                    throw std::invalid_argument( "range bounds out of order" ) ;
                }
                while ( ++ bottom <= top ) {
                    result.push_back( bottom ) ;
                }
                ++ current ;
            }
        }
        std::sort( result.begin(), result.end() ) ;
        // Or you might want to track the last seen to ensure
        // that the input was correctly sorted.
        return result ;
    }
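
The regex precheck that this code relies on might look like the following sketch. It assumes std::regex from C++11, which postdates this thread; boost::regex should accept the same patterns. The function names are made up for illustration. One thing to watch: the pattern given earlier also accepts chained ranges such as "1::3::5", which convert() is not written to handle, so a stricter variant (my own assumption about the intended format, not something stated in the question) is shown alongside it.

```cpp
#include <regex>
#include <string>

// Hypothetical helper using the pattern suggested in this post.
// std::regex_match only succeeds if the whole string matches,
// so the ^ and $ anchors are not needed.
bool
isValidSpec( std::string const& spec )
{
    static std::regex const pattern( "\\d+(:\\d+|::\\d+)*" ) ;
    return std::regex_match( spec, pattern ) ;
}

// Stricter variant (an assumption): each element is either a single
// number or one "lo::hi" range, and elements are separated by a
// single ':'.  This rejects chained ranges such as "1::3::5".
bool
isValidSpecStrict( std::string const& spec )
{
    static std::regex const pattern(
        "\\d+(::\\d+)?(:\\d+(::\\d+)?)*" ) ;
    return std::regex_match( spec, pattern ) ;
}
```

A caller would test isValidSpecStrict( input ) first and only pass the string on to parse() and convert() when it returns true.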

Note that all of the above code supposes the precheck on the
format using regex. Otherwise, you'll need a lot more error
handling and special cases.
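
To give an idea of what that extra error handling involves, here is a rough single-pass sketch that does without the regex. The names readNumber and expandSpec are invented for this example, and it makes two assumptions of its own: the numbers must be strictly increasing (so no sort is needed), and integer overflow is not checked.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: reads one unsigned decimal number starting at
// pos and advances pos past it.  Throws if there is no digit at pos.
static int
readNumber( std::string const& s, std::string::size_type& pos )
{
    if ( pos >= s.size()
            || ! std::isdigit( static_cast< unsigned char >( s[ pos ] ) ) ) {
        throw std::invalid_argument( "expected a number" ) ;
    }
    int value = 0 ;
    while ( pos < s.size()
            && std::isdigit( static_cast< unsigned char >( s[ pos ] ) ) ) {
        value = value * 10 + ( s[ pos ] - '0' ) ;
        ++ pos ;
    }
    return value ;
}

// Hypothetical function: validates and expands in one pass.
std::vector< int >
expandSpec( std::string const& s )
{
    std::vector< int > result ;
    std::string::size_type pos = 0 ;
    result.push_back( readNumber( s, pos ) ) ;
    while ( pos < s.size() ) {
        if ( s[ pos ] != ':' ) {
            throw std::invalid_argument( "expected ':'" ) ;
        }
        ++ pos ;
        // A second ':' right after the first marks a range.
        bool const isRange = pos < s.size() && s[ pos ] == ':' ;
        if ( isRange ) {
            ++ pos ;
        }
        int const previous = result.back() ;
        int const next = readNumber( s, pos ) ;
        if ( next <= previous ) {
            throw std::invalid_argument(
                "numbers must be strictly increasing" ) ;
        }
        if ( isRange ) {
            // Fill in the values strictly between the two bounds.
            for ( int v = previous + 1 ; v < next ; ++ v ) {
                result.push_back( v ) ;
            }
        }
        result.push_back( next ) ;
    }
    return result ;
}
```

With this sketch, expandSpec( "1:3::5:9" ) returns {1, 3, 4, 5, 9}, while inputs such as "1:", ":1", "1:::3" or "5:3" throw std::invalid_argument. Unlike the strict regex, it accepts a chained range like "1::3::5" and expands it to 1 through 5.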

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Nov 1 '08 #2

tonywh00t wrote:
> I'm currently using the Boost C++ library, and I've
> worked out some pretty ugly solutions. If anyone has a suggestion, I'd
> very much appreciate it. Thanks!

My experience is that whenever you need to parse input data which is
more complicated than fixed-format, whitespace-separated elements, the
parsing code always becomes very complicated in C++ (as well as in C).
Neither language was designed for writing parsers of complicated
formats as one-liners. Often 100 lines are not enough either
(especially if you want full error checking).

Of course, libraries have been developed over the decades to help with
this, but they often help more with abstraction than with the
verbosity and complexity of the code.
Nov 1 '08 #4

Thanks very much, guys, for your suggestions and help =).
Nov 1 '08 #5
