
string splitting

xyz
Guest
 
Posts: n/a
#1: Jun 27 '08
I have a string:
16:23:18.659343 131.188.37.230.22 131.188.37.59.1398 tcp 168

For example, the string above breaks down as:
16:23:18.659343 -- time
131.188.37.230 -- srcaddress
22 -- srcport
131.188.37.59 -- destaddress
1398 -- destport
tcp -- protocol
168 -- size
I need to split the string so that I get all of these parameters.
The field widths are not fixed (the srcport sometimes has three or four
digits), so I can't do it with the substr function. I need this in C++.
I can't figure out how to split it.
Thank you for any help.

Lars Uffmann
Guest
 
Posts: n/a
#2: Jun 27 '08

re: string splitting


Google for "c++ explode"; you'll find a lot of information on how to do it.
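
For readers who do not want to search, here is a minimal sketch of what such an "explode" helper could look like. The function name `explode` is borrowed from PHP and is an assumption for illustration; it is not part of the C++ standard library.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split `text` at every occurrence of `delimiter`.
// Consecutive delimiters produce empty strings in the result.
std::vector<std::string> explode(const std::string& text, char delimiter)
{
    std::vector<std::string> parts;
    std::istringstream stream(text);
    std::string part;
    // std::getline reads up to the next delimiter and discards it.
    while (std::getline(stream, part, delimiter))
        parts.push_back(part);
    return parts;
}
```

Calling `explode(line, ' ')` on the sample line would give five whitespace-separated fields; the address and port would still need to be separated at the last '.' in a second step.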
alasham.said@gmail.com
Guest
 
Posts: n/a
#3: Jun 27 '08

re: string splitting


See "A string tokenizer" here: http://oopweb.com/CPP/Documents/CPPH...g-HOWTO-7.html

Regards.
Jim Langston
Guest
 
Posts: n/a
#4: Jun 27 '08

re: string splitting




--
Jim Langston
tazmaster@rocketmail.com
"xyz" <lavanyareddy.p@gmail.com> wrote in message
news:8daf3ab2-37ef-47ac-b2b4-f34816e70f56@p25g2000hsf.googlegroups.com...
Quote:
>I have a string
16:23:18.659343 131.188.37.230.22 131.188.37.59.1398 tcp 168
>
for example lets say for the above string
16:23:18.659343 -- time
131.188.37.230 -- srcaddress
22 --srcport
131.188.37.59 --destaddress
1398 --destport
tcp --protocol
168 --size
i need to split the string such that i need to get all these
parameters....
the field widths are not fixed..i have some times four/three digits
srcport ..so i cant do it with substr function...i need this in c++
i am not getting an idea how to split it..
thank you for any help
Not complete but giving you all the pieces.

You should use your favorite method for converting from strings to ints;
I'm showing a manual stringstream way, but I use a template myself.

Output is:

16:23:18.659343 -- time
131.188.37.230.22 -- srcaddress/port
131.188.37.59.1398 -- destaddress/port
tcp -- protocol
168 -- size

131.188.37.230 : 22

#include <string>
#include <sstream>
#include <iostream>

int main()
{
    std::string Input( "16:23:18.659343 131.188.37.230.22 131.188.37.59.1398 tcp 168" );
    std::stringstream Stream( Input );

    std::string Time;
    std::string SrcAddressPort;
    std::string DestAddressPort;
    std::string Protocol;
    int Size;

    // Extract the five whitespace-separated fields in one go.
    if ( Stream >> Time >> SrcAddressPort >> DestAddressPort >> Protocol >> Size )
    {
        std::cout << Time << " -- time\n" <<
            SrcAddressPort << " -- srcaddress/port\n" <<
            DestAddressPort << " -- destaddress/port\n" <<
            Protocol << " -- protocol\n" <<
            Size << " -- size\n\n";
    }
    else
    {
        std::cerr << "Parsing error\n";
        return 1;   // do not go on to use fields that were never read
    }

    std::string SrcAddress;
    std::string PortString;
    int SrcPort = 0;

    // The port is whatever follows the last '.' in the address/port field.
    SrcAddress = SrcAddressPort.substr( 0, SrcAddressPort.find_last_of('.') );
    PortString = SrcAddressPort.substr( SrcAddressPort.find_last_of('.') + 1, std::string::npos );

    // Manual string-to-int conversion through a stringstream.
    std::stringstream Convert;
    Convert << PortString;
    Convert >> SrcPort;

    std::cout << SrcAddress << " : " << SrcPort << "\n";
}
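
The post mentions using a template for the string-to-int conversion but does not show it. The following is only a guess at what such a template might look like, combined with the split at the last '.'; the names `fromString` and `splitAddressPort` are hypothetical and are not taken from the post.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical conversion template (an assumption, not the poster's own):
// read a T from the text and throw if extraction fails.
template <typename T>
T fromString(const std::string& text)
{
    std::istringstream stream(text);
    T value;
    if (!(stream >> value))
        throw std::invalid_argument("cannot convert: " + text);
    return value;
}

// Split "a.b.c.d.port" at the last '.' into (address, port).
std::pair<std::string, int> splitAddressPort(const std::string& field)
{
    const std::string::size_type dot = field.find_last_of('.');
    if (dot == std::string::npos)
        throw std::invalid_argument("no '.' in: " + field);
    return std::make_pair(field.substr(0, dot),
                          fromString<int>(field.substr(dot + 1)));
}
```

The same helper would serve for both the source and the destination field, which the program above only does for the source.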



kwikius
Guest
 
Posts: n/a
#5: Jun 27 '08

re: string splitting

Parsing is best solved formally with a parser generator, for which the best
option is to write a grammar.

Below is an LL(1) grammar written as input for the SLK parser generator.
An LL(1) grammar maps closely onto a hand-written parser.

http://home.earthlink.net/~slkpg/

In the grammar, the parts prefixed with "__" are actions, which you write
code for in C++ (or C, Java or C#).
SLK does most of the rest of the work of creating the application.

----------------

/*
slk grammar
integer and tcp are terminals from the lexer
*/

parser :
time src dest proto

time:
integer __hr : integer __min : integer __sec_int [ . integer __sec_frac ]

src:
integer __s1 . integer __s2 . integer __s3 . integer __s4 . integer __port

dest:
integer __d1 . integer __d2 . integer __d3 . integer __d4 . integer __dport

proto:
tcp integer __size


-----------------
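
To illustrate how closely an LL(1) grammar maps onto hand-written code, here is a rough sketch of a hand-written parser with one function per production. It is not SLK output; every name in it (`Record`, `LineParser`, the rule functions) is made up for illustration. It assumes the dest field also ends in a port, as the sample line does, and that fields are separated by plain spaces.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Illustrative result type; the fraction is stored as a plain integer,
// so leading zeros in the fractional seconds are not preserved.
struct Record {
    int hr, min, sec, secFrac;
    int src[4], srcPort;
    int dst[4], dstPort;
    int size;
};

class LineParser {
public:
    explicit LineParser(const std::string& text) : text_(text), pos_(0) {}

    // parser : time src dest proto
    bool parse(Record& r)
    {
        return timeRule(r) && spaces()
            && endpointRule(r.src, r.srcPort) && spaces()
            && endpointRule(r.dst, r.dstPort) && spaces()
            && protoRule(r) && pos_ == text_.size();
    }

private:
    // Terminal: one or more decimal digits (no overflow check in this sketch).
    bool integer(int& value)
    {
        const std::size_t start = pos_;
        value = 0;
        while (pos_ < text_.size()
               && std::isdigit(static_cast<unsigned char>(text_[pos_])))
            value = value * 10 + (text_[pos_++] - '0');
        return pos_ > start;
    }
    // Terminal: a single literal character.
    bool lit(char c)
    {
        if (pos_ < text_.size() && text_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    // One or more spaces between fields.
    bool spaces()
    {
        const std::size_t start = pos_;
        while (pos_ < text_.size() && text_[pos_] == ' ') ++pos_;
        return pos_ > start;
    }
    // time : integer ':' integer ':' integer [ '.' integer ]
    bool timeRule(Record& r)
    {
        if (!(integer(r.hr) && lit(':') && integer(r.min)
              && lit(':') && integer(r.sec)))
            return false;
        r.secFrac = 0;
        return !lit('.') || integer(r.secFrac);
    }
    // src, dest : integer '.' integer '.' integer '.' integer '.' integer
    bool endpointRule(int (&addr)[4], int& port)
    {
        return integer(addr[0]) && lit('.') && integer(addr[1]) && lit('.')
            && integer(addr[2]) && lit('.') && integer(addr[3]) && lit('.')
            && integer(port);
    }
    // proto : "tcp" integer
    bool protoRule(Record& r)
    {
        return lit('t') && lit('c') && lit('p') && spaces() && integer(r.size);
    }

    std::string text_;
    std::size_t pos_;
};
```

Usage would be along the lines of `Record r; LineParser p(line); if (p.parse(r)) { ... }`. Unlike a parser generator, nothing here checks the grammar itself for ambiguities.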

regards
Andy Little






xyz
Guest
 
Posts: n/a
#6: Jun 27 '08

re: string splitting


I solved it. Thanks to all.
Default User
Guest
 
Posts: n/a
#7: Jun 27 '08

re: string splitting


xyz wrote:
Quote:
I have a string
Pick a language. You posted the same thing (twice) to comp.lang.c.





Brian
Jim Langston
Guest
 
Posts: n/a
#8: Jun 27 '08

re: string splitting


Jim Langston wrote:
Quote:
Quote:
>I have a string
>16:23:18.659343 131.188.37.230.22 131.188.37.59.1398 tcp 168
>>
>for example lets say for the above string
>16:23:18.659343 -- time
>131.188.37.230 -- srcaddress
>22 --srcport
>131.188.37.59 --destaddress
>1398 --destport
>tcp --protocol
>168 --size
>i need to split the string such that i need to get all these
>parameters....
>the field widths are not fixed..i have some times four/three digits
>srcport ..so i cant do it with substr function...i need this in c++
>i am not getting an idea how to split it..
>thank you for any help
>
Not complete but giving you all the pieces.
>
You should use your favorite method for converting from strings to
ints, I'm showing a manual stringstream way, but I use a template
myself.
Output is:
>
16:23:18.659343 -- time
131.188.37.230.22 -- srcaddress/port
131.188.37.59.1398 -- destaddress/port
tcp -- protocol
168 -- size
>
131.188.37.230 : 22
>
#include <string>
#include <sstream>
#include <iostream>
>
int main()
{
std::string Input( "16:23:18.659343 131.188.37.230.22 131.188.37.59.1398 tcp 168" );
std::stringstream Stream( Input );
>
std::string Time;
std::string SrcAddressPort;
std::string DestAddressPort;
std::string Protocol;
int Size;
>
if ( Stream >> Time >> SrcAddressPort >> DestAddressPort >> Protocol >> Size )
{
std::cout << Time << " -- time\n" <<
SrcAddressPort << " -- srcaddress/port\n" <<
DestAddressPort << " -- destaddress/port\n" <<
Protocol << " -- protocol\n" <<
Size << " -- size\n\n";
}
else
std::cerr << "Parsing error\n";
>
std::string SrcAddress;
std::string PortString;
int SrcPort = 0;
>
SrcAddress = SrcAddressPort.substr( 0,
SrcAddressPort.find_last_of('.') );
PortString = SrcAddressPort.substr(
SrcAddressPort.find_last_of('.') + 1, std::string::npos );
Oh, I forgot that substr's second parameter is optional. This line can be simplified to:
PortString = SrcAddressPort.substr( SrcAddressPort.find_last_of('.') + 1 );

std::string::npos is the default for the 2nd parameter.
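
As a small sketch of that simplification in isolation (the helper name `afterLastDot` is made up for illustration):

```cpp
#include <string>

// Hypothetical helper: return the text after the last '.', for example the
// port in "address.port". If there is no '.', find_last_of returns npos,
// npos + 1 wraps around to 0, and the whole string is returned.
std::string afterLastDot(const std::string& field)
{
    // Omitting the second argument is the same as passing std::string::npos:
    // both mean "through the end of the string".
    return field.substr(field.find_last_of('.') + 1);
}
```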
Quote:
std::stringstream Convert;
Convert << PortString;
Convert >> SrcPort;
>
std::cout << SrcAddress << " : " << SrcPort << "\n";
>
}
--
Jim Langston
tazmaster@rocketmail.com


James Kanze
Guest
 
Posts: n/a
#9: Jun 27 '08

re: string splitting


On Apr 29, 3:32 pm, "kwikius" <a...@servocomm.freeserve.co.uk> wrote:
Quote:
"xyz" <lavanyaredd...@gmail.com> wrote in message
Quote:
I have a string
16:23:18.659343 131.188.37.230.22 131.188.37.59.1398 tcp 168
Quote:
Quote:
for example lets say for the above string
16:23:18.659343 -- time
131.188.37.230 -- srcaddress
22 --srcport
131.188.37.59 --destaddress
1398 --destport
tcp --protocol
168 --size
i need to split the string such that i need to get all these
parameters....
the field widths are not fixed..i have some times four/three digits
srcport ..so i cant do it with substr function...i need this in c++
i am not getting an idea how to split it..
Quote:
Parsing is best solved formally with a parser generator, for
which the best option is to write a grammar.
I don't think that there's a general consensus about that. None
of the C++ compilers I know use a parser generator for their
grammar, for example, but prefer hand written ones.

In the case at hand, of course, you don't even need a full
parser; his problem can be solved simply by means of extended
regular expressions, such as those supported by boost::regex.

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Jim Langston
Guest
 
Posts: n/a
#10: Jun 27 '08

re: string splitting


James Kanze wrote:
Quote:
On Apr 29, 3:32 pm, "kwikius" <a...@servocomm.freeserve.co.uk> wrote:
Quote:
>"xyz" <lavanyaredd...@gmail.com> wrote in message
Quote:
>>I have a string
>>16:23:18.659343 131.188.37.230.22 131.188.37.59.1398 tcp 168
>
Quote:
Quote:
>>for example lets say for the above string
>>16:23:18.659343 -- time
>>131.188.37.230 -- srcaddress
>>22 --srcport
>>131.188.37.59 --destaddress
>>1398 --destport
>>tcp --protocol
>>168 --size
>>i need to split the string such that i need to get all these
>>parameters....
>>the field widths are not fixed..i have some times four/three digits
>>srcport ..so i cant do it with substr function...i need this in c++
>>i am not getting an idea how to split it..
>
Quote:
>Parsing is best solved formally with a parser generator, for
>which the best option is to write a grammar.
>
I don't think that there's a general consensus about that. None
of the C++ compilers I know use a parser generator for their
grammar, for example, but prefer hand written ones.
>
In the case at hand, of course, you don't even need a full
parser; his problem can be solved simply by means of extended
regular expressions, such as those supported by boost::regex.
From what I've read, C++0x is supposed to include regular expressions, which
is good for this, but bad because I hate regex.

But, truthfully, having regex in the language will make parsing this type of
thing a *lot* easier, although to me regular expressions usually look like
so much line noise.
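
For what it's worth, here is a sketch of how the line might be parsed with the `std::regex` that eventually shipped in C++11. This is written with hindsight and is not from the thread; the struct `Fields` and the function `parseWithRegex` are names made up for the example.

```cpp
#include <regex>
#include <string>

// Illustrative container for the seven requested parameters.
struct Fields {
    std::string time, srcAddress, srcPort, destAddress, destPort, protocol, size;
};

// Returns true and fills `out` if the whole line matches the expected layout.
bool parseWithRegex(const std::string& line, Fields& out)
{
    // Default ECMAScript syntax; each parenthesised group is one field.
    static const std::regex pattern(
        R"((\d+:\d+:\d+\.\d+)\s+)"            // time
        R"((\d+\.\d+\.\d+\.\d+)\.(\d+)\s+)"   // source address . port
        R"((\d+\.\d+\.\d+\.\d+)\.(\d+)\s+)"   // destination address . port
        R"((\S+)\s+(\d+))");                  // protocol, size
    std::smatch m;
    if (!std::regex_match(line, m, pattern))
        return false;
    out.time        = m[1].str();
    out.srcAddress  = m[2].str();
    out.srcPort     = m[3].str();
    out.destAddress = m[4].str();
    out.destPort    = m[5].str();
    out.protocol    = m[6].str();
    out.size        = m[7].str();
    return true;
}
```

The fields come back as strings; converting the ports and size to integers is a separate step.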

--
Jim Langston
tazmaster@rocketmail.com


kwikius
Guest
 
Posts: n/a
#11: Jun 27 '08

re: string splitting


On Apr 30, 10:45 am, James Kanze <james.ka...@gmail.com> wrote:
Quote:
On Apr 29, 3:32 pm, "kwikius" <a...@servocomm.freeserve.co.uk> wrote:
<...>
Quote:
Quote:
Parsing is best solved formally with a parser generator, for
which the best option is to write a grammar.
>
I don't think that there's a general consensus about that. None
of the C++ compilers I know use a parser generator for their
grammar, for example, but prefer hand written ones.
I used to agree, but some time ago someone "politely suggested" using a
formal parser rather than writing parsers by hand, and now I am
completely converted. Parser generators verify the grammar that
is presented to them and point out ambiguities that a hand-written
parser would never spot (I have written various parsers by hand), and
the result is easier for others to understand.

Also, Bjarne Stroustrup himself says that the C++ grammar is "absurd".
See:

http://www.research.att.com/~bs/hopl-almost-final.pdf

page 38, column 2, halfway down, in the paragraph starting "However, tools
and environments..."
Quote:
In the case at hand, of course, you don't even need a full
parser; his problem can be solved simply by means of extended
regular expressions, such as those supported by boost::regex.
I'm certainly no expert on regular expressions, but AFAIK you can't abstract
part of a regular expression into a production (e.g. "integer" in my
example above), so you end up with a long expression that is difficult to
read and verify (which is hard work). If you could have
productions, I think you'd have a parser grammar. But as I say, I am
no expert, and I'm sure someone will correct me if I'm wrong about
that.

regards
Andy Little




James Kanze
Guest
 
Posts: n/a
#12: Jun 27 '08

re: string splitting


On Apr 30, 9:54 pm, kwikius <a...@servocomm.freeserve.co.uk> wrote:
Quote:
On Apr 30, 10:45 am, James Kanze <james.ka...@gmail.com> wrote:
Quote:
Quote:
On Apr 29, 3:32 pm, "kwikius"
<a...@servocomm.freeserve.co.uk> wrote:
Quote:
<...>
Quote:
Quote:
Quote:
Parsing is best solved formally with a parser generator,
for which the best option is to write a grammar.
Quote:
Quote:
I don't think that there's a general consensus about that.
None of the C++ compilers I know use a parser generator for
their grammar, for example, but prefer hand written ones.
Quote:
I used to agree but someone some time ago "politely suggested"
using a formal parser rather than writing parsers by hand and
now I am completely converted. Parser generators will verify
the grammar that is presented to them and point out
ambiguities that a hand written parser would never spot. (I
have written various parsers by hand ) and are easier for
others to understand
I think it depends a lot on the grammar. I regularly use flex
for smaller things. In general, if the grammar isn't too
complex, a parser generator may be simpler (and if you define a
grammar yourself, you should definitely strive to make it not
too complex). In practice, however, most real programming
languages have very complex grammars (C++ is probably one of the
worst), and hand written parsers can usually give better error
messages, handle error recovery more gracefully, and it's also
easier to "cheat" a bit when necessary to make things work. (I
suspect, for example, that most C++ compilers use some sort of
backtracking in cases where it isn't clear from the initial
sequence whether you're dealing with a declaration or an
expression.)

As for "easier for others to understand", it obviously depends
on which "others". I've been hassled for using flex because
some of the "others" aren't familiar with the tool, and don't
feel at home with anything more complex than recursive descent.
Quote:
Also Bjarne Stroustrup himself says that C++ grammar is
"absurd". See:
Quote:
page 38 column 2, half way down, para starting "However ,
tools and environments..
Yes. C++ is one of the most difficult languages to parse.
Quote:
Quote:
In the case at hand, of course, you don't even need a full
parser; his problem can be solved simply by means of
extended regular expressions, such as those supported by
boost::regex.
Quote:
I'm sure no expert on regular expressions, but AFAIK you cant
abstract a part of a regular expression into a production (e.g
"integer" in my above example ), so you end up with a long
difficult to read and verify expression ( which is hard work).
If you could have productions... I think you'd have a parser
grammar. But as I say I am no expert and I'm sure someone will
correct me if I'm wrong about that.
The grammar that he's parsing is regular, so you don't need
anything more complicated than a regular expression. And the
regular expression matchers I know (e.g. my own or Boost) all
start with a string. So you would start with something like:

std::string const integer( "\\d+" ) ;

and build up the final expression as a string. For the original
problem, you might end up with something like:

std::string const integer( "\\d+" ) ;
std::string const spaces( "\\s+" ) ;
std::string const time(
    integer + ":" + integer + ":" + integer + "\\." +
    integer ) ;
std::string const ipAddress(
    integer + "\\." + integer
            + "\\." + integer
            + "\\." + integer ) ;
std::string const fullAddress(
    ipAddress + "\\." + integer ) ;
                    // Or should this use a "/" as a
                    // separator?
std::string const protocol( "\\l+" ) ;
                    // or "\\S+" ?
std::string const line( time
            + spaces + fullAddress
            + spaces + fullAddress
            + spaces + protocol
            + spaces + integer ) ;
boost::regex pattern( line ) ;

As usual: divide and conquer. (Note that if you're not afraid
of a few local macros, the fact that C++ concatenates adjacent
string literals means that you can actually do all of this at
compile time, replacing the std::string const with #define, and
dropping the +'s.)
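
A rough sketch of that macro variant follows. Two things in it are substitutions rather than what the post describes: it uses C++11 `std::regex` instead of `boost::regex`, and it writes the protocol as `[a-z]+` because `\l` is not part of the default ECMAScript syntax. The `RX_` macro names and the function `matchesLine` are invented for the example.

```cpp
#include <regex>
#include <string>

// Adjacent string literals are concatenated by the compiler, so each macro
// expands to a single literal and RX_LINE is assembled at compile time.
#define RX_INTEGER   "\\d+"
#define RX_SPACES    "\\s+"
#define RX_TIME      RX_INTEGER ":" RX_INTEGER ":" RX_INTEGER "\\." RX_INTEGER
#define RX_IPADDRESS RX_INTEGER "\\." RX_INTEGER "\\." RX_INTEGER "\\." RX_INTEGER
#define RX_FULLADDR  RX_IPADDRESS "\\." RX_INTEGER
#define RX_PROTOCOL  "[a-z]+"
#define RX_LINE      RX_TIME RX_SPACES RX_FULLADDR RX_SPACES RX_FULLADDR \
                     RX_SPACES RX_PROTOCOL RX_SPACES RX_INTEGER

// True if the whole text has the time/src/dest/protocol/size layout.
bool matchesLine(const std::string& text)
{
    static const std::regex pattern(RX_LINE);
    return std::regex_match(text, pattern);
}
```

This only validates the line; to extract the fields, the relevant pieces would need to be wrapped in parentheses so they become capture groups.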

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34