Angus Mackay <ye**@right.com> wrote:
> I remember Python having a generic tokenizer in the library. All I want
> is to set a list of token separators and then read tokens out of a
> stream; the token separators should be returned as themselves.
> Is there anything like this?
Not as such in the standard library: the functions in module tokenize
do not let you 'set a list of token separators'. If what you're
tokenizing can fit in a string in memory, module re can help -- split on
a pattern with a capture group, and the separators are kept in the result:
>>> import re
>>> x = re.compile(r'(\s+|,|;)')
>>> for w in x.split('a,b, c;d; e'): print(repr(w), '+', end=' ')
...
'a' + ',' + 'b' + ',' + '' + ' ' + 'c' + ';' + 'd' + ';' + '' + ' ' + 'e' +
Note that you get empty-string items when two separators abut.
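If those empty strings are unwanted, a simple filter drops them while still keeping the separators as tokens (a minimal sketch using the same pattern as above):

```python
import re

x = re.compile(r'(\s+|,|;)')
# split keeps the captured separators; filter out only the empty pieces
tokens = [w for w in x.split('a,b, c;d; e') if w]
print(tokens)
# ['a', ',', 'b', ',', ' ', 'c', ';', 'd', ';', ' ', 'e']
```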
If the limitations of re.split (stuff must fit in memory, &c) are a
problem, then the lex-like solutions I see somebody else suggested may
be more appropriate for your needs.
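Alternatively, a small generator can apply the same re.split idea to a
stream: read it in chunks and hold back the trailing fragment of each
chunk, since a token (or a run of separators) may straddle a chunk
boundary. A sketch, not a standard-library facility -- the function name,
default pattern, and chunk size are all arbitrary choices:

```python
import re

def tokenize_stream(stream, pattern=r'(\s+|,|;)', chunksize=4096):
    """Yield tokens and separators from a file-like object.

    The last piece of each chunk is held back, because a token may
    continue in the next chunk; likewise a trailing separator is held
    back, since e.g. a whitespace run may continue as well.
    """
    x = re.compile(pattern)
    pending = ''
    while True:
        chunk = stream.read(chunksize)
        if not chunk:
            break
        parts = x.split(pending + chunk)
        pending = parts.pop()          # possibly incomplete final token
        if parts and not pending:
            pending = parts.pop()      # trailing separator may continue
        for w in parts:
            if w:                      # skip empty-string pieces
                yield w
    if pending:
        yield pending

import io
print(list(tokenize_stream(io.StringIO('a,b, c;d; e'), chunksize=3)))
# ['a', ',', 'b', ',', ' ', 'c', ';', 'd', ';', ' ', 'e']
```

A tiny chunksize in the usage line deliberately forces tokens and
separators across chunk boundaries, to show the held-back fragment
logic at work.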
Alex