Good String Tokenizer

I have searched the board and noticed that there isn't really a good
implementation of a string tokenizer that splits on a custom set of
delimiters and returns both the delimiters and the parts between
them.

For example, if I have the string:

"Hello, World! How are you?"

And my splitting points are the comma and the exclamation point, then I
would expect to get back:

["Hello", ",", " World", "!", " How are you?"]

Does anyone know of a tokenizer that will allow for this sort of use?

Thanks in advance,
Jim Howard

Jul 24 '07 #1
JamesHoward <Ja************@gmail.com> wrote:
> I have searched the board

what board? I don't see any boards here.

> And my splitting points are the comma and the exclamation point, then I
> would expect to get back:
>
> ["Hello", ",", " World", "!", " How are you?"]
>
> Does anyone know of a tokenizer that will allow for this sort of use?

>>> import re
>>> re.split("([!,])", "Hello, World! How are you?")
['Hello', ',', ' World', '!', ' How are you?']
Jul 24 '07 #2
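
For anyone reading this later: the answer above works because re.split returns the text matched by a capturing group along with the pieces between matches. If the delimiter set is not fixed, something like the following sketch may help. The function name split_keeping_delimiters and its parameters are made up for illustration, and it assumes a non-empty set of single-character delimiters.

```python
import re


def split_keeping_delimiters(text, delimiters):
    """Split text on any single-character delimiter and keep the delimiters.

    Hypothetical helper; assumes `delimiters` is a non-empty iterable of
    single characters.
    """
    # re.escape protects characters such as ']' or '^' that have a special
    # meaning inside a character class.
    char_class = "[" + "".join(re.escape(d) for d in delimiters) + "]"
    # Wrapping the class in a capturing group makes re.split include each
    # matched delimiter in the result list.
    return re.split("(" + char_class + ")", text)


print(split_keeping_delimiters("Hello, World! How are you?", ",!"))
# ['Hello', ',', ' World', '!', ' How are you?']
```

Note that adjacent delimiters, or a delimiter at the very start or end of the string, produce empty strings in the result (for example "a,,b" gives ['a', ',', '', ',', 'b']), so filter those out if they are not wanted.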