Bytes | Developer Community
splitting strings with python

I'm trying to split a string of this form (the string is from a
Japanese dictionary file with multiple definitions in English for each
Japanese word):
str1 [str2] / (def1, ...) (1) def2 / def3 / .... (2) def4/ def5 ... /
The variables I need are str* and def*.

Sometimes the (1) and (2) are not included; they appear only if
the word has two different meanings.
The "..." means that there are sometimes more than two definitions per
meaning.
I'm trying to use the re.split() function, but with no luck.

Is this possible with Python, or am I dreaming?

All the best,

..

Jul 19 '05 #1

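I don't think you can do it with string.split. I guess you could do it
with re.split, but I think it's easier to use re.findall:

import re
re.findall(r"[a-zA-Z][ a-zA-Z0-9]*", inputstring)

That should pull out the English definitions. Note that the pattern only
matches ASCII letters, digits and spaces, so it will not return the
Japanese str1 and str2, and it will break a definition at any punctuation.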
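To make the regex approach concrete, here is a minimal sketch that handles the whole line shape from the question rather than only the English part. It assumes Python 3, and everything in it is hypothetical: the names `parse_entry`, `ENTRY` and `SENSE`, and the sample line, which uses the ASCII placeholders from the question in place of real dictionary data. It also assumes that definitions never contain a literal slash.

```python
import re

# Hypothetical sample line in the shape described in the question
# (ASCII placeholders stand in for the Japanese headword and reading).
SAMPLE = "str1 [str2] / (def1) (1) def2 / def3 / (2) def4/ def5 /"

# Headword, optional bracketed reading, then everything after the first slash.
ENTRY = re.compile(
    r"^\s*(?P<word>[^\s\[/]+)\s*(?:\[(?P<reading>[^\]]*)\])?\s*/(?P<body>.*)$"
)
# A numbered sense marker such as (1) or (2).
SENSE = re.compile(r"\((\d+)\)")


def parse_entry(line):
    """Return (word, reading, groups); groups is a list of lists of definitions."""
    match = ENTRY.match(line)
    if match is None:
        raise ValueError("line does not look like a dictionary entry: %r" % line)
    groups = [[]]
    for field in match.group("body").split("/"):
        # Splitting on a pattern with a capture group keeps the sense numbers:
        # even positions are text, odd positions are the captured numbers.
        for index, part in enumerate(SENSE.split(field)):
            if index % 2 == 1:
                # A sense marker: start a new group unless the current one is empty.
                if groups[-1]:
                    groups.append([])
            else:
                text = part.strip()
                if text:
                    groups[-1].append(text)
    return match.group("word"), match.group("reading"), groups


print(parse_entry(SAMPLE))
```

On the sample line this gives `('str1', 'str2', [['(def1)'], ['def2', 'def3'], ['def4', 'def5']])`. Text that appears before the first numbered marker ends up in its own group, and a line without (1) and (2) yields a single group. The reading is `None` when the bracketed part is missing.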


Jul 19 '05 #2
One problem is that str1 is Unicode (Japanese kanji), and str2 is
Japanese kana.

Can I still use re.findall(~)?

Thanks for your help!

Jul 19 '05 #3
Sorry, I should be more specific about the encoding.

It's EUC-JP.

I googled a little, and you can still use re.findall with the Japanese
kana, but I didn't find anything about kanji.
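On the encoding question, a minimal sketch of one way this can work, assuming Python 3 (the thread predates it): decode the EUC-JP bytes to text first, then match kana and kanji by Unicode range. The sample word and the names `KANA` and `KANJI` are illustrative assumptions, and the kanji range covers only the main CJK Unified Ideographs block, so rarer characters outside it would be missed.

```python
import re

# Hypothetical raw bytes, as they might be read from an EUC-JP file in binary mode.
raw = "日本語 [にほんご] / Japanese language /".encode("euc_jp")

# Decode once; from here on the regular expressions see characters, not bytes.
line = raw.decode("euc_jp")

# Hiragana (U+3040-U+309F) and katakana (U+30A0-U+30FF).
KANA = re.compile(r"[\u3040-\u30ff]+")
# CJK Unified Ideographs, the main kanji block (U+4E00-U+9FFF).
KANJI = re.compile(r"[\u4e00-\u9fff]+")

kanji_runs = KANJI.findall(line)
kana_runs = KANA.findall(line)

print(kanji_runs, kana_runs)
```

Once the line is decoded, kanji are no different from kana as far as re.findall is concerned; both are just characters in a range.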

Jul 19 '05 #4
sb******@gmail.com wrote:
I'm trying to split a string of this form (the string is from a
Japanese dictionary file with multiple definitions in English for each
Japanese word):
str1 [str2] / (def1, ...) (1) def2 / def3 / .... (2) def4/ def5 ... /
The variables I need are str* and def*.

Could you post a few examples of real data and what you want to extract from it? The above raises a few questions:
- Are str* and def* single words, or can they include whitespace, commas, slashes, parentheses, and so on?
- It is not clear what replaces the "..." (or whether the dots are literal).

This might be a good job for PyParsing.

Kent

Jul 19 '05 #5
