
splitting strings with python

I'm trying to split a string of this form (the string comes from a
Japanese dictionary file with multiple English definitions for each
Japanese word):
str1 [str2] / (def1, ...) (1) def2 / def3 / .... (2) def4/ def5 ... /
The variables I need are str* and def*.

The (1) and (2) are sometimes not included; they appear only if
the word has two different meanings.
"..." means that there are sometimes more than two definitions per
meaning.
I'm trying to use the re.split() function, but with no luck.

Is this possible with Python, or am I dreaming!?

All the best,

..

Jul 19 '05 #1
4 Replies



I don't think you can do it with string.split. You could probably do
it with re.split, but I think it's easier to use re.findall.

import re
# inputstring holds one line of the dictionary file
re.findall(r"[a-zA-Z][ a-zA-Z0-9]*", inputstring)

should work.
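
A quick way to see what that pattern returns is to run it on the placeholder line from the question. This is only a sketch: the sample line, the helper name `find_ascii_runs`, and the trimming of surrounding spaces are assumptions added for illustration, not part of the original suggestion.

```python
import re

# Pattern from the reply above: a letter followed by letters, digits or spaces.
ASCII_RUN = re.compile(r"[a-zA-Z][ a-zA-Z0-9]*")

def find_ascii_runs(line):
    """Return every ASCII run in the line, with surrounding spaces trimmed."""
    return [run.strip() for run in ASCII_RUN.findall(line)]

# Placeholder line modelled on the question (not real dictionary data).
sample = "str1 [str2] / (def1) (1) def2 / def3 / (2) def4/ def5 /"
print(find_ascii_runs(sample))
```

Two things are worth noting. The character class only covers ASCII letters and digits, so kanji or kana in str1 and str2 would not be matched by it. The "(1)" and "(2)" markers are also skipped, so the result is a flat list with no record of which definition belongs to which meaning.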


Jul 19 '05 #2

One problem is that str1 is Unicode (Japanese kanji), and str2 is
Japanese kana.

Can I still use re.findall(~)?

Thanks for your help!

Jul 19 '05 #3

Sorry, I should be more specific about the encoding.

It's EUC-JP.

I googled a little, and you can still use re.findall with the Japanese
kana, but I didn't find anything about kanji.
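
One possible approach, sketched here for Python 3 (the thread predates it), is to decode the EUC-JP bytes to text first and only then apply the regular expression. Once the line is a `str`, kanji and kana are ordinary characters to the `re` module. The sample entry and the name `split_entry_head` are made up for illustration.

```python
import re

# Hypothetical entry, encoded as EUC-JP to mimic one raw line of the file.
raw_line = "日本語 [にほんご] /Japanese language/".encode("euc_jp")

# Headword, then the reading in square brackets.
HEAD = re.compile(r"(\S+)\s+\[([^\]]+)\]")

def split_entry_head(raw):
    """Decode an EUC-JP line and return (headword, reading), or None."""
    text = raw.decode("euc_jp")  # bytes -> str before any matching
    match = HEAD.match(text)
    if match is None:
        return None
    return match.group(1), match.group(2)

print(split_entry_head(raw_line))
```

Matching on the decoded text avoids having to know how individual kanji are laid out as bytes in EUC-JP.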

Jul 19 '05 #4

sb******@gmail.com wrote:
I'm trying to split a string of this form (the string comes from a
Japanese dictionary file with multiple English definitions for each
Japanese word):
str1 [str2] / (def1, ...) (1) def2 / def3 / .... (2) def4/ def5 ... /
The variables I need are str* and def*.

Could you post a few examples of real data and what you want to extract from it? The above raises a few questions:
- Are str* and def* single words, or can they include whitespace, commas, slashes, parens...?
- It is not clear what replaces the "..." (or whether the dots are literal).

This might be a good job for PyParsing.

Kent
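
Until real data is posted, the format can only be guessed at. The sketch below uses the standard `re` module rather than PyParsing and rests on several assumptions that are not confirmed anywhere in the thread: the headword contains no whitespace, the reading sits in square brackets, definitions never contain a slash, and a parenthesised number such as "(1)" always starts a new meaning. The name `parse_entry` is hypothetical.

```python
import re

# Headword, optional [reading], then the first slash.
HEAD_RE = re.compile(r"^\s*([^\s\[/]+)\s*(?:\[([^\]]*)\])?\s*/")
# A sense marker such as "(1)" or "(2)" (assumed to mark a new meaning).
SENSE_RE = re.compile(r"\((\d+)\)\s*")

def parse_entry(line):
    """Return (headword, reading, {sense number: [definitions]})."""
    head = HEAD_RE.match(line)
    if head is None:
        raise ValueError("line does not look like a dictionary entry")
    word, reading = head.group(1), head.group(2)
    senses = {}
    current = 1  # entries without markers are treated as a single meaning
    for field in line[head.end():].split("/"):
        field = field.strip()
        if not field:
            continue  # skip the empty piece after the trailing slash
        marker = SENSE_RE.search(field)
        if marker:
            current = int(marker.group(1))
            # Drop the marker itself but keep the text around it.
            field = (field[:marker.start()] + field[marker.end():]).strip()
        senses.setdefault(current, []).append(field)
    return word, reading, senses

# Placeholder line modelled on the question (not real dictionary data).
print(parse_entry("str1 [str2] / (def1) (1) def2 / def3 / (2) def4/ def5 /"))
```

If definitions can themselves contain slashes or parenthesised numbers, this splitting breaks down, which is the kind of case where a grammar-based parser would be the safer choice.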

Jul 19 '05 #5
