
splitting strings with python

sbucking@gmail.com
Guest
 
Posts: n/a
#1: Jul 19 '05
I'm trying to split a string of this form (the string is from a
Japanese dictionary file with multiple definitions in English for each
Japanese word):


str1 [str2] / (def1, ...) (1) def2 / def3 / .... (2) def4/ def5 ... /


The variables I need are str* and def*.

Sometimes the (1) and (2) are not included; they are included only if
the word has two different meanings.


"..." means that there are sometimes more than two definitions per
meaning.


I'm trying to use the re.split() function, but with no luck.

Is this possible with Python, or am I dreaming?!

All the best,

..


inhahe
Guest
 
Posts: n/a
#2: Jul 19 '05

re: splitting strings with python



<sbucking@gmail.com> wrote in message
news:1118308380.823382.146430@g47g2000cwa.googlegroups.com...
> I'm trying to split a string of this form (the string is from a
> Japanese dictionary file with multiple definitions in English for each
> Japanese word):
>
>
> str1 [str2] / (def1, ...) (1) def2 / def3 / .... (2) def4/ def5 ... /
>
>
> The variables I need are str* and def*.
>
> Sometimes the (1) and (2) are not included; they are included only if
> the word has two different meanings.
>
>
> "..." means that there are sometimes more than two definitions per
> meaning.
>
>
> I'm trying to use the re.split() function, but with no luck.
>
> Is this possible with Python, or am I dreaming?!
>
> All the best,
>
> .
>

I don't think you can do it with string.split. You could probably do it with
re.split, but I think it's easier to use re.findall:

import re
re.findall("[a-zA-Z][ a-zA-Z0-9]*", inputstring)

should work.
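
To see what that pattern actually returns, here is a minimal sketch in Python 3. Both sample lines are made up for illustration (they are not from the poster's dictionary file), and the name `pattern` is just a local variable. Note that the character class only covers ASCII letters and digits, so Japanese text in str1 and str2 is skipped rather than captured.

```python
import re

pattern = "[a-zA-Z][ a-zA-Z0-9]*"

# Hypothetical line with romanized placeholders standing in for str1 and str2.
ascii_line = "kanji [kana] /(n) (1) first sense/another gloss/(2) second sense/"
ascii_tokens = re.findall(pattern, ascii_line)
print(ascii_tokens)
# ['kanji ', 'kana', 'n', 'first sense', 'another gloss', 'second sense']

# Hypothetical line with real Japanese text in the first two fields.
japanese_line = "日本語 [にほんご] /(n) Japanese language/"
japanese_tokens = re.findall(pattern, japanese_line)
print(japanese_tokens)
# ['n', 'Japanese language']
```

The matches are a flat list, so the grouping into meanings (1) and (2) is lost, and a match can carry a trailing space (as in 'kanji ') that would need stripping.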




sbucking@gmail.com
Guest
 
Posts: n/a
#3: Jul 19 '05

re: splitting strings with python


One problem is that str1 is Unicode (Japanese kanji) and str2 is
Japanese kana.

Can I still use re.findall(~)?

thanks for your help!

sbucking@gmail.com
Guest
 
Posts: n/a
#4: Jul 19 '05

re: splitting strings with python


Sorry, I should be more specific about the encoding.

It's EUC-JP.

I googled a little, and you can still use re.findall with the Japanese
kana, but I didn't find anything about kanji.
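
One way to approach the encoding question, sketched here for Python 3 (the thread predates it, so this is an assumption about the reader's setup): decode the EUC-JP bytes to a Unicode string first, then match kanji and kana by code point range. The sample entry is invented, and it is encoded in the snippet only to simulate bytes read from a file.

```python
import re

# Hypothetical entry; encoding it here stands in for reading bytes from an EUC-JP file.
raw = "日本語 [にほんご] /(n) Japanese language/".encode("euc_jp")

# Decode bytes to str so the regex works on characters, not on multi-byte sequences.
text = raw.decode("euc_jp")

# U+4E00..U+9FFF covers the common CJK ideographs (kanji);
# U+3040..U+309F is hiragana and U+30A0..U+30FF is katakana.
kanji = re.findall("[\u4e00-\u9fff]+", text)
kana = re.findall("[\u3040-\u30ff]+", text)

print(kanji)  # ['日本語']
print(kana)   # ['にほんご']
```

When reading from disk, passing `encoding="euc_jp"` to `open()` performs the same decoding step.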

Kent Johnson
Guest
 
Posts: n/a
#5: Jul 19 '05

re: splitting strings with python


sbucking@gmail.com wrote:
> I'm trying to split a string of this form (the string is from a
> Japanese dictionary file with multiple definitions in English for each
> Japanese word):
>
>
> str1 [str2] / (def1, ...) (1) def2 / def3 / .... (2) def4/ def5 ... /
>
>
> The variables I need are str* and def*.

Could you post a few examples of real data and what you want to extract from it? The above raises a few questions:
- Are str* and def* single words, or can they include whitespace, commas, slashes, parentheses...?
- It is not clear what replaces the ... (or whether the dots are literal).

This might be a good job for PyParsing.
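
Until real data is posted, any parser is a guess. As a rough sketch only, and using the standard `re` module rather than PyParsing, the following assumes the layout `str1 [str2] /field/field/.../`, where a field may start with parenthesized groups: a purely numeric group such as `(1)` or `(2)` starts a new meaning, and any other leading group is kept as a tag. The function name `parse_entry`, the dictionary keys, and the sample lines are all hypothetical.

```python
import re

# Leading run of "(...)" groups, followed by the rest of the field.
LEADING_PARENS = re.compile(r"((?:\([^)]*\)\s*)*)(.*)")

def parse_entry(line):
    """Split one line into str1, str2, tags, and definitions grouped by meaning."""
    head, _, rest = line.partition("/")
    m = re.match(r"\s*(\S+)\s*(?:\[([^\]]*)\])?", head)
    if m is None:
        raise ValueError("no headword found before the first '/'")
    entry = {"str1": m.group(1), "str2": m.group(2), "tags": [], "senses": [[]]}

    for field in rest.split("/"):
        prefix, text = LEADING_PARENS.match(field.strip()).groups()
        for group in re.findall(r"\(([^)]*)\)", prefix):
            if group.isdigit():
                # "(2)" and later markers open a new meaning once the current one has content.
                if entry["senses"][-1]:
                    entry["senses"].append([])
            else:
                entry["tags"].append(group)
        if text:
            entry["senses"][-1].append(text.strip())
    return entry

sample = "日本語 [にほんご] /(n) (1) first sense/another gloss/(2) second sense/"
parsed = parse_entry(sample)
print(parsed["str1"], parsed["str2"], parsed["tags"], parsed["senses"])

# A line without the optional [str2] field and without numbered meanings.
plain = parse_entry("かな / (n) kana /")
```

A definition that itself begins with a parenthesized remark would be misread as a tag under these assumptions, which is one reason real examples matter before settling on a format.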

Kent