Bytes IT Community

Split function to split sentence into words

Hi,

I don't have much experience writing code in Python, but now I'm trying to see how I can get started with it.
I've tried to write a simple program that displays a sentence. Now my problem is how to write code that uses the split function to split that sentence into words and then print out each word separately. Let me give you an example:

>>>sentence=" My question is to know how to write a code in Python"

Then the output for this sentence must be:

sentence[1]=My
sentence[2]=question
sentence[3]=is
sentence[4]=to
sentence[5]=know
......
.......

Can someone help me with this?
Nov 15 '08 #1
19 Replies


Expert 100+
Always check the documentation first. In this case, look at the string methods ( http://www.python.org/doc/2.5.2/lib/string-methods.html ) and you will see a split method. Here's a quick example.
>>> sent = "Jack ate the apple."
>>> splitsent = sent.split(' ')
>>> splitsent
['Jack', 'ate', 'the', 'apple.']
Simple as that.
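A note on the two forms of split, as a small sketch written for Python 3 (the code in this thread is Python 2, where print is a statement). Called with no argument, split() treats any run of whitespace as a single separator and ignores leading and trailing whitespace; split(' ') splits on every single space and can return empty strings. Neither form removes punctuation, which is why 'apple.' keeps its period.

```python
# Python 3 sketch: compare str.split() with str.split(' ').
sentence = " My question is  to know"  # leading space and a double space on purpose

by_whitespace = sentence.split()       # any run of whitespace is one separator
by_single_space = sentence.split(' ')  # every single space is a separator

print(by_whitespace)    # ['My', 'question', 'is', 'to', 'know']
print(by_single_space)  # ['', 'My', 'question', 'is', '', 'to', 'know']

# Punctuation stays attached to the word with either form.
print("Jack ate the apple.".split())  # ['Jack', 'ate', 'the', 'apple.']
```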
Nov 15 '08 #2

Thank you for the help, but the question is not fully answered! This program splits the sentence, but I would like different output. Let's say we have a = "jack ate the apple"; I would like the output to be:
a[0]=jack
a[1]=ate
a[2]=the
a[3]=apple

Can you please see if it's possible to get the above output?
Nov 15 '08 #3

bvdet
Expert Mod 2.5K+
The answer is: string formatting!

Example:
>>> sentence = "The dog ate my homework"
>>> for i,word in enumerate(sentence.split()):
...     print "Word #%d: %s" % (i, word)
...
Word #0: The
Word #1: dog
Word #2: ate
Word #3: my
Word #4: homework
>>>
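For readers on Python 3, the same idea might look like this (print is a function there, and f-strings need Python 3.6 or later). enumerate() also accepts a start argument, which gives the 1-based numbering used in the first post. This is a sketch, not code from the thread.

```python
# Python 3 sketch: string formatting with enumerate(), numbering from 1.
sentence = "The dog ate my homework"

# Build the labelled lines first so they can be printed or reused later.
lines = [f"sentence[{i}]={word}" for i, word in enumerate(sentence.split(), start=1)]

for line in lines:
    print(line)
# sentence[1]=The
# sentence[2]=dog
# sentence[3]=ate
# sentence[4]=my
# sentence[5]=homework
```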
Nov 15 '08 #4

Thank you for the help, but as you can see in the output below, when I type sentence[0] to show the first word, it shows me "T". This is not what I want! I would like sentence[0] to display "The", and sentence[1] to give me "dog".


Can you please help?
>>> sentence="The dog ate my homework"
>>> for i, word in enumerate(sentence.split()):
...     print " word #%d: %s" % (i,word)
...
 word #0: The
 word #1: dog
 word #2: ate
 word #3: my
 word #4: homework
>>> sentence[0]
'T'
>>> sentence[1];
'h'
>>>
Nov 16 '08 #5

bvdet
Expert Mod 2.5K+
How about this?
>>> split_sentence = sentence.split()
>>> split_sentence[0]
'The'
>>> split_sentence[1]
'dog'
>>>
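Why #5 printed 'T': sentence is still a string, and indexing a string returns a single character. split() returns a new list and leaves the original string unchanged, so the words have to be indexed through that list. A short Python 3 illustration:

```python
# Python 3 sketch: indexing a string versus indexing the list that split() returns.
sentence = "The dog ate my homework"
split_sentence = sentence.split()  # a new list; sentence itself is unchanged

print(type(sentence).__name__, repr(sentence[0]))              # str 'T'
print(type(split_sentence).__name__, repr(split_sentence[0]))  # list 'The'
print(split_sentence[-1])  # negative indexes count from the end: homework
```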
Nov 16 '08 #6

Oh, thank you so much.
That's what I wanted.
May God bless you.
Once again, thank you.
Nov 16 '08 #7

Hi, I have another question related to the above:
I have created a file with more than 100 sentences in it and saved it with the extension .py. I'm using the following code to open the file:
f=open("example.py")
try:
    for line in f:
        print line
finally:
    f.close()
With the code above I'm able to open my file. Now I know how to split a sentence into words, but how can I do it for a file containing more than 100 sentences? With one or two sentences I can type them in and split them, but what about a file with many sentences?

Can someone help me?
Nov 16 '08 #8

bvdet
Expert Mod 2.5K+
Please use code tags around code. It will make your code much easier to read.

[CODE]..code goes here..[/CODE]

In your code, you are iterating over each line in the file. On each iteration, the variable line holds one line of the file, which is one sentence if each sentence is on its own line. Do you want to save the sentences in a list? What do you want to do with 100 sentences?

The following will save a list of lists, one list of words per line. You can access each word by list index.
lineList = [line.strip().split() for line in open("your_file").readlines()]
# print the first word in the first line.
print lineList[0][0]
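A Python 3 sketch of the same idea for a whole file: a with block closes the file automatically (doing the job of the try/finally in #8), and iterating over the file object gives one line at a time, so readlines() is not needed. It assumes one sentence per line. The function name number_words and the temporary .txt file are hypothetical, chosen only so the example runs on its own; the file does not need a .py extension, since it holds plain text rather than Python code.

```python
import os
import tempfile


def number_words(path):
    """Return (line_no, word_no, word) tuples for every word in a text file, numbered from 1."""
    result = []
    with open(path) as f:  # the with block closes the file, like try/finally
        for line_no, line in enumerate(f, start=1):
            for word_no, word in enumerate(line.split(), start=1):
                result.append((line_no, word_no, word))
    return result


# Write a small throwaway file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("jack is a hard worker\nthe dog ate my homework\n")
    sample_path = tmp.name

try:
    for line_no, word_no, word in number_words(sample_path):
        print(f"line {line_no}, word {word_no}: {word}")
finally:
    os.remove(sample_path)  # clean up the throwaway file
```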
Nov 16 '08 #9

I don't need to save the sentences in a list. What I want to do is this: I take a document containing any number of sentences, and using Python I would like to split it into words where each word has a number, e.g., word1 = the, word2 = apple, etc. Then I will pass this output to another program that can help me identify whether word1 is a noun or not, and so on. In brief, after getting all the words in a document, I will try to identify and extract only the nouns.
Nov 16 '08 #10

bvdet
Expert Mod 2.5K+
The previous code I posted above will work fine for your purpose. To get all the words in a single list:
wordList = reduce(lambda x,y: x+y, lineList, [])
Now you have a list of all the words. To iterate on the list of words:
>>> lineList = [['1','2','3'],['4','5','6']]
>>> reduce(lambda x,y: x+y, lineList, [])
['1', '2', '3', '4', '5', '6']
>>> wordList = reduce(lambda x,y: x+y, lineList, [])
>>> for i,word in enumerate(wordList):
...     print "Word[%d]: %s" % (i,word)
...
Word[0]: 1
Word[1]: 2
Word[2]: 3
Word[3]: 4
Word[4]: 5
Word[5]: 6
>>>
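In Python 3, reduce is no longer a built-in and has to be imported from functools. A common alternative for flattening a list of lists is itertools.chain.from_iterable, which avoids building a new intermediate list at every + step. A sketch using real words instead of digits; the two-line sample text is hypothetical and stands in for a file:

```python
from functools import reduce
from itertools import chain

text = "jack is a hard worker\nthe dog ate my homework"

# One list of words per line, like the list comprehension in #9.
line_list = [line.split() for line in text.splitlines()]

# Flatten into a single list of words.
word_list = list(chain.from_iterable(line_list))

# Same result as the reduce() call in #11 (Python 3 needs the functools import).
same_words = reduce(lambda x, y: x + y, line_list, [])

for i, word in enumerate(word_list):
    print(f"Word[{i}]: {word}")
# Word[0]: jack
# Word[1]: is
# ... through Word[9]: homework
```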
Nov 16 '08 #11

Hey, thanks for the help.
Your last solution works perfectly with numbers!
But the one I was looking for is the solution you gave me in reply #9:
lineList = [line.strip().split() for line in open("your_file").readlines()]
# print the first word in the first line.
print lineList[0][0]
This solution helps me find one word at a time. Imagine I have a two-page document; using the code above will take a long time. When I type print lineList[0][3], it gives me the fourth word of the first line, which is perfect, but the problem is that I have to type print lineList[0][1] up to print lineList[0][n], where n is the index of the last word! I want code like the above that doesn't make me type print lineList[][] to get each word.
Can you please help? I know the code you gave me works, but the problem is that I have to type print lineList[][] for each word.
Nov 23 '08 #12

bvdet
Expert Mod 2.5K+
If you have a list of lists:
>>> list_of_lists = [[1,2,3],[4,5,6],[7,8,9]]
>>> for i, item in enumerate(list_of_lists):
...     for j, word in enumerate(item):
...         print "List item #%d, Word #%d: %s" % (i,j,word)
...
List item #0, Word #0: 1
List item #0, Word #1: 2
List item #0, Word #2: 3
List item #1, Word #0: 4
List item #1, Word #1: 5
List item #1, Word #2: 6
List item #2, Word #0: 7
List item #2, Word #1: 8
List item #2, Word #2: 9
>>>
Nov 24 '08 #13

Okay, thank you so much for the help, bvdet!
But I think you didn't get my question. OK, let me be clear and simple. Let us assume I have a file called ex1.py, and in this document I have more than one paragraph. I know the procedure to open a file. Now I would like to know if there is a way to open the file, read one sentence or paragraph of the document, and then split that sentence so that if the sentence was "jack is a hard worker", the output is:
word 1: jack
word 2: is
word 3: a
word 4: hard
word 5: worker.

Then, after reading and splitting that sentence, I go to the next sentence in the file and do the same thing.

Is there any way to do this in Python?
I need help, please!
Nov 30 '08 #14

bvdet
Expert Mod 2.5K+
You will need to establish rules for determining what is a sentence. If the file is not too big, you can read the entire file into a string and split it on sentence-ending punctuation (periods, question marks, and exclamation points).
>>> import re
>>> s = 'This is a paragraph. How will we split it? We can use re module split()! We should get four sentences.'
>>> sList = [item.strip() for item in re.split('[!?.]', s) if item]
>>> sList
['This is a paragraph', 'How will we split it', 'We can use re module split()', 'We should get four sentences']
>>>
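A variation, offered as a sketch rather than a correction: splitting on the whitespace that follows the punctuation, using a lookbehind, keeps the ., ! or ? with each sentence. Both patterns still treat any "." followed by a space as a sentence end, so abbreviations such as "Mr. Smith" will be split; the '[!?.]' pattern also splits inside numbers like 3.14, which this one does not. Written for Python 3.

```python
import re

s = ('This is a paragraph. How will we split it? '
     'We can use re module split()! We should get four sentences.')

# Split on whitespace that comes right after ., ! or ?; the punctuation stays in place.
sentences = re.split(r'(?<=[.!?])\s+', s.strip())

for n, sentence in enumerate(sentences, start=1):
    print(n, sentence)
# 1 This is a paragraph.
# 2 How will we split it?
# 3 We can use re module split()!
# 4 We should get four sentences.
```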
Nov 30 '08 #15

Thank you for this answer, but that is not what I want.
I have a file called ex1.py, and to open it I do:
f=open("ex1.py")
try:
........................
Then, after all the steps for opening a file, I have:

jack is a brother of carine, .................................................. ...


My question is: is there any way, after opening this file, which contains about 5 paragraphs, to split it into words?
Nov 30 '08 #16

bvdet
Expert Mod 2.5K+
P: 2,851
It seems that we already covered this:
import re
s = open('your_file').read()
wordList = []
for sentence in [item.strip() for item in re.split(r'[!?.]', s.replace('\n',' ')) if item]:
    wordList.append(sentence.split())

Oh, thank you, but using the code I couldn't get anything. Maybe I used it wrong. What do you mean when you put [!?.] or ' '?
It seems like I was supposed to put something in place of those symbols.
Please look at what I did in the code below. ntbs1.py is my file.
>>> import re
>>> s=open('ntbs1.py').read()
>>> wordList=[]
>>> for sentence in [item.strip() for item in re.split(r'[!?.]', s.replace('\n', '')) if item]:
...     wordList.append(sentence.split())
...
>>>
Dec 1 '08 #18

bvdet
Expert Mod 2.5K+
The sentences are split on the characters inside the brackets (!?.). Each newline character (\n) is replaced with a space character.
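Two things may explain why the session in #18 seemed to produce nothing; a small Python 3 sketch follows. First, the loop only appends to wordList, so nothing appears until you print it (or type wordList at the >>> prompt). Second, the pasted code appears to call replace('\n', '') with an empty string rather than ' '; if so, the last word of one line is glued to the first word of the next.

```python
# Python 3 sketch: replacing newlines with a space versus with nothing.
text = "jack is a\nhard worker"

print(text.replace('\n', ' ').split())  # ['jack', 'is', 'a', 'hard', 'worker']
print(text.replace('\n', '').split())   # ['jack', 'is', 'ahard', 'worker']

# Appending inside a loop builds the list silently; print it to see the result.
word_list = []
for sentence in ["jack is a hard worker"]:
    word_list.append(sentence.split())
print(word_list)  # [['jack', 'is', 'a', 'hard', 'worker']]
```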
Dec 1 '08 #19

NeoPa
Expert Mod 15k+
Fellya,

You have been asked to enclose all your code within [ CODE ] tags. We have rules on this site that demand you do that. Please pay attention in future to making sure all your code is posted that way to ensure it is easier to understand and doesn't waste the time of our experts trying to decipher it.

-Administrator.
Dec 3 '08 #20
