Bytes IT Community

Output from one module is not being passed as input to the next function

from __future__ import print_function
import os, codecs, mysql, mysql.connector, re, string

# Reading files with a .txt extension
def get_sentences():
    for root, dirs, files in os.walk("/Users/document/test1"):
        for file in files:
            if file.endswith(".txt"):
                x_ = codecs.open(os.path.join(root, file), "r", "utf-8-sig")
                for lines in x_.readlines():
                    yield lines

formoreprocessing = get_sentences()

# Tokenizing sentences of the text files
from nltk.tokenize import sent_tokenize
for i in formoreprocessing:
    raw_docs = sent_tokenize(i)
    tokenized_docs = [sent_tokenize(i) for sent in raw_docs]
#    print(tokenized_docs)

# Removing stop words
stopword_removed_sentences = []
from nltk.corpus import stopwords
stopset = stopwords.words("English")

def strip_stopwords(sentence):
    return ' '.join(word for word in sentence.split() if word not in stopset)

stopword_removed_sentences = (strip_stopwords(sentence) for sentence in raw_docs)

# Removing punctuation marks
regex = re.compile('[%s]' % re.escape(string.punctuation))
nw = []
for review in stopword_removed_sentences:
    new_review = ''
    for token in review:
        new_token = regex.sub(u'', token)
        if not new_token == u'':
            new_review += new_token
    nw.append(new_review)

I am getting output up to tokenized_docs, but nothing after that.
The output of tokenized_docs is not being passed as input to the next module.
Please help me figure out the problem.
Thanks
Jun 16 '16 #1
2 Replies


tokenized_docs is not used anywhere in the program. What were you expecting the program to do? And what does this mean?
not being inserted into next module as input
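For illustration, the chaining the poster seems to want could be sketched like this. It is a minimal sketch in plain Python: the toy tokenizer and the small STOPSET below are hypothetical stand-ins for nltk.tokenize.sent_tokenize and nltk.corpus.stopwords, so the example runs without NLTK. The point is that each stage must consume the previous stage's output, not the original raw lines.

```python
import re
import string

# Hypothetical stop-word list, standing in for nltk.corpus.stopwords
STOPSET = {"the", "is", "a", "of"}

def get_sentences(lines):
    # Stage 1: yield raw lines (stands in for reading the .txt files)
    for line in lines:
        yield line

def tokenize(lines):
    # Stage 2: split each line into sentences
    # (toy stand-in for sent_tokenize)
    for line in lines:
        for sent in re.split(r'(?<=[.!?])\s+', line.strip()):
            if sent:
                yield sent

def strip_stopwords(sentences):
    # Stage 3: consume the tokenizer's output, not raw_docs
    for sent in sentences:
        yield ' '.join(w for w in sent.split() if w.lower() not in STOPSET)

def strip_punctuation(sentences):
    # Stage 4: remove punctuation from each cleaned sentence
    regex = re.compile('[%s]' % re.escape(string.punctuation))
    for sent in sentences:
        yield regex.sub('', sent)

lines = ["The cat sat. A dog barked!"]
result = list(strip_punctuation(strip_stopwords(tokenize(get_sentences(lines)))))
print(result)
```

Because each function takes the previous generator as its argument, the tokenized output actually flows into the stop-word stage, which is what the original code never does with tokenized_docs.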
Jun 16 '16 #2

Thanks. I just want to remove the stop words from the output of tokenized_docs, which means the output of tokenized_docs must be passed as input to the "Removing stop words" section. I tried my best but was unable to do so.
Jun 17 '16 #3
