How to count line number of incorrect words in a set using a dictionary

I have a text file with words in it, and I have already collected the incorrect words into a set.
Now I need to find which lines of the text file the incorrect words are on and print the result as a dictionary.
For example, for a misspelling of "together" it should look like:
togeher 3 4 # "togeher" being the incorrect word, and 3 and 4 the line numbers where it is located in the text file

I know I need to use a line counter, but I don't know how to use one.

words = [] # is my text file
text1 # is my set of incorrect words
I have done this so far:
d = {} # an empty dictionary
key = text1
value = linecounter # don't know what to assign the value to
for
May 22 '10 #1
bvdet
2,851 Expert Mod 2GB
Assume you have a function correct(word) that returns False if a word is incorrect. The following untested code would compile a dictionary of the incorrect words where the words are keys and the line numbers are contained in a list associated with the keys.
    dd = {}
    with open("words.txt") as f:
        # enumerate() counts from 0; use enumerate(f, 1) for 1-based line numbers
        for i, line in enumerate(f):
            word = line.strip()  # strip the newline so the keys are clean words
            if not correct(word):
                dd.setdefault(word, []).append(i)
May 22 '10 #2
@bvdet
Cheers for your reply. I've done this:
    d = {}
    def correct(word):
        for i, word in enumerate(words):
            if not correct(word.strip()):
                d.setdefault(word, []).append(i)
        print(d)
But how would I print each word with its line numbers next to it?
e.g.
loo: 5 8 # numbers being the line numbers
So would I create something like
keys = text1 # text1 being my incorrect words
Then what, though?
May 23 '10 #3
bvdet
2,851 Expert Mod 2GB
Please use code tags when posting code.

To print the contents of a dictionary, iterate on the dictionary and format the printing of the key and value. In this case, the value is a list of line numbers, so...
    >>> dd = {"word1": [6,9], "word2": [5], "word3": [0,3,8]}
    >>> for key in dd:
    ...     print "%s: %s" % (key, ", ".join([str(n) for n in dd[key]]))
    ...
    word1: 6, 9
    word3: 0, 3, 8
    word2: 5
    >>>
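The session above uses the Python 2 print statement. For readers on Python 3, where print is a function, a rough equivalent might look like the sketch below. The sample dictionary is the same made-up data, the helper name `format_entry` is invented for this example, and sorting the keys is an added choice to make the output order predictable.

```python
dd = {"word1": [6, 9], "word2": [5], "word3": [0, 3, 8]}

def format_entry(key, line_numbers):
    """Build one 'word: n, n, n' output line."""
    return "%s: %s" % (key, ", ".join(str(n) for n in line_numbers))

# Iterate over the keys in sorted order and print one line per word.
for key in sorted(dd):
    print(format_entry(key, dd[key]))
```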
May 23 '10 #4
I have only been programming for a little over two months, so I'm not fully following you.
So is the code that I previously posted correct?
And do I have to add what you said in your post, like this?
    for key in d:
        print ....
May 23 '10 #5
bvdet
2,851 Expert Mod 2GB
Your definition of function correct() is not right. You did not understand my post that explained that you need a function or some code that decides for you if a word is correct or not. If a word is correct, do not add it to the dictionary. If a word is incorrect, add it. In pseudo code:
    def correct(word):
        If word is a correct word, return True
        Else, word is incorrect, return False
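As a minimal sketch of what such a function could look like, assuming the reference words are available as a Python set (the name `known_words` and its contents are made up for illustration, not taken from the thread):

```python
# Hypothetical reference vocabulary; in practice this would be loaded
# from a dictionary file.
known_words = {"hello", "mum", "how", "are", "you", "together"}

def correct(word):
    """Return True if word is in the reference vocabulary, else False."""
    return word.lower() in known_words

print(correct("together"))  # a known word
print(correct("togeher"))   # a misspelling
```

The important point is that correct() only answers yes or no for a single word. Collecting the line numbers stays in the loop that calls it.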
My post regarding printing - I used a list comprehension to create a string of the line numbers. It is equivalent to:
    >>> dd[key]
    [5, 12, 16]
    >>> tem = []
    >>> for n in dd[key]:
    ...     tem.append(str(n))
    ...
    >>> ", ".join(tem)
    '5, 12, 16'
    >>>
May 23 '10 #6
Glenton
391 Expert 256MB
You don't have to do what he said, but your code is not correct. Your definition of correct() calls correct() itself. Recursion like this is allowed in Python, but I don't think it's what you intend, and it wouldn't work in your situation.

What @bvdet was asking is: how are you going to determine whether a word is correct or not?
May 23 '10 #7
@Glenton
I have already created a list of incorrect words by comparing the text file against a dictionary. Now all I want to do is find the line numbers of each incorrect word in the text file.
May 23 '10 #8
I have tried another piece of code:
    from collections import defaultdict
    d = defaultdict(list)
    for lineno, word in enumerate(words):
        if word in text1:
            d[word].append(lineno)
    print(d)
But it prints each incorrect word with its position in the list, not the line it is located on.
May 23 '10 #9
Glenton
391 Expert 256MB
Okay, re-reading it, it seems we've been misunderstanding you about the line numbers. Sorry about that.

Am I right in saying that your original text file has a bunch of lines, each with a bunch of words, and you're trying to figure out which lines the incorrectly spelled words are on, but all you have is the words list?

I don't see how this is possible. There seems to be no information linking the words in the words list to the line numbers in the original file. So probably the best approach is to do this when you're extracting the information from the file in the first place, i.e. fiddle around with the code that you used to create the list of incorrect words.

Regarding getting the line number, something like this will work fine:
    myfile = open("file.txt")
    for i, line in enumerate(myfile):  # enumerate() supplies the counter i
        print i+1, line  # i starts from 0, so if you don't want that, you need to add 1
    myfile.close()
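Putting that advice together, one way to record line numbers while the file is being read is sketched below. It is only an illustration: the sample text, the `known_words` set and the helper name `misspelled_by_line` are assumptions, and an in-memory `io.StringIO` stands in for the real file so the snippet runs on its own. It is written for Python 3.

```python
import io
import string
from collections import defaultdict

def misspelled_by_line(lines, known_words):
    """Map each unknown word to the 1-based line numbers it appears on."""
    strip_punct = str.maketrans("", "", string.punctuation)
    result = defaultdict(list)
    for lineno, line in enumerate(lines, 1):
        # Remove punctuation, normalise case, then split the line into words.
        for word in line.translate(strip_punct).lower().split():
            if word in known_words:
                continue
            # Record each line only once per word.
            if lineno not in result[word]:
                result[word].append(lineno)
    return dict(result)

# Made-up sample data standing in for the text file and the dictionary file.
sample = io.StringIO("helo mum,\nhow arew you?\nhelo again\n")
known_words = {"mum", "how", "you", "again"}

found = misspelled_by_line(sample, known_words)
for word in sorted(found):
    print(word, *found[word])
```

Because the line number is captured at the moment each word is read, there is no need to reconstruct it afterwards from a flat list of words.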
May 24 '10 #10
@Glenton
This is what I have:
# text is a list built from my text file
# words is a list of my incorrect words
I want to find the line number of each incorrect word in the text file.
May 24 '10 #11
Glenton
391 Expert 256MB
@lightning18
Oh, so you're just looking for the list index method.

    In [5]: text="helo mum how arew you".split(" ")

    In [6]: text
    Out[6]: ['helo', 'mum', 'how', 'arew', 'you']

    In [7]: words=["arew","helo"]

    In [8]: for w in words:
       ...:     print w, text.index(w)
       ...:
       ...:
    arew 3
    helo 0

A quick browse through the Python docs or a textbook is a good idea, just to get a feel for what's possible.

Unless I'm still not understanding what you want!
May 24 '10 #12
Glenton
391 Expert 256MB
Oh, so maybe the word appears multiple times. Similar idea.

E.g. this function:
    def findLineNos(text,word):
        "returns a list of all the line numbers where word appears"
        ans=[]
        reps = text.count(word)
        n=0
        for i in range(reps):
            pos = text[n:].index(word)+n  # position in the full list
            ans.append(pos)
            n = pos+1  # resume the search just past this match
        return ans

    text="helo mum how arew you helo mum how arew you".split(" ")
    words=["arew","helo","false"]

    for w in words:
        print w,findLineNos(text,w)

returns this:
    arew [3, 8]
    helo [0, 5]
    false []
Cheers, Glenton, that is pretty much what I want: a set of incorrect words and the line numbers where they are located in the text file. However, I get an error with this code.
The error is:
SyntaxError: invalid syntax
  1. import sys
  2. import string
  3.  
  4. text = []
  5. infile = open(sys.argv[1], 'r').read()
  6. for punct in string.punctuation:
  7.     infile = infile.replace(punct, "")
  8.     text = infile.split()
  9.  
  10. dict = open(sys.argv[2], 'r').read()
  11. dictset = []
  12. dictset = dict.split()
  13.  
  14. words = []
  15. words = list(set(text) - set(dictset))
  16. words = [text.lower() for text in words]
  17. words.sort()
  18.  
  19. def findline(text, word):
  20.     ans = []
  21.     reps = text.count(word)
  22.     n = 0
  23.     for i in range(reps):
  24.         ans.append(text[n:].index(word)+n)
  25.         n = text[n:].index(word)+1
  26.     return ans
  27. for w in words:
  28.     print(w,findline(text, w)
  29.  
  30.  
May 24 '10 #14
Glenton
391 Expert 256MB
You'll need to be more specific than that on the error code. I can't run your file cos I don't have your inputs, so I'm guessing just by reading your code.

However, looking at it, it seems that text is a list of words, with no line information. Changing your line 8 to
    text = infile.split("\n")
will mean that text is a list of the lines from the text file, rather than a list of words.

This should make it possible.
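One caveat: once `text` holds whole lines, `text.count(word)` and `text.index(word)` compare the word against entire lines, so they only match a line that consists of exactly that word. A possible adjustment, sketched under the assumption that words are separated by whitespace, is to split each line and test membership. The name `lines_containing` and the sample data are illustrative only.

```python
def lines_containing(lines, word):
    """Return the 1-based numbers of the lines whose words include word."""
    return [n for n, line in enumerate(lines, 1) if word in line.split()]

# Made-up sample standing in for the contents of the text file.
text = "helo mum\nhow arew you\nhelo again".split("\n")
print(lines_containing(text, "helo"))
print(lines_containing(text, "arew"))
```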
May 24 '10 #15
I still get the same error:
  1. import sys
  2. import string
  3.  
  4. text = []
  5. infile = open(sys.argv[1], 'r').read()
  6. for punct in string.punctuation:
  7.     infile = infile.replace(punct, "")
  8.     text = infile.split("\n")
  9.  
  10. dict = open(sys.argv[2], 'r').read()
  11. dictset = []
  12. dictset = dict.split()
  13.  
  14. words = []
  15. words = list(set(text) - set(dictset))
  16. words = [text.lower() for text in words]
  17. words.sort()
  18.  
  19.  
  20. def findline(text, word):
  21.     ans = []
  22.     reps = text.count(word)
  23.     n = 0
  24.     for i in range(reps):
  25.         ans.append(text[n:].index(word)+n)
  26.         n = text[n:].index(word)+1
  27.     return ans
  28. for w in words:
  29.     print(w,findline(text, w)
  30.  
May 24 '10 #16
Glenton
391 Expert 256MB
I wasn't trying to solve your syntax error. You need to post more details or do your own debugging. There's normally a clue about where the syntax error is with whatever interpreter you're using.

Looking through it, though, your line 29 is wrong: use print w,etc instead of print(w,etc

Incidentally lines 4,7 and 14 are not needed.
May 24 '10 #17
@Glenton
I'm using Python,
and this is the first program I have ever made in my life,
so I'm not good with syntax.
The code I showed you is everything I have.
The error I'm getting is on line 29 every time.
May 24 '10 #18
Glenton
391 Expert 256MB
@lightning18
As I said in my previous message, replace line 29 with this:
    print w,findline(text, w)
Your current line 29 doesn't have matching brackets.
May 24 '10 #19
@Glenton
It still gives me a syntax error for
    print w,findline(text, w)
in this code:
    import sys
    import string


    infile = open(sys.argv[1], 'r').read()
    for punct in string.punctuation:
        text = infile.split()

    dict = open(sys.argv[2], 'r').read()
    dictset = []
    dictset = dict.split()

    words = list(set(text) - set(dictset))
    words = [text.lower() for text in words]
    words.sort()


    def findline(text, word):
        ans = []
        reps = text.count(word)
        n = 0
        for i in range(reps):
            ans.append(text[n:].index(word)+n)
            n = text[n:].index(word)+1
        return ans
    for w in words:
        print w,findline(text, w)
May 24 '10 #20
Glenton
391 Expert 256MB
I'm afraid I have no idea why. You're going to have to debug it. This is a normal part of coding. Try commenting out bits of the code and rerunning, until you narrow it down to where it is.

Good luck!
May 24 '10 #21
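A possible explanation for the remaining syntax error, offered as a guess since the thread never confirms the Python version: the earlier `print(d)` calls suggest Python 3, where print is a function and `print w, ...` is itself a syntax error. Under that assumption, the loop needs the function-call form with balanced parentheses, as in this sketch. The sample `text` and `words` are made up, and `findline` here is a simplified stand-in for the function in the thread.

```python
def findline(text, word):
    """Return the indices of every occurrence of word in the list text."""
    return [i for i, w in enumerate(text) if w == word]

# Made-up sample data in place of the files read from sys.argv.
text = "helo mum how arew you".split()
words = ["arew", "helo"]

for w in words:
    # Python 3: print is a function, and every "(" needs a matching ")".
    print(w, findline(text, w))
```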
