
basic python questions

I have a simple assignment for school but am unsure where to go next. The
assignment is to read in a text file, split out the words, and report, in
alphabetical order, which lines each word appears on. I have the basic
outline of the program done, which is:

def Xref(filename):
    try:
        fp = open(filename, "r")
        lines = fp.readlines()
        fp.close()
    except:
        raise "Couldn't read input file \"%s\"" % filename
    dict = {}
    for line_num in xrange(len(lines)):
        if lines[line_num] == "": continue
        words = lines[line_num].split()
        for word in words:
            if not dict.has_key(word):
                dict[word] = []
            if line_num+1 not in dict[word]:
                dict[word].append(line_num+1)
    return dict

My questions are: how do I easily strip out punctuation marks, and how do I
sort the list? If there is anything else that I am doing wrong in this
code, pointing it out would be a big help.

Nov 18 '06 #1
21 Replies


Hi,
on first reading, you have a bare except clause that catches all
exceptions. You might want to try your program on a non-existent file
to find out the actual exception you need to trap for that error
message. Do you want the program to continue if you have no input file?

If you have not covered regular expressions, often called REs, then one
way of getting rid of punctuation is to turn the problem on its head.
Create a string of all the characters that you consider valid in
words, then go through each input line discarding any character not *in*
the string. Use the doctored line for word extraction.
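
The "valid characters" idea can be sketched as follows. This is only an illustration in Python 3 syntax (the thread itself uses Python 2.5); the name `keep_valid` and the choice of valid characters are assumptions, and invalid characters are replaced by a space rather than dropped outright so that text like "one,two" still splits into two words.

```python
import string

# Assumption: letters, digits, apostrophes and hyphens count as word characters.
VALID_CHARS = string.ascii_letters + string.digits + "'-"


def keep_valid(line, valid=VALID_CHARS):
    """Return `line` with every character not in `valid` replaced by a space."""
    return "".join(ch if ch in valid else " " for ch in line)


# The doctored line is then split into words as usual.
words = keep_valid("Hello, world! Isn't it cross-referenced?").split()
print(words)
```

Widening or narrowing `VALID_CHARS` is how you change what counts as part of a word.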

help(sorted) will start you off on sorting in Python. Other
documentation sources have a lot more.

P.S. I have not run the code myself.
P.P.S. Where is the function's docstring?
P.P.P.S. You might want to read up on enumerate. It gives another way
to do things when you want an index as well as each item from an
iterable, but remember: the index it gives starts from zero.
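
A small sketch of the zero-based index, written in Python 3 syntax as an assumption (the thread uses Python 2.5). The optional start argument to `enumerate` shown at the end was only added in Python 2.6, so on 2.5 you would add one to the index yourself.

```python
lines = ["first line\n", "second line\n"]

# enumerate() pairs each item with an index that starts at zero.
for index, line in enumerate(lines):
    print(index, line.rstrip())

# Passing a start value of 1 gives human-style line numbers directly.
numbered = [(line_num, line.rstrip()) for line_num, line in enumerate(lines, 1)]
print(numbered)
```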

Oh, and welcome to comp.lang.python :-)

- Paddy.

Nov 18 '06 #2

In <11*********************@j44g2000cwa.googlegroups.com>,
na*******@gmail.com wrote:

> def Xref(filename):
>     try:
>         fp = open(filename, "r")
>         lines = fp.readlines()
>         fp.close()
>     except:
>         raise "Couldn't read input file \"%s\"" % filename
>     dict = {}
>     for line_num in xrange(len(lines)):

Instead of reading the file completely into a list you can iterate over
the (open) file object and the `enumerate()` function can be used to get
an index number for each line.

>         if lines[line_num] == "": continue

Take a look at the lines you've read and you'll see why the ``continue``
is never executed.
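
To see the point about the ``continue``, here is a short sketch in Python 3 syntax (an assumption; the thread uses Python 2.5). `io.StringIO` is used only as a stand-in for an open text file.

```python
import io

# io.StringIO stands in for an open text file in this demonstration.
fp = io.StringIO("first line\n\nthird line\n")
lines = fp.readlines()
print(lines)  # every line keeps its trailing newline, so a blank line is "\n"

# Comparing against "" never matches; stripping whitespace first does.
non_blank = [line for line in lines if line.strip() != ""]
print(non_blank)

# A blank line also splits into an empty list, so the word loop skips it anyway.
print("\n".split())
```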

>         words = lines[line_num].split()
>         for word in words:
>             if not dict.has_key(word):
>                 dict[word] = []
>             if line_num+1 not in dict[word]:
>                 dict[word].append(line_num+1)

Instead of dealing with words that appear more than once in a line you may
use a `set()` to remove duplicates before entering the loop.
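
A minimal sketch of the `set()` suggestion, in Python 3 syntax (an assumption; the thread uses Python 2.5). The sample line and the name `index` are illustrative only.

```python
line = "the cat and the hat and the bat"
line_num = 1

words = line.split()
unique_words = set(words)  # each word once; the original order is not kept

index = {}
for word in unique_words:
    # No "is this line number already listed?" test is needed here,
    # because the set has already removed repeats within the line.
    index.setdefault(word, []).append(line_num)

print(sorted(index.items()))
```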

Ciao,
Marc 'BlackJack' Rintsch
Nov 18 '06 #3

na*******@gmail.com wrote:

> I have a simple assignment for school but am unsure where to go. The
> assignment is to read in a text file, split out the words and say which
> line each word appears in alphabetical order. I have the basic outline
> of the program done which is:

looks like an excellent start to me.

> def Xref(filename):
>     try:
>         fp = open(filename, "r")
>         lines = fp.readlines()
>         fp.close()
>     except:
>         raise "Couldn't read input file \"%s\"" % filename
>     dict = {}
>     for line_num in xrange(len(lines)):
>         if lines[line_num] == "": continue
>         words = lines[line_num].split()
>         for word in words:
>             if not dict.has_key(word):
>                 dict[word] = []
>             if line_num+1 not in dict[word]:
>                 dict[word].append(line_num+1)
>     return dict
>
> My question is, how do I easily parse out punctuation marks

it depends a bit how you define the term "word".

if you're using regular text, with a limited set of punctuation
characters, you can simply do e.g.

    word = word.strip(".,!?:;")
    if not word:
        continue

inside the "for word" loop. this won't handle such characters if they
appear inside words, but that's probably good enough for your task.

another, slightly more advanced approach is to use regular expressions,
such as re.findall(r"\w+", text) to get a list of all alphanumeric "words"
in the text. that'll have other drawbacks (e.g. it'll split up words like
"couldn't" and "cross-reference", unless you tweak the regexp), and is
probably overkill.
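
The regular-expression route might look like this in Python 3 syntax (an assumption; the thread uses Python 2.5). The second pattern is just one possible tweak and is not from the thread: it allows a single apostrophe or hyphen between runs of word characters.

```python
import re

text = "I couldn't find the cross-reference, sorry!"

# Plain \w+ breaks words apart at the apostrophe and the hyphen.
plain = re.findall(r"\w+", text)
print(plain)

# Hypothetical tweak: permit ' or - between runs of word characters.
WORD_RE = re.compile(r"\w+(?:['-]\w+)*")
tweaked = WORD_RE.findall(text)
print(tweaked)
```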

> and how do I sort the list and

how to sort the dictionary when printing the cross-reference, you mean?
just use "sorted" on the dictionary; that'll get you a sorted list
of the keys.

    sorted(dict)

to avoid duplicates and simplify sorting, you probably want to normalize
the case of the words you add to the dictionary, e.g. by converting all
words to lowercase.

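
Both suggestions together, as a small sketch in Python 3 syntax (an assumption; the thread uses Python 2.5). The sample lines and the name `index` are illustrative.

```python
index = {}
for line_num, line in enumerate(["The cat", "the Dog", "a cat"], 1):
    for word in line.split():
        word = word.lower()  # "The" and "the" now share one entry
        numbers = index.setdefault(word, [])
        if line_num not in numbers:
            numbers.append(line_num)

# sorted() applied to a dictionary returns a sorted list of its keys.
for word in sorted(index):
    print(word, ":", index[word])
```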
> if there anything else that I am doing wrong in this code

there's plenty of things that can be tweaked and tuned and written in a
slightly shorter way by an experienced Python programmer, but assuming
that this is a general programming assignment, I don't see anything
seriously "wrong" in your code (just make sure you test it on a file
that doesn't exist before you hand it in)

</F>

Nov 18 '06 #4

<na*******@gmail.com> wrote in message
news:11*********************@j44g2000cwa.googlegroups.com...
> I have a simple assignment for school but am unsure where to go. The
> assignment is to read in a text file, split out the words and say which
> line each word appears in alphabetical order. I have the basic outline
> of the program done which is:

And in general, this is one of the best "can anyone help me with my
homework?" posts I've ever seen.
A. You told us up front that it was your homework.
B. You made an honest stab at the solution before posting, and posted the
actual code.
C. You ended with some specific questions on things that didn't work or that
you wanted to improve.

Your current program looks like at least A- material. Add use of sorted and
enumerate, and handle that exception a little better, and you're getting
into A+ territory.

Out of curiosity, what school are you attending that is teaching Python, and
under what course of study?

-- Paul
Nov 18 '06 #5

I am currently going to school at Utah Valley State College; the course
that I am taking is analysis of programming languages. It's an upper
division course, but our teacher wanted to teach us Python as part of
the course. He spent about 2 - 3 weeks on Python, which has been good. I
currently work with .NET and it is fun to see what other languages have
and what syntax they use.

Nov 18 '06 #6

I have taken the comments and think I have implemented most of them. My
only question is how to use enumerate. Here is what I did; I have tried
a couple of things but was unable to figure out how to get the line
number.

def Xref(filename):
    try:
        fp = open(filename, "r")
    except:
        raise "Couldn't read input file \"%s\"" % filename
    dict = {}
    line_num=0
    for words in iter(fp.readline,""):
        words = set(words.split())
        line_num = line_num+1
        for word in words:
            word = word.strip(".,!?:;")
            if not dict.has_key(word):
                dict[word] = []
            dict[word].append(line_num)
    fp.close()
    keys = sorted(dict);
    for key in keys:
        print key," : ", dict[key]
    return dict

Nov 18 '06 #7

tom
na*******@gmail.com wrote:
> I have taken the comments and think I have implemented most. My only
> question is how to use the enumerator. Here is what I did, I have tried
> a couple of things but was unable to figure out how to get the line
> number.

Try this in the interpreter,

    l = [5,4,3,2,1]
    for count, i in enumerate(l):
        print count, i


Nov 18 '06 #8

tom
tom wrote:
> na*******@gmail.com wrote:
>> I have taken the comments and think I have implemented most. My only
>> question is how to use the enumerator. Here is what I did, I have tried
>> a couple of things but was unable to figure out how to get the line
>> number.
>
> Try this in the interpreter,
>
>     l = [5,4,3,2,1]
>     for count, i in enumerate(l):
>         print count, i

you could do it like this.

    for count, line in enumerate(fp):
        for word in line.split():
            etc...

filehandles are iterators themselves.
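
A runnable version of that loop, in Python 3 syntax (an assumption; the thread uses Python 2.5). `io.StringIO` is only a stand-in for a file opened with `open(filename)`, and the list `seen` is an illustrative name.

```python
import io

fp = io.StringIO("alpha beta\ngamma\n")  # stand-in for an open text file

print(iter(fp) is fp)  # a file object is its own iterator

seen = []
for count, line in enumerate(fp):
    for word in line.split():
        # count starts at zero, so add one to get the line number
        seen.append((count + 1, word))
print(seen)
```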

dont take my words for granted though, i'm kinda new to all this too :)

Nov 18 '06 #9

na*******@gmail.com wrote:
> I have taken the comments and think I have implemented most. My only

Unfortunately, no.

> question is how to use the enumerator. Here is what I did, I have tried
> a couple of things but was unable to figure out how to get the line
> number.
>
> def Xref(filename):
>     try:
>         fp = open(filename, "r")
>     except:
>         raise "Couldn't read input file \"%s\"" % filename

You've still got that catch-all except in there.
This will produce subtle bugs when you e.g. misspell a variable name:

    filename = '/tmp/foo'
    try:
        f = open(fliename, 'r')
    except:
        raise "can't open filename"

Please notice the misspelled 'fliename'.

This OTOH will give you more clues on what really goes wrong:

    filename = '/tmp/foo'
    try:
        f = open(fliename, 'r')
    except IOError:
        raise "can't open filename"

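
The difference can be demonstrated with a short sketch in Python 3 syntax (an assumption; the thread uses Python 2.5, and string exceptions no longer exist in Python 3, so the handlers return a message instead). The function names are hypothetical.

```python
def open_bare(filename):
    try:
        return open(fliename)      # deliberate typo, as in the example above
    except:                        # bare except: swallows the NameError as well
        return "can't open file"


def open_narrow(filename):
    try:
        return open(fliename)      # same deliberate typo
    except IOError:                # only I/O failures are handled here
        return "can't open file"


# The typo is silently misreported as a file problem.
print(open_bare("/tmp/foo"))

# With the narrow handler the real bug surfaces as a NameError.
try:
    open_narrow("/tmp/foo")
except NameError as exc:
    print("NameError:", exc)
```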
Diez
Nov 18 '06 #10

na*******@gmail.com wrote:
> dict = {}

As a general rule you should avoid variable names which shadow built in
types (list, dict, etc.). This can cause unexpected behavior later on.

Also, variable names should be more descriptive of their contents.

Try word_dict or some such variant
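
One way the shadowing can bite, sketched in Python 3 syntax (an assumption; the thread uses Python 2.5). The function `build_mapping` is hypothetical and exists only to show the failure.

```python
def build_mapping(pairs):
    dict = {}  # shadows the built-in type inside this function
    for key, value in pairs:
        dict[key] = value
    try:
        # This now "calls" the dictionary object above, not the built-in type.
        return dict(pairs)
    except TypeError as exc:
        return "TypeError: %s" % exc


result = build_mapping([("a", 1)])
print(result)

word_dict = {}  # a descriptive name leaves the built-in dict() usable
print(dict([("a", 1)]))
```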

Nov 18 '06 #11

So I implemented the exception specified, and in testing it returns:

DeprecationWarning: raising a string exception is deprecated

I am not too worried about the deprecation warning; however, out of
curiosity, what would the better way be to handle this? Is there
somewhere (web site, help documentation, etc.) that I would be able to
find this? I am running this in Python 2.5.

Nov 19 '06 #12


na*******@gmail.com wrote:
> So I implemented the exception specified and in testing it returns:
>
> DeprecationWarning: raising a string exception is deprecated
>
> I am not too worried about the deprecation warning; however, out of
> curiosity, what would the better way be to handle this? Is there a way
> that (web site, help documentation, etc...) I would be able to find
> this? I am running this in Python 2.5

Just try shortening the statement to the bare:

    raise

For example:

| >>> try:
| ...     f = open("nonesuch.txt")
| ... except IOError:
| ...     raise
| ...
| Traceback (most recent call last):
|   File "<stdin>", line 2, in <module>
# Coming from a file you'll get filename, linenumber, function/method above
| IOError: [Errno 2] No such file or directory: 'nonesuch.txt'
| >>>

If you feel that the error message that you get is descriptive enough,
even better than what you'd contemplated writing yourself, you're done.
Otherwise you need to raise an instance of the Exception class, and the
degree of difficulty just went up a notch.
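
Raising an exception instance rather than a string might look like the sketch below. It is written in Python 3 syntax (an assumption; the thread uses Python 2.5), and the helper `read_lines` and the path `missing` are hypothetical names; the path is simply assumed not to exist.

```python
import os
import tempfile


def read_lines(filename):
    """Return the file's lines, or raise IOError with a more specific message."""
    try:
        with open(filename) as fp:
            return fp.readlines()
    except IOError as exc:
        # Raise an exception *instance*, not a bare string.
        raise IOError("Couldn't read input file %r: %s" % (filename, exc))


# Assumed not to exist: a file inside a directory that was never created.
missing = os.path.join(tempfile.gettempdir(), "nonesuch-xref-demo", "nonesuch.txt")

try:
    read_lines(missing)
except IOError as exc:
    message = str(exc)
    print(message)
```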

[Aside] How are you going to explain all this to your instructor, who
may be reading all this right now?

Cheers,
John
Nov 19 '06 #14


John Machin wrote:

> [Aside] How are you going to explain all this to your instructor, who
> may be reading all this right now?

The instructor should be proud!
He has managed to make his very first post to this newsgroup, about a
homework question, and do it in the right way. That is no mean feat.

- Paddy.

Nov 19 '06 #15

<na*******@gmail.com> wrote:

> I am currently going to school at Utah Valley State College, the course
> that I am taking is analysis of programming languages. It's an upper
> division course but our teacher wanted to teach us python as part of

What does "upper division" mean in this context? I am unfamiliar with the
term.

- Hendrik

Nov 19 '06 #16

"Hendrik van Rooyen" <ma**@microcorp.co.za> wrote in message
news:ma**************************************@python.org...
> <na*******@gmail.com> wrote:
>
>> I am currently going to school at Utah Valley State College, the course
>> that I am taking is analysis of programming languages. It's an upper
>> division course but our teacher wanted to teach us python as part of
>
> what does "upper division" mean in this context ? I am unfamiliar with
> the term.
>
> - Hendrik

In a 4-year college program in the US, an upper division course is an
advanced course, usually reserved for those in the 3rd or 4th years.

-- Paul
Nov 19 '06 #17


Paddy wrote:
> John Machin wrote:
>
>> [Aside] How are you going to explain all this to your instructor, who
>> may be reading all this right now?
>
> The instructor should be proud!
> He has managed to do his very first post to a this newsgroup, about a
> homework question, and do it in the right way. that is no mean feat.
>
> - Paddy.

In fact, he may well by now know more than his instructor, and be
explaining the finer points of Python :-)

Nov 19 '06 #18

I normally try to be as resourceful as I can. I find that newsgroups give
a wide range of answers and solutions to problems, and you get a lot of
responses about the right way to do things and different points of
view about the language that you can't find in help manuals. I also
want to thank everyone for being so helpful in this group; it has been
one of the better groups that I have used.
Nov 19 '06 #19

Diez B. Roggisch wrote:
> na*******@gmail.com wrote:
>> I have taken the comments and think I have implemented most. My only
>
> Unfortunately, no.
>
>> question is how to use the enumerator. Here is what I did, I have tried
>> a couple of things but was unable to figure out how to get the line
>> number.
>>
>> def Xref(filename):
>>     try:
>>         fp = open(filename, "r")
>>     except:
>>         raise "Couldn't read input file \"%s\"" % filename
>
> You still got that I-catch-all-except in there.
> This will produce subtle bugs when you e.g. misspell a variable name:
>
>     filename = '/tmp/foo'
>     try:
>         f = open(fliename, 'r')
>     except:
>         raise "can't open filename"
>
> Please notice the misspelled 'fliename'.
>
> This OTOH will give you more clues on what really goes wrong:
>
>     filename = '/tmp/foo'
>     try:
>         f = open(fliename, 'r')
>     except IOError:
>         raise "can't open filename"

And this would be still more informative (and not deprecated...):

    filename = '/tmp/foo'
    f = open(fliename)

Catching an exception just to raise a less informative one is somewhat
useless IMHO.
Nov 19 '06 #20


Bruno Desthuilliers wrote:
> And this would be still more informative (and not deprecated...):
>
>     filename = '/tmp/foo'
>     f = open(fliename)
>
> Catching an exception just to raise a less informative one is somewhat
> useless IMHO.

Except that in the 'artificial' environment of homework, you are marked
for what you *show* you know. Catching an exception shows that the
pupil considered that opening a file could throw an exception.
You would have to comment that a try/except block was considered, give
an example of correct usage, then explain why it was not used, to get
equivalent (or slightly higher) marks.

(That was drilled into me so many times at school).

- Paddy.

Nov 20 '06 #21

Paddy wrote:
> Bruno Desthuilliers wrote:
>> And this would be still more informative (and not deprecated...):
>>
>>     filename = '/tmp/foo'
>>     f = open(fliename)
>>
>> Catching an exception just to raise a less informative one is somewhat
>> useless IMHO.
>
> Except that in the 'artificial' environment of homework, you are marked
> for what you *show* you know. Catching an exception shows that the
> pupil considered that opening a file could throw an exception.

Surely - but then he should also show that he knows what to do with it
(i.e. sys.exit with a friendlier error message). Bad exception handling is
worse than no exception handling (MHO based on experience).

> You would have to comment that a try except block was considered, give
> an example of correct usage, then why it was not used, to get
> equivalent, (or slightly higher), marks.

This should certainly get *far* better marks than broken exception
handling...

FWIW, the real problem here is with the xref function being responsible
for file access. It should of course take a file-like object and let the
caller deal with opening/closing the file - and handling possible
exceptions in a way that's appropriate.
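
That split of responsibilities could be sketched as below, in Python 3 syntax (an assumption; the thread uses Python 2.5). The names `xref` and `main` and the exact error message are illustrative, not anyone's finished solution.

```python
import io
import sys


def xref(lines):
    """Build {word: [line numbers]} from any iterable of text lines."""
    index = {}
    for line_num, line in enumerate(lines, 1):
        for word in set(line.split()):
            index.setdefault(word, []).append(line_num)
    return index


def main(filename):
    # The caller owns file access and decides how to report a failure.
    try:
        fp = open(filename)
    except IOError as exc:
        sys.exit("Couldn't read input file %r: %s" % (filename, exc))
    with fp:
        index = xref(fp)
    for word in sorted(index):
        print(word, ":", index[word])


# Because xref() only needs an iterable of lines, it can be tried without a file.
demo = xref(io.StringIO("to be\nor not to be\n"))
print(sorted(demo.items()))
```

Keeping file handling out of `xref` also makes it straightforward to test with in-memory text.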
Nov 20 '06 #22
