Hi Guys,
I've written a Markov analysis program and would like to get your
comments on the code. As it stands now, the final output comes out as a
tuple, then a list, then a tuple. Something like ('the', 'water') ['us']
('we', 'took') etc.
I'm still learning, so I don't know any advanced techniques or methods
that might have made this easier.
Here's the code:
def makelist(f): #turn a document into a list
        fin = open(f)
        results = []
        for line in fin:
                line = line.replace('"', '')
                line = line.strip().split()
                for word in line:
                        results.append(word)
        return results
def markov(f, preflen=2): #f is the file to analyze, preflen is prefix length
        convert_file = makelist(f)
        mapdict = {} #dict where the prefixes will map to suffixes
        start = 0
        end = preflen #start/end set the slice size
        for words in convert_file:
                prefix = tuple(convert_file[start:end]) #tuple as mapdict key
                suffix = convert_file[start + 2 : end + 1] #word as suffix to key
                mapdict[prefix] = mapdict.get(prefix, []) + suffix #append suffixes
                start += 1
                end += 1
        return mapdict
def randsent(f, amt=10): #prints a random sentence
        analyze = markov(f)
        for i in range(amt):
                rkey = random.choice(analyze.keys())
                print rkey, analyze[rkey],
The book gave a hint saying to make the prefixes in the dict using:
def shift(prefix, word):
        return prefix[1:] + (word, )
However, I can't seem to wrap my head around incorporating that into the
code above. If you know a method or could point me in the right
direction (or think that I don't need to use it), please let me know.
Thanks for all your help,
Dave
Dave, a few general comments on your code:
- Instead of using a comment to explain what a function does, put that
explanation in a docstring.
- Your names can be improved: instead of f you can use file_name or
something like that, instead of convert_file you can use a name that
shows the conversion is already done, etc.
- You can use xrange instead of range, and you can indent less, e.g. 4
spaces.
- This line may be slow (on Python 2 each call to keys() builds a new
list), so you may want to find a simpler way to do the same thing:
rkey = random.choice(analyze.keys())
- I suggest you add doctests to all your functions.
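To make the docstring, naming and keys() points concrete, here is a rough sketch of how randsent might be reshaped. The names random_pairs and prefix_map are made up for illustration, the function takes the dict that markov() returns instead of a file name, and it returns the pairs instead of printing them so that it runs unchanged on Python 2 and Python 3:

```python
import random

def random_pairs(prefix_map, amount=10):
    """Return `amount` randomly chosen (prefix, suffixes) pairs.

    prefix_map is a dict mapping prefix tuples to lists of suffix
    words, like the one markov() builds.
    """
    # Build the list of keys once, before the loop, instead of
    # rebuilding it on every pass.
    prefixes = list(prefix_map)
    pairs = []
    for _ in range(amount):  # xrange would also do on Python 2
        key = random.choice(prefixes)
        pairs.append((key, prefix_map[key]))
    return pairs
```

Calling it with something like random_pairs(markov('some_file.txt'), 5) would give five pairs to print however you like (the file name here is only a placeholder).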
Bye,
bearophile
On 2008-05-17 06:01:01 -0600, be************@lycos.com said:
> dave, few general comments to your code:
> - Instead of using a comment that explains the meaning of a function,
> add such things into docstrings.
> - Your names can be improved, instead of f you can use file_name or
> something like that, instead of convert_file you can use a name that
> denotes that the conversion is already done, etc.
> - You can use xrange instead of range and you can indent less, like 4
> spaces.
> - This line may be slow, you may want to find simpler ways to do the
> same thing:
> rkey = random.choice(analyze.keys())
> - I suggest you to add doctests to all your functions.
> Bye,
> bearophile
bear,
thanks for the suggestions. I use IDLE to write the code and when it's
working I paste it over into a new window. I'll tabify before saving
the pasted code. To add doctests would I need to use a certain
filename for the tests to be run on? Can you have doctests on random
functions?
Thanks
Dave
"dave" <sq*************@1ya2hoo3.net> wrote in message
news:g0**********@news.xmission.com...
| bear,
| thanks for the suggestions. I use IDLE to write the code and when it's
| working I paste it over into a new window.
Or you can just save code you want to keep to a new name.
| To add doctests would I need to use a certain
| filename for the tests to be run on?
You can run a doctest on a file from within the file (as well as from
without).
if __name__ == '__main__': <run doctest>
I presume the manual gives the details.
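A minimal sketch of what that can look like, using doctest.testmod() from the standard library. The function flatten_lines is a made-up stand-in for makelist that takes a list of lines instead of a file name, so the example needs no file on disk:

```python
import doctest

def flatten_lines(lines):
    """Split each line on whitespace and return all words in one list.

    >>> flatten_lines(['the water', 'we took'])
    ['the', 'water', 'we', 'took']
    >>> flatten_lines([])
    []
    """
    words = []
    for line in lines:
        words.extend(line.split())
    return words

if __name__ == '__main__':
    # Runs every doctest found in this module's docstrings.
    doctest.testmod()
```

No particular file name is needed. Running the file directly runs the tests; nothing is printed when they all pass, and adding -v on the command line shows each test as it runs.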
| Can you have doctests on random functions?
???
tjr
dave:
>Can you have doctests on random functions?
Yes, you can add doctests to methods, functions, classes, module
docstrings, and in external text files.
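If "random functions" meant functions that call the random module, one possible approach is to have the doctest check something that holds on every run instead of an exact value. A sketch, with a made-up helper name pick_prefix:

```python
import random

def pick_prefix(prefix_map):
    """Return one randomly chosen key of prefix_map.

    The key differs from run to run, so the doctests only check
    properties that hold every time.

    >>> m = {('the', 'water'): ['us'], ('we', 'took'): ['the']}
    >>> pick_prefix(m) in m
    True
    >>> pick_prefix({('only', 'one'): []})
    ('only', 'one')
    """
    return random.choice(list(prefix_map))

if __name__ == '__main__':
    import doctest
    doctest.testmod()
```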
Bye,
bearophile
dave wrote:
> Hi Guys,
> I've written a Markov analysis program and would like to get your
> comments on the code As it stands now the final input comes out as a
> tuple, then list, then tuple. Something like ('the', 'water') ['us']
> ('we', 'took')..etc...
> I'm still learning so I don't know any advanced techniques or methods
> that may have made this easier.
> here's the code:
> def makelist(f): #turn a document into a list
>         fin = open(f)
>         results = []
>         for line in fin:
>                 line = line.replace('"', '')
>                 line = line.strip().split()
>                 for word in line:
>                         results.append(word)
>         return results
What does your data look like? Just straight text?
> def markov(f, preflen=2): #f is the file to analyze, preflen is prefix length
>         convert_file = makelist(f)
>         mapdict = {} #dict where the prefixes will map to suffixes
>         start = 0
>         end = preflen #start/end set the slice size
>         for words in convert_file:
>                 prefix = tuple(convert_file[start:end]) #tuple as mapdict key
>                 suffix = convert_file[start + 2 : end + 1] #word as suffix to key
>                 mapdict[prefix] = mapdict.get(prefix, []) + suffix #append suffixes
>                 start += 1
>                 end += 1
>         return mapdict
What is convert_file??
> def randsent(f, amt=10): #prints a random sentence
>         analyze = markov(f)
>         for i in range(amt):
>                 rkey = random.choice(analyze.keys())
>                 print rkey, analyze[rkey],
> The book gave a hint saying to make the prefixes in the dict using:
> def shift(prefix, word):
>         return prefix[1:] + (word, )
That's not a very helpful hint.
It works if you call it with a tuple and a word: it shifts the first
item off the front of the tuple and appends the word, so
shift(('foo', 'bar'), "word")
becomes ('bar', 'word')
Whoopty doo --- I'm not sure what that accomplishes!!
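One reading of the hint, which may or may not be what the book intends: keep the current prefix in a variable, record each new word as a suffix of that prefix, then call shift to slide the prefix one word forward. A sketch under that assumption (build_map is a made-up name, and it takes a list of words instead of a file name):

```python
def shift(prefix, word):
    """Drop the first item of the prefix tuple and append word."""
    return prefix[1:] + (word,)

def build_map(words, preflen=2):
    """Map each prefix tuple of length preflen to the words that follow it."""
    mapping = {}
    prefix = ()
    for word in words:
        if len(prefix) < preflen:
            # Still collecting the very first prefix.
            prefix += (word,)
            continue
        # Record word as a suffix of the current prefix, then slide on.
        mapping.setdefault(prefix, []).append(word)
        prefix = shift(prefix, word)
    return mapping
```

With build_map("a b c a b d".split()), the prefix ('a', 'b') ends up mapped to ['c', 'd']. No start/end slice bookkeeping is needed, and the prefix length appears only as the preflen argument.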
Unless the author means "pass a list and randomly pick a word from the
list", in which case the return statement could be
(random.choice(prefix), word)
* shrug *
But that's not very Markov ... you'd want a weighted choice of words,
depending on how you define your Markov chain: say, a Markov chain
based on part of speech, or on the probability of occurrence given a
word set.
Can you give some more detail??
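On the weighting point: if the suffixes are kept in a list with repeats, as dave's mapdict does, then random.choice on that list is already weighted by how often each word followed the prefix. A small sketch to illustrate, using a hand-written mapping instead of a real text (suffix_map and next_word are made-up names):

```python
import random
from collections import Counter

# Hand-written example: after ('we', 'took'), 'the' was seen three
# times and 'a' once.
suffix_map = {('we', 'took'): ['the', 'the', 'a', 'the']}

def next_word(suffix_map, prefix, rng=random):
    """Pick a suffix for prefix; repeats in the list act as weights."""
    return rng.choice(suffix_map[prefix])

rng = random.Random(0)  # fixed seed only so the run is repeatable
counts = Counter(next_word(suffix_map, ('we', 'took'), rng)
                 for _ in range(1000))
print(counts)
```

Over many draws 'the' should come up roughly three times as often as 'a', without any explicit probability table.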
> However I can't seem to wrap my head around incorporating that into the
> code above, if you know a method or could point me in the right
> direction (or think that I don't need to use it) please let me know.
> Thanks for all your help,
> Dave