Bytes | Software Development & Data Engineering Community

Markov Analysis Help

Hi Guys,

I've written a Markov analysis program and would like to get your
comments on the code. As it stands now, the final output comes out as a
tuple, then a list, then a tuple. Something like ('the', 'water') ['us']
('we', 'took') ... etc.

I'm still learning, so I don't know any advanced techniques or methods
that may have made this easier.
Here's the code:

import random

def makelist(f): #turn a document into a list
    fin = open(f)
    results = []
    for line in fin:
        line = line.replace('"', '')
        line = line.strip().split()
        for word in line:
            results.append(word)
    fin.close() #release the file handle when done
    return results
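For comparison, here is a minimal Python 3 sketch of the same tokenizing step. It assumes plain whitespace-separated text, and the names `tokenize` and `tokenize_file` are illustrative. `str.split()` with no argument already ignores leading and trailing whitespace, and `list.extend` replaces the inner append loop:

```python
def tokenize(lines):
    """Split an iterable of text lines into a flat list of words, dropping double quotes."""
    words = []
    for line in lines:
        # split() with no argument splits on runs of whitespace and
        # ignores leading/trailing whitespace, so strip() is not needed.
        words.extend(line.replace('"', '').split())
    return words


def tokenize_file(path):
    """Read a text file and return its words; the with block closes the file."""
    with open(path) as fin:
        return tokenize(fin)
```

Because `tokenize` accepts any iterable of lines (a list or an open file), you can test it without touching the disk.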

def markov(f, preflen=2): #f is the file to analyze, preflen is prefix length
    convert_file = makelist(f)
    mapdict = {} #dict where the prefixes will map to suffixes
    start = 0
    end = preflen #start/end set the slice size
    while end < len(convert_file): #stop once no word follows the prefix
        prefix = tuple(convert_file[start:end]) #tuple as mapdict key
        suffix = convert_file[end:end + 1] #the word right after the prefix
        mapdict[prefix] = mapdict.get(prefix, []) + suffix #append suffixes
        start += 1
        end += 1
    return mapdict
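Here is a minimal Python 3 sketch of the same prefix-to-suffix mapping. It assumes the word list is already built, and the name `build_chain` is illustrative. The suffix position comes from `preflen`, and the loop stops `preflen` words before the end, so every prefix has a following word and no shorter "tail" prefixes end up as keys:

```python
from collections import defaultdict


def build_chain(words, preflen=2):
    """Map each tuple of preflen consecutive words to the list of words that follow it."""
    chain = defaultdict(list)
    # Stop preflen words before the end so every prefix has a following word.
    for start in range(len(words) - preflen):
        prefix = tuple(words[start:start + preflen])
        chain[prefix].append(words[start + preflen])
    return dict(chain)
```

For example, with the words of "the cat sat on the cat mat", the key `('the', 'cat')` maps to `['sat', 'mat']`.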

def randsent(f, amt=10): #prints a random sentence
    analyze = markov(f)
    for i in range(amt):
        rkey = random.choice(analyze.keys())
        print rkey, analyze[rkey],

The book gave a hint saying to make the prefixes in the dict using:

def shift(prefix, word):
    return prefix[1:] + (word, )

However, I can't seem to wrap my head around incorporating that into the
code above. If you know a method or could point me in the right
direction (or think that I don't need to use it), please let me know.
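Here is one way the book's `shift` hint could be used, sketched in Python 3 and assuming the goal is to generate new text. The current prefix is kept as a tuple and slid forward one word at a time, both when building the map and when walking it to produce a sentence. The names `markov_chain` and `generate` are hypothetical, and the chain is assumed to be non-empty:

```python
import random


def shift(prefix, word):
    """Drop the first word of the prefix tuple and append word (the book's hint)."""
    return prefix[1:] + (word,)


def markov_chain(words, preflen=2):
    """Map each prefix tuple to the list of words seen right after it."""
    chain = {}
    prefix = tuple(words[:preflen])
    for word in words[preflen:]:
        # Record word as a successor of the current prefix, then slide the window.
        chain.setdefault(prefix, []).append(word)
        prefix = shift(prefix, word)
    return chain


def generate(chain, length=10, rng=random):
    """Start from a random prefix and append successors until the text has
    length words or the current prefix was never followed by anything."""
    prefix = rng.choice(list(chain))
    output = list(prefix)
    while len(output) < length and prefix in chain:
        word = rng.choice(chain[prefix])
        output.append(word)
        prefix = shift(prefix, word)
    return ' '.join(output)
```

For example, `generate(markov_chain(open('story.txt').read().split()))` returns a string of up to ten words. Here `story.txt` stands in for whatever file you are analyzing. Because each chosen word becomes part of the next prefix, the result reads as one run of words rather than alternating tuples and lists.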

Thanks for all your help,

Dave

Jun 27 '08 #1
dave, a few general comments on your code:
- Instead of using a comment that explains the meaning of a function,
put such things in docstrings.
- Your names can be improved: instead of f you can use file_name or
something like that, and instead of convert_file you can use a name that
denotes that the conversion is already done, etc.
- You can use xrange instead of range, and you can indent less, like 4
spaces.
- This line may be slow, because analyze.keys() builds a new list of all
the keys on every pass through the loop; you may want to find simpler
ways to do the same thing:
rkey = random.choice(analyze.keys())
- I suggest you add doctests to all your functions.
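Here is a minimal Python 3 sketch of moving the key list out of the loop, so it is built once instead of on every iteration. The helper name `random_entries` is hypothetical. In Python 3, `random.choice` needs an indexable sequence, and `dict.keys()` returns a view, so the `list(...)` call is required there anyway:

```python
import random


def random_entries(chain, amt=10, rng=random):
    """Return amt randomly chosen (prefix, suffixes) pairs from chain."""
    # Build the key list once; rebuilding it inside the loop would copy
    # every key of the dictionary on each iteration.
    keys = list(chain)
    picks = []
    for _ in range(amt):
        key = rng.choice(keys)
        picks.append((key, chain[key]))
    return picks
```

Returning the pairs instead of printing them also makes the function easier to test.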

Bye,
bearophile
Jun 27 '08 #2
On 2008-05-17 06:01:01 -0600, be************@lycos.com said:
> dave, few general comments to your code:

bear,
thanks for the suggestions. I use IDLE to write the code, and when it's
working I paste it over into a new window. I'll tabify before saving
the pasted code. To add doctests, would I need to use a certain
filename for the tests to be run on? Can you have doctests on random
functions?

Thanks

Dave

Jun 27 '08 #3

"dave" <sq*************@1ya2hoo3.net> wrote in message
news:g0**********@news.xmission.com...
| bear,
| thanks for the suggestions. I use IDLE to write the code and when it's
| working I paste it over into a new window.

Or you can just save code you want to keep under a new name.

| To add doctests would I need to use a certain
| filename for the tests to be run on?

You can run a doctest on a file from within the file (as well as from
without).

if __name__ == '__main__':
    import doctest
    doctest.testmod()

I presume the manual gives the details.

| Can you have doctests on random functions?

???

tjr

Jun 27 '08 #4
dave:
>Can you have doctests on random functions?
Yes, you can add doctests to methods, functions, classes, module
docstrings, and in external text files.
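For a function that uses `random`, a doctest can check properties of the result rather than an exact value, or it can pass in a seeded `random.Random` so that two runs agree. Here is a minimal sketch with a hypothetical `pick_successor` helper:

```python
import doctest
import random


def pick_successor(chain, prefix, rng=random):
    """Return a random word that followed prefix in the source text.

    Random output can still be doctested by checking a property
    instead of an exact value:

    >>> chain = {('the', 'cat'): ['sat', 'ran']}
    >>> pick_successor(chain, ('the', 'cat')) in ['sat', 'ran']
    True

    Passing the same seeded random.Random twice gives the same pick:

    >>> a = pick_successor(chain, ('the', 'cat'), random.Random(42))
    >>> b = pick_successor(chain, ('the', 'cat'), random.Random(42))
    >>> a == b
    True
    """
    return rng.choice(chain[prefix])


if __name__ == '__main__':
    doctest.testmod()  # silent when every example passes
```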

Bye,
bearophile
Jun 27 '08 #5
dave wrote:
> def makelist(f): #turn a document into a list
>     fin = open(f)
>     results = []
>     for line in fin:
>         line = line.replace('"', '')
>         line = line.strip().split()
>         for word in line:
>             results.append(word)
>     return results

What does your data look like? Just straight text?

> def markov(f, preflen=2): #f is the file to analyze, preflen is prefix length
>     convert_file = makelist(f)
>     mapdict = {} #dict where the prefixes will map to suffixes
>     start = 0
>     end = preflen #start/end set the slice size
>     for words in convert_file:
>         prefix = tuple(convert_file[start:end]) #tuple as mapdict key
>         suffix = convert_file[start + 2 : end + 1] #word as suffix to key
>         mapdict[prefix] = mapdict.get(prefix, []) + suffix #append suffixes
>         start += 1
>         end += 1
>     return mapdict

What is convert_file??
> def randsent(f, amt=10): #prints a random sentence
>     analyze = markov(f)
>     for i in range(amt):
>         rkey = random.choice(analyze.keys())
>         print rkey, analyze[rkey],
>
> The book gave a hint saying to make the prefixes in the dict using:
>
> def shift(prefix, word):
>     return prefix[1:] + (word, )

That's not a very helpful hint.

It works if you call it with a tuple and a word --- it drops the first
item of the tuple and appends the word ... so:

shift(('foo', 'bar'), 'word')
returns ('bar', 'word')

Whoopty doo --- I'm not sure what that accomplishes!!

Unless the author means "pass a list and randomly pick a word from the
list", in which case the return statement could be

(random.choice(prefix), word)

* shrug *

But -- that's not very Markov ... you'd want a weighted choice of words,
depending on how you define your Markov chain -- say, a Markov chain
based on part of speech or on the probability of occurrence in a given word set.
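Dave's `markov` already keeps every observed suffix, duplicates included. That means `random.choice` on a suffix list is already frequency-weighted: a word seen twice after a prefix is twice as likely as one seen once. If you would rather store counts, here is a minimal Python 3 sketch using `collections.Counter` and `random.choices` (available since Python 3.6). The helper names are illustrative:

```python
import random
from collections import Counter, defaultdict


def count_successors(words, preflen=2):
    """Map each prefix tuple to a Counter of the words that follow it."""
    counts = defaultdict(Counter)
    for start in range(len(words) - preflen):
        prefix = tuple(words[start:start + preflen])
        counts[prefix][words[start + preflen]] += 1
    return dict(counts)  # plain dict: an unknown prefix raises KeyError


def weighted_next(counts, prefix, rng=random):
    """Pick a successor of prefix with probability proportional to its count."""
    successors = counts[prefix]
    # choices() pairs the population with matching weights and returns k picks.
    return rng.choices(list(successors), weights=list(successors.values()), k=1)[0]
```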

Can you give some more detail??
Jun 27 '08 #6


