how to organize a module that requires a data file

Ok, so I have a module that is basically a Python wrapper around a big
lookup table stored in a text file[1]. The module needs to provide a
few functions::

    get_stem(word, pos, default=None)
    stem_exists(word, pos)
    ...

Because there should only ever be one lookup table, I feel like these
functions ought to be module globals. That way, you could just do
something like::

    import morph
    assist = morph.get_stem('assistance', 'N')
    ...

My problem is with the text file. Where should I keep it? If I want to
keep the module simple, I need to be able to identify the location of
the file at module import time. That way, I can read all the data into
the appropriate Python structure, and all my module-level functions will
work immediately after import.

I can only think of a few obvious places where I could find the text
file at import time -- in the same directory as the module (e.g.
lib/site-packages), in the user's home directory, or in a directory
indicated by an environment variable. The first seems weird because the
text file is large (about 10MB) and I don't really see any other
packages putting data files into lib/site-packages. The second seems
weird because it's not a per-user configuration - it's a data file
shared by all users. And the third seems weird because my
experience with a configuration depending heavily on environment
variables is that this is difficult to maintain.

If I don't mind complicating the module functions a bit (e.g. by
starting each function with "if _lookup_table is not None"), I could
allow users to specify a location for the file after the module is
imported, e.g.::

    import morph
    morph.setfile(r'C:\resources\morph_english.flat')
    ...

Then all the module-level functions would have to raise Exceptions until
setfile() was called. I don't like that the user would have to
configure the module each time they wanted to use it, but perhaps that's
unavoidable.
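A minimal sketch of the setfile() approach described above. The file layout is an assumption: the thread does not show the real format, so this supposes one tab-separated `word`, `pos`, `stem` triple per line. The `_require_table` helper is hypothetical and only illustrates the "raise until configured" behaviour.

```python
# Sketch of a module-level lookup table that must be configured first.
# Assumption: each line of the data file is "word<TAB>pos<TAB>stem";
# the real file format is not shown in the thread.

_lookup_table = None  # maps (word, pos) -> stem once setfile() has run


def setfile(path):
    """Load the lookup table from the file at `path`."""
    global _lookup_table
    table = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            word, pos, stem = line.split("\t")
            table[(word, pos)] = stem
    _lookup_table = table


def _require_table():
    """Return the table, or raise if setfile() has not been called yet."""
    if _lookup_table is None:
        raise RuntimeError("call setfile() before using this module")
    return _lookup_table


def get_stem(word, pos, default=None):
    return _require_table().get((word, pos), default)


def stem_exists(word, pos):
    return (word, pos) in _require_table()
```

The check lives in one helper, so each public function stays a one-liner rather than starting with its own `if _lookup_table is not None` test.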

Any suggestions? Is there an obvious place to put the text file that
I'm missing?

Thanks in advance,

STeVe

[1] In case you're curious, the file is a list of words and their
morphological stems provided by the University of Pennsylvania.
Nov 22 '05 #1
Steven Bethard wrote:

[Text file for a module's internal use.]
My problem is with the text file. Where should I keep it? If I want to
keep the module simple, I need to be able to identify the location of
the file at module import time. That way, I can read all the data into
the appropriate Python structure, and all my module-level functions will
work immediately after import.


I tend to make use of the __file__ attribute available in every module.
For example:

    import os

    resource_dir = os.path.join(os.path.split(__file__)[0], "Resources")

This assigns to resource_dir the path to the Resources directory
alongside the module itself in the filesystem. Of course, if you just
wanted the text file to reside alongside the module, rather than a
whole directory of stuff, you'd replace "Resources" with the name of
your file (and change the variable name, of course). For example:

    filename = os.path.join(os.path.split(__file__)[0],
                            "morph_english.flat")

Having posted this solution, and in the tradition of Usenet, I'd be
interested to hear whether this is a particularly bad idea.
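For comparison, the same idea can be written with `pathlib`. This is only a sketch: the `data_path` helper is a hypothetical name, and it assumes the module is installed as an ordinary file on disk (a module imported from a zip archive would not have a usable directory next to it).

```python
from pathlib import Path


def data_path(module_file, filename="morph_english.flat"):
    """Return the path of `filename` next to the module at `module_file`."""
    # resolve() makes the path absolute, so the result does not depend
    # on the current working directory at call time.
    return Path(module_file).resolve().parent / filename


# Inside the module itself this would be called as data_path(__file__).
```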

Paul

Nov 22 '05 #2
On Thu, 17 Nov 2005 12:18:51 -0700
Steven Bethard <st************@gmail.com> wrote:
My problem is with the text file. Where should I keep it?

I can only think of a few obvious places where I could
find the text file at import time -- in the same
directory as the module (e.g. lib/site-packages), in the
user's home directory, or in a directory indicated by an
environment variable.


Why don't you search those places in order for it?

Check ~/.mymod/myfile, then /etc/mymod/myfile, then
/lib/site-packages/mymod/myfile or whatever. It won't take
long, just do the existence checks on import of the module.
If you don't find it after checking those places, *then*
raise an exception.

You don't say what this data file is or whether it is
subject to change or customization. If it is, then there is
a real justification for this approach, because an
individual user might want to shadow the system install with
his own version of the data.

That's pretty typical behavior for configuration files on
any Posix system.
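The search-in-order idea might look roughly like this. It is a sketch, not tested against any particular install layout; `find_data_file` is a hypothetical helper, and the candidate paths passed to it are whatever locations the module author chooses.

```python
import os


def find_data_file(candidates):
    """Return the first existing file in the list `candidates`, else raise."""
    for path in candidates:
        expanded = os.path.expanduser(path)  # turns "~" into the home dir
        if os.path.isfile(expanded):
            return expanded
    raise FileNotFoundError(
        "data file not found in any of: %s" % ", ".join(candidates))
```

At import time the module would call it with a list such as `["~/.mymod/myfile", "/etc/mymod/myfile"]` followed by the site-packages location, so a per-user copy shadows the system-wide one.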

Cheers,
Terry
--
Terry Hancock (ha*****@AnansiSpaceworks.com)
Anansi Spaceworks http://www.AnansiSpaceworks.com

Nov 22 '05 #4
Terry Hancock wrote:
On Thu, 17 Nov 2005 12:18:51 -0700
Steven Bethard <st************@gmail.com> wrote:
My problem is with the text file. Where should I keep it?

I can only think of a few obvious places where I could
find the text file at import time -- in the same
directory as the module (e.g. lib/site-packages), in the
user's home directory, or in a directory indicated by an
environment variable.

Why don't you search those places in order for it?

Check ~/.mymod/myfile, then /etc/mymod/myfile, then
/lib/site-packages/mymod/myfile or whatever. It won't take
long, just do the existence checks on import of the module.
If you don't find it after checking those places, *then*
raise an exception.

You don't say what this data file is or whether it is
subject to change or customization. If it is, then there is
a real justification for this approach, because an
individual user might want to shadow the system install with
his own version of the data.


The file is a lookup table of word stems distributed by the University
of Pennsylvania. It doesn't really make sense for users to customize
it, because it's not a configuration file, but it is possible that UPenn
would distribute a new version at some point. That's what I meant when
I said "it's not a per-user configuration - it's a data file shared by
all users". So there should be exactly one copy of the file, so I
shouldn't have to deal with shadowing.

Of course, even with only one copy of the file, that doesn't mean that I
couldn't search a few places. Maybe I could by default put it in
lib/site-packages, but allow an option to setup.py to put it somewhere
else for anyone who was worried about putting 10MB into
lib/site-packages. Those folks would then have to use an environment
variable, say $MORPH_FLAT, to identify that directory. At module
import I would just check both locations...
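A rough sketch of checking both locations, assuming `$MORPH_FLAT` names a directory and that the file is called `morph_english.flat` as earlier in the thread. The function name `locate_morph_file` and its `environ` parameter (there only to make the lookup testable) are hypothetical.

```python
import os


def locate_morph_file(module_dir, filename="morph_english.flat", environ=None):
    """Check the $MORPH_FLAT directory first, then the module's own directory."""
    environ = os.environ if environ is None else environ
    candidates = []
    override_dir = environ.get("MORPH_FLAT")
    if override_dir:
        candidates.append(os.path.join(override_dir, filename))
    candidates.append(os.path.join(module_dir, filename))
    for path in candidates:
        if os.path.isfile(path):
            return path
    raise FileNotFoundError(
        "could not find %s; tried %s" % (filename, ", ".join(candidates)))
```

With this ordering the environment variable is optional: a default install never needs it, and only people who moved the file have to set it.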

I'll have to think about this some more...

STeVe
Nov 22 '05 #6
Personally I would do this as a class and pass a path to where
the file is stored as an argument to instantiate it (maybe try
to help user if they don't pass it). Something like:

    class morph:
        def __init__(self, pathtodictionary=None):
            if pathtodictionary is None:
                #
                # Insert code here to see if it is in the current
                # directory and/or look in other directories.
                #
                pass

            try:
                self.fp = open(pathtodictionary, 'r')
            except IOError:
                print("unable to locate dictionary at: %s" % pathtodictionary)
            else:
                #
                # Insert code here to load data from .txt file
                #
                self.fp.close()

        def get_stem(self, arg1, arg2):
            #
            # Code for get_stem method
            #
            pass

The other way I've done this is to have a .INI file that always lives
in the same directory as the class with an entry in it that points me
to where the .txt file lives.
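Reading the location from an .INI file could be sketched with the standard `configparser` module. The section name `morph` and option name `datafile` are made up for illustration, as is the `path_from_ini` helper.

```python
import configparser
import os


def path_from_ini(ini_path, section="morph", option="datafile"):
    """Read the data file location from an INI file."""
    parser = configparser.ConfigParser()
    # read() returns the list of files it managed to parse.
    if not parser.read(ini_path):
        raise FileNotFoundError("no INI file at %s" % ini_path)
    return os.path.expanduser(parser.get(section, option))
```

The INI file itself would then contain something like a `[morph]` section with a single `datafile = ...` entry pointing at the .txt file.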

Hope this helps.

-Larry Bates

Nov 22 '05 #8
Larry Bates wrote:
Personally I would do this as a class and pass a path to where
the file is stored as an argument to instantiate it (maybe try
to help user if they don't pass it). Something like:

    class morph:
        def __init__(self, pathtodictionary=None):
            if pathtodictionary is None:
                # Insert code here to see if it is in the current
                # directory and/or look in other directories.
                pass
            try:
                self.fp = open(pathtodictionary, 'r')
            except IOError:
                print("unable to locate dictionary at: %s" % pathtodictionary)
            else:
                # Insert code here to load data from .txt file
                self.fp.close()

        def get_stem(self, arg1, arg2):
            # Code for get_stem method
            pass

Actually, this is basically what I have right now. It bothers me a
little because you can get two instances of "morph", with two separate
dictionaries loaded. Since they're all loading the same file, it
doesn't seem like there should be multiple instances. I know I could
use a singleton pattern, but aren't modules basically the singletons of
Python?

The other way I've done this is to have a .INI file that always lives
in the same directory as the class with an entry in it that points me
to where the .txt file lives.


That's a thought. Thanks.

Steve
Nov 22 '05 #10
I have tried several ways; this is the way I like best (I develop on
Windows, but this technique should work on *NIX for your application):

:: \whereever\whereever\        (the directory your module is in,
                                 obviously somewhere where PYTHONPATH can
                                 see it)

:::: stevemodule.py             (your module)

:::: stevemodule_workfiles\     (a subdirectory in the same directory as
                                 your module)

:::::: __init__.py              (an empty file in stevemodule_workfiles\,
                                 only here to make stevemodule_workfiles\
                                 look like a package)

:::::: stevelargetextfile.txt   (your large textfile in
                                 stevemodule_workfiles\)

Now, to load the large textfile, I agree that it should be done with
module functions, so if it gets used several times in the same process,
it is only loaded once. The Python module itself follows the
"singleton" pattern, so you get that behavior for free.
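The "loaded only once" behaviour can be checked with a small experiment. This sketch writes a throwaway module to a temporary directory and imports it twice; the module name `stevemodule_demo` and its `TABLE` contents are invented for the demonstration.

```python
import importlib
import os
import sys
import tempfile


def import_twice(module_name="stevemodule_demo"):
    """Write a tiny module to a temp dir, import it twice, return both results."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, module_name + ".py"), "w") as f:
            f.write("TABLE = {'assistance': 'assist'}\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()  # the file was created after startup
        try:
            first = importlib.import_module(module_name)
            # The second import is served from sys.modules, so the module
            # body (and any file loading in it) does not run again.
            second = importlib.import_module(module_name)
        finally:
            sys.path.remove(tmp)
    return first, second
```

Both returned names refer to the same module object, so data loaded at module level is shared by every importer in the process.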

Here is the Python code for loading the file:

    import os.path
    import stevemodule_workfiles

    # Directory that contains the stevemodule_workfiles package
    workfiles_path = os.path.split(stevemodule_workfiles.__file__)[0]

    stevelargetextfile_fullpath = os.path.join(
        workfiles_path, 'stevelargetextfile.txt')

    stevelargetextfile_file = open(stevelargetextfile_fullpath)

Nov 22 '05 #12
