Organizing a Python project

ivarnelispam

Hello all,

I'm starting work on what is going to become a fairly substantial
Python project, and I'm trying to find the best way to organize
everything. The project will consist of:

- A few applications
- Several small scripts and utilities
- Unit tests and small interactive test programs
- A number of custom libraries and modules that may be shared and
referenced among all of the above

I have the following general structure in mind:

myproject/
    app1/
        main.py
        file1.py
        file2.py
        tests/
            test_abc.py
            test_xyz.py
    app2/
        ...
    scripts/
        script1.py
        script2.py
    shared/
        mylib1/
            file1.py
            file2.py
            tests/
                test_foo.py
                test_bar.py
        mylib2/
            ...

The files that you might want to execute directly are:
- Any of the "main.py" files under app*/
- Any of the files under shared/
- Any of the files under app*/tests or shared/mylib*/tests

So, my questions:
First of all, does this look like a reasonable overall structure, or
are there better alternatives?

Second (and the thing I'm primarily interested in), what is the best
way to deal with importing the shared modules in the applications,
scripts, test programs, and possibly other shared modules? I think the
most obvious solution is to add /path/to/myproject to PYTHONPATH.
However, this seems like an annoying little dependency that you are
likely to forget whenever you move your workspace to a new path, open
up a branch of the project in a different directory, or download and
work on the project using a different computer.

Is there a way to set this up that is a bit more self-contained? For
example, at first I was somewhat hopeful that Python could ascend
parent directories until it reached a directory that did not include
an __init__.py file, and it could use this as a root for referring to
packages and modules from any file contained within. (e.g. in the
example project above, any file could refer to myproject.shared.mylib1
so long as 'myproject' and all subdirectories contained an
__init__.py, and the parent of 'myproject' didn't contain such a
file). Evidently this is not the case, but it seems like it could be a
useful feature in these situations.
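
For illustration, the lookup described above can be approximated by hand. This is only a sketch: `find_package_root` and `add_package_root_to_path` are hypothetical helpers, not something Python provides, and the sketch assumes every package directory contains an `__init__.py` while the parent of 'myproject' does not.

```python
import os
import sys
import tempfile


def find_package_root(start):
    """Return the first ancestor directory of `start` that has no __init__.py.

    Hypothetical helper: walks upward from the directory containing `start`
    for as long as the current directory looks like a package.
    """
    current = os.path.dirname(os.path.abspath(start))
    while os.path.isfile(os.path.join(current, "__init__.py")):
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return current


def add_package_root_to_path(start):
    """Put the discovered root at the front of sys.path (if not already there)."""
    root = find_package_root(start)
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


# Demonstration on a throwaway tree that mirrors the layout in the post.
with tempfile.TemporaryDirectory() as tmp:
    tmp = os.path.realpath(tmp)
    lib_dir = os.path.join(tmp, "myproject", "shared", "mylib1")
    os.makedirs(lib_dir)
    for pkg in ("myproject", "myproject/shared", "myproject/shared/mylib1"):
        open(os.path.join(tmp, pkg, "__init__.py"), "w").close()
    module_file = os.path.join(lib_dir, "file1.py")
    open(module_file, "w").close()

    demo_root = find_package_root(module_file)
    # The root is the directory *containing* myproject/, so that
    # "import myproject.shared.mylib1" would resolve from it.
    root_is_parent_of_myproject = (demo_root == tmp)
```

A script or test file would call `add_package_root_to_path(__file__)` before its first `import myproject...` line. The cost is that the helper itself has to be copied into (or be importable from) every entry point.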

Anyway, I'm sure this is not an unusual situation, so I'm curious to
hear how other people have handled it.

(Note - don't remove 'spam' from my e-mail address when replying. The
address is correct as listed)

Thanks!
Kevin

Jun 27 '08 #1

A.T.Hofkamp

On 2008-05-19, iv**********@gmail.com <iv**********@gmail.com> wrote:

> Hello all,
>
> I'm starting work on what is going to become a fairly substantial
> Python project, and I'm trying to find the best way to organize
> everything. The project will consist of:
>
> - A few applications
> - Several small scripts and utilities
> - Unit tests and small interactive test programs
> - A number of custom libraries and modules that may be shared and
> referenced among all of the above
>
> I have the following general structure in mind:
>
> myproject/
>     app1/
>         main.py
>         file1.py
>         file2.py
>         tests/
>             test_abc.py
>             test_xyz.py
>     app2/
>         ...
>     scripts/
>         script1.py
>         script2.py
>     shared/
>         mylib1/
>             file1.py
>             file2.py
>             tests/
>                 test_foo.py
>                 test_bar.py
>         mylib2/
>             ...
>
> The files that you might want to execute directly are:
> - Any of the "main.py" files under app*/
> - Any of the files under shared/
> - Any of the files under app*/tests or shared/mylib*/tests
>
> So, my questions:
> First of all, does this look like a reasonable overall structure, or
> are there better alternatives?

You could make a 'bin' directory next to 'myproject' containing the
executable programs, each of which would usually do something like

    #!/usr/bin/env python
    from myproject.app1 import main
    main.run()

to make a clearer separation between code that can be executed and code that
is imported into an application.

Also, why do you make a distinction between shared and non-shared code?
You could simply eliminate the 'shared' directory and put its contents
directly under myproject.

> Second (and the thing I'm primarily interested in), what is the best
> way to deal with importing the shared modules in the applications,
> scripts, test programs, and possibly other shared modules? I think the
> most obvious solution is to add /path/to/myproject to PYTHONPATH.
> However, this seems like an annoying little dependency that you are
> likely to forget whenever you move your workspace to a new path, open
> up a branch of the project in a different directory, or download and
> work on the project using a different computer.

What I am missing here is how you plan to do the development.
If you want to do branch-based development, you may want to have a look at
Combinator (at divmod.org). It handles branch management, adds executable
programs from bin to your path (in your current branch), and extends PYTHONPATH
(with your current branch).

Even if you have just 1 branch (namely 'trunk') it may be useful.

> Is there a way to set this up that is a bit more self contained? For
> example, at first I was somewhat hopeful that Python could ascend
> parent directories until it reached a directory that did not include
> an __init__.py file, and it could use this as a root for referring to
> packages and modules from any file contained within. (e.g. in the
> example project above, any file could refer to myproject.shared.mylib1
> so long as 'myproject' and all subdirectories contained an
> __init__.py, and the parent of 'myproject' didn't contain such a
> file). Evidently this is not the case, but it seems like it could be a
> useful feature in these situations.

Work is being done on relative imports. Not sure of its state.

> Anyway, I'm sure this is not an unusual situation, so I'm curious to
> hear how other people have handled it.

Most people probably run scripts from the root, i.e. the directory in which
'myproject' is a sub-directory. Since Python automatically adds '.' to its
path, it will work.
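
One caveat worth checking before relying on this: what ends up first on sys.path depends on how the interpreter is started. With `python -m package.module` the current directory is searched, whereas with `python path/to/script.py` the first entry is the script's own directory. The sketch below builds a throwaway 'myproject' (the file contents are made up for the demo) and tries both invocations from the root.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def build_demo_project(root: Path) -> None:
    """Create a tiny throwaway 'myproject' package under root."""
    app1 = root / "myproject" / "app1"
    app1.mkdir(parents=True)
    (root / "myproject" / "__init__.py").write_text("")
    (app1 / "__init__.py").write_text("")
    (app1 / "main.py").write_text(
        "import myproject\n"
        "print('imported myproject')\n"
    )


def run(args, cwd):
    """Run a child interpreter from cwd and return its exit status."""
    # Drop variables that would change sys.path and blur the comparison.
    env = {k: v for k, v in os.environ.items()
           if k not in ("PYTHONPATH", "PYTHONSAFEPATH")}
    completed = subprocess.run(
        [sys.executable, *args], cwd=cwd, env=env, capture_output=True
    )
    return completed.returncode


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_demo_project(root)
    # Started with -m from the root: the root is on sys.path.
    as_module = run(["-m", "myproject.app1.main"], cwd=root)
    # Started as a plain script: sys.path[0] is myproject/app1 instead,
    # so "import myproject" is not found.
    as_script = run(["myproject/app1/main.py"], cwd=root)
```

Here `as_module` is 0 and `as_script` is non-zero, so "run it from the root" is reliable mainly when the programs are started with `-m` or through small launcher scripts that live in the root itself.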

Sincerely,
Albert

Jun 27 '08 #2

Jorge Godoy

A.T.Hofkamp wrote:

> Also, why do you make a distinction between shared and non-shared code?
> You could simply eliminate 'shared' directory, and put its contents
> directly under myproject.

I would go further and make them individual projects, with their own version
control, code repository and then install them as eggs using setuptools.

This has been working fine for me in some projects and has the advantage of
being reusable in different big projects.

Also, by using setuptools on each big project I don't have to worry about
dependencies, because it downloads and installs everything for me when I
install the main project.

> Is there a way to set this up that is a bit more self contained? For
> example, at first I was somewhat hopeful that Python could ascend
> parent directories until it reached a directory that did not include
> an __init__.py file, and it could use this as a root for referring to
> packages and modules from any file contained within. (e.g. in the
> example project above, any file could refer to myproject.shared.mylib1
> so long as 'myproject' and all subdirectories contained an
> __init__.py, and the parent of 'myproject' didn't contain such a
> file). Evidently this is not the case, but it seems like it could be a
> useful feature in these situations.

Eggs would solve that as well. They would behave like any other
installed "library" on your system.

--
Jorge Godoy <jg****@gmail.com>

Jun 27 '08 #3

Terry Reedy

<iv**********@gmail.com> wrote in message
news:96**********************************@e39g2000hsf.googlegroups.com...
| Hello all,
|
| I'm starting work on what is going to become a fairly substantial
| Python project, and I'm trying to find the best way to organize
| everything. The project will consist of:
|
| - A few applications
| - Several small scripts and utilities
| - Unit tests and small interactive test programs
| - A number of custom libraries and modules that may be shared and
| referenced among all of the above
|
| I have the following general structure in mind:
|
| myproject/
|     app1/
|         main.py

If you put myproject in Pythonxy/Lib/site-packages, there is no need to
fiddle with PYTHONPATH or sys.path. In 3.0a5 I tried a relative import and
got a message that relative imports only work within packages, not modules.
I presume that means package.__init__.py. Maybe I just miswrote the
import, but I decided to stick with what dependably works whether from
within or without the package:

    from package.subpackage import module  # or
    from package.subpackage.module import object
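
To make the difference concrete, here is a small sketch comparing the two import styles. The `demo_pkg` layout is invented for the demo, and inserting the temporary directory into sys.path stands in for placing the package in site-packages. A relative import needs the importing file to be loaded as part of a package; a file run directly as a script has no parent package, which may be what produced the message above.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical layout created for the demo:
#   demo_pkg/__init__.py
#   demo_pkg/sub/__init__.py
#   demo_pkg/sub/module.py      defines VALUE
#   demo_pkg/sub/absolute.py    uses "from demo_pkg.sub import module"
#   demo_pkg/sub/relative.py    uses "from . import module"
FILES = {
    "demo_pkg/__init__.py": "",
    "demo_pkg/sub/__init__.py": "",
    "demo_pkg/sub/module.py": "VALUE = 42\n",
    "demo_pkg/sub/absolute.py": "from demo_pkg.sub import module\n",
    "demo_pkg/sub/relative.py": "from . import module\n",
}

with tempfile.TemporaryDirectory() as root:
    for name, source in FILES.items():
        path = os.path.join(root, *name.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as handle:
            handle.write(source)

    sys.path.insert(0, root)  # stands in for site-packages or PYTHONPATH
    try:
        importlib.invalidate_caches()
        absolute = importlib.import_module("demo_pkg.sub.absolute")
        relative = importlib.import_module("demo_pkg.sub.relative")
        # Both styles resolve to the very same module object.
        same_module = absolute.module is relative.module
        value = relative.module.VALUE
    finally:
        sys.path.remove(root)
```

When the files are imported through the package, both forms work and refer to the same module; the absolute form additionally keeps working when the file is executed directly, provided the package root is on sys.path.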

I agree with the comment about removing the 'shared' package layer.
Two packages deep is enough typing unless the deeper hierarchy is needed
(like possibly the 'tests' subsubpackages, if they make running the tests
easier).

tjr

Jun 27 '08 #4

Gabriel Genellina

On Wed, 21 May 2008 07:44:50 -0300, Casey McGinty <ca***********@gmail.com> wrote:

> Just my own opinion on these things:
>
> 1. Script code should be as basic as possible, ideally a module import line
> and function or method call. This is so you don't have to worry about script
> errors and/or increase startup time because a *.pyc file cannot be stored in
> /usr/bin.

Scripts are not compiled by default, only imported modules (anyway you could compile scripts by hand). Keeping the main script short might reduce the startup time, yes.
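
As a sketch of what "compile scripts by hand" can look like, the standard library's py_compile module byte-compiles a single source file. The script name and contents below are made up for the demo.

```python
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "script1.py")
    with open(script, "w") as handle:
        handle.write("print('hello from script1')\n")

    # doraise=True turns compile errors into exceptions instead of
    # messages on stderr. The return value is the path of the
    # byte-compiled file that was written.
    compiled_path = py_compile.compile(script, doraise=True)
    compiled_exists = os.path.isfile(compiled_path)
```

Whether this is worth doing for a launcher of two or three lines is doubtful; the time is spent in the imported modules, and those are cached automatically.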

> 2. In the top of your package directory it is typical to have a module named
> '_package.py'. This is ideally where the main command line entry point for
> the package code should be placed.

Why the underscore? And I usually don't put executable scripts inside a package - I consider them just libraries, to be imported by other parts of the application.
It's easier to test too when your tests *and* the application code are external to the package itself.

> 3. In the _package.py file you should add a "class Package" that holds most
> or all of the application startup/execution code not designated to other
> modules. Then run the application with the call to "Package()", as in
>
> if __name__ == '__main__':
>     Package()

In Python that doesn't *have* to be a class, and in fact, most of the time I use a function instead. Something like this:

    def main(argv):
        # do things
        return 0  # exit status; returning None also ends up as 0 below

    if __name__ == '__main__':
        import sys
        sys.exit(int(main(sys.argv) or 0))

> Some other questions I have are:
> A. What should go in the package __init__.py file? For example, a doc
> describing the program usage seems helpful, but maybe it should have info
> about your modules only? Assuming the __init__.py code gets executed when
> you import the module, you could place part or all of the application code
> here as well. I'm guessing this is not a good idea, but not really
> convinced.

Yes, the __init__.py is executed when you import the package. And no, I don't think it's a good idea to put all the application code there. As I said above, I consider packages *libraries*, the application code should *import* and use them, but not reside *inside* a package.
And there is the "import lock" issue too - I'm unsure of this but I think a lock is held until the import operation finishes, and that would happen only after __init__.py is fully executed.
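
A small sketch of the first point: importing anything from a package runs that package's `__init__.py` first. The package name `demo_initpkg` and its contents are invented for the demo; here `__init__.py` holds only a docstring and a flag, while the real work lives in a submodule.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as root:
    pkg_dir = os.path.join(root, "demo_initpkg")
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as handle:
        handle.write('"""Demo package docstring."""\nLOADED = True\n')
    with open(os.path.join(pkg_dir, "worker.py"), "w") as handle:
        handle.write("def run():\n    return 'working'\n")

    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()
        # Importing the submodule executes demo_initpkg/__init__.py first.
        worker = importlib.import_module("demo_initpkg.worker")
        package = sys.modules["demo_initpkg"]
        init_ran = package.LOADED
        result = worker.run()
    finally:
        sys.path.remove(root)
```

Because every import of the package pays for whatever `__init__.py` does, keeping it to documentation and a few cheap names is the cautious choice.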

> B. How should you import your _package.py module into your /usr/bin script.
> Is there a way to use an '__all__' to simplify this? Again this goes back to
> question A, should there be any code added to __init__.py?

I don't get the question... you import it as any other module (but I would not use a _package.py file anyway)

> C. If you have a _package.py file as the application entry, is it worth it
> to place most of the application code in a class, described in part 3?

As I said, I'd use a function, at least at the top level. Of course it can create many other objects.

> D. When I import a package._package module, I get a lot of junk in my
> namespace. I thought an '__all__' define in the module would prevent this,
> but it does not seem to work.

`import package._package` should only add "package" to the current namespace. If you're using `from package import *` - well, just don't do that :)
Otherwise I don't understand what you mean.
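
To show what `__all__` does and does not control, here is a sketch with a made-up module `demo_allmod`. `__all__` only limits what `from module import *` copies into the importing namespace; a plain `import module` binds just the module name, and every attribute stays reachable through it.

```python
import importlib
import os
import sys
import tempfile

SOURCE = (
    "import os\n"
    "__all__ = ['public_api']\n"
    "def public_api():\n"
    "    return 'ok'\n"
    "def helper():\n"
    "    return 'internal'\n"
)

with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "demo_allmod.py"), "w") as handle:
        handle.write(SOURCE)

    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()
        star_namespace = {}
        exec("from demo_allmod import *", star_namespace)
        plain_namespace = {}
        exec("import demo_allmod", plain_namespace)
    finally:
        sys.path.remove(root)

# Ignore the __builtins__ entry that exec() adds to each namespace.
star_names = sorted(n for n in star_namespace if not n.startswith("__"))
plain_names = sorted(n for n in plain_namespace if not n.startswith("__"))
```

`star_names` contains only 'public_api' and `plain_names` only 'demo_allmod', while `plain_namespace["demo_allmod"].helper` is still accessible. If the "junk" shows up in `dir(module)`, that is expected: `__all__` does not hide anything from attribute access.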

--
Gabriel Genellina

Jun 27 '08 #5

Gabriel Genellina

On Sun, 25 May 2008 19:46:50 -0300, Casey McGinty <ca***********@gmail.com> wrote:

> On Sat, May 24, 2008 at 2:11 PM, Gabriel Genellina <ga*******@yahoo.com.ar>
> wrote:
>
>>> 2. In the top of your package directory it is typical to have a module
>>> name '_package.py'. This is ideally where the main command line entry
>>> point for the package code should be placed.
>>
>> Why the underscore? And I usually don't put executable scripts inside a
>> package - I consider them just libraries, to be imported by other parts of
>> the application.
>> It's easier to test too when your tests *and* the application code are
>> external to the package itself.
>
> Ok, guess using the '_package' name is not very common. I saw it used by the
> dbus module. My whole concern here is that using a script will fragment your
> application code across the file system. I would prefer to have it all in a
> single spot. So I still like the idea of keeping most of application code
> (argument parsing, help output, initialization) inside of the package. The
> '_' should indicate that any other modules using your package should import
> that module.

Not only is using _package not very common, it also goes against the general (and very well established) rule that names with a single leading _underscore are private implementation details. So I would *not* expect to import _package as a rule.
I don't think splitting the application into two or more parts ("fragmentation" as you call it) is a bad thing by itself; the package will likely appear under some-python-directory/site-packages/your_package_name (where anyone would search for it) and the executable scripts in /usr/bin (where anyone would likely search for them too).

--
Gabriel Genellina

Jun 27 '08 #6
