Project organization and import

Martin Unsal

I'm using Python for what is becoming a sizeable project and I'm
already running into problems organizing code and importing packages.
I feel like the Python package system, in particular the isomorphism
between filesystem and namespace, doesn't seem very well suited for
big projects. However, I might not really understand the Pythonic way.
I'm not sure if I have a specific question here, just a general plea
for advice.

1) Namespace. Python wants my namespace hierarchy to match my
filesystem hierarchy. I find that a well organized filesystem
hierarchy for a nontrivial project will be totally unwieldy as a
namespace. I'm either forced to use long namespace prefixes, or I'm
forced to use "from foo import *" and __all__, which has its own set
of problems.

1a) Module/class collision. I like to use the primary class in a file
as the name of the file. However this can lead to namespace collisions
between the module name and the class name. Also it means that I'm
going to be stuck with the odious and wasteful syntax foo.foo
everywhere, or forced to use "from foo import *".

1b) The Pythonic way seems to be to put more stuff in one file, but I
believe this is categorically the wrong thing to do in large projects.
The moment you have more than one developer along with a revision
control system, you're going to want files to contain the smallest
practical functional blocks. I feel pretty confident saying that "put
more stuff in one file" is the wrong answer, even if it is the
Pythonic answer.

2) Importing and reloading. I want to be able to reload changes
without exiting the interpreter. This pretty much excludes "from foo
import *", unless you resort to this sort of hack:

http://www.python.org/search/hyperma...1993/0448.html

Has anyone found a systematic way to solve the problem of reloading in
an interactive interpreter when using "from foo import *"?
I appreciate any advice I can get from the community.

Martin

Mar 5 '07 #1


Jorge Godoy

"Martin Unsal" <ma*********@gmail.com> writes:

1) Namespace. Python wants my namespace hierarchy to match my filesystem
hierarchy. I find that a well organized filesystem hierarchy for a
nontrivial project will be totally unwieldy as a namespace. I'm either
forced to use long namespace prefixes, or I'm forced to use "from foo import
*" and __all__, which has its own set of problems.

I find it nice. You can tell where something lives just from the import,
and you don't have to search for it everywhere. Isn't Java, for example, like
that? (It's been so long since I last worried about Java that I don't remember
if this is mandatory or just a convention...)

You might get bitten by that when moving files from one OS to another,
especially if one of them ignores case and the other is strict about it.

1a) Module/class collision. I like to use the primary class in a file as the
name of the file. However this can lead to namespace collisions between the
module name and the class name. Also it means that I'm going to be stuck
with the odious and wasteful syntax foo.foo everywhere, or forced to use
"from foo import *".

Your classes should be CamelCased and start with an uppercase letter. So
you'd have foo.Foo, with "foo" being the package and "Foo" the class inside it.

1b) The Pythonic way seems to be to put more stuff in one file, but I
believe this is categorically the wrong thing to do in large projects. The
moment you have more than one developer along with a revision control
system, you're going to want files to contain the smallest practical
functional blocks. I feel pretty confident saying that "put more stuff in
one file" is the wrong answer, even if it is the Pythonic answer.

Why? RCS systems can merge changes. An RCS system is not a substitute for
design or for communication between programmers. You'll only have a problem
if two people change the same line of code, and if they are doing that (and
worse, doing it often) then you have a bigger problem than just the contents
of the file.

Unit tests help make sure that one change doesn't break the project as a
whole, and for a big project you're surely going to have a lot of those tests.

If one change breaks another, then there is a disagreement on the application
design, and you need more communication between developers or better
documentation of the API they're implementing / using.

2) Importing and reloading. I want to be able to reload changes without
exiting the interpreter. This pretty much excludes "from foo import *",
unless you resort to this sort of hack:

http://www.python.org/search/hyperma...1993/0448.html

Has anyone found a systematic way to solve the problem of reloading in an
interactive interpreter when using "from foo import *"?

I don't reload... When my investigative tests get bigger I write a script
and run it with the interpreter. It is easy since my text editor can call
Python on a buffer (I use Emacs).

I appreciate any advice I can get from the community.

This is just how I deal with it... My bigger "project" now has several modules,
each with its own namespace and package. The API is thoroughly documented, and
that took the most work to get done.

Using setuptools, entry points, etc. helps a lot as well.
The thing is that for big projects your design is the most important part.
Get it right and you won't have problems with namespaces and filenames. If
you don't dedicate enough time to this task you'll find yourself in trouble
really soon.

--
Jorge Godoy <jg****@gmail.com>

Mar 5 '07 #2

bruno.desthuilliers

On 5 mar, 01:21, "Martin Unsal" <martinun...@gmail.com> wrote:

I'm using Python for what is becoming a sizeable project and I'm
already running into problems organizing code and importing packages.
I feel like the Python package system, in particular the isomorphism
between filesystem and namespace,

It's not necessarily a 1:1 mapping. Remember that you can put code in
the __init__.py of a package, and that this code can import sub-
packages/modules namespaces, making the package internal organisation
transparent to user code (I've quite often started with a simple
module, later turning it into a package as the source code was
growing too big).
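
A minimal sketch of that idea, written for a current Python 3 interpreter. The package name shapes_demo and its modules are hypothetical and are built in a temporary directory purely so the example can run on its own; in a real project the files would simply live in the source tree.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk; every name here is made up.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "shapes_demo")
os.makedirs(pkg)

def write(name, text):
    with open(os.path.join(pkg, name), "w") as f:
        f.write(text)

# Internal layout: each class lives in its own lowercase module.
write("circle.py", "class Circle:\n    sides = 0\n")
write("square.py", "class Square:\n    sides = 4\n")

# The package __init__ lifts the public names up one level.
write("__init__.py",
      "from .circle import Circle\n"
      "from .square import Square\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

import shapes_demo

# User code never mentions the circle or square submodules.
print(shapes_demo.Circle.sides, shapes_demo.Square.sides)
```

The classes still report their real home (shapes_demo.Square.__module__ is "shapes_demo.square"), so the file layout can change later without touching client code.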

doesn't seem very well suited for
big projects. However, I might not really understand the Pythonic way.

cf above.

I'm not sure if I have a specific question here, just a general plea
for advice.

1) Namespace. Python wants my namespace hierarchy to match my
filesystem hierarchy. I find that a well organized filesystem
hierarchy for a nontrivial project will be totally unwieldy as a
namespace. I'm either forced to use long namespace prefixes, or I'm
forced to use "from foo import *" and __all__, which has its own set
of problems.

cf above. Also remember that you can "import as", ie:

import some_package.some_subpackage.some_module as some_module
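
For a runnable illustration of the same form, here is a small sketch that uses two standard library modules in place of the placeholder names above. The aliases osp and ET are just conventions picked for this example.

```python
# Bind deeply nested modules to short local names.
import os.path as osp
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b/></a>")

# The short aliases are used exactly like the full dotted names.
print(root.tag, osp.basename("/tmp/demo.txt"))
```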

1a) Module/class collision. I like to use the primary class in a file
as the name of the file.

Bad form IMHO. Packages and module names should be all_lower,
classnames CamelCased.

1b) The Pythonic way seems to be to put more stuff in one file,

Pythonic way is to group together highly related stuff. Not to "put
more stuff".

but I
believe this is categorically the wrong thing to do in large projects.

Oh yes ? Why ?

The moment you have more than one developer along with a revision
control system,

You *always* have a revision system, don't you ? And having more than
one developer on a project - be it big or small - is quite common.

you're going to want files to contain the smallest
practical functional blocks. I feel pretty confident saying that "put
more stuff in one file" is the wrong answer, even if it is the
Pythonic answer.

Is this actually based on working experience ? It seems that there are
enough not-trivial Python projects around to prove that it works just
fine.

Mar 5 '07 #3

Martin Unsal

On Mar 5, 12:45 am, "bruno.desthuilli...@gmail.com"
<bruno.desthuilli...@gmail.com> wrote:

Remember that you can put code in
the __init__.py of a package, and that this code can import sub-
packages/modules namespaces, making the package internal organisation
transparent to user code

Sure, but that doesn't solve the problem.

Say you have a package "widgets" with classes ScrollBar, Form, etc.
You want the end user to "import widgets" and then invoke
"widgets.ScrollBar()". As far as I know there are only two ways to do
this, both seriously flawed: 1) Put all your code in one module
widgets.py, 2) use "from scrollbar import *" in widgets/__init__.py,
which is semi-deprecated and breaks reload().

Also remember that you can "import as", ie:

import some_package.some_subpackage.some_module as some_module

Sure but that doesn't eliminate the unfortunate interaction between
Python class organization and filesystem hierarchy. For example, say
you want to organize the widgets package as follows:

widgets/scrollbar/*.py
widgets/form/*.py
widgets/common/util.py

Other than messing around with PYTHONPATH, which is horrible, I don't
see how to import util.py from the widget code.
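
One possible answer, sketched for a current Python 3 interpreter: if every directory in that layout is a package and only the directory that contains widgets/ is on the import path, the widget code can reach util.py either through the full package path or through an explicit relative import (PEP 328). The file bar.py, the clamp function and the file contents below are hypothetical, written to a temporary directory only so the sketch is self-contained.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()

def write(relpath, text):
    path = os.path.join(root, *relpath.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)

# Every directory is a package; the contents are made up for illustration.
for d in ("widgets", "widgets/common", "widgets/scrollbar"):
    write(d + "/__init__.py", "")
write("widgets/common/util.py",
      "def clamp(x, lo, hi):\n"
      "    return max(lo, min(x, hi))\n")
write("widgets/scrollbar/bar.py",
      "from ..common import util             # relative: up to widgets, into common\n"
      "from widgets.common.util import clamp  # absolute spelling of the same module\n"
      "\n"
      "class ScrollBar:\n"
      "    def __init__(self, pos):\n"
      "        self.pos = util.clamp(pos, 0, 100)\n")

sys.path.insert(0, root)  # only the project root has to be importable
importlib.invalidate_caches()

from widgets.scrollbar.bar import ScrollBar
print(ScrollBar(250).pos)
```

Neither import needs a PYTHONPATH entry per subdirectory; the one requirement is that the parent of widgets/ is importable.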

Bad form IMHO. Packages and module names should be all_lower,
classnames CamelCased.

You're still stuck doing foo.Foo() everywhere in your client code,
which is ugly and wastes space, or using "from foo import *" which is
broken.

but I
believe this is categorically the wrong thing to do in large projects.

Oh yes ? Why ?

For myriad reasons, just one of them being the one I stated -- smaller
files with one functional unit each are more amenable to source code
management with multiple developers.

We could discuss this till we're blue in the face but it's beside the
point. For any given project, architecture, and workflow, the
developers are going to have a preference for how to organize the code
structurally into files, directories, packages, etc. The language
itself should not place constraints on them. The mere fact that it is
supposedly "Pythonic" to put more functionality in one file indicates
to me that the Python package system is obstructing some of its users
who have perfectly good reasons to organize their code differently.

you're going to want files to contain the smallest
practical functional blocks. I feel pretty confident saying that "put
more stuff in one file" is the wrong answer, even if it is the
Pythonic answer.

Is this actually based on working experience ? It seems that there are
enough not-trivial Python projects around to prove that it works just
fine.

Yes. I've worked extensively on several projects in several languages
with multi-million lines of code and they invariably have coding
styles that recommend one functional unit (such as a class), or at
most a few closely related functional units per file.

In Python, most of the large projects I've looked at use "from foo
import *" liberally.

I guess my question boils down to this. Is "from foo import *" really
deprecated or not? If everyone has to use "from foo import *" despite
the problems it causes, how do they work around those problems (such
as reloading)?
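
To make the reloading problem concrete, here is a minimal sketch for a current Python 3 interpreter, where the reload() builtin discussed in this thread is spelled importlib.reload(). The module foo_demo is hypothetical and is written to a temporary directory. The sketch imports a single name rather than "*"; a star import copies names into the importing namespace in the same way, so it goes stale in the same way.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True  # keep cached bytecode out of this quick demo
root = tempfile.mkdtemp()
path = os.path.join(root, "foo_demo.py")

def write(text):
    with open(path, "w") as f:
        f.write(text)

write("def greet():\n    return 'old'\n")
sys.path.insert(0, root)
importlib.invalidate_caches()

import foo_demo
from foo_demo import greet      # copies the function object into this namespace

# Edit the source on disk, then reload the module object in place.
write("def greet():\n    return 'brand new'\n")
importlib.reload(foo_demo)

stale = greet()                 # still the object copied before the reload
fresh = foo_demo.greet()        # attribute lookup sees the reloaded code

from foo_demo import greet      # repeating the import rebinds the local name
rebound = greet()
print(stale, fresh, rebound)
```

One workaround follows directly from this: either go through the module attribute, or repeat the from-import after every reload.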

Martin

Mar 5 '07 #4

Martin Unsal

Jorge, thanks for your response. I replied earlier but I think my
response got lost. I'm trying again.

On Mar 4, 5:20 pm, Jorge Godoy <jgo...@gmail.com> wrote:

Why? RCS systems can merge changes. A RCS system is not a substitute for
design or programmers communication.

Text merges are an error-prone process. They can't be eliminated but
they are best avoided when possible.

When refactoring, it's much better to move small files around than to
move chunks of code between large files. In the former case your SCM
system can track integration history, which is a big win.

Unit tests help being sure that one change doesn't break the project as a
whole and for a big project you're surely going to have a lot of those tests.

But unit tests are never an excuse for error prone workflow. "Oh,
don't worry, we'll catch that with unit tests" is never something you
want to say or hear.

I don't reload... When my investigative tests gets bigger I write a script
and run it with the interpreter. It is easy since my text editor can call
Python on a buffer (I use Emacs).

That's interesting, is this workflow pretty universal in the Python
world?

I guess that seems unfortunate to me, one of the big wins for
interpreted languages is to make the development cycle as short and
interactive as possible. As I see it, the Python way should be to
reload a file and reinvoke the class directly, not to restart the
interpreter, load an entire package and then run a test script to set
up your test conditions again.

Martin

Mar 5 '07 #5

Chris Mellon

On 5 Mar 2007 08:32:34 -0800, Martin Unsal <ma*********@gmail.com> wrote:

Jorge, thanks for your response. I replied earlier but I think my
response got lost. I'm trying again.

On Mar 4, 5:20 pm, Jorge Godoy <jgo...@gmail.com> wrote:
Why? RCS systems can merge changes. A RCS system is not a substitute for
design or programmers communication.

Text merges are an error-prone process. They can't be eliminated but
they are best avoided when possible.

When refactoring, it's much better to move small files around than to
move chunks of code between large files. In the former case your SCM
system can track integration history, which is a big win.

Unit tests help being sure that one change doesn't break the project as a
whole and for a big project you're surely going to have a lot of those tests.

But unit tests are never an excuse for error prone workflow. "Oh,
don't worry, we'll catch that with unit tests" is never something you
want to say or hear.

That's actually the exact benefit of unit testing, but I don't feel
that you've actually made a case that this workflow is error prone.
You often have multiple developers working on the same parts of the
same module?

I don't reload... When my investigative tests gets bigger I write a script
and run it with the interpreter. It is easy since my text editor can call
Python on a buffer (I use Emacs).

That's interesting, is this workflow pretty universal in the Python
world?

I guess that seems unfortunate to me, one of the big wins for
interpreted languages is to make the development cycle as short and
interactive as possible. As I see it, the Python way should be to
reload a file and reinvoke the class directly, not to restart the
interpreter, load an entire package and then run a test script to set
up your test conditions again.

If you don't do this, you aren't really testing your changes, you're
testing your reload() machinery. You seem to have a lot of views about
what the "Python way" should be and those are at odds with the actual
way people work with Python. I'm not (necessarily) saying you're
wrong, but you seem to be coming at this from a confrontational
standpoint.

Your claim, for example, that the language shouldn't place constraints
on how you manage your modules is questionable. I think it's more
likely that you've developed a workflow based around the constraints
(and abilities) of other languages and you're now expecting Python to
conform to that instead of its own.

I've copied some of your responses from your earlier post below:

> Yes. I've worked extensively on several projects in several languages
> with multi-million lines of code and they invariably have coding
> styles that recommend one functional unit (such as a class), or at
> most a few closely related functional units per file.

I wonder if you've ever asked yourself why this is the case. I know
from my own experience why it's done in traditional C++/C environments
- it's because compiling is slow and breaking things into as many
files (with as few interdependencies) as possible speeds up the
compilation process. Absent this need (which doesn't exist in Python),
what benefit is there to separating out related functionality into
multiple files? Don't split them up just because you've done so in the
past - know why you did it in the past and if those conditions still
apply. Don't split them up until it makes sense for *this* project,
not the one you did last year or 10 years ago.

> I guess my question boils down to this. Is "from foo import *" really
> deprecated or not? If everyone has to use "from foo import *" despite
> the problems it causes, how do they work around those problems (such
> as reloading)?

from foo import * is a bad idea at a top level because it pollutes
your local namespace. In a package __init__, which exists expressly
for the purpose of exposing its interior namespaces as a single flat
one, it makes perfect sense. In some cases you don't want to export
everything, which is when __all__ starts to make sense. Clients of a
package (or a module) shouldn't use from foo import * without a good
reason.

Nobody I know uses reload() for anything more than trivial "as
you work" testing in the interpreter. It's not reliable or recommended
for anything other than that. It's not hard to restart a shell,
especially if you use ipython (which can save and re-create a session)
or a script that's set up to create your testing environment. This is
still a much faster way than compiling any but the most trivial of
C/C++ modules. In fact, on my system startup time for the interpreter
is roughly the same as the "startup time" of my compiler (that is to
say, the amount of time it takes deciding what it's going to compile,
without actually compiling anything).
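
A small sketch of the __init__ plus __all__ combination, for a current Python 3 interpreter. The package toolkit_demo, its scrollbar module and the helper names are hypothetical and are created in a temporary directory only so the example runs on its own.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg = os.path.join(root, "toolkit_demo")
os.makedirs(pkg)

def write(name, text):
    with open(os.path.join(pkg, name), "w") as f:
        f.write(text)

write("scrollbar.py",
      "__all__ = ['ScrollBar']   # the only name a star import may take\n"
      "\n"
      "def _clamp(x):\n"
      "    return max(0, min(x, 100))\n"
      "\n"
      "def helper():\n"
      "    return 'internal'\n"
      "\n"
      "class ScrollBar:\n"
      "    def __init__(self, pos):\n"
      "        self.pos = _clamp(pos)\n")

# The package __init__ flattens the namespace; __all__ decides what comes along.
write("__init__.py", "from .scrollbar import *\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

import toolkit_demo

print(hasattr(toolkit_demo, "ScrollBar"), hasattr(toolkit_demo, "helper"))
```

Here helper stays reachable as toolkit_demo.scrollbar.helper for anyone who really wants it, but it does not leak into the flat package namespace.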

> You're still stuck doing foo.Foo() everywhere in your client code,
> which is ugly and wastes space, or using "from foo import *" which is
> broken.

If you don't like working with explicit namespaces, you've probably
chosen the wrong language. If you have a specific name (or a few
names) which you use all the time from a module, then you can import
just those names into your local namespace to save on typing. You can
also alias deeply nested names to something more shallow.

> For myriad reasons, just one of them being the one I stated -- smaller
> files with one functional unit each are more amenable to source code
> management with multiple developers.

I propose that the technique most amenable to source code management
is for a single file (or RCS level module, if you have a locking RCS)
to have everything that it makes sense to edit or change for a
specific feature. This is an impossible goal in practice (because you
will inevitably and necessarily have intermodule dependencies) but
your developers don't write code based around individual files. They
base it around the systems and the interfaces that compose your
project. It makes no more sense to arbitrarily break them into
multiple files than it does to arbitrarily leave them all in a single
file.

In summary: I think you've bound yourself to a style of source
management that made sense in the past without reanalyzing it to see
if it makes sense now. Trust your judgment and that of your developers
when it comes to modularization. When they end up needing to merge all
the time because they're conflicting with someone else's work, they'll
break things up into modules.

You're also placing far too much emphasis on reload. Focus yourself on
unit tests and environment scripts instead. These are more reliable
and easier to validate than reload() in a shell.

Mar 5 '07 #6

Martin Unsal

On Mar 5, 9:15 am, "Chris Mellon" <arka...@gmail.com> wrote:

That's actually the exact benefit of unit testing, but I don't feel
that you've actually made a case that this workflow is error prone.
You often have multiple developers working on the same parts of the
same module?

Protecting your head is the exact benefit of bike helmets, that
doesn't mean you should bike more recklessly just because you're
wearing a helmet. :)

Doing text merges is more error prone than not doing them. :)

There are myriad other benefits of breaking up large files into
functional units. Integration history, refactoring, reuse, as I
mentioned. Better clarity of design. Easier communication and
coordination within a team. What's the down side? What's the advantage
of big files with many functional units?

If you don't do this, you aren't really testing your changes, you're
testing your reload() machinery.

Only because reload() is hard in Python! ;)

You seem to have a lot of views about
what the "Python way" should be and those are at odds with the actual
way people work with Python. I'm not (necessarily) saying you're
wrong, but you seem to be coming at this from a confrontational
standpoint.

When I refer to "Pythonic" all I'm talking about is what I've read
here and observed in other people's code. I'm here looking for more
information about how other people work, to see if there are good
solutions to the problems I see.

However when I talk about what I think is "wrong" with the Pythonic
way, obviously that's just my opinion formed by my own experience.

Your claim, for example, that the language shouldn't place constraints
on how you manage your modules is questionable. I think it's more
likely that you've developed a workflow based around the constraints
(and abilities) of other languages and you're now expecting Python to
conform to that instead of its own.

I don't think so; I'm observing things that are common to several
projects in several languages.

I wonder if you've ever asked yourself why this is the case. I know
from my own experience why it's done in traditional C++/C environments
- it's because compiling is slow and breaking things into as many
files (with as few interdependencies) as possible speeds up the
compilation process.

I don't think that's actually true. Fewer, bigger compilation units
actually compile faster in C, at least in my experience.

Absent this need (which doesn't exist in Python),

Python still takes time to load & "precompile". That time is becoming
significant for me even in a modest sized project; I imagine it would
be pretty awful in a multimillion line project.

No matter how fast it is, I'd rather reload one module than exit my
interpreter and reload the entire world.

This is not a problem for Python as a scripting language. This is a real
problem for Python as a world class application development language.

In a package __init__, which exists expressly
for the purpose of exposing its interior namespaces as a single flat
one, it makes perfect sense.

OK! That's good info, thanks.

Nobody I know uses reload() for anything more than trivial "as
you work" testing in the interpreter. It's not reliable or recommended
for anything other than that.

That too... although I think that's unfortunate. If reload() were
reliable, would you use it? Do you think it's inherently unreliable,
that is, it couldn't be fixed without fundamentally breaking the
Python language core?

This is
still a much faster way than compiling any but the most trivial of
C/C++ modules.

I'm with you there! I love Python and I'd never go back to C/C++. That
doesn't change my opinion that Python's import mechanism is an
impediment to developing large projects in the language.

If you don't like working with explicit namespaces, you've probably
chosen the wrong language.

I never said that. I like foo.Bar(), I just don't like typing
foo.Foo() and bar.Bar(), which is a waste of space; syntax without
semantics.

I propose that the technique most amenable to source code management
is for a single file (or RCS level module, if you have a locking RCS)
to have everything that it makes sense to edit or change for a
specific feature.

Oh, I agree completely. I think we're using the exact same criterion.
A class is a self-contained feature with a well defined interface,
just what you'd want to put in its own file. (Obviously there are
trivial classes which don't implement features, and they don't need
their own files.)

You're also placing far too much emphasis on reload. Focus yourself on
unit tests and environment scripts instead. These are more reliable
and easier to validate than reload() in a shell.

I think this is the crux of my frustration. I think reload() is
unreliable and hard to validate because Python's package management is
broken. I appreciate your suggestion of alternatives and I think I
need to come to terms with the fact that reload() is just broken. That
doesn't mean it has to be that way or that Python is blameless in this
problem.

Martin

Mar 5 '07 #7

Chris Mellon

On 5 Mar 2007 10:31:33 -0800, Martin Unsal <ma*********@gmail.com> wrote:

On Mar 5, 9:15 am, "Chris Mellon" <arka...@gmail.com> wrote:
That's actually the exact benefit of unit testing, but I don't feel
that you've actually made a case that this workflow is error prone.
You often have multiple developers working on the same parts of the
same module?

Protecting your head is the exact benefit of bike helmets, that
doesn't mean you should bike more recklessly just because you're
wearing a helmet. :)

Doing text merges is more error prone than not doing them. :)

There are myriad other benefits of breaking up large files into
functional units. Integration history, refactoring, reuse, as I
mentioned. Better clarity of design. Easier communication and
coordination within a team. What's the down side? What's the advantage
of big files with many functional units?

I never advocated big files with many functional units - just files
that are "just big enough". You'll know you've broken them down small
enough when you stop having to do text merges every time you commit.

If you don't do this, you aren't really testing your changes, you're
testing your reload() machinery.

Only because reload() is hard in Python! ;)

You seem to have a lot of views about
what the "Python way" should be and those are at odds with the actual
way people work with Python. I'm not (necessarily) saying you're
wrong, but you seem to be coming at this from a confrontational
standpoint.

When I refer to "Pythonic" all I'm talking about is what I've read
here and observed in other people's code. I'm here looking for more
information about how other people work, to see if there are good
solutions to the problems I see.

However when I talk about what I think is "wrong" with the Pythonic
way, obviously that's just my opinion formed by my own experience.

Your claim, for example, that the language shouldn't place constraints
on how you manage your modules is questionable. I think it's more
likely that you've developed a workflow based around the constraints
(and abilities) of other languages and you're now expecting Python to
conform to that instead of its own.

I don't think so; I'm observing things that are common to several
projects in several languages.

..... languages with similar runtime semantics and perhaps common
ancestry? All languages place limitations on how you handle modules,
either because they have infrastructure you need to use or because
they lack it and you're left on your own.

I wonder if you've ever asked yourself why this is the case. I know
from my own experience why it's done in traditional C++/C environments
- it's because compiling is slow and breaking things into as many
files (with as few interdependencies) as possible speeds up the
compilation process.

I don't think that's actually true. Fewer, bigger compilation units
actually compile faster in C, at least in my experience.

If you're doing whole project compilation. When you're working,
though, you want to be able to do incremental compilation (all modern
compilers I know of support this) so you just recompile the files
you've changed (and dependencies) and relink. Support for this is why
we have stuff like precompiled headers, shadow headers like Qt uses,
and why C++ project management advocates single class-per-file
structures. Fewer dependencies between compilation units means a
faster rebuild-test turnaround.

Absent this need (which doesn't exist in Python),

Python still takes time to load & "precompile". That time is becoming
significant for me even in a modest sized project; I imagine it would
be pretty awful in a multimillion line project.

No matter how fast it is, I'd rather reload one module than exit my
interpreter and reload the entire world.

Sure, but what's your goal here? If you're just testing something as
you work, then this works fine. If you're testing large changes, that
affect many modules, then you *need* to reload your world, because you
want to make sure that what you're testing is clean. I think this
might be related to your desire to have everything in lots of little
files. The more modules you load, the harder it is to track your
dependencies and make sure that the reload is correct.

This is not a problem for Python as a scripting language. This is a real
problem for Python as a world class application development language.

Considering that no other "world class application development
language" supports reload even as well as Python does, I'm not sure I
can agree here. A perfect reload might be a nice thing to have, but
lack of it hardly tosses Python (or any language) out of the running.

In a package __init__, which exists expressly
for the purpose of exposing its interior namespaces as a single flat
one, it makes perfect sense.

OK! That's good info, thanks.

Nobody I know uses reload() for anything more than trivial "as
you work" testing in the interpreter. It's not reliable or recommended
for anything other than that.

That too... although I think that's unfortunate. If reload() were
reliable, would you use it? Do you think it's inherently unreliable,
that is, it couldn't be fixed without fundamentally breaking the
Python language core?

The semantics of exactly what reload should do are tricky. Python's
reload works in a sensible but limited way. More complicated reloads
are generally considered more trouble than they are worth. I've wanted
different things from reload() at different times, so I'm not even
sure what I would consider it being "reliable".

Here's a trivial example - if you rename a class in a module and then
reload it, what should happen to instances of the class you renamed?
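
The question can be tried out directly. The sketch below targets a current Python 3 interpreter, where reload() is importlib.reload(); the module shapes_reload_demo and the classes Box and Crate are hypothetical and live in a temporary directory.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True  # keep cached bytecode out of this quick demo
root = tempfile.mkdtemp()
path = os.path.join(root, "shapes_reload_demo.py")

def write(text):
    with open(path, "w") as f:
        f.write(text)

write("class Box:\n    def area(self):\n        return 1\n")
sys.path.insert(0, root)
importlib.invalidate_caches()

import shapes_reload_demo as mod

old_box = mod.Box()   # an instance created before the edit
OldBox = mod.Box      # keep a handle on the original class object

# Rename the class in the source file, then reload the module.
write("class Crate:\n    def area(self):\n        return 2\n")
importlib.reload(mod)

print(isinstance(old_box, mod.Crate))  # False: the instance keeps its old class
print(mod.Box is OldBox)               # True: reload leaves the old name in place
print(old_box.area(), mod.Crate().area())
```

So existing instances keep running the old code, and the old class name lingers in the module until the interpreter is restarted, which is one reason a clean restart is easier to reason about.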

This is
still a much faster way than compiling any but the most trivial of
C/C++ modules.

I'm with you there! I love Python and I'd never go back to C/C++. That
doesn't change my opinion that Python's import mechanism is an
impediment to developing large projects in the language.

If you don't like working with explicit namespaces, you've probably
chosen the wrong language.

I never said that. I like foo.Bar(), I just don't like typing
foo.Foo() and bar.Bar(), which is a waste of space; syntax without
semantics.

There's nothing that prevents there being a bar.Foo, the namespace
makes it clear where you're getting the object. This is again a
consequence of treating modules like classes. Some modules only expose
a single class (StringIO/cStringIO in the standard library is a good
example), but it's more common for them to expose a single set of
"functionality".

That said, nothing prevents you from using "from foo import Foo" if
Foo is all you need (or need most - you can combine this with import
foo).
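
As a quick sketch of combining the two forms, using a current Python 3 interpreter, where the StringIO class mentioned above is available as io.StringIO:

```python
import io                  # keep the namespace for occasional lookups
from io import StringIO    # pull in the one name that is used all the time

buf = StringIO()
buf.write("hello")

# Both spellings refer to the same class object.
print(StringIO is io.StringIO, buf.getvalue())
```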

I propose that the technique most amenable to source code management
is for a single file (or RCS level module, if you have a locking RCS)
to have everything that it makes sense to edit or change for a
specific feature.

Oh, I agree completely. I think we're using the exact same criterion.
A class is a self-contained feature with a well defined interface,
just what you'd want to put in its own file. (Obviously there are
trivial classes which don't implement features, and they don't need
their own files.)

Sure, if all your classes are that. But very few classes exist in
isolation - there are external and internal dependencies, and some
classes are tightly bound. There's no reason for these tightly bound
classes to be in external files (or an external namespace), because
when you work on one you'll need to work on them all.

You're also placing far too much emphasis on reload. Focus yourself on
unit tests and environment scripts instead. These are more reliable
and easier to validate than reload() in a shell.

I think this is the crux of my frustration. I think reload() is
unreliable and hard to validate because Python's package management is
broken. I appreciate your suggestion of alternatives and I think I
need to come to terms with the fact that reload() is just broken. That
doesn't mean it has to be that way or that Python is blameless in this
problem.

I wonder what environments you worked in before that actually had a
reliable and gotcha-free version of reload? I actually don't know of
any - Smalltalk is closest. It's not really "broken" when you
understand what it does. There's just an expectation that it does
something else, and when it doesn't meet that expectation it's assumed
to be broken. Now, that's a fair definition of "broken", but replacing
running instances in a live image is a very hard problem to solve
generally. Limiting reload() to straightforward, reliable behavior is
a reasonable design decision.
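To make the behavior concrete, here is a small sketch of what a reload does and does not touch. It assumes Python 3, where the builtin reload() discussed in this thread is spelled importlib.reload(), and it uses a hypothetical throwaway module named reload_demo written to a temporary directory.

```python
import importlib
import os
import sys
import tempfile

# Skip .pyc caching so the rewritten source is always recompiled on reload.
sys.dont_write_bytecode = True

# Hypothetical throwaway module, created only for this demonstration.
tmpdir = tempfile.mkdtemp()
sys.path.insert(0, tmpdir)
module_path = os.path.join(tmpdir, "reload_demo.py")


def write_source(text):
    """Overwrite the demo module's source file."""
    with open(module_path, "w") as f:
        f.write(text)
    importlib.invalidate_caches()


write_source("class Foo:\n    def version(self):\n        return 'old'\n")
reload_demo = importlib.import_module("reload_demo")

Foo = reload_demo.Foo      # the binding that "from reload_demo import Foo" creates
old_instance = Foo()

write_source("class Foo:\n    def version(self):\n        return 'reloaded'\n")
importlib.reload(reload_demo)
new_instance = reload_demo.Foo()

print(new_instance.version())                      # reloaded
print(old_instance.version())                      # old
print(Foo is reload_demo.Foo)                      # False
print(isinstance(old_instance, reload_demo.Foo))   # False
```

The reload rebinds names inside the module object only. Existing instances and names copied out with "from ... import" keep pointing at the old class, which is why qualified access survives a reload and star imports do not.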

Mar 5 '07 #8

Bruno Desthuilliers

Martin Unsal wrote:

On Mar 5, 12:45 am, "bruno.desthuilli...@gmail.com"
<bruno.desthuilli...@gmail.com> wrote:

>>Remember that you can put code in
the __init__.py of a package, and that this code can import
sub-packages/modules namespaces, making the package's internal organisation
transparent to user code

Sure, but that doesn't solve the problem.

Say you have a package "widgets" with classes ScrollBar, Form, etc.
You want the end user to "import widgets" and then invoke
"widgets.ScrollBar()". As far as I know there are only two ways to do
this, both seriously flawed: 1) Put all your code in one module
widgets.py, 2) use "from scrollbar import *" in widgets/__init__.py,
which is semi-deprecated

"Deprecated"? I haven't seen any mention of this so far. But it's bad form,
since it makes it hard to know where a given symbol comes from.

# widgets/__init__.py
from scrollbar import Scrollbar, SomeOtherStuff, some_function, SOME_CONST
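A runnable sketch of that idea, under some assumptions: it uses Python 3's explicit relative imports ("from .scrollbar import ..."), and the package name widgets_demo and its two classes are hypothetical stand-ins built in a temporary directory so that the example is self-contained.

```python
import importlib
import os
import sys
import tempfile

# Build a hypothetical package on disk: widgets_demo/{__init__,scrollbar,form}.py
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "widgets_demo")
os.mkdir(pkg_dir)

files = {
    "scrollbar.py": "class ScrollBar:\n    pass\n",
    "form.py": "class Form:\n    pass\n",
    # The package __init__ names exactly what it re-exports, no star import.
    "__init__.py": (
        "from .scrollbar import ScrollBar\n"
        "from .form import Form\n"
        "__all__ = ['ScrollBar', 'Form']\n"
    ),
}
for name, source in files.items():
    with open(os.path.join(pkg_dir, name), "w") as f:
        f.write(source)

sys.path.insert(0, root)
importlib.invalidate_caches()
widgets_demo = importlib.import_module("widgets_demo")

# User code sees a flat namespace...
bar = widgets_demo.ScrollBar()
form = widgets_demo.Form()

# ...while each class still lives in its own file.
print(type(bar).__module__)
print(type(form).__module__)
```

Each class keeps its own file for version control, and the explicit import list documents where every public name comes from.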

and breaks reload().

>
>>Also remember that you can "import as", ie:

import some_package.some_subpackage.some_module as some_module

Sure but that doesn't eliminate the unfortunate interaction between
Python class organization and filesystem hierarchy.

*class* organization? This isn't Java. Nothing forces you to use
classes.

For example, say
you want to organize the widgets package as follows:

widgets/scrollbar/*.py
widgets/form/*.py
widgets/common/util.py

Other than messing around with PYTHONPATH, which is horrible, I don't
see how to import util.py from the widget code.

Some of us still manage to do so without messing with PYTHONPATH.
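One way this can work, sketched under assumptions: every directory is a package with an __init__.py, and the code imports through the top-level package name. The tree below mirrors the layout in the question, but the names widgets_tree, bar.py, clamp and position are hypothetical, and the sys.path line exists only because the demo tree lives in a temporary directory.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()

# Hypothetical layout: widgets_tree/common/util.py and widgets_tree/scrollbar/bar.py
sources = {
    "widgets_tree/__init__.py": "",
    "widgets_tree/common/__init__.py": "",
    "widgets_tree/common/util.py": (
        "def clamp(value, low, high):\n"
        "    return max(low, min(value, high))\n"
    ),
    "widgets_tree/scrollbar/__init__.py": "",
    "widgets_tree/scrollbar/bar.py": (
        # Absolute import through the package root; in Python 3 the
        # relative form 'from ..common import util' is equivalent here.
        "from widgets_tree.common import util\n"
        "\n"
        "def position(raw):\n"
        "    return util.clamp(raw, 0, 100)\n"
    ),
}
for rel_path, source in sources.items():
    path = os.path.join(root, *rel_path.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(source)

# Only needed because the demo tree sits in a temp directory.
sys.path.insert(0, root)
importlib.invalidate_caches()

bar = importlib.import_module("widgets_tree.scrollbar.bar")
result = bar.position(250)
print(result)
```

In a real project the directory that contains the top-level package is normally already importable, for instance because the program is started from there or the package is installed, so no PYTHONPATH changes are required.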

>
>>Bad form IMHO. Packages and module names should be all_lower,
classnames CamelCased.

You're still stuck doing foo.Foo() everywhere in your client code,

from foo import Foo

But:

which is ugly

It's not ugly, it's informative. At least you know where Foo comes from.

and wastes space,

My. Three letters and a dot...

or using "from foo import *" which is
broken.

cf above.

>

>>>but I
believe this is categorically the wrong thing to do in large projects.

Oh yes? Why?

For myriad reasons, just one of them being the one I stated -- smaller
files with one functional unit each

Oh. So you're proposing that each and every function goes in a
separate file?

are more amenable to source code
management with multiple developers.

This is not my experience.

We could discuss this till we're blue in the face but it's beside the
point. For any given project, architecture, and workflow, the
developers are going to have a preference for how to organize the code
structurally into files, directories, packages, etc. The language
itself should not place constraints on them. The mere fact that it is
supposedly "Pythonic" to put more functionality in one file indicates
to me that the Python package system is obstructing some of its users
who have perfectly good reasons to organize their code differently.

It has never been an issue for me so far.

>

>>>you're going to want files to contain the smallest
practical functional blocks. I feel pretty confident saying that "put
more stuff in one file" is the wrong answer, even if it is the
Pythonic answer.

Is this actually based on working experience? It seems that there are
enough nontrivial Python projects around to prove that it works just
fine.

Yes. I've worked extensively on several projects in several languages
with multi-million lines of code

I meant, based on working experience *with Python*? I still haven't seen
a "multi-million" LOC project in Python - unless of course you include
all the stdlib and the interpreter itself, and even then I doubt we'd get
that far.

and they invariably have coding
styles that recommend one functional unit (such as a class), or at
most a few closely related functional units per file.

Which is what I see in most Python packages I've looked at so far. But we may
not have the same definition of "a few" and "closely related"?

In Python, most of the large projects I've looked at use "from foo
import *" liberally.

I've seen few projects using this. And I wouldn't like having to
maintain such a project.

I guess my question boils down to this. Is "from foo import *" really
deprecated or not?

This syntax is only supposed to be a handy shortcut for quick testing
and exploration in an interactive session. Using it in production code
is considered bad form.

If everyone has to use "from foo import *"

I never have in 7 years.

despite
the problems it causes, how do they work around those problems (such
as reloading)?

Do you often have a need for "reloading" in production code?

Martin, I'm not saying Python is perfect, but it really feels like
you're worrying about things that are not problems.

Mar 5 '07 #9

Bruno Desthuilliers

Martin Unsal wrote:
(snip)

When refactoring, it's much better to move small files around than to
move chunks of code between large files.

Indeed. But having hundreds or thousands of files each with at most a
dozen lines of effective code is certainly not ideal. Remember that
Python lets you say much more in a few lines than some mainstream
languages I won't name here.

>
>>I don't reload... When my investigative tests get bigger I write a script
and run it with the interpreter. It is easy since my text editor can call
Python on a buffer (I use Emacs).

That's interesting, is this workflow pretty universal in the Python
world?

I don't know, but that's mostly how I work too.

I guess that seems unfortunate to me,

So I guess you don't understand what Jorge is talking about.

one of the big wins for
interpreted languages is to make the development cycle as short and
interactive as possible.

It is pretty short and interactive. Emacs Python mode lets you fire up a
subinterpreter and eval either your whole buffer or a class or def block
or even a single expression - and play with the result in the
subinterpreter.

As I see it, the Python way should be to
reload a file and reinvoke the class directly, not to restart the
interpreter, load an entire package and then run a test script to set
up your test conditions again.

^Cc^C! to start a new interpreter
^Cc^Cc to eval the whole module

Since the module takes care of "loading the entire package", you don't
have to worry about this. And since, once the script is eval'd, you still
have your (interactive) interpreter open, with all state set, you can
then explore at will. Try it yourself. It's far faster and easier
than trying to manually keep track of the interpreter state.

Mar 5 '07 #10
