
Is there a "Large Scale Python Software Design" ?


I did it.

I proposed Python as the main language for our next CAD/CAM
software because I think it has all the potential needed
for the job. I'm not sure yet whether the decision will go
through, but if it does, I'll need some experience-based set
of rules about how to use Python in this context.

For example... is defining read-only attributes in classes
worth the hassle? Does duck typing scale well in complex
software, or should I go for a classic inheritance hierarchy?
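
(To make the read-only question concrete: the usual low-hassle answer
in Python is a `property` with no setter. A minimal sketch, with an
invented `Point` class, not from any real codebase:)

```python
class Point:
    """A 2D point whose coordinates are exposed read-only."""

    def __init__(self, x, y):
        # Leading underscore marks the attributes as private by convention.
        self._x = x
        self._y = y

    @property
    def x(self):
        return self._x

    @property
    def y(self):
        return self._y


p = Point(2, 3)
print(p.x)      # 2 -- reads work normally
try:
    p.x = 10    # a property with no setter rejects writes
except AttributeError:
    print("x is read-only")
```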

In other words... is there something like the classic "Large
Scale C++ Software Design" (Lakos) for Python? I'm not
looking for a bible, but lessons learned from someone who has
already gone down this path would be quite interesting.

Any suggestions/pointers are welcome.

Andrea
Jul 18 '05 #1
36 Replies


Andrea Griffini wrote:
> I did it.
>
> I proposed python as the main language for our next CAD/CAM
> software because I think that it has all the potential needed
> for it. I'm not sure yet if the decision will get through, but
> something I'll need in this case is some experience-based set
> of rules about how to use python in this context.
>
> For example... is defining readonly attributes in classes
> worth the hassle ? Does duck-typing scale well in complex
> software or should I go for a classic inheritance hierarchy ?
>
> In other words... is there something like the classic "Large
> Scale C++ Software Design" (Lakos) for python ? I'm not
> looking for a bible, but lessons learned from someone that
> already went down this path could be quite interesting.


Wouldn't it have been better to ask these questions BEFORE proposing
python as (presumably) a Great Solution? IMO, as great as python is,
it isn't appropriate for projects that are large and include many
developers.

The benefits of static typing, not least among which is the vastly
superior ease of creating tools that "understand" the language,
outweigh python's advantages in an environment where many people are
writing a lot of code. This can be mitigated by reducing the
connectedness of your code, e.g. with a plugin architecture, but that
isn't always an option either...

Good luck.

-Jonathan

Jul 18 '05 #2

On 18 Oct 2004 16:49:29 -0700, Jonathan Ellis <jb*****@gmail.com> wrote:
> The benefits of static typing, not least among which is the vastly
> superior ease of creating tools that "understand" the language,
> outweigh python's advantages in an environment when many people are
> writing a lot of code. This can be mitigated by reducing the
> connectedness of your code, e.g. with a plugin architecture, but that
> isn't always an option either...

On principle, I disagree with this statement. Doing large-scale
development in Python certainly isn't the same thing as doing it
in another language - C, C++ or Java, for instance. It will require
a different approach to the problem, and perhaps a particular set of
tools and disciplines to help with the process. But I don't think that
static typing in itself represents such a great advantage as to make
Python badly suited to the problem, because there are many aspects to
it, and Python has its own advantages too. Give it a solid design,
leveraging Python's particular strengths, and the end result has the
potential to be a positive surprise. But again, that's just my opinion,
and I'm not the best person around to make a definitive claim on it
:-)
--
Carlos Ribeiro
Consultoria em Projetos
blog: http://rascunhosrotos.blogspot.com
blog: http://pythonnotes.blogspot.com
mail: ca********@gmail.com
mail: ca********@yahoo.com
Jul 18 '05 #3

Jonathan Ellis wrote:
> Andrea Griffini wrote:
>> I proposed python as the main language for our next CAD/CAM
>> software because I think that it has all the potential needed
>> for it.
>
> Wouldn't it have been better to ask these questions BEFORE proposing
> python as (presumably) a Great Solution? IMO, as great as python is,
> it isn't appropriate for projects that are large and include many
> developers.


I don't know what Jonathan's experience with using Python in
large teams and projects is, but mine includes four years
as Director of Software Engineering at a wireless tech company,
with a team that ran between ten and fifteen people and a very
large amount of code. We found Python to be *very* appropriate
for this, and of course for anything smaller.
> The benefits of static typing, not least among which is the vastly
> superior ease of creating tools that "understand" the language,
> outweigh python's advantages in an environment when many people are
> writing a lot of code.


While it appears true that it is easier to develop certain
tools for statically typed languages, it's not at all apparent
that this small benefit outweighs the very significant advantages
that Python brings to large-scale development, and to large-team
development. I'll add "especially when using test-driven
development and any agile process", and to be perfectly honest
I'm not sure I would recommend Python nearly as strongly if one
was forced to use a traditional, non-agile approach to the work.

My past posts on the subject have covered this a number of times.
I have to admit I haven't seen anything from Jonathan on this
topic, so I can't say how his experience compares with mine, nor
why he would feel the way he does.

-Peter
Jul 18 '05 #4

Andrea Griffini wrote:
> I proposed python as the main language for our next CAD/CAM
> software because I think that it has all the potential needed
> for it. I'm not sure yet if the decision will get through, but
> something I'll need in this case is some experience-based set
> of rules about how to use python in this context.
I know of two startups that have decided to build similar software in
Python, because they can build entire packages in a year
with a small but experienced development team. At least one
of them is funded in the tens-of-millions-of-dollars range by a
half-dozen automotive and aerospace companies.
Jonathan Ellis wrote:
> Wouldn't it have been better to ask these questions BEFORE proposing
> python as (presumably) a Great Solution? IMO, as great as python is,
> it isn't appropriate for projects that are large and include many
> developers.


Having recently released a piece of software with 10k lines of Python
running in its backend as a core technology, and being paid for it, I
will say that Python was and is the best tool for the job. A C version
would have been at least 4-10 times as many lines, and we wouldn't be
releasing ~3 months after starting with nearly the confidence we have now.
In terms of developers, some projects require more than one developer,
and in that sense Python works as well as other languages: planning is
key.

- Josiah

Jul 18 '05 #5

Jonathan Ellis wrote:
> Andrea Griffini wrote:
>> I did it.
>>
>> I proposed python as the main language for our next CAD/CAM
>> software because I think that it has all the potential needed
>> for it. I'm not sure yet if the decision will get through, but
>> something I'll need in this case is some experience-based set
>> of rules about how to use python in this context.
>>
>> For example... is defining readonly attributes in classes
>> worth the hassle ? Does duck-typing scale well in complex
>> software or should I go for a classic inheritance hierarchy ?
>>
>> In other words... is there something like the classic "Large
>> Scale C++ Software Design" (Lakos) for python ? I'm not
>> looking for a bible, but lessons learned from someone that
>> already went down this path could be quite interesting.
>
> Wouldn't it have been better to ask these questions BEFORE proposing
> python as (presumably) a Great Solution? IMO, as great as python is,
> it isn't appropriate for projects that are large and include many
> developers.


The OP would be well-advised to search the Google archives of c.l.py as many
(myself included) take the contrarian view - as the project grows in size it is
harder to justify going with "classic" languages like C++, or even Java - the
associated costs at each stage of the project are relatively larger to begin
with, and grow more quickly as well.
> The benefits of static typing, not least among which is the vastly
> superior ease of creating tools that "understand" the language,
> outweigh python's advantages in an environment when many people are
> writing a lot of code.


I'm not so sure - how much of the benefit those "smart" tools provide goes to
helping the developer manage complexity caused by the language itself? It seems
that often (not always, of course) a lot of what they do is help the programmer
manage oodles of little details that the programmer ought not be burdened with
in the first place, _especially_ on large projects.

What specifically do you see breaking down if Python is used in a project with
lots of people? From working on large projects with lots of people, I've noticed
that projects naturally get divided into components as different teams work on
them, regardless of the language (so for any given piece of code, the percentage
of the total programmers touching that piece of code drops, not rises, as the
total size of the development staff goes up). Again, regardless of language,
large projects & teams almost force well-defined interface points between
various components - I don't see how Python would be any hindrance at all.

On the plus side, projects implemented in higher level languages grow more
slowly (and thus become unmanageable more slowly) than would projects
implemented in lower-level languages. The list goes on and on - I've found
Python components generally easier to test than, say, C++ components. It's also
easier for more people to comprehend more of the code (and, in turn, more of the
implications of decisions), etc., etc.

-Dave
Jul 18 '05 #6

Andrea Griffini wrote:
> I proposed python as the main language for our next CAD/CAM
> software because I think that it has all the potential needed
> for it.
I agree, even without knowing the intended scope. ;)
Speaking of scope, if you are allowed to divulge it, that
would be interesting to know. Will it be 2D or 3D (3D I
would assume), and what kind of geometry engine?
Probably one of the open-source ones that already have
a Python API, no?

If it is 3D, a very desirable feature would be STEP
(ISO 10303) geometry import/export, so that you will be able
to exchange CAD data with virtually any commercial CAD
tool, and some open source ones (such as OpenCascade).
That will greatly increase its chance of adoption by
experienced CAD users, who typically have existing
libraries of CAD designs created using a COTS CAD tool.
(This is even more useful if you are planning to support
assemblies of components -- which might even be the most
logical initial feature for a new Python-based CAD/CAM,
since assemblies could be manipulated even without having
native geometric-form-creation capabilities: all you
would need is rendering, orientation, and interfacing
of existing solids -- a.k.a., "parts".)

If you have access to a license for ABAQUS, I recently
discovered that they have implemented a Python API for their
FEA engine, and have implemented STEP geometry as well.
See: http://www.abaqus.com/PAPortal
> ... I'm not sure yet if the decision will get through, but
> something I'll need in this case is some experience-based set
> of rules about how to use python in this context.
>
> For example... is defining readonly attributes in classes
> worth the hassle ? Does duck-typing scale well in complex
> software or should I go for a classic inheritance hierarchy ?


For something as complex as CAD/CAM, you will probably want to
make maximum use of interfaces and adapters, with minimal and
very judicious application of classic inheritance hierarchies.
I am *not* an expert on interfaces and adapters, but several
of the gurus on this list are.
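
(To make the interfaces-and-adapters idea concrete, here is a hedged
sketch of a tiny adapter registry; the `Circle`/`Polyline` types and
the registry itself are invented for illustration, and real frameworks
such as PyProtocols or zope.interface are far more complete:)

```python
import math

class Circle:
    def __init__(self, radius):
        self.radius = radius

class Polyline:
    """The interface the (imaginary) renderer actually consumes."""
    def __init__(self, points):
        self.points = points

# Registry mapping (source type, target type) -> adapter callable.
_adapters = {}

def register_adapter(src, dst, func):
    _adapters[(src, dst)] = func

def adapt(obj, dst):
    """Return obj itself if it already provides dst, else look up an adapter."""
    if isinstance(obj, dst):
        return obj
    try:
        return _adapters[(type(obj), dst)](obj)
    except KeyError:
        raise TypeError(f"cannot adapt {type(obj).__name__} to {dst.__name__}")

# Approximate a circle by a 16-segment polyline.
register_adapter(Circle, Polyline, lambda c: Polyline(
    [(c.radius * math.cos(2 * math.pi * i / 16),
      c.radius * math.sin(2 * math.pi * i / 16)) for i in range(16)]))

outline = adapt(Circle(1.0), Polyline)
print(len(outline.points))  # 16
```

The renderer only ever asks for a Polyline; new geometric types plug in
by registering an adapter, not by joining an inheritance hierarchy.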

Since you will probably want to do lots of prototyping, you
can probably delay decisions about matters such as read-only
attributes until your API has stabilized somewhat.

Keep us posted on your progress.

Cheers,
Steve

Jul 18 '05 #7

Josiah Carlson wrote:
> Jonathan Ellis wrote:
>> Wouldn't it have been better to ask these questions BEFORE proposing
>> python as (presumably) a Great Solution? IMO, as great as python is,
>> it isn't appropriate for projects that are large and include many
>> developers.
>
> Having recently released a piece of software with 10k lines of Python
> running in its backend as a core technology, and being paid for it, I
> will say that Python was and is the best tool for the job. A C version
> would have been at least 4-10 times as many lines, and we wouldn't be
> releasing ~3 months after starting with nearly the confidence we are now.

Heh. "Large" depends on a lot of things, particularly connectedness,
but I really can't picture 10k being large under any circumstances.
-Jonathan

Jul 18 '05 #8

Peter L Hansen wrote:
> Jonathan Ellis wrote:
>> The benefits of static typing, not least among which is the vastly
>> superior ease of creating tools that "understand" the language,
>> outweigh python's advantages in an environment when many people are
>> writing a lot of code.
>
> While it appears true that it is easier to develop certain
> tools for statically typed languages, it's not at all apparent
> that this small benefit outweighs the very significant advantages
> that Python brings to large-scale development, and to large-team
> development.


Almost four years ago I started working at a company with about 500
kloc of Java code. Thanks largely to tool support I was able to get in
and start fixing bugs my first day (this is without significant prior
Java experience). A more-experienced co-worker pointed me in the right
direction, and the IDE did the rest. ("Find definition," "Find
references.") Grep can do much the same thing, but painfully slowly --
and inaccurately, when you have a bunch of interfaces implementing the
same method names. Even after years in the codebase, I still used
these heavily; the codebase grew to about 800 kloc during the 3 years I
worked there. Developers came and went; even if my memory were good
enough to remember all the code _I_ ever wrote, I'd still have to
periodically repeat the familiarization process with code written by
others.
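
(As a toy illustration of what "Find definition" has to do -- not a
real tool -- Python's own stdlib `ast` module can locate candidate
definitions; the point is that in a dynamically typed codebase every
same-named method remains a candidate, which is exactly why grep, and
even a naive static scan, stays ambiguous:)

```python
import ast

def find_definitions(source, name):
    """Return line numbers of every function/class definition named `name`."""
    tree = ast.parse(source)
    return sorted(
        node.lineno for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        and node.name == name)

code = """\
def spam():
    pass

class Eggs:
    def spam(self):
        pass
"""
# Both definitions are found; without static types, a call site like
# obj.spam() cannot be resolved to just one of them.
print(find_definitions(code, "spam"))  # [1, 5]
```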

I haven't jumped into a project of similar size with python, but the
tool support for this approach to working with a large codebase just
isn't there, and I haven't seen any convincing arguments that
alternative methodologies are enough better to make up for this.
> I'll add "especially when using test-driven
> development and any agile process", and to be perfectly honest
> I'm not sure I would recommend Python nearly as strongly if one
> was forced to use a traditional, non-agile approach to the work.


Testing is good; preventing entire classes of errors from ever
happening at all is better, particularly when you get large. Avoiding
connectedness helps, but that's not always possible.

-Jonathan

Jul 18 '05 #9

Jonathan Ellis wrote:
> Heh. "Large" depends on a lot of things, particularly connectedness,
> but I really can't picture 10k being large under any circumstances.


It's large to me. Most sys-admin scripts/programs never cross 1K or 2K
lines. And Python works well for sys-admin tasks.
Jul 18 '05 #10

Dave Brueck <da**@pythonapocrypha.com> wrote:
> The OP would be well-advised to search the Google archives of c.l.py as
> many (myself included) take the contrarian view - as the project grows in
> size it is harder to justify going with "classic" languages like C++, or
> even Java - the associated costs at each stage of the project are
> relatively larger to begin with, and grow more quickly as well.


I entirely agree with you, Dave. Moreover, I do have a mass of growing
but as-yet-unorganized notes, based mostly on experiences on large
projects I have consulted for or even been very intimately connected
with, showing why Python is superior to various plausible alternatives
(for various and different reasons in each case) for large-scale
software development, and what principles, practices and patterns best
enable teams in various conditions to actualize those advantages.

That is the book I want to write, the one I have always wanted to write;
the Nutshell and the Cookbook (and now their second editions) keep
delaying that plan, but, in a sense, that's good, because I keep
accumulating useful experiences to enrich those notes, and Python keeps
growing (particularly but not exclusively in terms of third-party
extensions and tools) in ways that refine and sometimes indeed redefine
some key aspects. To give a simple technical example: I used to have
substantial caveats in those notes cautioning readers to use multiple
inheritance in an extremely sparing, cautious way, due to traps and
pitfalls that made it fragile. Nowadays, with the advent of 2.3, most
of those traps and pitfalls have gone away (in the new-style object
model), to the point that the whole issue can be reconsidered.

Anybody who has written serious technical books can gauge the amount of
work it takes to turn "a mass of yet-unorganized notes" into a real
book: it's _staggering_. I can't seriously undertake the task of making
my copious notes into a book until I can consider devoting at least half
of my time to it for a year -- this means no other books in the making,
_and_ a reduction in the amount of consulting, teaching, mentoring, etc,
that I do. The biggest general issue is that a book cannot be
_interactive_, _customized_ to the specific skills and interests of a
reader, in the way in which I can customize interactively the kind of
hands-on teaching, mentoring and consulting which I do for a specific
customer.

For a given customer, I can and do find out what kinds of areas they
believe their large projects will cover, what skills their people start
with (and what skills can they expect other people to start with in the
future, depending on expected turnover), the "political" and "social"
dynamics of the team -- is the kind of "customer involvement" that's the
crux of Extreme Programming wrt other kinds of Agile Development
feasible at all, at what cost, etc, for example -- and so on. I can
avoid spending substantial time and energy on issues which don't matter
to project A even though they may be crucial to most large projects --
believe it or not, SOME projects need no networking, others will never
directly interface to a relational database, etc, etc, even though these
days 9 large projects out of 10 will need to deal with both kinds of
issues; and GUI issues, especially for large projects which mostly deal
with web interfacing vs others which will need traditional GUIs, can be
even more divergent. And this amount of variety is just for the
_technical_ issues; the political/social/business-plan ones, people's
skills and backgrounds, etc, are even more diverse...

To make a book, I will have to find an organization that works for busy
readers who don't have the time or patience to read through long parts
connected to database issues if they're on one of the few projects that
don't care about databases, and so forth -- structure sections,
chapters, appendices, footnotes, sidebars, ... so that skimming or
skipping the "don't care about it right now" parts can work; find a way
to reach that part of the audience that has never really undertaken a
large project before, or has played in such projects the role of a "cog"
without a clear picture of the whole structure, _as well as all_ the lead
architects and tech-savvy project managers.

Lakos did manage, and I admire him immensely for that. Robert Martin
has also done great work, though his books, while good, are (IMHO) never
_quite_ as excellent as his superb essays (don't get me wrong: I wish I
was half as good as Uncle Bob!-). Eric Raymond's "Art of Unix
Programming" is one of the most useful books for would-be architects of
software systems that I've ever laid my paws on -- I rate it as close to
the Mythical Man-Month, Design Patterns, Programming Pearls, and a few
of the many recent books on Extreme and other Agile methods (my personal
favorites of the crop are Scott Ambler's and Kent Beck's books).

However, none of these excellent books really addresses the questions
specific to the architecture, design, and development practices that
work best for dynamic VHLLs, and specifically for Python; so, I do
believe the book I dream to write is still needed (even though I might
be a grandfather by the time I'm done with it;-).

Meanwhile, to people and firms which aren't interested in retaining my
professional services, the best advice I can give -- after that of
studying the various books I have mentioned above (as well as good
Python books -- I like my own, but then, of course, I'm biased; I'd also
suggest others, such as Holden's, Pilgrim's, Hetland's, ...) -- is to
try something like:
<http://groups.google.com/groups?safe...thon*&as_uauthors=alex%20martelli&lr=&hl=en>
as well as similar searches for the many other authors that contribute
so validly to the Python discussions, of course.

Somewhere or other, in my 8190 posts found by the above Google Groups
search, I have expressed (often more than once, and with different
nuances depending on the exact subject, apparent skills and interests of
other discussants, etc; as well as sometimes based on my changing ideas
on some sub-issue, or changes in Python and other tools and
technologies) a majority of the issues that I touch upon in that "mass
of notes". Of course, the stuff is yet more disorganized than said
notes; however, it _is_ written to be read and hopefully understood by
others, while most of said notes are written essentially "to myself", to
remind me of the huge variety of things that may need to be covered
regarding the huge variety of facets that make up the subject "Large
Scale System Architecture, Design, and Development Practices with
Python". Moreover, a majority of the 8190 posts are undoubtedly dealing
with subjects that aren't really related to LSSADDPP. Hey, there's
_got_ to be some advantage in retaining me, or reading my hopefully
future book, rather than combing through all my posts, no?-)

Seriously: one day I do hope to start putting up some parts of those
notes, mutated into intelligible text and organized into kind of
almost-essays, on my website -- fragments of said future book, but more
accessible and usable than the sheer morass of posts above-mentioned.
But don't hold your breath for _that_, any more than for the book; I've
been meaning to redo my site for _years_, and it just hasn't happened...
there's always something else that looks more interesting, either
intrinsically, and/or because of the little issue of money;-). Some
stuff (mostly presentations) you can find at www.strakt.com, which also
has important stuff written by Jacob Hallén and others.

People with lot of important and interesting things to teach, who have
managed to do a much better job than me at organizing their stuff on the
web, include for example Fredrik Lundh and Marc-Andre Lemburg. The
latter gave an hour-long talk this summer at Europython on the subject.
Unfortunately I can't easily find his presentation on
www.europython.org, nor Fredrik's, but I'm sure that an abler searcher
than me will manage, and they do have their own websites as well. In
any case, I'm sure that either of them could be (and often is, in their
respective professional practices) at least as effective as a teacher,
consultant or mentor, on large-scale software projects in Python, as me;
and the same applies no doubt to many others. In fact, the Python world
is blessed, in my opinion, with quite a number of excellent people who
might fill such roles -- one more reason to consider Python for
large-scale, mission-critical development, in fact!!!-)
Alex
Jul 18 '05 #11

Stephen Waterbury <go***@comcast.net> wrote:
> ...
> For something as complex as CAD/CAM, you will probably want to
> make maximum use of interfaces and adaptors, with minimal and
> very judicious application of classic inheritance hierarchies.
> I am *not* an expert on interfaces and adapters, but several
> of the gurus on this list are.


Heh -- funny enough, I did develop my ideas on protocol adaptation
mostly while working in the CAD area (as Senior Software Consultant to
what used to be Cad.Lab, and is now Think3, for over 10 years).

Our main implementation language, over time, moved from Fortran to C,
then to C++ -- but we did have our own proprietary scripting language,
and a growing amount of applications' functionality was coded in that
higher-level language. Interfaces (formalized or not) were of course a
given -- in the last few years I was there (and later when I worked as a
consultant for them), as the firm had moved to Windows as the only
platform for its products, mostly COM interfaces among components
(earlier, we had tried Corba, Java, and less formalized ones). The
GoF's Design Patterns, and Lakos' Large Scale C++ Software Design,
helped us crystallize our ideas and practices when they came out (I
devoured both avidly as soon as I could get my hands on them;-), but we
_had_ mostly gone that way already. But something was missing, and
Robert Martin's excellent essays (the Dependency Inversion Principle
first and foremost) helped BUT didn't quite solve that something...

Protocol Adaptation can, at least potentially. Try Eby's PyProtocols
for a taste (I may not agree with every one of Eby's design and
architectural choices, but nevertheless it seems to me that PyProtocols
is, today, the best implementation of Protocol Adaptation ideas).
Unfortunately, _that_ is when our choice of programming languages bit --
none of them, including our proprietary scripting language, had
introspection and dynamism enough to get anywhere near. Java perhaps
might, with much huffing and puffing, but we had put it aside after
extensive trials: too hard to interface our huge existing base of C++,
and rewriting stuff from C++ to Java would have been a nightmare without
templates (generic programming) in Java at the time -- even quite apart
from performance issues, the productivity gains with Java were not worth
the migration costs (for a single-platform software company, at least;
had we still been striving on multiple platforms, I guess it might have
been different:-).

Python (as Eby's work shows, for example) is fully adequate for Protocol
Adaptation (as, no doubt, would other modern VHLLs!)...
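
(For readers who have not met the idea: a compressed sketch of the
PEP 246-style `adapt()` protocol that PyProtocols builds on. This is
an illustrative reduction, not PyProtocols' actual API; the real
proposal also deals with Liskov violations and adapter registries,
and the `IRenderable`/`Mesh` classes are invented for the example:)

```python
def adapt(obj, protocol):
    """PEP 246 in miniature: ask the object, then the protocol, for an adapter."""
    if isinstance(obj, protocol):
        return obj
    # First chance: the object's type may know how to conform.
    conform = getattr(type(obj), "__conform__", None)
    if conform is not None:
        adapted = conform(obj, protocol)
        if adapted is not None:
            return adapted
    # Second chance: the protocol may know how to adapt the object.
    adapt_hook = getattr(protocol, "__adapt__", None)
    if adapt_hook is not None:
        adapted = adapt_hook(obj)
        if adapted is not None:
            return adapted
    raise TypeError(f"cannot adapt {obj!r} to {protocol.__name__}")

class IRenderable:
    """A protocol: anything offering a .render() method qualifies."""
    @staticmethod
    def __adapt__(obj):
        return obj if hasattr(obj, "render") else None

class Mesh:
    def render(self):
        return "rendering mesh"

print(adapt(Mesh(), IRenderable).render())  # rendering mesh
```

The key property is that neither side needs to inherit from the other:
conformance is negotiated at the point of use, which is what makes the
scheme workable across independently developed components.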
Alex
Jul 18 '05 #12

Jonathan Ellis wrote:
> ... A more-experienced co-worker pointed me in the right
> direction, and the IDE did the rest. ("Find definition," "Find
> references.") Grep can do much the same thing, but painfully slowly --
> and inaccurately, when you have a bunch of interfaces implementing the
> same method names. ...


Try "glimpse" (http://webglimpse.net) -- it uses a superset of
grep's arguments and can search large collections of files at
a single bound! Re-indexing takes a few seconds, but doesn't
need to be done unless there are major changes. The indexing
makes it considerably faster than grep (you can even read the
index into memory using glimpseserver, and then searches of
~100MB of files take a fraction of a second). The first thing
I do when using any large Python library is put a glimpse
index on it.

Steve
Jul 18 '05 #13

"Jonathan Ellis" <jb*****@gmail.com> wrote in message
news:10**********************@z14g2000cwz.googlegroups.com...
<..snip...>
> Almost four years ago I started working at a company with about 500
> kloc of Java code. Thanks largely to tool support I was able to get in
> and start fixing bugs my first day (this is without significant prior
> Java experience). A more-experienced co-worker pointed me in the right
> direction, and the IDE did the rest. ("Find definition," "Find
> references.") Grep can do much the same thing, but painfully slowly --
> and inaccurately, when you have a bunch of interfaces implementing the
> same method names. Even after years in the codebase, I still used
> these heavily; the codebase grew to about 800 kloc during the 3 years I
> worked there. Developers came and went; even if my memory were good
> enough to remember all the code _I_ ever wrote, I'd still have to
> periodically repeat the familiarization process with code written by
> others.
The point you make is that good tooling is important. I worked 12 years
ago in a large Objective-C environment. The same static-versus-dynamic
wars were raging at that time (Objective-C vs C++). I fully agree that
good tools make quite a difference. Most often very simple tools can do
wonders. The dynamic nature of Objective-C also made dynamic tools
feasible, with an amazingly small extension. The run-time
instrumentation proved at least as powerful as the compile-time tools.
Nowadays the same code is ported to Java, but unfortunately the same
powerful instrumentation is lost.
> <...snip>
> -Jonathan


Contrary to your belief, I would jump into larger-scale Python
development without hesitation. However, I would introduce a few naming
conventions to support the static-tool part.

kind regards, Gerrit
<www.extra.research.philips.com/ natlab/sysarch/>

--
Praktijk voor Psychosociale therapie Lia Charité
<www.liacharite.nl>

Jul 18 '05 #14

> Heh. "Large" depends on a lot of things, particularly connectedness,
> but I really can't picture 10k being large under any circumstances.


Ok, so what is large? How many orders of magnitude larger than 10k
lines does it take for a piece of software to be large? And why should
you be the judge?

I'd let it slip to medium, but I wouldn't say that the project was small.
Small is something you can do in a weekend because you've been putting
it off. Small is something a newb to the language can do in a week
while they are learning the language.

- Josiah

Jul 18 '05 #15

On Tue, Oct 19, 2004 at 07:16:01AM -0700, Jonathan Ellis wrote:
> Testing is good; preventing entire classes of errors from ever
> happening at all is better, particularly when you get large. Avoiding
> connectedness helps, but that's not always possible.

What classes of errors are completely avoided by "static typing" as
implemented by C++ (or Java)? Just out of curiosity, because this is
usually stated as "true by axiomatic definition" in this kind of
discussion.

Andreas
Jul 18 '05 #16

Josiah Carlson wrote:
>> Heh. "Large" depends on a lot of things, particularly connectedness,
>> but I really can't picture 10k being large under any circumstances.
>
> Ok, so what is large? How many orders of magnitude larger than 10k
> lines does it take for a piece of software to be large? And why should
> you be the judge?


I think the only way to compare projects is from a user's or customer's
perspective - what functionality the application provides & its scope. Any
comparison involving lines of code or number of developers won't be reliable
unless other factors (especially implementation language & libraries) are held
semi-constant. For example, at one company I think the total was 1.1 or 1.2
million lines of code (all C++ & about 60-70 developers), and yet I have trouble
imagining how, if I could go back and do it again in Python, it'd take even 200k
lines of code (and the riskier side of me feels it'd come in at under 100k - it
just didn't _do_ a lot despite all that code!)

In that sense, a 10k Python app can be fairly large in terms of end-user
functionality. For example, our main product where I work consists of *many*
different custom servers, a full web-based administrative interface, an end-user
web interface, a client application that does all sorts of interaction with the
servers, and lots of database interaction. Add to this many internal tools,
integration tools we provide to our customers, etc., and I would rate it overall
as on the upper end of medium-sized projects, functionality-wise - not the
largest I've worked on but well beyond any definition of small, and our plans
for the next few quarters will definitely push it into the range of what I'd
normally consider a large system. IIRC we're only in the 10k-20k for lines of
Python code, plus a few modules here and there being C++.

Having said all that, I've found that competitors in our same space tend to have
20-30 developers on the low end to over 100 on the high-end, while we have but a
handful. We don't have quite the same breadth of functionality - at least not
yet - but we generally make up for it by accounting for it architecturally but
not adding it until a customer actually needs it (a sort of JIT approach to
development). As such we've been able to compete head-to-head with others in the
same sector. On more than one occasion I've wondered aloud how so many
developers working for Competitor X can stay busy, and I can only imagine how
many lines of code they're churning out - and yet, from a functionality
perspective we're keeping pace. I also wonder how many hours a day they spend in
meetings trying to coordinate everything. Ugh.

Back to the point at hand: a project using a higher-level language gets out of
hand more slowly; if there were no other advantage it'd still be a "win" IMO
because you encounter "big project" problems a lot later - and that's a huge
benefit in and of itself.
Jul 18 '05 #17

Andreas Kostyrka wrote:
On Tue, Oct 19, 2004 at 07:16:01AM -0700, Jonathan Ellis wrote:
Testing is good; preventing entire classes of errors from ever
happening at all is better, particularly when you get large. Avoiding
connectedness helps, but that's not always possible.


What classes of errors are completely avoided by "static typing" as
implemented by C++ (Java)?


I'm curious as well, because from what I've seen, the classes of errors "caught"
are (1) a subset of the higher-level (e.g. algorithmic and corner-case) errors
caught by good testing anyway, (2) much more common in code written by
lazy/underexperienced developers who are already considered a liability, and (3)
caused in part by complexities introduced by the language itself*.

More modern/advanced static type systems that let you actually get into the
semantics of the program (as opposed to just deciding which predefined type
bucket your data fits in) may help, but IMO the jury's still out on them (partly
due to complexity, and partly due to _when_ in the development process they must
be defined - perhaps that's the root problem of some static type systems - they
make you declare intent and semantics when you know the _least_ about them!
Consider the parallels to available knowledge in compile-time versus run-time
optimizations).

-Dave

* A trivial example: When programmers need to count something, rarely do they
care about unsigned vs signed or short vs normal vs long vs longlong, and yet in
something like C++ they are _constantly_ making this decision.
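For what it's worth, this particular decision simply evaporates in Python,
whose integers are arbitrary-precision; a quick sketch:

```python
# Python ints grow as needed, so the signed/short/long/longlong choice
# from the C++ example above never arises.
count = 2 ** 63          # already past the range of a signed 64-bit counter
count += 1               # no overflow, no wraparound
print(count)             # 9223372036854775809
```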

Another: in Java, every exception that can be thrown must be mentioned in the
code every step of the way - a maintenance nightmare, not to mention the utter
distraction during development.
Jul 18 '05 #18

Dave Brueck <da**@pythonapocrypha.com> wrote:
Andreas Kostyrka wrote:
On Tue, Oct 19, 2004 at 07:16:01AM -0700, Jonathan Ellis wrote:
Testing is good; preventing entire classes of errors from ever
happening at all is better, particularly when you get large. Avoiding
connectedness helps, but that's not always possible.
What classes of errors are completely avoided by "static typing" as
implemented by C++ (Java)?


C++'s casting power makes this a bit moot -- I have seen generally-good
developers (not quite comfy with C++, from a mostly-Fortran then a
little C background) mangle poor innocent rvalues (and even lvalues,
BION, with ample supplies of & and * to help) with such overpowering
hits of reinterpret_cast<> that I'm still queasy to think of it years
later. Java is mercifully a bit less powerful, but of course _its_
casts are generally runtime-checked. So, when one sees:

WhatAWonderfulWord w = (WhatAWonderfulWord) v;

one _IS_ admittedly inclined to think that the "class of error being
completely avoided" is "erroneous omission of a cast that plays no
useful role at all and is going to be checked only at runtime anyway".

However, there _are_ tiny but undeniable advantages to static typing:

1. some typos are caught at compiletime, rather than 2 seconds later by
unit tests -- 2 seconds ain't much, but it ain't 0 either;

2. simple-minded tools have an easier time offering such editing
services as "auto-completion", which may save a little typing;

3. simple-minded compilers have an easier time producing halfway
decent code;

and the like. None deal with "classes of errors completely avoided"
unless one thinks of unittests as an optional add-on and of compilers as
a mandatory must-have, which is wrong -- the point Robert Martin makes
excellently in his artima article about the wonders of dynamic typing of
a bit more than a year ago (dynamic typing is wonderful _with_ unit
testing, but then unit testing is an absolute must anyway, to
summarize).
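To make point 1 concrete (a deliberately buggy toy function, not from any
real codebase): in Python the typo surfaces the first time any test
exercises the line, a couple of seconds after compile time rather than at
compile time:

```python
def greet(name):
    return "hello, %s" % naem   # typo: raises NameError at first call

# Any unit test that exercises greet() catches the typo immediately.
try:
    greet("world")
except NameError as e:
    print("caught by the first test run:", e)
```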

I'm curious as well, because from what I've seen, the classes of errors
"caught" are (1) a subset of the higher-level (e.g. algorithmic and
corner-case) errors caught by good testing anyway,
Yes, undeniable.
(2) much more common in code written by
lazy/underexperienced developers who are already considered a liability,
No, I think you're wrong here. Typos are just as frequent for just
about all classes of coders, lazy or eager, experienced or not -- the
eager experienced ones often use faster typing (nothing to do with
static typing;-).
and (3)
caused in part by complexities introduced by the language itself*.
Yes, a fair cop. E.g., a typo in one of those redundant mentions of a
type or interface, seen above, is an error introduced only because I'm
required to type the GD thing twice over (though autocompletion may save
me some keystrokes;-).

More modern/advanced static type systems that let you actually get into
the semantics of the program (as opposed to just deciding which predefined
type bucket your data fits in) may help, but IMO the jury's still out on
them (partly due to complexity, and partly due to _when_ in the
development process they must be defined - perhaps that's the root problem
of some static type systems - they make you declare intent and semantics
when you know the _least_ about them! Consider the parallels to available
knowledge in compile-time versus run-time optimizations).


If you mean typesystems such as Haskell's or ML's, allowing extended
inference (and, in Haskell's case, the wonder of typeclasses), I think
you're being a bit unfair here. You can refactor your types and
typeclasses just as much as any other part of your code, so the "when
they must be defined" seems a bit of a red herring to me (unless you
have in mind other more advanced typesystems yet, in which case I'd like
some URL to read up on them -- TIA).

I think we agree at 95% to 99%, btw, I admit I'm just picking nits...
Alex
Jul 18 '05 #19

Josiah Carlson <jc******@uci.edu> wrote:
Heh. "Large" depends on a lot of things, particularly connectedness,
but I really can't picture 10k being large under any circumstances.
Ok, so what is large? How many orders of magnitude larger than 10k
lines does it take for a piece of software to be large? And why should
you be the judge?


My definition of a large software system is: a system that cannot
sensibly be developed and maintained by just one developer, but requires
a team of developers. Among the factors defining where the boundaries
lie are such things as deployment issues (how many platforms, how
diverse), function points, analysis/requirements, etc, etc, but SLOC
(properly counted/normalized lines of code) are the main determinant.

For a reasonably experienced programmer, with decent tools, and without
hair-raising problems of deployment, optimization, continuous fast
changes to specs, etc, etc, 10k SLOC should be within the threshold of
"can be sensibly developed and maintained by one person"; 100k SLOC
won't be; the threshold is somewhere in-between. Of course, if you're
talking freshman programming trainees, or special problems of the
various sorts mentioned, the thresholds do shift downwards.

I'd let it slip to medium, but I wouldn't say that the project was small.
Small is something you can do in a weekend because you've been putting
it off. Small is something a newb to the language can do in a week
while they are learning the language.


OK, that's your definition of "small", I guess. I don't know that
there's a commonly accepted one. On the other hand, moving from a
project that can all fit in your head, one you can fully develop and
actively maintain by yourself, to a team situation, _is_ a crucial
threshold, as teams have such different strengths and problems than
individuals on their own; and the "Large Scale" monicker is typically
tagged onto projects requiring a team.

We can quibble about special cases (is a 2-people team, with one of them
developing half-time and the rest of the time out selling the system,
comparable to a more typical case of 6-10 people working full-time on
development and maintenance of a system?), but that's always so for
taxonomies, and doesn't add much to the discussion IMHO.
Alex
Jul 18 '05 #20

Andrew Dalke wrote:
I didn't know what was going on at a certain
spot, looked up a few lines, and saw the comment I had
written explaining the tricky spot. I thought it was
very nice of the past me to help out then present me. :)


Indeed. It would be nice to have access to a time machine
so one could go back and ask oneself about things like
this...

--
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand
http://www.cosc.canterbury.ac.nz/~greg

Jul 18 '05 #21

Greg Ewing wrote:
Indeed. It would be nice to have access to a time machine
so one could go back and ask oneself about things like
this...


As in James P. Hogan's book "Thrice Upon a Time".
Also notable for having no antagonist and for being
the only story I know of that uses quantum time as
the way to resolve time travel paradoxes.

Andrew
da***@dalkescientific.com
Jul 18 '05 #22

al*****@yahoo.com (Alex Martelli) wrote in message news:<1glwwg1.1hgx2561vflstbN%al*****@yahoo.com>...
That is the book I want to write, the one I have always wanted to write;
the Nutshell and the Cookbook (and now their second editions) keep
delaying that plan, but, in a sense, that's good, because I keep
accumulating useful experiences to enrich those notes, and Python keeps
growing (particularly but not exclusively in terms of third-party
extensions and tools) in ways that refine and sometimes indeed redefine
some key aspects. To give a simple technical example: I used to have
substantial caveats in those notes cautioning readers to use multiple
inheritance in an extremely sparing, cautious way, due to traps and
pitfalls that made it fragile. Nowadays, with the advent of 2.3, most
of those traps and pitfalls have gone away (in the newstyle object
model), to the point that the whole issue can be reconsidered.


Uhm ... I must say that I was quite keen on multiple inheritance,
but having seen the (ab)use of it in Zope I am starting to question
the wisdom of it. The problem I see with MI (even done well) is that
you keep getting methods from parent classes, and each time you have
to think about the MRO and the precedence rules. It is an additional
burden on the programmer's mind. I miss the clean, simple concept of
a superclass; the MRO may be cool but it is not as simple to learn,
to teach, and especially to remember. Notice, I am not referring to
the algorithm; it is not important to remember it. What is disturbing
to me is being aware that the resolution of the methods can be
non-trivial, and that I should call .mro() each time to check exactly
what is happening. Also 'super' is hard to understand and to use :-(

So, I wonder if Matz was right after all and single inheritance +
mixins à la Ruby are the right way to go. Yes, from a purist point of
view they are inferior to MI, but from the pragmatist point of view I
don't think you lose very much, and you get a big gain in a short
learning curve and explicitness. Especially, 'super' stays simple.

However, I lack experience with mixins in Ruby: do you have experience,
or do you know people with experience of that? What do they think?
Are they happy with the approach, or do they wish Ruby had real MI?
Of course in simple systems there is no real issue; I am talking
about large systems. Also, I am not talking about wrong design choices
(for instance, Zope 3 uses MI much less than Zope 2: I interpret this
as a recognition that the design was wrong) but in general: assuming
you have an application where the "right" design is via mixins, is
there a real difference between doing it à la Ruby or with real MI?
It does not look like there is a big difference, in practice.
Yes, you do not have the full power of cooperative methods, but you
also avoid the burden of them, and you can always find workarounds;
I would say there are compensations.

I have not yet a definite opinion on this point, so I would like to
hear the opinion of others, especially people with real-world
experience in complex systems.

Michele Simionato
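The non-trivial method resolution Michele worries about fits in a
five-line diamond (new-style classes, cooperative super); a minimal
illustration:

```python
class A(object):
    def ping(self):
        return ["A"]

class B(A):
    def ping(self):
        return ["B"] + super(B, self).ping()

class C(A):
    def ping(self):
        return ["C"] + super(C, self).ping()

class D(B, C):
    def ping(self):
        return ["D"] + super(D, self).ping()

# D.mro() is [D, B, C, A, object]: note that B's super() call reaches
# C, not its own base A -- the heart of cooperative MI.
print([k.__name__ for k in D.mro()])
print(D().ping())   # ['D', 'B', 'C', 'A']
```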
Jul 18 '05 #23

Stephen Waterbury wrote:
Jonathan Ellis wrote:
... A more-experienced co-worker pointed me in the right
direction, and the IDE did the rest. ("Find definition," "Find
references.") Grep can do much the same thing, but painfully slowly -- and inaccurately, when you have a bunch of interfaces implementing the same method names. ...


Try "glimpse" (http://webglimpse.net) -- it uses a superset of
grep's arguments and can search large collections of files at
a single bound! Re-indexing takes a few seconds, but doesn't
need to be done unless there are major changes.


glimpse addresses grep's speed problem, but unfortunately has no more
semantic understanding beyond "it's just text." Etags is a little
better but not much, and also suffers from the
have-to-remember-to-reindex-if-you-want-accurate-results "feature."
-Jonathan

Jul 18 '05 #24

Andreas Kostyrka wrote:
On Tue, Oct 19, 2004 at 07:16:01AM -0700, Jonathan Ellis wrote:
Testing is good; preventing entire classes of errors from ever
happening at all is better, particularly when you get large. Avoiding connectedness helps, but that's not always possible.
What classes of errors are completely avoided by "static typing" as
implemented by C++ (Java)? Just out of curiosity, because this is
usually stated as "true by axiomatic definition" in this kind of
discussions.


As one example: in this codebase (closer to 700 kloc than 500 by this
time, if it matters) the very oldest code used a Borland wrapper over
JDBC. At the time, it allowed doing things JDBC version 1 did not; by
the time I got fed up, JDBC version 3 had caught up and far surpassed
Borland's API. There was also a lot of JDBC code that was suboptimal
-- for the application I worked on, it almost always made sense to use
a PreparedStatement rather than a simple Statement, but because binding
parameters in jdbc is something of a PITA we often went with the
Statement anyway. Both the Borland-style and the JDBC code also dealt
with calls to stored procedures, most of them not in CallableStatements
(the "right" way to do this).

I volunteered to write a more friendly wrapper over JDBC than Borland's
that would handle caching of [Prepared|Callable]Statement objects and
parameter binding transparently, nothing fancy (in particular my select
methods returned ResultSets, where Borland had their own class for
this) and rewrite these thousands of calls to use the new API. Of
course I wrote scripts to do this; 5 or 6, each handling a different
aspect.

To write unit tests for this by hand would have been obscene. (As an
aside, writing unit tests for anything that deals with many tables in a
database is a PITA already and usually ends up not really a "unit" test
anymore.) Even generating unit tests with more scripts would have
required a significantly deeper semantic understanding of the code
being filtered, and hence a lot more work.

As it was, with the compiler letting me know when I screwed up so I
could improve my scripts accordingly, out of the thousands of calls, I
ultimately had to do a few dozen by hand (because that was less work
than getting my scripts able to deal with the very worst examples), and
the compiler let me know what those were. After the process was
complete, QA turned up (over several weeks) 4 or 5 places where I'd
broken things despite the static checking, which I considered a very
good success ratio.

-Jonathan
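For comparison, Python's DB-API makes the parameter binding Jonathan
describes nearly free; a minimal sketch with the stdlib sqlite3 module
(the schema and values are purely illustrative):

```python
import sqlite3

# Bound parameters (the '?' placeholders) play the role of a
# PreparedStatement: no string interpolation, no quoting bugs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (name TEXT, qty INTEGER)")
conn.execute("INSERT INTO parts VALUES (?, ?)", ("bolt", 40))

rows = conn.execute("SELECT qty FROM parts WHERE name = ?",
                    ("bolt",)).fetchall()
print(rows)   # [(40,)]
```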

Jul 18 '05 #25

Peter Hansen wrote:
Jonathan Ellis wrote:
I haven't jumped into a project of similar size with python, but the tool support for this approach to working with a large codebase just isn't there, and I haven't seen any convincing arguments that
alternative methodologies are enough better to make up for this.
I'm getting the impression you also haven't tried any significant
test-driven development. The benefits of this approach are *easily*
convincing to most people, and it also fits the bill as removing the
need for a very sizable portion of the tool support which you rightly
point out is not there in most tools for dynamically typed languages.


I think I responded to this already --
Testing is good; preventing entire classes of errors from ever
happening at all is better, particularly when you get large. Avoiding connectedness helps, but that's not always possible.


Oh yes; so I did. :) (See my reply to another subthread for one
example of when static type checking saved me a LOT of work.)

What is the biggest system you have built with python personally? I'm
happy to be proven wrong, but honestly, the most enthusiastic "testing
solves all my problem" people I have seen haven't worked on anything
"large" -- and my definition of large agrees with Alex's; over 100
kloc, more than a handful of developers.

So people don't get me wrong: I love python. Most of my programming
friends call me "the python zealot" behind my back. I just don't think
it's the right tool for every problem.

Specifically, in my experience, statically-typed languages make it much
easier to say "okay, I'm fixing a bug in Class.Foo; here's all the
places where it's used." This lets me see how Foo is actually used --
in a perfect world, Foo's documentation is precise and up to date, but
I haven't worked anywhere that this was always the case -- which lets
me make my fix with a reasonable chance of not breaking anything.
Compile-time type checking increases those chances. Unit tests
increase that further, but relying on unit tests as your first and only
line of defense is suboptimal when there are better options.
Having experience with both approaches, and choosing one over
the other, gives one greater credibility than having experience
with just one approach, yet clinging to it...


You are incorrect if you assume I am unfamiliar with python. I readily
admit I have no experience with truly large python projects; I would
classify the Python application I work on as "small," but it seems I
am in good company here in that respect... I do claim to have fairly
extensive experience with large projects in a statically typed language
(Java).

-Jonathan

Jul 18 '05 #26

Jonathan Ellis <jb*****@gmail.com> wrote:
What is the biggest system you have built with python personally? I'm
happy to be proven wrong, but honestly, the most enthusiastic "testing
solves all my problem" people I have seen haven't worked on anything
"large" -- and my definition of large agrees with Alex's; over 100
kloc, more than a handful of developers.


I have the experience, both with Python and with C++, and I can confirm
that test-driven development (with more code for tests, particularly
unit- but also system-/integration-/acceptance-, than code to implement
actual functionality) scales up.

The C++ system had about five times the number of developers and ten
times the code size for about the same amount of functionality (as
roughly measured in function points) as the Python system.

Type safety and const-correctness in the C++ system were of very minor
help; not 100% negligible, but clearly they were not pulling their
weight, by a long shot.

In both systems, the trouble spots came invariably where testing had
been skimped on, due to time pressures and insufficient acculturation of
developers to testing; the temptation to shirk is a bit bigger in C++,
where one can work under the delusion that the compiler's typechecks
compensate (they don't).

_Retrofitting_ tests to code developed any old how is not as effective
as growing the tests and code together. It appears to me that the
experience you relate is about code which didn't have a good battery of
unit tests to go with it.

Lastly, I'm still looking for systematic ways to test system integration
that are as effective as unit tests are for each single component or
subsystem; but that's an area where type and const checking are of just
about negligible help.
Alex
Jul 18 '05 #27

"Alex Martelli" <al*****@yahoo.com> wrote in message
news:1gm1x7t.w8k2hqrv300tN%al*****@yahoo.com...
<...skip...>
Lastly, I'm still looking for systematic ways to test system integration
that are as effective as unit tests are for each single component or
subsystem; but that's an area where type and const checking are of just
about negligible help.

System integration has a completely different nature than unit testing.
During system integration the "unforeseens" and the "unknowns" pop up, and
of course the not-communicated, implicit human assumptions are uncovered.
The "non-functional" behavior is also a source of problems (response times,
memory footprint, etc). Many system integration problems are semantic
problems. System integration is often difficult due to the heterogeneity of
the problems, technologies and people involved. In other words, the larger
the system the more challenging system integration becomes.

All of these problems are not addressed at all by static typing. However,
design clarity and compactness do help tremendously. I would expect for
these reasons that Python is a big plus during system integration of large
systems. Of course design attention is required to cope with the
"non-functional" impact of Python, such as CPU and memory consumption. On
top of that, (run-time) instrumentation is very helpful. Here again the
dynamic nature of Python is a big plus.

kind regards, Gerrit Muller
Gaudi Systems Architecting www.extra.research.philips.com/natlab/sysarch/

Jul 18 '05 #28

GerritM <gm*****@worldonline.nl> wrote:
"Alex Martelli" <al*****@yahoo.com> wrote in message
news:1gm1x7t.w8k2hqrv300tN%al*****@yahoo.com...
<...skip...>
Lastly, I'm still looking for systematic ways to test system integration
that are as effective as unit tests are for each single component or
subsystem; but that's an area where type and const checking are of just
about negligible help.
System integration has a completely different nature than unit testing.
During system integration the "unforeseens" and the "unknowns" pop-up. And
of course the not-communicated, implicit human assumptions are uncovered.


Exactly -- which is why I'm still looking (doesn't mean I think I'll
find;-).
All of these problems are not addressed at all by static typing. However,
Essentially not.
design clarity and compactness does help tremendously. I would expect for
these reasons that Python is a big plus during system integration of large
Not as much as one might hope, in my experience. Protocol Adaptation
_would_ help (see PEP 246), but it would need to be widely deployed.
systems. Of course design attention is required to cope with the
"non-functional" impact of Python, such as CPU and memory consumption. On
top of that, (run-time) instrumentation is very helpful. Here again the
dynamic nature of Python is a big plus.


But the extreme difficulty in keeping track of what amount of memory
goes where in what cases is a big minus. I recall similar problems with
Java, in my limited experience with it, but for Java I see now there are
commercial tools specifically to hunt down memory problems. In C++
there were actual _leaks_ which were a terrible problem for us, but
again pricey commercial technology came to the rescue.

With Python, I've found, so far, that tracking where _time_ goes is
quite feasible, with systematic profiling &c (of course profiling is
always a bit invasive, and so on, but no more so in Python than
otherwise), so that in the end CPU consumption is no big deal (it's easy
to find out the tiny hot spot and turn it into an extension iff needed).
But memory is a _big_ problem, in my experience so far, with servers
meant to run a long time and having very large code bases. I'm sure
there IS a commercial niche for a _good_ general purpose Python tool to
keep track of memory consumption, equivalent to those available for C,
C++ and Java...
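Short of such a tool, a crude census of live, GC-tracked objects can at
least point at the dominant types; a rough sketch (no substitute for a
real memory profiler, and it only sees container objects the cyclic
collector tracks):

```python
import gc

def census(top=5):
    # Count live objects by type name; only GC-tracked (container)
    # objects appear, which is usually where a long-running server's
    # memory accumulates.
    counts = {}
    for obj in gc.get_objects():
        name = type(obj).__name__
        counts[name] = counts.get(name, 0) + 1
    return sorted(counts.items(), key=lambda kv: -kv[1])[:top]

suspicious = [[i] for i in range(10000)]   # simulate a leak of small lists
print(census())
```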
Alex
Jul 18 '05 #29

Jonathan Ellis wrote:
Peter Hansen wrote:
I'm getting the impression you also haven't tried any significant
test-driven development.
I think I responded to this already --
Testing is good; preventing entire classes of errors from ever
happening at all is better, particularly when you get large.
Oh yes; so I did. :) (See my reply to another subthread for one
example of when static type checking saved me a LOT of work.)


And you've reemphasized my point. "Testing" is not test-driven
development. In fact, test-driven development is about *design*,
not just about testing. The two are related, but definitely not
the same thing, and eliminating TDD with a wave of a hand intended
to poo-poo mere testing is to miss the point. Once someone has
tried TDD, they are unlikely to lump it in with simple "unit testing"
as it has other properties that aren't obvious on the surface.
What is the biggest system you have built with python personally? I'm
happy to be proven wrong, but honestly, the most enthusiastic "testing
solves all my problem" people I have seen haven't worked on anything
"large" -- and my definition of large agrees with Alex's; over 100
kloc, more than a handful of developers.
The topic of the thread was large projects with _large teams_,
I thought, so I won't focus on my personal work. The team I
was leading worked on code that, if I recall, was somewhat over
100,000 lines of Python code including tests. I don't recall
whether that number was the largest piece, or combining several
separate applications which ran together but in a distributed
system... I think there were close to 20 man years in the main
bit.

(And remembering that 1 line of Python code corresponds to
some larger number, maybe five or ten, of C code, that should
qualify it as a large project by many definitions.)
So people don't get me wrong: I love python. Most of my programming
friends call me "the python zealot" behind my back. I just don't think
it's the right tool for every problem.
Neither do I. The above project also involved some C and
some assembly, plus some Javascript and possibly something else
I've forgotten by now. We just made efforts to use Python *as
much as possible* and it paid off.
Specifically, in my experience, statically-typed languages make it much
easier to say "okay, I'm fixing a bug in Class.Foo; here's all the
places where it's used." This lets me see how Foo is actually used --
in a perfect world, Foo's documentation is precise and up to date, but
I haven't worked anywhere that this was always the case -- which lets
me make my fix with a reasonable chance of not breaking anything.
Compile-time type checking increases those chances. Unit tests
increase that further, but relying on unit tests as your first and only
line of defense is suboptimal when there are better options.


But what if you already had tests which allowed you to do exactly
the thing you describe? Is there a need for "better options"
at that point? Are they really better? When I do TDD, I can
*trivially* catch all the cases where Class.Foo is used
because they are all exercised by the tests. Furthermore, I
can catch real bugs, not just typos and simple things involving
using the wrong type. A superset of the bugs your statically
typed language tools are letting you catch. But obviously
I'm rehashing the argument, and one which has been discussed
here many times, so I should let it go.
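A toy sketch of what "exercised by the tests" means in practice (the
Account class and its tests are invented purely for illustration):

```python
import unittest

class Account:
    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.balance += amount

class AccountTest(unittest.TestCase):
    # Every way deposit() is used is represented here, so changing its
    # signature or semantics breaks a test immediately -- catching the
    # typo-level errors a compiler would, plus behavioral ones it wouldn't.
    def test_deposit_accumulates(self):
        a = Account()
        a.deposit(10)
        a.deposit(5)
        self.assertEqual(a.balance, 15)

    def test_rejects_nonpositive(self):
        self.assertRaises(ValueError, Account().deposit, 0)

# run with: python -m unittest <this module>
```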
Having experience with both approaches, and choosing one over
the other, gives one greater credibility than having experience
with just one approach, yet clinging to it...


You are incorrect if you assume I am unfamiliar with python.


I assumed no such thing, just that you were unfamiliar with
large projects in Python and yet were advising the OP on its
suitability in that realm. You're bright and experienced, and
your comments have substance, but until you've actually
participated in a large project with Python and seen it fail
gloriously *because it was not statically typed*, I wouldn't
put much weight on your comments in this area if I were the
OP. That's all I was saying...

-Peter
Jul 18 '05 #30

Peter Hansen <pe***@engcorp.com> wrote:
...
And you've reemphasized my point. "Testing" is not test-driven
development. In fact, test-driven development is about *design*,
not just about testing. The two are related, but definitely not
Hmmm... the way I see it, it's about how one (or, better!!!, two: pair
programming is a GREAT idea) proceeds to _implement_ a design. The fact
that the package (or other kind of component) I'm writing will offer a
class Foo with a no-parameters constructor, methods A, B, and C with
parameters thus and thus, etc, has hopefully been determined and agreed
beforehand -- the people who now write that package, and other teams who
write other code using the package, have presumably met and haggled
about it and coded one or more mock-up versions of the package (or used
other lesser way to clarify the specs), so that code depending on the
package can be tested-and-coded (with the mock-ups) even while the
package itself is being tested and coded...

I know Kent Beck's book shows a much more 'exploratory' kind of TDD, but
in a large project that would lead to its own deleterious equivalent of
"waterfall": everything must proceed bottom-up because no mid-level
components can be coded until the low-level components are done, and so
forth. I don't think that's acceptable in this form, in general.
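Such a mock-up can be tiny; a sketch of the idea (all names below are
invented for illustration): the consuming team codes and tests against
the agreed interface while the real package is built in parallel.

```python
# Agreed interface: a pricer exposes price(part_id) -> cost in cents.
class MockPricer:
    """Stand-in for the real pricing package, returning canned data."""
    def price(self, part_id):
        return {"bolt": 10, "nut": 5}.get(part_id, 0)

# Consumer code, developed and tested against the mock-up today,
# against the real package once it exists:
def invoice_total(pricer, part_ids):
    return sum(pricer.price(p) for p in part_ids)

print(invoice_total(MockPricer(), ["bolt", "bolt", "nut"]))   # 25
```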

_Design_, in large scale software development, is mainly about sketching
reasonable boundaries between components to allow testing and coding to
proceed with full advantage of the fact that the team of developers is
of substantial size. Indeed there may be several sub-teams, or even
several full-fledged teams, though the latter situation (over, say,
around 20 developers, assuming colocation... that's the very maximum you
could possibly, sensibly cram into a single team) suddenly begets its
own sociopolitical AND technical problems... I have not been in that
situation with Python, yet, only with Fortran, C, C++.

Of course when both components are nominally done there comes
integration testing time, and the sparks fly;-). Designing integration
tests ahead of time, at the same time as the mock-ups, _would_ help, but
somehow or other it never really seems to happen (I'd love hearing
real-life experiences from somebody who DID manage to make it happen,
btw; maybe I'd learn how to "facilitate" its happening, too!-).

If and when you're lucky there's some 'customer' (in the
extreme-programming sense) busy writing _acceptance_ tests for the
system, making the user-stories concrete, at the same time -- but good
acceptance tests are NOT the same thing as good integration tests... you
need both kinds (at least if the system is truly large). Anyway, at
integration-testing time and/or acceptance-testing time, there is
typically at least one iteration where the mock-ups/specs are updated to
take into account what we've learned while implementing the component
and consuming it, and it's back to the pair-programming parts with TDD.

But these fascinating AND crucially important issues are about far wider
concerns than "static type testing" can help with. "Design by
Contract", where the mock-up includes preconditions and postconditions
and invariants, can be helpful, but such DbC thingies are to be checked
at runtime, anyway (they're great in pinpointing more problems during
integration testing, etc, etc, they don't _substitute_ for testing
though, they simply amplify its effectiveness, which is good enough).
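A minimal hand-rolled sketch of such runtime-checked DbC conditions in Python follows; the contract decorator and integer_sqrt are invented for illustration, not a real DbC library:

```python
import functools

def contract(pre=None, post=None):
    """Minimal Design-by-Contract decorator: check the precondition on
    the arguments and the postcondition on the result, at runtime --
    amplifying the tests' effectiveness rather than replacing them."""
    def deco(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if pre is not None:
                assert pre(*args, **kwargs), "precondition violated"
            result = fn(*args, **kwargs)
            if post is not None:
                assert post(result), "postcondition violated"
            return result
        return wrapper
    return deco

@contract(pre=lambda x: x >= 0, post=lambda r: r >= 0)
def integer_sqrt(x):
    # largest n with n*n <= x
    n = 0
    while (n + 1) * (n + 1) <= x:
        n += 1
    return n

assert integer_sqrt(10) == 3
```

During integration testing such checks pinpoint which side of an interface broke the bargain, which is the "amplifier" effect described above.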

TDD may surely help defining the internal structures and algorithms
within a single component, of course, if that's what you mean by design.

But (with decent partitioning) a single component should be _at most_ a
few thousand lines of Python code -- very offhand I'd say no more than
2/3 thousand lines of functional code, as much again of unit tests, and
a generous addition of comments, docstrings, and blank lines, to a total
line count, as "wc *.py" gives it, of no more than 6k or 7k tops. If it's
bigger, there are problems -- docstrings are trying to become user
reference manuals, comments are summarizing whole books on data
structures and algorithms rather than giving the URLs to them, or, most
likely, there was a mispartitioning and this poor "component" is being
asked to do far too much, way more than one cohesive set of
responsibilities which need to be well-coordinated. Time to call an
emergency team meeting and repartition a little bit.

Hmmm, this has relatively little to do with static type checks, but is
extremely relevant to the 'Subject' - indeed, it's one (or two;-) of the
many sets of issues that (IMHO) need to be addressed in a book or course
on large scale software development (not JUST design, mind you: the
process whereby the design is defined, how it's changed during the
development, and how it is implemented by the various components, is at
least as important as the technical characteristics that the design
itself, seen as a finished piece of work, should exhibit...).
the same thing, and eliminating TDD with a wave of a hand intended
to poo-poo mere testing is to miss the point.

Absolutely -- I do fully agree with you on this.

Once someone has tried TDD, they are unlikely to lump it in with simple
"unit testing" as it has other properties that aren't obvious on the
surface.

It sure beats "retrofitting" unit tests post facto. But I'm not sure
what properties you have in mind here; care to expand?

What is the biggest system you have built with python personally? I'm
happy to be proven wrong, but honestly, the most enthusiastic "testing
solves all my problem" people I have seen haven't worked on anything
"large" -- and my definition of large agrees with Alex's; over 100
kloc, more than a handful of developers.


The topic of the thread was large projects with _large teams_,
I thought, so I won't focus on my personal work.

Yeah, I think the intended meaning was "in which you personally have
taken part" rather than any implication of "single-handedly" -- the
mention of "more than a handful of developers" being key.

BTW, a team with 5-6 colocated developers, plus support personnel for
GUI painting/design (in the graphical sense) and/or webpage/style ditto,
system administration, documentation, acceptance testing, etc, can build
QUITE a large system, if the team dynamics and the skills of the people
involved are right. So the "more than a handful of developers" doesn't
seem a necessary part of the definition of "large scale software
development". 5-6 full-time developers already require the kind of
coordination and component design (partitioning) that 10-12 will;
there's no big jump there in my experience. The jump does come when you
double again (or can't have colocation, even with just, say, 6 people),
because that's when the one team _must_ split into cooperating teams
(again in my experience: I _have_ seen -- thankfully not participated in
-- alleged single "teams" of 50 or more people, but I am not sure they
actually even managed to deploy any working code... whatever language
we're talking about matters little, as here we're clashing with a
biological characteristic of human beings, probably connected to the
prehistoric size of optimal hunting bands or something!-).

The team I was leading worked on code that, if I recall, was somewhat
over 100,000 lines of Python code including tests. I don't recall
whether that number was the largest piece, or combining several
separate applications which ran together but in a distributed
system... I think there were close to 20 man-years in the main
bit.
I think this qualifies as large, assuming the separate applications had
to cooperate reasonably closely (i.e. acting as "components", even
though maybe in separate processes and "only" touching on the same
database or set of files or whatever).
(And remembering that 1 line of Python code corresponds to
some larger number, maybe five or ten, of C code, that should
qualify it as a large project by many definitions.)
I agree. There IS a persistent idea that codebase size is all that
really matters, so 100,000 lines of code are just as difficult to
develop and maintain whether they're assembly, C, or Python. I think
this common idea overstates the case a bit (and even Capers Jones
agrees, though he tries to do so by distinguishing coding from other
development activities, which isn't _quite_ my motivation).

Part of why I recommend having no more than 2-3 k lines of functional
code in a single Python component (plus about as much again of unit
test, etc, to no more than 6-7k lines including blanks/cmts/docstrings,
as above explained) is that those (say) 2.5k lines can do a hell of a
_LOT_ of stuff, quite comparable in my experience to 10k-15k lines of
C++ or Java (and more than that of C, of course) -- on the order of
magnitude of 200-300 function points at least. If you go much above
that, keeping the characteristics of cohesion and coherence becomes way
too hard. So, a 100kSLOC Python project will have at least about 40
components, and 10k or so FPs, where a Java project with the same line
count might typically have 2-3K FPs spread into, say, 15 components.
(I'm thinking of functional effective lines, net of testing, comments,
docstrings, or any kind of code instrumentation for debug/profile/&c).

In other words: the Python project is _way_ bigger in functionality, and
therefore in needed/opportune internal granularity, than the Java one
with the same SLOCs. Jones' estimates for Java's language level are
"10 to 20 function points per staff month". He doesn't estimate Python,
but if I'm right and the language level (FP/SLOC) is about 4-5 times
Java's, nevertheless according to Jones' tables that, per se, would only
push productivity to "30 to 50 function points per staff month" -- a
factor of less than three.

(( Of course, for both Java and Python, and also C, C++, etc,
superimposed on all of these productivity estimates there _is_ the
possibility of reuse of the huge libraries of code available for these
languages -- most of all for Python, which is well supplied with tools and
technologies to leech^H^H^H^H^H ahem, I mean, fruitfully reuse good
existing libraries almost regardless of what language the libraries were
originally made _for_. A reuse-oriented culture, particularly now that
so many good libraries are available under open-source terms, CAN in my
opinion easily boost overall productivity, in terms of functionality
delivered and deployed, by _AT LEAST_ a factor of 2 in any of these
languages. But this, in a way, is a different issue... ))

So people don't get me wrong: I love python. Most of my programming
friends call me "the python zealot" behind my back. I just don't think
it's the right tool for every problem.


Neither do I. The above project also involved some C and
some assembly, plus some Javascript and possibly something else
I've forgotten by now. We just made efforts to use Python *as
much as possible* and it paid off.


Hmmmm, yes, assembly may be unusual these days, but C extensions are
very common, pyrex ones rightfully becoming more so, Javascript quite
typical when you need to serve webpages that are richly interactive
without requiring round-trips to the server, and we shouldn't ignore the
role of XSLT and friends too. And what large project is without some
SQL? Exceedingly few, I think.

But Python can fruitfully sit in the center and easily amount to 80% or
90% of the codebase even in projects needing all of these other
technologies for specialized purposes...

Specifically, in my experience, statically-typed languages make it much
easier to say "okay, I'm fixing a bug in Class.Foo; here's all the
places where it's used." This lets me see how Foo is actually used --
in a perfect world, Foo's documentation is precise and up to date, but
I haven't worked anywhere that this was always the case -- which lets
me make my fix with a reasonable chance of not breaking anything.
Compile-time type checking increases those chances. Unit tests
increase that further, but relying on unit tests as your first and only
line of defense is suboptimal when there are better options.


But what if you already had tests which allowed you to do exactly
the thing you describe? Is there a need for "better options"
at that point? Are they really better? When I do TDD, I can
*trivially* catch all the cases where Class.Foo is used
because they are all exercised by the tests.


Absolutely. The main role of the unit tests is exactly to define all
the use cases of Foo and the expected results of such uses. If the unit
tests are decent, and with TDD they _will_ be, they suffice to let you
change Foo's internals without breaking Foo's uses (refactoring).

One thing unit tests can't do, and Foo's documentation cannot either, is
to find out if any of Foo's abilities are _totally unused_ -- for that,
you do need to scour the codebase. Trimming functionality that had
originally seemed necessary and was negotiated to be included, but turns
out to be overdesigned, is not a crucial activity (it's sure not worth
distorting a language to make such trimming faster), but it's a nice
periodic exercise. Anything that's excised from the code, and tests,
and internal docs, is so much less to maintain in the future. Of
course, you can't do that anyway if you "publish" components for outside
consumption by code you can't check or control; and even in a single
team situation you still need to check with others if they weren't
planning to use just tomorrow one of the capabilities you'd like to
remove today.

One interesting possibility is to instrument Foo to record all the uses
it gets, tracing them into a file or wherever, then run the system
through its paces -- all the unit tests of every component that depends
(even indirectly) on the one containing Foo, and all the existing
integration and acceptance tests. A profiler can typically do it for
you, in any language, when used in "code coverage" mode. If any part of
Foo's code has 0 coverage _except_ possibly by Foo's own unit tests,
that _does_ tell you something. And it need have nothing to do with
typing, of course. One case I recall from many years ago was something
like:

int foo(int x, int y) {
    if (x < 23) { /* small fast case, get out of the way quick */
        /* a dozen lines of code for the small fast case */
    } else { /* the real thing, get to work! */
        /* six dozen lines of code for the real thing */
    }
}

where the whole 'real thing' _never_ happened to be exercised. With a
little checking around, changing this to return an error code if x>=23
(it should never have happened, just as it never did) was a really nice
_snip_ (excised code goes to a vault and a pointer to it is left in a
comment here, of course, in case it's needed again in the future; but
meanwhile it doesn't need to get maintained or tested, maybe for years,
maybe forever...).

Furthermore, I can catch real bugs, not just typos and simple things
involving using the wrong type. A superset of the bugs your statically
typed language tools are letting you catch. But obviously
I'm rehashing the argument, and one which has been discussed
here many times, so I should let it go.


You surely won't get any disagreement from me about this -- and I don't
believe any static-typing enthusiast argues _against_ unit tests and
TDD, they just want BOTH, even though we claim (and C++/Java guru Robert
Martin himself strongly claims) that TDD and systematic unit testing
really makes static-typing rather redundant... you keep paying all the
price for that language feature, don't get much benefit in return.
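The "code coverage" approach described a few posts up can be tried with nothing more than Python's standard library trace module; the foo below is a Python stand-in for the two-branch C function in the anecdote, not code from the thread:

```python
# A sketch of hunting never-exercised branches with the stdlib trace
# module; foo() stands in for the two-branch function in the anecdote.
import trace

def foo(x):
    if x < 23:
        return x * 2      # small fast case
    else:
        return x * x      # "the real thing" -- perhaps never reached

tracer = trace.Trace(count=True, trace=False)
for x in (1, 5, 22):      # drive foo with a realistic workload
    tracer.runfunc(foo, x)

# counts maps (filename, lineno) -> execution count; lines of foo
# absent from it were never executed -- here, the x * x branch.
counts = tracer.results().counts
```

Zero coverage of a branch across all unit, integration, and acceptance runs is exactly the signal that flags a candidate for the vault.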

Having experience with both approaches, and choosing one over
the other, gives one greater credibility than having experience
with just one approach, yet clinging to it...


You are incorrect if you assume I am unfamiliar with python.


I assumed no such thing, just that you were unfamiliar with
large projects in Python and yet were advising the OP on its
suitability in that realm. You're bright and experienced, and
your comments have substance, but until you've actually
participated in a large project with Python and seen it fail
gloriously *because it was not statically typed*, I wouldn't
put much weight on your comments in this area if I were the
OP. That's all I was saying...


I would gladly accept as relevant experiences with other languages that
are strictly but dynamically typed, such as, say, Smalltalk or Ruby or
Erlang, if project failures (or even, short of failures, severe
productivity hits) can indeed be traced, despite proper TDD/unit
testing, to the lack of statically checked typing. I try to keep up
with the relevant literature (can't possibly manage for _all_ of it of
course) and don't recall any such experiences, but of course I may well
have missed some, particularly since not everything gets published.
Alex
Jul 18 '05 #31

"Alex Martelli" <al*****@yahoo.com> schreef in bericht
news:1gm21ye.rego7yswhcmsN%al*****@yahoo.com...
GerritM <gm*****@worldonline.nl> wrote: <...snip...>
design clarity and compactness does help tremendously. I would expect
for these reasons that Python is a big plus during system integration
of large systems.
Not as much as one might hope, in my experience. Protocol Adaptation
_would_ help (see PEP 246), but it would need to be widely deployed.

I think that I understand how PEP 246 might be an improvement over the
current situation. However, I think that Python 2.3 capabilities already
result in smaller programs and presumably less modules than their
equivalent in Java (or C++). The Objective-C system that we created
(360kloc in 1992, 600kloc in 1994) did have a significant amount of
classes that are today covered by the standard built-ins. I expect that
Java and C++ suffer from the same problem. The packages that I used a
long time ago in Java were less natural than today's Python packages
(this might be entirely different today, I haven't touched Java for
centuries, ehh years). My assumption is that integration problems are at
least proportional with the implementation size (in kloc). So my
unproven hypothesis is that since Python programs tend to be smaller
than their Java equivalent that the integration problems are smaller,
purely due to the size.

Of course design attention is required to cope with the "non-functional"
impact of Python, such as CPU and memory consumption. On top of that,
(run-time) instrumentation is very helpful. Here again the dynamic
nature of Python is a big plus.


But the extreme difficulty in keeping track of what amount of memory
goes where in what cases is a big minus. I recall similar problems with
Java, in my limited experience with it, but for Java I see now there are
commercial tools specifically to hunt down memory problems. In C++
there were actual _leaks_ which were a terrible problem for us, but
again pricey commercial technology came to the rescue.

In the same system mentioned above we built our own instrumentation. The
main part was based on inserting a small piece of administrative code at
every object creation and deletion. This Object Instantiation Tracing
proved to be a goldmine of information, including memory use. For
instance, the memory use of Lists and Dictionaries could be traced for
well-defined use cases. Besides this instrumentation, we did the memory
management of "bulkdata", such as images, explicitly. This helps to keep
the memory consumption within specified boundaries and it helps to
prevent memory fragmentation problems.
<...snip...> But memory is a _big_ problem, in my experience so far, with servers
meant to run a long time and having very large code bases. I'm sure
there IS a commercial niche for a _good_ general purpose Python tool to
keep track of memory consumption, equivalent to those available for C,
C++ and Java...

The investment in the tools mentioned above was relatively small.
However, this works only if the entire system is based on the same
architectural rules.

The additional challenge of Python relative to Objective-C is its garbage
collection. This indeed makes the memory behavior poorly predictable.

Some of the design aspects mentioned here are described in this chapter of
my PhD thesis:
http://www.extra.research.philips.co...ualViewPaper.pdf

kind regards, Gerrit
Jul 18 '05 #32

Alex Martelli wrote (a hell of a lot, as usual, and I do hope
he'll forgive me that I chose to skip/skim some material and
try merely to catch the highlights, doubtless missing some
interesting bits in the process):
Peter Hansen <pe***@engcorp.com> wrote:
And you've reemphasized my point. "Testing" is not test-driven
development. In fact, test-driven development is about *design*,
not just about testing. The two are related, but definitely not
Hmmm... the way I see it, it's about how one (or, better!!!, two: pair
programming is a GREAT idea) proceeds to _implement_ a design. The fact
that the package (or other kind of component) I'm writing will offer a
class Foo with a no-parameters constructor, methods A, B, and C with
parameters thus and thus, etc, has hopefully been determined and agreed


We don't really disagree on this point. I'd clarify my comments just
by saying that depending on what stage you are looking at, there
is always a preceding decision that could be called "design" and some
subsequent work that implements that design. If you are figuring out
what requirements your system should have, you are "designing" it for
your eventual users in a sense. If you are analysing requirements
later on and blocking out the major architectural areas and interfaces,
you are doing design, but then the traditional "designers" might still
have to go to work. Those designers (being the ones we usually saddle
with the title) then do "detailed design" and specify interfaces and
such as you note above, but they aren't yet doing implementation. Along
comes the programmer pair and they "design" the implementation in their
heads as they come to a failing acceptance test case, then conceive of
some units tests and some code, designing as they go.

In a nutshell, I was talking about that portion of design that occurs
when a good programmer goes to work figuring out just *how* she will
implement that method A with parameters x and y and a failing test
case that says it should act suchlike... Certainly TDD is not as
much about the more traditional design, the implementation of which
you refer to above.
TDD may surely help defining the internal structures and algorithms
within a single component, of course, if that's what you mean by design.
Yep... saw this while pruning your text. Had I read more thoroughly
the first time it would have saved all that typing, which I'm now
loath to remove. :-(
tried TDD, they are unlikely to lump it in with simple "unit testing"
as it has other properties that aren't obvious on the surface.


It sure beats "retrofitting" unit tests post facto. But I'm not sure
what properties you have in mind here; care to expand?


You've forgotten them at the moment, but I know you know about those
properties such as how TDD *forces* testability on the design/
implementation, and thus improves modularity, how it greatly reduces
the incentive and opportunity to gold-plate, how the most critical
tests are run hundreds or thousands of times during a project instead
of a handful of times just prior to shipping, and so forth.
was leading worked on code that, if I recall, was somewhat over
100,000 lines of Python code including tests. I don't recall
whether that number was the largest piece, or combining several
separate applications which ran together but in a distributed
system...

I think this qualifies as large, assuming the separate applications had
to cooperate reasonably closely (i.e. acting as "components", even
though maybe in separate processes and "only" touching on the same
database or set of files or whatever).


It was a true distributed system, so yes the components
closely cooperated. Acceptance tests actually ran both pieces
simultaneously, for the more complex tests, and in some few cases
even involved a simulator of the 16-bit embedded devices so that
the test case spanned four levels (web browser, server, third
piece, and the simulator for smaller gadgets). The simulator, of
course, was written in Python...

-Peter
Jul 18 '05 #33

GerritM <gm*****@worldonline.nl> wrote:
...
these reasons that Python is a big plus during system integration of
large

Not as much as one might hope, in my experience. Protocol Adaptation
_would_ help (see PEP 246), but it would need to be widely deployed.

I think that I understand how PEP 246 might be an improvement over the
current situation. However, I think that Python 2.3 capabilities already
result in smaller programs and presumably less modules than their equivalent


If you are aiming at a given fixed amount of functionality, yes: smaller
programs, and fewer modules (not in proportion, because each module
tends to be smaller). Modules aren't really the problem in _system
integration_, though; the unit that's developed together, tested
together, released together, is something a bit less definite that is
sometimes called a "component". It could be a module, more likely it
will be a small number of modules, perhaps grouped into a package.

One of my ideas is that a component needs to be cohesive and coherent.
I'm not alone in thinking that, at any rate. Therefore, the number of
components in a system with a given number of FP is weakly affected by
the language level of the chosen implementation language[s], because
each component cannot/shouldn't really have more than X function points,
even if using a very high level language means each component is
reasonably small. To get concrete, already in my previous post I gave
some numbers (indicative ones, of course): 200-300 FP per component,
meaning about 2k-3k SLOCs in Python (functional _application_ code, net
of tests, instrumentation, docs, comments, etc -- about 6k-7k lines as
wc counts them might be a reasonable rule of thumb, about half of them
being tests).

So, if you're building a 5000-FP system, you're going to end up with
about 20 components to integrate -- even though in Python that means 50k
lines of application code, and in Java or C++ it might well be 200k or
more. The design problem (partitioning) and the system integration may
end up being in the same order of magnitude, or the Python benefit might
be 20%, 30% tops, nothing like the 4:1 or 5:1 advantage you get in the
coding and testing of the specific single components.
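The arithmetic behind these figures can be made explicit; the SLOC-per-FP densities below are the rough language levels assumed in the posts above, not measured data:

```python
# Back-of-envelope version of the estimate above: a 5000-FP system,
# components capped near 250 FP for cohesion, two assumed language
# levels (SLOC per function point).
SYSTEM_FP = 5000
FP_PER_COMPONENT = 250
SLOC_PER_FP = {"python": 10, "java": 40}   # rough, assumed densities

components = SYSTEM_FP // FP_PER_COMPONENT          # same in either language
python_sloc = SYSTEM_FP * SLOC_PER_FP["python"]     # 50,000
java_sloc = SYSTEM_FP * SLOC_PER_FP["java"]         # 200,000

# A 4:1 gap in functional code size, yet the same ~20 components to
# integrate -- which is why the integration savings are much smaller
# than the coding savings.
assert components == 20 and java_sloc // python_sloc == 4
```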

My numbers may well be off (I'm trying to be concrete because it's too
easy to handwave away too much, in this field;-) but even if you double
component size and thus halve number of components in each language the
relative ratio remains the same. Python may gain some advantage by
making components that are a bit richer than the optimal size for Java
or C++ coded ones, but it's still not a many-times-to-one ratio as it is
for the pure issue of coding, in my experience.

I haven't touched Java for centuries, ehh years). My assumption is that
integration problems are at least proportional with the implementation size
(in kloc). So my unproven hypothesis is that since Python programs tend to
be smaller than their Java equivalent that the integration problems are
smaller, purely due to the size.
This is the crux of our disagreement. For a solid component built by
TDD, it's a second-order issue, from the POV of integrating it with the
other components with which it must interact in the overall system, how
big it is internally: the first order issue is, how rich is the
functionality the component supplies to other components, consumes from
them, internally implements. Integrating two components with the same
amount of functionality and equivalent interfaces between them, assuming
they're both developed solidly wrt the specs that are incarnated in each
component's unit-tests, is weakly dependent on the level of their
implementation languages.

Maybe I'm taking for granted a design approach that requires system
functionality to be well-partitioned among components interacting by
defined interfaces. But that's not a Python-specific issue: that's what
we were doing, albeit without a fully developed "ideology" to support
it, when in the 2nd half of the '90s Lakos'
milestone book (whose title is echoed in this thread's subject) arrived
to confirm and guide our thinking and practice on the subject. I'm sure
_survivable_ large systems must be developed along this kind of lines
(with many degrees of variation possible, of course) in any language.

In the same system mentioned above we built our own instrumentation. The
main part was based on inserting a small piece of administrative code at
every object creation and deletion. This Object Instantiation Tracing
proved to be ... The investment in the tools mentioned above was
relatively small. However, this works only if the entire system is based
on the same architectural rules.
Well, this last sentence might be the killer, since it looks like it
will in turn kill the project's ability to reuse the huge amount of good
code that's out there for the taking. If you have to invasively modify
the code you're reusing, reuse benefits drop and might disappear.

So I want instrumentation that need not be in the Python sources of
application and library and framework components (multiframework reuse
is also a crux for PEP 246), much as I have for coverage or profiling.
If all it takes is hacking on the Python internals to provide a mode
(perhaps a separate compilation) that calls some sys.newhook at every
creation, sys.delhook at every deletion, etc, then that would IMHO be a
quite reasonable price to pay, for example.
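Short of such interpreter-level creation/deletion hooks, the standard gc module already allows a non-invasive census of live container objects, with no changes to application or library sources; the census helper below is an illustrative sketch:

```python
# Non-invasive sketch: a census of live (gc-tracked) objects by type,
# via the standard gc module -- no source changes anywhere required.
import gc
from collections import Counter

def census(top=10):
    """Return the `top` most numerous object types currently alive."""
    tallies = Counter(type(obj).__name__ for obj in gc.get_objects())
    return tallies.most_common(top)

suspect = [[i] for i in range(10000)]   # simulate runaway list growth
snapshot = dict(census(50))
# 'list' should now dominate the census, pointing at the growth.
```

Taking such snapshots periodically in a long-running server and diffing them is a cheap first step toward the commercial-grade memory tools the post wishes for.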

The additional challenge of Python relative to Objective-C is its garbage
collection. This indeed makes the memory behavior poorly predictable.
Obj-C uses mark-and-sweep, right? Like Java? I'm not sure why
(reference counting bugs in badly tested extensions apart) Python's mix
of RC normally plus MS occasionally should be a handicap here.

Some of the design aspects mentioned here are described in this chapter of
my PhD thesis:
http://www.extra.research.philips.co...ualViewPaper.pdf


Tx, I'll be happy to study this.
Alex
Jul 18 '05 #34

In article <10********************@f14g2000cwb.googlegroups.com>,
Jonathan Ellis <jb*****@gmail.com> wrote:

What is the biggest system you have built with python personally? I'm
happy to be proven wrong, but honestly, the most enthusiastic "testing
solves all my problem" people I have seen haven't worked on anything
"large" -- and my definition of large agrees with Alex's; over 100
kloc, more than a handful of developers.


So you're saying that both attributes are necessary? (We're essentially
three programmers, but the codebase seems to be on the order of 150kloc,
about 2/3 of which is Python and the rest is HTML templates. I didn't
bother doing an exact check 'cause I'm in the middle of something else.)
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

WiFi is the SCSI of the 21st Century -- there are fundamental technical
reasons for sacrificing a goat. (with no apologies to John Woods)
Jul 18 '05 #35

On 22 Oct 2004 19:03:02 -0400, aa**@pythoncraft.com (Aahz) wrote:
solves all my problem" people I have seen haven't worked on anything
"large" -- and my definition of large agrees with Alex's; over 100
kloc, more than a handful of developers.
So you're saying that both attributes are necessary?


I think they are, because both people and code issues arise on
'large' projects (see The Mythical Man-Month for examples
of each type), although all things are relative. Our local
definition of project size is:

< 100Kloc = small
100K-1Mloc = Medium
> 1Mloc = large


We try to keep the large projects to less than 10 at any one
time...

Staffing sizes are 1-6 on small projects
4-30 on medium and typically 30-500 on large ones
(I'd guess most large projects are actually around 2-3 MLoc
and have about 60-100 developers, inc. dedicated testers.)

Our most common project size is 200-300K with about 10-20
developers. (and the preferred methodology is DSDM) We probably
have about 30-50 such projects running at any one time.

On that scale I use Python for prototyping "components" on
the medium-large stuff but it all gets built in C++ or Java.
The small projects could be in Perl, VB/ASP, PL/SQL or Java.
(Sadly Python is not an approved language for production -
yet...I'm working on it :-)

Alan G
Author of the Learn to Program website
http://www.freenetpages.co.uk/hp/alan.gauld/tutor2
Jul 18 '05 #36

On Sat, Oct 23, 2004 at 04:31:56PM +0000, Alan Gauld wrote:
< 100Kloc = small
100K-1Mloc = Medium
> 1Mloc = large


We try to keep the large projects to less than 10 at any one
time...

Staffing sizes are 1-6 on small projects
4-30 on medium and typically 30-500 on large ones
(I'd guess most large projects are actually around 2-3 MLoc
and have about 60-100 developers, inc. dedicated testers.)

Our most common project size is 200-300K with about 10-20
developers. (and the preferred methodology is DSDM) We probably
have about 30-50 such projects running at any one time.

Holy such-and-such, how many developers do you have? and
isn't it more like thirty+ companies under one roof?

I've mainly worked for dot-coms (and most of them startups)
but your coordination overhead must be just staggering. I work/worked
for small companies because I prefer it, but whoa...

-Jack
Jul 18 '05 #37
