473,785 Members | 3,285 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Standard Forth versus Python: a case study

I realized that I have a little job on the table that is a fine test of
the Python versus Standard Forth code availability and reusability issue.

Note that I have little experience with either Python or Standard Forth
(but I have much experience with a very nonstandard Forth). I've noodled
around a bit with both gforth and Python, but I've never done a serious
application in either. In my heart, I'm more of a Forth fan: Python is a
bit too much of a black box for my taste. But in the end, I have work to
get done.

The problem:

I have a bunch of image files in FITS format. For each raster row in
each file, I need to determine the median pixel value and subtract it
from all of the pixels in that row, and then write out the results as
new FITS files.

This is a real problem I need to solve, not a made-up toy problem. I was
originally thinking of solving it in C (I know where to get the pieces
in that language), but it seemed like a good test problem for the Python
versus Forth issue.

I looked to import FITS reading/writing, array manipulation, and median
determination. From there, the solution should be pretty easy.

So, first look for a median function in Python. A little googling finds:

http://www.astro.cornell.edu/staff/loredo/statpy/

Wow! This is good stuff! An embarrassment of riches here! There are even
several FITS modules, and I wasn't even looking for those yet. And just
for further gratification, the page's author is an old student of mine
(but I'll try not to let this influence my attitude). So, I followed the
link to:

http://www.nmr.mgh.harvard.edu/Neura...ry/python.html

From there, I downloaded stats.py, and the two other modules the page
says it requires, and installed them in my site-packages directory. Then
"from stats import median" got me a function to approximately determine
the median of a list. It just worked. The approximation is good enough
for my purposes here.

Pyfits required a little more resourcefulness , in part because STSCI's
ftp server was down yesterday, but I got it installed too. It helps that
when something is missing, the error message gives you a module name. It
needs the numarray module, so I got array manipulation as a side effect.

I haven't finished the program, but I've tried out the pieces and all
looks well here.

OK, now for Forth. Googling for "forth dup swap median" easily found:

http://www.taygeta.com/fsl/library/find.seq

At first blush, this looked really good for Forth. The search zeroed in
on just what I wanted, no extras. The algorithm is better than the one
in the Python stats module: it gives exact results, so there's no need
to check that an approximation is good enough. But then, the
disappointment came.

What do you do with this file? It documents the words it depends on, but
not where to get them. I'm looking at a significant effort to assemble
the pieces here, an effort I didn't suffer through with Python. So, my
first question was: "Is it worth it?".

The answer came from searching for FITS support in Forth. If it exists
in public, it must be really well hidden. That's a "show stopper", so
there was no point in pursuing the Forth approach further.

In the end, it was like comparing a muzzle-loading sharpshooter's rifle
with a machine gun: Forth got off one really good shot, but Python just
mowed the problems down.

The advocates of the idea that Standard Forth has been a boon to code
reusability seem mostly to be people with large private libraries of
Forth legacy code. No doubt to them it really has been a boon. But I
think this little experiment shows that for the rest of us, Python has a
published base of reusable code that puts Forth to shame.

--
John Doty, Noqsi Aerospace, Ltd.
--
Specialization is for robots.
Oct 11 '06
42 5463
On 12 Oct 2006 04:40:32 -0700, Paul Rubin
<"http://phr.cx"@nospam. invalidwrote:
Paul Rubin <http://ph****@NOSPAM.i nvalidwrites:
fun median {
var x = 0.
while( *p++) {
if( (*p) x) x = *p.
}
return x.
}
I count two variables, p and x.

Also, that finds the maximum, not the median. I had stopped examining
it after seeing it used more than one variable.
Um... isn't 'p' the list in question? There is only one variable used
for the calculation of the return value (yes, it's a maximum), and
that's 'x'.
--

# p.d.
Oct 12 '06 #21
Peter Decker wrote:
>Also, that finds the maximum, not the median. I had stopped examining
it after seeing it used more than one variable.

Um... isn't 'p' the list in question?
no, it's a pointer to the current item in the list.

</F>

Oct 12 '06 #22
Ian McConnell wrote:
If you can use Psyco and your FITS lines are really long (well, maybe
too much, the treshold if about >~3000 in my PC) you can use something
like this instead the builtin timsort:
http://aspn.activestate.com/ASPN/Coo.../Recipe/466330
(To compute the median on a image line the median that uses sort is
probably better in most cases, expecially if you use the built in sort
of numerical libraries.)

sort() sorts all of the data, but you're only after one or two numbers, so
the MODFIND method may be faster for the median:
The modified quicksort I have shown in the cookbook (466330) is O(n)
too, and it modifies the list in place, so you can apply it twice for
lists of even len.

Bye,
bearophile

Oct 12 '06 #23
<id****@gmail.c omwrote in message
news:11******** **************@ i42g2000cwa.goo glegroups.com.. .
Paul McGuire wrote:
><id****@gmail. comwrote in message
news:11******* **************@ h48g2000cwc.goo glegroups.com.. .
[snip]

no sort() is needed to calculate the median of a list.

you just need one temp var.

Ok, I'll bite. How do you compute the median of a list using just a
single
temp var?

-- Paul

hi Paul; well when this was a stats-class assignment (back when pascal
was popular :) i just stepped through the vector and compared it

(pseudo-code)

ptr p = [with values].

fun median {
var x = 0.
while( *p++) {
if( (*p) x) x = *p.
}
return x.
}

of course, pascal is more verbose but that's median()
No, that's the maximum. The median value is the value that is in the middle
of the list when the list is sorted. Many analyses prefer median to mean
(also known as "average") because the median is less sensitive to wild
outlier points.

My original question was in response to your post, that sort() wasn't
required but only a temp variable. I am very interested in seeing your
solution that does not require the data to be sorted. (This is not just an
academic exercise - given a large historical data set, sorting the data is
one of the costliest parts of computing the median, and I would greatly
appreciate seeing an alternative algorithm.)

-- Paul
Oct 12 '06 #24
Paul McGuire wrote:
My original question was in response to your post, that sort() wasn't required but only a temp
variable. I am very interested in seeing your solution that does not require the data to be
sorted. (This is not just an academic exercise - given a large historical data set, sorting the
data is one of the costliest parts of computing the median, and I would greatly appreciate seeing
an alternative algorithm.)
if you absolutely definitely cannot afford to modify or copy the input data set, but can
read the data sequentially multiple times reasonably fast, you can do what's basically a
binary search for the median, by counting how many values you have that's above or
below the current guess, and repeating until you find the right value. see e.g.

http://ndevilla.free.fr/median/median/src/torben.c

</F>

Oct 12 '06 #25
"Paul McGuire" <pt***@austin.r r._bogus_.comwr ites:
My original question was in response to your post, that sort() wasn't
required but only a temp variable. I am very interested in seeing your
solution that does not require the data to be sorted. (This is not just an
academic exercise - given a large historical data set, sorting the data is
one of the costliest parts of computing the median, and I would greatly
appreciate seeing an alternative algorithm.)
There are well known algorithms for finding the median in expected
O(n) time, the most straightforward of which is a modified quicksort.
You do the traditional quicksort pivot step, figure out which
partition the median is in, and recurse on just that partition instead
of on both of them. Expected running time is n + n/2 + n/4 + ...
which is approx 2n.

Tarjan discovered a guaranteed O(n) algorithm in the 1970's(?) whose
operation is much different and quite complex. But all of these need
more than one temp var. See an algorithms book like CLRS or Knuth
for more info.
Oct 12 '06 #26
Paul Rubin <http://ph****@NOSPAM.i nvalidwrites:
Tarjan discovered a guaranteed O(n) algorithm in the 1970's(?) whose
operation is much different and quite complex. But all of these need
more than one temp var. See an algorithms book like CLRS or Knuth
for more info.
Ehh, make that Blum, Floyd, Pratt, Rivest, and Tarjan, and the main
different part is selecting the pivot, plus the complexity analysis.

http://www.cs.cmu.edu/afs/cs.cmu.edu...s/lect0907.pdf

http://en.wikipedia.org/wiki/Selection_algorithm (see "median of
medians algorithm")
Oct 12 '06 #27

"Fredrik Lundh" <fr*****@python ware.comwrote in message
news:ma******** *************** *************** @python.org...
Paul McGuire wrote:
>My original question was in response to your post, that sort() wasn't
required but only a temp variable. I am very interested in seeing your
solution that does not require the data to be sorted. (This is not just
an academic exercise - given a large historical data set, sorting the
data is one of the costliest parts of computing the median, and I would
greatly appreciate seeing an alternative algorithm.)

if you absolutely definitely cannot afford to modify or copy the input
data set, but can
read the data sequentially multiple times reasonably fast, you can do
what's basically a
binary search for the median, by counting how many values you have that's
above or
below the current guess, and repeating until you find the right value.
see e.g.

http://ndevilla.free.fr/median/median/src/torben.c

</F>
Thanks!

-- Paul
Oct 12 '06 #28
On 2006-10-12, Paul Rubin <httpwrote:
Tarjan discovered a guaranteed O(n) algorithm in the 1970's(?)
Huhn! I thought Tarjan was just the big bad evil guy in Bard's
Tale 2 who was creating eternal winter. I'm glad he also
contributed to our stock of *useful* algorithms.

--
Neil Cerutti
We will not have an all volunteer army. We *will* have an all
volunteer army. --George W. Bush
Oct 12 '06 #29
Paul McGuire wrote:
<id****@gmail.c omwrote in message
news:11******** **************@ i42g2000cwa.goo glegroups.com.. .
>Paul McGuire wrote:
>><id****@gmail .comwrote in message
news:11****** *************** @h48g2000cwc.go oglegroups.com. ..
[snip]

no sort() is needed to calculate the median of a list.

you just need one temp var.

Ok, I'll bite. How do you compute the median of a list using just a
single
temp var?

-- Paul
hi Paul; well when this was a stats-class assignment (back when pascal
was popular :) i just stepped through the vector and compared it

(pseudo-code)

ptr p = [with values].

fun median {
var x = 0.
while( *p++) {
if( (*p) x) x = *p.
}
return x.
}

of course, pascal is more verbose but that's median()

No, that's the maximum. The median value is the value that is in the middle
of the list when the list is sorted. Many analyses prefer median to mean
(also known as "average") because the median is less sensitive to wild
outlier points.

My original question was in response to your post, that sort() wasn't
required but only a temp variable. I am very interested in seeing your
solution that does not require the data to be sorted. (This is not just an
academic exercise - given a large historical data set, sorting the data is
one of the costliest parts of computing the median, and I would greatly
appreciate seeing an alternative algorithm.)
Here's a K&R C function I wrote almost 20 years ago. It's a general
purpose quantile. The essential idea is to choose an element at random
(thus mostly avoiding perverse behavior with already sorted data) and
divide the set into two pieces around it. Then you figure out which
piece contains the quantile you want, and what quantile it is within
that piece, and recurse. When you see enough identical elements in the
piece you're processing, it's done. In the extreme case you'll get down
to one element.

ixrand(n) generates a random integer in the range 0..n-1. I think that's
the only nonstandard function used.

The style is torqued by what Unisoft C could and couldn't optimize: I
wouldn't write it quite like that today. One of my students pointed out
that all of the recursion is tail recursion so it should be easy to
flatten. Left as an exercise to the reader...

Somewhere, in Knuth I think, I saw a somewhat similar algorithm that
promised a little better performance by estimating the median from a
sample of the data, breaking the data up there, and then looking for a
quantile (statistically guaranteed to be) close to the min or max of one
of the subsets.

It shouldn't be hard to recode in Python, Forth, or whatever. That
wasn't my purpose in the exercise that started the thread though: I
wanted to see if I could import modules good enough to do the job from
public sources. In Python I could, and the entire image processing
program is 15 lines. In Forth I couldn't.

Anyway, here it is:

/* Find the nth from the minimum value in an array */
/* Monte Carlo method intended for finding medians */
/* 2/13/85 jpd */

/* For random data, this routine takes about */
/* 2.6*numdata + O( log( numdata )) comparisons */
/* If the data is tightly clustered about the mean, */
/* there is a speedup; it may take as few as
/* 0.5*numdata comparisons. */
/* There is a slight penalty if the array is completely */
/* or partially sorted; it is at most 25%. */

/* NTH will be nthi, nths, etc., depending on DATATYPE */

NTH( data, numdata, n )
DATATYPE data[]; /* Data array (will be scrambled on return) */
int numdata; /* lemgth of data array */
int n; /* index if item to find:
1 <= n <= numdata */
{
register DATATYPE boundary, thisdata;
register DATATYPE *lowp, *highp;
DATATYPE v1, v2;
int nlowbin;

lowp = data; /* Init data pointer */

v1 = data[ ixrand( numdata )];
{
register DATATYPE v1r = v1;
int nc = 1 + numdata - n; /* "Complement " of n */

if( nc n )
highp = lowp + nc;
else
highp = lowp + n; /* Limit to test for done */

/* Scan for the first point which
doesn't match the boundary point.
If we encounter enough
matching points,
the boundary is the answer */
while( *lowp++ == v1r ) {
if( lowp >= highp ) return( v1r );
}
v2 = *--lowp; /* Back up to get point */
}

boundary = ( v1 >1 ) + ( v2 >1 ); /* Beware overflows */

highp = data + numdata; /* Now process the whole thing */
thisdata = *lowp; /* Prime the pump */

if( v2 < v1 ) { /* Bin 2 is low bin */
for( ; lowp < highp; thisdata = *lowp ) {
if( thisdata <= boundary ) { /* Bin 2 */
*lowp = *--highp; /* Exchange */
*highp = thisdata;
}
else ++lowp; /* Data point in right place */
}

nlowbin = numdata - ( lowp - data );
if( nlowbin >= n ) return( NTH( highp, nlowbin, n ));
else return( NTH( data, lowp - data, n - nlowbin ));

}
else { /* Primary bin is low bin */
for( ; lowp < highp; thisdata = *lowp ) {
if( thisdata boundary ) { /* Bin 2 */
*lowp = *--highp; /* Exchange */
*highp = thisdata;
}
else ++lowp; /* Don't move point */
}

nlowbin = ( lowp - data );
if( nlowbin >= n ) return( NTH( data, nlowbin, n ));
else return( NTH( highp, numdata - nlowbin, n - nlowbin ));
}
}


--
John Doty, Noqsi Aerospace, Ltd.
--
Specialization is for robots.
Oct 12 '06 #30

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

699
34238
by: mike420 | last post by:
I think everyone who used Python will agree that its syntax is the best thing going for it. It is very readable and easy for everyone to learn. But, Python does not a have very good macro capabilities, unfortunately. I'd like to know if it may be possible to add a powerful macro system to Python, while keeping its amazing syntax, and if it could be possible to add Pythonistic syntax to Lisp or Scheme, while keeping all of the...
4
2052
by: John Benson | last post by:
Since I've seen some Forth expertise/interest in evidence here, so I'll permit myself some observations on Lua/Python/Forth. Lua's scripting command syntax is pretty simple, though different from Python's. It is reference- and not variable-oriented (that is, "variables" are merely holders for references, not values) and smacks of Python in this regard. Functions are first-class objects; again, very similar to Python. The big...
38
3739
by: kbass | last post by:
In different articles that I have read, persons have constantly eluded to the productivity gains of Python. One person stated that Python's productivity gain was 5 to 10 times over Java in some in some cases. The strange thing that I have noticed is that there were no examples of this productivity gain (i.e., projects, programs, etc.,...). Can someone give me some real life examples of productivity gains using Python as opposed other...
38
4812
by: BORT | last post by:
Please forgive me if this is TOO newbie-ish. I am toying with the idea of teaching my ten year old a little about programming. I started my search with something like "best FREE programming language for kids." After MUCH clicking and high-level scanning, I am looking at Python and Forth. Both have advocates that say each is a great approach to learning computers. My programming classes were a long, long time ago in a land far, far...
8
1431
by: block111 | last post by:
I'm a student in a university and today we had a final exam in c++. Basicly, I was amazed by the amount of lame and invalid examples/questions. There was a question that in practice works but is a really big mistake (as far as I'm concerned). So, I'll need to prove why exactly it was wrong. Here is the example: int *p1, *p2, x = 3; p1 = new int; *p1 = 60;
43
5026
by: Steven T. Hatton | last post by:
Now that I have a better grasp of the scope and capabilities of the C++ Standard Library, I understand that products such as Qt actually provide much of the same functionality through their own libraries. I'm not sure if that's a good thing or not. AFAIK, most of Qt is compatable with the Standard Library. That is, QLT can interoperate with STL, and you can convert back and forth between std::string and Qt::QString, etc. Are there any...
2
1818
by: aum | last post by:
On Mon, 18 Sep 2006 06:16:29 -0700, Jake Barnes wrote: Thanks for that, Jake, and for your more balanced take on the issue of 'dressing up' Javascript to resemble the style of other languages like Python or Ruby. In Javascript, I see a lot of philosophical similarity to Forth. Both Forth and Javascript are extremely 'liberal' languages of immense expressive power, which lend themselves to highly idiomatic styles. Lua
6
2952
by: toton | last post by:
Hi, Anyone have a link to comparative study of different C++ compilers and how much they conform to C++ language standard? Most of the big platforms I know have GCC which well supports C++ standard. But mainly looking for compilers for small platforms like ARM, XScale, OMAP processors, and mobile OS's. I am not getting enough imformation which of the platforms HAVE a C++ compiler/cross compiler and how much they support C++ standard. It...
85
4874
by: fermineutron | last post by:
Some compilers support __asm{ } statement which allows integration of C and raw assembly code. A while back I asked a question about such syntax and was told that __asm is not a part of a C standard. My question now is: Is there a chance that such statement will become a part of C standard in the future? In some cases using asm language is the best way to acomplish some small task, hence integration of C and asm would greatly enhence C,...
0
9643
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
10315
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
10147
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
10085
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
8968
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
7494
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
5379
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
5511
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
2
3645
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.