By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
454,519 Members | 1,792 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 454,519 IT Pros & Developers. It's quick & easy.

Strings and Lists

P: n/a
My current Python project involves lots repeatating code blocks,
mainly centred around a binary string of data. It's a genetic
algorithm in which there are lots of strings (the chromosomes) which
get mixed, mutated and compared a lot.

Given Python's great list processing abilities and the relative
inefficiencies some string operations, I was considering using a list
of True and False values rather than a binary string.

I somehow doubt there would be a clear-cut answer to this, but from
this description, does anyone have any reason to think that one way
would be much more efficient than the other? (I realise the best way
would be to do both and `timeit` to see which is faster, but it's a
sizeable program and if anybody considers it a no-brainer I'd much
rather know now!)

Any advice would be gladly recieved.
Jul 19 '05 #1
Share this Question
Share on Google+
6 Replies


P: n/a
Hello Tom,

I think it is more efficient if we can use list (with True,False)
member to do genetics algorithms. Of course a lot of works to do to
change from string binary into boolean list.

I do programming genetics algorithms in C# I guess I have to modify my
program also because my old program use binary string manipulation.

Sincerely Yours,
Pujo

Jul 19 '05 #2

P: n/a
Hi,
I not sure what sorts of operations you plan to do. But if you
intend to use fixed length arrays or even carrying out repetetive
operations. You should probably look at numeric
http://numeric.scipy.org/
On 18 Apr 2005 04:42:17 -0700, Tom Longridge <to**********@gmail.com> wrote:
My current Python project involves lots repeatating code blocks,
mainly centred around a binary string of data. It's a genetic
algorithm in which there are lots of strings (the chromosomes) which
get mixed, mutated and compared a lot.

Given Python's great list processing abilities and the relative
inefficiencies some string operations, I was considering using a list
of True and False values rather than a binary string.

I somehow doubt there would be a clear-cut answer to this, but from
this description, does anyone have any reason to think that one way
would be much more efficient than the other? (I realise the best way
would be to do both and `timeit` to see which is faster, but it's a
sizeable program and if anybody considers it a no-brainer I'd much
rather know now!)

Any advice would be gladly recieved.
--
http://mail.python.org/mailman/listinfo/python-list

--
http://blogs.applibase.net/sidharth
Jul 19 '05 #3

P: n/a
On 18 Apr 2005 04:42:17 -0700, Tom Longridge <to**********@gmail.com> wrote:
My current Python project involves lots repeatating code blocks,
mainly centred around a binary string of data. It's a genetic
algorithm in which there are lots of strings (the chromosomes) which
get mixed, mutated and compared a lot.

Given Python's great list processing abilities and the relative
inefficiencies some string operations, I was considering using a list
of True and False values rather than a binary string.

I somehow doubt there would be a clear-cut answer to this, but from
this description, does anyone have any reason to think that one way
would be much more efficient than the other? (I realise the best way
would be to do both and `timeit` to see which is faster, but it's a
sizeable program and if anybody considers it a no-brainer I'd much
rather know now!)

Any advice would be gladly recieved.
--
http://mail.python.org/mailman/listinfo/python-list


Tom,

it seems to me that, if efficiency is your main goal, you should store
your data as a list of integers, and use the bit-twiddling operators
to get at your data. These should be *very* fast, as well as memory
efficient.

Peace
Bill Mill
bill.mill at gmail.com
Jul 19 '05 #4

P: n/a
Tom Longridge wrote:
My current Python project involves lots repeatating code blocks,
mainly centred around a binary string of data. It's a genetic
algorithm in which there are lots of strings (the chromosomes) which
get mixed, mutated and compared a lot.

Given Python's great list processing abilities and the relative
inefficiencies some string operations, I was considering using a list
of True and False values rather than a binary string.

I somehow doubt there would be a clear-cut answer to this, but from
this description, does anyone have any reason to think that one way
would be much more efficient than the other? (I realise the best way
would be to do both and `timeit` to see which is faster, but it's a
sizeable program and if anybody considers it a no-brainer I'd much
rather know now!)

Any advice would be gladly recieved.


It depends more on the operations you are performing, and
more importantly *which of those you have measured and
found to be slow*, than on anything else.

If, for example, you've got a particular complex set of
slicing, dicing, and mutating operations going on, then
that might say use one type of data structure.

If, on the other hand, the evaluation of the fitness
function is what is taking most of the time, then you
should focus on what that algorithm does and needs
and, after profiling (to get real data rather than
shots-in-the-dark), you can pick an appropriate data
structure for optimizing those operations.

Strings seem to be what people pick, by default, without
much thought, but I doubt they're the right thing
for the majority of GA work...

-Peter
Jul 19 '05 #5

P: n/a
In article <d7**************************@posting.google.com >,
to**********@gmail.com (Tom Longridge) wrote:
My current Python project involves lots repeatating code blocks,
mainly centred around a binary string of data. It's a genetic
algorithm in which there are lots of strings (the chromosomes) which
get mixed, mutated and compared a lot.

Given Python's great list processing abilities and the relative
inefficiencies some string operations, I was considering using a list
of True and False values rather than a binary string.

I somehow doubt there would be a clear-cut answer to this, but from
this description, does anyone have any reason to think that one way
would be much more efficient than the other? (I realise the best way
would be to do both and `timeit` to see which is faster, but it's a
sizeable program and if anybody considers it a no-brainer I'd much
rather know now!)


I make no representations about how much mileage you would
get out of it, but if character data suits your purposes and
you just need a mutable array - in principle, I think it would
be more efficient to use an array. Like,
import array
a = array.array('c', strdata)

As I understand it, this object simply contains character data,
not a list of 1 character string objects, so it's much more
economical to store and examine.

Donn Cave, do**@u.washington.edu
Jul 19 '05 #6

P: n/a
Thank you all very much for your responses. It's especially reassuring
to hear about other Python GA's as I have had some scepticism about
Python's speed (or lack of it) being too big a problem for such an
application.

With regard to using numeric, arrays or integer lists -- I didn't
mention that these strings can also contain wild cards (so I suppose
it's not really binary -- sorry). This is traditionally done using a
'#' symbol, but I was imagining using a value of None in a boolean list
to represent this. Also there is currently a fair bit of research going
into other representations (floating-point values, paired values etc)
so I was hoping to be able to keep my framework extensible for the
future.

Many thanks again for your help. I will ``take the plunge'' and give
the boolean list a go I think!

Jul 19 '05 #7

This discussion thread is closed

Replies have been disabled for this discussion.