473,243 Members | 1,578 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,243 software developers and data experts.

Generating a large random string

Aloha,
i wanted to ask another problem, but as i started to build an example...

How to generate (memory and time)-efficient a string containing
random characters? I have never worked with generators, so my solution
at the moment is:

import string
import random
random.seed(14)
d = [random.choice(string.letters) for x in xrange(3000)]
s = "".join(d)
print s

which is feasible for the 3000, but i need lengths in the range
10.000 to 1.000.000.

Hoping for an answer and wishing a happy day
LOBI
Jul 18 '05 #1
21 23037
How are you planning on using this large random string? If you are just
planning on printing, you can write:

for i in xrange(100000):
sys.stdout.write(random.choice(string.letters))

and it will dump lots of garbage to the screen pretty quickly. If you're
looking to do something else with the data, be specific about what you're
trying to do.

Chris

"Andreas Lobinger" <an**************@netsurf.de> wrote in message
news:40***************@netsurf.de...
Aloha,
i wanted to ask another problem, but as i started to build an example...

How to generate (memory and time)-efficient a string containing
random characters? I have never worked with generators, so my solution
at the moment is:

import string
import random
random.seed(14)
d = [random.choice(string.letters) for x in xrange(3000)]
s = "".join(d)
print s

which is feasible for the 3000, but i need lengths in the range
10.000 to 1.000.000.

Hoping for an answer and wishing a happy day
LOBI

Jul 18 '05 #2
"Andreas Lobinger" <an**************@netsurf.de> wrote in message
news:40***************@netsurf.de...
[snip]
How to generate (memory and time)-efficient a string containing
random characters? I have never worked with generators, so my solution
at the moment is:

import string
import random
random.seed(14)
d = [random.choice(string.letters) for x in xrange(3000)]
s = "".join(d)
print s

which is feasible for the 3000, but i need lengths in the range
10.000 to 1.000.000.

[snip]

There are several things to try, but here's one attempt that's
relatively fast but whose time (and size) still grow linearly
with the size of n:
from string import letters
from random import choice, sample, seed

# Note: should probably use timeit.py but this will do ...
from time import clock as now

n = 1000000

# your approach
seed(14)
start = now()
s = ''.join([choice(letters) for i in xrange(n)])
took = now() - start
print "old way n: %d took: %2.2fs"%(n, took)

# different approach
seed(14)
# add 1 so population > sample size (n)
factor = n/len(letters) + 1
start = now()
s = ''.join(sample(letters*factor, n))
took = now() - start
print "new way n: %d took: %2.2fs"%(n, took)
# Output: tested on Windows 98 500+mhz 128MB
old way n: 1000000 took: 23.94s
new way n: 1000000 took: 8.90s

There's a start ...

Sean
Jul 18 '05 #3
[snip]

# different approach
seed(14)
# add 1 so population > sample size (n)
factor = n/len(letters) + 1
start = now()
s = ''.join(sample(letters*factor, n))
took = now() - start
print "new way n: %d took: %2.2fs"%(n, took)

[snip]

There's a problem with this method as it appears
random.sample is done without replacement, e.g.
from random import sample
sample(range(10), 10) [9, 5, 3, 2, 0, 8, 4, 1, 6, 7]


Sorry about that,
Sean
Jul 18 '05 #4
Here's a sample with replacement, where k can be greater than
len(population):

from random import random, seed
from string import letters
from time import clock as now

# derived from random.sample code ... scarcely tested
def sample(population, k):
"""Chooses k random elements from a population sequence. """
n = len(population)
result = [None] * k
for i in xrange(k):
j = int(random() * n)
result[i] = population[j]
return result

n = 1000000
seed(14)
start = now()
s = ''.join(sample(letters, n))
took = now() - start
print "sample with replacement n: %d took: %2.2fs"%(n, took)

# Output
sample with replacement n: 1000000 took: 7.68s


Even faster than before, and more correct to boot, heh.

Jul 18 '05 #5
I'm pretty sure that the two methods you mention don't give results with
the same distribution.

Instead of letters, let's choose from "01", and choose a string of
length 2.

def f1():
return ''.join([choice("01") for i in xrange(2)])

def f2():
return ''.join(sample("01" * 2, 2))

def test():
for f in (p1, p2):
p = {}
for i in range(1000):
o = f()
p[o] = p.get(o, 0) + 1
print p

[jepler@parrot src]$ ./python /tmp/ross.py
{'11': 25212, '10': 24904, '00': 24996, '01': 24888}
{'11': 16800, '10': 33289, '00': 16705, '01': 33206}

As you can see p1 picks evenly from the 4 alternatives, while p2 favors
the outcomes that have different contents (01 and 10) over the ones that
are a repetition of the same symbol (00 and 11).
Jeff

Jul 18 '05 #6
Andreas Lobinger <an**************@netsurf.de> writes:
How to generate (memory and time)-efficient a string containing
random characters? I have never worked with generators, so my solution
at the moment is:
What do you want to do with the random characters, and what OS are you
running on?
import string
import random
If you're using the random characters for some security-related purpose,
you shouldn't use the random module.

If you're running Linux or *BSD, the simplest way to get good quality
random characters is something like

s = open("/dev/urandom").read(3000)

or however many characters you want. For Windows, it's harder.
random.seed(14)
d = [random.choice(string.letters) for x in xrange(3000)]
s = "".join(d)
print s which is feasible for the 3000, but i need lengths in the range
10.000 to 1.000.000.


Try something like:

import array
from random import randint
d = array.array('B')
for i in xrange(1000000):
d.append(randint(0,255))
s = d.tostring()
Jul 18 '05 #7

"Jeff Epler" <je****@unpythonic.net> wrote in message
news:ma*************************************@pytho n.org...
I'm pretty sure that the two methods you mention don't give results with
the same distribution.

[snip]

Yep. I made an error choosing random.sample() as its sampling is
done without replacement. See my subsequent post with a sample
function that uses replacement and allows for sampling beyond the
population size. That version has a better distribution:

{'11': 266, '10': 251, '00': 245, '01': 238}
Jul 18 '05 #8
"Paul Rubin" <http://ph****@NOSPAM.invalid> wrote in message
news:7x************@ruckus.brouhaha.com...
[snip]

import array
from random import randint
d = array.array('B')
for i in xrange(1000000):
d.append(randint(0,255))
s = d.tostring()


Hi.

Actually, that is pretty slow: 42.79s (on my machine)

The OP's solution took 23.94s
My sampling with replacement took 7.68s

No doubt someone else will come up with something faster.

I'd be interested to see how

s = open("/dev/urandom").read(3000)

compares, and, if better, whether something similar can
be done on Windows.


Jul 18 '05 #9
"Sean Ross" <sr***@connectmail.carleton.ca> writes:
I'd be interested to see how

s = open("/dev/urandom").read(3000)

compares, and, if better, whether something similar can
be done on Windows.


On Linux, this is almost instantaneous. Reading 1 megabyte takes
under a second on my 700 mhz P3.
Jul 18 '05 #10
"Paul Rubin" <http://ph****@NOSPAM.invalid> wrote in message
news:7x************@ruckus.brouhaha.com...
"Sean Ross" <sr***@connectmail.carleton.ca> writes:
I'd be interested to see how

s = open("/dev/urandom").read(3000)

compares, and, if better, whether something similar can
be done on Windows.


On Linux, this is almost instantaneous. Reading 1 megabyte takes
under a second on my 700 mhz P3.


And when you've done

s = open("/dev/urandom").read(1000000)

is s a string containing one million letters [a-zA-Z] and no other
types of characters, as the OP is looking for?

I've never used /dev/urandom, so I don't know what kind of stuff
is being read from there - but I suspect further processing would
be needed to meet the OP's requirements. I'll check it out.

Thanks
Sean
Jul 18 '05 #11
"Sean Ross" <sr***@connectmail.carleton.ca> writes:
s = open("/dev/urandom").read(1000000)

is s a string containing one million letters [a-zA-Z] and no other
types of characters, as the OP is looking for?


Oh oops, I didn't catch that part. It's a million random bytes.
Sorry.
Jul 18 '05 #12
"Sean Ross" <sr***@connectmail.carleton.ca> writes:
And when you've done

s = open("/dev/urandom").read(1000000)

is s a string containing one million letters [a-zA-Z] and no other
types of characters, as the OP is looking for?


Oops, per other post, it gives strings of bytes and needs filtering.
The following runs in about 1.2 seconds on my machine, but has an
small (infinitesimal) chance of failure:

import string,array,time
t=time.time()
ttab = string.letters*4 + '\0'*48
a = array.array('B', open("/dev/urandom").read(1500000).translate(ttab))
a = array.array('B', filter(abs,a)).tostring()[:1000000]
print time.time()-t
Jul 18 '05 #13
In article <40***************@netsurf.de>,
Andreas Lobinger <an**************@netsurf.de> wrote:
How to generate (memory and time)-efficient a string containing
random characters?


Find a perl file on your system and read it?
Jul 18 '05 #14
Aloha,

Andreas Lobinger schrieb:
How to generate (memory and time)-efficient a string containing
random characters? d = [random.choice(string.letters) for x in xrange(3000)]
s = "".join(d)


1) Sorry for starting a new thread, but there were calls for more spec.
2) Thanks for all replies so far.

To be more specific about
- OS/availability of /dev/random
Im looking for an elegant solution for all types. At the moment i'm
developing in parallel on a slow (250MHz) i86/Win95 notebook, a typical
i86/linux box and a couple of SUNs (up to 16GB main-mem...).

- using random / crypto-safe-strings
The use of random is intended, because it generates a known sequence.
I can store the seed an re-generate the sequence any time, and afaik
the sequence is OS/machinetype independent.

As i wrote in the original post, the random string is only a prerequisite
for another question.

- What to do with the string
I use the sting as a testpattern for a tokenizer/parser. So the usage
of string.letters is only used as an example here.

The main question was:
How to concat a string without contantly reallocating it.

Wishing a happy day
LOBI
Jul 18 '05 #15
Paul Rubin wrote:
Oops, per other post, it gives strings of bytes and needs filtering.
The following runs in about 1.2 seconds on my machine, but has an
small (infinitesimal) chance of failure:

import string,array,time
t=time.time()
ttab = string.letters*4 + '\0'*48
a = array.array('B', open("/dev/urandom").read(1500000).translate(ttab))
a = array.array('B', filter(abs,a)).tostring()[:1000000]
print time.time()-t


from __future__ import division
import array, random, string, sys

identity = string.maketrans("", "")
ld = 256//len(string.letters)
rest = 256 % len(string.letters)
ttab = string.letters*ld + '\0'*rest
dtab = identity[-rest:]

# a fully functional variant of your approach
def randstrUnix(length, extra=1.25):
a = open("/dev/urandom").read(int(length*extra)).translate(ttab, dtab)
while len(a) < length:
a += randstrUnix(length-len(a), 1.3)
return a[:length]

twoletters = [c+d for c in string.letters for d in string.letters]

# the fastest pure-python version I was able to produce
def randstrPure(length):
r = random.random
n = len(twoletters)
l2 = length//2
lst = [None] * l2
for i in xrange(l2):
lst[i] = twoletters[int(r() * n)]
if length & 1:
lst.append(random.choice(string.letters))
return "".join(lst)

The timings:

$ timeit.py -s"import randchoice as r" "r.randstrUnix(1000000)"
10 loops, best of 3: 2.29e+05 usec per loop
$ timeit.py -s"import randchoice as r" "r.randstrPure(1000000)"
10 loops, best of 3: 6.51e+05 usec per loop

A factor of 3 would hardly justify the OS-dependency in most cases.
Note that using twoletters[int(r() * n)] as seen in Sean Ross' version
instead of random.choice(twoletters) doubled the speed.

Peter
Jul 18 '05 #16
If you're looking to have a string that you can write to in a stream, like a
file, you might try StringIO
import random, string
from cString import StringIO s = StringIO()
for i in xrange(1000000): s.write(random.choice(string.letters))
len(s.getvalue())
1000000

This works fine for strings up to 10 MB, after that you might want to
consider stashing your data to disk and reading/writing in chunks.

Chris

"Andreas Lobinger" <an**************@netsurf.de> wrote in message
news:40***************@netsurf.de... Aloha,

Andreas Lobinger schrieb:
How to generate (memory and time)-efficient a string containing
random characters?

d = [random.choice(string.letters) for x in xrange(3000)]
s = "".join(d)


1) Sorry for starting a new thread, but there were calls for more spec.
2) Thanks for all replies so far.

To be more specific about
- OS/availability of /dev/random
Im looking for an elegant solution for all types. At the moment i'm
developing in parallel on a slow (250MHz) i86/Win95 notebook, a typical
i86/linux box and a couple of SUNs (up to 16GB main-mem...).

- using random / crypto-safe-strings
The use of random is intended, because it generates a known sequence.
I can store the seed an re-generate the sequence any time, and afaik
the sequence is OS/machinetype independent.

As i wrote in the original post, the random string is only a prerequisite
for another question.

- What to do with the string
I use the sting as a testpattern for a tokenizer/parser. So the usage
of string.letters is only used as an example here.

The main question was:
How to concat a string without contantly reallocating it.

Wishing a happy day
LOBI

Jul 18 '05 #17
> > How to generate (memory and time)-efficient a string containing
random characters?


It depends how random you need it to be.

The approach I take in my test harness (which generates a CSV file
with random contents) is to create a 30,000 character string the
old fashioned way:

"".join([random.choice(item) for i in range(30000)])

item is a string of which characters to choose from (some fields
are phone numbers, some are names, some are email addresses etc).

To generate a random string I then take random slices of that
30,000 character object. If I need longer strings, I glue
the random slices together (in a cStringIO).

Roger
Jul 18 '05 #18
> This works fine for strings up to 10 MB, after that you might want to
consider stashing your data to disk and reading/writing in chunks.


Or he could use mmap. It handles all of that for you, is mutable, and
can be used as a replacement for strings in most places.

- Josiah
Jul 18 '05 #19
I suppose this would be cheating?
import random, sys
class ReallyRandomString: "Hey Rocky, watch me pull a random string out of my hat!"
def __init__(self, item):
self.__item = item

def __getitem__(self, key):
if type(key) is int:
return random.choice(self.__item)
elif type(key) is slice:
return ''.join([random.choice(self.__item)
for i in xrange(*key.indices(sys.maxint))])
else:
raise TypeError('Time to get a new hat!')

ReallyRandomString('spam')[1000:1060] 'mapsmpspapaaamppmapammaasapmaspmspmpppmpmamsmpsas pamaamssspm'

Chris

"Roger Binns" <ro****@rogerbinns.com> wrote in message
news:v9************@home.rogerbinns.com... How to generate (memory and time)-efficient a string containing
random characters?


It depends how random you need it to be.

The approach I take in my test harness (which generates a CSV file
with random contents) is to create a 30,000 character string the
old fashioned way:

"".join([random.choice(item) for i in range(30000)])

item is a string of which characters to choose from (some fields
are phone numbers, some are names, some are email addresses etc).

To generate a random string I then take random slices of that
30,000 character object. If I need longer strings, I glue
the random slices together (in a cStringIO).

Roger

Jul 18 '05 #20
Sean Ross wrote:
I'd be interested to see how

s = open("/dev/urandom").read(3000)

compares, and, if better, whether something similar can
be done on Windows.


Works on my Windows boxes:

chris@chrish [501]: uname -a
CYGWIN_NT-5.1 chrish 1.5.7(0.109/3/2) 2004-01-30 19:32 i686 unknown unknown
Cygwin
chris@chrish [502]: python
Python 2.3.3 (#1, Dec 30 2003, 08:29:25)
[GCC 3.3.1 (cygming special)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
s = open("/dev/urandom").read(3000)
len(s) 3000 s[0] ']' s[1]

'W'

;-)

--
Chris Herborth ch****@cryptocard.com
Documentation Overlord, CRYPTOCard Corp. http://www.cryptocard.com/
Never send a monster to do the work of an evil scientist.
Jul 18 '05 #21
You're not running Windows, you've been infected with the Cygwin virus, a
fiendish creation by anti-Win32 API hackers that irreparably damages the
proprietary nature of the Windows platform , and worse yet, could
potentially allow applications to be built without Visual Studio, using a
GNU tool chain. Better format your drive immediately before it spreads.

Chris

"Chris Herborth" <ch****@cryptocard.com> wrote in message
news:ys********************@news20.bellglobal.com. ..
Sean Ross wrote:
I'd be interested to see how

s = open("/dev/urandom").read(3000)

compares, and, if better, whether something similar can
be done on Windows.
Works on my Windows boxes:

chris@chrish [501]: uname -a
CYGWIN_NT-5.1 chrish 1.5.7(0.109/3/2) 2004-01-30 19:32 i686 unknown

unknown Cygwin
chris@chrish [502]: python
Python 2.3.3 (#1, Dec 30 2003, 08:29:25)
[GCC 3.3.1 (cygming special)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> s = open("/dev/urandom").read(3000)
>>> len(s) 3000 >>> s[0] ']' >>> s[1]

'W'

;-)

--
Chris Herborth ch****@cryptocard.com
Documentation Overlord, CRYPTOCard Corp. http://www.cryptocard.com/
Never send a monster to do the work of an evil scientist.

Jul 18 '05 #22

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

2
by: Phipps Xue | last post by:
Hello All, I wanna write an small game which need to generate a random string each time. It should go like this. TCHAR tmpStr; unsigned int nLen=50; TCHAR seedStr=_T("dummy seed"); int...
4
by: Jonathan Burd | last post by:
Greetings everyone, Here is a random string generator I wrote for an application and I'm wondering about the thread-safety of this function. I was told using static and global variables cause...
1
by: LBX | last post by:
I am looking for a large random number generator or ways to take smaller random numbers into large ones. The target size of these random numbers is 512 bits. Any links or codes are nice as I...
1
by: Jen2 | last post by:
hello, help. i'm trying to generate a random string of characters of user defined length, with or without repetitions. i've just started using C and have absolutely no idea what i'm doing.
38
by: Andrea | last post by:
Hi, Anyone could me suggest how to create a function that generates a random string? The function should be: int _RandomString(char** str,int len); so, it takes an empty string str and it puts...
2
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 7 Feb 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:30 (7.30PM). In this month's session, the creator of the excellent VBE...
0
by: fareedcanada | last post by:
Hello I am trying to split number on their count. suppose i have 121314151617 (12cnt) then number should be split like 12,13,14,15,16,17 and if 11314151617 (11cnt) then should be split like...
0
by: stefan129 | last post by:
Hey forum members, I'm exploring options for SSL certificates for multiple domains. Has anyone had experience with multi-domain SSL certificates? Any recommendations on reliable providers or specific...
0
by: DolphinDB | last post by:
The formulas of 101 quantitative trading alphas used by WorldQuant were presented in the paper 101 Formulaic Alphas. However, some formulas are complex, leading to challenges in calculation. Take...
0
by: DolphinDB | last post by:
Tired of spending countless mintues downsampling your data? Look no further! In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
0
by: Aftab Ahmad | last post by:
Hello Experts! I have written a code in MS Access for a cmd called "WhatsApp Message" to open WhatsApp using that very code but the problem is that it gives a popup message everytime I clicked on...
0
by: Aftab Ahmad | last post by:
So, I have written a code for a cmd called "Send WhatsApp Message" to open and send WhatsApp messaage. The code is given below. Dim IE As Object Set IE =...
0
by: ryjfgjl | last post by:
ExcelToDatabase: batch import excel into database automatically...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.