473,699 Members | 2,615 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

XORing long strings opimization?

def XOR(s1,s2):
""" XOR string s1 with s2 """
output = ""
# Argument check
if (type(s1) and type(s2)) != type(""):
raise TypeError, "Arguments are not strings"
if len(s1) != len(s2):
raise ValueError, "Size differs in strings"
# Xoring
for c1 in s1:
for c2 in s2:
output += chr(ord(c1) ^ ord(c2))
return output

This way is very slow for large strings.
Anyone know of a better and faster way to do it?

Jul 18 '05
17 2275
Here is the result, if anyone wants it :)
It was quite quick indeed, It took me 10.3279999495 seconds to XOR
"a"*1000000 large string. Im going to XOR 700000000 mb sized files, so
if anyone know how to make it even faster, please tell me :)

import string
def xorstring(s1,s2 ):
""" XOR string s1 with s2 """
# Argument check
if not (type(s1) == type("")) or not (type(s2) == type("")):
raise TypeError, "Arguments are not strings"
if len(s1) != len(s2):
raise ValueError, "Size differs in strings"
# Xoring

# Create lists
l1 = map(ord, s1)
l2 = map(ord, s2)

# Xor it all together
xorlist = []
xorlist = [chr(x ^ y) for (x, y) in zip(l1, l2)]
return string.join(xor list,"") # Backward compatible

Jul 18 '05 #11
"Alex Martelli" <al***@aleax.it > wrote in message
news:a%******** ************@ne ws1.tin.it...
about performance, think about "array('b', ... -- it could buy you an easy
speedup by a factor of 2, like here.


Why we should read a module a day: just today I learned about the array
module, when I *had* been using struct for similar tasks....

Sigh; if only the standard library weren't so *big* ;).
--
Francis Avila

Jul 18 '05 #12
Francis Avila wrote:
"Noen" <no***********@ na.no> wrote in message
news:ie******** *************@j uliett.dax.net. ..
nope, thought the Argument check would do it, but I used and opeartor
instead of or. Well, here is XOR v.0.3b :P Any comments how to make it
faster?


How about using the struct module?
Caveat: I don't know if struct limits the repeat count.

import struct
def XOR(data, key):
if len(data) != len(key):
raise ValueError, "data and key not of equal length"
ldata = struct.unpack(' =%sb' % len(data), data)
lkey = struct.unpack(' =%sb' % len(key), key)
lxored = [d^k for d,k in zip(ldata, lkey)]
xored = struct.pack('=% sb' % len(lxored), *lxored)
return xored

Just an idea.
--
Francis Avila


Nope. Both your and my approach make it worse. Only Alex Martelli did it
right:

import itertools, array, struct

def xor_noen(s1,s2) :
""" XOR string s1 with s2 """
output = ""
for i in range(len(s1)):
output += chr(ord(s1[i]) ^ ord(s2[i]))
return output

def xor_po(s1, s2):
return "".join([chr(ord(c1) ^ ord(c2)) for c1, c2 in itertools.izip( s1,
s2)])

def xor_am(s1, s2):
x = array.array("b" , s2)
for i, v in enumerate(array .array("b", s1)):
x[i] ^= v
return x.tostring()

def xor_fa(data, key):
ldata = struct.unpack(' =%sb' % len(data), data)
lkey = struct.unpack(' =%sb' % len(key), key)
lxored = [d^k for d,k in zip(ldata, lkey)]
xored = struct.pack('=% sb' % len(lxored), *lxored)
return xored

# 35 usec
def pro_noen():
return xor_noen("sgnir ts ni sreffid eziS", "Size differs in strings")

# 40 usec
def pro_po():
return xor_po("sgnirts ni sreffid eziS", "Size differs in strings")

# 23 usec
def pro_am():
return xor_am("sgnirts ni sreffid eziS", "Size differs in strings")

# 46 usec
def pro_fa():
return xor_fa("sgnirts ni sreffid eziS", "Size differs in strings")

assert pro_po() == pro_am() == pro_fa() == pro_noen()

I was surprised how good pro_noen() does, but that might change for large
strings. By the way, results varied significantly (several usecs) in
various test runs.

Peter
Jul 18 '05 #13
Noen <no***********@ na.no> wrote in message news:<%A******* **************@ juliett.dax.net >...
Here is the result, if anyone wants it :)
It was quite quick indeed, It took me 10.3279999495 seconds to XOR
"a"*1000000 large string. Im going to XOR 700000000 mb sized files, so
if anyone know how to make it even faster, please tell me :)

import string
def xorstring(s1,s2 ):
""" XOR string s1 with s2 """
# Argument check
if not (type(s1) == type("")) or not (type(s2) == type("")):
raise TypeError, "Arguments are not strings"
if len(s1) != len(s2):
raise ValueError, "Size differs in strings"
# Xoring

# Create lists
l1 = map(ord, s1)
l2 = map(ord, s2)

# Xor it all together
xorlist = []
xorlist = [chr(x ^ y) for (x, y) in zip(l1, l2)]
return string.join(xor list,"") # Backward compatible


Just fiddling around with the (excellent) suggestions already posted,
and found a couple of things.

Alex's suggestion using array is the fastest, but was slightly faster
when using xrange() rather than enumerate():

def xor6(s1, s2):
a1 = array.array("b" , s1)
a2 = array.array("b" , s2)

for i in xrange(len(a1)) :
a1[i] ^= a2[i]
return a1.tostring()

On my system, this XORs 500,000 characters in 0.39s, with Noen's
original taking 0.67s.

Using psyco produced a further ~3x speedup, reducing this to 0.14s.
That's pretty fast!

Casey
Jul 18 '05 #14
Well, nothing to add actually, but my results:

import string
import array

def xorstring0(s1,s 2): # the original method
# argument tests were here

# Create lists
l1 = map(ord, s1)
l2 = map(ord, s2)

# Xor it all together
xorlist = []
xorlist = [chr(x ^ y) for (x, y) in zip(l1, l2)]
return string.join(xor list,"") # Backward compatible

def xorstring1(s1,s 2):
xorlist = [chr(ord(x) ^ ord(y)) for (x, y) in zip(s1, s2)]
return string.join(xor list,"") # Backward compatible

def xorstring2(s1,s 2):
xorlist = [chr(ord(s1[i]) ^ ord(s2[i])) for i in range(len(s1))] # range
return string.join(xor list,"") # Backward compatible

def xorstring3(s1,s 2):
xorlist = [chr(ord(s1[i]) ^ ord(s2[i])) for i in xrange(len(s1))] # xrange
return string.join(xor list,"") # Backward compatible

def xorstring4(s1,s 2):
a1 = array.array("B" , s1)
a2 = array.array("B" , s2)
for i in xrange(len(a1)) :
a1[i] ^= a2[i]
return a1.tostring()

s1 = 'abcew'*200000
s2 = 'xyzqa'*200000

fns = [ xorstring0, xorstring1, xorstring2, xorstring3, xorstring4 ]

# check if all works OK for some short data --
# protection from some rough error or typo
sx = fns[0]( s1[:100], s2[:100] )
for fn in fns:
assert sx == fn( s1[:100], s2[:100] )

import time

tms = [60] * len(fns) # assume some unreal big times

for n in range(30): # loop 30 times

for i in range(len(fns)) :

tb = time.time()
sx = fns[i]( s1, s2 ) # do it!
tm = time.time() - tb
tms[i] = min( tms[i], tm )

for i in range(len(fns)) :
print "fn%d -- %.3f sec -- %.2f" % (i,tms[i],tms[i]/tms[-1])

# fn0 -- 5.578 sec -- 6.37
# fn1 -- 4.593 sec -- 5.25
# fn2 -- 2.609 sec -- 2.98
# fn3 -- 2.531 sec -- 2.89
# fn4 -- 0.875 sec -- 1.00
BTW, for the same million char strings, a C function runs 0.0035 sec, or 250 times faster!
G-:
Jul 18 '05 #15
Georgy Pruss wrote:
Well, nothing to add actually, but my results: ...but of course when in a hurry one should remember...
psyco
!!!

So I gave your script a little tweak...:
fns = [ xorstring0, xorstring1, xorstring2, xorstring3, xorstring4 ]
i.e. right here I added:

import psyco
psyfns = [ psyco.proxy(f) for f in fns ]
fns.extend(psyf ns)

so each function is tried in both the plain and psycotic form --
the latter is at an index 5 above (fn5 is the psycoized xorstring0,
etc etc up to fn9 which is the psycoed xorstring4).
# fn0 -- 5.578 sec -- 6.37
# fn1 -- 4.593 sec -- 5.25
# fn2 -- 2.609 sec -- 2.98
# fn3 -- 2.531 sec -- 2.89
# fn4 -- 0.875 sec -- 1.00

BTW, for the same million char strings, a C function runs 0.0035 sec, or
250 times faster!


My measurements give:

fn0 -- 19.229 sec -- 347.98
fn1 -- 12.489 sec -- 226.01
fn2 -- 4.719 sec -- 85.39
fn3 -- 4.359 sec -- 78.88
fn4 -- 2.046 sec -- 37.02

fn5 -- 16.736 sec -- 302.86
fn6 -- 9.675 sec -- 175.08
fn7 -- 1.045 sec -- 18.90
fn8 -- 1.048 sec -- 18.96
fn9 -- 0.055 sec -- 1.00

so, my machine is quite a bit slower (it IS getting long in the tooth --
over 30 months -- and wasn't the absolute speediest even then;-) by a
factor that's quite variable, 1.72 to 3.45 times, median 2.34 times.

psyco helps marginally on some approaches, 4+ times on others, and
almost 40 times on the fn4 approach, which happens to be the fastest
even without it (luckily...:-). Still not a match for your reported
C function, but just 6-7 times slower (guessing), not 250 times.

One thing worth checking is whether the end result IS needed as a
string, because some more time might be saved by e.g. using buffer(x)
rather than x.tostring() for the result array x, in many cases.

Another possibility to be always kept in mind for bulk array operations
is Numeric. xor-ing 10,000-long byte arrays takes about 10 msec with
array.array (needing a Python-level loop) versus 50 microseconds with
Numeric.array (no loop needed), so a factor of 200 (over the fastest
python-cum-standard-library only approach) is easily available and may
be getting even closer than psyco to bare-C speeds.
Alex

Jul 18 '05 #16
Noen <no***********@ na.no> writes:
This way is very slow for large strings.
Anyone know of a better and faster way to do it?


Yes, use the array module.
Jul 18 '05 #17
I didn't even know about psyco. Thanks!
G-:

"Alex Martelli" <al***@aleax.it > wrote in message news:TJ******** ************@ne ws1.tin.it...
Georgy Pruss wrote:
Well, nothing to add actually, but my results: ...but of course when in a hurry one should remember...
psyco
!!!

<...>


Alex

Jul 18 '05 #18

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

5
2516
by: lawrence | last post by:
"Garp" <garp7@no7.blueyonder.co.uk> wrote in message news:<_vpuc.1424$j_3.13346038@news-text.cableinet.net>... > "lawrence" <lkrubner@geocities.com> wrote in message > news:da7e68e8.0405300907.3c8eabf7@posting.google.com... > > This reg ex will find me any block of strings 40 or more characters in > > length without a white space, yes? > > > > {40} > > > > > > To get it to include tabs and newlines, do I to this?
83
3428
by: kartik | last post by:
there seems to be a serious problem with allowing numbers to grow in a nearly unbounded manner, as int/long unification does: it hides bugs. most of the time, i expect my numbers to be small. 2**31 is good enough for most uses of variables, and when more is needed, 2**63 should do most of the time. granted, unification allows code to work for larger numbers than foreseen (as PEP 237 states) but i feel the potential for more undetected...
4
2303
by: Abhishek Jha | last post by:
You can implement very very long integer using strings. Strings have no limit and using your own code implementation you can store the numbers of as much length as you want..........for more details contact at abhishekjha10@yahoo.co.in
7
7593
by: Eric | last post by:
Hi All, I need to XOR two same-length Strings against each other. I'm assuming that, in order to do so, I'll need to convert each String to a BitArray. Thus, my question is this: is there an easy way to convert a String to a BitArray (and back again)? I explained the ultimate goal (XORing two Strings) so that, if anyone has a better idea of how to go about this they may (hopefully) bring that up...?
3
2376
by: ajaksu | last post by:
Hello c.l.p.ers :) Running long(Decimal) is pretty slow, and the conversion is based on strings. I'm trying to figure out whether there is a good reason for using strings like in decimal.py (that reason would be bound to bite me down the road). This converts Decimal to long and is much faster in my test system (PIII 650MHz, but was written on a P133 a year ago :)). def dec2long(number):
73
7410
by: Yevgen Muntyan | last post by:
Hey, I was reading C99 Rationale, and it has the following two QUIET CHANGE paragraphs: 6.5.3.4: "With the introduction of the long long and extended integer types, the sizeof operator may yield a value that exceeds the range of an unsigned long." 6.5.6: "With the introduction of the long long and extended integer
10
2818
by: Mo | last post by:
Hi, I am trying to write a code to build a string 768 characters long. This string is going to be written to a file which is then read by another application. The format of the string is already defined. For example Firstname starts at position 20, Last Name at position 30, address at position 40 etc... I am reading these data from sql datasource and must substitute the values in the exact position insuring that paramteres end up...
5
5542
by: David Mathog | last post by:
I'm looking at a program which stores perl scripts in an array. Each script is stored as a single entry in that array, and the whole set of them live in a single header file (escaped out the wazoo to get the perl code intact through the C preprocessor.) The issue is that many of these strings are quite long, which causes gcc to throw these sorts of warnings: scripts.h:1: warning: string length '4918' is greater than the length '4095'...
2
1601
by: Jean-Paul Calderone | last post by:
On Fri, 5 Sep 2008 14:24:16 -0500, Robert Dailey <rcdailey@gmail.comwrote: mystring = ( "This is a very long string that " "spans multiple lines and does " "not include line breaks or tabs " "from the source file between " "the strings partitions.") Jean-Paul
0
8686
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
9033
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
8911
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
7748
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
0
5872
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
4375
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
4627
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
2
2345
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
2009
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.