472,807 Members | 2,450 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 472,807 software developers and data experts.

String Concatenation O(n^2) (was: Re: Explaining Implementing aBinary Search Tree.)

Summary: can't verify big O claim, how to properly time this?

On Jun 15, 2:34 pm, "Terry Reedy" <tjre...@udel.eduwrote:
"Bart Kastermans" <bkast...@gmail.comwrote in message

news:ae**********************************@k13g2000 hse.googlegroups.com...
|I wrote a binary search tree in python, explaining as I was doing it
| how and why I did it. I am very interested in receiving comments on
| the code, process, and anything else that will improve my coding or
| writing.
|
| I wrote this all up in my blog at:
|
|http://kasterma.wordpress.com/2008/0...-binary-search...
|
| The code of the class has been copied below, but the description of
| the process (mostly an attempt at approaching test driving development
| for as far as I understand the term) has not been copied.
|
| Any and all comments are appreciated.
|
| Best,
| Bart
|
| *********** python code ************************
|
|
| import re
|
| class BSTree:
| def __init__ (self, value = None):
| self.value = value
| self.left = None
| self.right = None

There are two possible approaches.
1. Define 1 tree class that is also used for subtrees -- what you did.
Complication: self.value of root node can be none, so you constantly
have to check self.value for None even though only possible for root node.
2. Define tree class and node class. This had other complications, but
removes that above and makes str easier. tree.str = '(' str(rootnode) ')'
and node.str= self.value '(' str(self.left) ')' '(' str(self.right) ')'.

If use '' instead of None, no conditionals are needed. (This might apply
partly to your version as well.) Or define class NullTree with a singleton
instance with no attributes and a str method returning '' and an inOrder
method yielding nothing. This would also eliminate the condifionals in the
inOrder method. Not sure what is best. With a good test suite, it is easy
to make sure alternative implementations 'work' before testing for speed.
Thanks for the idea. I would expect the separation to lead to
somewhat more
code, but all the "checking the root" code would be separated out in
the
tree class. The node class would be very smooth. I'll try this when
I have
some time (today I spend my "alloted" programming time on what is
below).

(also the comment about inOrder returning a generator was because I
tried to
figure it out, failed, and then got enough by doing it without yield.
I forgot
to bring my comment in line with my code. A generator
would certainly be nicer, and I'll work at understanding your
suggestion for
it.)
>
| def __str__ (self):

string appending is an O(n**2) operations. The usual idiom, applied here,
would be slist = ['('], slist.append(str(self.value)), .... return
''.join(slist). In other words, collect list of pieces and join at end.
This is interesting. I had never attempted to verify a big O
statement
before, and decided that it would be worth trying. So I wrote some
code to
collect data, and I can't find that it goes quadratic. I have the
graph
at

http://kasterma.wordpress.com/2008/0...concatenation/

It looks piecewise linear to me.

The code I used to collect the data is as follows:

************************************************** ***********************

import time

NUMBER = 100 # number of strings to concatenate at given length
JUMP = 500 # jump (and start length) of length of strings
END = 100001 # longest length string considered

def randomString (length):
""" returns a random string of letters from {a,b,c,d} of length
"""

string = ""

for i in range (0,length):
string += choice ("abcd")

return string

def randomStrings (number, length):
""" returns an array of number random strings all of length """

array = []

for i in range (0, number):
array.append (randomString (length))

return array

TimingData = []

for length in range (JUMP, END, JUMP):
array1 = randomStrings (NUMBER, length)
array2 = randomStrings (NUMBER, length)

starttime = time.clock ()
for i in range (0, NUMBER):
string = array1 [i] + array2 [i]
endtime = time.clock ()
print "length", length, "done"

TimingData.append ([length, 1000* (endtime-starttime)])
# to get a better looking graph multiply by 1000

sagefile = open ('stringConcatGraph.sage', "w")
sagefile.write ("points =" + str (TimingData) + "\n")
sagefile.write ("graph = line (points)\n")
sagefile.write ("graph.show ()\n")
sagefile.close ()

Jun 27 '08 #1
2 2188
En Mon, 16 Jun 2008 07:34:06 -0300, Bart Kastermans <bk******@gmail.comescribió:
Summary: can't verify big O claim, how to properly time this?

This is interesting. I had never attempted to verify a big O
statement
before, and decided that it would be worth trying. So I wrote some
code to
collect data, and I can't find that it goes quadratic.
In your test code, you're concatenating only two strings, that's a *single* operation, and takes time proportional to the total length.
The quadratic behavior appears when you do *several* concatenations in a row (like in your original code, where += was used several times to build a result).
If you want to verify it, try joining N strings of size M (for varying values of N and M), and plot total time vs. N (for a given M value), and total time vs. M (for a given N value) and finally total time vs. (N*M), see what happens and post your findings again.

--
Gabriel Genellina

Jun 27 '08 #2
On Jun 17, 1:01*am, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:
En Mon, 16 Jun 2008 07:34:06 -0300, Bart Kastermans <bkast...@gmail.comescribió:
Summary: can't verify big O claim, how to properly time this?
This is interesting. *I had never attempted to verify a big O
statement
before, and decided that it would be worth trying. *So I wrote some
code to
collect data, and I can't find that it goes quadratic.

In your test code, you're concatenating only two strings, that's a *single* operation, and takes time proportional to the total length.
The quadratic behavior appears when you do *several* concatenations in a row (like in your original code, where += was used several times to build aresult).
If you want to verify it, try joining N strings of size M (for varying values of N and M), and plot total time vs. N (for a given M value), and total time vs. M (for a given N value) and finally total time vs. (N*M), see what happens and post your findings again.

--
Gabriel Genellina
I did the work and found that for trees it does not go quadratic at
all, from the computations you suggest it is easy to get quadratic
behavior though. Also this does not depend in any way I can see on
the implementation by Python. Actual time might certainly improve by
doing it differently (changing the constants), but the big O results
won't.

For pictures and math I have put it up on my blog again see:
http://kasterma.wordpress.com/2008/0...catenation-ii/

Thanks to everyone who commented so far on this subject. I had some
good fun with it so far.

Best,
Bart
Jun 27 '08 #3

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

8
by: Jerry Khoo | last post by:
hello, everybody, i am kinda new here, nice to meet u all. Now, i am cs students and are now facing difficult problems in understanding what a binary tree is, how it works, and the algorithm to...
6
by: M B HONG 20 | last post by:
Hi all - I was wondering if Javascript has a way to easily remove duplicates from a string. For example, if I had a string: "car truck car truck truck tree post post tree" it should turn...
32
by: tshad | last post by:
Can you do a search for more that one string in another string? Something like: someString.IndexOf("something1","something2","something3",0) or would you have to do something like: if...
16
by: Mark A. Sam | last post by:
Hello, I am having a problem with imputting into a string variable: Dim strSQL As String = "INSERT INTO tblContactForm1 (txtName, txtCompany, txtPhone, txtEmail, txtComment, chkGrower,...
34
by: Larry Hastings | last post by:
This is such a long posting that I've broken it out into sections. Note that while developing this patch I discovered a Subtle Bug in CPython, which I have discussed in its own section below. ...
232
by: robert maas, see http://tinyurl.com/uh3t | last post by:
I'm working on examples of programming in several languages, all (except PHP) running under CGI so that I can show both the source files and the actually running of the examples online. The first...
0
by: Jean-Paul Calderone | last post by:
On Mon, 16 Jun 2008 10:41:05 -0600, Ian Kelly <ian.g.kelly@gmail.comwrote: It will depend what version of Python you're using and the *exact* details of the code in question. An optimization was...
5
by: nandor.sieben | last post by:
I am using a small set of functions that implements an n-ary tree. The library is disscusses here: ...
9
by: Adriano | last post by:
Hello, I have the following string headers: Header 1, 2, 3, 4: 1.00TJ123456PPC 00000000DLGLGN 00000001TXBEG Login:user=a, pswd=a"
2
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 2 August 2023 starting at 18:00 UK time (6PM UTC+1) and finishing at about 19:15 (7.15PM) The start time is equivalent to 19:00 (7PM) in Central...
0
linyimin
by: linyimin | last post by:
Spring Startup Analyzer generates an interactive Spring application startup report that lets you understand what contributes to the application startup time and helps to optimize it. Support for...
0
by: erikbower65 | last post by:
Here's a concise step-by-step guide for manually installing IntelliJ IDEA: 1. Download: Visit the official JetBrains website and download the IntelliJ IDEA Community or Ultimate edition based on...
2
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Sept 2023 starting at 18:00 UK time (6PM UTC+1) and finishing at about 19:15 (7.15PM) The start time is equivalent to 19:00 (7PM) in Central...
0
by: Rina0 | last post by:
I am looking for a Python code to find the longest common subsequence of two strings. I found this blog post that describes the length of longest common subsequence problem and provides a solution in...
5
by: DJRhino | last post by:
Private Sub CboDrawingID_BeforeUpdate(Cancel As Integer) If = 310029923 Or 310030138 Or 310030152 Or 310030346 Or 310030348 Or _ 310030356 Or 310030359 Or 310030362 Or...
0
by: lllomh | last post by:
How does React native implement an English player?
0
by: Mushico | last post by:
How to calculate date of retirement from date of birth
2
by: DJRhino | last post by:
Was curious if anyone else was having this same issue or not.... I was just Up/Down graded to windows 11 and now my access combo boxes are not acting right. With win 10 I could start typing...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.