I want to parse some text and generate an output that is similar but
not identical to the input.
The string I produce will be of similar length to the input string -
but a bit longer.
I'm parsing character by character and adding the characters of the
input string to the output until I come to ones I want to modify. This
means creating a new string for every character (since strings are
immutable), which seems very inefficient, particularly when I know
roughly what the output length will be. In a language like C I think I
could reserve a chunk of memory and keep track of how much I'd
filled, just putting characters into it. (If I filled it I could
reserve another, smaller chunk; not difficult to keep track of.)
What's an efficient equivalent in Python? I could use a list,
appending characters onto the end of it and converting to a string at
the end using ''.join(thelist).
Regards,
Fuzzy http://www.voidspace.org.uk/atlantib...thonutils.html
Fuzzyman wrote: I'm parsing character by character and adding the characters of the input string to the output until I come to ones I want to modify.
[...] What's an efficient equivalent in Python?
I think the answer might depend on information you haven't provided.
How are you 'parsing' this? In other words, how do you know
when you've gotten to the "ones you want to modify"? Are you sure
there isn't a way of avoiding the one-by-one manual scan? If
there is, you can probably add whole sequences of characters at a
time to a list, as you suggest, by slicing from the input string
once you know where a sequence of "kept" characters starts
and stops.
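
A minimal sketch of the slicing idea, assuming a made-up rule (the thread never says which characters get modified): '&', '<' and '>' are replaced, everything else is kept. The names REPLACEMENTS and transform_by_slices are illustrative only. It still looks at each character, but it appends one slice per run of kept characters instead of one string per character.

```python
# Hypothetical rule: the thread does not say which characters are modified.
REPLACEMENTS = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}

def transform_by_slices(text):
    pieces = []
    start = 0  # index where the current run of kept characters begins
    for i, ch in enumerate(text):
        if ch in REPLACEMENTS:
            pieces.append(text[start:i])     # whole run of kept characters
            pieces.append(REPLACEMENTS[ch])  # the modified character
            start = i + 1
    pieces.append(text[start:])  # trailing run after the last match
    return "".join(pieces)

print(transform_by_slices("a < b && c"))  # a &lt; b &amp;&amp; c
```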
For that matter, could you use string.replace or re.sub to do
the job even more efficiently?
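
If the characters to modify can be described by a pattern, re.sub with a callback might look roughly like this. The rule (escaping '&', '<' and '>') and the names REPLACEMENTS, PATTERN and transform_with_re are assumptions for illustration, not something from the original question.

```python
import re

# Hypothetical rule (an assumption): escape three characters.
REPLACEMENTS = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}
PATTERN = re.compile("[&<>]")

def transform_with_re(text):
    # The callback looks up the replacement for each matched character;
    # unmatched stretches of text are copied through by the re module.
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group(0)], text)

print(transform_with_re("a < b && c"))  # a &lt; b &amp;&amp; c
```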
-Peter
It's not clear what you mean by claiming that "creating a new string for
every character" is inefficient:
$ timeit 'int()'
100000 loops, best of 3: 1.26 usec per loop
$ timeit 'str()'
1000000 loops, best of 3: 1.28 usec per loop
$ timeit 'chr(0)'
1000000 loops, best of 3: 1.73 usec per loop
If your output is a transformation of your input, I'd write
def transform(input):
    def _transform():
        for c in input:
            yield c  # placeholder: yield a string zero or more times
    return ''.join(_transform())
Python should automatically do some nice overallocation tricks to make
this fairly efficient. You could also write
def transform(input):
    result = []
    for c in input:
        result.append(c)  # placeholder: append a string zero or more times
    return ''.join(result)
and if you care about the absolute fastest code you'll benchmark both of
them.
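
A rough way to benchmark the generator and list variants with the standard timeit module. The modify rule (doubling vowels) is purely hypothetical, as are the function names; timings will differ between machines and Python versions, so treat this as a sketch rather than a verdict.

```python
import timeit

def modify(c):
    # Hypothetical rule: double every vowel, pass everything else through.
    return c * 2 if c in "aeiou" else c

def transform_generator(text):
    # join consumes the generator and builds the result in one go
    return "".join(modify(c) for c in text)

def transform_list(text):
    result = []
    for c in text:
        result.append(modify(c))
    return "".join(result)

if __name__ == "__main__":
    sample = "the quick brown fox " * 1000
    # Both variants must agree before their timings mean anything.
    assert transform_generator(sample) == transform_list(sample)
    for func in (transform_generator, transform_list):
        seconds = timeit.timeit(lambda: func(sample), number=20)
        print(f"{func.__name__}: {seconds:.4f} s for 20 runs")
```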
A common "gotcha" for starting programmers would be to write something
like
def transform(input):
    result = ''
    for c in input:
        result += c  # placeholder: add a string zero or more times
    return result
because in this case Python won't (currently, anyway) do any clever
overallocation tricks, but instead will do a copy of the partial result
at the site of each +=.
Jeff
On 9 Jul 2004 04:46:29 -0700, Fuzzyman <mi*****@foord. net> wrote: I want to parse some text and generate an output that is similar but not identical to the input.
[...]
I'm not terribly clear on what you're trying to do, but I'm pretty
sure you can do it with regular expressions a lot more easily than the way
you're describing it; you might not even need that: you might get
away with the 'replace' method on strings. Which you use depends on
the complexity of what you want to do, and on which ends up being
faster on your machine; as soon as it's more complicated than one or
two 'replace's, regular expressions usually win.
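
To see which is faster on a given machine, something like the following could be timed. The escaping rule and the names with_replace and with_regex are assumptions made up for the comparison; the outcome depends on the input and the Python version.

```python
import re
import timeit

# Hypothetical rule (an assumption): escape '&', '<' and '>'.
REPLACEMENTS = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}
PATTERN = re.compile("[&<>]")

def with_replace(text):
    # '&' goes first so the '&' introduced by later steps is not re-escaped.
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

def with_regex(text):
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group(0)], text)

if __name__ == "__main__":
    sample = "if a < b && b > c: pass\n" * 1000
    # Check that both approaches produce the same output before timing them.
    assert with_replace(sample) == with_regex(sample)
    for func in (with_replace, with_regex):
        seconds = timeit.timeit(lambda: func(sample), number=20)
        print(f"{func.__name__}: {seconds:.4f} s for 20 runs")
```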
If you could describe (a subset of) the problem in a bit more detail,
you'll probably get more useful suggestions (as in, code to do it, or
even docs to read to do it).
--
John Lenton (jl*****@gmail. com) -- Random fortune:
bash: fortune: command not found