Text Parsing - character at a time...

I want to parse some text and generate an output that is similar but
not identical to the input.

The string I produce will be of similar length to the input string -
but a bit longer.

I'm parsing character by character and adding the characters of the
input string to the output until I come to ones I want to modify. This
means creating a new string for every character (since strings are
immutable), which seems very inefficient, particularly when I know
roughly what the output length will be. In a language like C I think I
could reserve a chunk of memory and keep track of how much I'd
filled, just putting characters into it. (If I filled it, I could
reserve another, smaller chunk; it's not difficult to keep track of.)

What's an efficient equivalent in Python? I could use a list,
appending characters onto the end of it, then converting to a string at
the end using ''.join(thelist).
Regards,
Fuzzy

http://www.voidspace.org.uk/atlantib...thonutils.html
Jul 18 '05 #1
Fuzzyman wrote:
I'm parsing character by character and adding the characters of the
input string to the output until I come to ones I want to modify. [...] What's an efficient equivalent in python ?


I think the answer might depend on information you haven't provided.
How are you 'parsing' this? In other words, how do you know
when you've gotten to the "ones you want to modify"? Are you sure
there isn't a way of avoiding the one-by-one manual scan? If
there is, you can probably add whole sequences of characters at a
time to a list, as you suggest, by slicing from the input string
once you know where a sequence of "kept" characters starts
and stops.
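
For what it's worth, here is a minimal sketch of that slicing idea. It assumes, purely for illustration, that the characters to modify are double quotes and backslashes and that each gets a backslash put in front of it; `SPECIAL` and `transform_by_slices` are made-up names, not anything from the original post.

```python
import re

# Hypothetical rule: put a backslash in front of quotes and backslashes.
SPECIAL = {'"': '\\"', '\\': '\\\\'}
_special_re = re.compile(r'["\\]')

def transform_by_slices(text):
    pieces = []
    start = 0  # index where the current run of "kept" characters begins
    for match in _special_re.finditer(text):
        # Copy the whole run of unchanged characters in one slice.
        pieces.append(text[start:match.start()])
        pieces.append(SPECIAL[match.group()])
        start = match.end()
    pieces.append(text[start:])  # whatever follows the last special character
    return ''.join(pieces)
```

The list only gets one entry per run and one per modified character, so the number of appends depends on how many characters change rather than on the length of the input.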

For that matter, could you use string.replace or re.sub to do
the job even more efficiently?
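
As a rough illustration of both options (the substitution rules here are invented, since the real ones haven't been posted): `str.replace` handles a fixed substitution, and `re.sub` accepts a function when the replacement depends on what was matched.

```python
import re

# Fixed substitution: every tab becomes four spaces.
expanded = 'a\tb'.replace('\t', '    ')

# Pattern-based substitution: re.sub calls the function once per match
# and uses its return value as the replacement.
def double_digit(match):
    return match.group() * 2

doubled = re.sub(r'\d', double_digit, 'room 42')
```

In both cases the scanning loop runs inside the library rather than in Python code.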

-Peter
Jul 18 '05 #2
It's not clear what you mean by claiming that "creating a new string for
every character" is inefficient:
$ timeit 'int()'
100000 loops, best of 3: 1.26 usec per loop
$ timeit 'str()'
1000000 loops, best of 3: 1.28 usec per loop
$ timeit 'chr(0)'
1000000 loops, best of 3: 1.73 usec per loop

If your output is a transformation of your input, I'd write
    def transform(input):
        def _transform():
            for c in input:
                # yield a string zero or more times for each character;
                # passing c through unchanged is only a placeholder
                yield c
        return ''.join(_transform())
Python should automatically do some nice overallocation tricks to make
this fairly efficient. You could also write
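    def transform(input):
        result = []  # a list, so that append() works
        for c in input:
            # append a string zero or more times for each character;
            # appending c unchanged is only a placeholder
            result.append(c)
        return ''.join(result)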
and if you care about the absolute fastest code you'll benchmark both of
them.
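
A sketch of what such a benchmark might look like, using the standard `timeit` module. The transformation rule (double every 'x') and the sample input are assumptions made up for the example; substitute your real rule and realistic data.

```python
import timeit

def transform_generator(text):
    # Variant 1: a generator feeding ''.join.
    def _pieces():
        for c in text:
            # Hypothetical rule: double every 'x', pass everything else through.
            yield 'xx' if c == 'x' else c
    return ''.join(_pieces())

def transform_list(text):
    # Variant 2: append to a list, join once at the end.
    result = []
    for c in text:
        result.append('xx' if c == 'x' else c)
    return ''.join(result)

sample = 'abcxdef' * 1000  # made-up input, 7000 characters

if __name__ == '__main__':
    for func in (transform_generator, transform_list):
        seconds = timeit.timeit(lambda: func(sample), number=100)
        print(f'{func.__name__}: {seconds:.4f} s for 100 runs')
```

The numbers will differ between machines and Python versions, so only the comparison on your own setup means anything.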

A common "gotcha" for starting programmers would be to write something
like
    def transform(input):
        result = ''
        for c in input:
            # result += a string, zero or more times for each character;
            # adding c unchanged is only a placeholder
            result += c
        return result
because in this case Python won't (currently, anyway) do any clever
overallocation tricks, but instead will do a copy of the partial result
at the site of each +=.

Jeff

Jul 18 '05 #3
On 9 Jul 2004 04:46:29 -0700, Fuzzyman <mi*****@foord.net> wrote:
I want to parse some text and generate an output that is similar but
not identical to the input.

The string I produce will be of similar length to the input string -
but a bit longer.

I'm parsing character by character and adding the characters of the
input string to the output until I come to ones I want to modify. This
means creating a new string for every character (since strings are
immutable), which seems very inefficient, particularly when I know
roughly what the output length will be. In a language like C I think I
could reserve a chunk of memory and keep track of how much I'd
filled, just putting characters into it. (If I filled it, I could
reserve another, smaller chunk; it's not difficult to keep track of.)

What's an efficient equivalent in Python? I could use a list,
appending characters onto the end of it, then converting to a string at
the end using ''.join(thelist).


I'm not terribly clear on what you're trying to do, but I'm pretty
sure you can do it with regular expressions a lot more easily than the
way you're describing. You might not even need that; you might get
away with the 'replace' method on strings. Which you use depends on
the complexity of what you want to do, and on which ends up being
faster on your machine; as soon as it's more complicated than one or
two 'replace's, regular expressions usually win.
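
To make the comparison concrete, here is a sketch that does the same job both ways and times them. The substitution table is hypothetical, and the chained-replace version relies on dicts keeping insertion order (Python 3.7 and later) so that '&' is handled first.

```python
import re
import timeit

# Hypothetical substitution table, for illustration only.
TABLE = {'&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;'}
_pattern = re.compile('|'.join(re.escape(key) for key in TABLE))

def with_replace(text):
    # One pass over the string per table entry. '&' comes first so the
    # ampersands added by later entries are not escaped a second time.
    for old, new in TABLE.items():
        text = text.replace(old, new)
    return text

def with_regex(text):
    # A single pass; each match is looked up in the table.
    return _pattern.sub(lambda m: TABLE[m.group()], text)

sample = 'if a < b && c > "d":\n' * 1000  # made-up input

if __name__ == '__main__':
    for func in (with_replace, with_regex):
        seconds = timeit.timeit(lambda: func(sample), number=100)
        print(f'{func.__name__}: {seconds:.4f} s for 100 runs')
```

Which one comes out ahead depends on the input and the number of table entries, so it is worth running on data that looks like yours.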

If you could describe (a subset of) the problem in a bit more detail,
you'll probably get more useful suggestions (as in, code to do it, or
even docs to read to do it).

--
John Lenton (jl*****@gmail.com) -- Random fortune:
bash: fortune: command not found
Jul 18 '05 #4