Double replace or single re.sub?

Iain King

I have some code that converts html into xhtml. For example, convert
all <i> tags into <em>. Right now I need to do two string.replace calls
for every tag:

html = html.replace('<i>','<em>')
html = html.replace('</i>','</em>')

I can change this to a single call to re.sub:

html = re.sub('<([/]*)i>', r'<\1em>', html)

Would this be a quicker/better way of doing it?
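For readers who want to try both forms, here is a minimal runnable sketch of the two approaches being compared. The function names and the sample string are made up for illustration, and the pattern uses /? instead of [/]* (which would also accept <//i>). On simple, attribute-free tags the two produce the same result.

```python
import re


def convert_with_replace(html):
    # Two plain substring replacements: one for opening tags, one for closing tags.
    html = html.replace('<i>', '<em>')
    html = html.replace('</i>', '</em>')
    return html


def convert_with_sub(html):
    # One regex pass; the group captures an optional leading "/" so the
    # replacement can reuse it for both <i> and </i>.
    return re.sub(r'<(/?)i>', r'<\1em>', html)


sample = 'Some <i>italic</i> and more <i>italic</i> text'
print(convert_with_replace(sample))
print(convert_with_sub(sample))
```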

Iain

Oct 26 '05 #1


Mike Meyer

"Iain King" <ia******@gmail.com> writes:

> Would this be a quicker/better way of doing it?

Maybe. You could measure it and see. But neither will work in the face
of attributes or whitespace in the tag.
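To make the point concrete, here is a small sketch that runs the regex from the question over a few tag variants (the variants are invented examples). Only the bare <i> is converted; the tag with an attribute and the tags with extra whitespace pass through unchanged.

```python
import re


def convert_with_sub(html):
    # The pattern from the question: it only matches the exact strings <i> and </i>
    # (with any number of slashes), nothing with attributes or spaces.
    return re.sub(r'<([/]*)i>', r'<\1em>', html)


for tag in ['<i>', '<i class="x">', '< i>', '</i >']:
    print(repr(tag), '->', repr(convert_with_sub(tag)))
```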

If you're going to parse [X]HTML, you really should use tools that are
designed for the job. If you have well-formed HTML, you can use the
htmllib parser in the standard library. If you have the usual crap one
finds on the web, I recommend BeautifulSoup.
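A minimal sketch of the parser route, using only the standard library. Note the swap: htmllib existed only in Python 2 and was removed in Python 3, so this uses html.parser.HTMLParser instead. TagRenamer and rename_tags are hypothetical names. The sketch rebuilds the markup with renamed tags and keeps attributes and comments; it drops doctype declarations and does not special-case <script>/<style> content.

```python
from html import escape
from html.parser import HTMLParser


class TagRenamer(HTMLParser):
    """Re-serialise parsed HTML, renaming tags via an {old: new} mapping."""

    def __init__(self, renames):
        super().__init__(convert_charrefs=True)
        self.renames = renames
        self.out = []

    def _name(self, tag):
        # HTMLParser lower-cases tag names, so <I> and <i> both arrive as "i".
        return self.renames.get(tag, tag)

    @staticmethod
    def _attrs(attrs):
        # value is None for bare attributes such as <input disabled>.
        return "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in attrs
        )

    def handle_starttag(self, tag, attrs):
        self.out.append(f"<{self._name(tag)}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        self.out.append(f"<{self._name(tag)}{self._attrs(attrs)} />")

    def handle_endtag(self, tag):
        self.out.append(f"</{self._name(tag)}>")

    def handle_data(self, data):
        # Entities arrive decoded (convert_charrefs=True), so escape them again.
        self.out.append(escape(data, quote=False))

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def rename_tags(html, renames):
    parser = TagRenamer(renames)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(rename_tags('<p>Tom &amp; <I Class="x">Jerry</i></p>', {"i": "em"}))
```

Unlike the string-based versions, this copes with attributes and tag-name case because the parser, not a pattern, decides where each tag begins and ends.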

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.

Oct 26 '05 #2

Iain King

Mike Meyer wrote:

> If you're going to parse [X]HTML, you really should use tools that are
> designed for the job. If you have well-formed HTML, you can use the
> htmllib parser in the standard library. If you have the usual crap one
> finds on the web, I recommend BeautifulSoup.

Thanks. My initial post overstates the program a bit: what I actually
have is a CGI script which outputs my LiveJournal, which I then
server-side include in my home page (so my home page also displays the
latest X entries in my LiveJournal). The only html I need to convert
is the stuff that LJ spews out, which, while bad, isn't terrible, and
is fairly consistent. The stuff I need to convert is mostly stuff I
write myself in journal entries, so it doesn't have to be so
comprehensive that I'd need something like BeautifulSoup. I'm not
trying to parse it, just clean it up a little.

Iain

Oct 26 '05 #3

SPE - Stani's Python Editor

Of course it is better to precompile the expression, but I guess
replace will beat even a precompiled regular expression. You could see
this posting:
http://groups.google.nl/group/comp.l...eed+python+sub

But performance should be measured, not guessed.
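As a sketch of the precompiling suggestion: compile the pattern once and reuse the pattern object. The names ITALIC_TAG and convert are made up. Worth knowing when you measure: the re module keeps an internal cache of recently compiled patterns, so repeated re.sub() calls with the same pattern string are not recompiled each time; precompiling mainly saves that cache lookup.

```python
import re

# Compile once (here at module level) and reuse the pattern object.
ITALIC_TAG = re.compile(r"<(/?)i>")


def convert(html):
    # pattern.sub() skips the cache lookup that module-level re.sub() does
    # on every call.
    return ITALIC_TAG.sub(r"<\1em>", html)


print(convert("<i>a</i> and <i>b</i>"))  # <em>a</em> and <em>b</em>
```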

Stani
--
SPE - Stani's Python Editor
http://pythonide.stani.be
http://pythonide.stani.be/manual/html/manual.html

Oct 26 '05 #4

Josef Meile

Hi Iain,

> Would this be a quicker/better way of doing it?

I don't know if this is faster, but it is certainly more elegant:

http://groups.google.ch/group/comp.l...b8767c793fb8b0

I really like it because of its simplicity and ease of use (thanks to
Fredrik Lundh for the script). However, I once suggested it as a
replacement for the approach you describe in a web application we have,
but it was rejected because the person who benchmarked it said it was OK
for small strings, but performance was an issue for larger ones. Anyway,
performance isn't an issue for my own applications, so I use it
sometimes.

By the way, that benchmark (I don't have any details about it) was done
with Python 2.1.3, so you should get better performance with 2.4.
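The linked script is not reproduced here (its URL is truncated), so the following is only a sketch of one common single-pass technique, which may or may not match it: build one regular expression from a dictionary of search strings and look each match up in that dictionary. multiple_replace and the tag table are hypothetical.

```python
import re


def multiple_replace(text, replacements):
    # Longest keys first, so that when one key is a prefix of another,
    # the longer one wins in the alternation.
    keys = sorted(replacements, key=len, reverse=True)
    # One alternation of all the (escaped) search strings.
    pattern = re.compile("|".join(map(re.escape, keys)))
    # For each match, look up its replacement in the dict.
    return pattern.sub(lambda m: replacements[m.group(0)], text)


tags = {"<i>": "<em>", "</i>": "</em>", "<b>": "<strong>", "</b>": "</strong>"}
print(multiple_replace("<b>bold</b> and <i>italic</i>", tags))
```

The appeal is that adding another tag is one more dictionary entry; the cost is a Python-level callback per match, which fits the report above of it being fine for small strings but slower for large ones.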

Regards,
Josef

Oct 26 '05 #5

EP

How does Python execute something like the following?

oldPhrase = "My dog has fleas on his knees"
newPhrase = oldPhrase.replace("fleas", "wrinkles").replace("knees", "face")

Does it do two iterations of the replace method, first on the initial string
and then on an intermediate string (my guess), or does it compile to something
more efficient? (I doubt it, unless it's Christmas in Pythonville, but
I thought I'd ask.)
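One way to see what the chained call does without looking at bytecode: Python strings are immutable, so each replace() returns a new string, and the second call operates on the first call's result. A minimal sketch (variable names follow the question, in snake_case):

```python
old_phrase = "My dog has fleas on his knees"

# The chained call is evaluated left to right: the first replace() returns a
# new intermediate string, and the second replace() is called on that.
intermediate = old_phrase.replace("fleas", "wrinkles")
new_phrase = intermediate.replace("knees", "face")
assert new_phrase == old_phrase.replace("fleas", "wrinkles").replace("knees", "face")
print(new_phrase)  # My dog has wrinkles on his face

# Because the passes are sequential, the second one sees the first one's output.
print("cat".replace("c", "b").replace("b", "h"))  # hat
```

The second print shows the practical consequence: a later replacement can act on text produced by an earlier one, which a single-pass substitution would not do.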

Oct 27 '05 #6

Bengt Richter

On 27 Oct 2005 12:39:18 -0700, "EP" <er***********@gmail.com> wrote:

> Does it do two iterations of the replace method on the initial and then
> an intermediate string (my guess) -- or does it compile to something
> more efficient (I doubt it, unless it's Christmas in Pythonville... but
> I thought I'd query)

Here's a way to get an answer in one form:

>>> def foo():  # for easy disassembly
...     oldPhrase="My dog has fleas on his knees"
...     newPhrase=oldPhrase.replace("fleas",
...         "wrinkles").replace("knees","face")
...
>>> import dis
>>> dis.dis(foo)

2 0 LOAD_CONST 1 ('My dog has fleas on his knees')
3 STORE_FAST 1 (oldPhrase)

3 6 LOAD_FAST 1 (oldPhrase)
9 LOAD_ATTR 1 (replace)
12 LOAD_CONST 2 ('fleas')

4 15 LOAD_CONST 3 ('wrinkles')
18 CALL_FUNCTION 2
21 LOAD_ATTR 1 (replace)
24 LOAD_CONST 4 ('knees')
27 LOAD_CONST 5 ('face')
30 CALL_FUNCTION 2
33 STORE_FAST 0 (newPhrase)
36 LOAD_CONST 0 (None)
39 RETURN_VALUE
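The listing above comes from a 2005-era CPython 2.x, and opcode names and offsets have changed since. A sketch to repeat the experiment on a current Python 3 interpreter; rather than relying on specific opcode names, it counts the instructions whose argument is the attribute name replace:

```python
import dis


def foo():  # for easy disassembly
    old_phrase = "My dog has fleas on his knees"
    new_phrase = old_phrase.replace("fleas", "wrinkles").replace("knees", "face")
    return new_phrase


dis.dis(foo)

# Opcode names vary between Python versions (LOAD_ATTR, LOAD_METHOD, CALL, ...),
# but the looked-up attribute name is the instruction's argval either way.
replace_lookups = [ins for ins in dis.get_instructions(foo) if ins.argval == "replace"]
print(len(replace_lookups))  # 2
```

Either way the answer is the same as in the 2.x listing: two separate attribute lookups and two separate calls, with no fusing of the replacements.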

Regards,
Bengt Richter

Oct 28 '05 #7

Alex Martelli

Iain King <ia******@gmail.com> wrote:

> Would this be a quicker/better way of doing it?
*MEASURE*!

Helen:~/Desktop alex$ python -m timeit -s'import re; h="<i>aap</i>"' \
> 'h.replace("<i>", "<em>").replace("</i>", "</em>")'
100000 loops, best of 3: 4.41 usec per loop
Helen:~/Desktop alex$ python -m timeit -s'import re; h="<i>aap</i>"' \
> 're.sub("<([/]*)i>", r"<\1em>", h)'
10000 loops, best of 3: 52.9 usec per loop
Helen:~/Desktop alex$

timeit.py is your friend, remember this...!
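The same measurement can be scripted with the timeit module instead of the command line. A sketch, assuming the test string <i>aap</i> and adding a precompiled pattern as a third candidate; no timings are shown because they depend on the machine and Python version.

```python
import re
import timeit

# Assumed test string.
h = "<i>aap</i>"
pattern = re.compile(r"<(/?)i>")

candidates = {
    "two str.replace calls": lambda: h.replace("<i>", "<em>").replace("</i>", "</em>"),
    "re.sub": lambda: re.sub(r"<(/?)i>", r"<\1em>", h),
    "precompiled pattern.sub": lambda: pattern.sub(r"<\1em>", h),
}

NUMBER = 10_000
for name, func in candidates.items():
    # Like `python -m timeit`, report the best of several repeats; the lambda
    # wrapper adds the same small call overhead to every candidate.
    best = min(timeit.repeat(func, number=NUMBER, repeat=3))
    print(f"{name}: {best / NUMBER * 1e6:.2f} usec per loop")
```

Checking that every candidate returns the same string before comparing times is cheap insurance that you are benchmarking equivalent code.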
Alex

Oct 28 '05 #8
