Bytes | Software Development & Data Engineering Community

compressing consecutive spaces

How can I replace multiple consecutive spaces in a file with a single
character (usually a space, but maybe a comma if converting to a CSV
file)? Ideally, the Python program would not compress consecutive
spaces inside single or double quotes. An inelegant method is to
repeatedly replace two consecutive spaces with one.

Jul 9 '07 #1
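For comparison, here is a minimal sketch of the "inelegant" repeated-replace method from the question next to a one-pass regular-expression substitution. The regex version is an alternative that nobody in the thread proposed. Both function names are hypothetical. Neither function leaves quoted text alone, so neither meets the question's "ideally" requirement.

```python
import re


def collapse_by_repeated_replace(line, sep=" "):
    # The method described in the question: keep replacing two spaces
    # with one until no double spaces remain, then swap each remaining
    # single space for sep.
    while "  " in line:
        line = line.replace("  ", " ")
    return line.replace(" ", sep)


def collapse_with_regex(line, sep=" "):
    # One pass: each run of one or more spaces becomes a single sep.
    # Passing a function as the replacement inserts sep literally, so a
    # backslash in sep is not treated as an escape sequence.
    return re.sub(" +", lambda _match: sep, line)
```

For example, `collapse_with_regex("a  b     c", ",")` returns `'a,b,c'`.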
On Jul 9, 9:38 am, Beliavsky <beliav...@aol.com> wrote:
> How can I replace multiple consecutive spaces in a file with a single
> character (usually a space, but maybe a comma if converting to a CSV
> file)? Ideally, the Python program would not compress consecutive
> spaces inside single or double quotes. An inelegant method is to
> repeatedly replace two consecutive spaces with one.

Calling split() with no arguments splits on whitespace, and multiple
consecutive spaces count as but a single separator. So split followed
by join gives collapsed whitespace.
>>> test = "a  b   c d    efdd  slkj     sdfdsfl"
>>> " ".join(test.split())
'a b c d efdd slkj sdfdsfl'

Or use something other than " " to join with, such as ",".
>>> ",".join(test.split())
'a,b,c,d,efdd,slkj,sdfdsfl'

-- Paul

Jul 9 '07 #2
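Paul's idiom extends to a whole file if you apply it line by line. The following is a minimal sketch that assumes one record per line. `collapse_lines` is a hypothetical name. It also shows where split/join differs from "replace runs of spaces": tabs count as separators too, leading and trailing whitespace disappears, and spaces inside quotes are collapsed as well.

```python
def collapse_lines(lines, sep=" "):
    # Paul's split/join idiom applied to each line of an iterable of
    # lines, for example an open text file. str.split() with no argument
    # splits on runs of any whitespace, including tabs and the trailing
    # newline, and discards leading and trailing whitespace, so the
    # newline is added back explicitly.
    for line in lines:
        yield sep.join(line.split()) + "\n"
```

With `src` and `dst` opened on your own input and output files, `dst.writelines(collapse_lines(src, ","))` writes the comma-joined version. Joining with "," does not quote fields that themselves contain commas or quotes. The standard library's csv module (`csv.writer`) handles that.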
On Jul 9, 7:38 am, Beliavsky <beliav...@aol.com> wrote:
> How can I replace multiple consecutive spaces in a file with a single
> character (usually a space, but maybe a comma if converting to a CSV
> file)? Ideally, the Python program would not compress consecutive
> spaces inside single or double quotes. An inelegant method is to
> repeatedly replace two consecutive spaces with one.


One can try mx.TextTools. E.g.,
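from mx.TextTools import *
import re

# A single- or double-quoted string whose closing quote matches the
# opening one and is not escaped with a backslash.
string_inside_quotes = re.compile(r'(?P<quote>["\']).*?(?<!\\)(?P=quote)',
                                  re.MULTILINE)

def advance_position(text, position, len_text, sre):
    # CallArg calls this with (text, position, len_text) followed by the
    # extra items of its argument tuple. Return the position just past a
    # quoted string that starts here, or the unchanged position if none.
    mobj = sre.match(text[position:])
    if mobj:
        incr = len(mobj.group(0))
    else:
        incr = 0
    return position + incr

# Each entry: (tagobj, command, argument, jump_no_match, jump_match).
table = ('try_again',
         # Copy a quoted string through untouched.
         ('quoted_string', CallArg,
          (advance_position, string_inside_quotes), +1, 'try_again'),
         # Copy a run of non-space characters.
         ('nonspace', AllNotIn, ' ', +1, 'try_again'),
         # Tag a run of spaces; it is replaced by a single space below.
         ('space', AllIn, ' ', +1, 'try_again'),
         # Succeed at the end of the text, otherwise fail.
         (None, EOF, Here, +1, MatchOk),
         (None, Fail, Here),
        )

for target_string in (
        "   Try  using   mx.TextTools  'for   parsing  strings'",
        "'It  might   be'  just    what you  needed",
        'I   find "it    worthwhile"',
        ):
    print("BEFORE:%s" % target_string)
    _, taglist, _ = tag(target_string, table)
    if taglist:
        tokens = []
        for t in taglist:
            tagobj, left_index, right_index = t[0:3]
            if tagobj == 'space':
                # Collapse the whole run of spaces to one space.
                tokens.append(' ')
            else:
                tokens.append(target_string[left_index:right_index])
        print("AFTER:%s" % ''.join(tokens))
    else:
        print("Something went horribly wrong")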
--
Hope this helps,
Steven

Jul 9 '07 #3
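If you want to avoid the mx.TextTools dependency, a similar tokenize-and-rebuild idea can be sketched with only the standard re module. The pattern matches either a quoted string or a run of spaces, and the text is rebuilt with the quoted parts kept verbatim. This is a swapped-in technique, not Steven's code. `TOKEN` and `collapse_outside_quotes` are hypothetical names. The sketch assumes that quotes open and close on the same line. Unlike the tag table above, it looks for an opening quote at any position, not only at the start of a token.

```python
import re

# Either a quoted string (single or double quotes, closing quote not
# preceded by a backslash) or a run of one or more spaces.
TOKEN = re.compile(r"""(?P<quoted>(["']).*?(?<!\\)\2)|(?P<spaces> +)""")


def collapse_outside_quotes(text, sep=" "):
    # re.sub scans left to right and tries the quoted-string alternative
    # first at each position. Spaces inside a quoted string are therefore
    # consumed as part of that string and never collapsed.
    def replace(match):
        if match.group("quoted") is not None:
            return match.group("quoted")  # keep quoted text unchanged
        return sep  # a run of spaces outside quotes becomes one sep

    return TOKEN.sub(replace, text)
```

For example, `collapse_outside_quotes('x  "y  z"  w', ",")` returns `'x,"y  z",w'`. As with the regex in Steven's version, an apostrophe inside a word (as in "don't") is treated as an opening quote if another single quote follows later on the same line.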

