
replacing multiple instances of commas beginning at specific position

I have a comma-delimited text file that has multiple instances of
multiple commas. Each file will contain approximately 300 lines. For
example:

one, two, three,,,,four,five,,,,six
one, two, three,four,,,,,,,,,,eighteen, and so on.

There is one case where multiple commas are allowed. Just prior to the
letters ADMNSRC there should be exactly one run of 4 commas, for example:
,eight,,,,ADMNSRC,thirteen,
The text ADMNSRC is NOT in the same place on each line.

What would be the best approach to replace all instances of multiple
commas with just one comma, except for the 4 commas prior to ADMNSRC?

Any help would be greatly appreciated.
TIA,
Kevin

Nov 22 '05 #1


striker wrote:
I have a comma delimited text file that has multiple instances of
multiple commas. Each file will contain approximately 300 lines. For
example:

one, two, three,,,,four,five,,,,six
one, two, three,four,,,,,,,,,,eighteen, and so on.

There is one time when multiple commas are allowed. Just prior to the
letters ADMNSRC there should be one instance of 4 commas. (
,eight,,,,ADMNSRC,thirteen, ). The text ADMNSRC is NOT in the same
place on each line.

What would be the best approach to replace all instances of multiple
commas with just one comma, except for the 4 commas prior to ADMNSRC?


Seems like a typical use case for the re module...
-> now you've got *2* problems- !-)
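For what it's worth, here is a minimal sketch of how the re module could handle this in one pass. The helper name fix_commas is made up for illustration, and the pattern is only one reading of the requirement: every run of commas becomes a single comma, unless the run is immediately followed by ADMNSRC, in which case it is forced to exactly four.

```python
import re

# Hypothetical helper; the pattern is an assumption about what the poster wants.
# The optional group is set only when a comma run is directly followed by ADMNSRC.
_COMMA_RUN = re.compile(r',+(ADMNSRC\b)?')

def fix_commas(line):
    """Collapse comma runs to one comma, but force four commas before ADMNSRC."""
    def repl(match):
        return ',,,,ADMNSRC' if match.group(1) else ','
    return _COMMA_RUN.sub(repl, line)

print(fix_commas('one, two, three,,,,four,five,,,,six'))
print(fix_commas('seven,eight,,,,,,ADMNSRC,,,thirteen'))
```

Using a replacement function rather than a lookahead keeps the special case in one place. Spaces are left untouched here.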
--
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in 'o****@xiludom.gro'.split('@')])"
Nov 22 '05 #2

On Nov 14, striker wrote:
I have a comma delimited text file that has multiple instances of
multiple commas. Each file will contain approximately 300 lines.
For example:

one, two, three,,,,four,five,,,,six
one, two, three,four,,,,,,,,,,eighteen, and so on.

There is one time when multiple commas are allowed. Just prior to
the letters ADMNSRC there should be one instance of 4 commas. (
,eight,,,,ADMNSRC,thirteen, ). The text ADMNSRC is NOT in the same
place on each line.

What would be the best approach to replace all instances of multiple
commas with just one comma, except for the 4 commas prior to
ADMNSRC?


One possible approach:

#! /usr/bin/env python

import re

# This list simulates the actual opened file.
infile = [
    'one, two, three,four,,,,,,ADMNSRC,,,,eighteen,',
    'one, two, three,four,five,six'
]

# Placeholder for resultant list.
result = []

for item in infile:
    # Use a regex to just reduce *all* multi-commas to singles.
    item = re.sub(r',{2,}', r',', item)
    # Add back the desired commas for special case.
    item = item.replace('ADMNSRC', ',,,ADMNSRC')
    # Remove spaces??
    item = item.replace(' ', '')
    # Add to resultant list.
    result.append(item)
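
The loop above works on a list standing in for the file. As a rough sketch (Python 3 assumed), the same collapse-then-restore steps could be wrapped in a generator that accepts any iterable of lines, including an open file. The name normalize_lines is hypothetical, and io.StringIO stands in for a real file here. Unlike the loop above, this version leaves spaces alone, since stripping every space would also turn "and so on" into "andsoon".

```python
import io
import re

def normalize_lines(lines):
    """Yield each line with comma runs collapsed and four commas before ADMNSRC."""
    for line in lines:
        line = line.rstrip('\n')
        # Reduce every run of two or more commas to a single comma.
        line = re.sub(r',{2,}', ',', line)
        # Put back the three extra commas for the special case.
        # Assumes ADMNSRC is already preceded by at least one comma.
        line = line.replace('ADMNSRC', ',,,ADMNSRC')
        yield line

# io.StringIO simulates an opened file; a real file object would work the same way.
sample = io.StringIO(
    'one, two, three,four,,,,,,ADMNSRC,,,,eighteen,\n'
    'one, two, three,four,five,six\n'
)
cleaned = list(normalize_lines(sample))
for line in cleaned:
    print(line)
```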

--
_ _ ___
|V|icah |- lliott <>< md*@micah.elliott.name
" " """
Nov 22 '05 #3

On Tue, 15 Nov 2005 08:26:22 GMT, Dennis Lee Bieber <wl*****@ix.netcom.com> wrote:
On 14 Nov 2005 09:43:57 -0800, "striker" <st*****@trip.net> declaimed
the following in comp.lang.python:

What would be the best approach to replace all instances of multiple
commas with just one comma, except for the 4 commas prior to ADMNSRC?

Simplify the problem... Start by rephrasing... Since ADMNSRC is to
always have four commas leading up to it (at least, as I understand your
statement of intent), you could consider three of those to be part of
the string. So...

Phase one: split on commas, tossing out any null fields
Phase two: replace "ADMNSRC" with ",,,ADMNSRC"
Phase three: rejoin the parts that remain.

Doing this efficiently may be another matter but...

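data = [ "one, two, three,,,,four,,,,ADMNSRC, five,,,,six",
         "one, two, three,four,,,,,,,,ADMNSRC,,,,,,eighteen, and so on" ]

result = []
for ln in data:
    # Phase one: split on commas, tossing out empty (or blank) fields.
    wds = [x.strip() for x in ln.split(",") if x.strip()]
    # Phase two: treat three of the four commas as part of the string.
    for i in range(len(wds)):
        if wds[i] == "ADMNSRC":
            wds[i] = ",,,ADMNSRC"
    # Phase three: rejoin the parts that remain.
    result.append(",".join(wds))

print(result)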


Or if data is from a single file read, maybe (untested beyond what you see ;-)
>>> data = """\
... one, two, three,,,,four,,,,ADMNSRC, five,,,,six
... one, two, three,four,,,,,,,,ADMNSRC,,,,,,eighteen, and so on
... """
>>> import re
>>> rxc = re.compile(',+')
>>> result = ',,,ADMNSRC'.join(','.join(rxc.split(s)) for s in data.split(',,,ADMNSRC'))
>>> print(result)

one, two, three,four,,,,ADMNSRC, five,six
one, two, three,four,,,,ADMNSRC,eighteen, and so on
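
Since the one-liner is only tested against the output shown, and splitting on ',,,ADMNSRC' assumes the input already has at least three commas in front of ADMNSRC, a small check on the result may be worthwhile. This is just a sketch; comma_runs_ok is a made-up name, and the rule it encodes (exactly four commas before ADMNSRC, exactly one everywhere else) is my reading of the original question.

```python
import re

def comma_runs_ok(line):
    """Return True if each comma run is 4 long before ADMNSRC and 1 long elsewhere."""
    for match in re.finditer(r',+', line):
        # Each match is a maximal run of commas; see what follows it.
        before_marker = line.startswith('ADMNSRC', match.end())
        expected = 4 if before_marker else 1
        if len(match.group()) != expected:
            return False
    return True

output = [
    'one, two, three,four,,,,ADMNSRC, five,six',
    'one, two, three,four,,,,ADMNSRC,eighteen, and so on',
]
print(all(comma_runs_ok(line) for line in output))
```

Running it over every cleaned line would flag, for instance, an input where ADMNSRC originally had only one or two commas in front of it.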

Regards,
Bengt Richter
Nov 22 '05 #4

