By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
440,550 Members | 1,189 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 440,550 IT Pros & Developers. It's quick & easy.

Stripping ASCII codes when parsing

P: n/a
I am working with a text format that advises to strip any ascii control
characters (0 - 30) as part of parsing data and also the ascii pipe
character (124) from the data. I think many of these characters are
from a different time. Since I have never seen most of these characters
in text I am not sure how these first 30 control characters are all
represented (other than say tab (\t), newline(\n), line return(\r) ) so
what should I do to remove these characters if they are ever
encountered. Many thanks.
Oct 17 '05 #1
Share this Question
Share on Google+
2 Replies


P: n/a
In article <ma*************************************@python.or g>,
David Pratt <fa*******@eastlink.ca> wrote:
I am working with a text format that advises to strip any ascii control
characters (0 - 30) as part of parsing data and also the ascii pipe
character (124) from the data. I think many of these characters are
from a different time. Since I have never seen most of these characters
in text I am not sure how these first 30 control characters are all
represented (other than say tab (\t), newline(\n), line return(\r) ) so
what should I do to remove these characters if they are ever
encountered. Many thanks.


Most of those characters are hard to see.

Represent arbitrary characters in a string in hex: "\x00\x01\x02" or
with chr(n).

If you just want to remove some characters, look into "".translate().

nullxlate = "".join([chr(n) for n in xrange(256)])
delchars = nullxlate[:31] + chr(124)
outputstr = inputstr.translate(nullxlate, delchars)
__________________________________________________ ______________________
TonyN.:' *firstname*nlsnews@georgea*lastname*.com
' <http://www.georgeanelson.com/>
Oct 17 '05 #2

P: n/a
David Pratt wrote:
I am working with a text format that advises to strip any ascii control
characters (0 - 30) as part of parsing data and also the ascii pipe
character (124) from the data. I think many of these characters are
from a different time. Since I have never seen most of these characters
in text I am not sure how these first 30 control characters are all
represented (other than say tab (\t), newline(\n), line return(\r) ) so
what should I do to remove these characters if they are ever
encountered. Many thanks.


Use ''.translate. Pass in the identity mapping for the first argument,
and for the second parameter, specify the list of all the characters you
wish to delete. This would probably be something like:

IDENTITY_MAP = ''.join([chr(x) for x in range(256)])
BAD_MAP = ''.join([chr(x) for x in range(32) + [124])

aNewString = aString.translate(IDENTITY_MAP, BAD_MAP)

Note that ASCII 31 is also a control character (US).

--
Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM erikmaxfrancis
The believer is happy; the doubter is wise.
-- (an Hungarian proverb)
Oct 17 '05 #3

This discussion thread is closed

Replies have been disabled for this discussion.