A 'raw' codec for binary "strings" in Python?

I've encountered an issue dealing with strings read from files. I
read a line from a file, then try to print it out as an ASCII string:

line = fp.readline()
print line.encode('US-ASCII', 'replace')

and of course I get an error like:

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd5 in position 1: ordinal not in range(128)

because the file contained some binary character. You'll notice that
the problem is in *decoding* the string, not in re-encoding it,
because I'm using the default "C" locale, and "US-ASCII" is presumed
for strings. But these strings are *not* US-ASCII, they are raw
bytes. How do I format a string of raw bytes for conversion to a
recognized charset encoding for printing?

There seems to be no 'raw' codec that would capture this. There's no
way of setting an attribute on a file to express this. It looks like
the best I can do is

print string.join([(((ord(x) > 0 and ord(x) < 0x7F) and x) or (r"\x%02x" % ord(x))) for x in line], '')

which seems extremely inefficient.
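
For comparison, here is a more readable sketch of the same escaping logic. It assumes Python 3, where a file opened in binary mode yields a bytes object (the post itself targets Python 2), and the function name escape_nonprintable is hypothetical:

```python
def escape_nonprintable(data: bytes) -> str:
    """Render bytes as text; bytes outside 0x01-0x7E become \\xNN escapes.

    Mirrors the range test in the one-liner above (0 < byte < 0x7F).
    """
    return "".join(
        chr(b) if 0 < b < 0x7F else "\\x%02x" % b
        for b in data  # iterating over bytes yields ints in Python 3
    )


if __name__ == "__main__":
    # Hypothetical sample data containing the 0xd5 byte from the traceback.
    print(escape_nonprintable(b"a\xd5b"))
```

This still walks the data one byte at a time, so it is a readability change rather than a speed-up.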

Bill

Jul 18 '05 #1
Bill Janssen wrote:
> You'll notice that
> the problem is in *decoding* the string, not in re-encoding it,
> because I'm using the default "C" locale, and "US-ASCII" is presumed
> for strings. But these strings are *not* US-ASCII, they are raw
> bytes. How do I format a string of raw bytes for conversion to a
> recognized charset encoding for printing?

Since the default encoding is ASCII, those 8-bit octets have no meaning
unless you do an explicit conversion. Trying to print them _should_
raise an error, because you're trying to do something that doesn't make
sense.

As Gerrit pointed out, it sounds like what you want is repr.
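
A minimal sketch of the repr suggestion, assuming Python 3 and data held as a bytes object (in the Python 2 of this thread, repr(line) on a plain str behaves similarly but without the b prefix). The sample value is hypothetical:

```python
# Hypothetical line read from a file opened in binary mode.
raw = b"caf\xd5 line\n"

# repr() yields a pure-ASCII description of the bytes: printable ASCII
# is kept as-is, everything else is written as a backslash escape.
printable = repr(raw)

print(printable)
```

Because the result is plain ASCII text, printing it never goes through the failing decode step.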

--
__ Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
/ \ San Jose, CA, USA && 37 20 N 121 53 W && &tSftDotIotE
\__/ Liberty is the right to do whatever the law permits.
-- Charles Louis Montesquieu
Jul 18 '05 #2
Bill Janssen <ja*****@parc.com> writes:
> I've encountered an issue dealing with strings read from files. I
> read a line from a file, then try to print it out as an ASCII string:
>
> line = fp.readline()
> print line.encode('US-ASCII', 'replace')
>
> and of course I get an error like:
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xd5 in position 1: ordinal not in range(128)
>
> because the file contained some binary character. You'll notice that
> the problem is in *decoding* the string, not in re-encoding it,
> because I'm using the default "C" locale, and "US-ASCII" is presumed
> for strings.

Actually, the "C" locale has precisely nothing to do with it.

> But these strings are *not* US-ASCII, they are raw bytes. How do I
> format a string of raw bytes for conversion to a recognized charset
> encoding for printing?

You don't?

Wouldn't

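    import string

    def m(c):
        # keep printable ASCII characters, mask everything else
        if c in string.printable:
            return c
        else:
            return '?'

    # 256-entry translation table, one entry per possible byte value
    t = ''.join([m(chr(o)) for o in range(256)])

    line.translate(t)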

make more sense?
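
The same translation-table idea can be sketched for Python 3, where translate on a bytes object takes a 256-byte table. This is an illustration under that assumption, not part of the original suggestion; the names PRINTABLE, TABLE and mask_nonprintable are hypothetical:

```python
import string

# Byte values of every character in string.printable (all are ASCII).
PRINTABLE = set(string.printable.encode("ascii"))

# 256-entry table: printable bytes map to themselves, the rest to '?'.
TABLE = bytes(b if b in PRINTABLE else ord("?") for b in range(256))


def mask_nonprintable(data: bytes) -> bytes:
    """Return a copy of data with non-printable bytes replaced by '?'."""
    return data.translate(TABLE)


if __name__ == "__main__":
    print(mask_nonprintable(b"a\xd5b\x00").decode("ascii"))
```

The table is built once and the per-byte work happens inside translate, which avoids the per-character Python-level loop the original question was worried about.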

Cheers,
mwh

--
I like silliness in a MP skit, but not in my APIs. :-)
-- Guido van Rossum, python-dev
Jul 18 '05 #3
