unicode 3 digit decimal conversion

Rune Hansen

Hi,
I've got the string "Gratis øl", or in English: "Free beer". I know there
is no such thing, but...

Python 2.3 (#1, Aug 1 2003, 15:23:03)
[GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> unicode("Gratis øl","iso-8859-1")
u'Gratis \xf8l'
>>> ord("\xf8")
248

What I need is the converted string to read u'Gratis \248l' (*
How do I do this without going through each and every character of the
string?
(not that I have figured out how to do that right either)

regards

/rune
*) I need to communicate with a telnet interface that only accepts
accented characters as unicode decimals

Jul 18 '05 #1

Klaus Alexander Seistrup

Rune Hansen wrote:

>>> unicode("Gratis øl","iso-8859-1")
u'Gratis \xf8l'
>>> ord("\xf8")
248

What I need is the converted string to read u'Gratis \248l' (*
How do I do this without going through each and every character
of the string?

How about

#v+

print unicode('Gratis øl', 'iso-8859-1').encode('utf-8')
Gratis Ã¸l

#v-
// Klaus
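
For anyone reading this on Python 3, here is a minimal sketch of what that one-liner does at the byte level. It assumes Python 3 (where string literals are already Unicode) and the variable names are only illustrative; it is not the code posted above.

```python
# Python 3 sketch: text is already Unicode, so only the encode step remains.
text = "Gratis \u00f8l"  # "Gratis øl"

# UTF-8 turns the single character U+00F8 into the two bytes 0xC3 0xB8.
utf8_bytes = text.encode("utf-8")

# A Latin-1 terminal shows each of those bytes as its own character,
# which is why the printed result above looks like two odd characters.
as_seen_on_latin1_terminal = utf8_bytes.decode("iso-8859-1")

print(utf8_bytes)
print(as_seen_on_latin1_terminal)
```

Note that no backslash or decimal digits appear anywhere in the result, so this is a different thing from the \248 form asked for.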

--<> unselfish actions pay back better

Jul 18 '05 #2

Martin v. Löwis

Rune Hansen <ru*********@viventus.no> writes:

What I need is the converted string to read u'Gratis \248l' (*
How do I do this without going through each and every character of the
string?

You can register an error callback, like this:

import codecs

def decimal_escape(exc):
    try:
        data = exc.object
        res = u""
        for i in range(exc.start, exc.end):
            char = ord(data[i])
            if char < 1000:
                res += u"\\%03d" % char
            else:
                # Unsupported character
                raise exc

        return res, exc.end
    except:
        raise exc

codecs.register_error("decimal-escape", decimal_escape)

print u"Gratis \xf8l".encode("us-ascii", "decimal-escape")

Notice that your specification is a bit unclear as to what to do with
characters >= 1000; I assume they are not supported in your protocol.
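
The same idea can be sketched for Python 3, where the callback receives a UnicodeEncodeError and the result of encode() is a bytes object. This is an adaptation under those assumptions, not the code as posted; it keeps the name decimal_escape and the same three-digit limit.

```python
import codecs


def decimal_escape(exc):
    """Replace each unencodable character with a backslash and 3 decimal digits."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    pieces = []
    for ch in exc.object[exc.start:exc.end]:
        code = ord(ch)
        if code >= 1000:
            # Would need more than three digits; treated as unsupported here.
            raise exc
        pieces.append("\\%03d" % code)
    # Return the replacement text and the position where encoding resumes.
    return "".join(pieces), exc.end


codecs.register_error("decimal-escape", decimal_escape)

encoded = "Gratis \u00f8l".encode("ascii", "decimal-escape")
print(encoded)
```

Only the characters the ASCII codec cannot handle reach the callback, so plain ASCII text passes through untouched and nobody has to loop over the whole string by hand.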

Regards,
Martin

Jul 18 '05 #3

Rune Hansen

Hi, yes, of course *blush*.
Thanks

/regards

/rune

Jul 18 '05 #4

Rune Hansen

Hi,
The tip from Klaus "solved" my problem for the time being, but your
snippet definitely goes into my "tool chest"

thanks

regards

/rune

Jul 18 '05 #5

Peter Otten

Rune Hansen wrote:

>>> unicode("Gratis øl","iso-8859-1")
u'Gratis \xf8l'
>>> ord("\xf8")
248

What I need is the converted string to read u'Gratis \248l' (*
How do I do this without going through each and every character of the
string?
(not that I have figured out how to do that right either)

I see your problem is already solved; I just want to add that normally (read:
in C and Python) the backslash notation is base 8, not base 10.

>>> ord("\248")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: ord() expected a character, but string of length 2 found
>>> oct(248)
'0370'
>>> ord("\370")
248
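
The same point, sketched for Python 3 (an assumption; the session above is Python 2.3, and the variable names here are only illustrative). The string-literal rules for octal escapes are the same in both versions.

```python
# An octal escape reads up to three octal digits after the backslash.
o_slash = "\370"             # 3*64 + 7*8 + 0 = 248, i.e. the letter ø
two_chars = "\248"           # "\24" (code 20) followed by the literal digit "8"

# Format the decimal code point 248 as three octal digits.
octal_digits = "%03o" % 248

print(ord(o_slash), len(two_chars), octal_digits)
```

Since 8 is not an octal digit, the escape in "\248" stops after "\24", which is why ord() sees a string of length 2.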

Peter

Jul 18 '05 #6

Martin v. Löwis

Peter Otten <__*******@web.de> writes:

I see your problem is already solved

I'm quite uncertain as to what the solution is, though - or perhaps
what the problem is. The OP said that the telnet server expects
backslash characters in the data stream (at least I interpreted his
message that way), but then he was happy with an approach that did not
send backslash characters.

Perhaps the telnet server really expects latin-1, in which case
encoding the Unicode string in that encoding, or in iso-8859-15,
would work fine most of the time.
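
A small sketch of what "most of the time" means here, assuming Python 3 and using the euro sign purely as an illustrative character that exists in iso-8859-15 but not in iso-8859-1:

```python
text = "Gratis \u00f8l"  # "Gratis øl"

# ø has the same single-byte value, 0xF8, in both encodings.
latin1 = text.encode("iso-8859-1")
latin9 = text.encode("iso-8859-15")

# The euro sign is where the two encodings differ.
euro = "\u20ac"
euro_latin9 = euro.encode("iso-8859-15")
try:
    euro.encode("iso-8859-1")
    euro_fits_latin1 = True
except UnicodeEncodeError:
    euro_fits_latin1 = False

print(latin1, latin9, euro_latin9, euro_fits_latin1)
```

Any character outside the chosen 8-bit character set raises UnicodeEncodeError, so this approach only works as long as the text stays within that set.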

Regards,
Martin

Jul 18 '05 #7

Rune Hansen

Hi Martin, you raise a very interesting question (at least for me it is :-).
The otherwise excellent support people at Stalker (it's a CommuniGate
Pro server I'm speaking to) have me totally confused.

What I tried to do was to send a quoteattr(u'string') (using quoteattr
from sax). The server was very happy with this, showing the accented
characters, when viewed in a telnet session, as human readable text.
This, it became apparent, was obviously wrong. The data was garbled when
viewing it in the web interface.

Stalker told me to send the letter "ø" as \248 or as xf8 (notice the
missing "\"). At this point I'm sending
quoteattr(unicode('string',"iso-8859-1").encode("utf-8")) which is
neither of the above (..?).
Anyway, the server is still happy, and the data views correctly in the
web interface.
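
For reference, a rough Python 3 equivalent of that call, assuming quoteattr from xml.sax.saxutils (the "sax" module mentioned above). In Python 3 the quoting happens on the text first and the UTF-8 encoding last; the variable names are only illustrative.

```python
from xml.sax.saxutils import quoteattr

text = "Gratis \u00f8l"  # "Gratis øl"

# quoteattr wraps the value in quotes and escapes &, < and >;
# it leaves non-ASCII characters such as ø alone.
quoted = quoteattr(text)

# The bytes that would actually go over the wire.
payload = quoted.encode("utf-8")

print(quoted)
print(payload)
```

So the accented character is sent as the two UTF-8 bytes 0xC3 0xB8, not as \248 and not as xf8.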

Stalker provides a Perl and Java API for the telnet server. I don't
read Perl code very well, and the Java API is distributed as .class
files (nothing new there, it's Java after all) so I really don't know how
Stalker is handling it.

regards

/rune

Jul 18 '05 #8

Martin v. Löwis

Rune Hansen wrote:

Stalker told me to send the letter "ø" as \248 or as xf8 (notice the
missing "\"). At this point I'm sending
quoteattr(unicode('string',"iso-8859-1").encode("utf-8")) which is
neither of the above (..?).

Correct: UTF-8 works differently. I find it surprising that anybody
actually proposes to send non-ASCII characters using xHH, as this
byte sequence may coincidentally happen in ASCII text as well.

Anyway, the server is still happy, and the data views correctly in the
web interface.

It is relatively easy to recognize UTF-8 in the input; it is unlikely
that "real" data look like UTF-8 by coincidence (unlike \-escaping
or x-escaping). So it might be that the server studies the input to
guess the encoding. This is bad style, of course - the protocol should
be clear about encodings (this protocol couldn't be published in an
IETF RFC).
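
To illustrate the kind of guessing described above, here is a hypothetical Python 3 sketch. The function name guess_decode and the Latin-1 fallback are assumptions made for the example; nothing here is known about what the CommuniGate server actually does.

```python
def guess_decode(data):
    """Decode bytes as UTF-8 if they are valid UTF-8, else fall back to Latin-1."""
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 maps every byte value to a character, so this cannot fail.
        return data.decode("iso-8859-1"), "iso-8859-1"


print(guess_decode(b"Gratis \xc3\xb8l"))  # valid UTF-8
print(guess_decode(b"Gratis \xf8l"))      # 0xF8 cannot start a UTF-8 sequence
```

Both inputs come out as the same text, which would explain a server that is "happy" with either form, at the cost of an unclear protocol.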

Stalker provides a Perl and Java API for the telnet server. I don't
read Perl code very well, and the Java API is distributed as .class
files (nothing new there, it's Java after all) so I really don't know how
Stalker is handling it.

Even then, you could only find out what the perl and java clients do -
you couldn't tell, from that, what other options the server might support.

Regards,
Martin

Jul 18 '05 #9

Fredrik Lundh

Martin v. Löwis wrote:

Correct: UTF-8 works differently. I find it surprising that anybody
actually proposes to send non-ASCII characters using xHH, as this
byte sequence may coincidentally happen in ASCII text as well.

unless they expect you to send "x" as "x78", of course.
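
A sketch of how such a scheme could stay unambiguous, in Python 3. This is purely hypothetical: the functions x_escape and x_unescape and the exact rules (lowercase hex, Latin-1 range only) are assumptions made for illustration, not anything documented for the server in question.

```python
def x_escape(text):
    """Write non-ASCII Latin-1 characters, and "x" itself, as x plus two hex digits."""
    out = []
    for ch in text:
        code = ord(ch)
        if code > 0xFF:
            raise ValueError("character outside Latin-1: %r" % ch)
        if ch == "x" or code > 0x7F:
            out.append("x%02x" % code)
        else:
            out.append(ch)
    return "".join(out)


def x_unescape(data):
    """Reverse x_escape: every "x" must be followed by two hex digits."""
    out = []
    i = 0
    while i < len(data):
        if data[i] == "x":
            out.append(chr(int(data[i + 1:i + 3], 16)))
            i += 3
        else:
            out.append(data[i])
            i += 1
    return "".join(out)


print(x_escape("Gratis \u00f8l"))
print(x_escape("xf8"))
```

Because a literal "x" never appears unescaped, the text "xf8" and the letter ø produce different output and can be told apart again on decoding.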

</F>

Jul 18 '05 #10
