Hi,
I've got the string "Gratis øl", or in English: "Free beer". I know there
is no such thing, but...
Python 2.3 (#1, Aug 1 2003, 15:23:03)
[GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> unicode("Gratis øl","iso-8859-1")
u'Gratis \xf8l'
>>> ord("\xf8")
248
What I need is for the converted string to read u'Gratis \248l' (*
How do I do this without going through each and every character of the
string?
(Not that I have figured out how to do that right either.)
regards
/rune
*) I need to communicate with a telnet interface that only accepts
accented characters as unicode decimals.
Rune Hansen wrote:
>>> unicode("Gratis øl","iso-8859-1")
u'Gratis \xf8l'
>>> ord("\xf8")
248
What I need is the converted string to read u'Gratis \248l' (* How do I do this without going through each and every character of the string?
How about
#v+
>>> print unicode('Gratis øl', 'iso-8859-1').encode('utf-8')
Gratis Ã¸l
#v-
// Klaus
--<> unselfish actions pay back better
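For readers following along on Python 3, here is a small sketch of what this suggestion puts on the wire. It assumes a Python 3 interpreter (the thread itself uses Python 2.3), and the variable names are only illustrative. UTF-8 turns "ø" into the two bytes C3 B8, and a Latin-1 terminal would render those two bytes as "Ã¸", which is probably why the output above looks the way it does.

```python
# Python 3 sketch; the thread's own code is Python 2.3.
text = "Gratis \xf8l"                      # "Gratis øl" as a text string

# UTF-8 encodes U+00F8 as the two bytes 0xC3 0xB8.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)                          # b'Gratis \xc3\xb8l'

# A terminal that interprets those bytes as ISO-8859-1 sees two characters,
# U+00C3 and U+00B8, instead of one "ø".
seen_on_latin1_terminal = utf8_bytes.decode("iso-8859-1")
print(ascii(seen_on_latin1_terminal))      # 'Gratis \xc3\xb8l'
```

Note that nothing here produces a backslash followed by decimal digits; the server receives raw UTF-8 bytes.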
Rune Hansen <ru*********@viventus.no> writes: What I need is the converted string to read u'Gratis \248l' (* How do I do this without going through each and every character of the string?
You can register an error callback, like this:
import codecs

def decimal_escape(exc):
    try:
        data = exc.object
        res = u""
        for i in range(exc.start, exc.end):
            char = ord(data[i])
            if char < 1000:
                res += u"\\%03d" % char
            else:
                # Unsupported character
                raise exc
        return res, exc.end
    except:
        raise exc

codecs.register_error("decimal-escape", decimal_escape)
print u"Gratis \xf8l".encode("us-ascii", "decimal-escape")
Notice that your specification is a bit unclear as to what to do with
characters whose code is 1000 or above; I assume they are not supported
in your protocol.
Regards,
Martin
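The same idea can be sketched for Python 3, where text strings are Unicode by default and print is a function. This is only a sketch under the assumptions already made in this message: the handler name "decimal-escape" and the three-digit limit come from the code above, and whether the server really wants this format is still an open question.

```python
import codecs

def decimal_escape(exc):
    """Error handler: replace unencodable characters with \\DDD (decimal)."""
    # Only encoding errors carry the text we need; re-raise anything else.
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    pieces = []
    for ch in exc.object[exc.start:exc.end]:
        code = ord(ch)
        if code >= 1000:
            # Assumption from the thread: only three decimal digits are supported.
            raise exc
        pieces.append("\\%03d" % code)
    # Return the replacement text and the position where encoding resumes.
    return "".join(pieces), exc.end

codecs.register_error("decimal-escape", decimal_escape)

wire = "Gratis \xf8l".encode("ascii", "decimal-escape")
print(wire)  # b'Gratis \\248l'
```

The codec machinery only calls the handler for the characters that ASCII cannot represent, so there is no explicit loop over the whole string in user code.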
Hi, yes, of course *blush*.
Thanks
/regards
/rune
Klaus Alexander Seistrup wrote:
Rune Hansen wrote:
>>> unicode("Gratis øl","iso-8859-1")
u'Gratis \xf8l'
>>> ord("\xf8")
248
What I need is the converted string to read u'Gratis \248l' (* How do I do this without going through each and every character of the string?
How about
#v+
print unicode('Gratis øl', 'iso-8859-1').encode('utf-8')
Gratis øl
#v-
// Klaus
Hi,
The tip from Klaus "solved" my problem for the time being, but your
snippet definitely goes into my "tool chest".
thanks
regards
/rune
Martin v. Löwis wrote:
Rune Hansen <ru*********@viventus.no> writes:
What I need is the converted string to read u'Gratis \248l' (* How do I do this without going through each and every character of the string?
You can register an error callback, like this:
import codecs

def decimal_escape(exc):
    try:
        data = exc.object
        res = u""
        for i in range(exc.start, exc.end):
            char = ord(data[i])
            if char < 1000:
                res += u"\\%03d" % char
            else:
                # Unsupported character
                raise exc
        return res, exc.end
    except:
        raise exc

codecs.register_error("decimal-escape", decimal_escape)
print u"Gratis \xf8l".encode("us-ascii", "decimal-escape")
Notice that your specification is a bit unclear as to what to do with characters whose code is 1000 or above; I assume they are not supported in your protocol.
Regards, Martin
Rune Hansen wrote:
>>> unicode("Gratis øl","iso-8859-1")
u'Gratis \xf8l'
>>> ord("\xf8")
248
What I need is the converted string to read u'Gratis \248l' (* How do I do this without going through each and every character of the string? (not that I have figured out how to do that right either)
I see your problem is already solved, just want to add that normally (read:
C and Python) the backslash notation is base 8, not base 10.
>>> ord("\248")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: ord() expected a character, but string of length 2 found
>>> oct(248)
'0370'
>>> ord("\370")
248
Peter
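The same point can be checked on Python 3; this is a sketch that assumes a Python 3 interpreter, where oct() prints a 0o prefix instead of the leading 0 shown above. It also shows that a literal backslash followed by decimal digits has to be built explicitly, since the "\248" string literal means something else.

```python
# "\370" is an octal escape: 3*64 + 7*8 + 0 = 248, i.e. "ø".
print(ord("\370"))                       # 248
print(oct(248))                          # 0o370

# "\248" is parsed as the octal escape "\24" (code 20) followed by "8".
two_chars = "\248"
print(len(two_chars))                    # 2
print([ord(c) for c in two_chars])       # [20, 56]

# To get a real backslash followed by the decimal code, escape the backslash.
decimal_form = "\\%03d" % 248
print(decimal_form)                      # \248
```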
Peter Otten <__*******@web.de> writes: I see your problem is already solved
I'm quite uncertain as to what the solution is, though - or perhaps
what the problem is. The OP said that the telnet server expects
backslash characters in the data stream (at least I interpreted his
message that way), but then he was happy with an approach that did not
send backslash characters.
Perhaps the telnet server really expects latin-1, in which case
encoding the Unicode string in that encoding, or in iso-8859-15,
would work fine most of the time.
Regards,
Martin
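To make the "most of the time" concrete, here is a small Python 3 sketch (an assumption; the thread uses Python 2.3) comparing the two encodings. The euro sign is just one illustrative character that exists in ISO-8859-15 but not in ISO-8859-1.

```python
text = "Gratis \xf8l"

# "ø" sits at byte 0xF8 in both Latin-1 and Latin-9.
latin1 = text.encode("iso-8859-1")
latin9 = text.encode("iso-8859-15")
print(latin1, latin9)                    # b'Gratis \xf8l' b'Gratis \xf8l'

# Characters outside the target repertoire raise UnicodeEncodeError.
euro = "\u20ac"
try:
    euro.encode("iso-8859-1")
    euro_fits_latin1 = True
except UnicodeEncodeError:
    euro_fits_latin1 = False
print(euro_fits_latin1)                  # False

# ISO-8859-15 puts the euro sign at byte 0xA4.
euro_latin9 = euro.encode("iso-8859-15")
print(euro_latin9)                       # b'\xa4'
```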
Hi Martin, you raise a very interesting question (at least for me it is :-).
The otherwise excellent support people at Stalker (it's a CommuniGate
Pro server I'm speaking to) have me totally confused.
What I tried to do was to send a quoteattr(u'string') (using quoteattr
from sax). The server was very happy with this, showing the accented
characters, when viewed in a telnet session, as human readable text.
This, it became apparent, was obviously wrong. The data was garbled when
viewing it in the web interface.
Stalker told me to send the letter "ø" as \248 or as xf8 (notice the
missing "\"). At this point I'm sending
quoteattr(unicode('string',"iso-8859-1").encode("utf-8")) which is
neither of the above (..?).
Anyway, the server is still happy, and the data views correctly in the
web interface.
Stalker provides a Perl and a Java API for the telnet server. I don't
read Perl code very well, and the Java API is distributed as .class
files (nothing new there, it's Java after all), so I really don't know how
Stalker is handling it.
regards
/rune
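For comparison, the three candidate wire forms mentioned in this message can be put side by side. This is a Python 3 sketch, not what the Stalker APIs do; the variable names are illustrative, and the \248 and xf8 strings are written out by hand simply to show the bytes involved. quoteattr here is xml.sax.saxutils.quoteattr, which wraps the text in quotes and escapes XML-special characters but leaves "ø" and backslashes alone.

```python
from xml.sax.saxutils import quoteattr

text = "Gratis \xf8l"

# What is being sent at this point in the thread: quoted text as UTF-8 bytes.
sent_now = quoteattr(text).encode("utf-8")
print(sent_now)          # b'"Gratis \xc3\xb8l"'

# The two forms suggested by support, spelled out literally.
decimal_form = quoteattr("Gratis \\248l").encode("ascii")
hex_form = quoteattr("Gratis xf8l").encode("ascii")
print(decimal_form)      # b'"Gratis \\248l"'
print(hex_form)          # b'"Gratis xf8l"'
```

All three are different byte sequences, so if the server accepts the first one it must be interpreting the input as UTF-8 in some way.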
Martin v. Löwis wrote: Peter Otten <__*******@web.de> writes:
I see your problem is already solved
I'm quite uncertain as to what the solution is, though - or perhaps what the problem is. The OP said that the telnet server expects backslash characters in the data stream (at least I interpreted his message that way), but then he was happy with an approach that did not send backslash characters.
Perhaps the telnet server really expects latin-1, in which case encoding the Unicode string in that encoding, or in iso-8859-15, would work fine most of the time.
Regards, Martin
Rune Hansen wrote: Stalker told me to send the letter "ø" as \248 or as xf8 (notice the missing "\"). At this point I'm sending quoteattr(unicode('string',"iso-8859-1").encode("utf-8")) which is neither of the above (..?).
Correct: UTF-8 works differently. I find it surprising that anybody
actually proposes to send non-ASCII characters using xHH, as this
byte sequence may coincidentally occur in ASCII text as well.
Anyway, the server is still happy, and the data views correctly in the web interface.
It is relatively easy to recognize UTF-8 in the input; it is unlikely
that "real" data look like UTF-8 by coincidence (unlike \-escaping
or x-escaping). So it might be that the server studies the input to
guess the encoding. This is bad style, of course - the protocol should
be clear about encodings (this protocol couldn't be published in an
IETF RFC).
Stalker provides a perl and java API for the telnet server. I don't read perl code very well, and the java API is distributed as .class files(nothing new there, it's java after all) so I really don't know how Stalker is handling it.
Even then, you could only find out what the perl and java clients do -
you couldn't tell, from that, what other options the server might support.
Regards,
Martin
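A guessing strategy of the kind speculated about above could look roughly like this. It is only a sketch in Python 3 with a hypothetical function name (guess_decode); nothing in the thread confirms that the server works this way. It relies on the fact that a strict UTF-8 decoder rejects most byte strings that were really written in Latin-1.

```python
def guess_decode(raw):
    """Hypothetical helper: try strict UTF-8 first, fall back to Latin-1."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Every byte value is valid in ISO-8859-1, so this cannot fail.
        return raw.decode("iso-8859-1"), "iso-8859-1"

# Bytes C3 B8 form a valid UTF-8 sequence for "ø".
print(guess_decode(b"Gratis \xc3\xb8l")[1])   # utf-8

# A lone F8 byte is not valid UTF-8, so the fallback is used.
print(guess_decode(b"Gratis \xf8l")[1])       # iso-8859-1
```

Both inputs decode to the same text, which is what makes the approach tempting, and also why the protocol stays ambiguous for anyone reading its documentation.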
Martin v. Löwis wrote: Correct: UTF-8 works differently. I find it surprising that anybody actually proposes to send non-ASCII characters using xHH, as this byte sequence may coincidentally occur in ASCII text as well.
unless they expect you to send "x" as "x78", of course.
</F>
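To illustrate the remark about "x78": an xHH scheme is only unambiguous if the escape character itself is escaped. The following Python 3 sketch uses hypothetical helper names (x_escape, x_unescape) and is not a description of what the server actually implements; it is limited to characters up to U+00FF.

```python
def x_escape(text):
    """Escape "x" itself and every non-ASCII character as xHH (two hex digits)."""
    out = []
    for ch in text:
        code = ord(ch)
        if code > 0xFF:
            raise ValueError("only characters up to U+00FF are supported")
        if ch == "x" or code > 127:
            out.append("x%02x" % code)
        else:
            out.append(ch)
    return "".join(out)

def x_unescape(text):
    """Reverse x_escape: every "x" must be followed by two hex digits."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "x":
            out.append(chr(int(text[i + 1:i + 3], 16)))
            i += 3
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

print(x_escape("Gratis \xf8l"))   # Gratis xf8l
print(x_escape("xf8"))            # x78f8
```

With this rule the literal ASCII text "xf8" and the escaped letter "ø" no longer collide, at the cost of expanding every plain "x".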