Bytes | Software Development & Data Engineering Community

Replacement in Unicode strings?

KvS
Dear all,

Could somebody please put an end to the Unicode misery I'm in,
man... The situation is that I have a Tkinter program that lets the
user enter data in some Entries, and this data needs to be transformed
to the encoding compatible with an .rtf file. In fact I only need to
handle some of the usual symbols like ë etc.

Here's the function that I am using:

def pythonUnicodeToRTFAscii(self,s):
    if isinstance(s,str):
        return s
    s_str=repr(s.encode('UTF-8'))
    replDic={'\xc3\xa0':"\\'e0",'\xc3\xa4':"\\'e4",'\xc3\xa1':"\\'e1",
             '\xc3\xa8':"\\'e8",'\xc3\xab':"\\'eb",'\xc3\xa9':"\\'e9",
             '\xc3\xb2':"\\'f2",'\xc3\xb6':"\\'f6",'\xc3\xb3':"\\'f3",
             '\xe2\x82\xac':"\\'80"}
    for k in replDic.keys():
        if repr(k) in s_str:
            s_str=s_str.replace(repr(k),replDic[k])
    return s_str

So replDic represents the mapping from one encoding to the other. Now,
if I enter e.g. 'Arjën' in the Entry, then s_str in the above function
becomes 'Arj\xc3\xabn', and since replDic contains the key '\xc3\xab' I
would expect the replacement in the final lines of the function to
kick in. However, this doesn't happen; there's no match.

However, interactively:
>>> '\xc3\xab' in 'Arj\xc3\xabn'
True

I just don't get it: what's the difference? And is the above the
best way to attack such a problem anyway?

Thanks & best wishes, Kees
Oct 5 '08 #1
> s_str=repr(s.encode('UTF-8'))

It would be easier to encode this in cp1252 here, as this is apparently
the encoding that you want to use in the RTF file, too. You could then
loop over the string, replacing every byte >= 128 with \\'%.2x (an RTF
\'xx hex escape).
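A minimal sketch of that loop, written for Python 3 rather than the Python 2 of the thread, and assuming the RTF file declares the cp1252 code page. The function name rtf_escape_cp1252 is hypothetical. Characters that have no cp1252 byte make s.encode('cp1252') raise UnicodeEncodeError.

```python
def rtf_escape_cp1252(s):
    """Encode s as cp1252 and turn every byte >= 128 into an RTF \\'xx escape.

    Hypothetical helper; assumes the target RTF document uses code page 1252.
    Raises UnicodeEncodeError for characters that cp1252 cannot represent.
    """
    out = []
    for byte in s.encode('cp1252'):  # iterating over bytes yields ints in Python 3
        if byte >= 128:
            out.append("\\'%.2x" % byte)  # e.g. 0xEB -> \'eb
        else:
            out.append(chr(byte))  # plain ASCII passes through unchanged
    return ''.join(out)
```

For example, rtf_escape_cp1252('Arjën') gives Arj\'ebn, the same escape that replDic maps ë to, without needing a hand-written table.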

As yet another alternative, you could create a Unicode error handler
(call it 'rtf') and then do

    return s.encode('ascii', 'rtf')
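A sketch of such an error handler using codecs.register_error, again in Python 3 and again assuming a cp1252 RTF file. The names rtf_error_handler and to_rtf_ascii are hypothetical; only the handler name 'rtf' comes from the suggestion above. The ASCII codec calls the handler for each run of characters it cannot encode, and the handler returns the replacement text plus the position at which encoding should resume.

```python
import codecs

def rtf_error_handler(exc):
    """Replace unencodable characters with RTF \\'xx escapes (hypothetical handler).

    Assumes the RTF file uses code page 1252; characters outside cp1252
    make the inner encode() raise UnicodeEncodeError.
    """
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]  # the run of characters ASCII rejected
    replacement = ''.join("\\'%.2x" % b for b in bad.encode('cp1252'))
    return replacement, exc.end  # resume encoding after the bad run

codecs.register_error('rtf', rtf_error_handler)

def to_rtf_ascii(s):
    # Pass the error handler positionally; the result is pure ASCII bytes.
    return s.encode('ascii', 'rtf').decode('ascii')
```

Once registered, the handler can be reused by any encode() call in the program, which keeps the conversion to a single line at the call site.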

> However, interactively:
> >>> '\xc3\xab' in 'Arj\xc3\xabn'
> True
>
> I just don't get it, what's the difference?
It's the repr():

py> '\xc3\xab' in 'Arj\xc3\xabn'
True
py> repr('\xc3\xab') in repr('Arj\xc3\xabn')
False
py> repr('\xc3\xab')
"'\\xc3\\xab'"
py> repr('Arj\xc3\xabn')
"'Arj\\xc3\\xabn'"

repr('\xc3\xab') starts (and ends) with an apostrophe, which doesn't
appear before the \\xc3 in repr('Arj\xc3\xabn'), so the substring
test fails.
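For comparison, a sketch of the original replDic approach with the repr() calls dropped, so the raw UTF-8 bytes are compared directly. It is written for Python 3, where the keys must be bytes literals; the name utf8_to_rtf_ascii is hypothetical. Replacing whole UTF-8 sequences at the byte level is safe because a lead byte such as 0xC3 never occurs as a continuation byte, so a key can only match at a character boundary.

```python
# Same mapping as replDic in the original post, keyed by UTF-8 byte sequences.
REPL = {
    b'\xc3\xa0': b"\\'e0", b'\xc3\xa4': b"\\'e4", b'\xc3\xa1': b"\\'e1",
    b'\xc3\xa8': b"\\'e8", b'\xc3\xab': b"\\'eb", b'\xc3\xa9': b"\\'e9",
    b'\xc3\xb2': b"\\'f2", b'\xc3\xb6': b"\\'f6", b'\xc3\xb3': b"\\'f3",
    b'\xe2\x82\xac': b"\\'80",
}

def utf8_to_rtf_ascii(s):
    """Hypothetical fixed version: replace raw UTF-8 bytes, without repr()."""
    data = s.encode('utf-8')
    for k, v in REPL.items():
        data = data.replace(k, v)  # plain byte-string match, no quotes involved
    # Raises UnicodeDecodeError if a non-ASCII character had no entry in REPL.
    return data.decode('ascii')
```

The table still has to list every character explicitly, which is why the cp1252-based approaches scale better.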

HTH,
Martin
Oct 5 '08 #2
