python tr equivalent (non-ascii)

Hi,
I was wondering how I ought to be handling character range
translations in Python.

What I want to do is translate fullwidth numbers and roman alphabet
characters into their halfwidth ASCII equivalents.
In Perl I can do this pretty easily with tr:

tr/\x{ff00}-\x{ff5e}/\x{0020}-\x{007e}/;

and I think the string.translate method is what I need to use to
achieve the equivalent in Python. Unfortunately the maketrans function
doesn't seem to accept character ranges, and I'm also having trouble
with its interpretation of length. What I came up with was to first
fudge the ranges:

my_test_string = u"ＡＢＣＤＥＦＧ"
f_range = "".join([unichr(x) for x in
range(ord(u"\uff00"),ord(u"\uff5e"))])
t_range = "".join([unichr(x) for x in
range(ord(u"\u0020"),ord(u"\u007e"))])

then use these as input to maketrans:
my_trans_string = my_test_string.translate(string.maketrans(f_range,t_range))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode characters in position
0-93: ordinal not in range(128)

but it generates an encoding error... and if I encode the ranges in
UTF-8 before passing them on, I get a length error, because maketrans is
counting bytes, not characters, and UTF-8 is variable width...
my_trans_string = my_test_string.translate(string.maketrans(f_range.encode("utf8"),t_range.encode("utf8")))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: maketrans arguments must have same length
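
The length mismatch described above can be checked by counting. The following is a minimal sketch, assuming Python 3 (where `chr` replaces `unichr` and every `str` is Unicode); the names `f_range` and `t_range` are reused from the post, while `char_counts` and `byte_counts` are introduced here purely for illustration. Both ranges hold 94 code points, but each code point in the U+FF00 block takes three bytes in UTF-8, whereas each ASCII character takes one.

```python
# Rebuild the two ranges from the post (end points excluded, as in the original).
f_range = "".join(chr(x) for x in range(0xFF00, 0xFF5E))
t_range = "".join(chr(x) for x in range(0x0020, 0x007E))

# Counting characters: both ranges are the same length.
char_counts = (len(f_range), len(t_range))

# Counting UTF-8 bytes: the fullwidth range is three times as long.
byte_counts = (len(f_range.encode("utf8")), len(t_range.encode("utf8")))

print(char_counts)  # (94, 94)
print(byte_counts)  # (282, 94)
```

A byte-oriented maketrans sees 282 bytes against 94, which is consistent with the "must have same length" error shown in the traceback.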
Aug 13 '08 #1
On Aug 13, 5:18 pm, kettle <Josef.Robert.No...@gmail.com> wrote:
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: maketrans arguments must have same length

OK, so I guess I was barking up the wrong tree. Searching for
"python 全角　半角" (fullwidth, halfwidth) quickly brought up a solution:
>>> import unicodedata
>>> my_test_string = u"フガホゲ-%*@AＢＣ－％＊＠１２3"
>>> print unicodedata.normalize('NFKC', my_test_string.decode("utf8"))
フガホゲ-%*@ABC-%*@123
>>>
Still, it would be nice if there were a more general solution, or if
maketrans actually looked at characters instead of bytes, methinks.
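
The NFKC approach above can be sketched as follows, assuming Python 3; the helper name `to_halfwidth` and the sample strings are hypothetical and not from the thread. One caveat worth knowing: NFKC folds every compatibility character, not only the fullwidth Latin block, so it also widens halfwidth katakana and expands ligatures. That may or may not be wanted.

```python
import unicodedata

def to_halfwidth(text):
    """Apply NFKC normalization, which folds fullwidth forms to ASCII."""
    return unicodedata.normalize("NFKC", text)

sample = "\uff21\uff22\uff23\uff11\uff12\uff13"  # fullwidth "ABC123"
print(to_halfwidth(sample))  # ABC123

# Side effect: halfwidth katakana KA (U+FF76) becomes fullwidth KA (U+30AB).
print(to_halfwidth("\uff76") == "\u30ab")  # True
```

If only the U+FF00 to U+FF5E range should change, an explicit translation table is the narrower tool.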
Aug 13 '08 #2
kettle wrote:
then use these as input to maketrans:
my_trans_string = my_test_string.translate(string.maketrans(f_range,t_range))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode characters in position
0-93: ordinal not in range(128)

maketrans only works for byte strings.

as for translate itself, it has different signatures for byte strings
and unicode strings; in the former case, it takes a lookup table
represented as a 256-byte string (e.g. created by maketrans), in the
latter case, it takes a dictionary mapping from ordinals to ordinals or
unicode strings.

something like

lut = dict((0xff00 + ch, 0x0020 + ch) for ch in range(0x80))

new_string = old_string.translate(lut)

could work (untested).

</F>
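
The dictionary-based translate suggested above can be generalized into a small tr-style range helper. This is a sketch under stated assumptions, not a tested drop-in: it assumes Python 3, and `make_range_table`, `FULL_TO_HALF`, and the sample text are hypothetical names introduced here. It bounds the table at U+FF5E to match the Perl expression; the `range(0x80)` in the reply would also cover U+FF5F to U+FF7F, which includes halfwidth katakana and would map them onto control characters.

```python
def make_range_table(src_start, src_end, dst_start):
    """Map code points src_start..src_end (inclusive) onto dst_start onward."""
    return {src_start + i: dst_start + i
            for i in range(src_end - src_start + 1)}

# Rough equivalent of tr/\x{ff00}-\x{ff5e}/\x{0020}-\x{007e}/
FULL_TO_HALF = make_range_table(0xFF00, 0xFF5E, 0x0020)

# Fullwidth "ABC", katakana "フガ", fullwidth "123", separated by ASCII spaces.
text = "\uff21\uff22\uff23 \u30d5\u30ac \uff11\uff12\uff13"
converted = text.translate(FULL_TO_HALF)
print(converted)  # ABC フガ 123

# In Python 3, str.maketrans counts characters, so two equal-length
# strings produce the same ordinal-to-ordinal table.
same_table = str.maketrans(
    "".join(chr(c) for c in range(0xFF00, 0xFF5F)),
    "".join(chr(c) for c in range(0x0020, 0x007F)),
)
print(same_table == FULL_TO_HALF)  # True
```

Characters outside the table, such as the katakana here, pass through unchanged, which is the main difference from the NFKC route.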

Aug 13 '08 #3
On Aug 13, 5:33 pm, Fredrik Lundh <fred...@pythonware.com> wrote:
something like

lut = dict((0xff00 + ch, 0x0020 + ch) for ch in range(0x80))

new_string = old_string.translate(lut)

could work (untested).

</F>

Excellent. I didn't realize from the docs that I could do that. Thanks.
Aug 13 '08 #4
