473,471 Members | 2,008 Online
Bytes | Software Development & Data Engineering Community
Create Post

Home Posts Topics Members FAQ

Non-unicode strings & Python.

All:

Question

Python is currently Unicode Compliant.

What happens when strings are read in from text files that were
created using GB 2312-1980, or KPS 9566-2003, or other, equally
obscure code ranges?

The idea is to read text in the file format, and replace it with the
appropriate Unicode character,then write it out as a new text file.
[Trivial to program, but incredibly time consuming to actually code]

xan

jonathon
--
Goto http://graphology.meetup.com for information about International
Graphology Meetup Day
Jul 18 '05 #1
1 2241
Jonathon Blake wrote:
What happens when strings are read in from text files that were
created using GB 2312-1980, or KPS 9566-2003, or other, equally
obscure code ranges?
Python has two kinds of strings: byte strings, and Unicode strings.
If you read data from a file, you get byte strings - i.e. a sequence
of bytes representing literally the encoded contents of the file.
If you want Unicode strings, you need to use codecs.open.
The idea is to read text in the file format, and replace it with the
appropriate Unicode character,then write it out as a new text file.
[Trivial to program, but incredibly time consuming to actually code]


Not at all:

data = codecs.open(filename, "r", encoding="gb2312")
codecs.open(newfile, "w", encoding="utf-8").write(data)

assuming that by "appropriate Unicode character" you actually mean
"I want to write the file encoded as UTF-8".

Regards,
Martin
Jul 18 '05 #2

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

5
by: klaus triendl | last post by:
hi, recently i discovered a memory leak in our code; after some investigation i could reduce it to the following problem: return objects of functions are handled as temporary objects, hence...
3
by: Mario | last post by:
Hello, I couldn't find a solution to the following problem (tried google and dejanews), maybe I'm using the wrong keywords? Is there a way to open a file (a linux fifo pipe actually) in...
32
by: Adrian Herscu | last post by:
Hi all, In which circumstances it is appropriate to declare methods as non-virtual? Thanx, Adrian.
22
by: Steve - DND | last post by:
We're currently doing some tests to determine the performance of static vs non-static functions, and we're coming up with some odd(in our opinion) results. We used a very simple setup. One class...
8
by: Bern McCarty | last post by:
Is it at all possible to leverage mixed-mode assemblies from AppDomains other than the default AppDomain? Is there any means at all of doing this? Mixed-mode is incredibly convenient, but if I...
14
by: Patrick Kowalzick | last post by:
Dear all, I have an existing piece of code with a struct with some PODs. struct A { int x; int y; };
11
by: ypjofficial | last post by:
Hello All, So far I have been reading that in case of a polymorphic class ( having at least one virtual function in it), the virtual function call get resolved at run time and during that the...
2
by: Ian825 | last post by:
I need help writing a function for a program that is based upon the various operations of a matrix and I keep getting a "non-aggregate type" error. My guess is that I need to dereference my...
0
by: amitvps | last post by:
Secure Socket Layer is very important and useful for any web application but it brings some problems too with itself. Handling navigation between secure and non-secure pages is one of the cumbersome...
12
by: puzzlecracker | last post by:
is it even possible or/and there is a better alternative to accept input in a nonblocking manner?
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
1
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...
1
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...
0
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The...
0
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
0
muto222
php
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.