472,805 Members | 1,973 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 472,805 software developers and data experts.

Wanted: python script to convert to/from UTF-8 to/from XML Entities

Does someone have a little python script that will read a file in
UTF-8/UTF-16/UTF-32 (my choice) and search for all the characters between
0x7f-0xffffff and convert them to an ASCII digit string that begins with
"&#" and ends with ";" and output the whole thing? If not, could someone
tell me how to write one?

How about a script to do the inverse?

Thanks!
siegfried
Aug 30 '08 #1
1 1812
Siegfried Heintze wrote:
Does someone have a little python script that will read a file in
UTF-8/UTF-16/UTF-32 (my choice) and search for all the characters between
0x7f-0xffffff and convert them to an ASCII digit string that begins with
"&#" and ends with ";" and output the whole thing? If not, could someone
tell me how to write one?
file = open("filename.txt", "rb")
text = file.read()
text = unicode(text, "utf-8")
text = text.encode("ascii", "xmlcharrefreplace")
print text

tweak as necessary.

</F>

Aug 30 '08 #2

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

9
by: Nuff Said | last post by:
When I type the following code in the interactive python shell, I get 'UTF-8'; but if I put the code into a Python script and run the script - in the same terminal on my Linux box in which I...
7
by: Mike Currie | last post by:
I'm trying to write out files that have utf-8 characters 0x85 and 0x08 in them. Every configuration I try I get a UnicodeError: ascii codec can't decode byte 0x85 in position 255: oridinal not in...
3
by: Jared Wiltshire | last post by:
I'm trying to convert a wstring (actually a BSTR) to UTF-8. This is what I've currently got: size_t arraySize; setlocale(LC_CTYPE,"C-UTF-8"); arraySize = wcstombs(NULL, wstr, 0); char...
8
by: sonald | last post by:
Hi, I am using python2.4.1 I need to pass russian text into python and validate the same. Can u plz guide me on how to make my existing code support the russian text. Is there any module...
9
by: thijs.braem | last post by:
Hi everyone, I'm having quite some troubles trying to convert Unicode to String (for use in psycopg, which apparently doesn't know how to cope with unicode strings). The error I keep having...
4
by: kettle | last post by:
Hi, I am rather new to python, and am currently struggling with some encoding issues. I have some utf-8-encoded text which I need to encode as iso-2022-jp before sending it out to the world. I am...
20
by: weheh | last post by:
Dear web gods: After much, much, much struggle with unicode, many an hour reading all the examples online, coding them, testing them, ripping them apart and putting them back together, I am...
8
by: michael | last post by:
Hi all A client of mine is having a problem with their site and when I looked into the SQL database, I found that most text fields have been altered and appended with script...
6
by: gita ziabari | last post by:
Hello All, The following code does not work for unicode characters: keyword = dict() kw = 'генских' keyword.setdefault(key, ).append (kw) It works fine for inserting ASCII character. Any...
13
by: Liang Chen | last post by:
Hope you all had a nice weekend. I have a question that I hope someone can help me out. I want to run a Python program that uses Tkinter for the user interface (GUI). The program allows me to type...
2
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 2 August 2023 starting at 18:00 UK time (6PM UTC+1) and finishing at about 19:15 (7.15PM) The start time is equivalent to 19:00 (7PM) in Central...
0
by: erikbower65 | last post by:
Here's a concise step-by-step guide for manually installing IntelliJ IDEA: 1. Download: Visit the official JetBrains website and download the IntelliJ IDEA Community or Ultimate edition based on...
0
by: kcodez | last post by:
As a H5 game development enthusiast, I recently wrote a very interesting little game - Toy Claw ((http://claw.kjeek.com/))。Here I will summarize and share the development experience here, and hope it...
0
by: Taofi | last post by:
I try to insert a new record but the error message says the number of query names and destination fields are not the same This are my field names ID, Budgeted, Actual, Status and Differences ...
14
DJRhino1175
by: DJRhino1175 | last post by:
When I run this code I get an error, its Run-time error# 424 Object required...This is my first attempt at doing something like this. I test the entire code and it worked until I added this - If...
0
by: Rina0 | last post by:
I am looking for a Python code to find the longest common subsequence of two strings. I found this blog post that describes the length of longest common subsequence problem and provides a solution in...
0
by: lllomh | last post by:
Define the method first this.state = { buttonBackgroundColor: 'green', isBlinking: false, // A new status is added to identify whether the button is blinking or not } autoStart=()=>{
0
by: lllomh | last post by:
How does React native implement an English player?
0
by: Mushico | last post by:
How to calculate date of retirement from date of birth

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.