Bytes | Software Development & Data Engineering Community

Writing UTF-8 string to UNICODE file

I'm sure this is a very simple thing to do, once you know how to do it, but
I am having no fun at all trying to write utf-8 strings to a unicode file.
Does anyone have a couple of lines of code that
- opens a file appropriately for output
- writes to this file
Thanks very much.
Michael Weir
Jul 18 '05 #1
Michael Weir wrote:

I'm sure this is a very simple thing to do, once you know how to do it, but
I am having no fun at all trying to write utf-8 strings to a unicode file.
Does anyone have a couple of lines of code that
- opens a file appropriately for output
- writes to this file


I can't give you an example, never having done this, but if you would post
a few lines of your own code which you thought would work, someone can probably
point out the error of your ways more easily than writing something from
scratch. (Of course, we'll shortly see a complete working solution from
someone anyway, but in general this is the better way to proceed with such
a problem.)

-Peter
Jul 18 '05 #2
Michael Weir wrote:
Does anyone have a couple of lines of code that
- opens a file appropriately for output
- writes to this file


Simplest way (IMHO), with Python 2.3:

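#-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
import codecs

# codecs.open returns a file wrapper that encodes every unicode
# object written to it as UTF-8 before it reaches the disk.
f = codecs.open('myunicodefile.txt', 'w', 'utf-8')
for i in range(5):
    for j in range(32, 300):
        f.write(unichr(j))   # code points U+0020 to U+012B
    f.write(u'\n')
f.close()
#-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=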
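For anyone on Python 3, which postdates this thread, here is a minimal sketch of the same idea. The built-in open() accepts an encoding argument and str is already a Unicode type, so neither codecs nor unichr() is needed. The file name is only an example.

```python
# Python 3: a text-mode file encodes each str to bytes with the given codec.
with open("myunicodefile.txt", "w", encoding="utf-8") as f:
    for i in range(5):
        for j in range(32, 300):
            f.write(chr(j))  # chr() plays the role of Python 2's unichr()
        f.write("\n")
```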

HTH,

--
alan kennedy
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan: http://xhaus.com/mailto/alan
Jul 18 '05 #3
"Michael Weir" <mw***@transres .com> wrote in message
news:4e******** *******@news.on .tac.net...
I'm sure this is a very simple thing to do, once you know how to do it, but I am having no fun at all trying to write utf-8 strings to a unicode file.
Does anyone have a couple of lines of code that
- opens a file appropriately for output
- writes to this file
Thanks very much.
Michael Weir


I don't quite understand, since you seem to be talking about "unicode" as if
it were a distinct encoding. Unicode is not an encoding, but a mapping of
numbers to meaningful symbolic representations (letters, numbers, whatever).
There's no such thing as a "unicode file", strictly speaking, because a file
is a byte stream and unicode has nothing to do with bytes. Of course,
loosely speaking, "unicode file" means "a file which uses one of those
byte-stream encodings by which any arbitrary subset of unicode code points
can be represented."
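The code point versus encoding distinction is easy to see directly. Here is a small sketch, assuming Python 3, where str holds code points and bytes holds encoded data. It shows one character with one code point but a different byte sequence for each encoding.

```python
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, code point U+00E9

print(hex(ord(ch)))            # the code point itself: 0xe9
print(ch.encode("utf-8"))      # b'\xc3\xa9' (two bytes)
print(ch.encode("utf-16-le"))  # b'\xe9\x00' (two different bytes)
print(ch.encode("latin-1"))    # b'\xe9' (one byte)
```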

If you mean, "how do I encode a unicode string as utf-8", do it like this:
>>> u"I'm a unicode string in utf-8 encoding.".encode('utf-8')
"I'm a unicode string in utf-8 encoding."

This serializes an ordered collection of unicode code points into a byte
stream, using the encoding method "utf-8". You want to write this byte
stream to a file? Go right ahead.
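Done by hand, this takes two steps: encode, then write the resulting bytes to a file opened in binary mode. Below is a sketch assuming Python 3; the file name out.txt is a placeholder.

```python
text = "I'm a unicode string: caf\u00e9"

data = text.encode("utf-8")       # str (code points) -> bytes
with open("out.txt", "wb") as f:  # binary mode: the file accepts bytes only
    f.write(data)

with open("out.txt", "rb") as f:
    assert f.read().decode("utf-8") == text  # round-trips exactly
```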

If you write a unicode string to something that wants a byte stream, Python 2
does not give you utf-8. It implicitly encodes the string with the default
encoding, which is normally ASCII, so any non-ASCII character raises a
UnicodeEncodeError. I doubt this is what you want. You have to encode the
unicode string first.
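Python 3 makes this failure explicit. Here is a sketch assuming Python 3, with io.BytesIO standing in for a file opened in "wb" mode. A binary stream refuses str outright. Encoding non-ASCII text with the ascii codec, Python 2's usual default encoding, raises UnicodeEncodeError.

```python
import io

text = "caf\u00e9"

buf = io.BytesIO()               # stands in for a file opened in "wb" mode
try:
    buf.write(text)              # a str is not bytes
except TypeError as exc:
    print("TypeError:", exc)

try:
    text.encode("ascii")         # ASCII has no byte for U+00E9
except UnicodeEncodeError as exc:
    print("UnicodeEncodeError:", exc.reason)  # ordinal not in range(128)

buf.write(text.encode("utf-8"))  # encoding first works
```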

To avoid having to do explicit conversions for every unicode string you want
to write to a file, use codecs.open to open the file. This wraps all
reads and writes in an encoder/decoder, and every read gives you a unicode
string. However, you can no longer write raw byte strings through it: in
Python 2 even a plain str is first decoded (as ASCII, by default) and then
re-encoded, so a str containing non-ASCII bytes raises UnicodeDecodeError.
Also, be careful not to open the same file with file() later: you'll be
reading and writing raw byte streams, and will make a big mess of things.
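Here is a sketch of the codecs.open approach and of the mess described above, assuming Python 3. codecs.open still exists there, though plain open(..., encoding=...) is the usual choice. The file name wrapped.txt is a placeholder. Reading through the wrapper gives text back. Opening the file as a plain byte stream gives raw UTF-8, and treating those bytes as another encoding garbles them.

```python
import codecs

# Writing through the codecs wrapper: str in, UTF-8 bytes on disk.
with codecs.open("wrapped.txt", "w", "utf-8") as f:
    f.write("na\u00efve")  # "naive" with a diaeresis on the i

# Reading through the wrapper decodes back to str.
with codecs.open("wrapped.txt", "r", "utf-8") as f:
    decoded = f.read()

# Opening the same file as a plain byte stream shows the raw bytes.
with open("wrapped.txt", "rb") as f:
    raw = f.read()

print(ascii(decoded))                # 'na\xefve'
print(raw)                           # b'na\xc3\xafve'
print(ascii(raw.decode("latin-1")))  # 'na\xc3\xafve': mojibake, two wrong chars
```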

Perhaps Python should have all "strings" be unicode strings, and make a
distinct "byte stream" type? This might make the "codepoint v.
representation" distinction cleaner and more explicit, and allow us to go
raw if we really want (although, mixing text and binary in a single file
isn't such a good idea). It'd also be incredibly messy to change things,
and less efficient if all you do is ascii text all day. Oh well.
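Python 3 later adopted essentially this split. str holds Unicode code points, bytes is a separate byte-string type, and the two no longer mix implicitly. Here is a short sketch, assuming Python 3:

```python
text = "\u00fcber"           # str: a sequence of code points ("uber" with umlaut)
data = text.encode("utf-8")  # bytes: the serialized form

print(type(text).__name__, len(text))  # str 4
print(type(data).__name__, len(data))  # bytes 5

try:
    text + data              # no implicit conversion between the two
except TypeError:
    print("str and bytes do not mix")
```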
--
Francis Avila
Jul 18 '05 #4
