Unicode strings, struct, and files

I am building a file with the help of the struct module.

I would like to be able to put Unicode strings into this file, but I'm
not sure how to do it.

The format I'm trying to write is basically this C structure:

    struct MyFile
    {
        int magic;
        int flags;
        short otherFlags;
        char pad[22];

        wchar_t line1[32];
        wchar_t line2[32];

        // ... other data which is easy. :)
    };

(I'm writing data on a PC to be read on a big-endian machine.)

So I can write the four leading members with the output of
struct.pack('>IIH22x', magic, flags, otherFlags). Unfortunately I
can't figure out how to write the unicode strings, since:

message = unicode('Hello, world')
myFile.write(message)

results in 'message' being converted back to a string before being
written. Is the way to do this to do something hideous like this:

    for c in message:
        myFile.write(struct.pack('>H', ord(unicode(c))))

?

Thanks from a unicode n00b,
-tom!
Oct 9 '06 #1

Tom Plunket wrote:
> I am building a file with the help of the struct module.
>
> I would like to be able to put Unicode strings into this file, but I'm
> not sure how to do it.
>
> The format I'm trying to write is basically this C structure:
>
>     struct MyFile
>     {
>         int magic;
>         int flags;
>         short otherFlags;
>         char pad[22];
>
>         wchar_t line1[32];
>         wchar_t line2[32];
>
>         // ... other data which is easy. :)
>     };
>
> (I'm writing data on a PC to be read on a big-endian machine.)
>
> So I can write the four leading members with the output of
> struct.pack('>IIH22x', magic, flags, otherFlags). Unfortunately I
> can't figure out how to write the unicode strings, since:
>
>     message = unicode('Hello, world')
>     myFile.write(message)
>
> results in 'message' being converted back to a string before being
> written. Is the way to do this to do something hideous like this:
>
>     for c in message:
>         myFile.write(struct.pack('>H', ord(unicode(c))))
>
> ?

I'd suggest encoding it as a byte string, using the UTF encoding that
matches whatever wchar_t means on the target machine. For example,
assuming big-endian and sizeof(wchar_t) == 2:

    utf_line1 = unicode_line1.encode('utf_16_be')
    etc
    struct.pack(">.........64s64s", ......, utf_line1, utf_line2)

This presumes that (1) you have already checked that you don't have more
than 32 characters in each "line", and (2) padding with unichr(0) is
acceptable.
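
The suggestion above could be sketched roughly as follows. This is written for Python 3, where str is already Unicode text and encode() returns bytes; the thread itself uses Python 2's unicode(). The names encode_line and pack_header and the magic value are made up for illustration, and the layout assumes sizeof(wchar_t) == 2 on the target. The explicit length check is there because struct's 's' format silently truncates values that are too long, and because a character outside the Basic Multilingual Plane takes two 16-bit units in UTF-16, so counting characters alone may not be enough.

```python
import struct

LINE_CHARS = 32               # wchar_t line1[32] in the C struct
LINE_BYTES = LINE_CHARS * 2   # assumption: sizeof(wchar_t) == 2 on the target


def encode_line(text):
    """Encode text as big-endian UTF-16 and check that it fits in one line field."""
    data = text.encode('utf_16_be')
    if len(data) > LINE_BYTES:
        raise ValueError('line needs %d bytes, field holds %d'
                         % (len(data), LINE_BYTES))
    return data


def pack_header(magic, flags, other_flags, line1, line2):
    # '>' selects big-endian with no alignment padding; '22x' writes the pad
    # bytes; '64s' pads values shorter than 64 bytes with NUL bytes.
    return struct.pack('>IIH22x64s64s', magic, flags, other_flags,
                       encode_line(line1), encode_line(line2))


# 0x4D594649 is a hypothetical magic number, not one from the thread.
record = pack_header(0x4D594649, 0, 1, 'Hello, world', 'second line')
```

The resulting record can be written to a file opened in binary mode ('wb') with a single write() call.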

HTH,
John

Oct 9 '06 #2

John Machin wrote:
>>     message = unicode('Hello, world')
>>     myFile.write(message)
>>
>> results in 'message' being converted back to a string before being
>> written. Is the way to do this to do something hideous like this:
>>
>>     for c in message:
>>         myFile.write(struct.pack('>H', ord(unicode(c))))
>
> I'd suggest UTF-encoding it as a string, using the encoding that
> matches whatever wchar means on the target machine, for example
> assuming bigendian and sizeof(wchar) == 2:

Ahh, this is the info that my trawling through the documentation
didn't let me find!

Thanks a bunch.

>     utf_line1 = unicode_line1.encode('utf_16_be')
>     etc
>     struct.pack(">.........64s64s", ......, utf_line1, utf_line2)
>
> Presumes (1) you have already checked that you don't have more than 32
> characters in each "line" (2) padding with unichr(0) is acceptable.

This works frighteningly well. ;)

-tom!
Oct 9 '06 #3
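
One way to sanity-check a file written with the approach discussed in this thread is to unpack the header again and decode the two line fields. The sketch below is for Python 3 and assumes the same layout as the C struct in the question, with sizeof(wchar_t) == 2; decode_line and the sample values are hypothetical names and data chosen for illustration.

```python
import struct

# Same layout as the C struct in the question: two ints, a short,
# 22 pad bytes, then two 32-unit UTF-16 fields (assumed 2 bytes per unit).
HEADER = struct.Struct('>IIH22x64s64s')


def decode_line(raw):
    """Decode a 64-byte big-endian UTF-16 field and drop the NUL padding."""
    text = raw.decode('utf_16_be')
    return text.split('\x00', 1)[0]


packed = HEADER.pack(1, 2, 3,
                     'Hello, world'.encode('utf_16_be'),
                     'caf\u00e9'.encode('utf_16_be'))

# The pad bytes ('22x') produce no values, so unpack returns five items.
magic, flags, other_flags, raw1, raw2 = HEADER.unpack(packed)
line1 = decode_line(raw1)
line2 = decode_line(raw2)
```

Splitting at the first NUL mirrors how a C reader would treat the field as a NUL-terminated wide string; text that fills all 32 units has no terminator and is returned whole.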
