convert string with raw binary data to unicode

Hi,

I want to pass raw binary data from a file to a COM object. I read the data
from file like this:

data = file('path_to_file','rb').read()

When data is passed to a COM object, it is converted to unicode in the way one
would expect for strings, i.e. a lot of zero bytes are filled in. Instead, I
want each two characters from data to be interpreted as one unicode character.
I read the documentation on codecs but cannot find a suitable codec. I also
tried to read the data like this:

data = codecs.open('path_to_file','rb','???').read()

I tried to use UCS2 for the ???, but this encoding does not exist. A posting
found via Google suggests using UTF-16, but this is not the same and raises
an error.

This shouldn't be a big problem, but I can't figure out how to solve it. Can
anybody help?

regards,
Achim
Jul 18 '05 #1
If I understand your problem correctly, you want to construct a unicode
object containing arbitrary data in its internal buffer.

And if I understand Python's unicode implementation correctly, then I
would say it isn't possible, since unicode objects do not contain
binary data, they contain characters (or whatever this is called in the
unicode world).
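
To make the point about characters versus binary data concrete, here is a minimal sketch. It assumes Python 3 (the thread predates it) and little-endian byte order, and the helper name bytes_to_utf16_units is hypothetical. Not every pair of bytes is a valid UTF-16 code unit: a lone surrogate is rejected by the strict decoder, and an odd number of bytes cannot be split into pairs. The 'surrogatepass' error handler, which the UTF-16 codecs accept since Python 3.4, lets lone surrogates through.

```python
def bytes_to_utf16_units(data: bytes) -> str:
    """Map each pair of bytes to one UTF-16 code unit (little-endian).

    Hypothetical helper for illustration only. Odd-length input is padded
    with a zero byte, which alters the data; the receiver would need to
    know the original length to undo that.
    """
    if len(data) % 2:
        data += b"\x00"
    # 'surrogatepass' keeps lone surrogates instead of raising an error.
    return data.decode("utf-16-le", errors="surrogatepass")


# A lone low surrogate (0xDC80) followed by 'A' (0x0041).
raw = b"\x80\xdc\x41\x00"

try:
    raw.decode("utf-16-le")  # strict decoding rejects the lone surrogate
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

text = bytes_to_utf16_units(raw)
round_trip = text.encode("utf-16-le", errors="surrogatepass")
```

Whether a COM bridge passes such a string through unchanged is a separate question that this sketch does not answer.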

OTOH, it should be possible to write a small extension wrapping the
PyUnicode_FromUnicode() function to accept arbitrary data.

Is there also a possibility to write a codec which does this?

Note that the 'if's above are probably big 'if's...

Thomas
Jul 18 '05 #2
Achim Domma:
data = codecs.open('path_to_file','rb','???').read()

I tried to use UCS2 for the ???, but this encoding does not exist. A posting found via Google suggests using UTF-16, but this is not the same and raises an error.


It is better to show the error message when sending queries to a
newsgroup. You may want to look at the 'errors' argument, which can be one of:

'strict'             Raise ValueError (or a subclass); this is the default.
'ignore'             Ignore the character and continue with the next.
'replace'            Replace with a suitable replacement character.
'xmlcharrefreplace'  Replace with the appropriate XML character reference.
'backslashreplace'   Replace with backslashed escape sequences.

Take a look at the results after using, say, 'backslashreplace', and you
may find that much of your file is not UTF-16, or that it is byte swapped, or
that there are just a few bad characters in a header or similar.
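
The diagnosis suggested above might look roughly like this. It is a sketch assuming Python 3, and the sample bytes are made up to stand in for the file contents. In Python 3, 'xmlcharrefreplace' applies to encoding only, and 'backslashreplace' works for decoding from version 3.5 on.

```python
import codecs

# Made-up sample standing in for the file contents: big-endian UTF-16
# for "Hi" with a byte order mark, plus one stray trailing byte.
data = codecs.BOM_UTF16_BE + "Hi".encode("utf-16-be") + b"\xff"

# A byte order mark, if present, shows whether the data is byte swapped.
if data.startswith(codecs.BOM_UTF16_LE):
    byte_order = "little-endian"
elif data.startswith(codecs.BOM_UTF16_BE):
    byte_order = "big-endian"
else:
    byte_order = "unknown (no BOM)"

# Lenient decoding marks or skips bad spots instead of stopping at the
# first one, which makes it easier to see how much of the data is valid.
replaced = data.decode("utf-16", errors="replace")
ignored = data.decode("utf-16", errors="ignore")
escaped = data.decode("utf-16", errors="backslashreplace")
```

Comparing the three results shows where the decoder gave up: here only the final stray byte is affected, so the rest of the data is valid UTF-16.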

Neil
Jul 18 '05 #3
Try utf-16-le or utf-16-be (depending on the endianness of the data) as
the encoding.
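
As a rough sketch of this suggestion, assuming Python 3, where the built-in open() accepts an encoding argument and takes the place of codecs.open. The temporary file and its four bytes are made up for the example.

```python
import os
import tempfile

# Write four raw bytes to a temporary file so the example is self-contained.
raw = b"\x48\x00\x69\x00"  # "Hi" when read as little-endian 16-bit units
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(raw)

# Option 1: read the bytes, then decode with an explicit byte order.
with open(path, "rb") as f:
    text_le = f.read().decode("utf-16-le")

# Option 2: let open() decode while reading. newline="" turns off newline
# translation so the characters come through unchanged.
with open(path, "r", encoding="utf-16-be", newline="") as f:
    text_be = f.read()

os.remove(path)
```

The same four bytes give "Hi" as little-endian and two unrelated characters (U+4800 and U+6900) as big-endian, so picking the wrong byte order fails silently rather than raising an error.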
--
Sjoerd Mullender <sj****@acm.org>

Jul 18 '05 #4
