convert string with raw binary data to unicode

Hi,

I want to pass raw binary data from a file to a COM object. I read the data
from file like this:

data = file('path_to_file','rb').read()

If passed to a COM object, data is converted to unicode in the way one would
expect for strings, i.e. a lot of zeros are filled in. Instead, I want every two
characters (bytes) of data to be interpreted as one unicode character. I read
the documentation on codecs but cannot find a suitable codec. I also tried to
read the data like this:

data = codecs.open('path_to_file','rb','???').read()

I tried to use UCS2 for the ???, but this encoding does not exist. A posting
found via Google suggests using UTF-16, but this is not the same and raises
an error.

This shouldn't be a big problem, but I can't figure out how to solve it. Can
anybody help?

regards,
Achim
Jul 18 '05 #1
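
To make the two behaviours described in the question concrete, here is a minimal sketch. It is written for Python 3 (the thread itself uses Python 2 idioms such as `file()`), and it assumes that the unwanted conversion behaves like a byte-by-byte, Latin-1 style widening; the actual COM marshalling may differ. The sample bytes are made up for illustration.

```python
# Four bytes of hypothetical sample data (the UTF-16-LE encoding of "AB").
raw = b"\x41\x00\x42\x00"

# Widening (assumed model of the unwanted behaviour): every byte becomes
# its own 16-bit code unit, so a zero byte is added after each input byte.
widened = raw.decode("latin-1").encode("utf-16-le")

# Pairing (what the question asks for): every two bytes are read as one
# 16-bit code unit.
paired = raw.decode("utf-16-le")

print(len(raw), len(widened))  # 4 8
print(paired)                  # AB
```

The widened form is twice as long as the input, while the paired form maps back to exactly the original bytes when encoded again with the same codec.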
If I understand your problem correctly, you want to construct a unicode
object containing arbitrary data in its internal buffer.

And if I understand Python's unicode implementation correctly, then I
would say it isn't possible, since unicode objects do not contain
binary data, they contain characters (or whatever they are called in the
Unicode world).

OTOH, it should be possible to write a small extension wrapping the
PyUnicode_FromUnicode() function to accept arbitrary data.

Is there also a possibility to write a codec which does this?

Note that the 'if's above are probably big 'if's...

Thomas
Jul 18 '05 #2
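
On the question of whether a codec could carry arbitrary data: the following is a sketch for Python 3.4 or later, not the Python 2 of this thread, and not a statement about what a COM layer will accept. It assumes the data has an even number of bytes and uses the standard `surrogatepass` error handler so that byte pairs which would be lone surrogates still decode. The helper names `bytes_to_text` and `text_to_bytes` are hypothetical.

```python
def bytes_to_text(data: bytes) -> str:
    """Interpret every two bytes as one little-endian 16-bit code unit."""
    if len(data) % 2:
        raise ValueError("need an even number of bytes")
    # 'surrogatepass' lets lone surrogate code units through instead of
    # raising UnicodeDecodeError.
    return data.decode("utf-16-le", "surrogatepass")


def text_to_bytes(text: str) -> bytes:
    """Inverse of bytes_to_text for strings produced by it."""
    return text.encode("utf-16-le", "surrogatepass")


# All 256 byte values, which includes byte pairs in the surrogate range.
sample = bytes(range(256))
text = bytes_to_text(sample)
print(text_to_bytes(text) == sample)  # True
```

Valid surrogate pairs in the input decode to a single character, so the resulting string can be shorter than `len(data) // 2`; the bytes still round-trip.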
Achim Domma:
data = codecs.open('path_to_file','rb','???').read()

I tried to use UCS2 for the ???, but this encoding does not exist. A posting found via Google suggests using UTF-16, but this is not the same and raises an error.


It is better to show the error message when sending queries to a news
group. You may want to look at the 'errors' argument which can be one of:

'strict'            Raise ValueError (or a subclass); this is the default.
'ignore'            Ignore the character and continue with the next.
'replace'           Replace with a suitable replacement character.
'xmlcharrefreplace' Replace with the appropriate XML character reference.
'backslashreplace'  Replace with backslashed escape sequences.

Take a look at the results after using, say, 'backslashreplace' and you
may find that much of your file is not UTF-16 or that it is byte swapped or
that there are just a few bad characters in a header or similar.

Neil
Jul 18 '05 #3
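
As a rough illustration of the 'errors' argument when decoding, here is a sketch for Python 3.5 or later, where 'backslashreplace' is available for decoding as well as encoding ('xmlcharrefreplace' applies to encoding only). The sample bytes are invented: they contain a lone high surrogate, which is one way for binary data to fail strict UTF-16 decoding.

```python
# 'A', a lone high surrogate (0xD800), then 'B', as little-endian code units.
data = b"A\x00\x00\xd8B\x00"

# Default 'strict' handling raises and reports where decoding failed.
failure = None
try:
    data.decode("utf-16-le")
except UnicodeDecodeError as exc:
    failure = exc
print(failure.start, failure.reason)

# 'replace' substitutes U+FFFD for the undecodable bytes.
replaced = data.decode("utf-16-le", "replace")

# 'backslashreplace' keeps the offending bytes visible as \xNN escapes,
# which helps when inspecting where a file stops being valid UTF-16.
escaped = data.decode("utf-16-le", "backslashreplace")

print(ascii(replaced))
print(escaped)
```

Printing the exception's `start` offset and the escaped output is a quick way to check whether the problem is a few stray bytes or the whole file.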
Try utf-16-le or utf-16-be (depending on endianness of the data) as
encoding.
--
Sjoerd Mullender <sj****@acm.org>

Jul 18 '05 #4
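
A small sketch of the suggestion above, in Python 3 syntax: the same bytes decoded with each explicit byte order. The file name `sample.bin` and its contents are hypothetical, and the file is read in binary mode and decoded afterwards rather than through `codecs.open`. Unlike the plain "utf-16" codec, the "-le" and "-be" variants do not look for a byte order mark.

```python
import os
import tempfile

# Hypothetical payload: "AB" stored as little-endian 16-bit code units.
payload = b"\x41\x00\x42\x00"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample.bin")  # hypothetical file name
    with open(path, "wb") as fh:
        fh.write(payload)
    # Read the raw bytes back, as in the original question.
    with open(path, "rb") as fh:
        raw = fh.read()

# The codec name fixes the byte order; picking the wrong one still
# decodes here, but to different characters.
little = raw.decode("utf-16-le")
big = raw.decode("utf-16-be")

print(ascii(little))  # 'AB'
print(ascii(big))     # '\u4100\u4200'
```

If the data's byte order is unknown, comparing both results against what the consumer expects is usually the quickest check.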
