Issue with unicode

4 New Member
Hi guys,
I have written a library for reading XML files, using the SAX library.
I take the (unicode) strings returned by the parser and pass them on to a fuzzing library that I also wrote.
The strings contain hex escape sequences. For example:
String = "a\x01\x02\x0c"
The problem is that SAX returns such a string as a series of literal characters (a backslash, an "x" and two hex digits for each escape) rather than as the escaped characters themselves.
len(String) gives 13 instead of 4.
I have tried many combinations of encoding and decoding to get the original string back, without success.
I have been stuck on this for about two days and it is blocking my coding.
Please help me with this ASAP.

Details:
OS:Linux
Python:Version 2.4.3.
XML Library:sax
Jul 24 '07 #1
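The behaviour described in the question can be reproduced without the poster's library. The sketch below assumes the XML file holds the escape sequences as literal text (a backslash, an `x`, and two hex digits); XML 1.0 does not allow most control characters such as U+0001 in document text, so they have presumably been written out this way. The element name `payload` and the `TextCollector` class are made up for illustration, and the code targets Python 3 rather than the 2.4.3 mentioned in the post.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that gathers all character data into one string."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # SAX may deliver text in several pieces, so collect and join later.
        self.chunks.append(content)

    @property
    def text(self):
        return "".join(self.chunks)


# The backslashes are doubled here so that the XML really contains the
# four characters \ x 0 1, not the single control character U+0001.
document = b"<payload>a\\x01\\x02\\x0c</payload>"

handler = TextCollector()
xml.sax.parseString(document, handler)

from_parser = handler.text
from_literal = "a\x01\x02\x0c"

print(len(from_parser))   # 13: 'a' plus three 4-character escape texts
print(len(from_literal))  # 4: Python interpreted the escapes in the literal
```

The parser is doing its job here: it hands back exactly the characters stored in the file. Escape sequences are only interpreted when Python compiles a string literal in source code.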
bartonc
6,596 Recognized Expert Expert
It should be as easy as converting to a string. Given a unicode string:
>>> a = ""
the built in str() should convert it
>>> b = str(a)
>>> b
'\xc0\xc8\xcc\xd2\xd9'
>>>
Jul 24 '07 #2
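One caveat on the reply above: on Python 2, `str()` applied to a unicode string with non-ASCII characters normally raises UnicodeEncodeError unless the default encoding has been changed, so naming the encoding is the safer route. A minimal Python 3 sketch follows. The sample text is only a guess chosen to match the bytes shown in the reply, since the original literal did not survive.

```python
# Assumed sample: five accented capitals whose Latin-1 bytes match the
# output quoted in the reply. The original string is not known.
text = "\u00c0\u00c8\u00cc\u00d2\u00d9"

# Naming the codec makes the conversion explicit and repeatable.
latin1_bytes = text.encode("latin-1")
utf8_bytes = text.encode("utf-8")

print(latin1_bytes)     # b'\xc0\xc8\xcc\xd2\xd9'
print(len(utf8_bytes))  # 10: each of these characters takes two bytes in UTF-8
```

Either way, this kind of conversion changes how characters are stored as bytes. It does not interpret backslash escapes that are present as text.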
anandrage
4 New Member
Thanks for the reply.
This won't work, however. str() is the first thing everyone tries.
The problem is not the conversion itself but the form of the result. A cast like the one you suggest changes the type, but the content stays the same:
the characters are still contiguous and the "escaped" meaning is not there.
I want the escape sequences in the unicode string to be treated as escaped hex characters, not as one character each for "\", "x" and so on.
This happens because the SAX parser returns the input in this form.

When the string is u"\x01\x02",
I want len(string) to return 2 and not 8,
so that \x01 is treated as one character.

If you make a plain declaration at the Python command prompt, like
a="\x01\x02"
then len(a) returns 2 and not 8. You can check this yourself. I want the same behaviour from my string, because that is what my fuzzing library expects.

I hope this makes the issue a little clearer.

Thanks a lot for the prompt reply. I need a solution ASAP or else I'm in deep trouble.
Jul 25 '07 #3
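The point made in this post, that a type conversion leaves the eight characters untouched, can be checked directly. This is a small Python 3 sketch; the variable names are illustrative only.

```python
# Text as it would arrive from the parser: each backslash is a real character.
raw = "\\x01\\x02"

print(len(raw))   # 8
print(list(raw))  # ['\\', 'x', '0', '1', '\\', 'x', '0', '2']

# Encoding changes the type (str -> bytes) but maps each of the eight
# ASCII characters to one byte, so nothing is "unescaped".
encoded = raw.encode("utf-8")
print(len(encoded))  # 8

# The literal, by contrast, was interpreted by the Python compiler.
literal = "\x01\x02"
print(len(literal))  # 2
```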
bvdet
2,851 Recognized Expert Moderator Specialist


If I understand your problem correctly, you can do something like this:
>>> b
u'\x01\x02'
>>> str(b)
'\x01\x02'
>>> repr(str(b))
"'\\x01\\x02'"
>>> len(repr(str(b)))
10
>>> s = repr(str(b)).replace("'", "")
>>> s
'\\x01\\x02'
>>> len(s)
8
Jul 25 '07 #4
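Note that `repr()` escapes a string; it does not unescape one. The session above therefore starts from two real control characters and produces the eight-character text, which is the reverse of what the question asks for. A Python 3 sketch of that escaping direction, using the standard `unicode_escape` codec alongside `repr()`:

```python
real = "\x01\x02"  # two control characters

# unicode_escape turns each control character into its \xNN text form.
escaped = real.encode("unicode_escape").decode("ascii")
print(len(real), len(escaped))  # 2 8
print(escaped)                  # \x01\x02  (eight printable characters)

# repr() does the same escaping and also adds surrounding quotes.
print(len(repr(real)))          # 10
```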
anandrage
4 New Member
Hi,
Thanks for the reply.
I'm afraid you still haven't understood the actual issue.
I want the length of the string to be 2, not 10 or 8.
len(string) should count each escape sequence as one character, rather than treating the string as a series of raw characters as it does here.

I want something like this.

Code:
# What normally happens on the command line:
>>> s = u"\x01\x02"
>>> len(s)
2

However, if I do the same with the unicode string sent back by the SAX parser, I get this:
>>> s
"\x01\x02"
>>> len(s)
8
>>> type(s)
<type 'unicode'>
>>> s = str(s)    # or: s = s.encode("utf-8")
>>> s
"\x01\x02"
>>> len(s)
8

# This is not what I want. What I want is this:
>>> s = <some function>(s)
>>> len(s)
2
>>> s
"\x01\x02"

I think this makes it a bit clearer.
If someone can give me that function (<some function> above), my problem will be solved.

Thanks for trying anyhow.
Jul 26 '07 #5
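One candidate for the `<some function>` requested above is the standard `unicode_escape` codec used in the decoding direction. The sketch below is written for Python 3, and `unescape` is a hypothetical helper name, not something from the poster's library. On Python 2 the nearest equivalent would probably be `s.encode('ascii').decode('unicode_escape')`, though that has not been checked against 2.4.3 here.

```python
def unescape(text):
    """Interpret backslash escapes in text the way a Python literal would.

    Hypothetical helper, not part of the original poster's library.
    """
    # backslashreplace keeps any non-ASCII characters safe by turning them
    # into escapes first; unicode_escape then decodes every escape at once.
    return text.encode("ascii", "backslashreplace").decode("unicode_escape")


from_parser = "a\\x01\\x02\\x0c"  # 13 characters, as described in post #1
fixed = unescape(from_parser)

print(len(from_parser), len(fixed))  # 13 4
print(fixed == "a\x01\x02\x0c")      # True
```

This codec interprets every Python escape it finds, including `\n`, `\t` and `\uXXXX`, and it raises an error on malformed input such as a lone trailing backslash. For fuzzing payloads that may matter, so input that must keep other backslashes intact needs a narrower approach.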
bvdet
2,851 Recognized Expert Moderator Specialist
Sorry, I took part of your previous post out of context.
I cannot explain why your string is represented as unicode but returns a length of 8. You might try something like this:
import re

def lenUstr(s):
    # Count the literal \xNN escape sequences (two hex digits each) in s.
    patt = r'\\x[0-9a-fA-F]{2}'
    return len(re.findall(patt, s))

s = u'\x01\x02'
print type(s)

# repr() turns each control character into its four-character escaped form.
print lenUstr(repr(str(s)).replace("'", ""))

>>> <type 'unicode'>
2
>>> 
Jul 26 '07 #6
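The regular expression in the last reply only counts the escape sequences. The same pattern can also convert them, which gives a string whose `len()` is what the fuzzing library expects. This is a Python 3 sketch under the assumption that only `\xNN` sequences need converting; `decode_hex_escapes` and `HEX_ESCAPE` are illustrative names.

```python
import re

# Matches a backslash, an "x", and exactly two hex digits.
HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")


def decode_hex_escapes(text):
    """Replace each literal \\xNN sequence with the character it names."""
    return HEX_ESCAPE.sub(lambda match: chr(int(match.group(1), 16)), text)


print(len(decode_hex_escapes("a\\x01\\x02\\x0c")))  # 4
print(decode_hex_escapes("keep \\n as is"))         # keep \n as is
```

Unlike a general escape decoder, this leaves every other backslash sequence alone, which may be preferable when the payload is arbitrary fuzzing data.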
