Bytes | Software Development & Data Engineering Community

Unicode characters converting to minus-numbers??

Robbie
180 New Member
Hi again all, here's something I'm stuck on...
I'm making a function to convert a Unicode character into the kind of code you need to put on a UTF-8 encoded web page: a numeric character reference (ampersand, hash, decimal digits, semicolon).
The function decides whether a character is a Unicode (non-Latin-1) character by checking whether AscW(character) > 255.

It works fine most of the time; for example, it converts this:

```
original:
泳ぐ
converted:
&#27891;&#12368;
```
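To make the target format concrete, here is a minimal sketch of the same conversion in Python rather than VB. It assumes the same rule as the post (convert anything with a code above 255, leave the rest alone); the function name `to_ncr` and the `threshold` parameter are hypothetical. One difference: Python's `ord()` returns the full code point even for characters outside the Basic Multilingual Plane, whereas AscW works on single 2-byte code units, so this is not a line-by-line translation.

```python
def to_ncr(text: str, threshold: int = 255) -> str:
    """Replace each character whose code point exceeds `threshold`
    with an HTML decimal numeric character reference (&#NNNN;).
    Hypothetical helper mirroring the '> 255' rule from the post."""
    return "".join(
        f"&#{ord(ch)};" if ord(ch) > threshold else ch
        for ch in text
    )

print(to_ncr("泳ぐ"))  # &#27891;&#12368;
```

Characters at or below the threshold (plain ASCII, or Latin-1 letters such as é) pass through unchanged, which matches the behaviour the post describes.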
But for some characters, I found that AscW(character) gives back minus-numbers! I can't find any relation between the minus-number and what the number should be, either, so I can't write a formula to convert the minus-number into the right number.

Here are a few it gets wrong, with the hex value in square brackets in case that helps:

```
Character: こ - Number should be: 12371 [3053] - AscW() gives: -28740
Character: い - Number should be: 12356 [3044] - AscW() gives: -30644
Character: み - Number should be: 12415 [307F] - AscW() gives: -30325
```
Sorry if this seems confusing; it confuses me too. >_<

I also tried to work out the number 'manually' from the 2 bytes, taking the first byte with MidB(character, 1, 1) and the second with MidB(character, 2, 1), then multiplying one by 256 and adding the other, but something weird happens here too.
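The byte arithmetic can be sketched in Python (not VB), assuming the UTF-16 little-endian layout that VB uses for strings on Windows: the first byte is the low byte and the second is the high byte, and the high byte is multiplied by 256, not 255, because a byte holds 256 values (0 to 255). The name `code_unit_from_bytes` is hypothetical.

```python
def code_unit_from_bytes(ch: str) -> int:
    """Rebuild the first UTF-16 code unit of `ch` from its two bytes.
    Hypothetical helper; raises IndexError for an empty string."""
    data = ch[:1].encode("utf-16-le")  # same byte order VB keeps internally
    low, high = data[0], data[1]       # byte 1 = low byte, byte 2 = high byte
    return high * 256 + low

for ch in "こいみ":
    n = code_unit_from_bytes(ch)
    print(ch, n, hex(n))
```

For the three characters in the table this gives 12371, 12356 and 12415, the values the post says AscW() should have returned.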

Doing MidB(character, 1, 2) gives you back the character, as I expected, because the character's made up of 2 bytes.
However, if I try to get just 1 byte (length 1), it seems to get nothing back, because

```
Len(MidB(character, 1, 1))
```

is 0. Also, Asc(MidB(character, 1, 1)) raises an error saying an argument is invalid, which also makes me think MidB() is giving back an empty string.
Agh... my head hurts... does anyone know why on earth AscW() doesn't give the correct number like it does with most other characters?
Apr 22 '07 #1
Robbie
180 New Member
I have finally fixed it!
I have no idea what's up with VB's AscW() function, but I've made my own alternative to it which works even when AscW() doesn't!

So I'm posting it here, to benefit anyone who may have the same problem...
It's intended for Unicode characters, i.e. when AscW() gives back > 255.

Actually, you should probably first check whether AscW() gives a minus-number instead of the correct number, and only then resort to my function, because I think AscW() is a little faster.

```vb
Public Function UnicodeAsc(ByVal RealUnicode As String) As Long
    'This function returns what AscW() should return, but which it sometimes doesn't.
    'I think it's slightly slower than AscW(), but it doesn't give minus-numbers.
    'RealUnicode - String containing the Unicode character to return the code of.
    'If it contains more than one character, only the first one is used.
    'RealUnicode must not be empty: AscB("") raises run-time error 5.
    'VB stores strings as UTF-16 little-endian, so byte 1 is the low byte
    'and byte 2 is the high byte.

    Dim UnicodeByte1 As Long
    Dim UnicodeByte2 As Long
    Dim UnicodeRecreate As Long

    UnicodeByte1 = AscB(MidB$(RealUnicode, 1, 1))   'low byte
    UnicodeByte2 = AscB(MidB$(RealUnicode, 2, 1))   'high byte
    UnicodeRecreate = (UnicodeByte2 * 256&) + UnicodeByte1

    UnicodeAsc = UnicodeRecreate
End Function
```
As you can see, I could probably use fewer variables, and smaller types for them (Double -> Long -> Integer -> Byte), but I was getting lots of overflow errors, so I made them all Long. (An Integer only goes up to 32,767, so any character code above that overflows it.)

Also, as Killer42 mentioned in another thread, Long is faster than Integer on 32-bit processors (which most are nowadays), at the cost of a little more memory: it's their 'native' size, so no internal conversion is needed.
Apr 24 '07 #2
Killer42
8,435 Recognized Expert Expert
Glad to see you've got it sorted. :)

I was puzzling over this one for a while, but couldn't think of anything helpful. It would still be nice to find an answer rather than just working around the problem, but as long as you've achieved what you want to do I guess that's the important thing.

By the way, I feel I should mention that they are called "negative numbers" not "minus-numbers".
Apr 24 '07 #3
Robbie
180 New Member
Yes, it'd be nice to understand what's going on inside VB's AscW()... I think it might have something to do with the result of some calculation going past the lower limit of a 'word' (I mean 2 bytes), i.e. below 0.

For example...
Open the Windows calculator in Scientific mode, and choose Hex (out of Hex/Dec/Oct/Bin) and Word (out of Qword/Dword/Word/Byte).
Then switch to Dec and type -12,000. When you switch back to Hex you get a sort of 'inverted' version, D120, which turns out to be 65,535 - 11,999 (the same as 65,536 - 12,000 = 53,536). Unfortunately, the numbers AscW gave weren't this simple... meh... food for thought, I guess...?
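The calculator's Word mode is showing the 16-bit two's-complement bit pattern. The same wraparound can be sketched in Python (the helper names are hypothetical); VB6's AscW returns an Integer, which is also a signed 16-bit type, so in general a code unit of &H8000 or above would come back negative.

```python
def to_unsigned16(value: int) -> int:
    # Keep only the low 16 bits, like the calculator's 'Word' view.
    return value & 0xFFFF

def to_signed16(value: int) -> int:
    # Reinterpret a 16-bit pattern as a signed (two's complement) number.
    value &= 0xFFFF
    return value - 0x10000 if value >= 0x8000 else value

print(hex(to_unsigned16(-12000)))  # 0xd120
print(to_unsigned16(-12000))       # 53536
print(to_signed16(0xD120))         # -12000
```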

Oh, and okay about the negative numbers. Maybe 'minus-numbers' is mainly used in the UK, where I am? It sure sounds less clumsy than 'negative numbers'. ;) Or is 'negative numbers' just VB-specific... oh well. Gotta get back to coding now; no excuse about being stuck anymore :P

EDIT:
The problem in my first post, where MidB seemed to be giving back an empty string, was because I was using Len instead of LenB and Asc instead of AscB! >_< (Len counts 2-byte characters, so a 1-byte string has a Len of 0; LenB counts bytes.)
If I had realized that at the time, I could have fixed it straight away... --;
Still, we live and learn.
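The Len-versus-LenB difference can be illustrated in Python as an analogy only: `len()` on a `str` counts characters like Len(), while `len()` on the UTF-16LE bytes counts bytes like LenB(). Treating the byte string as VB's internal buffer is an assumption of the sketch.

```python
ch = "こ"
utf16 = ch.encode("utf-16-le")   # the 2 bytes VB stores for this character

print(len(ch))                   # 1: like Len(), counts characters
print(len(utf16))                # 2: like LenB(), counts bytes
one_byte = utf16[0:1]            # like MidB(ch, 1, 1): a 1-byte string
print(len(one_byte))             # 1: like LenB(MidB(ch, 1, 1))
print(len(one_byte) // 2)        # 0: no whole 2-byte character, as Len() reported
print(one_byte[0])               # 83 (0x53): like AscB(MidB(ch, 1, 1))
```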
Apr 24 '07 #4
Killer42
8,435 Recognized Expert Expert
Yeah, checking for signed -vs- unsigned values was one of the first things I did. As you can see, it didn't pan out.
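That check can be reproduced with a short Python sketch using the three values from the first post. All three expected code points are below 0x8000, so a signed 16-bit Integer could hold them without going negative, and re-reading the reported negatives as unsigned 16-bit values gives 0x8FBC, 0x884C and 0x898B, none of which match.

```python
# (character, expected code point, value AscW() reportedly gave), from the first post
reported = [
    ("こ", 0x3053, -28740),
    ("い", 0x3044, -30644),
    ("み", 0x307F, -30325),
]

for ch, expected, got in reported:
    assert ord(ch) == expected       # the expected values really are the code points
    as_unsigned = got & 0xFFFF       # what a plain signed 16-bit wrap would hide
    print(ch, hex(expected), hex(as_unsigned), as_unsigned == expected)
```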

So are you saying that they are actually referred to commonly as "minus numbers" in the UK? I don't know, it just sounds... childish, I suppose. Certainly it's not anything specific to VB.

I would never have spotted the "B" -vs- "non-B" functions, as I don't think I've ever used the B versions. Never really worked much with byte values in VB. And I've never had to deal with Unicode.

One of these days I need to check out how to use byte arrays in place of strings in certain areas, for performance reasons. Some day. Real soon...
Apr 24 '07 #5
Robbie
180 New Member
Yep, we often use childish 'minus-numbers' in the UK... :P
You mentioned using a byte array... now that I think about it, I suppose reading a single value from an array is probably faster than calling MidB() on each character in a string...
I think I'll have to try that too some time ;)
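As a rough Python analogue of the byte-array idea, a whole UTF-16LE buffer can be turned into its 16-bit code units in one step instead of slicing character by character. This sketch uses the standard `array` module; the helper name `code_units` is hypothetical, and the byteswap handles big-endian machines because `'H'` uses the machine's byte order.

```python
import sys
from array import array

def code_units(data: bytes) -> list[int]:
    """Turn a UTF-16LE byte buffer into its 16-bit code units in one pass.
    Raises ValueError if the buffer has an odd number of bytes."""
    units = array("H")           # unsigned 16-bit items
    units.frombytes(data)
    if sys.byteorder == "big":   # the data is little-endian; fix up on big-endian hosts
        units.byteswap()
    return units.tolist()

print(code_units("こいみ".encode("utf-16-le")))  # [12371, 12356, 12415]
```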
Apr 24 '07 #6
Killer42
8,435 Recognized Expert Expert
I've done a number of jobs where I was processing huge batches of information from file to file. Reading and writing the files was surprisingly fast when done a large chunk at a time in binary mode. But my string processing has tended to be a weak point. You can transfer to/from a byte array with the same Get/Put statements, so I think I could speed things up this way.
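The large-chunk binary reading described here can be sketched in Python (an analogue of VB's Get, not a translation). The function name `read_in_chunks` and the 64 KB default chunk size are arbitrary assumptions.

```python
import os
import tempfile

def read_in_chunks(path: str, chunk_size: int = 64 * 1024):
    """Yield a file's contents as bytes objects of up to chunk_size bytes."""
    with open(path, "rb") as f:       # binary mode: no decoding, no newline translation
        while True:
            chunk = f.read(chunk_size)
            if not chunk:              # empty bytes means end of file
                break
            yield chunk

# Usage: write 10 bytes to a temporary file, then read them back 4 at a time.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(bytes(range(10)))
chunks = list(read_in_chunks(path, chunk_size=4))
os.remove(path)
print([len(c) for c in chunks])       # [4, 4, 2]
```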
Apr 25 '07 #7
