Unicode characters converting to minus-numbers??

Robbie
100+
P: 180
Hi again all, here's something I'm stuck on...
I'm making a function to convert a Unicode character into the numeric code you put on a UTF-8 encoded web page (ampersand, hash, decimal digits, semicolon).
The program decides whether a character needs converting by checking whether AscW(character) > 255.

It works fine most of the time, e.g. it converts this:
    original:
    泳ぐ
    converted:
    &# 27891;&# 12368;
    (I put in spaces so that they didn't get converted on the forum; the function does not produce them.)
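
A minimal sketch of the conversion described above, for illustration only; it is not the original poster's function. The name ToCharRef is hypothetical. It assumes classic VB (VB6 or VBA), handles each UTF-16 code unit on its own (so characters stored as surrogate pairs are not covered), and masks the result of AscW() with &HFFFF& because AscW() returns a signed 16-bit Integer.

```vb
' Hypothetical helper: replaces every character whose code is above 255
' with a decimal numeric character reference (&#NNNNN;).
Public Function ToCharRef(ByVal Source As String) As String
    Dim i As Long
    Dim ch As String
    Dim code As Long
    Dim result As String

    For i = 1 To Len(Source)
        ch = Mid$(Source, i, 1)
        ' AscW returns an Integer (-32768 to 32767); the Long mask
        ' &HFFFF& keeps the low 16 bits and yields 0 to 65535.
        code = AscW(ch) And &HFFFF&
        If code > 255 Then
            result = result & "&#" & CStr(code) & ";"
        Else
            result = result & ch
        End If
    Next i

    ToCharRef = result
End Function
```

For example, ToCharRef(ChrW$(27891) & ChrW$(12368)) should give the converted string shown above, without the spaces.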
But, for some characters, I found that AscW(character) gives back minus numbers! I can't find the relation between the minus-number and what the number should be, either, so I can't make some formula to convert the minus-number to the appropriate number.

Here are a few it messes up on with hex version in square brackets in case that helps:
    Character: こ - Number should be: 12371 [3053] - AscW() gives: -28740
    Character: い - Number should be: 12356 [3044] - AscW() gives: -30644
    Character: み - Number should be: 12415 [307F] - AscW() gives: -30325
Sorry if it seems confusing, but it does also to me. >_<

I also tried to work out the number 'manually' by looking at the 2 bytes, multiplying the leftmost one by 256 and adding the rightmost one, using MidB(character,1,1) for the first byte and MidB(character,2,1) for the second byte, but something weird happens here too.

Doing MidB(character, 1, 2) gives you back the character, as I expected, because the character's made up of 2 bytes.
However, if I try to get just 1 byte (length 1), it seems to fail to get anything, because
    Len(MidB(character, 1, 1))
is 0. Also, Asc(MidB(character, 1, 1)) causes an error saying an argument is invalid, which also makes me think MidB() is giving back an empty string.
Agh... my head hurts... does anyone know why on earth AscW() doesn't give the correct number like it does with most other characters?
Apr 22 '07 #1
6 Replies


Robbie
100+
P: 180
I have finally fixed it!
I have no idea what's up with VB's AscW() function, but I've made my own alternative to it which works even when AscW() doesn't!

So I'm posting it here, to benefit anyone who may have the same problem...
Only works with a Unicode character, so only use it if AscW() gives back >255.

Actually, you should probably check whether AscW() gives a minus-number before resorting to my function at all, because AscW() is a little faster, I think.

    Public Function UnicodeAsc(RealUnicode As String) As Long
        'This function returns what AscW() should return, but which it sometimes doesn't.
        'I think it's slightly slower than AscW(), but it always returns a value from 0 to 65535.
        'RealUnicode - String containing the Unicode character to return the code of.
        'If it contains more than one character, only the first one is used.

        Dim UnicodeByte1 As Long
        Dim UnicodeByte2 As Long
        Dim UnicodeRecreate As Long

        'VB stores each character as 2 bytes, low byte first.
        UnicodeByte1 = AscB(MidB(RealUnicode, 1, 1))    'low byte
        UnicodeByte2 = AscB(MidB(RealUnicode, 2, 1))    'high byte
        UnicodeRecreate = (UnicodeByte2 * 256&) + UnicodeByte1

        UnicodeAsc = UnicodeRecreate
    End Function

As you can see, I might be able to reduce the number of variables used and the size of them (Double -> Long -> Integer -> Byte) in the function, but I was getting lots of overflow errors, so I increased their size to Long.

Also, as Killer42 mentioned in this thread, at the expense of a little more memory, Long is faster than Integer on 32-bit processors (as most are nowadays) because it's their 'native' format, so there is no converting internally.
Apr 24 '07 #2

Expert 5K+
P: 8,434
Glad to see you've got it sorted. :)

I was puzzling over this one for a while, but couldn't think of anything helpful. It would still be nice to find an answer rather than just working around the problem, but as long as you've achieved what you want to do I guess that's the important thing.

By the way, I feel I should mention that they are called "negative numbers" not "minus-numbers".
Apr 24 '07 #3

Robbie
100+
P: 180
Yes, it'd be nice to understand what was going on in VB's AscW()... I think it might have something to do with getting an answer to some equation past the lower limit of a 'word' (I mean 2 bytes) (<0).

For example...
Load the calculator on Windows (Scientific mode), put it on Hex out of Hex/Dec/Oct/Bin, and put it on Word out of Qword/Dword/Word/Byte.
Then switch to Dec and type -12,000. When you switch over to Hex again you get a sort of 'inverted' version D120 which turns out to be 65,535-11,999 (unfortunately, the number AscW gave wasn't this simple)... meh... food for thought, I guess...?
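
The calculator experiment above is two's complement on a 16-bit word, and the same arithmetic can be sketched in code. AscW() returns a signed 16-bit Integer, so any code unit above 32767 comes back negative, and keeping only the low 16 bits (equivalently, adding 65536 to a negative result) recovers the unsigned value. The name AscWUnsigned below is hypothetical, and classic VB (VB6 or VBA) is assumed. Applied to the three values from the first post, this gives 36796, 34892 and 35211 (hex 8FBC, 884C and 898B). Those are not the hiragana codes listed there; they are the code points of the kanji 込, 行 and 見, so one possible explanation (an assumption, since the actual input is not shown in the thread) is that the text passed to AscW() contained kanji rather than the hiragana quoted.

```vb
' Hypothetical helper: AscW() as an unsigned value (0 to 65535).
Public Function AscWUnsigned(ByVal Ch As String) As Long
    ' &HFFFF& is the Long constant 65535. (Without the trailing &,
    ' &HFFFF would be the Integer -1 and the mask would do nothing.)
    AscWUnsigned = AscW(Ch) And &HFFFF&
End Function

Public Sub DemoSignedToUnsigned()
    ' The calculator example: -12000 seen as an unsigned 16-bit word.
    Debug.Print Hex$(-12000 And &HFFFF&)    ' D120

    ' The three values reported in the first post.
    Debug.Print -28740 And &HFFFF&          ' 36796
    Debug.Print -30644 And &HFFFF&          ' 34892
    Debug.Print -30325 And &HFFFF&          ' 35211
End Sub
```

This would also account for why rebuilding the number from the two bytes works: it reads the same 16 bits, just without treating the top bit as a sign.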

Oh and okay about the negative numbers. Maybe 'minus-numbers' is only used mainly in the UK where I am? Sure sounds less clumsy than 'negative numbers'. ;) Or is 'negative numbers' just VB-specific... oh well. Gotta get back to coding now, no excuse about being stuck anymore :P

EDIT:
The problem I was having on the first post about the way MidB looked like it was giving back blankness is because I was using Len instead of LenB and Asc instead of AscB! >_<
If I had realized that then, I could have fixed it then... --;
Still, we live and learn
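
A small sketch of the difference between the character functions and their byte ("B") counterparts, assuming classic VB (VB6 or VBA). The procedure name DemoByteFunctions is made up for the example. Len() counts 2-byte characters, so a 1-byte string reports a length of 0, while LenB() counts bytes.

```vb
Public Sub DemoByteFunctions()
    Dim ch As String
    Dim oneByte As String

    ch = ChrW$(&H3053)              ' one character, stored as 2 bytes
    oneByte = MidB$(ch, 1, 1)       ' a string holding a single byte

    Debug.Print Len(ch); LenB(ch)               ' 1 and 2
    Debug.Print Len(oneByte); LenB(oneByte)     ' 0 and 1

    ' AscB reads a single byte; the low byte is stored first.
    Debug.Print AscB(oneByte)                   ' 83 (hex 53)
    Debug.Print AscB(MidB$(ch, 2, 1))           ' 48 (hex 30)
End Sub
```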
Apr 24 '07 #4

Expert 5K+
P: 8,434
Yeah, checking for signed -vs- unsigned values was one of the first things I did. As you can see, it didn't pan out.

So are you saying that they are actually referred to commonly as "minus numbers" in the UK? I don't know, it just sounds... childish, I suppose. Certainly it's not anything specific to VB.

I would never have spotted the "B" -vs- "non-B" functions, as I don't think I've ever used the B versions. Never really worked much with byte values in VB. And I've never had to deal with Unicode.

One of these days I need to check out how to use byte arrays in place of strings in certain areas, for performance reasons. Some day. Real soon...
Apr 24 '07 #5

Robbie
100+
P: 180
Yep, we often use childish 'minus-numbers' in the UK... :P
You mentioned using a byte array... now I think about it I suppose looking at a single value in an array must be faster than using MidB() on each character in a string...
I think I'll have to try that too some time ;)
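
For reference, a rough sketch of the byte-array idea, assuming classic VB (VB6 or VBA), where assigning a String to a dynamic Byte array copies the string's bytes (2 per character, low byte first). The procedure name is hypothetical, and whether this is actually faster than MidB() has not been measured here.

```vb
Public Sub DemoByteArray()
    Dim s As String
    Dim bytes() As Byte
    Dim i As Long
    Dim code As Long

    s = ChrW$(27891) & ChrW$(12368)
    bytes = s                       ' 4 bytes: 2 per character

    For i = 0 To UBound(bytes) - 1 Step 2
        ' 256& forces Long arithmetic; Byte * 256 would be done as
        ' Integer and overflow for high bytes of 128 or more.
        code = bytes(i) + bytes(i + 1) * 256&
        Debug.Print code            ' 27891, then 12368
    Next i
End Sub
```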
Apr 24 '07 #6

Expert 5K+
P: 8,434
I've done a number of jobs where I was processing huge batches of information from file to file. Reading and writing the file was surprisingly fast when transferred a large chunk at a time in binary mode. But my string processing has tended to be a weak point. You can transfer to/from a byte array with the same Get/Put statements, so I think I could speed things up this way.
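
As an illustration of that idea, here is a sketch that moves a file through a Byte array one chunk at a time with Get and Put. It assumes classic VB (VB6 or VBA); the procedure name, parameter names and chunk size are all made up for the example, and files larger than a Long can describe are not handled.

```vb
Public Sub CopyFileInChunks(ByVal SourcePath As String, ByVal DestPath As String)
    Const CHUNK_SIZE As Long = 65536
    Dim inFile As Integer
    Dim outFile As Integer
    Dim buffer() As Byte
    Dim remaining As Long

    inFile = FreeFile
    Open SourcePath For Binary Access Read As #inFile
    outFile = FreeFile
    ' Note: Binary mode does not truncate an existing destination file.
    Open DestPath For Binary Access Write As #outFile

    remaining = LOF(inFile)
    Do While remaining > 0
        ' Size the array to the bytes wanted; Get fills the whole array.
        If remaining < CHUNK_SIZE Then
            ReDim buffer(0 To remaining - 1)
        Else
            ReDim buffer(0 To CHUNK_SIZE - 1)
        End If

        Get #inFile, , buffer
        ' ... process buffer(0), buffer(1), ... here ...
        Put #outFile, , buffer

        remaining = remaining - (UBound(buffer) + 1)
    Loop

    Close #inFile
    Close #outFile
End Sub
```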
Apr 25 '07 #7
