Any foreign character mapping charts available?

I'm in the US, and have to constantly take data input from other
countries. Some of this data has characters which I can't understand,
since it's input from other language keyboards. This prevents me from
reading the name and passing it to a master database for proper
storage and reporting. Can anyone tell me if there is a chart mapping
non-English characters to English equivalents? We can use any of the
extended characters, such as Ü (as in MÜNCHEN, Germany).

The problem is that with some words, multiple characters make up one,
such as "BAÃ±ARES", which is a Spanish entry that translates to
BAÑARES; "DÃ¼SSELDORF" is the German entry I often find for
DÜSSELDORF. Many such entries are providing me with a new hobby, the
likes of which I'd like to give up.
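MichKa guesses UTF-8 later in the thread. If that is right, and the data is then displayed as Windows-1252 (the Western European ANSI code page on a US system), the chart can be generated instead of collected by hand. Below is a minimal Python sketch under that assumption. The function name `build_mojibake_chart` is hypothetical, and the range covers only U+00C0 to U+00FF.

```python
def build_mojibake_chart(first=0xC0, last=0xFF):
    """Map each garbled pair back to the character it came from.

    Assumes the text was encoded as UTF-8 and then misread as Windows-1252.
    """
    chart = {}
    for code_point in range(first, last + 1):
        original = chr(code_point)
        try:
            garbled = original.encode("utf-8").decode("cp1252")
        except UnicodeDecodeError:
            # Python's cp1252 codec leaves bytes 0x81, 0x8D, 0x8F, 0x90 and
            # 0x9D undefined, so Á, Í, Ï, Ð and Ý have no garbled form here.
            continue
        chart[garbled] = original
    return chart


if __name__ == "__main__":
    for garbled, original in build_mojibake_chart().items():
        print(f"{garbled}  ->  {original}")
```

In the examples above, only the plain ASCII letters seem to have been uppercased. The repaired letter therefore comes back lowercase (Ã¼ maps to ü), and a final `.upper()` turns it into Ü.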

So, any assistance on this will merit great admiration and gratitude,
and maybe even a lollipop (limit one per solution, please--allow 4-6
weeks for delivery).

Thanks in advance
Nov 12 '05 #1

See

http://microsoft.com/globaldev/reference/WinCP.mspx

for the Windows "default ANSI" [sic] code pages, and

http://microsoft.com/globaldev/reference/oem.mspx

for the Windows "default OEM" code pages.
--
MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies

This posting is provided "AS IS" with
no warranties, and confers no rights.

Nov 12 '05 #2

MichKa:

Thanks for the great reference. Unfortunately, it isn't quite a
bull's-eye. I can reproduce any of the characters brought into my
Access 97 database, but knowing what they translate to is the
question. If there is a single int'l character, like É (character
0201), I can leave it as is. However, usually, there are two
characters that represent a third. For example, in the French word
DÃ©FENSE as well as the Spanish word MÃ©XICO, I can assume the Ã©
translates to É (character 0201), but I can't assume in all cases. For
one, I don't recognize names in most languages, and second, the same
character can be used in different ways in different countries. For
example, Ã¼ in ZÃ¼RICH translates to Ü (char. 0220), as in ZÜRICH, but
that's for Germany. For Switzerland, I think it's different, as in
HANS-JÃ¼RGEN translating to HANS-JUERGEN, and MÃ¼NCHENSTEIN to
MUENCHENSTEIN. By the way, those translations are from searches I've
done on Google, so I may be off on my research.

From the above examples, can you suggest a more accurate approach to
translating the data in question? Of course, if you're going to
suggest becoming proficient in 27 different languages, please be
informed that my brain is begging for a vacation as it is (and I just
started this project :-).

Thanks again, in advance.

J

Nov 12 '05 #3

You need to ask the people who are doing the conversion which code
page is being used -- it is unreasonable to try to guess (though I would
guess UTF-8 if I had to).
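Suppose the UTF-8 guess is taken as a working assumption, still to be confirmed with whoever runs the conversion. Then a garbled field can be repaired by reversing the misreading: turn the text back into Windows-1252 bytes, then decode those bytes as UTF-8. Below is a Python sketch. `repair_field` is a hypothetical name. Fields that do not survive the round trip are returned unchanged.

```python
def repair_field(text):
    """Undo UTF-8 text that was misread as Windows-1252 (assumed cause)."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of damage (or already correct): leave it alone.
        return text


for field in ["BAÃ±ARES", "DÃ¼SSELDORF", "MÃ©XICO", "MÜNCHEN", "PARIS"]:
    print(field, "->", repair_field(field).upper())
```

If the data is actually being read as ISO-8859-1 rather than Windows-1252, swapping in `"latin-1"` gives the same result for these letters. A legitimate value that happens to contain a pair such as Ã© would be "repaired" wrongly. That risk is the reason to confirm the encoding rather than guess it.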
--
MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies

This posting is provided "AS IS" with
no warranties, and confers no rights.

Nov 12 '05 #4

I was afraid you'd say that, but I guess you're right. I just thought
this was a common enough issue for others to perhaps have created some
kind of table they could share (or sell?).

Thanks again for your references.

JV

Nov 12 '05 #5

Well, you have not really said enough about what the format is for anyone
to have a notion of what table you need.
--
MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies

This posting is provided "AS IS" with
no warranties, and confers no rights.

"jmev7" <jm***@yahoo.com> wrote in message
news:c7**************************@posting.google.c om...
I was afraid you'd say that, but I guess you're right. I just thought
this was a common enough issue for others to perhaps have created some
kind of table they could share (or sell?).

Thanks again for your references.

JV
"Michael \(michka\) Kaplan [MS]" <mi*****@online.microsoft.com> wrote in

message news:<40********@news.microsoft.com>...
You need to talk to the people who are doing the conversion to say what code page is being used -- it is unreasonable to try to guess (though I would
guess UTF-8 if I had to).
--
MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies

This posting is provided "AS IS" with
no warranties, and confers no rights.
"jmev7" <jm***@yahoo.com> wrote in message
news:c7**************************@posting.google.c om...
MichKa:

Thanks for the great reference. Unfortunately, it isn't quite a
bulls-eye. I can reproduce any of the characters brought into my
Access 97 database, but knowing what they translate to is the
question. If there is a single int'l character, like ╔ (character
0201), I can leave it as is. However, usually, there are two
characters that represent a third. For example, in the French word
D├ęFENSE as well as the Spanish word M├ęXICO, I can assume the ├ę
translates to ╔ (character 0201), but I can't assume in all cases. For
one, I don't recognize names in most languages, and second, the same
character can be used in different ways in different countries. For
example, ├╝ in Z├╝RICH translates to ▄ (char. 0220), as in Z▄RICH, but
that's for Germany. For Switzerland, I think it's different, as in
HANS-J├╝RGEN translating to HANS-JUERGEN, and M├╝NCHENSTEIN to
MUENCHENSTEIN. By the way, those translations are from searches I've
done on Google, so I may be off on my research.

From the above examples, can you suggest a more accurate approach to
translating the data in question? Of course, if you're going to
suggest becoming proficient in 27 different languages, please be
informed that my brain is begging for a vacation as it is (and I just
started this project :-).

Thanks again, in advance.

J
"Michael \(michka\) Kaplan [MS]" <mi*****@online.microsoft.com> wrote in
message news:<40********@news.microsoft.com>...
> See
>
> http://microsoft.com/globaldev/reference/WinCP.mspx
>
> for the Windows "default ANSI" [sic] code pages, and
>
> http://microsoft.com/globaldev/reference/oem.mspx
>
> for the Windows "deault OEM" code pages.
>
>
> --
> MichKa [MS]
> NLS Collation/Locale/Keyboard Development
> Globalization Infrastructure and Font Technologies
>
> This posting is provided "AS IS" with
> no warranties, and confers no rights.
>
>
>
> "jmev7" <jm***@yahoo.com> wrote in message
> news:c7**************************@posting.google.c om...
> > I'm in the US, and have to constantly take data input from other
> > countries. Some of this data has characters which I can't

understand, > > since it's input from other language keyboards. This prevents me from > > reading the name and passing it to a master database for proper
> ...

Nov 12 '05 #6

I apologize. I'm not trying to be ambiguous, but I'm not sure what you
mean by format. The format of file, as in Access 97? The format of
language? In that case, it's multiple, but I can get a list of what
I've experienced thus far and list them, if that will be of help. I
thought of putting together a simple find and replace table, where
every occurrence of certain characters would then translate to another
character "acceptable" to the target database application. From what I
understand, the reason this is not practical is that two people from
the same country can use different Encoding settings in their
browsers, resulting in the same keystroke translated to a slightly
different character.

I also learned recently that some of the Germanic countries are
replacing some characters, such as the Ü character being replaced by
UE. This, as was explained to me by a more educated individual, was
due to the difficulties of reproducing such characters on the web by
users from other countries. How this will affect my efforts is another
question to consider.
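The master database may only accept plain English letters. In that case the repaired text still has to be folded to ASCII, and the UE convention described here is one option for German names. Below is a hedged Python sketch using the standard `unicodedata` module. The digraph table is an assumption: it covers only the German umlauts and ß, and as noted above, Swiss or other usage may differ. Every other accented letter simply loses its accent. Folding is lossy, so it is best applied to a copy rather than to the stored original.

```python
import unicodedata

# Hypothetical table: German-style replacements, applied only on request.
GERMAN_DIGRAPHS = {
    "Ä": "AE", "Ö": "OE", "Ü": "UE",
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
}


def fold_to_ascii(text, german=False):
    """Reduce text to ASCII for a system that cannot store anything else."""
    if german:
        text = "".join(GERMAN_DIGRAPHS.get(ch, ch) for ch in text)
    # NFKD splits "É" into "E" plus a combining accent; drop the accents.
    decomposed = unicodedata.normalize("NFKD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Letters with no decomposition (for example Ø or a leftover ß) become "?".
    return base.encode("ascii", errors="replace").decode("ascii")


print(fold_to_ascii("MÜNCHEN"), fold_to_ascii("MÜNCHEN", german=True))
print(fold_to_ascii("BAÑARES"), fold_to_ascii("DÉFENSE"))
```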

I am all open to suggestions and references to existing resources
(free or for purchase).

Thanks again.

JV

Nov 12 '05 #7

"jmev7" <jm***@yahoo.com> wrote...
> I apologize. I'm not trying to be ambiguous, but I'm not sure what you
> mean by format.

I mean, where is the data coming from?

> The format of file, as in Access 97? The format of
> language? In that case, it's multiple, but I can get a list of what
> I've experienced thus far and list them, if that will be of help.

Hmmm.... so, how is the data being entered? What is the default system
locale of the machine on which it is being entered, and what is the
database's collation? And what is your default system locale now, when you
are looking at the data?

> I thought of putting together a simple find and replace table, where
> every occurrence of certain characters would then translate to another
> character "acceptable" to the target database application. From what I
> understand, the reason this is not practical is that two people from
> the same country can use different Encoding settings in their
> browsers, resulting in the same keystroke translated to a slightly
> different character.

This would not be correct -- encoding translations are not done at the
keystroke level. If you look at the links I gave, all of those code pages
overlap each other such that the same code point means different things,
depending on which one you are looking at.
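The overlap is easy to see by decoding the same byte under several of the code pages from those tables. Below is a small Python sketch using only the standard codecs. The byte 0xD1 and the four code pages are arbitrary picks for illustration.

```python
def decode_under(raw, codepages):
    """Decode the same bytes with each code page and collect the results."""
    return {cp: raw.decode(cp) for cp in codepages}


readings = decode_under(bytes([0xD1]), ["cp1252", "cp1250", "cp437", "cp852"])
for codepage, text in readings.items():
    print(f"{codepage}: {text}")
```

Under Windows-1252 this byte is Ñ and under Windows-1250 it is Ń. The two OEM code pages give two further, different characters. So a stored byte cannot be interpreted until you know which table was used to write it.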

> I also learned recently that some of the Germanic countries are
> replacing some characters, such as the Ü character being replaced by
> UE. This, as was explained to me by a more educated individual, was
> due to the difficulties of reproducing such characters on the web by
> users from other countries. How this will affect my efforts is another
> question to consider.

Well, this is actually okay -- if they use the other form then that is what
is stored. And users know what it means.

> I am all open to suggestions and references to existing resources
> (free or for purchase).


Truly?

If you are doing multilingual work then you MUST consider upgrading to any
version of Access that uses a Unicode version of Jet (2000, 2002, or 2003).
This is the only way that you will be able to get good results in this area
without doing a LOT of work. Better for it to all work without adding
tremendous hacks....
--
MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies

This posting is provided "AS IS" with
no warranties, and confers no rights.

Nov 12 '05 #8

"Michael \(michka\) Kaplan [MS]" <mi*****@online.microsoft.com> wrote in message news:<40********@news.microsoft.com>...
> Hmmm.... so, how is the data being entered? What is the default system
> locale of the machine on which it is being entered, and what is the
> database's collation? And what is your default system locale now, when you
> are looking at the data?

Well, the data is entered on websites throughout the world as the
company collects data from visitors. There is no default system
locale, as the data is entered from each visitor's system (home or
office) worldwide. My default system has the encoding set to Western
European (ISO). I believe the system collecting the data is an Oracle
system, which is situated in the US. The fact is, I'm weak on these
issues, so I'm going to have to delve deeper, and you're providing me
with some good questions to ask.

<snipped my stuff>
> This would not be correct -- encoding translations are not done at the
> keystroke level. If you look at the links I gave, all of those code pages
> overlap each other such that the same code point means different things,
> depending on which one you are looking at.

I think it's the overlap that is giving me the difficulty. If striking
a "U" always produced a "U", that would be great. If it produces a "Ü"
from one system and a "UE" from another, and they are both correct,
that's fine. When it produces a "Ã¼" combination, I then have to trap
it.
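One way to do that trapping is a simple heuristic check that runs before records go to the master database. Below is a Python sketch; the function name and the pattern are assumptions, not part of any product. It flags values that contain "Ã" or "Â" followed by a character in the range U+00A0 to U+00BF. That is how UTF-8 accented lowercase letters (ü, é, ñ) and some Latin-1 symbols look when misread as Windows-1252. The check misses garbled uppercase accented letters, because their second byte falls in the 0x80 to 0x9F range. It can also occasionally flag legitimate text. Flagged rows are therefore meant for review, not for automatic fixing.

```python
import re

# "Ã" or "Â" followed by a character in U+00A0..U+00BF: the usual look of
# UTF-8 accented lowercase letters and symbols misread as Windows-1252.
GARBLED_PAIR = re.compile("[\u00c2\u00c3][\u00a0-\u00bf]")


def looks_garbled(value):
    """Heuristic only: True if the value probably needs review."""
    return GARBLED_PAIR.search(value) is not None


rows = ["ZÃ¼RICH", "ZÜRICH", "ZUERICH", "MÃ©XICO"]
flagged = [row for row in rows if looks_garbled(row)]
print(flagged)
```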

> If you are doing multilingual work then you MUST consider upgrading to any
> version of Access that uses a Unicode version of Jet (2000, 2002, or 2003).
> This is the only way that you will be able to get good results in this area
> without doing a LOT of work. Better for it to all work without adding
> tremendous hacks....


Actually, Access 2003 is my goal, as I figured that would help solve
this issue at least to some extent. Due to client restrictions and
inhibitions, we must continue to use the original version for now. I'm
sure you've dealt with clients that say no to progress, yet demand it
in other ways.
Nov 12 '05 #9

"jmev7" <jm***@yahoo.com> wrote...
> > Hmmm.... so, how is the data being entered? What is the default system
> > locale of the machine on which it is being entered, and what is the
> > database's collation? And what is your default system locale now, when you
> > are looking at the data?
>
> Well, the data is entered on websites throughout the world as the
> company collects data from visitors. There is no default system
> locale, as the data is entered from each visitor's system (home or
> office) worldwide. My default system has the encoding set to Western
> European (ISO). I believe the system collecting the data is an Oracle
> system, which is situated in the US. The fact is, I'm weak on these
> issues, so I'm going to have to delve deeper, and you're providing me
> with some good questions to ask.


There IS a default system locale for the machine that hosts the database,
and there is a database collation. It is the direct reason why you are
having problems here.

> <snipped my stuff>
> > This would not be correct -- encoding translations are not done at the
> > keystroke level. If you look at the links I gave, all of those code pages
> > overlap each other such that the same code point means different things,
> > depending on which one you are looking at.


> I think it's the overlap that is giving me the difficulty. If striking
> a "U" always produced a "U", that would be great. If it produces a "Ü"
> from one system and a "UE" from another, and they are both correct,
> that's fine. When it produces a "Ã¼" combination, I then have to trap
> it.


Well, the problem is that you are trying to store multilingual data in a
non-multilingual database. This will essentially corrupt the data.
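What "corrupt" means here can be shown in a few lines. Assume, for illustration, that the non-Unicode database stores text in Windows-1252, the Western European ANSI code page. Any character outside that code page cannot be written and gets replaced. Below is a hypothetical Python sketch. `store_as_cp1252` only simulates such a column; it does not touch Jet or Access.

```python
def store_as_cp1252(text):
    """Simulate writing text into a column limited to Windows-1252."""
    # Characters that cp1252 cannot represent are replaced with "?".
    return text.encode("cp1252", errors="replace").decode("cp1252")


for name in ["MÜNCHEN", "BAÑARES", "ŁÓDŹ", "МОСКВА"]:
    print(name, "->", store_as_cp1252(name))
```

Once a "?" has been stored, no chart can recover the original letter. This is the practical case for moving to a Unicode-capable version of Jet.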

> > If you are doing multilingual work then you MUST consider upgrading to any
> > version of Access that uses a Unicode version of Jet (2000, 2002, or 2003).
> > This is the only way that you will be able to get good results in this area
> > without doing a LOT of work. Better for it to all work without adding
> > tremendous hacks....


> Actually, Access 2003 is my goal, as I figured that would help solve
> this issue at least to some extent. Due to client restrictions and
> inhibitions, we must continue to use the original version for now. I'm
> sure you've dealt with clients that say no to progress, yet demand it
> in other ways.


You may have to really INSIST here, as there is no good way to support that
which cannot be supported. I used to consult for companies in your client's
situation and made a lot of money providing solutions in such cases, but it
is really only possible if you find someone with the expertise to come up
with an extreme solution. In the meantime, it is crucial that they
understand that ANY upgrade from the version they have is designed to work
here (and that Jet 4.0 itself is free!).
--
MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies

This posting is provided "AS IS" with
no warranties, and confers no rights.

Nov 12 '05 #10
