Bytes | Software Development & Data Engineering Community

i18n'ed Character Set in DBMS and tables

. Can you define the character set for particular tables instead of
whole databases?
. Which DBMSs would let you do that?
. How do you store i18n'ed user input in a DBMS, coming over the web
(basically from everywhere), and properly serve it back to users?
. Can you point me to info on this?

I would preferably use Java/JDBC drivers.
Nov 12 '05 #1
Albretch wrote:
> . Can you define the Character Set for particular tables instead of
> databases?

Yes, sometimes in the entire world of computer science, but since you have
cross-posted so widely, you need to choose a single newsgroup and ask again
for a specific answer.

> . Which DBMSs would let you do that?

That depends.

> . How do you store in a DBMS i18n'ed users' from input, coming over
> the web (basically from everywhere) store it and properly serve it
> back to users, . . .?

Unicode. Choose one newsgroup, make one post.
--
Paul Lutus
http://www.arachnoid.com

Nov 12 '05 #2
"Albretch" <lb*****@hotmai l.com> wrote in message
news:f8******** *************** ***@posting.goo gle.com...
> . Can you define the Character Set for particular tables instead of databases?

Depends on the DBMS.

> . Which DBMSs would let you do that?

Various. With such a massive crosspost, there's little point in anybody's
answering with any specific set of DBMSs.

> . How do you store in a DBMS i18n'ed users' from input, coming over
> the web (basically from everywhere) store it and properly serve it
> back to users, . . .?

You don't. I18n is designed to permit you to SELECT ONE SPECIFIC language
and process that correctly. To store and retrieve "ALL" languages, you use
Unicode.

> . Can you point me to info on this?

www.unicode.org

> I would preferably use Java/JDBC drivers.

Many databases support these APIs.
Nov 12 '05 #3
"Mark Yudkin" <my************ ***********@nos pam.org> wrote in message news:<ch******* ***@ngspool-d02.news.aol.co m>...
> . . .
> You don't. I18n is designed to permit you to SELECT ONE SPECIFIC language
> and process that correctly. To store and retrieve "ALL" languages, you use
> Unicode.
> . . .

Well, I see 'issues' right there. I think your approach to it is
wrong 'by design and implementation', and this is what I am trying to
avoid.

Let's say you can set the character set and collation all the way
down to the column. Now, if you 'use unicode to store and retrieve
"ALL" languages' (as you suggest), then, naturally and per the ANSI SQL
spec, you can only set one character set and one related collation
on a column. AFAIK you cannot specify or change the collation on the
fly when you run a 'select' query with an 'order by' clause (and I
could see why not).

Unless, in Unicode, collation is not necessary because there is a 1:1
map between the character set and the collation order for all
languages, which to me sounds really unnatural. (By the way, I see
Unicode as a good technical example of waste, trying to keep
extensively ASCII'ing all natural languages; it is the weakest/silliest
'standard' I know of.)

Say you have Korean names and Swahili ones in a table. Do people 'use
unicode to store and retrieve "ALL" languages', then keep
an extra column specifying the character set, and then 'SELECT
ONE SPECIFIC language and process that correctly', a la:

SELECT SrName, FName FROM NamesTable
WHERE (CHAR_SET_Col = 'Korean_CHAR_SET') ORDER BY SrName, FName;

and/or

SELECT SrName, FName FROM NamesTable
WHERE (CHAR_SET_Col = 'Swahili_CHAR_SET') ORDER BY SrName, FName;

This would be -way- slower than having two tables whose
collation-sensitive columns are set to the correct char_set + collation
pair, keeping an index on them and periodically physically sorting them.
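
For what it's worth, several DBMSs accept a COLLATE clause inside ORDER BY, so the collation can be picked per query instead of being fixed per column. Below is a minimal Java sketch of building such a query for the NamesTable example. The collation names are MySQL-style and are an assumption for illustration (other DBMSs name theirs differently), and the class and method names are hypothetical.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: picks a collation per consumer locale and builds the query text.
class CollatedQuery {
    // Assumed MySQL-style collation names; check what your DBMS actually provides.
    private static final Map<String, String> COLLATIONS = Map.of(
            "de", "utf8mb4_german2_ci",
            "sv", "utf8mb4_swedish_ci");
    private static final String DEFAULT_COLLATION = "utf8mb4_unicode_ci";

    static String collationFor(Locale consumer) {
        return COLLATIONS.getOrDefault(consumer.getLanguage(), DEFAULT_COLLATION);
    }

    static String namesQuery(Locale consumer) {
        // The collation comes from the fixed map above, never from user input,
        // so concatenating it into the SQL text is safe here.
        String c = collationFor(consumer);
        return "SELECT SrName, FName FROM NamesTable"
                + " ORDER BY SrName COLLATE " + c + ", FName COLLATE " + c;
    }

    public static void main(String[] args) {
        System.out.println(namesQuery(Locale.forLanguageTag("sv-SE")));
    }
}
```

The resulting string would be passed to an ordinary JDBC Statement. Whether the DBMS can still use an index for such a sort is a separate question that depends on the product.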

Or?
Nov 12 '05 #4
> Or?

I18n is about Internationalization = I(18 chars)n. Even the name by itself
should make it obvious that it's not about multilingual text (German:
Internationalisierung = I19g?). See the details at
www.w3.org/International/. I18n is about processing, not about storage
representation. Unicode is about multilingual storage (www.unicode.org).
Much of I18n applies to Unicode storage, and to issues related to choosing one
encoding over another (www.w3.org/TR/i18n-html-tech-char/).

If you need to store Korean, you have to use some form of 2 byte
representation. DBCS is the older mechanism, Unicode the standardized
technique; both are supported by DB2 (DBCS is called GRAPHIC).

Issues of multilingual collation are linguistic, not representational.
Knowledge of the language is essential. German sorts ö equivalently to o,
whereas Swedish sorts it after z, and English writers tend to drop the
umlaut completely, leading to nonsensical pronunciation (schon and schön are
two different words with different meanings). Collation order is an
intractable problem: consider the problem of how to sort a table containing
both German and Swedish names - any choice you make will be totally wrong to
the other nationality, and hence you need to consider not the origin of the
data but the origin of the consumer! On the other hand, Swahili is correctly
sorted using the general Unicode sort.
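
Since Java was asked for, here is a minimal sketch of sorting by the consumer's locale with java.text.Collator: the same stored names are ordered differently depending on who asks. The class name, method name and sample names are made up for illustration. I would expect a JDK with Swedish collation rules to put the Ö name after Zorn, but that depends on the locale data shipped with the runtime, so treat it as an assumption.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Sketch: one stored list of names, sorted differently per consumer locale.
class ConsumerSort {
    static List<String> sortFor(Locale consumer, List<String> names) {
        // Collator implements Comparator<Object>, so it can be passed to sort() directly.
        Collator collator = Collator.getInstance(consumer);
        List<String> copy = new ArrayList<>(names);
        copy.sort(collator);
        return copy;
    }

    public static void main(String[] args) {
        // "\u00D6" is O with umlaut, written as an escape to avoid source-encoding issues.
        List<String> names = List.of("Zorn", "\u00D6berg", "Olsen");
        System.out.println("German consumer:  " + sortFor(Locale.GERMAN, names));
        System.out.println("Swedish consumer: " + sortFor(Locale.forLanguageTag("sv-SE"), names));
    }
}
```

The data is stored once, in Unicode, and the sort order is decided when the data is read. The same idea applies whether the sorting happens in the application or in the DBMS.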

You need to read up on both I18n and Unicode and understand what each is
about and how they complement each other. I've given you URLs above.

You also need to do some thinking about your problem, and understand its
ramifications. Getting hold of people with experience in other languages,
including complex scripts (Arabic, Chinese, Japanese, Korean, Thai) and
other alphabets (Hebrew, Russian), as well as differing rules when using
accented Latin characters (most continental European, Vietnamese) will
undoubtedly help you understand the consequences of multilingual databases.
You'll also learn that issues such as sorting (your "way slower" comment)
are in many ways the least of your problems. Fortunately, most systems
provide highly efficient processing of a range of common languages, but not
all languages. DB2 or SQL Server, for example, do not have native Swahili
support (your example); Windows XP introduced Swahili, Windows 2000 didn't
support it (other than through Unicode).
Nov 12 '05 #5
"Mark Yudkin" <my************ ***********@nos pam.org> wrote in message news:<ch******* ***@ngspool-d02.news.aol.co m>...
> Or?
>
> I18n is about processing, not about storage representation

AM: Yeah, sure! But ultimately storage, representation, reencoding . . .
are all part of the functional pipe of 'processing'. These are
all naturally related as you serve the data to clients.

> consider the problem of how to sort a table containing both German and
> Swedish names - any choice you make will be totally wrong to the other
> nationality, and hence you need to consider not the origin of the data
> but the origin of the consumer!

AM: This is what I think is exactly wrong; you do not store
German and Swedish names together in the same column of a table!

Say someone hits your site and you learn from the Accept-Language
header her browser sends that she prefers Hungarian, and you
have a large database with foreign names from a wild array of
nationalities/character sets (like the immigrant database at Ellis
Island). Now, say these guys at Ellis Island, for
'let-me-know-about-your-life' reasons, decide to include the original
names in their languages (an incredible lot of people's names were
changed to the 'gringo' 'Mary' and 'John').

Would you design a monster table with all names and set the
column char set to Unicode, or have different tables for the different
charset+collation pairs plus a primary/foreign key design of the
database, making your Hungarian user's life faster/easier?
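
The browser's language preference normally arrives in the HTTP Accept-Language header. A minimal sketch of turning that header into a Locale with the standard java.util.Locale API follows. The SUPPORTED list, the class name and the sample header value are assumptions for illustration.

```java
import java.util.List;
import java.util.Locale;

// Sketch: choose a supported locale from an Accept-Language header value.
class LanguageNegotiation {
    // Hypothetical list of locales the site can serve.
    static final List<Locale> SUPPORTED = List.of(
            Locale.forLanguageTag("en"),
            Locale.forLanguageTag("hu"),
            Locale.forLanguageTag("de"));

    static Locale pick(String acceptLanguage, Locale fallback) {
        // parse() orders the ranges by their q-weights, highest first.
        // It throws IllegalArgumentException on an empty or malformed value.
        List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(acceptLanguage);
        Locale match = Locale.lookup(ranges, SUPPORTED); // null when nothing matches
        return match != null ? match : fallback;
    }

    public static void main(String[] args) {
        System.out.println(pick("hu-HU,hu;q=0.9,en;q=0.5", Locale.ENGLISH));
    }
}
```

The locale picked here describes the consumer, not the stored data, so it can drive the sort order or the UI language without saying anything about which names are in the table.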

If you read into daddy E.F. Codd's well-structured 'normal
forms' of relations and 'normalization', you should not mix 'different
types of data' in the same column (and the charset+collation makes
these data differ, doesn't it?)

And I do think all 'theories' are 'unpractical' until proven
otherwise.

> . . . Getting hold of people with experience in other languages . . .

AM: I do have 'experience in other languages', three of them,
although they are all western ones :-(
I think this problem can be reduced to a simple mathematical one. You
don't need to know many people to understand this problem.

> You'll also learn that issues such as sorting (your "way slower" comment)
> are in many ways the least of your problems.

AM: . . . 'least of your problems' . . . when you will have to set
indexes on these types of columns, since they are 'free text'
searchable ones?
For me, a developer, "way slower" means anything that would run more
than 30% slower. Being ready/able to have three 'customers' instead of
two makes a difference in business as well as in life. DBMS issues
(and their related IO ones) are the number one performance issue in
large DBMS-based software development.
Nov 12 '05 #6
How many of your 3 languages do you speak, read and write fluently? How many
of these use non-Latin characters? Are you aware of how many languages there
are in the world? Have you considered that some languages have multiple
writing systems, even going so far as to use different alphabets, in
different locales? What about users whose language and locale don't mix?
Here I am, posting in English, and living in the Swiss German locale. My
keyboard is a Swiss German one, my Windows is an English one. Swiss German
don't even use the same alphabetic characters for writing as Germany does
for "German German".

Also, did you actually understand Codd's normalization? Why are you
confusing type and interpretation?

There is absolutely no way I would use a separate table for each (language,
locale) combination. And I'm speaking as somebody who develops software that
supports 4 languages (simultaneously) for a living (the three main national
languages of this country: German, French and Italian, plus English), and
stores additional languages in the database (international financial and
economic data from OECD, BIS, World Bank, etc.).

You want to support all languages, locales and scripts. But you don't appear
to have the faintest idea of the problems involved.
Nov 12 '05 #7
Usually I stop paying attention to people when they start getting
personal. However I think our talk has been constructive for the most
part.

> How many of your 3 languages do you speak, read and write fluently?

AM: I would say the three of them. Spanish is my mother tongue; I
studied in Germany graduating with a Master's in Math/Physics and have
lived in the US for ten years.

> How many of these use non-Latin characters?

AM: Do you mean latin-1/ISO-8859-1? Spanish and German use a few.

> Are you aware of how many languages there are in the world?

AM: Pretty much, if 'a whole lot' would qualify as an answer to you :-)

> Have you considered that some languages have multiple writing systems,
> even going so far as to use different alphabets, in different locales?

AM: Yes, I have.

> What about users whose language and locale don't mix?

AM: What do you f*ck&ng mean? Are you using the terms 'language' and
'locale' as a free speech kind of thing or as defined technical
standards?

The Java API did a fine job at functionally describing both terms

http://java.sun.com/j2se/1.5.0/docs/...il/Locale.html

if you understand Java/OOP; the fact that there is no Locale
constructor without a specified language would tell you something.

The language argument is a valid ISO Language Code. These codes are
the lower-case, two-letter codes as defined by ISO-639. You can find a
full list of these codes at a number of sites, such as:
http://www.loc.gov/standards/iso639-2/englangn.html
The country argument is a valid ISO Country Code. These codes are the
upper-case, two-letter codes as defined by ISO-3166. You can find a
full list of these codes at a number of sites, such as:
http://www.iso.ch/iso/en/prods-servi.../list-en1.html
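
To make the two arguments concrete, here is a small sketch of building a java.util.Locale from a language code and a country code. The class and method names are hypothetical. It uses Locale.Builder (available since Java 7) rather than the constructors from the 1.5 docs linked above, and it only shows that the API treats language and country as separate fields that can be combined freely.

```java
import java.util.Locale;

// Sketch: language and country are independent parts of a java.util.Locale.
class LocaleParts {
    static Locale englishInSwitzerland() {
        // "en" is an ISO 639 language code, "CH" an ISO 3166 country code.
        return new Locale.Builder().setLanguage("en").setRegion("CH").build();
    }

    public static void main(String[] args) {
        Locale l = englishInSwitzerland();
        System.out.println(l.getLanguage() + " / " + l.getCountry() + " / " + l.toLanguageTag());
    }
}
```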
> Here I am, posting in English, and living in the Swiss German locale. My
> keyboard is a Swiss German one, my Windows is an English one.

AM: . . . and your browser settings are?

> Swiss German
> don't even use the same alphabetic characters for writing as Germany does
> for "German German".

AM: the differences are very minimal indeed. I have spoken to Swiss
German people and we have understood each other 'einwandfrei'. I would
even dare to say American and British English differ more, and people
use both without paying much attention to the differences.

> Also, did you actually understand Codd's normalization? Why are you
> confusing type and interpretation?

AM: . . . because it affects the sort order (even down to a
physical level) and how fast the table is sampled with select statements
that include these columns in the 'order by' clause.

Dr. Codd's Rule #1: All information in a relational database is
represented explicitly at the logical level in exactly one way: by
values in tables. And, the data in each field is assumed to be atomic;
that is, the smallest bit of useful information -- a single value.

Think about this phrase in his statement, "the smallest bit of useful
information" . . . and you will understand what I mean.

> There is absolutely no way I would use a separate table for each (language,
> locale) combination. And I'm speaking as somebody who develops software that
> supports 4 languages (simultaneously) for a living (the three main national
> languages of this country: German, French and Italian, plus English), and
> stores additional languages in the database (international financial and
> economic data from OECD, BIS, World Bank, etc.).

AM: Wow! If you paid me to and/or I had the time to do it, I would
technically prove my point to you, but since you are the one gaining
something by understanding it and I think you are pretty much capable
of showing it to yourself (if you 'want to' see it), I will leave it
to you as 'homework'.
Nov 12 '05 #8
On Wed, 8 Sep 2004 01:14:42 +0800, Albretch wrote
(in article <f8************ **************@ posting.google. com>):
"Mark Yudkin" <my************ ***********@nos pam.org> wrote in message
news:<ch******* ***@ngspool-d02.news.aol.co m>...
>> Or?
>>
>> I18n is about processing, not about storage representation
>
> AM: Yeah, sure! But ultimately storage, representation, reencoding . . .
> are all part of the functional pipe of 'processing'. These are
> all naturally related as you serve the data to clients.
>
>> consider the problem of how to sort a table containing both German and
>> Swedish names - any choice you make will be totally wrong to the other
>> nationality, and hence you need to consider not the origin of the data but
>> the origin of the consumer!
>
> AM: This is what I think is exactly wrong; you do not store
> German and Swedish names together in the same column of a table!
> Say someone hits your site and you learn from the Accept-Language
> header her browser sends that she prefers Hungarian, and you
> have a large database with foreign names from a wild array of
> nationalities/character sets (like the immigrant database at Ellis
> Island). Now, say these guys at Ellis Island, for
> 'let-me-know-about-your-life' reasons, decide to include the original
> names in their languages (an incredible lot of people's names were
> changed to the 'gringo' 'Mary' and 'John').
> Would you design a monster table with all names and set the
> column char set to Unicode, or have different tables for the different
> charset+collation pairs plus a primary/foreign key design of the
> database, making your Hungarian user's life faster/easier?
>
> If you read into daddy E.F. Codd's well-structured 'normal
> forms' of relations and 'normalization', you should not mix 'different
> types of data' in the same column (and the charset+collation makes
> these data differ, doesn't it?)
>
> And I do think all 'theories' are 'unpractical' until proven
> otherwise.
>
>> . . . Getting hold of people with experience in other languages . . .
>
> AM: I do have 'experience in other languages', three of them,
> although they are all western ones :-(
> I think this problem can be reduced to a simple mathematical one. You
> don't need to know many people to understand this problem.
>
>> You'll also learn that issues such as sorting (your "way slower" comment)
>> are in many ways the least of your problems.
>
> AM: . . . 'least of your problems' . . . when you will have to set
> indexes on these types of columns, since they are 'free text'
> searchable ones?
> For me, a developer, "way slower" means anything that would run more
> than 30% slower. Being ready/able to have three 'customers' instead of
> two makes a difference in business as well as in life. DBMS issues
> (and their related IO ones) are the number one performance issue in
> large DBMS-based software development.


look, it is very simple.

If your app is ONLY EVER going to use a language that can be encoded
using ascii and single byte data, then that is fine.

if you are going to support multiple languages then you MUST use some sort of
multi byte system, 2-3 bytes per character, FOR ALL DATA.

so yes, you have 1 multi byte enabled table, and you put all the shit in 1
table.

let's consider for a moment how you would handle your system of multiple
tables.

say your application deals with single byte and multi byte data; how, for
example, are you going to:

1. split & concatenate strings? is it 1 byte or is it 2 bytes, or 3?
2. check the length of a string? is it 200 bytes for a single line
of address data, or is it 400, or 600?
3. build your sql strings to select the different tables?
4. handle a user who puts english & chinese characters together on the
same line of data? (in china & asia we mix asian glyphs & roman together.)
which table does the resulting string go in?

are you going to check which country the user is in and make dangerous
assumptions (origin of the consumer?)? say i'm in England, using Chinese windows.
or are you going to scan the string & see if you can stick it in a single
byte table or a multi byte table?
you are going to bury yourself in the logistics of trying to deal with
different encoding lengths.
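
to put numbers on the length question, here is a minimal Java sketch (class and method names are made up) that counts the same mixed English/Chinese string three ways: in characters (code points), in UTF-8 bytes and in UTF-16 bytes. the sample string is an assumption for illustration.

```java
import java.nio.charset.StandardCharsets;

// Sketch: the "length" of a mixed-script string depends on what you count.
class StringLengths {
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    static int utf16Bytes(String s) {
        // The BE variant is used so that no byte-order mark is added to the count.
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // "\u5317\u4EAC" are the two Chinese characters for Beijing, written as escapes.
        String mixed = "\u5317\u4EAC Beijing";
        System.out.println(codePoints(mixed) + " characters, "
                + utf8Bytes(mixed) + " bytes in UTF-8, "
                + utf16Bytes(mixed) + " bytes in UTF-16");
    }
}
```

for this string that is 10 characters, 14 bytes in UTF-8 and 20 bytes in UTF-16, so column sizes and length checks are safer when defined in characters and handled by one Unicode-aware code path.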

using 1 table is NOT slower; tables can be partitioned (well, in oracle they
can),
so you could store the data in 1 table, then partition the table on a
"language column".

the only time i have found using multiple tables a good idea is when i have 2
clearly defined languages, English & Chinese,
when i have to store the same address in both forms (1 for the english
staff, 1 for the asian staff) AND not every English address has a Chinese
equivalent (or vice versa, depending on your point of view).

but i still have to have 2 sets of duplicate edit screens (luckily it is
only 10 data fields), and that is such a pain that i am trying to reduce it
back to 1 set of screens.
you need to get over your fixation with "how big the data is" & "how slow is
my database going to be".

storage is just too cheap these days.
top end databases are so optimized these days that they spend a large part
of their time "asleep".



Nov 12 '05 #9
You may speak 3 western European languages, but all three of these can be
encoded within a single "Western European" (Latin-1) character set, and do not
have seriously conflicting sort orders. Mixing German and Hebrew would be
somewhat messier, although rather common - even within a single document. Of
course, bidirectionality adds yet another complexity that you haven't
considered.

We use the 2 character ISO country code (ISO 3166); that's what is used
generally by international financial reporting. We also use GESMES
(UN/EDIFACT) (www.unece.org, c.f.
http://www.unece.org/trade/untdid/d9.../gesmes_c.htm), but that's less
concerned with languages, and more with data exchange. Locales, as we use
them, are those from I18n (http://www.w3.org/International/). The "problem"
(e.g. in Java's model) is that just because a user lives in some country /
locale, he does not necessarily use the language(s) defined for these, hence
he is forced to lie about his locale in order to get the desired language.
Microsoft, BTW, worked this out, and fixed the problem in Windows 2000. My
IE6 browser is set up for "German (Switzerland) [de-ch]" by default, with
the browser language set to English. There is no "English (Switzerland)
[en-ch]". Except for a few sites which seem to believe that I have to be
presented with the language of "my locale", I have no problems (google uses
it as a default, but lets me override it, saving my configuration).

Fortunately, you don't work with me, so I have no need to explain that mixed
language documents prevent separate tables, even if the underlying design of
vertically partitioning information without maintaining the partitioning key
were not totally wrong. It will be your boss's problem to clean up the chaos
you leave behind.

Since this conversation is a waste of time, this is my last response.

Dr Mark Yudkin

"Albretch" <lb*****@hotmai l.com> wrote in message
news:f8******** *************** ***@posting.goo gle.com...
Usually I stop paying attention to people when they start getting
personal. However I think our talk has been constructive for the most
part.
How many of your 3 languages do you speak, read and write fluently? AM: I would say the three of them. Spanish is my mother tongue; I
studied in Germany graduating with a Master's in Math/Physics and have
lived in the US for ten years.
How many of these use non-Latin characters?

AM: Do you mean latin-1/ISO-8859-1? Spanish and German use a few.
Are you aware of how many languages there are in the world?

AM: Pretty much, if 'a whole lot' would qualify as an answer to you
:-)
Have you considered that some languages have multiple writing systems, even going so far as to use different alphabets, in different locales?
AM: Yes, I have.
What about users whose language and locale don't mix? AM: What do you f*ck&ng mean? Are you using the terms 'language' and
'locale' as a free speech kind of thing or as defined technical
standards?

The Java API did a fine job at functionally describing both terms

http://java.sun.com/j2se/1.5.0/docs/...il/Locale.html

if you understand Java/OOP; the fact that there is no Locale
constructor without a specified language would tell you something.

The language argument is a valid ISO Language Code. These codes are
the lower-case, two-letter codes as defined by ISO-639. You can find a
full list of these codes at a number of sites, such as:
http://www.loc.gov/standards/iso639-2/englangn.html
The country argument is a valid ISO Country Code. These codes are the
upper-case, two-letter codes as defined by ISO-3166. You can find a
full list of these codes at a number of sites, such as:

http://www.iso.ch/iso/en/prods-servi.../list-en1.html
Here I am, posting in English, and living in the Swiss German locale. My
keyboard is a Swiss German one, my Windows is an English one.

AM: . . . and your browser settings are?

Swiss German
don't even use the same alphabetic characters for writing as Germany

does for "German German".

AM: the differences are very minimal indeed. I have spoken to Swiss
German people and we have understood each other 'einwandfrei'. I would
even dare to say American and Brittish English differ more and people
use both without paying much attention to the differences
Also, did you actually understand Codd's normalization? Why are you
confusing type and interpretation?

AM: . . . because it affects the sort order (even down to a
physical level) and how fast the table is sampled with select stats
that include these columns in the 'order by' clause.

Dr. Codd's Rule #1: All information in a relational database is
represented explicitly at the logical level in exactly one way: by
values in tables. And, the data in each field is assumed to be atomic;
that is, the smallest bit of useful information -- a single value.

Think about this phrase in his stat "the smallest bit of useful
information" . . . and you will understand what I mean
There is absolutely no way I would use a separate table for each (language, locale) combination. And I'm speaking as somebody who develops software that supports 4 languages (simultaneously ) for a living (the three main national languages of this country: German, French and Italian, plus English), and stores additional languages in the database (international financial and
economic data from OECD, BIS, World Bank, etc.).


AM: Wow! If you pay me to and/or I had the time to do it, I would
technically prove my point to you, but since it is you the one gaining
something by understanding it and I think you are pretty much capable
of showing it to yourself (if you 'want to' see it), I will leave it
to you as 'homework'

"Mark Yudkin" <my************ ***********@nos pam.org> wrote in message

news:<ch******* ***@ngspool-d02.news.aol.co m>...
How many of your 3 languages do you speak, read and write fluently? How many of these use non-Latin characters? Are you aware of how many languages there are in the world? Have you considered that some languages have multiple
writing systems, even going so far as to use different alphabets, in
different locales? What about users whose language and locale don't mix?
Here I am, posting in English, and living in the Swiss German locale. My
keyboard is a Swiss German one, my Windows is an English one. Swiss German don't even use the same alphabetic characters for writing as Germany does for "German German".

Also, did you actually understand Codd's normalization? Why are you
confusing type and interpretation?

There is absolutely no way I would use a separate table for each (language, locale) combination. And I'm speaking as somebody who develops software that supports 4 languages (simultaneously) for a living (the three main national languages of this country: German, French and Italian, plus English), and stores additional languages in the database (international financial and economic data from OECD, BIS, World Bank, etc.).
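
To make the single-table point concrete, here is a minimal sketch. It assumes a hypothetical `label` table in which the locale is just another key column and all text is stored as Unicode. It uses Python's built-in sqlite3 module rather than JDBC only so that it runs without a separate driver; the table, the column names and the sample words are illustrative and are not taken from the software described above.

```python
import sqlite3

# In-memory database; SQLite stores TEXT values as Unicode.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one table for every language, with the locale
# stored as data instead of being encoded in the table name.
conn.execute(
    """
    CREATE TABLE label (
        item_id    INTEGER NOT NULL,
        locale     TEXT    NOT NULL,
        label_text TEXT    NOT NULL,
        PRIMARY KEY (item_id, locale)
    )
    """
)

# Illustrative rows: the same item labelled for several locales.
# Swiss German writes "ss" where German in Germany writes "ß".
rows = [
    (1, "de-CH", "Strasse"),
    (1, "de-DE", "Straße"),
    (1, "fr-CH", "Rue"),
    (1, "it-CH", "Strada"),
    (1, "en", "Street"),
]
conn.executemany("INSERT INTO label VALUES (?, ?, ?)", rows)


def label_for(item_id, locale):
    """Return the label for one item in one locale, or None if absent."""
    row = conn.execute(
        "SELECT label_text FROM label WHERE item_id = ? AND locale = ?",
        (item_id, locale),
    ).fetchone()
    return row[0] if row else None
```

Adding a further language under this layout is an INSERT, not a schema change, which is the practical difference from keeping one table per (language, locale) combination.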

You want to support all languages, locales and scripts. But you don't appear to have the faintest idea of the problems involved.

Nov 12 '05 #10
