
Use arrays to store multilanguage texts

I am wondering if it's effective to use text arrays to store
multilanguage information.

We used to do it like this:

CREATE TABLE product (
id serial NOT NULL,
price float4,
...
)

CREATE TABLE product_txt (
product_id integer,
language_id integer, -- 1=English, 2=Czech, ...
name text,
description text,
...
PRIMARY KEY (product_id, language_id)
)

Then, in queries for the English version, we used joins:

SELECT product.*, product_txt.*
FROM product JOIN product_txt ON product.id=product_txt.product_id
WHERE product_txt.language_id=1

It works as it is supposed to, but of course there is some database
overhead, and above all it is much more complicated to handle data this way.
Since 7.4 introduced arrays, I was thinking about using them for storing
multilanguage strings. Like this:

CREATE TABLE product (
id serial NOT NULL,
price float4,
name text[],
description text[]
)

Then I'd just do:

SELECT id, price, name[1], description[1] FROM product

Much simpler and IMHO faster (but I'm not a pg-hacker). I never had time
to test it much, but now we are going to build a new database with
multilanguage strings and I am seriously considering using arrays.

The only question that remains unanswered is how this works with indexes.
Most queries on such a table are ordered by name, for example. Therefore
I'd do this:

SELECT id, price, name[1], description[1] FROM product ORDER BY name[1]

Is it possible to build an index that will be used in such a query? I
had no luck with CREATE INDEX product_name1 ON product (r[1]), but maybe
the syntax is just somewhat different.

Are there any more drawbacks or limitations that I may not be aware
of, which would discourage you from handling multilanguage support this
way? Should we adopt this approach or stick to the old ways?

Thanks for your attention.

--
Michal Taborsky
http://www.taborsky.cz

Nov 23 '05 #1
5 Replies



Michal Táborský <mi****@taborsky.cz> writes:
> I am wondering if it's effective to use text arrays to store
> multilanguage information.
>
> We used to do it like this:

I went through almost exactly the same design evolution. My initial design had
dozens of "text_id" fields littered throughout the database and had a
"localized_text" table with the expansion of every text_id for every language.

It was a total headache.

> name text[],
> description text[]

That's exactly what I have now.
> Then I'd just do:
>
> SELECT id, price, name[1], description[1] FROM product

Well, I defined functions lang_?? for each language code, so that the index of
each language wasn't hard-coded into the application.
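
For illustration, such a wrapper might look like this (the body is a guess,
since the post doesn't show the actual definitions, and it assumes English
sits at array index 1):

```sql
-- Hypothetical per-language accessor; IMMUTABLE so it can also back an
-- expression index. One such function would exist per language code.
create or replace function lang_en(text[]) returns text as '
    select $1[1]
' language sql strict immutable;

-- Usage: SELECT id, price, lang_en(name) FROM product;
```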
> The only question that remains unanswered is how this works with indexes.
> Most queries on such a table are ordered by name, for example. Therefore
> I'd do this:

Well, in my case I was never using an index on the localized text columns
anyway.

> SELECT id, price, name[1], description[1] FROM product ORDER BY name[1]
>
> Is it possible to build an index that will be used in such a query? I
> had no luck with CREATE INDEX product_name1 ON product (r[1]), but maybe
> the syntax is just somewhat different.

It is in 7.4. You might need another set of parentheses to make it parse, like
"ON product ((r[1]))". In 7.3 you would need to make a function for each
language and index on lang_en(r). As I said, I think that's not a bad idea
anyway.
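
A quick sketch of both variants, for reference (index names are arbitrary,
and this is untested against actual 7.3/7.4 servers):

```sql
-- 7.4: an expression index on an array element needs an extra set of parentheses
CREATE INDEX product_name1 ON product ((name[1]));

-- 7.3: wrap the subscript in an IMMUTABLE function and index that instead
CREATE INDEX product_name_en ON product (lang_en(name));
```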

> Are there any more drawbacks or limitations that I may not be aware
> of, which would discourage you from handling multilanguage support this
> way? Should we adopt this approach or stick to the old ways?


My main worry is what happens if I have to add a new language: I'll have to
update every record of every table to add a new element to the arrays. There's
no centralized place that "knows" about all the translated texts; I have to
know all the tables and columns that will have to be fixed.

Similarly, it's impossible to query the database for "tell me all the strings
that haven't been translated yet". I would have to do an individual query for
every column of this type.
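
To illustrate the point: with arrays, finding untranslated rows takes one
hand-written query per translated column, along these lines (a sketch; it
assumes the language being checked sits at index 2):

```sql
-- Rows whose "name" lacks an element for language 2.
-- The same query must be repeated for every translated column of every table.
SELECT id
FROM product
WHERE name[2] IS NULL OR array_upper(name, 1) < 2;
```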

Note also that in my case I'm entering the data for both languages (we only
have two so far) at the same time in the same form. Most forms in the
application just have two columns and the user has to fill out the same
information in each language.

If you wanted a more traditional l10n interface where the user has to enter
translations for every string in the application then again the inability to
pull out all the strings might be a problem.

But in my model, the convenience of being able to update both languages in the
same update, and of not having to worry about the order in which inserts are
done or keeping track of ids to maintain relationships, is huge. I can't
imagine having continued on the other track.

The benefit of having the translation available immediately without having to
do an extra join depends a lot on the application. In my case I had dozens of
fields to look up and often in the midst of fairly complex queries. I was
worried about performance and also about it raising the likelihood of the
optimizer getting confused. Also it made utter messes of the queries.

--
greg
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 23 '05 #2

Michal Táborský wrote:
> I am wondering if it's effective to use text arrays to store
> multilanguage information.
> [...snip...]
> SELECT id, price, name[1], description[1] FROM product ORDER BY name[1]
>
> Is it possible to build an index that will be used in such a query? I
> had no luck with CREATE INDEX product_name1 ON product (r[1]), but maybe
> the syntax is just somewhat different.

Maybe something like this:

CREATE TABLE product (
id serial NOT NULL,
price float4,
name text[],
description text[]
);
insert into product (price, name, description)
values (10,
array['apples-english','apples-spanish','apples-german'],
array['big bunch of apples-english','...-spanish','...-german']
);
insert into product (price, name, description)
values (42,
array['pears-english'],
array['big bunch of pears-english']
);
create or replace function get_lang(text) returns int as '
select case
when $1 = ''english'' then 1
when $1 = ''spanish'' then 2
when $1 = ''german'' then 3
else 1
end
' language sql strict immutable;
create or replace function get_lang_str(text[], int)
returns text as '
select coalesce($1[$2], $1[1])
' language sql strict immutable;
create index product_idx1_english on
product(get_lang_str(name,get_lang('english')));
create index product_idx1_spanish on
product(get_lang_str(name,get_lang('spanish')));
create index product_idx1_german on
product(get_lang_str(name,get_lang('german')));

set enable_seqscan to off;
explain analyze
select
id,
price,
get_lang_str(name,get_lang('spanish')) as name,
get_lang_str(description,get_lang('spanish')) as description
from
product
order by
get_lang_str(name,get_lang('spanish'));
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Index Scan using product_idx1_spanish on product (cost=0.00..6.03
rows=2 width=72) (actual time=0.147..0.193 rows=2 loops=1)
Total runtime: 0.246 ms
(2 rows)

select
id,
price,
get_lang_str(name,get_lang('foo')) as name,
get_lang_str(description,get_lang('foo')) as description
from
product
order by
get_lang_str(name,get_lang('foo'));
id | price | name | description
----+-------+----------------+-----------------------------
1 | 10 | apples-english | big bunch of apples-english
2 | 42 | pears-english | big bunch of pears-english
(2 rows)

select
id,
price,
get_lang_str(name,get_lang('spanish')) as name,
get_lang_str(description,get_lang('spanish')) as description
from
product
order by
get_lang_str(name,get_lang('spanish'));
id | price | name | description
----+-------+----------------+----------------------------
1 | 10 | apples-spanish | ...-spanish
2 | 42 | pears-english | big bunch of pears-english
(2 rows)


> Are there any more drawbacks or limitations that I may not be aware
> of, which would discourage you from handling multilanguage support this
> way? Should we adopt this approach or stick to the old ways?


Notice I was trying to be careful about cases where a bad language
string is used, or the array is missing languages other than English.
I.e., I picked English as the default language and would need to ensure
every array had at least that string.

I'm not sure how maintainable it will be. You'd have to play with it for
a while and decide for yourself.

It would be interesting to see a speed comparison in a real-life
application -- so if you give it a try, please let us know how it turns out.

HTH,

Joe

Nov 23 '05 #3

> > I am wondering if it's effective to use text arrays to store
> > multilanguage information.

We solved it like this:

http://savannah.gnu.org/cgi-bin/view...N.sql?rev=1.15

- i18n_curr_lang holds the (ISO) language string per user
- i18n_keys holds the strings to be translated
- i18n_translations holds the translations
- function i18n() inserts a text value into i18n_keys, thus
marking it for translation; we use this during insertion of
data in the "primary language" (think gettext, think English)
- function _() returns a translated text value (or its
primary-language equivalent); we use that in queries and view
definitions
- v_missing_translations is a view of, well, ...
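
The linked file is the authoritative source; as a very rough sketch of the
shape such a schema could take (the table layout and column names here are
guesses, not the actual GNUmed definitions):

```sql
-- Guessed approximation of the objects described above
create table i18n_curr_lang (
    db_user name primary key default CURRENT_USER,
    lang text not null                -- ISO language code, e.g. 'en', 'de'
);

create table i18n_keys (
    orig text primary key             -- string marked for translation
);

create table i18n_translations (
    lang text not null,
    orig text not null references i18n_keys(orig),
    trans text not null,
    primary key (lang, orig)
);
```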

Then here:

http://savannah.gnu.org/cgi-bin/view...ons.py?rev=1.3

we've got a Python tool that will connect to your database and
create schema files ready for re-insertion via psql after
adding the missing translations. Kind of like a gettext po
file.

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 23 '05 #4


Karsten Hilbert <Ka*************@gmx.net> writes:
> > I am wondering if it's effective to use text arrays to store
> > multilanguage information.
>
> We solved it like this:
>
> http://savannah.gnu.org/cgi-bin/view...N.sql?rev=1.15


This doesn't work for me; I get:

/cgi-bin/viewcvs/gnumed/gnumed/gnumed/serve: unknown location
> - i18n_curr_lang holds the (ISO) language string per user

I'm interested in this part. Is there a facility for having variables on the
postgres side that you can set once at the beginning of a session and then
refer to in functions?

I instead made my application insert lang_en, lang_fr, etc. into the queries as
appropriate. This made the application code a little uglier than necessary,
but it's not too bad: I put the function name in a variable and call
$xlate(column) in a double-quoted string.

I think having a global postgres variable would be a nice shortcut, though I
fear it would make it harder to do things like index lang_en() and lang_fr().

Do you cheat and make your _() IMMUTABLE? It's only immutable if you guarantee
that i18n_curr_lang can never change in the middle of a query. But even then
you're lying to the server, and indexing _() would do strange things, for
example. In fact I think it would produce corrupted indexes if it worked at
all.
> - v_missing_translations is a view of, well, ...


This is the one thing that the array solution can't do. But in my case it's
dynamic text and both languages are being provided at the same time by the
same people. I'm not going to have a translator going through batches of them
like gettext.

Actually I also have a separate solution for the static application text that
is going to be translated in the gettext style. In that case there is a
translations table much like gettext. However all the logic for lookups is
handled on the application side in a Smarty prefilter.

--
greg

Nov 23 '05 #5

> http://savannah.gnu.org/cgi-bin/view...N.sql?rev=1.15

Works for me. You'll have to make sure you get the entire
string for the URL. It may have been wrapped by your mail
handling agent.
> > - i18n_curr_lang holds the (ISO) language string per user
>
> I'm interested in this part. Is there a facility for having variables on the
> postgres side that you can set once at the beginning of a session and then
> refer to in functions?

We set them the first time a user connects to a particular
database. Unless they are changed later (possible at any time, with
immediate, transparent switching of languages in queries and
views), the language will stay the same across sessions,
clients, and connections.
> I instead made my application insert lang_en, lang_fr, etc. into the queries as
> appropriate. This made the application code a little uglier than necessary,
> but it's not too bad: I put the function name in a variable and call
> $xlate(column) in a double-quoted string.
We then use select _(fruits.name) style queries. The _()
function uses the language in i18n_curr_lang (for
CURRENT_USER).
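
A plausible sketch of such a _() function, falling back to the original
string when no translation exists (the real definition lives in the linked
gmI18N.sql; the table and column names here are assumptions):

```sql
create or replace function _(text) returns text as '
    select coalesce(
        (select t.trans
           from i18n_translations t, i18n_curr_lang c
          where c.db_user = CURRENT_USER
            and t.lang = c.lang
            and t.orig = $1),
        $1)
' language sql;
```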
> I think having a global postgres variable would be a nice shortcut, though I
> fear it would make it harder to do things like index lang_en() and lang_fr().

I'd think one could index i18n_translations just fine:

create unique index i18n_trans_xx on i18n_translations(trans) where lang='xx';
> Do you cheat and make your _() IMMUTABLE? It's only immutable if you guarantee
> that i18n_curr_lang can never change in the middle of a query.

Given that changing it takes a writing query, which should run in a
transaction, it shouldn't change within a read query. However,
I'm not too sure about views; we use translated columns
in views like this:

create view a_view as
select
bla as bla,
_(bla) as l10n_bla
....
> But even then
> you're lying to the server and indexing _() would do strange things, for
> example. In fact I think it would produce corrupted indexes if it worked at
> all.

I haven't experienced any of that but it may well happen -
dunno.
> > - v_missing_translations is a view of, well, ...
>
> This is the one thing that the array solution can't do. But in my case it's
> dynamic text and both languages are being provided at the same time by the
> same people. I'm not going to have a translator going through batches of them
> like gettext.

Well, you wouldn't *have* to use that view; it's just for
convenience. You could just as well insert the dynamic text
into i18n_translations (lang, orig, trans) whenever dynamic
text is inserted. We mainly use that (+ i18n()) for gettext-style
batch translation of static application text, except
that all the lookup is handled by _() on the server, right
inside the query :-)

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346


Nov 23 '05 #6
