
Use arrays to store multilanguage texts

I am wondering if it's effective to use text arrays to store
multilanguage information.

We used to do it like this:

CREATE TABLE product (
id serial NOT NULL,
price float4,
...
)

CREATE TABLE product_txt (
product_id integer,
language_id integer, -- 1=English, 2=Czech, ...
name text,
description text,
...
PRIMARY KEY (product_id, language_id)
)

Then, in queries for the English version, we used joins:

SELECT product.*, product_txt.*
FROM product JOIN product_txt ON product.id=product_txt.product_id
WHERE product_txt.language_id=1

It works as it is supposed to, but of course there is some database
overhead, and above all it is much more complicated to handle data this way.
Since 7.4 introduced arrays, I was thinking about using them for storing
multilanguage strings. Like this:

CREATE TABLE product (
id serial NOT NULL,
price float4,
name text[],
description text[]
)

Then I'd just do:

SELECT id, price, name[1], description[1] FROM product

Much simpler and IMHO faster (but I'm not a pg-hacker). I never had time
to test it much, but now we are going to build a new database with
multilanguage strings and I am seriously considering using arrays.

The only question that remains unanswered is how this works with indexes.
Most queries on such a table are ordered by name, for example. Therefore
I'd do this:

SELECT id, price, name[1], description[1] FROM product ORDER BY name[1]

Is it possible to build an index that will be used in such a query? I
had no luck with CREATE INDEX product_name1 ON product (r[1]), but maybe
the syntax is just somewhat different.

Are there any more drawbacks or limitations that I may not be aware
of, which would discourage you from handling multilanguage support this
way? Should we adopt this approach or stick to the old ways?

Thanks for your attention.

--
Michal Taborsky
http://www.taborsky.cz

Nov 23 '05 #1
5 Replies



Michal Táborský <mi****@taborsky.cz> writes:
> I am wondering if it's effective to use text arrays to store
> multilanguage information.
>
> We used to do it like this:

I went through almost exactly the same design evolution. My initial design had
dozens of "text_id" fields littered throughout the database and had a
"localized_text" table with the expansion of every text_id for every language.

It was a total headache.

> name text[],
> description text[]

That's exactly what I have now.
> Then I'd just do:
>
> SELECT id, price, name[1], description[1] FROM product

Well, I defined functions lang_?? for each language code, so that the index of
each language wasn't hard-coded into the application.
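
For illustration, such a wrapper might look like this (the body is a guess,
since the post doesn't show the actual definitions, and it assumes English
sits at array index 1):

```sql
-- Hypothetical per-language accessor; IMMUTABLE so it can also back an
-- expression index. One such function would exist per language code.
create or replace function lang_en(text[]) returns text as '
    select $1[1]
' language sql strict immutable;

-- Usage: SELECT id, price, lang_en(name) FROM product;
```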
> The only question that remains unanswered is how this works with indexes.
> Most queries on such a table are ordered by name, for example. Therefore
> I'd do this:

Well, in my case I was never using an index on the localized text columns
anyway.

> SELECT id, price, name[1], description[1] FROM product ORDER BY name[1]
>
> Is it possible to build an index that will be used in such a query? I
> had no luck with CREATE INDEX product_name1 ON product (r[1]), but maybe
> the syntax is just somewhat different.

It is in 7.4. You might need another set of parentheses to make it parse, like
"ON product ((r[1]))". In 7.3 you would need to make a function for each
language and index on lang_en(r). As I said, I think that's not a bad idea
anyway.
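
A quick sketch of both variants, for reference (index names are arbitrary,
and this is untested against actual 7.3/7.4 servers):

```sql
-- 7.4: an expression index on an array element needs an extra set of parentheses
CREATE INDEX product_name1 ON product ((name[1]));

-- 7.3: wrap the subscript in an IMMUTABLE function and index that instead
CREATE INDEX product_name_en ON product (lang_en(name));
```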

> Are there any more drawbacks or limitations that I may not be aware
> of, which would discourage you from handling multilanguage support this
> way? Should we adopt this approach or stick to the old ways?


My main worry is what happens if I have to add a new language: I'll have to
update every record of every table to add a new element to the arrays. There's
no centralized place that "knows" about all the translated texts; I have to
know all the tables and columns that will have to be fixed.

Similarly, it's impossible to query the database for "tell me all the strings
that haven't been translated yet". I would have to do an individual query for
every column of this type.
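
To illustrate the point: with arrays, finding untranslated rows takes one
hand-written query per translated column, along these lines (a sketch; it
assumes the language being checked sits at index 2):

```sql
-- Rows whose "name" lacks an element for language 2.
-- The same query must be repeated for every translated column of every table.
SELECT id
FROM product
WHERE name[2] IS NULL OR array_upper(name, 1) < 2;
```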

Note also that in my case I'm entering the data for both languages (we only
have two so far) at the same time in the same form. Most forms in the
application just have two columns and the user has to fill out the same
information in each language.

If you wanted a more traditional l10n interface where the user has to enter
translations for every string in the application then again the inability to
pull out all the strings might be a problem.

But in my model, the convenience of being able to update both languages in the
same update, and of not having to worry about the order in which inserts are
done or keeping track of ids to maintain relationships, is huge. I can't
imagine having continued on the other track.

The benefit of having the translation available immediately without having to
do an extra join depends a lot on the application. In my case I had dozens of
fields to look up and often in the midst of fairly complex queries. I was
worried about performance and also about it raising the likelihood of the
optimizer getting confused. Also it made utter messes of the queries.

--
greg
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 23 '05 #2

Michal Táborský wrote:
> I am wondering if it's effective to use text arrays to store
> multilanguage information.
> [...snip...]
> SELECT id, price, name[1], description[1] FROM product ORDER BY name[1]
>
> Is it possible to build an index that will be used in such a query? I
> had no luck with CREATE INDEX product_name1 ON product (r[1]), but maybe
> the syntax is just somewhat different.

Maybe something like this:

CREATE TABLE product (
id serial NOT NULL,
price float4,
name text[],
description text[]
);
insert into product (price, name, description)
values (10,
array['apples-english','apples-spanish','apples-german'],
array['big bunch of apples-english','...-spanish','...-german']
);
insert into product (price, name, description)
values (42,
array['pears-english'],
array['big bunch of pears-english']
);
create or replace function get_lang(text) returns int as '
select case
when $1 = ''english'' then 1
when $1 = ''spanish'' then 2
when $1 = ''german'' then 3
else 1
end
' language sql strict immutable;
create or replace function get_lang_str(text[], int)
returns text as '
select coalesce($1[$2], $1[1])
' language sql strict immutable;
create index product_idx1_english on
product(get_lang_str(name,get_lang('english')));
create index product_idx1_spanish on
product(get_lang_str(name,get_lang('spanish')));
create index product_idx1_german on
product(get_lang_str(name,get_lang('german')));

set enable_seqscan to off;
explain analyze
select
id,
price,
get_lang_str(name,get_lang('spanish')) as name,
get_lang_str(description,get_lang('spanish')) as description
from
product
order by
get_lang_str(name,get_lang('spanish'));
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Index Scan using product_idx1_spanish on product (cost=0.00..6.03
rows=2 width=72) (actual time=0.147..0.193 rows=2 loops=1)
Total runtime: 0.246 ms
(2 rows)

select
id,
price,
get_lang_str(name,get_lang('foo')) as name,
get_lang_str(description,get_lang('foo')) as description
from
product
order by
get_lang_str(name,get_lang('foo'));
id | price | name | description
----+-------+----------------+-----------------------------
1 | 10 | apples-english | big bunch of apples-english
2 | 42 | pears-english | big bunch of pears-english
(2 rows)

select
id,
price,
get_lang_str(name,get_lang('spanish')) as name,
get_lang_str(description,get_lang('spanish')) as description
from
product
order by
get_lang_str(name,get_lang('spanish'));
id | price | name | description
----+-------+----------------+----------------------------
1 | 10 | apples-spanish | ...-spanish
2 | 42 | pears-english | big bunch of pears-english
(2 rows)


> Are there any more drawbacks or limitations that I may not be aware
> of, which would discourage you from handling multilanguage support this
> way? Should we adopt this approach or stick to the old ways?


Notice I was trying to be careful about cases where a bad language
string is used, or the array is missing languages other than English.
I.e., I picked English as the default language and would need to ensure
every array had at least that string.

I'm not sure how maintainable it will be. You'd have to play with it for
a while and decide for yourself.

It would be interesting to see a speed comparison in a real-life
application -- so if you give it a try, please let us know how it turns out.

HTH,

Joe

Nov 23 '05 #3

> > I am wondering if it's effective to use text arrays to store
> > multilanguage information.

We solved it like this:

http://savannah.gnu.org/cgi-bin/view...N.sql?rev=1.15

- i18n_curr_lang holds the (ISO) language string per user
- i18n_keys holds the strings to be translated
- i18n_translations holds the translations
- function i18n() inserts a text value into i18n_keys, thus
marking it for translation; we use this during insertion of
data in the "primary language" (think gettext, think English)
- function _() returns a translated text value (or its
primary-language equivalent); we use that in queries and view
definitions
- v_missing_translations is a view of, well, ...
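
The linked file is the authoritative source; as a very rough sketch of the
shape such a schema could take (the table layout and column names here are
guesses, not the actual GNUmed definitions):

```sql
-- Guessed approximation of the objects described above
create table i18n_curr_lang (
    db_user name primary key default CURRENT_USER,
    lang text not null                -- ISO language code, e.g. 'en', 'de'
);

create table i18n_keys (
    orig text primary key             -- string marked for translation
);

create table i18n_translations (
    lang text not null,
    orig text not null references i18n_keys(orig),
    trans text not null,
    primary key (lang, orig)
);
```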

Then here:

http://savannah.gnu.org/cgi-bin/view...ons.py?rev=1.3

we've got a Python tool that will connect to your database and
create schema files ready for re-insertion via psql after
adding the missing translations. Kind of like a gettext po
file.

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 23 '05 #4


Karsten Hilbert <Ka*************@gmx.net> writes:
> > I am wondering if it's effective to use text arrays to store
> > multilanguage information.
>
> We solved it like this:
>
> http://savannah.gnu.org/cgi-bin/view...N.sql?rev=1.15


This doesn't work for me; I get:

/cgi-bin/viewcvs/gnumed/gnumed/gnumed/serve: unknown location
> - i18n_curr_lang holds the (ISO) language string per user

I'm interested in this part. Is there a facility for having variables on the
postgres side that you can set once at the beginning of a session and then
refer to in functions?

I instead made my application insert lang_en, lang_fr, etc. into the queries as
appropriate. This made the application code a little uglier than necessary,
but it's not too bad: I put the function name in a variable and call
$xlate(column) in a double-quoted string.

I think having a global postgres variable would be a nice shortcut, though I
fear it would make it harder to do things like index lang_en() and lang_fr().

Do you cheat and make your _() IMMUTABLE? It's only immutable if you guarantee
that i18n_curr_lang can never change in the middle of a query. But even then
you're lying to the server, and indexing _() would do strange things, for
example. In fact I think it would produce corrupted indexes if it worked at
all.
> - v_missing_translations is a view of, well, ...


This is the one thing that the array solution can't do. But in my case it's
dynamic text and both languages are being provided at the same time by the
same people. I'm not going to have a translator going through batches of them
like gettext.

Actually I also have a separate solution for the static application text that
is going to be translated in the gettext style. In that case there is a
translations table much like gettext. However all the logic for lookups is
handled on the application side in a Smarty prefilter.

--
greg

Nov 23 '05 #5

> http://savannah.gnu.org/cgi-bin/view...N.sql?rev=1.15

Works for me. You'll have to make sure you get the entire
string for the URL. It may have been wrapped by your mail
handling agent.
> > - i18n_curr_lang holds the (ISO) language string per user
>
> I'm interested in this part. Is there a facility for having variables on the
> postgres side that you can set once at the beginning of a session and then
> refer to in functions?

We set them the first time a user connects to a particular
database. Unless they are changed later (possible at any time, with
immediate, transparent switching of languages in queries and
views), the language will stay the same across sessions,
clients, and connections.
> I instead made my application insert lang_en, lang_fr, etc. into the queries as
> appropriate. This made the application code a little uglier than necessary,
> but it's not too bad: I put the function name in a variable and call
> $xlate(column) in a double-quoted string.
We then use select _(fruits.name) style queries. The _()
function uses the language in i18n_curr_lang (for
CURRENT_USER).
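
A plausible sketch of such a _() function, falling back to the original
string when no translation exists (the real definition lives in the linked
gmI18N.sql; the table and column names here are assumptions):

```sql
create or replace function _(text) returns text as '
    select coalesce(
        (select t.trans
           from i18n_translations t, i18n_curr_lang c
          where c.db_user = CURRENT_USER
            and t.lang = c.lang
            and t.orig = $1),
        $1)
' language sql;
```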
> I think having a global postgres variable would be a nice shortcut, though I
> fear it would make it harder to do things like index lang_en() and lang_fr().

I'd think one could index i18n_translations just fine:

create unique index i18n_trans_xx on i18n_translations(trans) where lang='xx';
> Do you cheat and make your _() IMMUTABLE? It's only immutable if you guarantee
> that i18n_curr_lang can never change in the middle of a query.

Given that changing it takes a writing query, which should run in a
transaction, it shouldn't change within a read query. However,
I'm not too sure about views; we use translated columns
in views like this:

create view a_view as
select
bla as bla,
_(bla) as l10n_bla
....
> But even then
> you're lying to the server and indexing _() would do strange things, for
> example. In fact I think it would produce corrupted indexes if it worked at
> all.

I haven't experienced any of that but it may well happen -
dunno.
> > - v_missing_translations is a view of, well, ...
>
> This is the one thing that the array solution can't do. But in my case it's
> dynamic text and both languages are being provided at the same time by the
> same people. I'm not going to have a translator going through batches of them
> like gettext.

Well, you wouldn't *have* to use that view; it's just for
convenience. You could just as well insert the dynamic text
into i18n_translations (lang, orig, trans) whenever dynamic
text is inserted. We mainly use that (+ i18n()) for gettext-style
batch translation of static application text, except
that all the lookup is handled by _() on the server, right
inside the query :-)

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346


Nov 23 '05 #6
