RPM init-script: Why the locale setting?

Hello,

In the init-script contained in the RPMs downloadable from the PostgreSQL
site (I checked the one for Fedora), an explicit locale is set before
running initdb. - And the explicit locale is not "C".

This means that a PostgreSQL installation will not use indexes for LIKE
queries (I just ran into this). See
http://www.postgresql.org/docs/faqs/FAQ.html#4.8

I suggest that the init-script be rewritten so that LANG and LC_ALL are
unset before initdb is run (which happens the first time PostgreSQL is
started after the RPM-based installation).

--
Greetings from Troels Arvin, Copenhagen, Denmark

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 23 '05 #1

On 04/04/2004 21:48 Troels Arvin wrote:
> Hello,
>
> In the init-script contained in the RPMs downloadable from the PostgreSQL
> site (I checked the one for Fedora), an explicit locale is set before
> running initdb. - And the explicit locale is not "C".
>
> This means that a PostgreSQL installation will not use indexes for LIKE
> queries (I just ran into this). See
> http://www.postgresql.org/docs/faqs/FAQ.html#4.8

No. It says that [normal] indexes won't be used for:

select foo from bar where col like '%abc';
or
select foo from bar where col like '%abc%';

or when ILIKE is used. And even then you can use a functional index of the form

CREATE INDEX tabindex ON tab (lower(col));

> I suggest that the init-script be rewritten so that LANG and LC_ALL are
> unset before initdb is run (which happens the first time PostgreSQL is
> started after the RPM-based installation).


I'll admit that I don't know what effect this would have but I'm
interested to find out.

regards

--
Paul Thomas
+------------------------------+---------------------------------------------+
| Thomas Micro Systems Limited | Software Solutions for
Business |
| Computer Consultants |
http://www.thomas-micro-systems-ltd.co.uk |
+------------------------------+---------------------------------------------+


Nov 23 '05 #2
On Sun, 4 Apr 2004, Troels Arvin wrote:
> Hello,
>
> In the init-script contained in the RPMs downloadable from the PostgreSQL
> site (I checked the one for Fedora), an explicit locale is set before
> running initdb. - And the explicit locale is not "C".
>
> This means that a PostgreSQL installation will not use indexes for LIKE
> queries (I just ran into this). See
> http://www.postgresql.org/docs/faqs/FAQ.html#4.8

Technically you should be able to use an index in the appropriate
*_pattern_ops opclass, but yes, normal indexes aren't used.

> I suggest that the init-script be rewritten so that LANG and LC_ALL are
> unset before initdb is run (which happens the first time PostgreSQL is
> started after the RPM-based installation).


Wouldn't this get in the way of having the server "do the right thing"
when in a locale that doesn't collate by "C" rules?


Nov 23 '05 #3
Troels Arvin <tr****@arvin.dk> writes:
> In the init-script contained in the RPMs downloadable from the PostgreSQL
> site (I checked the one for Fedora), an explicit locale is set before
> running initdb. - And the explicit locale is not "C".


Only if you don't have a sysconfig file:

# Just in case no locale was set, use en_US
[ ! -f /etc/sysconfig/i18n ] && echo "LANG=en_US" > $PGDATA/../initdb.i18n

I agree though that it seems like a bad choice to default to en_US
rather than C. Lamar, any reason why it's like that?

regards, tom lane
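
For illustration, the fallback quoted above could record C instead of en_US. This is a minimal sketch, not the RPM's actual script: the function name `write_locale_fallback` and its two path parameters are hypothetical, introduced only so the logic can be tried outside a real installation.

```shell
#!/bin/sh
# Sketch only: write_locale_fallback and its parameters are hypothetical.
# $1: system locale file (in the RPM this is /etc/sysconfig/i18n)
# $2: file the init script sources before initdb (initdb.i18n in the RPM)
write_locale_fallback() {
    i18n_file="$1"
    fallback_file="$2"
    # Only when the system defines no locale, record the POSIX default (C)
    # rather than en_US.
    if [ ! -f "$i18n_file" ]; then
        echo "LANG=C" > "$fallback_file"
    fi
}
```

Called as `write_locale_fallback /etc/sysconfig/i18n "$PGDATA/../initdb.i18n"`, this would mirror the original line with only the default changed.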


Nov 23 '05 #4
On Sunday 04 April 2004 10:50 pm, Tom Lane wrote:
> Troels Arvin <tr****@arvin.dk> writes:
> > In the init-script contained in the RPMs downloadable from the PostgreSQL
> > site (I checked the one for Fedora), an explicit locale is set before
> > running initdb. - And the explicit locale is not "C".
>
> Only if you don't have a sysconfig file:
>
> # Just in case no locale was set, use en_US
> [ ! -f /etc/sysconfig/i18n ] && echo "LANG=en_US" > $PGDATA/../initdb.i18n
>
> I agree though that it seems like a bad choice to default to en_US
> rather than C. Lamar, any reason why it's like that?

Yes.

A bit of history before I enclose an e-mail from Trond Eivind Glomsrød (former
Red Hat internal PostgreSQL RPM maintainer) on the subject. I am only
enclosing a single e-mail of an exchange that occurred over a period of a
couple of weeks; I have pretty much the whole exchange archived if you want to
read more, although I cannot reveal the whole exchange due to some NDA stuff
in it. Although it might be OK at this point, since that was, after all, 3
years ago.

Back in PostgreSQL 7.1 days, locale settings and the issue of a database being
initdb'ed in one locale and the postmaster starting in another locale reared
its head. I 'solved' the issue by hardcoding LC_ALL=C in the initscript.
This had the side-effect of making the regression tests pass. Trond wasn't
happy with my choice of C locale, and here is why:

Re: Thought you might find this very interesting.
From: te*@redhat.com (Trond Eivind Glomsrød)
To: Lamar Owen <la********@wgcr.org>

Lamar Owen <la********@wgcr.org> writes:
> On Friday 25 May 2001 15:04, you wrote:
> > Lamar Owen <la********@wgcr.org> writes:
> > > > I also intend to kill the output from database initialization.
> > > I thought you had, at least in the RedHat 7.1 7.0.3 set.
> >
> > Yup, but it has started showing up again in PostgreSQL 7.1.x
>
> I need to sync that in with this set.
>
> > I've fixed a couple of issues with the initscript, I'll send it to
> > you when it's finished.... even after sourcing a file with locale
> > values, the postmaster process doesn't seem to respect it. I'll need
> > to make this work before I build (I've confirmed that the current way
> > of handling this, using "C", is not acceptable. The locale needs to be
> > different, and if that causes problems for pgsql, it's a bug in pgsql
> > which needs fixing - handling other aspects, like ordering, in a bad
> > way isn't an acceptable workaround).
> > "C" equals broken for non-English locales, and isn't an acceptable choice.
>
> That is one argument I'll not be involved in, as I'm so used to the ASCII
> sequence that it is second-nature, thus disqualifying me from commenting on
> any collation issues.


1) It's not a valid choice for English - if you're looking in a
   lexicon, you'll find Aspen, bridge, Cambridge, not Aspen,
   Cambridge, bridge.

2) It's much worse in other locales... it gets the order of
   characters wrong as well.

Here is a test:

create table bar(
        ord varchar(40),
        foo int,
        primary key(ord));

insert into bar values('ære',2);
insert into bar values('åre',3);
insert into bar values('are',4);
insert into bar values('zsh',5);
insert into bar values('begynne',6);
insert into bar values('øve',7);

select ord,foo from bar order by ord;

Here is a valid result:

 are     |   4
 begynne |   6
 zsh     |   5
 ære     |   2
 øve     |   7
 åre     |   3

Here is an invalid result:

 are     |   4
 begynne |   6
 zsh     |   5
 åre     |   3
 ære     |   2
 øve     |   7

The last one is what you get with LANG=C - as you can see, the
ordering of the Norwegian characters is wrong. The same would be the
issue for pretty much any non-English characters - their number in the
character table (as used by C) is not the same as their location in
the local alphabet (as used by the local locale).

--
Trond Eivind Glomsrød
Red Hat, Inc.
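
Trond's first point can be reproduced without a database, since the C locale compares strings byte by byte. A minimal sketch using the standard `sort` utility; the variable name `c_order` is only for illustration:

```shell
#!/bin/sh
# Sort the three example words under the C locale, which compares raw
# byte values: every uppercase ASCII letter sorts before any lowercase one.
c_order=$(printf '%s\n' bridge Aspen Cambridge | LC_ALL=C sort)
printf '%s\n' "$c_order"
# prints:
#   Aspen
#   Cambridge
#   bridge
```

Under a locale such as en_US, and assuming that locale is installed on the machine, the same command would be expected to give the dictionary order Aspen, bridge, Cambridge instead.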

So there is a reason it is like it is. If you want to change that in the
local setting, you will have to reinitdb in C locale (and
edit /var/lib/pgsql/initdb.i18n accordingly, and be prepared for collation
differences and problems). The initial initdb is done in the system locale,
unless one does not exist, in which case en_US is used (again, so that when
you do store non-English characters you get sane ordering, and so that you
get the mixed-case ordering preferred by many people). The initdb locale
settings are stored in initdb.i18n, and they are re-sourced every time
postgresql is started to prevent data corruption if postmaster is started
with a different locale from the initdb. Tom, is the data corruption issue
still an issue with 7.4.x, or is this just historical? It has been a long
time since I've looked in this corner of the RPM.... :-)
--
Lamar Owen
Director of Information Technology
Pisgah Astronomical Research Institute
1 PARI Drive
Rosman, NC 28772
(828)862-5554
www.pari.edu


Nov 23 '05 #5
Lamar Owen <lo***@pari.edu> writes:
> ... The initdb locale
> settings are stored in initdb.i18n, and they are re-sourced every time
> postgresql is started to prevent data corruption if postmaster is started
> with a different locale from the initdb. Tom, is the data corruption issue
> still an issue with 7.4.x, or is this just historical?


That's historical. For several versions now, the LC_COLLATE and
LC_CTYPE settings seen by initdb have been saved in pg_control and
re-adopted by the postmaster at start, so that index order corruption
problems are impossible. We do still adopt other settings such as
LC_MESSAGES from the postmaster environment, although I believe that
these will generally be read from postgresql.conf if you haven't toyed
with what initdb puts into that file.

In short then I doubt there's a need for initdb.i18n anymore. It would
make more sense to have postgres' bash_profile source /etc/sysconfig/i18n
directly.
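
Sourcing the system file directly might look roughly like the following in the postgres user's profile. This is a sketch under assumptions: the function name `load_system_locale` is hypothetical, and the path is a parameter only so the fragment can be exercised against a scratch file.

```shell
#!/bin/sh
# Sketch only: load_system_locale is a hypothetical helper.
# $1: locale file to read; defaults to /etc/sysconfig/i18n.
load_system_locale() {
    i18n_file="${1:-/etc/sysconfig/i18n}"
    if [ -f "$i18n_file" ]; then
        set -a              # export every variable the file assigns
        . "$i18n_file"      # path must contain a slash, or PATH is searched
        set +a
    fi
}
```

A profile would then call `load_system_locale` with no argument; if the file is absent, nothing is set and the environment keeps whatever locale it already had.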

The question of what postgresql.init should do if there's no available
LANG or LC_ALL setting seems orthogonal to me. I do not find Trond's
arguments convincing at all: a person who feels that C locale is broken
ought to set up /etc/sysconfig/i18n to specify another locale. The
POSIX standards say that the default locale in the absence of any
environmental variable is C, not en_US, and the fact that Trond doesn't
like that default doesn't give him license to change it, nor IMHO to try
to make an end run around the standard by pressuring initscript authors
to override the POSIX spec. I have no objection to making en_US the
default at the sysconfig level, but inserting it in lower levels of the
system seems at best misguided.

regards, tom lane


Nov 23 '05 #6
On Monday 05 April 2004 02:02 pm, Tom Lane wrote:
> In short then I doubt there's a need for initdb.i18n anymore. It would
> make more sense to have postgres' bash_profile source /etc/sysconfig/i18n
> directly.

Probably a good idea, then. I'll look at removing that cruft in the next
release; although, you may get to another release before I do, in which case
do with it as you see fit (unless you just want to leave it to me...:-))

> The question of what postgresql.init should do if there's no available
> LANG or LC_ALL setting seems orthogonal to me. I do not find Trond's
> arguments convincing at all: a person who feels that C locale is broken
> ought to set up /etc/sysconfig/i18n to specify another locale. The
> POSIX standards say that the default locale in the absence of any
> environmental variable is C, not en_US, and the fact that Trond doesn't
> like that default doesn't give him license to change it, nor IMHO to try
> to make an end run around the standard by pressuring initscript authors
> to override the POSIX spec. I have no objection to making en_US the
> default at the sysconfig level, but inserting it in lower levels of the
> system seems at best misguided.


Well, Trond no longer has the reins, no? :-) However, I would like to see a
sane default that is consistent system-wide: if the whole system defaults to
en_US in the presence of no environment variable, then PostgreSQL should
default the same way.

What does LSB say (which is where the RPMset has to live)?

I personally favored a default at C locale and have no problem reinstating
that if that is really a sane default.
--
Lamar Owen
Director of Information Technology
Pisgah Astronomical Research Institute
1 PARI Drive
Rosman, NC 28772
(828)862-5554
www.pari.edu


Nov 23 '05 #7
