
Replication Bundled with Main Source.

Hi All,

Firstly I've gotta say that I think that PostgreSQL is one of the finest
OSS projects out there and full credit to all of those involved.

After talking to a couple of other consultants who use Pg, and who assure
their enterprise clients that Pg is a perfectly viable solution for a
variety of scenarios, I find the question crops up quite often: "What
About Replication?". Whilst I understand that the eRServer project is a
fine project, more than capable, and rapidly reaching the point of having
minimal bugginess, I have to wonder why there is no talk of including
replication capability within the main source tree. After all, in today's
RDBMS arena, CTOs expect replication to be a feature of the server rather
than an afterthought. As it stands, it feels almost like having a
transactional engine bolted on after the fact.

Are there any plans to integrate a replication engine into the main code,
perhaps switchable at compile time with something like
'--with-replication'? I believe that this would encourage acceptance
within the corporate environment and lead to a more well-rounded
offering.

Just my 2 cents (or tuppence-ha'penny for those in blighty)

Regards

Tony
---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 12 '05 #1
7 Replies


Hello,

It is not that we don't want to include replication in the base
project; it is that eRServer does not meet the requirements for what can
be included in the base project. Specifically (I believe), it requires
Java.

Sincerely,

Joshua Drake

--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-222-2783 - jd@commandprompt.com - http://www.commandprompt.com
Editor-N-Chief - PostgreSQl.Org - http://www.postgresql.org


Nov 12 '05 #2

Joshua D. Drake wrote:
Hello,

It is not that we don't want to include replication in the base
project it is that ERserver does not meet the requirements of what can
be included in the base project. Specifically (I believe) the
requirement of Java.


Maybe they will move to C someday.

--
Bruce Momjian | http://candle.pha.pa.us
pg***@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


Nov 12 '05 #3

Bruce Momjian <pg***@candle.pha.pa.us> writes:
Joshua D. Drake wrote:
It is not that we don't want to include replication in the base
project it is that ERserver does not meet the requirements of what can
be included in the base project. Specifically (I believe) the
requirement of Java.
Maybe they will move to C someday.


Well, JDBC requires Java, and it's still in the main distro.

I think the real answer is that until recently, ERserver wasn't open
source and we didn't have the option to include it. Now that it is
open source, we could think about it. Having looked at the code, I
think it's definitely not ready for prime time, but it could get there
with some work. When it's of comparable solidity to the base project
I'd be in favor of adding it to the base distro.

regards, tom lane


Nov 12 '05 #4

On Wed, Oct 08, 2003 at 08:28:57PM -0400, Tom Lane wrote:
open source, we could think about it. Having looked at the code, I
think it's definitely not ready for prime time, but it could get there
with some work. When it's of comparable solidity to the base project


I agree completely. You can get a long way with this code -- we've
used a version of it in production for 2 years now -- but it's a long
way from "turn it on and forget it" right now.

I also wonder why there's a push to put "replication" in the main
distribution, though. I know, I know, the argument is that if you
have to get it separately, it's not the same as the "main" code. But
it's just not true that Oracle (or whoever you like) "includes"
replication. You have to buy the right licenses to get the functions
you want, and there are different kinds of subsystems depending on
what needs you have. (e.g. RAC is gonna be a lousy choice across a
frame-relay VPN. I shudder to think.) I suppose someone could
package a "kitchen sink" Postgres which included all kinds of stuff
from gborg.

A

--
----
Andrew Sullivan 204-4141 Yonge Street
Afilias Canada Toronto, Ontario Canada
<an****@libertyrms.info> M2P 2A8
+1 416 646 3304 x110

Nov 12 '05 #5

Tom Lane wrote:
Bruce Momjian <pg***@candle.pha.pa.us> writes:
Joshua D. Drake wrote:
It is not that we don't want to include replication in the base
project it is that ERserver does not meet the requirements of what can
be included in the base project. Specifically (I believe) the
requirement of Java.

Maybe they will move to C someday.


Well, JDBC requires Java, and it's still in the main distro.

I think the real answer is that until recently, ERserver wasn't open
source and we didn't have the option to include it. Now that it is
open source, we could think about it. Having looked at the code, I
think it's definitely not ready for prime time, but it could get there
with some work. When it's of comparable solidity to the base project
I'd be in favor of adding it to the base distro.


Unfortunately I don't think it'll ever get there. There is a fundamental
design flaw in the system that is not fixable (there are several, but
this is one of the biggies): eRServer only remembers that a row has been
modified, but not what changed, in what order, or even how often.

The problem is really easy to demonstrate. With a UNIQUE constraint on a
column, you swap the values of two rows by way of a temporary value C:

A->C
B->A
C->B

If these 3 changes fall into one "snapshot", you have no chance to
replicate that. eRServer tries to do

A->B
B->A

and whatever order it tries, you'd need a deferred UNIQUE constraint to
get it done, and I don't have the slightest clue how to ever get _that_
implemented.
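
A minimal Python sketch of that failure, for illustration only. This is not eRServer code; the names (`apply_with_unique_check`, `history`, `net_changes`) are invented here, and the "replica" is just a dict of row id to value with a uniqueness check after every single row update.

```python
from itertools import permutations


def apply_with_unique_check(rows, updates):
    """Apply (row_id, new_value) updates one by one, rejecting any update
    that would duplicate a value held by another row (a per-row UNIQUE check)."""
    state = dict(rows)
    for row_id, new_value in updates:
        if any(v == new_value for k, v in state.items() if k != row_id):
            raise ValueError(f"duplicate key {new_value!r} for row {row_id}")
        state[row_id] = new_value
    return state


def replay_succeeds(rows, updates):
    """Return True if the updates can be applied without a UNIQUE violation."""
    try:
        apply_with_unique_check(rows, updates)
        return True
    except ValueError:
        return False


start = {1: "A", 2: "B"}

# What actually happened on the master: a swap through the spare value C.
history = [(1, "C"), (2, "A"), (1, "B")]

# What is left if only "row 1 changed, row 2 changed" is remembered:
# one update per row, carrying that row's final value.
net_changes = [(1, "B"), (2, "A")]

full_history_ok = replay_succeeds(start, history)
net_orders_ok = [replay_succeeds(start, list(order))
                 for order in permutations(net_changes)]

print(full_history_ok)  # True
print(net_orders_ok)    # [False, False]
```

Replaying the full ordered history works, while both possible orders of the two net changes hit the uniqueness check on the very first update.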
Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== Ja******@Yahoo.com #

Nov 12 '05 #6

Jan Wieck wrote:
I think the real answer is that until recently, ERserver wasn't open
source and we didn't have the option to include it. Now that it is
open source, we could think about it. Having looked at the code, I
think it's definitely not ready for prime time, but it could get there
with some work. When it's of comparable solidity to the base project
I'd be in favor of adding it to the base distro.


Unfortunately I don't think it'll get there ever. There is a fundamental
design flaw in the system that is not fixable (there are multiple, but
this is one of the biggies). That is that eRServer only remembers that a
row has been modified, but not what, in what order, not even how often.

The problem is really easy to demonstrate. With a UNIQUE constraint on a
column, you change the values of two rows like

A->C
B->A
C->B

If these 3 changes fall into one "snapshot", you have no chance to
replicate that. eRServer tries to do

A->B
B->A

and whatever order it tries, you'd need a deferred UNIQUE constraint to
get it done, and I don't have the slightest clue how to ever get _that_
implemented.


I was wondering about this. It seems to be part of our existing problem
of checking unique constraints during the query, rather than at query
end or transaction end:

test=> CREATE TABLE test (x INT);
CREATE TABLE
test=> INSERT INTO test VALUES (1);
INSERT 17144 1
test=> INSERT INTO test VALUES (2);
INSERT 17145 1
test=> UPDATE test SET x = x + 1;
UPDATE 2
test=> CREATE UNIQUE INDEX test_i ON test (x);
CREATE INDEX
test=> UPDATE test SET x = x + 1;
ERROR:  duplicate key violates unique constraint "test_i"

We have pretty complex handling of foreign key constraints, allowing
them to fire at the end of the transaction, but we have nothing like that
for UNIQUE constraints. I assume we do this because it is more efficient
to check the unique index during insert/update of each row, but perhaps
we need a queue, as you suggest.
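
To make the difference concrete, here is a rough Python model of the session above. It is a sketch under assumptions, not the server's logic: the table is a plain list, rows are visited in list order (in the real server the outcome depends on the order the rows happen to be visited), and both function names are made up.

```python
def update_check_each_row(values, fn):
    """Rewrite rows one at a time, checking uniqueness after every row,
    the way an immediate unique-index check behaves."""
    rows = list(values)
    for i in range(len(rows)):
        new = fn(rows[i])
        # Compare against every other row as it stands right now.
        if new in rows[:i] + rows[i + 1:]:
            raise ValueError(f"duplicate key {new}")
        rows[i] = new
    return rows


def update_check_at_end(values, fn):
    """Rewrite all rows first, then check uniqueness once at statement end."""
    rows = [fn(v) for v in values]
    if len(set(rows)) != len(rows):
        raise ValueError("duplicate key")
    return rows


table = [2, 3]  # the table contents just before the failing UPDATE


def increment(x):
    return x + 1


try:
    update_check_each_row(table, increment)
    per_row_result = "ok"
except ValueError as exc:
    per_row_result = str(exc)

print(per_row_result)                         # duplicate key 3
print(update_check_at_end(table, increment))  # [3, 4]
```

The same UPDATE is rejected when each row is checked as it is written and accepted when the check waits until every row has its new value.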

Another thing you might need is the ability to _not_ see changes made by
your transaction, so when you go to change B to A, you see the original
B but not the A->B you just changed.

Another idea would be to only queue up the unique constraint failures,
and re-check on transaction commit --- that way, you only have a queue
when you have a possible unique constraint violation, and you re-check
at the end.
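
A sketch of that idea in Python, again with a list standing in for the table and invented names (`update_with_queued_recheck`, `suspects`). It only illustrates the bookkeeping: remember the keys that collided at write time, and look at just those again at commit.

```python
def update_with_queued_recheck(values, fn):
    """Apply fn to every row. A collision at write time is not an error yet;
    the key is queued and re-checked once all rows have been written."""
    rows = list(values)
    suspects = []  # keys that looked like duplicates when they were written
    for i in range(len(rows)):
        new = fn(rows[i])
        if new in rows[:i] + rows[i + 1:]:
            suspects.append(new)
        rows[i] = new
    # "Commit": only the queued keys need a second look.
    for key in suspects:
        if rows.count(key) > 1:
            raise ValueError(f"duplicate key {key}")
    return rows, suspects


rows, suspects = update_with_queued_recheck([2, 3], lambda x: x + 1)
print(rows)      # [3, 4]
print(suspects)  # [3]

try:
    update_with_queued_recheck([2, 3], lambda x: 5)
    genuine_duplicate = "ok"
except ValueError as exc:
    genuine_duplicate = str(exc)
print(genuine_duplicate)  # duplicate key 5
```

The transient collision on 3 is queued and then cleared at the end, while a statement that really leaves two rows with the same value still fails. When nothing collides, the queue stays empty and there is nothing to re-check.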

--
Bruce Momjian | http://candle.pha.pa.us
pg***@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


Nov 12 '05 #7

Bruce Momjian wrote:
Jan Wieck wrote:
Unfortunately I don't think it'll get there ever. There is a fundamental
design flaw in the system that is not fixable (there are multiple, but


I was wondering about this. It seems to be part of our existing problem
of checking unique constraints during the query, rather than at query
end or transaction end:

[...]

Another idea would be to only queue up the unique constraint failures,
and re-check on transaction commit --- that way, you only have a queue
when you have a possible unique constraint violation, and you re-check
at the end.


_That_ actually is _the_ idea I was missing!

During index insert I think we know everything that needs to be known.
We know the index in question, which definitely leads to the relation in
question. And we know the CTID of the new heap tuple containing the key
values in conflict. IIRC that is enough to schedule some sort of
[DEFERRED] AFTER INSERT trigger ... one more of these generic C monsters.

Some sort of, because its call interface might be a bit different. We
won't have a pg_trigger row for it anywhere. But since it'd be a generic
function for all index dupkey checks, I wouldn't mind much to hardwire
it into the trigger queue.

The trigger actually only needs to do a

SELECT 1 FROM <rel> WHERE <full qualification>

Run that through SPI_execp() with a row limit (tupcount) of 2 and that's
it: a second row coming back means the key really is duplicated. Maybe it
needs to do it FOR UPDATE to have the correct visibility and locking,
but that's a minor implementation detail.
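
Why a limit of 2 is enough can be shown with a tiny Python stand-in. This is not SPI code; `key_is_duplicated` is a hypothetical helper, and `islice` plays the role of the row limit.

```python
from itertools import islice, repeat


def key_is_duplicated(rows, key):
    """Fetch at most two rows matching key. One hit is the row that was just
    written; a second hit means the key is genuinely duplicated."""
    matches = islice((r for r in rows if r == key), 2)
    return len(list(matches)) == 2


print(key_is_duplicated([3, 4], 3))     # False
print(key_is_duplicated([3, 4, 3], 3))  # True

# The scan stops at the second hit, so even an endless stream of matching
# rows is answered immediately.
print(key_is_duplicated(repeat(3), 3))  # True
```

The check never needs to know how many duplicates there are, only whether there is more than one row with the key, so it can stop as soon as the second one turns up.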
Cool!
Jan


Nov 12 '05 #8
