Bytes IT Community

Replication Ideas

Hi--

I had been thinking about the issues of multimaster replication and how
to do highly available, load-balanced clustering with PostgreSQL. Here
is my outline, and I am looking for comments on the limitations of how
this would work.

Several PostgreSQL servers would share a virtual IP address and would
coordinate among themselves which one acts as "Master" for the purposes
of a single transaction (though coordinating per connection might be
simpler). SELECT statements are handled exclusively by the transaction
master, while anything that writes to a database is sent to all the
"Masters." At the end of each transaction the systems poll each other
on whether they were all successful:

1: Any system which succeeds in COMMITting the transaction must
ignore any system which fails the transaction until a recovery can be made.

2: Any system which fails to COMMIT the transaction must cease to
be a master, provided that it receives a signal from any other member of
the cluster indicating that that member succeeded in committing the
transaction.

3: If all nodes fail to commit, then they all remain masters.
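
The three rules above amount to a simple decision function over the
per-node commit outcomes. A minimal sketch (the report format and node
names are my own illustration, not part of any proposed protocol):

```python
def surviving_masters(reports):
    """Apply rules 1-3 to one round of commit reports.

    reports: dict mapping node name -> True (COMMIT succeeded) or
    False (COMMIT failed). Returns the set of nodes that remain
    masters for subsequent transactions.
    """
    if any(reports.values()):
        # Rules 1 and 2: at least one node committed, so every node
        # that failed is demoted (ignored) until it recovers.
        return {node for node, ok in reports.items() if ok}
    # Rule 3: all nodes failed to commit; the transaction rolls back
    # everywhere and nobody is demoted.
    return set(reports)
```

For example, surviving_masters({"a": True, "b": False}) leaves only
"a" as a master, while an all-False report leaves the cluster unchanged.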

Recovery would be done in several steps:

1: The database would be copied to the failed system using pg_dump.
2: A current recovery would be done from the transaction log.
3: This would be repeated in order to ensure that the database is up to
date.
4: When two successive restores have been achieved with no new
additions to the database, the "All Recovered" signal is sent to the
cluster and the node is ready to start processing again. (need a better
way of doing this).
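
Steps 1-4 can be sketched as a loop that repeats log replay until two
quiet rounds in a row. Here fetch_snapshot and replay_log are
hypothetical stand-ins for the pg_dump copy and the transaction-log
restore, not real commands:

```python
def recover(fetch_snapshot, replay_log, max_rounds=100):
    """Recover a failed node per steps 1-4 above.

    fetch_snapshot(): take the base copy of the database (step 1).
    replay_log(): apply outstanding transactions from the log,
    returning how many were applied (steps 2-3).
    Declares "All Recovered" after two successive rounds that apply
    nothing new (step 4).
    """
    fetch_snapshot()
    quiet_rounds = 0
    for _ in range(max_rounds):
        applied = replay_log()
        quiet_rounds = quiet_rounds + 1 if applied == 0 else 0
        if quiet_rounds == 2:
            return True     # send "All Recovered" to the cluster
    return False            # never caught up within max_rounds
```

The max_rounds cap is my own addition, so a node that can never catch
up under sustained write load fails visibly instead of looping forever.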

Note: Recovery is the problem, I know. My model is only a starting
point for the purposes of discussion and an attempt to bring something
to the conversation.

Any thoughts or suggestions?

Best Wishes,
Chris Travers
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 11 '05 #1
10 Replies

On Sat, 2003-08-23 at 23:27, Chris Travers wrote:
Hi--

I had been thinking of the issues of multimaster replication and how to
do highly available, load-balanced clustering with PostgreSQL. [...]


This is vaguely similar to Two Phase Commit, which is a sine qua
non of distributed transactions, which is the s.q.n. of multi-master
replication.
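
For comparison, a bare-bones two-phase commit coordinator looks like
this. The participant objects and their prepare/commit/abort methods
are purely illustrative, not any real distributed-transaction API:

```python
class Participant:
    """Toy participant: votes in phase 1, records its final outcome."""
    def __init__(self, will_vote_yes):
        self.will_vote_yes = will_vote_yes
        self.state = "pending"

    def prepare(self):
        # Phase 1: vote on whether this node can commit.
        return self.will_vote_yes

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: collect votes (all() stops at the first "no").
    if all(p.prepare() for p in participants):
        # Phase 2a: unanimous yes, so everyone commits.
        for p in participants:
            p.commit()
        return "committed"
    # Phase 2b: any "no" vote forces a global abort.
    for p in participants:
        p.abort()
    return "aborted"
```

The blocking problem shows up in phase 1: if a participant is simply
unreachable, prepare() never yields a vote, and a naive coordinator
can make no progress at all.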

--
-----------------------------------------------------------------
Ron Johnson, Jr. ro***********@cox.net
Jefferson, LA USA

"Eternal vigilance is the price of liberty: power is ever
stealing from the many to the few. The manna of popular liberty
must be gathered each day, or it is rotten... The hand entrusted
with power becomes, either from human depravity or esprit de
corps, the necessary enemy of the people. Only by continual
oversight can the democrat in office be prevented from hardening
into a despot: only by unintermitted agitation can a people be
kept sufficiently awake to principle not to let liberty be
smothered in material prosperity... Never look, for an age when
the people can be quiet and safe. At such times despotism, like
a shrouding mist, steals over the mirror of Freedom"
Wendell Phillips

Nov 11 '05 #2

On Mon, 2003-08-25 at 12:06, Chris Travers wrote:
Ron Johnson wrote:
This is vaguely similar to Two Phase Commit, which is a sine qua
non of distributed transactions, which is the s.q.n. of multi-master
replication.


I may be wrong, but if I recall correctly, one of the problems with a
standard 2-phase commit is that if one server goes down, the other
masters cannot commit their transactions. This would make a clustered
database server have a downtime equivalent to the total downtime of all
of its nodes. This is a real problem. Of course my understanding of
Two Phase Commit may be incorrect, in which case, I would appreciate it
if someone could point out where I am wrong.


Note that I didn't mean to imply that 2PC is sufficient to implement
M-M. The DBMS designer(s) must decide what to do (like queue up
changes) if 2PC fails.
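
One such policy, queueing changes for an unreachable node and draining
the queue at recovery, might be sketched like this (the ChangeQueue
class and the deliver callback are hypothetical):

```python
class ChangeQueue:
    """Per-node queue of changes that could not be delivered."""

    def __init__(self):
        self.pending = {}    # node name -> list of undelivered changes

    def send_or_queue(self, node, change, deliver):
        """Try to deliver a change; on failure, queue it for replay."""
        try:
            deliver(node, change)
            return True
        except ConnectionError:
            self.pending.setdefault(node, []).append(change)
            return False

    def drain(self, node, deliver):
        """Replay queued changes once the node is reachable again."""
        for change in self.pending.pop(node, []):
            deliver(node, change)
```

The design choice being made here is availability over strict
synchrony: surviving nodes keep committing, and the failed node is
brought up to date later, which is exactly the trade-off Chris's
rules 1 and 2 describe.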

--
-----------------------------------------------------------------
Ron Johnson, Jr. ro***********@cox.net
Jefferson, LA USA

"Our computers and their computers are the same color. The
conversion should be no problem!"
Unknown

Nov 11 '05 #3

On Mon, Aug 25, 2003 at 10:06:22AM -0700, Chris Travers wrote:
Ron Johnson wrote:
This is vaguely similar to Two Phase Commit, which is a sine qua
non of distributed transactions, which is the s.q.n. of multi-master
replication.


I may be wrong, but if I recall correctly, one of the problems with a
standard 2-phase commit is that if one server goes down, the other
masters cannot commit their transactions.


Before the discussion goes any further, have you read the work related
to Postgres-r? It's a substantially different animal from 2PC AFAIK.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Right now the sectors on the hard disk run clockwise, but I heard a rumor that
you can squeeze 0.2% more throughput by running them counterclockwise.
It's worth the effort. Recommended." (Gerry Pourwelle)


Nov 11 '05 #4

Alvaro Herrera wrote:
Before the discussion goes any further, have you read the work related
to Postgres-r? It's a substantially different animal from 2PC AFAIK.

Yes I have. Postgres-r is not a high-availability solution which is
capable of transparent failover, although it is a very useful project on
its own.

Best Wishes,
Chris Travers.

Nov 11 '05 #5

Tom Lane wrote:
Chris Travers <ch***@travelamericas.com> writes:

Yes I have. Postgres-r is not a high-availability solution which is
capable of transparent failover,


What makes you say that? My understanding is it's supposed to survive
loss of individual servers.

regards, tom lane

My mistake. I must have gotten them confused with another
(asynchronous) replication project.

Best Wishes,
Chris Travers

Nov 11 '05 #6

Chris Travers <ch***@travelamericas.com> writes:
Yes I have. Postgres-r is not a high-availability solution which is
capable of transparent failover,


What makes you say that? My understanding is it's supposed to survive
loss of individual servers.

regards, tom lane


Nov 11 '05 #7

WARNING: This is getting long ...

Postgres-R is a very interesting and inspiring idea. And I've been
kicking that concept around for a while now. What I don't like about it
is that it requires fundamental changes in the lock mechanism and that
it is based on the assumption of very low lock conflict.

<explain-PG-R>
In Postgres-R a committing transaction sends its workset (WS - a list
of all updates done in this transaction) to the group communication
system (GC). The GC guarantees total order, meaning that all nodes will
receive all WSs in the same order, no matter how they have been sent.

If a node receives back its own WS before any error occurred, it goes
ahead and finalizes the commit. If it receives a foreign WS, it has to
apply the whole WS and commit it before it can process anything else.
If a local transaction, whether in progress or waiting for its WS to
come back, holds a lock that is required to process such a remote WS,
the local transaction must be aborted to release its resources ... it
lost the total order race.
</explain-PG-R>
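
The rule just described can be condensed into a sketch. Write sets
come back from the GC in total order; a node finalizes its own WS and,
for a foreign WS, aborts any local transaction whose locks conflict.
All names here are my own illustration, not Postgres-R internals:

```python
def process_ws(ws, local_locks, node_id):
    """Handle one write set delivered by the GC in total order.

    ws: (origin_node, touched_rows).
    local_locks: dict mapping an in-progress local txn -> set of rows
    it has locked.
    Returns ("commit", rows) for our own WS, or ("apply", aborted)
    listing the local txns sacrificed to a foreign WS.
    """
    origin, rows = ws
    if origin == node_id:
        # Our own WS came back first: we won the race, finalize commit.
        return ("commit", rows)
    # Foreign WS: abort any local txn holding a conflicting lock,
    # because the foreign WS precedes it in the total order.
    aborted = [txn for txn, locked in local_locks.items()
               if locked & set(rows)]
    for txn in aborted:
        del local_locks[txn]    # local txn lost the total order race
    return ("apply", aborted)
```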

Postgres-R requires that all remote WSs are applied and committed before
a local transaction can commit. Otherwise it couldn't correctly detect a
lock conflict, so there will not be any read-ahead. And since the total
order really counts here, it cannot apply any two remote WSs in
parallel: a race condition could exist where a later WS in the total
order runs faster and locks up a previous one, so we have to squeeze
all remote WSs through one single replication worker process. And all
the locally parallel transactions that wait for their WSs to come back
have to wait until that poor little worker is done with the whole pile.
Bye-bye concurrency. And I don't know how the GC will deal with the
backlog either. It could well choke on it.

I do not see how this will scale well in a multi-SMP-system cluster. At
least the serialization of WSs will become a horror if there is
significant lock contention, as in a standard TPC-C on the district row
containing the order number counter. I don't know for sure, but I
suspect that with this kind of bottleneck, Postgres-R will have to roll
back more than 50% of its transactions when there are more than 4
nodes under heavy load (as in a benchmark run). That will suck ...
But ... initially I said that it is an inspiring concept ... soooo ...

I am currently hacking around with some C+PL/TclU+Spread constructs that
might form a rude kind of prototype creature.

My changes to the Postgres-R concept are that there will be as many
replicating slave processes as there are masters in total out in the
cluster ... yes, it will try to utilize all the CPUs in the cluster!
For failover reliability, a committing transaction will hold before
finalizing the commit and send its "I'm ready" to the GC. Every
replicator that reaches the same state sends "I'm ready" too. Spread
guarantees in SAFE_MESS mode that messages are delivered to all nodes
in a group, or that at least LEAVE/DISCONNECT messages are delivered
first. So if a node receives "I'm ready" from more than 50% of the
nodes, there is only a very small window in which multiple nodes would
have to fail in the same split second for the majority of nodes NOT to
commit. A node that reported "I'm ready" but lost more than 50% of the
cluster before committing has to roll back and rejoin, or wait for
operator intervention.
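
That majority rule reduces to a small decision function. A sketch
under my own naming (nothing here is Spread's actual API):

```python
def ready_decision(cluster_size, ready_votes, reachable):
    """Decide a node's fate after it has sent "I'm ready".

    cluster_size: total nodes in the cluster.
    ready_votes: "I'm ready" messages received (including our own).
    reachable: nodes we can still see, per the GC membership view.
    Returns "commit", "rollback-and-rejoin", or "wait".
    """
    majority = cluster_size // 2 + 1
    if ready_votes >= majority:
        # A majority is ready: safe to finalize the commit.
        return "commit"
    if reachable < majority:
        # Lost more than 50% of the cluster before committing.
        return "rollback-and-rejoin"
    return "wait"   # quorum still reachable, keep waiting for votes
```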

Now the idea is to split the communication into GC distribution
groups per transaction. Working master backends and their associated
replication backends join/leave a unique group for every transaction
in the cluster. This way, per-process communication is reduced to the
required minimum.
As said, I am hacking on some code ...
Jan

Chris Travers wrote:
Tom Lane wrote:
Chris Travers <ch***@travelamericas.com> writes:

Yes I have. Postgres-r is not a high-availability solution which is
capable of transparent failover,


What makes you say that? My understanding is it's supposed to survive
loss of individual servers.

regards, tom lane

My mistake. I must have gotten them confused with another
(asynchronous) replication project.

Best Wishes,
Chris Travers

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== Ja******@Yahoo.com #

Nov 11 '05 #8

Jan Wieck wrote:
WARNING: This is getting long ...

[...]


As my British friends would say, "Bully for you", and I applaud you
playing, struggling, and learning from this for our sakes. Jeez, all I
think about is me, huh?

Nov 11 '05 #9

On Mon, 25 Aug 2003, Tom Lane wrote:
Chris Travers <ch***@travelamericas.com> writes:
Yes I have. Postgres-r is not a high-availability solution which is
capable of transparent failover,


What makes you say that? My understanding is it's supposed to survive
loss of individual servers.


How does it play 'catch up' when a server comes back online?

Note that I did go through the 'docs' on how it works, and am/was quite
impressed by what they were doing ... but if I have a large network,
say, with one group connecting to ServerA and another group to ServerB,
what happens when ServerA and ServerB lose network connectivity for any
period of time? How do they re-sync when the network comes back up again?


Nov 11 '05 #10

"Marc G. Fournier" <sc*****@hub.org> writes:
On Mon, 25 Aug 2003, Tom Lane wrote:
What makes you say that? My understanding is it's supposed to survive
loss of individual servers.
How does it play 'catch up' when a server comes back online?


The recovered server has to run through the part of the GCS data stream
that it missed the first time. This is not conceptually different from
recovering using archived WAL logs (or archived trigger-driven
replication data streams). As with using WAL for recovery, you have to
be able to archive the message stream until you don't need it any more.
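
A toy version of that replay, with the archived stream as a list of
sequence-numbered messages (all names here are illustrative, not any
real GCS interface):

```python
def catch_up(archive, last_applied):
    """Replay the portion of the archived GCS stream a node missed.

    archive: list of (seqno, change) pairs, kept in delivery order
    since before the node went down.
    last_applied: highest sequence number the node applied before
    failing.
    Returns (new_position, changes_applied).
    """
    applied = []
    for seqno, change in archive:
        if seqno > last_applied:
            applied.append(change)      # apply in the original order
            last_applied = seqno
    return last_applied, applied
```

As with WAL-based recovery, the scheme only works while the archive is
retained: once messages older than some node's last_applied are
discarded, that node can no longer catch up incrementally and needs a
fresh base copy.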

regards, tom lane


Nov 11 '05 #11
