Replication Ideas - PostgreSQL Database

Chris Travers

Hi--

I had been thinking of the issues of multimaster replication and how to
do highly available, loadballanced clustering with PostgreSQL. Here is
my outline, and I am looking for comments on the limitations of how this
would work.

Several PostgreSQL servers would share a virtual IP address, and would
coordinate among themselves which will act as "Master" for the purposes
of a single transaction (but connection could be easier). SELECT
statements are handled exclusively by the transaction master while
anything that writes to a database would be sent to all the the
"Masters." At the end of each transaction the systems would poll
eachother regarding whether they were all successful:

1: Any system which is successful in COMMITting the transaction must
ignore any system which fails the transaction untill a recovery can be made.

2: Any system which fails in COMMITting the transaction must cease to
be a master, provided that it recieves a signat from any other member of
the cluster that indicates that that member succeeded in committing the
transaction.

3: If all nodes fail to commit, then they all remain masters.

Recovery would be done in several steps:

1: The database would be copied to the failed system using pg_dump.
2: A current recovery would be done from the transaction log.
3: This would be repeated in order to ensure that the database is up to
date.
4: When two successive restores have been achieved with no new
additions to the database, the "All Recovered" signal is sent to the
cluster and the node is ready to start processing again. (need a better
way of doing this).

Note: Recovery is the problem, I know. my model is only a starting
point for the purposes of discussion and trying to bring something to
the conversation.

Any thoughts or suggestions?

Best Wishes,
Chris Travers
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 11 '05 #1

Subscribe Reply

2440

Ron Johnson

On Sat, 2003-08-23 at 23:27, Chris Travers wrote:

Hi--

I had been thinking of the issues of multimaster replication and how to
do highly available, loadballanced clustering with PostgreSQL. Here is
my outline, and I am looking for comments on the limitations of how this
would work.

Several PostgreSQL servers would share a virtual IP address, and would
coordinate among themselves which will act as "Master" for the purposes
of a single transaction (but connection could be easier). SELECT
statements are handled exclusively by the transaction master while
anything that writes to a database would be sent to all the the
"Masters." At the end of each transaction the systems would poll
eachother regarding whether they were all successful:

1: Any system which is successful in COMMITting the transaction must
ignore any system which fails the transaction untill a recovery can be made.

2: Any system which fails in COMMITting the transaction must cease to
be a master, provided that it recieves a signat from any other member of
the cluster that indicates that that member succeeded in committing the
transaction.

3: If all nodes fail to commit, then they all remain masters.

Recovery would be done in several steps:

1: The database would be copied to the failed system using pg_dump.
2: A current recovery would be done from the transaction log.
3: This would be repeated in order to ensure that the database is up to
date.
4: When two successive restores have been achieved with no new
additions to the database, the "All Recovered" signal is sent to the
cluster and the node is ready to start processing again. (need a better
way of doing this).

Note: Recovery is the problem, I know. my model is only a starting
point for the purposes of discussion and trying to bring something to
the conversation.

This is vaguely similar to Two Phase Commit, which is a sine qua
non of distributed transactions, which is the s.q.n. of multi-master
replication.

--
-----------------------------------------------------------------
Ron Johnson, Jr. ro***********@c ox.net
Jefferson, LA USA

"Eternal vigilance is the price of liberty: power is ever
stealing from the many to the few. The manna of popular liberty
must be gathered each day, or it is rotten... The hand entrusted
with power becomes, either from human depravity or esprit de
corps, the necessary enemy of the people. Only by continual
oversight can the democrat in office be prevented from hardening
into a despot: only by unintermitted agitation can a people be
kept sufficiently awake to principle not to let liberty be
smothered in material prosperity... Never look, for an age when
the people can be quiet and safe. At such times despotism, like
a shrouding mist, steals over the mirror of Freedom"
Wendell Phillips
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Nov 11 '05 #2

Ron Johnson

On Mon, 2003-08-25 at 12:06, Chris Travers wrote:

Ron Johnson wrote:
This is vaguely similar to Two Phase Commit, which is a sine qua
non of distributed transactions, which is the s.q.n. of multi-master
replication.

I may be wrong, but if I recall correctly, one of the problems with a
standard 2-phase commit is that if one server goes down, the other
masters cannot commit their transactions. This would make a clustered
database server have a downtime equivalent to the total downtime of all
of its nodes. This is a real problem. Of course my understanding of
Two Phase Commit may be incorrect, in which case, I would appreciate it
if someone could point out where I am wrong.

Note that I didn't mean to imply that 2PC is sufficient to implement
M-M. The DBMS designer(s) must decide what to do (like queue up
changes) if 2PC fails.

--
-----------------------------------------------------------------
Ron Johnson, Jr. ro***********@c ox.net
Jefferson, LA USA

"Our computers and their computers are the same color. The
conversion should be no problem!"
Unknown
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postg resql.org

Nov 11 '05 #3

Alvaro Herrera

On Mon, Aug 25, 2003 at 10:06:22AM -0700, Chris Travers wrote:

Ron Johnson wrote:
This is vaguely similar to Two Phase Commit, which is a sine qua
non of distributed transactions, which is the s.q.n. of multi-master
replication.

I may be wrong, but if I recall correctly, one of the problems with a
standard 2-phase commit is that if one server goes down, the other
masters cannot commit their transactions.

Before the discussion goes any further, have you read the work related
to Postgres-r? It's a substantially different animal from 2PC AFAIK.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Right now the sectors on the hard disk run clockwise, but I heard a rumor that
you can squeeze 0.2% more throughput by running them counterclockwis e.
It's worth the effort. Recommended." (Gerry Pourwelle)

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postg resql.org so that your
message can get through to the mailing list cleanly

Nov 11 '05 #4

Chris Travers

Alvaro Herrera wrote:

Before the discussion goes any further, have you read the work related
to Postgres-r? It's a substantially different animal from 2PC AFAIK.

Yes I have. Postgres-r is not a high-availability solution which is
capable of transparent failover, although it is a very useful project on
its own.

Best Wishes,
Chris Travers.
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 11 '05 #5

Chris Travers

Tom Lane wrote:

Chris Travers <ch***@travelam ericas.com> writes:

Yes I have. Postgres-r is not a high-availability solution which is
capable of transparent failover,

What makes you say that? My understanding is it's supposed to survive
loss of individual servers.

regards, tom lane

My mistake. I must have gotten them confused with another
(asynchronous) replication project.

Best Wishes,
Chris Travers
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 11 '05 #6

Tom Lane

Chris Travers <ch***@travelam ericas.com> writes:

Yes I have. Postgres-r is not a high-availability solution which is
capable of transparent failover,

What makes you say that? My understanding is it's supposed to survive
loss of individual servers.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postg resql.org so that your
message can get through to the mailing list cleanly

Nov 11 '05 #7

Jan Wieck

WARNING: This is getting long ...

Postgres-R is a very interesting and inspiring idea. And I've been
kicking that concept around for a while now. What I don't like about it
is that it requires fundamental changes in the lock mechanism and that
it is based on the assumption of very low lock conflict.

<explain-PG-R>
In Postgres-R a committing transaction sends it's workset (WS - a list
of all updates done in this transaction) to the group communication
system (GC). The GC guarantees total order, meaning that all nodes will
receive all WSs in the same order, no matter how they have been sent.

If a node receives back it's own WS before any error occured, it goes
ahead and finalizes the commit. If it receives a foreign WS, it has to
apply the whole WS and commit it before it can process anything else. If
now a local transaction, in progress or while waiting for it's WS to
come back, holds a lock that is required to process such remote WS, the
local transaction needs to be aborted to unlock it's resources ... it
lost the total order race.
</explain-PG-R>

Postgres-R requires that all remote WSs are applied and committed before
a local transaction can commit. Otherwise it couldn't correctly detect a
lock conflict. So there will not be any read ahead. And since the total
order really counts here, it cannot apply any two remote WSs in
parallel, a race condition could possibly exist and a later WS in the
total order runs faster and locks up a previous one, so we have to
squeeze all remote WSs through one single replication work process. And
all the locally parallel executed transactions that wait for their WSs
to come back have to wait until that poor little worker is done with the
whole pile. Bye bye concurrency. And I don't know how the GC will deal
with the backlog either. Could well choke on it.

I do not see how this will scale well in a multi-SMP-system cluster. At
least the serialization of WSs will become a horror if there is
significant lock contention like in a standard TPC-C on the district row
containing the order number counter. I don't know for sure, but I
suspect that with this kind of bottleneck, Postgres-R will have to
rollback more than 50% of it's transactions when there are more than 4
nodes under heavy load (like in a benchmark run). That will suck ...
But ... initially I said that it is an inspiring concept ... soooo ...

I am currently hacking around with some C+PL/TclU+Spread constructs that
might form a rude kind of prototype creature.

My changes to the Postgres-R concept are that there will be as many
replicating slave processes as there are in summary masters out in the
cluster ... yes, it will try to utilize all the CPU's in the cluster!
For failover reliability, A committing transaction will hold before
finalizing the commit and send it's "I'm ready" to the GC. Every
replicator that reaches the same state send's "I'm ready" too. Spread
guarantees in SAFE_MESS mode that messages are delivered to all nodes in
a group or that at least LEAVE/DISCONNECT messages are deliverd before.
So if a node receives more than 50% of "I'm ready", there would be a
very small gap where multiple nodes have to fail in the same split
second so that the majority of nodes does NOT commit. A node that
reported "I'm ready" but lost more than 50% of the cluster before
committing has to rollback and rejoin or wait for operator intervention.

Now the idea is to split up the communication into GC distribution
groups per transaction. So working master backends and associated
replication backends will join/leave a unique group for every
transaction in the cluster. This way, the per process communication is
reduced to the required minimum.
As said, I am hacking on some code ...
Jan

Chris Travers wrote:

Tom Lane wrote:
Chris Travers <ch***@travelam ericas.com> writes:

Yes I have. Postgres-r is not a high-availability solution which is
capable of transparent failover,

What makes you say that? My understanding is it's supposed to survive
loss of individual servers.

regards, tom lane

My mistake. I must have gotten them confused with another
(asynchronous) replication project.

Best Wishes,
Chris Travers
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

--
#============== =============== =============== =============== ===========#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#============== =============== =============== ====== Ja******@Yahoo. com #
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postg resql.org so that your
message can get through to the mailing list cleanly

Nov 11 '05 #8

Dennis Gearon

Jan Wieck wrote:

WARNING: This is getting long ...

Postgres-R is a very interesting and inspiring idea. And I've been
kicking that concept around for a while now. What I don't like about
it is that it requires fundamental changes in the lock mechanism and
that it is based on the assumption of very low lock conflict.

<explain-PG-R>
In Postgres-R a committing transaction sends it's workset (WS - a list
of all updates done in this transaction) to the group communication
system (GC). The GC guarantees total order, meaning that all nodes
will receive all WSs in the same order, no matter how they have been
sent.

If a node receives back it's own WS before any error occured, it goes
ahead and finalizes the commit. If it receives a foreign WS, it has to
apply the whole WS and commit it before it can process anything else.
If now a local transaction, in progress or while waiting for it's WS
to come back, holds a lock that is required to process such remote WS,
the local transaction needs to be aborted to unlock it's resources ...
it lost the total order race.
</explain-PG-R>

Postgres-R requires that all remote WSs are applied and committed
before a local transaction can commit. Otherwise it couldn't correctly
detect a lock conflict. So there will not be any read ahead. And since
the total order really counts here, it cannot apply any two remote WSs
in parallel, a race condition could possibly exist and a later WS in
the total order runs faster and locks up a previous one, so we have to
squeeze all remote WSs through one single replication work process.
And all the locally parallel executed transactions that wait for their
WSs to come back have to wait until that poor little worker is done
with the whole pile. Bye bye concurrency. And I don't know how the GC
will deal with the backlog either. Could well choke on it.

I do not see how this will scale well in a multi-SMP-system cluster.
At least the serialization of WSs will become a horror if there is
significant lock contention like in a standard TPC-C on the district
row containing the order number counter. I don't know for sure, but I
suspect that with this kind of bottleneck, Postgres-R will have to
rollback more than 50% of it's transactions when there are more than 4
nodes under heavy load (like in a benchmark run). That will suck ...
But ... initially I said that it is an inspiring concept ... soooo ...

I am currently hacking around with some C+PL/TclU+Spread constructs
that might form a rude kind of prototype creature.

My changes to the Postgres-R concept are that there will be as many
replicating slave processes as there are in summary masters out in the
cluster ... yes, it will try to utilize all the CPU's in the cluster!
For failover reliability, A committing transaction will hold before
finalizing the commit and send it's "I'm ready" to the GC. Every
replicator that reaches the same state send's "I'm ready" too. Spread
guarantees in SAFE_MESS mode that messages are delivered to all nodes
in a group or that at least LEAVE/DISCONNECT messages are deliverd
before. So if a node receives more than 50% of "I'm ready", there
would be a very small gap where multiple nodes have to fail in the
same split second so that the majority of nodes does NOT commit. A
node that reported "I'm ready" but lost more than 50% of the cluster
before committing has to rollback and rejoin or wait for operator
intervention.

Now the idea is to split up the communication into GC distribution
groups per transaction. So working master backends and associated
replication backends will join/leave a unique group for every
transaction in the cluster. This way, the per process communication is
reduced to the required minimum.
As said, I am hacking on some code ...
Jan

Chris Travers wrote:
Tom Lane wrote:
Chris Travers <ch***@travelam ericas.com> writes:
Yes I have. Postgres-r is not a high-availability solution which is
capable of transparent failover,

What makes you say that? My understanding is it's supposed to survive
loss of individual servers.

regards, tom lane

My mistake. I must have gotten them confused with another
(asynchronous) replication project.

Best Wishes,
Chris Travers
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if
your
joining column's datatypes do not match

As my british friends would say, "Bully for you",and I applaud you
playing, struggling, learning from this for our sakes. Jeez, all I think
about is me,huh?
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 11 '05 #9

Marc G. Fournier

On Mon, 25 Aug 2003, Tom Lane wrote:

Chris Travers <ch***@travelam ericas.com> writes:
Yes I have. Postgres-r is not a high-availability solution which is
capable of transparent failover,

What makes you say that? My understanding is it's supposed to survive
loss of individual servers.

How does it play 'catch up' went a server comes back online?

note that I did go through the 'docs' on how it works, and am/was quite
impressed at what they were doing ... but, if I have a large network, say,
and one group is connecting to ServerA, and another group with ServerB,
what happens when ServerA and ServerB loose network connectivity for any
period of time? How do they re-sync when the network comes back up again?

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Nov 11 '05 #10

Similar topics

1545

Backup transaction logs and replication simultaneously

by: Vincento Harris | last post by:

Is there any problem with implementating some kind of replication and backing up transaction logs at the same time? SQL Server 2000 Standard Edition The server is configured to back up log files regularly, with the presence of a(New) Disaster recovery server and with an inability for standby(Standard Edition)I am planning to implement some kind of replication hopefully still backing up the transaction logs to enable a point in time...

Microsoft SQL Server

1568

replication problems with structure/trigger/identity

by: Marko Damaschke | last post by:

Hello alltogether, i'm working on a database-plattform which should work with fail-over-technics. Therefor 2 identic machines with internal RAID are available, which work in single-host-netloadbalance on win2003-server. Because of the internal RAID and caused by costs, a solution without external SCSI-RAID and win2003-clustering-mechanism is looked for. First i tried to use merge-replication but the rowguid-column, which is added,...

Microsoft SQL Server

3255

replication error after promoting slave to master

by: webhosting | last post by:

After a failure, we promoted a slave to master. This included renaming the server to the old master's name and ip address, but leaving the server id alone. The new master works fine. We repaired the old server, gave it a different name and ip address but leaving the server id the same. I get the following error on the slave:

MySQL Database

2451

Proposal for a cascaded master-slave replication system

by: Jan Wieck | last post by:

Dear community, for some reason the post I sent yesterday night still did not show up on the mailing lists. I have set up some links on the developers side under http://developer.postgresql.org/~wieck/slony1.html The concept will be the base for some of my work as a Software Engineer here at Afilias USA INC. in the near future. Afilias is like many of you in need of reliable and performant replication solutions for backup and failover...

PostgreSQL Database

5285

DB2 Replication Error - Oracle 10g to DB2 8.2 (8.1.8)

by: Rafael Faria | last post by:

Hello All, I'm new to DB2, so apologies in advance for any stupid comments I might make on this replication problem. Using Information integrator and DB2 8.2 I managed to create a nickname that references a table sitting on an Oracle 10g Database on a different server. I can reference the nickname and retrieve all the information on that table from my DB2 instance. The plan now is to have a replica of that table on my DB2 instance. I

DB2 Database

2301

Replication Manager 4.0 - Changing IP Addresses (HELP)

by: mark | last post by:

HELP!!! Last week we moved the Server From 1 Location to Another that was running Replication Manager 4.0. (We changed the IP Addresses and changed the name servers also... ) After the switchover, everyone that is outside our Local Area Network is unable to get a successful sync. I've checked all the settings that I know and everything looks right - The Drop Box is getting data from the outside and It looks like the Managed Databases are...

Microsoft Access / VBA

3072

Filter records during Transactional Replication.

by: Query Builder | last post by:

Hi All, I have a table with a column DeletedDate which stores a logical delete of a record. I need to set up transactional replication for reporting purposes that this deleted records should not be replicated to the subscriber. That is, if i see a value on the DeletedDate, I don't want that record to be picked up for replication.

Microsoft SQL Server

2545

Application needs replication

by: John | last post by:

Hi We have an MS Access application which runs on the server but some laptop users need the app to work offline. Our solution has been to use Access replication which automatically syncs data (both ways) when laptop is connected to the network. We are looking to rewrite the app to vb.net/sql server. My question is how does replication work in sql server specially in context of a vb.net front end i.e. what sort of coding/configuration we...

Visual Basic .NET

1139

SQL 2000 server replication and a MSmerge_tombstone problem

by: tsharp89 | last post by:

We currently have SQL Server 2000 and have merge replication. We have the following process 1. Data is transfered to a laptop, data on the laptop is changed and the data on the server is 'locked' no changes can be made. 2. After a few days, the laptop is returned to the office. 3. A script is ran to delete the data on the publisher. Replication works, the subscriber side is updated with the change (no data in the tables).

Microsoft SQL Server

9519

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...

Windows Server

10437

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...

C / C++

10164

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...

Windows Server

7538

Access Europe - Using VBA to create a class based on a table - Wed 1 May

by: isladogs | last post by:

The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...

Microsoft Access / VBA

6780

Couldn’t get equations in html when convert word .docx file to html file in C#.

by: conductexam | last post by:

I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...

C# / C Sharp

5437

Trying to create a lan-to-lan vpn between two differents networks

by: TSSRALBI | last post by:

Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...

Networking - Hardware / Configuration

4113

transfer the data from one system to another through ip address

by: 6302768590 | last post by:

Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system

C# / C Sharp

3723

How to add payments to a PHP MySQL app.

by: muto222 | last post by:

How can i add a mobile payment intergratation into php mysql website.

PHP

2920

Comprehensive Guide to Website Development in Toronto: Expert Insights from BSMN Consultancy

by: bsmnconsultancy | last post by:

In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

General