1. Create replication slot
This way, you preserve WAL (new changes) from being deleted. This would also create a snapshot of a current state. Snapshot can be exported, or reused in current transaction.
2. Import existing data.
In replication only changes are streamed. Current state needs to be backuped manually.
There is a mechanism to make a backup of a running server, called “base backup”. You need to specify snapshot name, to avoid data races. Base backup is tar archive of current pg_data folder, that can be used as replica initial data.
Base backup would not work here, but we have a snapshot, that allows us to query DB, using that exact state.
So we fetch all the data in old fashioned way, using
COPY foo TO or
SELECT FROM foo command.
COPY is prefered, since it’s faster, and is available in simple protocol.
3. Subscribe and Stream changes. Protocol
START_REPLICATION SLOT command server sends all updates. Format depends of slot. Physical is raw WAL, while Logical depends of decoding plugin, specified it this command
pgoutput for example, uses same encoding as regular query results. Since 14.0 binary encoded rows are also awailable.
4. Always Commit changes
After data was successfuly processed and is no longer needed, update WAL positions via
Standby status update message. This would unlock garbage collection of old log files.
5. Resume streaming
After restart (regular, crash, powerloss…)
If slot IS NOT
Goto P3 and continue streaming from last commited position, no additional actions are required.
If slot IS
Clean the data Goto P1, and start from scratch