Yuriy Markiv blog


Track db changes - logical decoding approach in PostgreSQL RDS (version 0)

04 June, 2019 | tech

Prerequisites

We need to audit changes made in the database. We have a list of tables to be tracked, and the list can change over time.

A previously known solution is trigger-based audit tables. It works, but it is a rather old-fashioned approach: it requires creating audit tables and triggers, and whenever the structure of a tracked table changes, the audit tables and triggers must be changed as well.


Modern solution

PostgreSQL offers a more modern solution: logical decoding. This feature is supported by Amazon RDS.

To start using it, we need to change the wal_level parameter to logical.


On a self-hosted instance, the parameter lives in postgresql.conf; depending on the OS the file can be located in different directories. On Ubuntu 18.04.2 LTS it is /etc/postgresql/10/main/postgresql.conf

Restart the database instance after changing it.


On PostgreSQL RDS the parameter is changed from the AWS console.

Amazon docs regarding this problem: (a) https://aws.amazon.com/premiumsupport/knowledge-center/rds-modify-parameter-group-values/ (b) https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html#PostgreSQL.Concepts.General.FeatureSupport.LogicalReplication

Long story short: (1) create a parameter group; (2) set the static parameter rds.logical_replication to 1, which in turn sets the wal_level parameter to logical; (3) apply the parameter group to your DB instance; (4) reboot the DB instance.
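After the reboot, you can verify that the setting took effect from any SQL client:

```sql
-- Should return 'logical' once rds.logical_replication is applied
SHOW wal_level;
```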


The next steps work for both self-hosted and RDS-based PostgreSQL instances.


Now open an SQL editor.

Run the query below to create a logical replication slot:

select * from pg_create_logical_replication_slot('pgday_slot','test_decoding');

(pgday_slot is an arbitrary slot name, so put something meaningful there if needed; test_decoding is the built-in output plugin that formats the decoded changes)
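To check that the slot was created (and to see any other slots on the instance), you can query the pg_replication_slots system view:

```sql
-- Each logical slot appears here with its output plugin
select slot_name, plugin, slot_type, active
from pg_replication_slots;
```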


Create test table:

create table table1 (
    id SERIAL PRIMARY KEY,
    data character varying(50),
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_by character varying(50) DEFAULT CURRENT_USER
);


Insert some data:

INSERT INTO table1 (data)
values ('test value 1');


Update some data:

UPDATE table1 set data = 'test value 2' where id = 1;


Delete some data:

Delete from table1 where id = 1;


See all your changes:

select data from pg_logical_slot_get_changes('pgday_slot',NULL,NULL) where upper(data) like upper('%table1%');


You will get log data like this:

"table public.table1: INSERT: id[integer]:1 data[character varying]:'test value 1' updated_at[timestamp without time zone]:'2019-06-03 20:10:47.872703' updated_by[character varying]:'user1'"
"table public.table1: UPDATE: id[integer]:1 data[character varying]:'test value 2' updated_at[timestamp without time zone]:'2019-06-03 20:10:47.872703' updated_by[character varying]:'user1'"
"table public.table1: DELETE: id[integer]:1"


! Please note that INSERT and UPDATE can be tracked together with the timestamp and user only because the table has columns defaulting to CURRENT_TIMESTAMP and CURRENT_USER. The current version of this solution will not show the timestamp and user for DELETE; that is the cost of its simplicity.


Now the data should be stored in a log table, because pg_logical_slot_get_changes consumes the changes: each time you read the slot with it, the returned entries disappear from the system to save space.
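If you only want to inspect pending changes without consuming them, PostgreSQL also provides pg_logical_slot_peek_changes, which takes the same arguments but leaves the entries in place:

```sql
-- Unlike pg_logical_slot_get_changes, this does not consume the entries:
-- the same rows will be returned again on the next call
select data
from pg_logical_slot_peek_changes('pgday_slot', NULL, NULL)
where upper(data) like upper('%table1%');
```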


Create log table:

Create table log (
    id SERIAL PRIMARY KEY,
    log_text text,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_by character varying(50) DEFAULT CURRENT_USER
);


Insert some data into table1:

INSERT INTO table1 (data)
values ('test value 1');

INSERT INTO table1 (data)
values ('test value 2');


Store log entries:

do $$
declare
    rec record;
begin
    for rec in
        select data
        from pg_logical_slot_get_changes('pgday_slot', NULL, NULL)
        where upper(data) like upper('%table1%')
          and upper(data) not like upper('%table public.log:%')
    loop
        if rec.data is not null then
            insert into log (log_text) values (rec.data);
        end if;
    end loop;
end $$;


Test result:

Select log_text from log;


You should receive something like:

"table public.table1: INSERT: id[integer]:14 data[character varying]:'test value 1' updated_at[timestamp without time zone]:'2019-06-03 21:06:05.128496' updated_by[character varying]:'user1'"
"table public.table1: INSERT: id[integer]:15 data[character varying]:'test value 2' updated_at[timestamp without time zone]:'2019-06-03 21:06:05.128496' updated_by[character varying]:'user1'"


Known yet-to-dos: (1) create a settings table to store the names of the tables to be tracked plus the name of the log table; (2) read the settings table instead of the hardcoded filters upper(data) like upper('%table1%') and upper(data) not like upper('%table public.log:%')

This way we will be able to easily add and remove table names to be tracked. Without a settings table the table names have to be hardcoded, which is not nice.
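As an illustration of item (1), the settings table could look something like this (the names tracked_tables and table_name are just placeholders, not part of the solution above):

```sql
-- Illustrative sketch only: table and column names are assumptions
create table tracked_tables (
    id SERIAL PRIMARY KEY,
    table_name character varying(100) NOT NULL UNIQUE
);

insert into tracked_tables (table_name) values ('table1');
```

The DO block above could then loop over tracked_tables and build the LIKE filters dynamically instead of hardcoding '%table1%'.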