The PostgreSQL Anonymizer extension can be used for masking or replacing personally identifiable information (PII) or commercially sensitive data in a YSQL database.

The extension has a declarative approach to anonymization. This means you can declare the masking rules using the PostgreSQL Data Definition Language (DDL) and specify your anonymization policy inside the table definition itself.

YugabyteDB uses v1.3.1 of PostgreSQL Anonymizer.

Enable Anonymizer

To enable the Anonymizer extension, add anon to shared_preload_libraries in the PostgreSQL server configuration parameters using the YB-TServer --ysql_pg_conf_csv flag:

--ysql_pg_conf_csv=shared_preload_libraries=anon

Note that modifying shared_preload_libraries requires restarting the YB-TServer.

Customize Anonymizer

You can customize the following anon parameters:

Parameter Description Default
anon.algorithm The hashing method used by pseudonymizing functions. sha256
anon.maskschema The schema (that is, the namespace) where the dynamic masking views will be stored. mask
anon.restrict_to_trusted_schemas By enabling this parameter, masking rules must be defined using functions located in a limited list of namespaces. true
anon.salt The salt used by pseudonymizing functions. (empty)
anon.sourceschema The schema (that is, the namespace) where the tables are masked by the dynamic masking engine. public

For more information, refer to Configuration in the Anonymizer documentation.

Create the anon extension

To enable the extension:

CREATE EXTENSION anon;

If you want to use the anon.fake_* functions, you need to load the fake data (see Declare masking rules).

BEGIN;
SET LOCAL yb_non_ddl_txn_for_sys_tables_allowed TO TRUE;
SELECT anon.init(); -- Loads fake data
COMMIT;

Use Anonymizer

Declare masking rules

Too use Anonymizer, you first declare a masking policy. A masking policy is a set of masking rules stored inside the database model and applied to various database objects.

You declare data masking rules using security labels. For example:

CREATE TABLE player( id SERIAL, name TEXT, total_points INT, highest_score INT);

INSERT INTO player VALUES
  ( 1, 'Kareem Abdul-Jabbar', 38387, 55),
  ( 5, 'Michael Jordan', 32292, 69);

SECURITY LABEL FOR anon ON COLUMN player.name
  IS 'MASKED WITH FUNCTION anon.fake_last_name()';

SECURITY LABEL FOR anon ON COLUMN player.id
  IS 'MASKED WITH VALUE NULL';

Anonymizer provides many different functions that you can use to declare the masking rules. For a list of masking functions, refer to Masking functions in the Anonymizer documentation.

Note that YugabyteDB does not currently support the anon.dummy_ functions.

Refer to Declare masking rules in the Anonymizer documentation for more information.

Dynamic masking

To enable dynamic masking:

SELECT anon.start_dynamic_masking(false);

You must run this every time a masked security label is created for a user or role.

The boolean parameter indicates whether fake data should be loaded or not. It is recommended to use anon.init() to load fake data. This creates masked views on the anon.maskschema for all the tables present in anon.sourceschema, and alters the privileges of all users with a masked security label so that a masked user will only be able to read masked data and not the original data.

To check if a role with a masked security label will see masked values, use the following query:

SELECT rolname, rolconfig FROM pg_roles WHERE rolname = '<role_name>'

If rolconfig contains search_path=<anon.maskschema>, <anon.sourceschema>, then the masked user will see masked values.

The following shows output where the anon.maskschema and anon.sourceschema parameters are set to their default values:

 rolname |          rolconfig
---------+------------------------------
 skynet  | {"search_path=mask, public"}

Note that Backup and restore doesn't preserve roles, and will also not restore masked security labels for roles. After a restore, you will need to manually recreate security labels for roles, and then enable dynamic masking.

To disable dynamic masking:

SELECT anon.stop_dynamic_masking();

This drops all the masked views, unmasks all the masked roles, and drops anon.maskschema.

Static masking

After declaring masking rules for columns, you can use the following parameters to define masks:

  • anon.anonymize_column(<table-name>, <column-name>) - Anonymize the column of the table with the declared masking rule.
  • anon.anonymize_table(<table-name>) - Anonymize all the columns with masking rules of the table.
  • anon.anonymize_database(<database-name>) - Anonymize all the columns with masking rules of all the tables in the given database.

With static masking, the data is lost forever once it is anonymized. Refer to Static masking in the Anonymizer documentation for more information.

Examples

Dynamically mask contact information

CREATE EXTENSION anon;

BEGIN;
SET LOCAL yb_non_ddl_txn_for_sys_tables_allowed TO true;
SELECT anon.init(); -- Loads fake data
COMMIT;

CREATE TABLE people ( id TEXT, firstname TEXT, lastname TEXT, phone TEXT);
INSERT INTO people VALUES ('1', 'John', 'Doe','1234567890');
SELECT * FROM people; -- non masked user can read original values
 id | firstname | lastname |   phone    
----+-----------+----------+------------
 1  | John      | Doe      | 1234567890
SECURITY LABEL FOR anon ON COLUMN people.phone
IS 'MASKED WITH FUNCTION anon.partial(phone, 2, $$******$$, 2)';

CREATE ROLE skynet LOGIN;
SECURITY LABEL FOR anon ON ROLE skynet IS 'MASKED';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO skynet;

\c yugabyte skynet

SELECT * FROM people;
 id | firstname | lastname |   phone    
----+-----------+----------+------------
 1  | John      | Doe      | 1234567890

Note how, as we have not yet started dynamic masking, the data is not masked for the masked user.

\c yugabyte yugabyte

SELECT anon.start_dynamic_masking(false);

\c yugabyte skynet

SELECT * FROM people;
 id | firstname | lastname |   phone    
----+-----------+----------+------------
 1  | John      | Doe      | 12******90

Prevent a user from reading values from the source table

First set the cluster-wide value of anon.sourcesschema to test:

--ysql_pg_conf_csv=shared_preload_libraries=anon,anon.sourceschema=test

Execute the following commands:

CREATE EXTENSION anon;

BEGIN;
SET LOCAL yb_non_ddl_txn_for_sys_tables_allowed TO true;
SELECT anon.init(); -- Loads fake data
COMMIT;

CREATE SCHEMA test;

CREATE TABLE test.people ( id TEXT, firstname TEXT, lastname TEXT, phone TEXT);
INSERT INTO test.people VALUES ('1', 'John', 'Doe','1234567890');

CREATE ROLE skynet LOGIN;
SECURITY LABEL FOR anon ON ROLE skynet IS 'MASKED';

SECURITY LABEL FOR anon ON COLUMN test.people.phone
IS 'MASKED WITH FUNCTION anon.partial(phone, 2, $$******$$, 2)';

SELECT anon.start_dynamic_masking(false);

\c yugabyte skynet

SELECT * FROM people;
 id | firstname | lastname |   phone    
----+-----------+----------+------------
 1  | John      | Doe      | 12******90
SELECT * FROM test.people;
ERROR:  permission denied for schema test
LINE 1: SELECT * FROM test.people;

Statically mask contact information

CREATE EXTENSION anon;

BEGIN;
SET LOCAL yb_non_ddl_txn_for_sys_tables_allowed TO true;
SELECT anon.init();
COMMIT;

CREATE TABLE people ( id TEXT, firstname TEXT, lastname TEXT, phone TEXT);
INSERT INTO people VALUES ('1', 'John', 'Doe','1234567890');

SECURITY LABEL FOR anon ON COLUMN people.lastname
IS 'MASKED WITH FUNCTION anon.fake_last_name()';

SECURITY LABEL FOR anon ON COLUMN people.phone
IS 'MASKED WITH FUNCTION anon.partial(phone, 2, $$******$$, 2)';

SELECT anon.anonymize_table('people');

SELECT * FROM people;
 id | firstname | lastname |   phone    
----+-----------+----------+------------
 1  | John      | Bryant   | 12******90

Mask without faking

CREATE EXTENSION anon;

CREATE TABLE people ( id TEXT, firstname TEXT, lastname TEXT, phone TEXT);
INSERT INTO people VALUES ('1', 'John', 'Doe', '1234567890');

SECURITY LABEL FOR anon ON COLUMN people.firstname
IS 'MASKED WITH VALUE $$CONFIDENTIAL$$';

SECURITY LABEL FOR anon ON COLUMN people.lastname
IS 'MASKED WITH VALUE $$CONFIDENTIAL$$';

SECURITY LABEL FOR anon ON COLUMN people.phone
IS 'MASKED WITH FUNCTION anon.partial(phone, 2, $$******$$, 2)';

CREATE ROLE skynet LOGIN;
SECURITY LABEL FOR anon ON ROLE skynet IS 'MASKED';

SELECT anon.start_dynamic_masking(false);

\c yugabyte skynet

SELECT * FROM people;
 id |  firstname   |   lastname   |   phone    
----+--------------+--------------+------------
 1  | CONFIDENTIAL | CONFIDENTIAL | 12******90

Limitations

  • Masking views and materialized views are not supported.
  • The SECURITY LABEL commands on tables and databases are not supported.
  • YugabyteDB does not currently support the anon.dummy_ functions.

Refer to Masking rule limitations and Legacy masking rule limitations in the Anonymizer documentation for information on the Anonymizer extension limitations.