Release of nxs-data-anonymizer 1.4.0, a database anonymization tool

nxs-data-anonymizer 1.4.0 has been released. It is a tool for anonymizing PostgreSQL and MySQL/MariaDB/Percona database dumps. The utility anonymizes data using templates and functions from the Sprig library, and a field can be filled using the values of other columns in the same row. The tool can also be used in command-line pipelines (unnamed pipes), streaming a dump from the source database directly into the target database with the required transformations applied along the way. The tool is written in Go and distributed under the Apache License 2.0.
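Sprig is a function library for Go's standard template engine, so the idea of filling a field from other columns of the same row can be illustrated with `text/template` alone. The sketch below is not the tool's code. The `Row` type, its `Values` map and the template string are hypothetical and only show how a template can reference another column of the current row. No Sprig functions are used, which keeps the example dependency-free.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Row is a hypothetical container for the current row's column values.
// It is NOT nxs-data-anonymizer's real data model.
type Row struct {
	Values map[string]string
}

// renderValue parses a Go template and executes it against the row,
// so the template can read any column of the same row.
func renderValue(tmpl string, row Row) (string, error) {
	t, err := template.New("value").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, row); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	row := Row{Values: map[string]string{"id": "42", "email": "alice@corp.example"}}
	// Build a replacement email from another column (id) of the same row.
	out, err := renderValue("user{{ .Values.id }}@example.com", row)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // user42@example.com
}
```

Deriving the new value from a stable column such as a primary key keeps the anonymized value unique per row without exposing the original email.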

In the less than a year since the first release, version 1.0.0, the following features have been added:

  • A filter function for working with NULL values has been added.
  • The -l/--log-format option has been added, allowing you to choose the logging format (json or plain).
  • Progress reporting has been added for the anonymization process: the progress of the operation is displayed at specified intervals.
  • Version 1.4 makes it possible to set field values using external commands by specifying “type: command” in a column's filter settings. For example:
      filters:
        some_table_name:
          columns:
            some_column_name:
              type: command
              value: /path/to/command/or/script.sh

    If “type: command” is set for a column, the contents of the value field are treated as the path to a command that is run every time this field is processed. The following additional environment variables are available to the command while it runs:

    • ENVVARTABLE={TABLE_NAME}: contains the name of the table to be filtered
    • ENVVARCOLUMN_{COLUMN_NAME}={COLUMN_VALUE}: contains every column of the current filtered row and its value (before replacement).

    In summary, the command mechanism works as follows:

    • The command's stdout is used as the new value of the anonymized field.
    • The command must exit with code 0; otherwise nxs-data-anonymizer terminates with an error and uses the command's stderr as the error text.
    • Environment variables with the row data are available inside the command: ENVVARTABLE contains the name of the filtered table; ENVVARCURCOLUMN contains the name of the current column; ENVVARCOLUMN_{COLUMN_NAME} contains the values (before replacement) of all columns of the current row.
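The contract above (row data in environment variables, new value on stdout, non-zero exit code plus stderr on failure) can be met by any executable. Below is a minimal sketch of such a command written in Go. It is an illustration, not part of nxs-data-anonymizer. It assumes that the {COLUMN_NAME} suffix of ENVVARCOLUMN_ matches the value of ENVVARCURCOLUMN exactly, and that the replacement should be a deterministic pseudonym (a truncated SHA-256 hash). The function names and the hashing scheme are hypothetical choices.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
)

// pseudonymize returns a deterministic replacement for an original value:
// the column name plus the first 12 hex characters of its SHA-256 digest.
func pseudonymize(column, original string) string {
	sum := sha256.Sum256([]byte(original))
	return column + "_" + hex.EncodeToString(sum[:])[:12]
}

// newValue reads the current column name and its original value from the
// environment. Assumption: the ENVVARCOLUMN_ suffix equals ENVVARCURCOLUMN.
func newValue(getenv func(string) string) (string, error) {
	column := getenv("ENVVARCURCOLUMN")
	if column == "" {
		return "", fmt.Errorf("table %q: ENVVARCURCOLUMN is not set", getenv("ENVVARTABLE"))
	}
	original := getenv("ENVVARCOLUMN_" + column)
	return pseudonymize(column, original), nil
}

func main() {
	value, err := newValue(os.Getenv)
	if err != nil {
		// Non-zero exit code: the anonymizer fails and reports this stderr text.
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Stdout becomes the new field value; no trailing newline is written
	// in case the anonymizer does not trim it (assumption).
	fmt.Print(value)
}
```

After building it (for example with `go build -o pseudonymize`), the path to the resulting binary would go into the value field of a “type: command” filter. Because the command is started once per processed field, a compiled binary avoids the interpreter start-up cost that a shell or Python script would add on large tables.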
