Data Metric Function in Snowflake
Data Metric Functions (DMF) allow you to automate data quality validation and monitoring. DMFs allow you to:
- Create custom quality metric rules as a reusable functions
- Apply rules to one or more columns
- Specific how frequently you want the quality metrics to be calculated
- Execute metric functions ad-hoc (for testing) or incorporate into your pipelines
- Metrics are executed by the serverless backend, no need to keep your warehouse running
- View and manage all of your quality metrics in a single place
Pre-defined Data Metric Functions (DMF) in Snowflake
Snowflake has the following DMF pre-defined:
FRESHNESS
NULL_COUNT
DUPLICATE_COUNT
UNIQUE_COUNT
ROW_COUNT
Trying out pre-defined DMF
You can use try out the pre-defined DMF functions in Snowflake as following:
select snowflake.core.null_count(select email_address from customer_dim);
select snowflake.core.duplicate_count(select email_address from customer_dim);
select snowflake.core.unique_count(select email_address from customer_dim);
Attaching a pre-defined Data Metric Function to a Table
The following ALTER
will attach a built-in Data Metric Function NULL_COUNT
to the EMAIL_ADDRESS
column in the CUSTOMER_DIM
table
ALTER TABLE customer_dim ADD DATA METRIC FUNCTION NULL_COUNT ON (EMAIL_ADDRESS);
Tell Snowflake to evaluate the DMF every hour:
ALTER TABLE customer_dim SET DATA_METRIC_SCHEDULE = '1 HOUR';
Inpect the quality metrics:
SELECT * FROM SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS
Custom Data Metric Functions
Defining a custom Data Metric Functions (DMF)
Creating a Data Metric Function is as simple as creating a UDF. The function takes a new TABLE data type with one or more column arguments and returns a single result value.
CREATE DATA METRIC FUNCTION count_of_nulls(
ARG_T TABLE (ARG_C1 STRING, ARG_C2 STRING, ARG_C3 STRING)
)
RETURNS NUMBER
AS
$$
SELECT COUNT_IF(ARG_C1 IS NULL OR ARG_C2 IS NULL OR ARG_C3 IS NULL)
from ARG_T
$$;
All Data Metric Functions are added to your warehouse so other users can reuse them in a consistent manner without having to recreate the same rules over and over.
You can see which metrics are available by running:
SHOW DATA METRIC FUNCTIONS IN <DATABASE_NAME>.<SCHEMA_NAME>;
Attaching custom Data Metric Function to a Table
You attach the custom Data Metric Function to the columns you want to evaluate:
ALTER TABLE CUSTOMER_DIM
ADD DATA METRIC FUNCTION count_of_nulls ON (FIRST_NAME, LAST_NAME, EMAIL_ADDRESS);
Next, you set the interval at which metrics should be calculated. This is configured at the table level and can be as frequent as 1 minute.
ALTER TABLE some_table SET DATA_METRIC_SCHEDULE = '1 HOUR';
You can test a customer DMFs by executing them ad-hoc or include them directly in your data pipelines running on Snowflake for in-line validation. This is how you execute a DMF:
SELECT scratch.saqib_ali.count_of_nulls (
SELECT FIRST_NAME, LAST_NAME, EMAIL_ADDRESS FROM CUSTOMER_DIM
) AS VALUE;