MATCH_RECOGNIZE and Data Vault Effectivity Satellite - Part II
Effectivity Satellite is a powerful Data Vault artifact that is used to track the effectivity of a relationship. For e.g. it can be used to track the relationship between:
- Order and Customer, or
- Opportunity and Account, or
- Subscription Tier and Customer etc.
A Data Vault Effectivity Satellite hangs from the LINK, and only contains the Driving Business Key, Secondary Foreign Key and the Start Date and End Date for the given Relationship.
Example of an Effectitity Satellite
CUSTOMER_BK | SUBSCRIPTION_TIER_BK | START_DATE | END_DATE |
---|---|---|---|
Scott | Personal | 2021-01-01 | 9999-12-31 |
Scott | Personal | 2021-01-01 | 2021-06-01 |
Scott | Free | 2021-06-01 | 9999-12-31 |
Bob | Business | 2021-02-01 | 9999-12-31 |
Angela | Premium | 2021-02-01 | 9999-12-31 |
Angela | Premium | 2021-02-01 | 2021-03-01 |
Angela | Free | 2021-03-01 | 9999-12-31 |
Angela | Free | 2021-03-01 | 2021-05-15 |
Angela | Premium | 2021-05-15 | 9999-12-31 |
Ryan | Free | 2021-03-01 | 9999-12-31 |
The above Effectivity Satellite is used to track the subscription tier a Customer has purchased. The CUSTOMER_BK
is the Driving Key, and the SUBSCRIPTION_TIER_BK
is the Secondary Foreign Key. The START_DATE
and the END_DATE
indicate when the Subscription was purchased and when it was ended.
Note: Since Data Vault 2.0 is an insert only pattern, we do not update the existing record to set the
END_DATE
, instead we insert a new record with theEND_DATE
set.
Effectivity Satellite and Data Analysis
The structure of an Effectivity Satellite lends itself to Longitudinal Data Analysis. You can identify patterns, perform timeseries analysis or even use the data directly in statistical learning model.
MATCH_RECOGNIZE and pattern recognition
In relational systems, a row pattern recognition task is to detect a sequence of ordered rows from an input table that match a specified pattern. For example, a financial service provider needs to identify sequences of suspicious transactions that match known patterns of criminal activities; an e-commerce site analyzes the steps taken by customers from landing through a social media referrer to a successful purchase. The MATCH_RECOGNIZE clause in SQL allows users to search for patterns in rows of data using a powerful and expressive syntax that is based on RegEx.
One way to analyze data in an Effectivity Satellite is to use SQL’s MATCH_RECOGNIZE clause.
MATCH_RECOGNIZE clause in a SQL Query allows us to:
- Define patterns using REGEX convention
- Match longitudinal data against those patterns
- Identify records that match the patterns
Pattern Definition
In the MATCH_RECOGNIZE clause, the pattern is constructed from basic building blocks, called pattern variables, to which operators (quantifiers and other modifiers) like Kleene Star *
and Kleene Plus +
and can be applied. The whole pattern must be enclosed in paranthesis. For example a pattern can be defined as following:
(init modification+ reversal)
The following MATCH_RECOGNIZE query can be used on the above Effectivity Satellite to identity Customer(s) that started with Subscription Tier, switched to a different Subscription Tier, and then switched back to the first Subscription Tier they had.
with effectivity_sat as(
select *
from sate_customer_subscription
where end_date = '9999-12-31'
)
select *
from effectivity_sat
match_recognize(
partition by customer_bk
order by start_date
measures
classifier() as action
all rows per match
pattern (init modification+ reversal)
define
init as subscription_tier_bk = first_value(subscription_tier_bk)
, modification as subscription_tier_bk <> first_value(subscription_tier_bk)
, reversal as subscription_tier_bk = first_value(subscription_tier_bk)
);
CUSTOMER_BK | SUBSCRIPTION_TIER_BK | START_DATE | END_DATE | ACTION |
---|---|---|---|---|
Angela | Premium | 2021-02-01 | 9999-12-31 | INIT |
Angela | Free | 2021-03-01 | 9999-12-31 | MODIFICATION |
Angela | Premium | 2021-05-15 | 9999-12-31 | REVERSAL |
There are three tiers of PAID plans– Personal, Premium and Business. A Customer may switch between these tiers multiple times before finally downgrading to a FREE tier.
Query Output
The query identified the following Customers started with a Premium Subscription Tier, switched to a Free Subscription Tier, and then eventually switched back to the original Subscription Tier (Premium).
CUSTOMER_BK | SUBSCRIPTION_TIER_BK | START_DATE | END_DATE | ACTION |
---|---|---|---|---|
Angela | Premium | 2021-02-01 | 9999-12-31 | INIT |
Angela | Free | 2021-03-01 | 9999-12-31 | MODIFICATION |
Angela | Premium | 2021-05-15 | 9999-12-31 | REVERSAL |
See also:
- A/B Analysis on Streaming Data using MATCH_RECOGNIZE
- Applied overview of MATCH_RECOGNIZE clause
- Correlated pattern definitions and Snowflake’s MATCH_RECOGNIZE
- How to Predict Customer Churn Using SQL Pattern Detection - MATCH_RECOGNIZE
- Association Rule Mining using MATCH_RECOGNIZE
- MATCH_RECOGNIZE
- Stock Analysis using MATCH_RECOGNIZE
- MATCH_RECOGNIZE and Data Vault Effectivity Satellite - Part II
- MATCH_RECOGNIZE and Data Vault Effectivity Satellite
- What’s eating up your Snowflake Virtual Warehouse - Part II featuring MATCH_RECOGNIZE