Cumulative Metrics
What is a cumulative metric?
Cumulative metrics aggregate a measure over a given window. If no window is specified, the window is considered infinite and would accumulate the measure over all time.
Window Options
If a window option is specified, the MetricFlow framework applies a sliding window to the underlying measure.
Suppose the underlying measure distinct_users
is configured as such to reflect a count of distinct users by user_id and user_status.
measures:
- name: distinct_users
description: The number of distinct users creating mql queries
expr: case when user_status in ('PENDING','ACTIVE') then user_id else null end
agg: count_distinct
We can write a cumulative metric wau_rolling_7
as such:
---
metric:
name: wau_rolling_7
owners:
- support@transformdata.io
# Define the measure and the window.
type: cumulative
type_params:
measures:
- distinct_users
# the default window is infinity - omitting window will accumulate the measure over all time
window: 7 days
From the sample yaml above, note the following:
type
: Specify cumulative to indicate the type of metric.type_params
: Specify the measure to be aggregated as a cumulative metric. You have the option of specifying awindow
, or agrain to date
.
In the cumulative metric wau_rolling_7
, the MetricFlow framework first takes a sliding 7 day window of all relevant users, and then applies the measure's count distinct function.
Note that omitting window
will accumulate the measure over all time. Otherwise, the available granularities are:
- day or days
- week or weeks
- quarter or quarters
- month or months
As an example, you can use simple phrases to describe the window:
- 7 days
- 1 month
- 3 months
Omitting Window Example Use Case
Suppose you (a subscription-based company for the sake of this example) have an event-based log table with the following columns:
date
: a date columnuser_id
: (integer) an ID specified for each user that is responsible for the eventsubscription_plan
: (integer) a column that indicates a particular subscription plan associated with the user.subscription_revenue
: (integer) a column that indicates the value associated with the subscription plan.event_type
: (integer) a column that populates with +1 to indicate an added subscription, or -1 to indicate a deleted subscription.revenue
: (integer) a column that multipliesevent_type
andsubscription_revenue
to depict the amount of revenue added or lost for a specific date.
We can use cumulative metrics without a window specified to obtain a running total, which can be used to calculate metrics like the count of active subscriptions and revenue at any point in time.
Below would be the configuration YAML to create cumulative metrics (without a window) to obtain the current revenue or total number of active subscriptions (as a cumulative sum).
measures:
- name: revenue
description: Total revenue
agg: sum
expr: revenue
- name: subscription_count
description: Count of active subscriptions
agg: sum
expr: event_type
---
metric:
name: current_revenue
description: Current revenue
owners:
- support@transformdata.io
type: cumulative
type_params:
measures:
- revenue
---
metric:
name: active_subscriptions
description: Count of active subscriptions
owners:
- support@transformdata.io
type: cumulative
type_params:
measures:
- subscription_count
Grain to Date
You can optionally specify grain to date in your cumulative metric configuration to accumulate the metric from the start of a grain (i.e. week, year, month). When using a window (like month), Transform will go back one full calendar month, whereas grain to date will always start at the beginning of the grain regardless of the latest date of data.
Suppose we have the underlying measure of total_revenue
:
measures:
- name: total_revenue
description: Total revenue (summed)
agg: sum
expr: revenue
We can compare the difference between a window of 1 month vs. a grain to date by month. Notice how for a window, the cumulative metric behaves by applying a sliding window of 1 month whereas the grain to date by month acts as a resetting point at the beginning of each month.
---
metric:
name: revenue_monthly_window #For this metric, we use a window of 1 month
description: Monthly revenue using a window of 1 month (think of this as a sliding window of 30 days)
owners:
- support@transformdata.io
type: cumulative
type_params:
measures:
- total_revenue
window: 1 month
---
metric:
name: revenue_monthly_grain_to_date #For this metric, we use a monthly grain to date
description: Monthly revenue using a grain to date of 1 month (think of this as a monthly resetting point)
owners:
- support@transformdata.io
type: cumulative
type_params:
measures:
- total_revenue
grain_to_date: month
Implementation
The current implementation joins the table we're using to calculate the metric to a timespine table. The join key is the primary time dimension. We use the accumulation window in the join to determine if a record should be included on a given day. The generated SQL from an example cumulative metric is shown below:
-- Constrain Output with WHERE
-- Pass Only Elements:
-- ['distinct_users', 'metric_time']
-- Aggregate Measures
-- Compute Metrics via Expressions
-- Order By [] Limit 100
SELECT
COUNT(DISTINCT distinct_users) AS weekly_active_users
, metric_time
FROM (
-- Join Standard Outputs
-- Pass Only Elements:
-- ['distinct_users','metric_time']
-- Constrain Time Range to [2000-01-01T00:00:00, 2040-12-31T00:00:00]
SELECT
subq_3.distinct_users AS distinct_users
, subq_3.metric_time AS metric_time
FROM (
-- Join Self Over Time Range
SELECT
subq_2.distinct_users AS distinct_users
, subq_1.metric_time AS metric_time
FROM (
-- Date Spine
SELECT
metric_time
FROM transform_prod_schema.mf_time_spine subq_1356
WHERE (
metric_time >= CAST('2000-01-01' AS TIMESTAMP)
) AND (
metric_time <= CAST('2040-12-31' AS TIMESTAMP)
)
) subq_1
INNER JOIN (
-- Read elements from data source 'transactions'
-- Constrain Time Range to [1999-12-26T00:00:00, 2040-12-31T00:00:00]
-- Pass Only Elements:
-- ['distinct_users', 'metric_time']
SELECT
distinct_users AS distinct_users
, DATE_TRUNC('day', ds) AS metric_time
FROM demo_schema.transactions transactions_src_426
WHERE (
(DATE_TRUNC('day', ds)) >= CAST('1999-12-26' AS TIMESTAMP)
) AND (
(DATE_TRUNC('day', ds)) <= CAST('2040-12-31' AS TIMESTAMP)
)
) subq_2
ON
(
subq_2.metric_time <= subq_1.metric_time
) AND (
subq_2.metric_time > DATEADD(day, -7, subq_1.metric_time)
)
) subq_3
)
GROUP BY
metric_time
LIMIT 100