Skip to main content

Join Logic

Joins are a powerful part of MetricFlow and allow you to make all valid dimensions available for your metrics on the fly, even if they are defined in a different data source. Joins also alow you to construct metrics using measures from different data sources.

MetricFlow uses the identifiers defined in your data source configs as the join keys between tables. Assuming the identifiers are set, MetricFlow can perform joins under the hood by creating a graph using the data sources as nodes and the join paths as edges. MetricFlow will generate the appropriate join type and avoid fan-out or chasm joins with other tables based on the identifier types.

Example:

This example uses two data sources with a common identifier and shows a MetricFlow query that requires a join between the two data sources.

Say you have two data sources, transactions and user_signup as seen below:

data_source:
name: transactions
identifiers:
- name: id
type: primary
- name: user_id
type: foreign
measures:
- name: average_purchase_price
agg: avg
expr: purchase_price
create_metric: true
--
data_source:
name: user_signup
identifiers:
- name: user_id
type: primary
dimension:
- name: type
type: categorical

Under the hood, MetricFlow will use user_id as the join key between the data sources transactions and user_signup. As a result, we can now ask for the metric average_purchase_price (in the data source transactions) to be sliced via the dimension type from data source user_signup.

Additionally, notice that the metric average_purchase_price is defined in transactions, which has user_id as a primary identifier. Furthermore, the data source user_signup has user_id as a primary identifier. As a reuslt, this is a primary to foreign relationship, and implemented as a left join (transactions left join user_signup, as the measure average_purchase_price is defined in transactions).

note

If you have multiple dimensions with the same name, it would be possible for an ambiguous join path to be created. To avoid this, use a dimension which includes the prefixed identifier that's meant to disambiguate where the dimension comes from. You can see the fully scoped dimension name by running mql list-metrics in the CLI.

When querying dimensions from different data sources, the convention is to prefix the dimension name with a dunder (double underscore) following the joining identifier in order to avoid ambiguous join paths. The CLI query below includes user_id__type for the argument dimensions; the query is specified to jump from data source transactions (where the metric average_purchase_price is defined) to the data source user_signup via the identifier user_id to obtain the dimension type.

mql query --metrics average_purchase_price --dimensions ds,user_id__type 

Types of Joins

MetricFlow implements most joins as left joins, and does not allow fan-out and chasm joins. Please reference the table below to identify which joins are/are not allowed based on specific identifer types in order to avoid constructing dangerous joins.

Identifier type - Table AIdentifier type - Table BJoin type
PrimaryPrimaryLeft
PrimaryUniqueLeft
PrimaryForeignFanout (Not Allowed)
UniquePrimaryLeft
UniqueUniqueLeft
UniqueForeignFanout (Not Allowed)
ForeignPrimaryLeft
ForeignUniqueLeft
ForeignForeignFanout (Not Allowed)

Multi-Hop Joins

Transform allows users to join measures and dimensions across a graph of identifiers. We call this a 'multi-hop join' as a user can 'hop' across one table to another.

Observe the schema below for the following example discussion:

Multi-Hop-Join

Notice how this schema can be translated into the 3 MetricFlow data sources below to create the metric 'Average Purchase Price by Country' using the purchase_price measure from the sales table and the country_name dimension from the country_dim table.

data_source:
name: sales
identifiers:
- name: id
type: primary
- name: user_id
type: foreign
measures:
- name: average_purchase_price
agg: avg
expr: purchase_price
create_metric: true
--
data_source:
name: user_signup
identifiers:
- name: user_id
type: primary
- name: country_id
type: Unique
dimension:
- name: signup_date
type: time
type_params:
is_primary: true
--
data_source:
dame: country_dim
identifiers:
- name: country_id
type: primary
dimension:
- name: country_name
type: categorical

Using Multi-Hop Joins

To query dimensions without a multi-hop join involved, you use the dimension name prefixed with a dunder (double underscore) and identifier. The same paradigm applies for dimensions retrieved by a multi-hop join, except you may have multiple sets of dundered identifiers, since you are hopping across multiple data sources.

As an example -- suppose you are interested in viewing the metric average_purchase_price split by the dimension country_name. Based on the schema specified in the data sources, MetricFlow knows to involve a multi-hop of (1) from data source sales to data source user_signup via identifier user_id; (2) from data source user_signup to data source country_dim via identifier country_id to obtain the desired final dimension of country_name.

To query this multi-hop join, the dimension is specified with 2 sets of dunders as user_id__country_id__country_name to reflect the 2 identifiers (user_id and country_id):

MetricFlow

mf query --metrics average_purchase_price dimensions --ds,user_id__country_id__country_name

Transform

mql query --metrics average_purchase_price dimensions --ds,user_id__country_id__country_name