Skip to main content

Dimensions

Dimensions Overview

Dimensions are non-aggregatable expressions that are used to define the level of aggregation that a MetricFlow user would like a metric to be aggregated to. You can also think of dimensions as the various data cuts (the slices and dices) that you would like to see a metric grouped by. Dimensions cannot be aggregated, so they are then considered to be a property of the primary or unique identifiers of the table.

Dimensions are defined within data sources along with measures and identifiers. You can think of dimensions as the columns in your data table or query that are non-aggregatable — they provide categorical or time-based context to enrich queries, and are usually included in the GROUP BY clause of your SQL query.

Consider the following data source as the basis for the examples to follow.

data_source:
name: transactions
description: A record for every transaction that takes place. Carts are considered multiple transactions for each SKU.
owners: support@transform.co
sql_table: schema.transactions

# --- IDENTIFIERS ---
identifiers:
- name: transaction_id
type: primary
- name: customer_id
type: foreign
- name: store_id
type: foreign
- name: product_id
type: foreign

# --- MEASURES ---
measures:
- name: revenue
description:
expr: price * quantity
agg: sum

# --- DIMENSIONS ---
dimensions:
- name: ds
type: time
expr: date_trunc('day', ts)
type_params:
is_primary: true
time_granularity: day
- name: is_bulk_transaction
type: categorical
expr: case when quantity > 10 then true else false end

All dimensions require a name, type and in most cases, an expr parameter.

  • name refers to the name for the dimension, which will be exposed to the end user. The name parameter can also act as an alias if the underlying column or SQL query reference is a different string within the expr parameter.

    • Note that dimension names must be unique within a data source, but they are not required to be unique across different data sources. MetricFlow will identify the correct dimension via the proper join identifiers. For further discussion on join paths, see the Join logic page.
  • type refers to the type of dimension created within the data source. Current types include:

    • Categorical: Usually a way to group rows within a table, such as geography, product type, color, etc.
    • Time: Time dimensions will point to an underlying date field in the warehouse. All time dimensions must be TIMESTAMP or the DB engine equivalent.
  • expr (optional) refers to the underlying column or SQL query to define the dimension. If no expr parameter is present for a dimension, MetricFlow will look for a column that matches the name parameter.

    • For valid inputs into the expr parameter: you may input the column name itself, or a valid SQL expression such as a case statement.

Dimensions Types

Categorical Dimensions

Category dimensions allow metrics to be grouped by different "category" slices and dices, such as product type, color, or geographical area. These may point to existing columns in the data table, or be calculated using a SQL expression with the expr parameter.

As different business functional areas may have different ways to categorize the data, it can be very valuable to create new dimensions to capture those bespoke groupings.

In the example file provided below, is_bulk_transaction is a categorical dimension where we have provided a case statement for further grouping based on the underlying column quantity. is_bulk_transaction will then be available to the end user to filter or group by in the UI.

Example:

dimensions: 
- name: is_bulk_transaction
type: categorical
expr: case when quantity > 10 then true else false end

Time Dimensions

tip

Time dimension columns must be datetime data type if you are using BigQuery as your Data Warehouse. If they are stored as timestamp or another data type, you can cast the dimensions to datatime in the expr property. i.e expr: cast(timestamp as datetime) All time dimensions must be TIMESTAMP or the DB engine equivalent.

Time dimensions are used to aggregate metrics against different levels of time granularity. Current granularities supported by MetricFlow include day, week, month, quarter, and year (see the parameter time_granularity for further details).

Note that a primary time dimension is required for data sources that contain measures. The name of your primary time dimension must also be the same across all of your data sources

  • Set the is_primary parameter to True to indicate the time dimension that is to be the primary, or preferred time dimension for a measure (or metric).
  • The primary time dimension is used for graphing the x-axis within the Transform UI. For materializations, the primary time dimension is used as the common time dimension for all metrics.

Time dimensions in MetricFlow have a few more parameters, which are specified under the type_params section, as seen in the example below.

Example:

dimensions: 
- name: ds
type: time
expr: date_trunc('day', ts)
is_partition: True
type_params:
is_primary: True
time_granularity: day

Within type_params:

  • is_primary: Set the is_primary parameter to True to indicate if a time dimension is to be the primary, or preferred, time dimension for a measure or metric.
    • Example: Suppose you have a list of clients in a table with 2 date columns–one as the date of client creation (client_creation_date) in the system and the other as client contract closing date (client_close_date). In MetricFlow, you may specify these 2 time dimensions and indicate which date column is the preferred time dimension via is_primary: True.
    • Note (again) that a primary time dimension is required for data sources that contain measures. The name of your primary time dimension must also be the same across all of your data sources. If you have other time dimensions in your data source, these time dimensions will require is_primary: False.
  • time_granularity: Time granularity refers to the minimum granularity that a measure or metric should be reported by. For example, a metric that is specified with weekly granularity would not be allowed to be shown with daily granularity.
    • Current options for time granularity are day, week, month, quarter, and year.
    • Moreover, the specified time granularity allows for proper aggregation when querying metrics of different granularities. For example, when querying two metrics (one with a DAY granularity and another with a MONTH granularity), MetricFlow will only return a result with the MONTH granularity by default.
    • Within the Transform UI, the specified time granularity also allows graphs to render correctly (e.g. a DAY granularity cannot be specified for a MONTHLY metric).

Also notice the is_parition parameter in the example above.

  • is_partition: To assert that a dimension exists over a specific time window(e.g. a date-partitioned dimensional table), add the is_partition: True parameter to the dimension. MetricFlow will add this date to any joins on this table to ensure that the correct dimensional values are joined to measures in other tables.

MetricFlow also allows for easy aggregation of metrics on the fly via the time-granularity option in the CLI. Suppose we have a measure named messages_per_month where the original time_granularity of the time dimension ds was set to monthly as seen below:

Example:

data_source:
name: messages_month
description: Each row represents a message truncated to the month.
owners: support@transform.co
sql_table: schema.messages

# --- IDENTIFIERS ---
identifiers:
- name: message_id
type: primary
- name: user_id_from
type: foreign
- name: user_id_to
type: foreign
- name: message_user_from_to # Composite key
type: unique
identifiers:
- ref: message_id
- ref: user_id_from
- ref: user_id_to

# --- MEASURES ---
measures:
- name: messages_by_month
description: The total number of messages sent by month
expr: 1
agg: sum
create_metric: True

# --- DIMENSIONS ---
dimensions:
- name: ds
type: time
expr: date_trunc('month', ts) #Underlying column is named “ts”
type_params:
is_primary: true
time_granularity: month

The CLI command example below shows aggregating the messages_per_month metric up to yearly via the time-granularity option:

MetricFlow:

mf query --metrics messages_per_month --dimensions ds --order ds --time-granularity year  

Transform:

mql query --metrics messages_per_month --dimensions ds --order ds --time-granularity year