Working with the output

Caution

This feature is still in Beta 🧪, so do not expect it to be fully stable or bug-free. Any public CLI or Python interface may change without prior notice.

If you find a bug or something does not behave as it should, feel free to open an issue on the MetricFlow GitHub repo.

A data source inference pipeline can produce two different types of output: stream output and configuration files.

Stream output

Stream output is mostly useful for exploring a set of rules and debugging why certain results were or weren't produced. Here's an example stream output:

db.schema.table
  id
    type: PRIMARY_IDENTIFIER
    reasons:
      - Column name ends with `id` (IDENTIFIER)
      - The values in the column are unique (UNIQUE_IDENTIFIER)
      - Column name matches `(table_name?)(_?)id` (PRIMARY_IDENTIFIER)
    problems: --

  user_id
    type: FOREIGN_IDENTIFIER
    reasons:
      - Column name ends with `id` (IDENTIFIER)
      - Column has low cardinality (FOREIGN_IDENTIFIER)
    problems: --

  email
    type: UNKNOWN
    reasons: --
    problems:
      - Inference solver could not determine a type for this column

  type
    type: MEASURE
    reasons:
      - Column type is INTEGER (MEASURE)
    problems: --

  runtime_ms
    type: MEASURE
    reasons:
      - Column type is real (FLOAT, DOUBLE, DOUBLE PRECISION) (MEASURE)
    problems: --

  created_at
    type: PRIMARY_TIME_DIMENSION
    reasons:
      - Column type is time (TIME, DATE, DATETIME, TIMESTAMP) (TIME_DIMENSION)
      - The column is the only time column in its table (PRIMARY_TIME_DIMENSION)
    problems: --

Use this output to see why the solver assigned a given type to a column, and to check for any reported problems.
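If the stream output has been captured as text, the columns that report problems can be picked out with a short script. The following is a minimal sketch, not part of MetricFlow: the function name `find_problem_columns` is hypothetical, and it assumes the text layout shown in the example above (column entries separated by blank lines, the column name on the line directly before its `type:` field, and problems listed as `- ` items after `problems:`).

```python
from __future__ import annotations

import re


def find_problem_columns(stream_text: str) -> dict[str, list[str]]:
    """Map each column name to its reported problems (columns without problems are skipped).

    Hypothetical helper: assumes the stream output layout shown in the example above.
    """
    problems_by_column: dict[str, list[str]] = {}
    # Column entries are assumed to be separated by blank lines.
    for block in re.split(r"\n\s*\n", stream_text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        # The column name is taken to be the line right before the first `type:` field.
        type_idx = next((i for i, l in enumerate(lines) if l.startswith("type:")), None)
        if type_idx is None or type_idx == 0:
            continue
        column = lines[type_idx - 1]
        prob_idx = next((i for i, l in enumerate(lines) if l.startswith("problems:")), None)
        if prob_idx is None:
            continue
        # `problems: --` has no list items, so it yields an empty list.
        problems = [l[2:] for l in lines[prob_idx + 1:] if l.startswith("- ")]
        if problems:
            problems_by_column[column] = problems
    return problems_by_column
```

Because the layout is not a documented, stable format, treat this only as a debugging aid and adjust the parsing if the output differs in your version.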

Configuration files

You can also write inference results as MetricFlow configuration files. These mostly look like normal data source configurations, except where problems occurred. You can find such cases by searching for FIXME comments.

Here's a sample:

# FIXME: Unreviewed inferred config file
# FIXME: email
#   - Inference solver could not determine a type for this column
data_source:
  name: mql_query_base
  sql_table: transform_analytics_db.prod_dbt.mql_query_base
  identifiers:
    - name: id
      type: primary
    - name: user_id
      type: foreign
  dimensions:
    - name: created_at
      type: time
      type_params:
        time_granularity: day
        is_primary: true
  measures: []
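The search for FIXME comments can be automated across a directory of generated files. This is a minimal sketch using only the Python standard library; the function name `find_fixmes` and the `*.yaml` file extension are assumptions, not part of MetricFlow.

```python
from __future__ import annotations

from pathlib import Path


def find_fixmes(config_dir: str) -> list[tuple[str, int, str]]:
    """Return (file name, line number, line text) for every line containing FIXME.

    Hypothetical helper: assumes inferred configs are written as `*.yaml` files.
    """
    hits: list[tuple[str, int, str]] = []
    # Walk the directory recursively, in a stable (sorted) order.
    for path in sorted(Path(config_dir).rglob("*.yaml")):
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if "FIXME" in line:
                hits.append((path.name, lineno, line.strip()))
    return hits
```

Calling `find_fixmes("path/to/configs")` on a directory containing the sample above would report its FIXME comment lines, which gives a quick checklist of what still needs review.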

Again, inference results are not guaranteed to be correct; they are only meant to make it easier to get started with MetricFlow. Always review your configuration files before using them in production.