Caching is only available to Transform users and is not available in MetricFlow. Mutability is irrelevant without caching.
An incorrect mutability setting will lead to inaccurate metric values. To avoid this, be sure to understand the various options and the update process for the underlying dataset.
Mutability defines how your data source changes through an update process. Mutability settings create a contract between the update process for your datasets and MetricFlow. Mutability settings allow MetricFlow to create the most efficient queries possible while returning accurate datasets.
The options for mutability are below:
With the immutable mutability configuration setting, MetricFlow creates the dataset only once and uses the resulting data for all queries going forward. This applies to both cached datasets downstream and primed tables (e.g., in the case of data sources produced through a sql_query).
A good example of a dataset suited to the immutable setting is a table of states and their abbreviations.
With the full_mutation configuration setting, MetricFlow updates the data on a periodic basis. By default, it refreshes the data hourly. Optionally, you can use cron in the data source to configure the time of day and the frequency at which Transform picks up updates to your data.
If you specify a cron string, we highly recommend scheduling update_cron for a time when the upstream data processes are definitively complete. This ensures that Transform picks up the most recent data in a timely manner. For further discussion, see the Cron Options section at the bottom of this guide.
# 6:00 Pacific Daylight Time is 13:00 UTC (14:00 UTC during Pacific Standard Time)
update_cron: 0 13 * * *
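Because the cron string is written in UTC, the hour that corresponds to a given Pacific wall-clock time shifts between daylight and standard time. The sketch below illustrates the conversion. The helper name daily_cron_in_utc and the fixed offsets (UTC-7 for daylight time, UTC-8 for standard time) are assumptions made for this example and are not part of MetricFlow or Transform.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, stated as assumptions: Pacific Daylight Time is UTC-7,
# Pacific Standard Time is UTC-8.
PDT = timezone(timedelta(hours=-7), "PDT")
PST = timezone(timedelta(hours=-8), "PST")


def daily_cron_in_utc(hour: int, minute: int, tz: timezone) -> str:
    """Return a daily five-field cron string in UTC for a local wall-clock time.

    Hypothetical helper for illustration only.
    """
    # The calendar date is arbitrary; only the time of day matters here.
    local = datetime(2024, 1, 1, hour, minute, tzinfo=tz)
    utc = local.astimezone(timezone.utc)
    return f"{utc.minute} {utc.hour} * * *"


print(daily_cron_in_utc(6, 0, PDT))  # 0 13 * * *
print(daily_cron_in_utc(6, 0, PST))  # 0 14 * * *
```

If the upstream job is scheduled in local time, a fixed UTC cron string may run an hour early or late for part of the year, so it can be worth leaving a margin after the expected completion time.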
The append_only mutability configuration setting lets you specify a column that indicates when new rows have been added to the underlying data source.
Example: with a mutability setting of append_only on the column ds, MetricFlow refers to ds to check when new rows of data have been added.
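One way to picture the append_only contract: existing rows never change, so only rows with a later ds than anything already seen need to be picked up. The sketch below is a toy illustration of that idea, not MetricFlow's implementation. The function new_rows_since, the row layout, and the use of ISO date strings are all assumptions for the example.

```python
from typing import Dict, List


def new_rows_since(rows: List[Dict], cached_max_ds: str) -> List[Dict]:
    """Return rows whose ds is later than the latest ds already cached.

    Hypothetical helper for illustration only.
    """
    # ISO-8601 date strings (YYYY-MM-DD) compare correctly as plain strings.
    return [row for row in rows if row["ds"] > cached_max_ds]


source_rows = [
    {"ds": "2024-01-01", "value": 10},
    {"ds": "2024-01-02", "value": 12},
    {"ds": "2024-01-03", "value": 7},
]

# Suppose everything up to 2024-01-02 has already been cached.
print(new_rows_since(source_rows, "2024-01-02"))
# [{'ds': '2024-01-03', 'value': 7}]
```

This only stays accurate if the update process really is append-only: a change to a row with an older ds would go unnoticed under this scheme.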
In addition to specifying mutability, you can also specify the time of day at which the cache is invalidated. By default, this occurs every hour for append_only. After the data is invalidated, the next query executed against the data source reloads it from the original source and stores it in the cache for future queries. The cache is then invalidated again when the next cron period expires. To specify a cadence, use update_cron. In the example below, the invalidation occurs at 00:00 UTC every day of every month.
update_cron: 0 0 * * *
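The invalidate-then-reload behavior described above can be sketched as a small toy model. It is not Transform's implementation: the class InvalidatingCache, its methods, and the idea of a scheduler calling invalidate() on each cron tick are assumptions made for illustration.

```python
from typing import Callable, List, Optional


class InvalidatingCache:
    """Toy model: invalidate on a schedule, reload lazily on the next query."""

    def __init__(self, load_from_source: Callable[[], List[dict]]) -> None:
        self._load = load_from_source
        self._rows: Optional[List[dict]] = None  # None means "invalidated"
        self.reload_count = 0

    def invalidate(self) -> None:
        """Called by a (hypothetical) scheduler when the cron period expires."""
        self._rows = None

    def query(self) -> List[dict]:
        """Serve from the cache, reloading from the source only if invalidated."""
        if self._rows is None:
            self._rows = self._load()
            self.reload_count += 1
        return self._rows


source = [{"ds": "2024-01-01", "value": 1}]
cache = InvalidatingCache(lambda: list(source))

cache.query()                  # first query loads from the source
cache.query()                  # second query is served from the cache
source.append({"ds": "2024-01-02", "value": 2})
stale = cache.query()          # still the old data: no invalidation yet
cache.invalidate()             # e.g. the 00:00 UTC cron tick
fresh = cache.query()          # reloaded from the source
print(len(stale), len(fresh), cache.reload_count)  # 1 2 2
```

Note that invalidation by itself does no work in this model. The reload happens on the first query after the tick, which is why new upstream rows stay invisible until both the tick and a subsequent query have occurred.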
Crontab Syntax and Operators
We've provided a brief intro to Crontab Syntax below. For more information, check out Crontab Guru.
Each line in crontab syntax contains five fields separated by a space:
- - - - -
| | | | |
| | | | ----- Day of week (0 - 7) (Sunday=0 or 7)
| | | ------- Month (1 - 12)
| | --------- Day of month (1 - 31)
| ----------- Hour (0 - 23)
------------- Minute (0 - 59)
Each of the five fields may contain one or more values separated by commas, or a range of values separated by a hyphen.
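As a concrete companion to the field layout above and the operator descriptions that follow, here is a minimal sketch that splits a cron string into its five fields and expands each field into the values it matches. It is a simplified illustration, not a full crontab parser: the names expand_field and expand_cron are hypothetical, and named values such as MON or JAN are not handled.

```python
from typing import Dict, List

# (field name, lowest value, highest value), in crontab order.
FIELD_RANGES = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day_of_month", 1, 31),
    ("month", 1, 12),
    ("day_of_week", 0, 7),  # Sunday is 0 or 7
]


def expand_field(field: str, low: int, high: int) -> List[int]:
    """Expand one cron field into the sorted list of values it matches."""
    values = set()
    for part in field.split(","):                    # comma: list of values
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1    # slash: interval
        if base == "*":                              # asterisk: full range
            start, end = low, high
        elif "-" in base:                            # hyphen: inclusive range
            start_text, end_text = base.split("-")
            start, end = int(start_text), int(end_text)
        else:                                        # a single value
            start = end = int(base)
        if not (low <= start <= end <= high) or step < 1:
            raise ValueError(f"invalid cron field part: {part!r}")
        values.update(range(start, end + 1, step))
    return sorted(values)


def expand_cron(expression: str) -> Dict[str, List[int]]:
    """Split a five-field cron string and expand every field."""
    fields = expression.split()
    if len(fields) != len(FIELD_RANGES):
        raise ValueError("expected five space-separated fields")
    return {
        name: expand_field(field, low, high)
        for field, (name, low, high) in zip(fields, FIELD_RANGES)
    }


print(expand_field("1,3,5", 0, 23))       # [1, 3, 5]
print(expand_field("1-5", 0, 7))          # [1, 2, 3, 4, 5]
print(expand_field("*/4", 0, 23))         # [0, 4, 8, 12, 16, 20]
print(expand_cron("0 13 * * *")["hour"])  # [13]
```

Real cron implementations differ in edge cases (for example, how a single value followed by a slash is treated), so this sketch is best used for checking your reading of a schedule rather than as a reference.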
* - The asterisk operator means any value, or always. If the Hour field contains an asterisk, the task runs every hour.
, - The comma operator allows you to specify a list of values for repetition. For example, if you have 1,3,5 in the Hour field, the task will run at 1 am, 3 am and 5 am.
- - The hyphen operator allows you to specify a range of values. If you have 1-5 in the Day of week field, the task will run every weekday (from Monday to Friday).
/ - The slash operator allows you to specify values that repeat at a fixed interval. For example, if you have */4 in the Hour field, the action will be performed every four hours. It is the same as specifying 0,4,8,12,16,20. Instead of an asterisk before the slash operator, you can also use a range of values: 1-30/10 means the same as