Hypertables
Hypertables are usually used to ingest time-series data. They are a high-performance version of regular Postgres tables focussed on time-based bucketting, chunking, and partitioning.
Hypertables make less sense as dbt models to store transformed data. However, you can still use them as such. A more useful version right now is the empty option, which will create empty hypertables.
Look into virtual hypertables
If you're looking to use dbt to configure your leverage pre-existing hypertables, check out the virtual hypertables guide.
Only run hypertable models once
dbt will always recreate your entire model. This means that all existing data in your hypertables will be lost when you run them again. If you're using hypertables for ingesting time-series data, you probably don't want this.
Info
Consult the Timescale docs for more information regarding hypertables.
Usage
To materialize a model as a hypertable, simply set its materialization in the config to hypertable. Every hypertable also requires you to set the name of the time column.
{{
config(
materialized='hypertable',
main_dimension='time_column'
)
}}
select current_timestamp as time_column
models:
your_project_name:
folder_containing_the_hypertables:
+materialized: hypertable
model_one:
+main_dimension: time_column # (1)!
model_two:
+main_dimension: time_column_in_model_two
# ...
- While you can set the
hypertablematerialization for multiple models, you'll still have to configure themain_dimensionfor each model individually.
Configuration options
dbt-specific options
The following options are not taken from the TimescaleDB APIs, but are specific to this adapter.
empty_hypertable: If set totrue, the hypertable will be truncated right after creation (as a regular table) and right before converting it into a hypertable. Defaults tofalse.
TimescaleDB hypertable options
The TimescaleDB option create_default_indexes can be set to true or false. It defaults to true.
Dimensions
Hypertables have one or more dimensions, defined upon creation of the hypertable. The main dimension of a hypertable is provided using the main_dimension configuration option. Additional dimensions can be added to the hypertable using the dimensions configuration option.
In this adapter, dimensions can be provided as a dictionary with the following options:
column_nametype:by_hashorby_range(default isby_range)partition_interval(only forby_range)number_partitions(only forby_hash)partitioning_func
Since most dimensions will probably be by_range dimensions with a column name, you can also provide the name of the column as a shorthand instead of a dictionary.
Empty hypertable required
You can only add dimensions to an empty hypertable.
Info
Consult the Timescale docs for more information regarding adding dimensions or the documentation on dimension builders.
{{ config(
materialized = 'hypertable',
main_dimension = 'time_column',
dimensions=[
{"column_name": "id", "type": "by_hash", "number_partitions": 5},
{"column_name": "col_1", "type": "by_range", "partition_interval": "interval '1 day'"},
{"column_name": "another_column", "type": "by_range"}
]
}}
select
current_timestamp as time_column,
1 as id,
2 as col_1,
3 as another_column
models:
your_project_name:
model_name:
+materialized: hypertable
+main_dimension:
column_name: time_column
type: by_range
# the above would be equivalent to +main_dimension: time_column
+dimensions:
- column_name: id
type: by_hash
number_partitions: 5
- column_name: another_time_column
type: by_range
partition_interval: interval '1 day'
# ...
integer_now_func
The following 2 options are available for (virtual) hypertables where the time column is not a timestamp:
integer_now_func(string): name of a function to be used to generate the current time as an integer.integer_now_func_sql(string, optional): SQL code for the function mentioned above. If provided, the function with the name set ininteger_now_funcwill be created. If not provided, an error will be thrown if the function does not exist already.
Use a macro
You could also call a macro for your integer_now_func_sql.
Idempotent
The integer_now_func_sql is idempotent and will replace an existing function if a function with the given name already exists. So while it may cause some overhead during the dbt run, it doesn't matter if you share this config across multiple models.
The name is enough
You don't have to provide the SQL code for the function if you already have a function with the name set in integer_now_func in your database. You could create the function once in a single model or with dbt run-operation and then reuse it in all other models.
Info
Consult the Timescale docs for more information regarding this functionality.
{{
config(
materialized='hypertable',
main_dimension='time_column',
integer_now_func='my_hypertable_int_to_now',
integer_now_func_sql='select extract(epoch from now())::bigint'
)
}}
select 1::bigint as time_column
chunk_time_interval
The chunk_time_interval config option allows you to set the interval at which TimescaleDB will chunk your (virtual) hypertable. This is useful for optimizing query performance and storage efficiency. The default value is 1 week.
Note that the type of the interval depends on the type of your time column and has to match.
```sql+jinja title="models/my_hypertable.sql" {{ config( materialized='hypertable', main_dimension='time_column', chunk_time_interval="interval '1 day'" ) }}