Scale PostgreSQL via Partitioning: A Dev’s Intro to Hypertables
As a PostgreSQL enthusiast, it’s been very interesting to join Timescale and learn about TimescaleDB, a PostgreSQL extension with numerous solutions for handling large volumes of data. While Timescale has excellent documentation, having a quick and easy getting started guide is always nice.
Scaling PostgreSQL via partitioning is a challenge for many PostgreSQL users, but TimescaleDB has a core feature to help solve it. Here’s a straightforward breakdown of hypertables, a great feature designed to significantly improve query performance on massive volumes of data. A benchmark vs. vanilla PostgreSQL is also included.
(Looking for more details? Check out Timescale’s official docs on hypertables.)
Testing Hypertables vs. Regular PostgreSQL Partitioning
Generally speaking, hypertables are designed to make working with time-series data easy by automatically partitioning regular PostgreSQL tables into smaller data partitions or chunks. There are a ton of different examples time-series data, including IoT applications, weather data, financial data analysis, and system monitoring.
Simply put: time-series data is any data that has been collected at regular or irregular intervals that is now associated with timestamps.
In this blog post, we’ll set up a basic IoT sensor dataset and compare the differences in query run time between regular PostgreSQL tables and hypertables.
For real-time dashboards and alerts, this kind of data would need to be collected frequently; efficient querying is key for optimal performance and timely rendering of the results. This is a perfect example use case for hypertables. Why? They enable you to do the following:
1. Handle high-frequency inserts efficiently
2. Optimize queries for time-based data, and
3. Ensure fast query performance, especially at scale
So, how do you actually use them?
Let’s create a hypertable
First, let's create a hypertable to store our sensor data:
Now, let’s convert that to a hypertable.
This creates a hypertable optimized for time-series data that is partitioned on the ts column. The TimescaleDB extension automatically manages this for you.
Inserting data
Here's how you might insert data into your hypertable:
In practice, you'd have a script or application inserting data at regular intervals (such as every minute or as data arrives).
Hypertables automatically partition by time (and by space)
One of my favorite features of hypertables is that they automatically partition time-series data for you without a lot of extra work on your part. This automatic partitioning means you can work with hypertables as if they were just normal PostgreSQL tables. Behind the scenes, they do the heavy lifting of partitioning your PostgreSQL tables and speeding up your queries and data ingestion rate while keeping up with your data as it scales.
As part of this functionality, when you create a hypertable, indexes are automatically created along with it. By default, you’ll notice a descending index on time and an index on the space parameter and time (if there’s a space partition created). Let’s check it out.
List the associated indexes:
And you should see the following results:
It is possible, of course, to disable the default behavior of creating indexes automatically; just set create_default_indexes
, to false when creating the hypertable, like so:
Additionally, take note that if you’d like to configure unique indexes, you must reference all the partitioning columns for the table. This is a requirement in order to ensure unique constraints apply across partitions based on the time column.
For example, to create a unique index on both sensor_id
and ts
, you would run:
Performance Benefits: Hypertables vs. Regular PostgreSQL Tables
Here's a simple benchmark you can run to compare query performance:
Generate a time series for each minute for a month, and associate each timestamp with 100 devices. This will insert approximately 4,320,100 rows (100 sensors * 60 minutes/hour * 24 hours/day * 31 days).
Now let’s create a regular table with the same structure as our hypertable:
Let’s reproduce the same additional index that exists on our hypertable on this regular table:
Copy over the records we just inserted into the hypertable into regular_iot_sensor_data
:
Then, check the results of EXPLAIN ANALYZE
against each table, starting with the hypertable:
Which returns:
Then do the same against the regular table:
That returns:
As your datasets grow, you’ll notice even more benefits to query performance. Already, we can see quite a difference—about a 148.3% increase in execution time for a regular table, from 58.710 ms (left) to 145.799 ms (right):
For fun, I just ran a simple SELECT * FROM
query against both the iot_sensor_data
and regular_iot_sensor_data
tables to compare the difference in query execution times (no limit).
The iot_sensor_data
table returned results in 1762.9 s vs. 1940.6 s for the regular table. That’s about 10 % slower for the regular table, just retrieving results—without even performing any calculations.
What’s Next?
Hypertables are great for storing and querying large volumes of time-stamped data, whether you’re just looking for a performance boost or hoping to enable real-time analytics and monitoring at scale.
We covered a number of useful features that are unlocked when using hypertables in TimescaleDB. There’s even more that can be researched to fully take advantage of all the benefits that hypertables provide. So, here’s a complete list:
- Automatic data partitioning (by time or optionally by space)
- Optimization for high insert rates and management of large data volumes
- Continuous aggregates (incrementally updated materialized views)
- Data retention policies (to automatically drop unneeded data and manage storage effectively)
- Time-based compression for improved storage footprint and query performance
- Automated data tiering for cost management
- Advanced time-series hyperfunctions specifically designed for hypertables
- Automatic indexing
- Real-time continuous aggregation
Look out for more articles like this to see other real-world examples of TimescaleDB features like continuous aggregates and compression.
Interested in trying out hypertables yourself? If you’re self-hosting, install the open source TimescaleDB extension—else, you can sign up for the fully managed Timescale Cloud solution free for 30 days (with no credit card required).