Getting Sensor Data Into TimescaleDB via Django
Over 2022-23, while working at Mainstream Renewable Power on an internal web application, I maintained a "data pipeline" that fetches files of sensor data readings from the world's most remote places and transforms them into useful datasets.
These datasets form the basis upon which the construction of renewables (wind turbines or solar panels) on site hinges. I rebuilt the pipeline on top of TimescaleDB, which enabled me to massively reduce the complexity of the system involved.
I reflect on this experience in detail in Struggling to Sync Sensors & Databases. I do not, however, discuss how I adapted Django, a Python web framework, to play nicely with this database. In my case, Django served as the “glue” between web browsers and the database. Specifically, to display a web page, it asks the database for the data it needs to render the files that the browser interprets (HTML, CSS, and JavaScript) to display a user interface.
Let’s walk through an example project to make these adaptations a bit more concrete.
This tutorial assumes some familiarity with Django or a similar web framework. If you have never used Django, I highly recommend the official tutorial.
Create a Sensor Data App Using Django
Let’s first run `python manage.py startapp sensor` to create the app’s files, and register the app in core/settings.py.
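Registering the app is one line in INSTALLED_APPS (a minimal sketch, assuming the project was created as core):

```python
# core/settings.py
INSTALLED_APPS = [
    # ... Django's default apps ...
    "sensor",
]
```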
Create a Homepage
Let’s quickly create a homepage that will be displayed on first opening this web application in a browser.
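One minimal way to wire this up (the template name index.html comes from below; everything else here is an assumption, including that TEMPLATES["DIRS"] in core/settings.py points at a project-level templates/ directory):

```python
# core/urls.py
from django.urls import path
from django.views.generic import TemplateView

urlpatterns = [
    # serve templates/index.html at the site root
    path("", TemplateView.as_view(template_name="index.html"), name="index"),
]
```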
So now http://localhost:8000 should display index.html. We can build on this index.html to link to other pages.
Create a Data Model for Files
Now, I can adapt sensor/models.py to add a File model to track uploaded files, create its database migration with `python manage.py makemigrations sensor`, and roll it out with `python manage.py migrate`:
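A sketch of the model (the file field is referenced later in the post; uploaded_at is an assumption):

```python
# sensor/models.py
from django.db import models

class File(models.Model):
    file = models.FileField(upload_to="sensor/")
    uploaded_at = models.DateTimeField(auto_now_add=True)
```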
Handle File Uploads via Browser
Now that we have somewhere to store files of readings, we need to handle file uploads.
Let’s consider only the case where time-series data originates from text files. How do I copy data from files into TimescaleDB via Django?
The Django documentation covers File Uploads. However, it doesn’t advise on importing file contents to a database. One normally uses Django to add and save new entries to PostgreSQL using input from a browser:
- Django sends a web page to a browser containing one or more `<form>` elements.
- Once filled in, these `<form>` elements are sent back to Django.
- Django processes these entries and saves them to the database using the Django ORM.
The key enabler here is the ORM (or “Object Relational Mapper”). It maps a Python class to a database table so that this table’s data is easily accessible from within Python. Without an ORM, one would have to use the SQL language to communicate with the database.
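For example, with the File model above, querying the table needs no hand-written SQL:

```python
# fetch the most recently uploaded file via the ORM...
latest = File.objects.order_by("-uploaded_at").first()
# ...instead of writing the equivalent SQL against Django's
# default table name:
# SELECT * FROM sensor_file ORDER BY uploaded_at DESC LIMIT 1;
```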
We need to do a bit of work to adapt this workflow to handle file contents.
In a similar manner, I can create a “view” to render HTML to accept browser file uploads:
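A sketch, assuming a ModelForm over File (the names FileForm, upload_file, and upload_file.html are assumptions; the template's form tag must set enctype="multipart/form-data"):

```python
# sensor/forms.py
from django import forms

from .models import File

class FileForm(forms.ModelForm):
    class Meta:
        model = File
        fields = "__all__"
```

```python
# sensor/views.py
from django.shortcuts import redirect, render

from .forms import FileForm

def upload_file(request):
    if request.method == "POST":
        # request.FILES carries the uploaded file itself
        form = FileForm(request.POST, request.FILES)
        if form.is_valid():
            form.save()
            return redirect("index")
    else:
        form = FileForm()
    return render(request, "upload_file.html", {"form": form})
```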
This requires someone to click through this web application every time they want to add new data. If data is synced automatically from remote sensors to a file system somewhere, then why not set up automatic file uploads? For this, we need an API.
An API (or Application Programming Interface) lets our web application accept file uploads from another program.
The django-rest-framework library does a lot of heavy lifting here, so let’s use it.
Create an API Homepage
As suggested by Two Scoops of Django 3.x, let’s create core/api_urls.py to wire up our API.
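A sketch of this wiring (the include in core/urls.py is an assumption):

```python
# core/api_urls.py
from django.urls import include, path

urlpatterns = [
    path("sensor/", include("sensor.api_urls")),
]
```

```python
# core/urls.py
from django.urls import include, path
from django.views.generic import TemplateView

urlpatterns = [
    path("", TemplateView.as_view(template_name="index.html"), name="index"),
    path("api/", include("core.api_urls")),
]
```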
So now we can add new views and/or viewsets to sensor/api_urls.py. Plus, they will be “connectable” via /api/sensor/.
Handle File Uploads via API
We can use a viewset to create an endpoint like /api/sensor/file/ to which another program can upload files:
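A sketch (the serializer and viewset names are assumptions; sensor/api/serializers.py is the path referenced later in the post):

```python
# sensor/api/serializers.py
from rest_framework import serializers

from sensor.models import File

class FileSerializer(serializers.ModelSerializer):
    class Meta:
        model = File
        fields = "__all__"
```

```python
# sensor/api_urls.py
from rest_framework import routers, viewsets

from sensor.api.serializers import FileSerializer
from sensor.models import File

class FileViewSet(viewsets.ModelViewSet):
    queryset = File.objects.all()
    serializer_class = FileSerializer

# registers list/create/retrieve/... routes at /api/sensor/file/
router = routers.DefaultRouter()
router.register("file", FileViewSet)
urlpatterns = router.urls
```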
Create a Data Model for Sensor Data Readings
Let’s add a Reading model to store readings:
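A sketch (the field names file, timestamp, and sensor_name come from the text below; managed = False is explained next):

```python
# sensor/models.py
class Reading(models.Model):
    timestamp = models.DateTimeField(primary_key=True)
    sensor_name = models.TextField()
    reading = models.FloatField()
    file = models.ForeignKey("File", on_delete=models.CASCADE)

    class Meta:
        managed = False  # we create this table ourselves (see below)
```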
This time, we’re using timestamp instead of the default id field as the primary key, since row uniqueness can be defined by a composite of file, timestamp, and sensor_name if required.
Don’t we want to store readings in a TimescaleDB hypertable to make them easier to work with? Django won’t create a hypertable automatically (it wasn’t designed to), so we need to do so ourselves. And since we’re customizing table creation rather than letting Django handle it, we have to set managed to False.
Let’s create a “base” migration with `python manage.py makemigrations sensor` (for an unmanaged model, this only records state; it won’t create the table), and manually edit it:
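A sketch of the edited migration; the exact DDL is an assumption, but the key piece is the create_hypertable call:

```python
# sensor/migrations/0002_reading.py (the number will differ)
from django.db import migrations

class Migration(migrations.Migration):
    dependencies = [("sensor", "0001_initial")]

    operations = [
        # ... the auto-generated CreateModel(..., options={"managed": False}) ...
        migrations.RunSQL(
            sql="""
                CREATE TABLE sensor_reading (
                    timestamp TIMESTAMPTZ NOT NULL,
                    sensor_name TEXT NOT NULL,
                    reading DOUBLE PRECISION,
                    file_id BIGINT REFERENCES sensor_file (id),
                    PRIMARY KEY (timestamp, sensor_name, file_id)
                );
                SELECT create_hypertable('sensor_reading', 'timestamp');
            """,
            reverse_sql="DROP TABLE sensor_reading;",
        ),
    ]
```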
Now we can roll out migrations with `python manage.py migrate`, and connect to the database (for example, via `python manage.py dbshell`) to inspect the newly created hypertable.
Import Files
Let’s imagine that all of our sensor data readings are stored nicely formatted in JSON files, like:
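For example (a hypothetical shape; the field names match the Reading model):

```json
[
  {"timestamp": "2015-12-22T00:00:00+00:00", "sensor_name": "windspeed", "reading": 20.54},
  {"timestamp": "2015-12-22T00:10:00+00:00", "sensor_name": "windspeed", "reading": 21.02}
]
```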
How do we import these readings?
We can add a method to File to bring the JSON file into Python and create a new Reading entry for each reading in the file:
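A sketch, assuming the JSON shape above:

```python
# sensor/models.py
import json

from django.utils.dateparse import parse_datetime

class File(models.Model):
    # ... fields as before ...

    def import_to_db(self):
        with self.file.open("r") as f:
            rows = json.load(f)
        Reading.objects.bulk_create(
            Reading(
                timestamp=parse_datetime(row["timestamp"]),
                sensor_name=row["sensor_name"],
                reading=row["reading"],
                file=self,
            )
            for row in rows
        )
```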
How do we call the import_to_db method?
We can go about this a few different ways, but perhaps the simplest is to implement it directly in the views and viewsets, so that it is triggered on both browser and API file uploads.
For Django, we can call it directly in our upload-file view like:
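A sketch; only the POST branch changes from the earlier version:

```python
# sensor/views.py
def upload_file(request):
    if request.method == "POST":
        form = FileForm(request.POST, request.FILES)
        if form.is_valid():
            file_obj = form.save()
            file_obj.import_to_db()  # import readings right after saving
            return redirect("index")
    else:
        form = FileForm()
    return render(request, "upload_file.html", {"form": form})
```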
And for django-rest-framework, we can override the perform_create method:
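A sketch:

```python
# sensor/api_urls.py
class FileViewSet(viewsets.ModelViewSet):
    queryset = File.objects.all()
    serializer_class = FileSerializer

    def perform_create(self, serializer):
        file_obj = serializer.save()
        file_obj.import_to_db()  # import readings right after saving
```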
Import Files via Celery
What if each file contains a few gigabytes of readings? Won’t this take an age to process?
If you can’t guarantee that the sensor files are small enough to be processed quickly, you might need to offload file importing to a task queue.
Celery is a mature Python task queue library and works well with Django, so let’s use it. Think of a restaurant: Celery coordinates “waiters” (the web application, which takes orders) and “chefs” (the workers, which cook them), passing orders between the two via a message broker, typically Redis or RabbitMQ.
A task queue can significantly improve performance here. It makes file uploads feel instant from the user’s perspective, since upload tasks are now added to a queue rather than run immediately. It also enables parallel processing of files, since task queue workers run in parallel with one another.
To set up Celery, we can follow their official tutorial:
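A sketch following Celery’s “First steps with Django” guide (the Redis broker URL is an assumption):

```python
# core/celery.py
import os

from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "core.settings")

app = Celery("core")
# reads CELERY_* settings from core/settings.py,
# e.g. CELERY_BROKER_URL = "redis://localhost:6379/0"
app.config_from_object("django.conf:settings", namespace="CELERY")
app.autodiscover_tasks()  # finds sensor/tasks.py automatically
```

```python
# core/__init__.py
from .celery import app as celery_app

__all__ = ("celery_app",)
```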
So now we can add tasks to sensor/tasks.py like:
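A sketch; the task takes the File’s id rather than the object itself, since task arguments must be serializable:

```python
# sensor/tasks.py
from celery import shared_task

from .models import File

@shared_task
def import_to_db(file_id: int) -> None:
    File.objects.get(id=file_id).import_to_db()
```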
And replace all calls to <file_obj>.import_to_db() with tasks.import_to_db.delay(file_obj.id). Now the import won’t run immediately; instead, Celery will run it when a worker has the availability to do so!
Next Steps?
If you really want to eke out more import performance, you’ll have to go deeper and experiment with the following (see the sketch after this list):

- Batch sizes: how many readings do you want to save at once?
- Compression: TimescaleDB really shines once hypertables are compressed, since compression reduces storage costs and speeds up analytics queries.
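A sketch of both ideas; the batch size and compression settings here are assumptions to tune against your own data:

```python
# batching: cap how many rows each INSERT statement carries
Reading.objects.bulk_create(readings, batch_size=10_000)
```

```python
# a follow-up migration enabling TimescaleDB compression (sketch)
from django.db import migrations

class Migration(migrations.Migration):
    dependencies = [("sensor", "0002_reading")]

    operations = [
        migrations.RunSQL(
            """
            ALTER TABLE sensor_reading SET (
                timescaledb.compress,
                timescaledb.compress_segmentby = 'sensor_name'
            );
            SELECT add_compression_policy('sensor_reading', INTERVAL '7 days');
            """
        ),
    ]
```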
We want to thank Rowan Molony for this awesome guest blog post (scroll down for some bonus advice) and for gracefully implementing our team's technical revisions in collaboration with the Timescale developer advocate Jônatas. Join Rowan and thousands of other developers building with TimescaleDB in our Slack community. Once you're there, reach out to Ana if you, too, would like to write a guest blog post.
To try TimescaleDB, you can self-host or create a fully managed Timescale account (it's free for 30 days, no credit card required).
Bonus
Import Messy Files
How do we convert files like

```
Lat=0 Lon=0 Hub-Height=160 Timezone=00.0 Terrain-Height=0.0
Computed at 100 m resolution
YYYYMMDD HHMM M(m/s) D(deg) SD(m/s) DSD(deg) Gust3s(m/s) T(C) PRE(hPa) RiNumber VertM(m/s)
20151222 0000 20.54 211.0 1.22 0.3 21.00 11.9 992.8 0.15 0.18
20151222 0010 21.02 212.2 2.55 0.6 21.35 11.8 992.7 0.29 -0.09
```
into long-format rows of (timestamp, sensor_name, reading) so we can store them in the Reading data model?
In this file, YYYYMMDD and HHMM clearly represent the timestamp, so 20151222 0000 corresponds to datetime.datetime(2015, 12, 22, 0, 0). However, this may differ between sources.
One way to generalize the importer is to upload a FileType specification alongside each file so we know how to standardize it.
We can create a new model FileType and link it to File like:
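A sketch (the name field is used below; the parsing-hint fields are assumptions, store whatever your importer needs):

```python
# sensor/models.py
class FileType(models.Model):
    name = models.TextField(unique=True)
    # hypothetical parsing hints for this source
    timestamp_format = models.TextField(default="%Y%m%d %H%M")
    skiprows = models.PositiveIntegerField(default=2)

    def __str__(self):
        return self.name

class File(models.Model):
    file = models.FileField(upload_to="sensor/")
    type = models.ForeignKey(FileType, on_delete=models.PROTECT)
    uploaded_at = models.DateTimeField(auto_now_add=True)
    # ... import_to_db as before ...
```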
Django forms are smart enough to automatically render the upload-file view with a type field, since we specified fields = "__all__" in sensor/forms.py.
django-rest-framework viewsets will also include it, thanks to fields = "__all__" in sensor/api/serializers.py. However, its default behavior for foreign keys is not ideal: it expects to receive a numeric ID for the type field, whereas it’s more intuitive to specify the name field instead.
We can easily override this default behavior by specifying SlugRelatedField in our serializer:
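For example:

```python
# sensor/api/serializers.py
from rest_framework import serializers

from sensor.models import File, FileType

class FileSerializer(serializers.ModelSerializer):
    # accept/return the FileType's name instead of its numeric id
    type = serializers.SlugRelatedField(
        slug_field="name",
        queryset=FileType.objects.all(),
    )

    class Meta:
        model = File
        fields = "__all__"
```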
so we can create files by passing the endpoint a payload like:

```json
{
    "file": "file",
    "type": "name-of-file-type"
}
```
Now we have all of the information we need to extract time series from files into our data model.
Validate Messy Files
What if a File is created with an inappropriate FileType? How do we catch this before it causes importing to fail?
We can implement a clean method on File! Django will automatically call this method on running form.is_valid() in our view; however, we’ll have to connect django-rest-framework ourselves. We can just add a validate method to our serializer to achieve the same behavior.
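A sketch of the serializer hook, delegating to the model’s clean so both paths share one validation:

```python
# sensor/api/serializers.py
from django.core.exceptions import ValidationError as DjangoValidationError
from rest_framework import serializers

class FileSerializer(serializers.ModelSerializer):
    # ... fields as before ...

    def validate(self, data):
        instance = File(**data)
        try:
            instance.clean()
        except DjangoValidationError as e:
            raise serializers.ValidationError(e.messages)
        return data
```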
Now, we can implement the clean method to check file contents prior to saving a file:
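A sketch; the actual check is an assumption (here we just peek at the first line and compare it to what the sample file above starts with):

```python
# sensor/models.py
from django.core.exceptions import ValidationError

class File(models.Model):
    # ... fields as before ...

    def clean(self):
        first_line = self.file.readline().decode(errors="replace")
        self.file.seek(0)  # rewind so the importer can reread from the start
        if "Lat=" not in first_line:
            raise ValidationError(
                f"This does not look like a '{self.type.name}' file."
            )
```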
Import and Validate Messy Files
Now we have everything we need to import files:
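A sketch of a generalized import_to_db, built on the hypothetical FileType fields from above:

```python
# sensor/models.py
import datetime as dt

class File(models.Model):
    # ... fields and clean() as before ...

    def import_to_db(self):
        with self.file.open("r") as f:
            lines = f.read().splitlines()
        sensor_names = lines[self.type.skiprows].split()  # column header row
        readings = []
        for line in lines[self.type.skiprows + 1:]:
            values = line.split()
            if not values:
                continue
            # note: attach a timezone here if USE_TZ is enabled
            timestamp = dt.datetime.strptime(
                " ".join(values[:2]), self.type.timestamp_format
            )
            # one long-format Reading per (timestamp, sensor) pair
            for sensor_name, value in zip(sensor_names[2:], values[2:]):
                readings.append(
                    Reading(
                        timestamp=timestamp,
                        sensor_name=sensor_name,
                        reading=float(value),
                        file=self,
                    )
                )
        Reading.objects.bulk_create(readings, batch_size=10_000)
```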
Once again, let’s adapt the views and viewsets to call the import_to_db method.
For Django, we can call it directly in our upload-file view like:
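The same pattern as before; if you wired up Celery, enqueue the task instead (a sketch):

```python
# sensor/views.py
from sensor import tasks

def upload_file(request):
    if request.method == "POST":
        form = FileForm(request.POST, request.FILES)
        if form.is_valid():  # runs File.clean() via the ModelForm
            file_obj = form.save()
            tasks.import_to_db.delay(file_obj.id)
            return redirect("index")
    else:
        form = FileForm()
    return render(request, "upload_file.html", {"form": form})
```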
and for django-rest-framework
we can override the perform_create
method:
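A sketch:

```python
# sensor/api_urls.py
from sensor import tasks

class FileViewSet(viewsets.ModelViewSet):
    queryset = File.objects.all()
    serializer_class = FileSerializer

    def perform_create(self, serializer):
        file_obj = serializer.save()
        tasks.import_to_db.delay(file_obj.id)
```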