Why You Should Use PostgreSQL for E̶v̶e̶r̶y̶t̶h̶i̶n̶g̶ Industrial IoT Data
The growth of the IoT (Internet of Things) is by now a given, with reports projecting double-digit annual growth and 34.4 billion connected IoT devices worldwide by 2032. Industrial IoT (IIoT, for short) is one of the segments propelling this massive expansion.
IIoT refers to the application of IoT technology in industrial settings, using smart sensors, machines, and devices to collect and analyze data to improve efficiency, productivity, safety, and operational insights. As more and more companies use IIoT data in critical industrial applications for sectors like energy, transportation, and manufacturing, this segment is expected to grow to 3.3 trillion U.S. dollars by the end of the decade.
However, despite this impressive growth, developers building IIoT applications struggle with many of the same problems as devs building other data-intensive applications: storing and analyzing a deluge of timestamped data (like sensor data) alongside other relational data without relying on multiple databases and complex data stacks. Surely there must be a better solution to drive fast business decisions, right? And can that solution keep SCADA systems, the foundation of industrial applications, running seamlessly?
With IIoT customers representing 66% of our IoT client base here at Timescale, we can confidently say you don’t need a specialized database for your industrial IoT data: you need PostgreSQL. Before you mutter, “PostgreSQL doesn’t scale for large amounts of sensor data,” hear us out. In this blog post, we’ll look at the challenges of IIoT data, including SCADA systems, and explain how you can simplify your data stack. Finally, we’ll share examples of users who did just that.
The Challenges of IIoT Data
What are SCADA systems?
SCADA (Supervisory Control and Data Acquisition) systems are a critical component of IIoT. At its core, SCADA is a type of software application used to control industrial processes and machinery. It operates by collecting data from sensors and equipment on the factory floor or in the field (such as in oil rigs, water treatment plants, or power generation facilities), analyzing this data, and then using it to control and optimize those processes in real time.
A typical SCADA system consists of several key components:
- Sensors and control relays that gather data and execute commands.
- Remote terminal units (RTUs) or programmable logic controllers (PLCs) that connect sensors and actuators to the system.
- Human-Machine Interface (HMI) that displays the processed data and allows human operators to interact with the system.
- Communication infrastructure that connects the HMI, RTUs/PLCs, and other components to facilitate data exchange.
- Data historians for long-term storage and analysis of data collected by the SCADA system.
Let’s focus on this last bullet point. Operational historians may have long been the gold standard for operational process data, like IIoT data, but many industry insiders now argue that these expensive and often outdated systems do not play well with the plethora of sensors and external systems currently generating data (or the tools used to visualize it). Instead, they recommend modern, scalable, reliable databases.
But are modern database systems capable of handling the complexity of SCADA systems? To answer this question, let’s first examine their unique challenges.
The complexities of SCADA systems
Operation in remote areas with low connectivity
Many industrial operations monitored by SCADA systems are located in remote areas where connectivity might be limited or unreliable. This includes oil and gas pipelines, wind farms, and water treatment facilities spread over vast distances. For this reason, SCADA systems often need the following:
- Edge computing: SCADA systems often incorporate edge computing elements that can process data on-site or near the data source, reducing the need for constant connectivity to a central system for data processing. Here at Timescale, we’ve seen some customers run TimescaleDB locally and then push all or part of the data to a central database running in our cloud platform.
- Store-and-forward mechanism: They employ a store-and-forward mechanism to deal with intermittent connectivity. Data is collected and stored locally and then transmitted to the central system once a connection is available, as sketched below.
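To make the store-and-forward idea concrete, here is a minimal sketch of an edge-side buffer in plain SQL. All table and column names are hypothetical, and a real implementation would wrap the claim-and-transmit step in a transaction:

```sql
-- Edge-side buffer: every reading lands here first
CREATE TABLE reading_buffer (
    time      timestamptz NOT NULL,
    sensor_id int         NOT NULL,
    value     double precision,
    forwarded boolean     NOT NULL DEFAULT false
);

-- When connectivity returns, claim a batch of unsent rows and hand
-- them to the forwarding process via RETURNING; run this inside a
-- transaction so a failed transmission rolls the flags back
UPDATE reading_buffer
SET forwarded = true
WHERE (time, sensor_id) IN (
    SELECT time, sensor_id
    FROM reading_buffer
    WHERE NOT forwarded
    ORDER BY time
    LIMIT 1000
)
RETURNING time, sensor_id, value;
```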
Need to backfill data
Given the critical nature of the operations they monitor and control, SCADA systems cannot afford to lose data during downtimes or connectivity issues. They need the ability to backfill data, ensuring that once the system is back online, it can update the central database with all the information gathered during the period of disconnection. This completeness of data is vital for accurate monitoring, analysis, and decision-making.
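Continuing the hypothetical sketch above, a replay-friendly way to backfill is to let a unique constraint on the central table make the operation idempotent, so partially transmitted batches can simply be sent again:

```sql
-- Central table with a unique constraint so replays are idempotent
CREATE TABLE measurements (
    time      timestamptz NOT NULL,
    sensor_id int         NOT NULL,
    value     double precision,
    UNIQUE (time, sensor_id)
);

-- Replay rows shipped from the edge (e.g., copied into a staging
-- table); rows that already arrived in an earlier attempt are skipped
INSERT INTO measurements (time, sensor_id, value)
SELECT time, sensor_id, value
FROM staged_readings
ON CONFLICT (time, sensor_id) DO NOTHING;
```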
Flexible data structures
SCADA systems collect more than just numerical values; they need to process a wide range of data types:
- Complex data types: Images or video feeds from surveillance cameras, PDFs for manuals or schematics, and audio files for alarms or verbal warnings.
- State data in the form of strings: These are critical for understanding the status of switches, valves, or modes of operation (e.g., "Open," "Closed," "Automatic," "Manual"). This type of data provides insights into the operational state of the equipment being monitored.
- Numerical values: Such as temperatures, pressures, and flow rates, which are essential for monitoring operational parameters.
Most of this data is captured as time series, that is, successive data points ordered in time, making it crucial for trend analysis, alerting, and predictive modeling.
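PostgreSQL is comfortable with this mix out of the box. As a hypothetical illustration, one table can hold numerical readings, string state, structured metadata, and binary payloads side by side:

```sql
-- One hypothetical table mixing the data types described above
CREATE TABLE equipment_events (
    time         timestamptz NOT NULL,
    equipment_id int         NOT NULL,
    temperature  double precision,  -- numerical values
    valve_state  text,              -- state data, e.g., 'Open', 'Closed'
    metadata     jsonb,             -- flexible structured data
    snapshot     bytea              -- binary payloads, e.g., a camera image
);
```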
The IIoT Database Journey
“There is no way a general-purpose database like PostgreSQL can handle all of this, especially at this scale,” you’re now thinking. We’ll get there, but let’s first follow the path of a developer or engineering team trying to navigate the complexities of IIoT data.
If you’re building an IIoT application, you’re familiar with the challenges SCADA systems pose and their importance to your application. For this reason, you and your team will probably opt to build an industrial sensor data solution on top of battle-tested, robust database technology, typically PostgreSQL, MySQL, SQL Server, Oracle, or even MongoDB.
As mentioned, IIoT applications need to process different data types, including time-series data—the timestamped measurements collected by sensors. However, these solutions also store a lot of what you would typically consider traditional relational data: customer and user accounts, facilities, equipment, sensors linked to each other, etc.
As your application’s adoption grows and data accumulates, your rock-solid general-purpose database starts showing some worrying signs:
- Query performance degrades: You can either fine-tune your database performance or throw more hardware at the problem, which becomes quite expensive.
- Unmanageable storage footprint and growing costs: Even if you don’t need to keep historical data, deleting it isn’t easy: the large DELETE queries involved place a considerable load on your database. On the other hand, if you need to retain data for historical analysis or compliance reasons (quite a few of our clients have such requirements), keeping it in hot storage becomes extremely expensive.
By now, you realize you need to solve the problem. It’s time to reach for a specialized time-series database, which introduces yet more complexity for your team:
- Multiple databases: You now have to run at least two databases, one for time-series data and another for relational data.
- Build pipelines: You will have to build pipelines to keep data in sync across databases and to join data for querying when needed.
- Learn a new query language: Unless the new database has full SQL support, you will have to learn a custom query language or, at best, a SQL-like dialect.
- Data model: A new database always comes with new data model limitations, which you’ll have to work around.
- Cost: You need a larger infrastructure to run two databases.
“So why did you say PostgreSQL could solve all these problems?” you’re wondering. Because it can. You can use PostgreSQL for everything; you just need to ensure it’s PostgreSQL engineered for high performance.
PostgreSQL for IIoT Data and Other Demanding Workloads
Timescale began as an IoT company, so we’re well acquainted with many of these challenges. In fact, they’re what led us to create Timescale. We have always advocated for using PostgreSQL to store IoT data, but as PostgreSQL has evolved into an increasingly sophisticated platform and ecosystem, we have expanded that belief to using PostgreSQL for everything.
You don’t have to compromise on your favorite database (an opinion shared by a growing number of developers) or rely on outdated historians to get an end-to-end solution. You can simply expand PostgreSQL by making it more powerful with Timescale.
Combine relational and time-series data into one database
With Timescale, you benefit from a single database that efficiently handles both time-series data and traditional relational data. This eliminates the overhead of maintaining separate systems and simplifies your data architecture.
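As a minimal sketch (with hypothetical names), relational metadata and time-series readings can live side by side in one schema, linked by an ordinary foreign key:

```sql
-- Traditional relational data: the sensors being monitored
CREATE TABLE sensors (
    sensor_id serial PRIMARY KEY,
    site      text NOT NULL,
    model     text
);

-- Time-series data: the readings those sensors emit
CREATE TABLE readings (
    time      timestamptz NOT NULL,
    sensor_id int REFERENCES sensors (sensor_id),
    value     double precision
);
```

We’ll turn readings into a hypertable below; the point here is that both kinds of data share one database, one schema, and one connection.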
One company that achieved just that was Everactive, which began operating in the IIoT space and has since expanded to traditional IoT, combining self-powered sensors and powerful cloud analytics to provide “end-to-end” IoT solutions.
“The capacity of Timescale to support both traditional schemas and time-series data in the same database allowed us to consolidate into one storage solution.”
Carlos Olmos, senior principal engineer at Everactive
Additionally, Timescale supports the rich data model of PostgreSQL, allowing developers to use complex joins, window functions, and custom user-defined functions without the limitations often introduced by specialized time-series databases. This flexibility supports more sophisticated analyses and applications.
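For instance, computing a moving average per sensor alongside its relational metadata is a single standard PostgreSQL query, sketched here against the hypothetical schema above:

```sql
-- One-hour moving average per sensor, joined with site metadata
SELECT r.time,
       s.site,
       avg(r.value) OVER (
           PARTITION BY r.sensor_id
           ORDER BY r.time
           RANGE BETWEEN INTERVAL '1 hour' PRECEDING AND CURRENT ROW
       ) AS smoothed_value
FROM readings r
JOIN sensors s USING (sensor_id)
WHERE r.time > now() - INTERVAL '1 day';
```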
Just use SQL
Since Timescale is built on PostgreSQL and leverages its entire ecosystem, developers can use SQL, a familiar and powerful query language, across all their data. There’s no need to learn a new or specialized query language, which accelerates development and reduces errors.
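To make that concrete, a daily roll-up over the hypothetical readings table from earlier is the same SQL you would write against any PostgreSQL table:

```sql
-- Plain SQL, no new dialect: daily aggregates per sensor
SELECT date_trunc('day', time) AS day,
       sensor_id,
       avg(value) AS avg_value,
       max(value) AS max_value
FROM readings
GROUP BY day, sensor_id
ORDER BY day, sensor_id;
```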
“It quickly reached the point where we had to implement Python scripts to process data because Flux had reached its limits in use cases that would work seamlessly using SQL.”
Jeremy Theocharis, co-founder of United Manufacturing Hub
Process billions of data points with ease
Hypertables are Timescale’s foundation. They automatically partition your PostgreSQL tables into smaller partitions called chunks and instantly speed up your queries. They are optimized for time-based partitioning but also work for tables partitioned on any time-like, steadily increasing column, such as a sequential ID.
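Continuing the earlier sketch, converting the hypothetical readings table into a hypertable is a single call (the one-day chunk interval is an illustrative assumption, not a recommendation):

```sql
-- Convert the plain table into a hypertable partitioned by time
SELECT create_hypertable('readings', 'time',
                         chunk_time_interval => INTERVAL '1 day');

-- After data is inserted, list the chunks TimescaleDB manages for you
SELECT show_chunks('readings');
```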
Timescale’s hypertables are a crucial feature for Newtrax, which integrates the latest IIoT and analytics technologies to monitor and provide real-time insights on underground mining operations, including people, machines, and the environment.
“More importantly, in order for our solutions (such as predictive maintenance and collision avoidance) to provide contextualized and accurate results, we must gather and process hundreds of millions of data points per machine or worker, per week or month, depending on various circumstances. We use hypertables to handle these large datasets and act upon them. We have saved lives using TimescaleDB.”
Jean-François Lambert, lead data engineer at Newtrax
Don’t break the bank: Save on storage
By consolidating your data needs into Timescale, you reduce the total cost of ownership by paying for one database instead of two. This consolidation also means savings in operational costs, as managing a single database system is simpler and less resource-intensive.
But Timescale has more features that help you reduce your storage footprint effectively:
- Drop data with ease: One of the amazing things about partitioning your data with hypertables is that you can drop individual partitions instantly using data retention policies (sketched after this list), which isn’t the case when writing large DELETE statements.
- Compression: Timescale provides advanced compression mechanisms specifically designed for time-series data, which significantly reduce the size of stored data. Compression lowers storage requirements and, consequently, storage costs. More importantly, compressed data can improve query performance, reducing the computational resources needed to process queries.
“Compression was a game-changer from our perspective as a company: not having to worry about getting databases on the order of 5, 10, or 15 TB to store this information was a massive factor for us.”
Michael Gagliardo, Software Architect at Ndustrial
- Tiered storage: Timescale’s multi-tiered storage architecture allows developers to balance cost and performance by storing frequently accessed “hot” data on faster, more expensive storage and older, less frequently accessed “cold” data on cheaper storage (where it remains queryable). Our low-cost storage tier has a flat price of $0.021 per GB/month for data, cheaper than Amazon S3.
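As a rough sketch of how retention and compression policies look on the hypothetical readings hypertable from earlier (the intervals are placeholders, not recommendations):

```sql
-- Drop chunks older than a year instead of running costly DELETEs
SELECT add_retention_policy('readings', INTERVAL '1 year');

-- Enable columnar compression, segmented by sensor for better ratios,
-- and compress chunks automatically once they are a week old
ALTER TABLE readings SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'sensor_id'
);
SELECT add_compression_policy('readings', INTERVAL '7 days');
```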
To try all these benefits today and start storing your IoT or IIoT data in a more efficient and performant way, create a Timescale account. It's free for 30 days, no credit card required.