As the Internet of Things (IoT) comes of age, we’re seeing more and more data from event-triggered sensors instead of sensors that record measurements at regular time intervals. These event-triggered sensors give rise to unevenly-spaced time series. Many analysts will immediately convert unevenly-spaced data to evenly-spaced time series to be compatible with existing sensor data analytics tools, but we have found that the conversion is usually unnecessary, and sometimes even causes problems.
Imagine a wireless sensor that measures when a light is on.
The former, event-driven approach has several advantages (It also comes with a potential disadvantage: missed signals can cause larger errors. For example, if the wireless connection is unreliable and the last “switch turned off” signal is missed at the end of the day, you could mistakenly conclude that the light was on all night long. In this case it often makes sense to include an infrequent “heartbeat” signal, say every hour, that prevents very large measurement errors if a signal is dropped; this still only requires about 1/60 of the energy-consuming wireless signals to be sent.):
These practical issues are especially important with increasingly more “things” that are wirelessly sending data. Let’s consider a quick calculation.
Suppose you record whether a light bulb is on every second (maybe you’ve got a Hue or WeMo light bulb). You’ll record about 86 thousand data points per day, or about 32 million data points in a year, for just one light bulb. If you have 40 light bulbs in your home (the average number of light bulbs in a U.S. home), that is more than 1.2 billion data points for your home in a year: BIG DATA ;)
Suppose you only turn each light on and off 10 times a day. If you collect your data in an event-driven manner, you will only have 20 data points per day per bulb, about 7 thousand data points per bulb per year, or about 290 thousand data points per year for your home. It’s more than a 4,000x reduction in the number of measurements to store!
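The back-of-the-envelope numbers above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the constant names are made up for illustration, and it assumes a 365-day year and the same 10 on/off cycles per day for every bulb.

```python
# Assumptions taken from the text: one reading per second for regular
# sampling, 40 bulbs per home, and 10 on/off cycles per bulb per day.
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 regular samples per bulb per day
DAYS_PER_YEAR = 365
BULBS_PER_HOME = 40
CYCLES_PER_DAY = 10              # each cycle is one "on" plus one "off" event

# Regular sampling: one data point per second, per bulb.
regular_per_bulb_per_year = SECONDS_PER_DAY * DAYS_PER_YEAR
regular_per_home_per_year = regular_per_bulb_per_year * BULBS_PER_HOME

# Event-driven sampling: two data points per on/off cycle, per bulb.
event_per_bulb_per_day = 2 * CYCLES_PER_DAY
event_per_bulb_per_year = event_per_bulb_per_day * DAYS_PER_YEAR
event_per_home_per_year = event_per_bulb_per_year * BULBS_PER_HOME

reduction = regular_per_home_per_year / event_per_home_per_year

print(f"regular, per bulb per year: {regular_per_bulb_per_year:,}")
print(f"regular, per home per year: {regular_per_home_per_year:,}")
print(f"event-driven, per home per year: {event_per_home_per_year:,}")
print(f"reduction factor: {reduction:,.0f}x")
```

Under these assumptions the ratio depends only on the sampling rate and the number of switch events per day, not on the number of bulbs or the length of the year.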
When analysts are presented with unevenly-spaced sensor data, they usually convert it to an evenly-spaced time series by regular sampling or linear interpolation. This conversion gets the data into a format that is used by the most common tools for time series analysis. In addition to the constraints on data size, this method also presents other practical challenges:
Beyond the practical challenges, there are technical reasons to be careful when converting unevenly-spaced data to regular time series including:
At Datascope, we’re increasingly finding ourselves helping clients with data from battery-powered sensors that are attached to “things”. In response, we built a lightweight Python package called traces to simplify the reading, writing, and analysis of unevenly-spaced data.
For example, if you want to know how many lightbulbs are turned on given the data from all light bulbs in your house, you can get this information using a very simple syntax.
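To make the idea concrete without relying on any particular library, here is a rough plain-Python sketch of the underlying computation. It is not the traces API; the function name `bulbs_on_at`, the room names, and the sample timestamps are all hypothetical. Each bulb is stored as a time-sorted list of `(timestamp, state)` events, and a bulb is assumed to hold its last reported state until the next event (and to be off before its first event).

```python
from datetime import datetime


def bulbs_on_at(events_by_bulb, when):
    """Count bulbs whose most recent event at or before `when` is 'on' (1).

    `events_by_bulb` maps a bulb name to a time-sorted list of
    (timestamp, state) pairs, where state is 1 for on and 0 for off.
    """
    count = 0
    for events in events_by_bulb.values():
        state = 0  # assumption: a bulb is off before its first event
        for timestamp, value in events:
            if timestamp > when:
                break  # later events do not affect the state at `when`
            state = value
        count += state
    return count


# Hypothetical event-driven data for three bulbs on one morning.
events_by_bulb = {
    "kitchen": [(datetime(2024, 2, 1, 6, 0), 1), (datetime(2024, 2, 1, 8, 30), 0)],
    "hallway": [(datetime(2024, 2, 1, 7, 0), 1), (datetime(2024, 2, 1, 7, 5), 0)],
    "bedroom": [(datetime(2024, 2, 1, 6, 45), 1), (datetime(2024, 2, 1, 7, 15), 0)],
}

print(bulbs_on_at(events_by_bulb, datetime(2024, 2, 1, 7, 2)))   # 3
print(bulbs_on_at(events_by_bulb, datetime(2024, 2, 1, 7, 10)))  # 2
print(bulbs_on_at(events_by_bulb, datetime(2024, 2, 1, 9, 0)))   # 0
```

Note that the bulbs never need to share a common sampling grid: the count at any instant is computed directly from each bulb's own events.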
We’ve found that handling unevenly-spaced data natively is not just useful for sensor data. For example, it’s great for handling time series with missing observations or aggregating multiple time series taken at differing regular intervals. We hope you find it useful, and if you’re interested, we welcome contributions to the code!