Real-time data seems to be everywhere -- in augmented reality, digital twins, 5G, IoT, AI, machine learning, wearables, and beacon technology. One can be forgiven for thinking that today's enterprises are streaming real-time data across every vital task area. We're getting there -- thanks in large part to many open-source solutions such as Apache Flink, Kafka, Spark, and Storm, as well as cloud-based platforms. However, there is still a lot of work that needs to be done before we reach the point at which data moves through and between organizations at light speed or something close to it.
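The core idea behind these streaming frameworks -- reacting to each event the moment it arrives, rather than analyzing periodic batches -- can be sketched in plain Python. This is a toy illustration of the processing model, not Flink or Kafka API code; the event source and threshold below are made up for the example:

```python
import time
from typing import Iterator

def sensor_events() -> Iterator[dict]:
    """Stand-in for a live event source (e.g., a Kafka topic or an IoT feed)."""
    for i in range(5):
        yield {"sensor": "temp-01", "reading": 20.0 + i, "ts": time.time()}

def process_stream(events: Iterator[dict]) -> list[str]:
    """Handle each event as it arrives, instead of waiting for a batch."""
    alerts = []
    for event in events:
        # React immediately to each record -- the essence of streaming.
        if event["reading"] > 22.0:
            alerts.append(f"{event['sensor']} high: {event['reading']}")
    return alerts

print(process_stream(sensor_events()))
```

In a real deployment, the generator would be replaced by a consumer reading from a broker or edge device, and the handler would write to a dashboard, alerting system, or downstream topic.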
First, a level set, courtesy of IDC's John Rydning: "Often, the terms streaming data and real-time data are used in conjunction with each other and sometimes interchangeably. While not all streaming data created is real time, and not all real-time data is streamed, organizations indicate that over two-thirds of streaming use cases require ultra-real-time or real-time data."
There's even ultra-real-time data in use -- and companies are eager to make it work. "Understanding the value of capturing and processing real-time data is growing at the fastest pace in recent times," says Avtar Raikmo, director of engineering at Hazelcast. "Platforms that take complexity away from the individual user or engineer have accelerated adoption across the industry. Innovations such as SQL support help democratize streaming and provide ease of access to the vast majority rather than a select few."
There is a wide range of use cases, including "compute at the edge for audio and video streaming, computer vision for AI and machine learning processing, or even active noise-canceling headphones," Raikmo says. Another emerging use case is digital twins, especially for mobility. "Being able to capture real-time data and telemetry from cars, trucks or rockets enables organizations to model scenarios as they unfold. Digital twins can be used to optimize real-world routes taken, energy used or assisted driving improvements. In the world of sport, Formula 1 strategists determine the optimal pit-stop timing and tire compounds to maximize race performance."
Still, there are many technical and organizational issues standing in the way of fully real-time -- or ultra-real-time -- data. "Real-time data deployments typically use higher performance technologies that cater to the large volumes and fast analysis that are required to make instant decisions," says Emma McGrattan, senior VP of engineering and product for Actian. "For the very large volumes that some verticals, like financial services, tend to generate, moving to real time will require investment in additional resources for hardware, software, and network components."
Investments are needed to "increase availability and reliability of data infrastructure and services," McGrattan says. "For lower volumes, the existing infrastructure is likely usable, with modifications to the applications to do the real-time analysis and deployment."
The process of capturing, visualizing, and storing real-time data requires "substantial investments in infrastructure components that are capable of handling heavy and complex data streams," says Rakesh Jayaprakash, head of product management at ManageEngine and Zoho. "This is particularly true when real-time data streams require some level of pre-processing. Unfortunately, many organizations, particularly SMBs, lack the necessary infrastructure to handle such intensive processing."
Many companies' infrastructures aren't ready, and neither are the organizations themselves. "Some have yet to understand or see the value of real-time data, while others are all-in, with solutions designed for streaming throughout the organization," says Raikmo. "Combining datasets in motion with advanced techniques such as watermarking and windowing is not a trivial matter. It requires correlating multiple streams, combining the data in memory and producing merged stateful result sets, at enterprise scale and resilience."
The good news is that not every bit of data needs to be streamed or delivered in real time. "Organizations often fall into the trap of investing in resources to make every data point they visualize available in real time, even when it is not necessary," Jayaprakash points out. "This approach can lead to exorbitant costs and become unsustainable."
"While visualizing real-time data is more appealing than analyzing data that is a few minutes old, you must carefully assess the cost-benefit ratio and ROI associated with building real-time data streams and visualizations," says Jayaprakash. "Moreover, organizations should exercise due diligence in selecting the metrics they wish to stream in real time."
IDC's Amy Machado makes the case for carefully considering what needs to be delivered in real time: "I always say, 'Let the use case lead,'" she writes in a blog post. "It should direct how you think about real-time architecture, which, ideally, is an expansion of your existing framework to avoid creating data silos."
Machado also outlines key questions to ask about real-time data delivery.
To optimize real-time data investments, "carefully select metrics that truly require real-time reporting," Jayaprakash advises. "The complex nature of the infrastructure needed to operate and maintain real-time data streams introduces potential points of failure, necessitating a dedicated staff for troubleshooting and maintenance. To mitigate data continuity issues resulting from stream failures, you need to implement fail-safe mechanisms, which adds to overall costs."
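One common fail-safe mechanism of the kind Jayaprakash describes is falling back to the last known good value when a stream read fails, so a dashboard degrades gracefully instead of going blank. A minimal sketch, assuming a caller-supplied `read_latest` function and a simple dictionary cache (both hypothetical stand-ins for a real client and store):

```python
def read_with_fallback(read_latest, cache):
    """Try the live stream; on failure, serve the cached last-known value.

    read_latest: zero-argument callable that returns the newest reading,
                 raising on stream failure (a hypothetical stand-in).
    cache: dict used to hold the last successfully read value.
    """
    try:
        value = read_latest()
        cache["last_good"] = value        # refresh the fallback copy
        return value, "live"
    except Exception:
        if "last_good" in cache:
            return cache["last_good"], "stale"  # degraded, but not blank
        raise  # no safe value to fall back to

# Simulated usage: the first read succeeds, the second fails over.
cache = {}
print(read_with_fallback(lambda: 42.0, cache))  # (42.0, 'live')

def broken():
    raise ConnectionError("stream down")

print(read_with_fallback(broken, cache))        # (42.0, 'stale')
```

Marking served values as "live" or "stale" lets the UI signal degraded freshness, and the retry, alerting, and cache-expiry policies layered on top are where the maintenance costs Jayaprakash mentions accumulate.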