The Internet of Things (IoT) produces, directs, and drives data continuously. How can current approaches to data streaming help us manage these continuous information channels?
Data streaming is one of the key defining factors in the way information architectures and software applications are being developed. Much of its focus involves infrastructure automation. But what does that mean in the real world?
What is data streaming?
When we add streaming capabilities to data infrastructures, we add the ability to automate real-time data flows. That is, we add the flexibility to ingest and process the increasing availability of IoT, social media, and other streaming data sources, and manage them alongside traditional batch-based data.
But real-time or streaming data flows differ from traditional batch-based flows, where data is loaded in sets on a scheduled basis (e.g. hourly, daily, or monthly), in the following ways:
- They are often more time-sensitive.
- They are typically larger in volume.
- They may or may not be of long-term value after processing.
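The contrast can be sketched in a few lines of Python. This is an illustrative toy, not any vendor's API: the readings, threshold, and window size are invented for the example. The batch function waits for a complete set; the streaming function reacts to each record as it arrives and keeps only a small rolling window, reflecting the points above about time-sensitivity, volume, and limited long-term value.

```python
from collections import deque

# Hypothetical sensor readings; in practice these would arrive from an
# IoT device or message broker, not a hard-coded list.
READINGS = [21.5, 21.7, 35.0, 21.6, 21.8]

def batch_process(readings):
    """Batch style: the full set is loaded on a schedule, then processed."""
    return sum(readings) / len(readings)

def stream_process(readings, threshold=30.0, window=3):
    """Streaming style: handle each record as it arrives, keeping only a
    small rolling window, since the raw data may have little long-term
    value once processed."""
    window_buf = deque(maxlen=window)
    alerts = []
    for value in readings:
        window_buf.append(value)
        if value > threshold:  # time-sensitive: react immediately
            alerts.append(value)
    return alerts, sum(window_buf) / len(window_buf)

alerts, rolling_avg = stream_process(READINGS)
```

Here the streaming version raises an alert the moment an out-of-range reading appears, rather than discovering it hours later when the next batch load runs.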
Real-time data flows, or streaming data sources, can come from many areas in an organisation’s data landscape, including in-field units that share sensor-based data, social media feeds (for sentiment analysis), or a variety of internal systems.
But why is this significant?
United states of data
“A streaming data architecture makes the core assumption that data is continuous and always moving, in contrast to the traditional assumption that data is static. This turns a lot of our preconceptions upside down, along with the tools that we currently use to build data-driven applications,” says Kostas Tzoumas, co-founder and CEO of open source stream processing company, data Artisans.
“For example, application code processes the moving data directly, while it keeps next to it the working memory it needs (state), instead of first landing the data in a database and then querying it.”
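The pattern Tzoumas describes can be sketched as follows. This is a minimal illustration of the idea, not data Artisans' own code: the event tuples and field names are invented for the example. Per-sensor working memory (the state) lives in an in-memory map next to the processing logic, and is updated incrementally as each record moves past, with no database landing-and-querying step.

```python
from collections import defaultdict

def process_stream(events):
    """Consume a stream of (sensor_id, reading) events, maintaining
    per-sensor state (count and running mean) in memory as data flows."""
    state = defaultdict(lambda: {"count": 0, "mean": 0.0})
    for sensor_id, reading in events:
        s = state[sensor_id]
        s["count"] += 1
        # Incremental mean update: the state is enough, so there is no
        # need to re-query the full stored history for each new record.
        s["mean"] += (reading - s["mean"]) / s["count"]
    return dict(state)

events = [("s1", 10.0), ("s2", 4.0), ("s1", 20.0)]
state = process_stream(events)
```

Production stream processors add the hard parts this sketch omits, such as making that state fault-tolerant and consistent across a distributed cluster.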
Vendors are now building this increasingly important streaming capability directly into their platforms. A recent example is data infrastructure automation specialist, WhereScape.
WhereScape CEO Mark Budzinski says that the company has added streaming to its platform in response to the fact that IT teams – already stretched for time – now face the added challenge of understanding and incorporating these new data sources – and streaming technologies – into their analytics environments.
Matt Aslett, research director at 451 Research, explains the problem facing many non-specialist organisations: “We see increasing interest in stream-based processing from mainstream companies, but many lack the internal resources and skills to integrate these new technologies effectively and efficiently.”
While it is hardly early days for data streaming, or for the acknowledgement of real-time data as a category, it is early days (comparatively) for automated, software-defined, cloud-centric streaming services, especially in terms of their application to IoT environments.
The challenge for companies working in this sector of data engineering is that it is dominated by Hadoop (still difficult to master) for the distributed processing of large data sets, and by the murky waters of unstructured information that flow into the data lake.
This isn’t plug-and-play technology by any means, and so it’s wise to remember that ‘automated’ doesn’t always mean ‘turn it on and go for a pizza’. Bringing specialist expertise onboard may be essential.