Data-driven organizations are hungry for information. They pull data from internal enterprise applications such as supply chain management and utilize inputs from social media and third-party sources to augment the internal “360-degree view” of domains such as customer and product.
Now, data-driven initiatives are pushing into the realm of real-time event ingestion and processing. Data sources include Internet of Things (IoT) device sensors associated with manufacturing and distribution; real-time applications such as order, payment, and fulfillment; and customer-facing touchpoints such as websites and mobile apps. Each source provides information that can be used for competitive advantage. Key shifts in customer sentiment are derived from real-time social media interactions. Product quality and utilization data streaming from sensors along the manufacturing line and from devices in actual use give insight into potential product defects at the earliest stages of product design, based on real-world operation.
However, alongside the value of these opportunities come the challenges of how best to move, ingest, transform, and utilize the data to make these business opportunities a reality.
How can organizations manage the flood of event data into their data management landscapes? There is a significant difference between handling traditional datasets and the facilities required to move and ingest event data. Previously, batch-based file collection and transfer methodologies were sufficient. Event data demands real-time streaming collection and delivery, often referred to as a streaming data pipeline. These pipelines frequently connect event queuing platforms such as the Apache projects Kafka, Flume, and NiFi with either data storage layers or other event queuing platforms.
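To make the pattern concrete, the sketch below models a streaming pipeline in miniature: a producer publishes events onto a queue and a consumer streams them into a storage layer as they arrive, rather than in batch files. This is an illustration of the concept only; the in-memory `queue.Queue` stands in for a real event queuing platform such as Kafka, and the names and event fields are hypothetical.

```python
import queue
import threading

# Hypothetical in-memory stand-in for an event queuing platform
# (a real pipeline would read from Kafka, Flume, or NiFi).
events = queue.Queue()
storage_layer = []  # stand-in for the data storage layer

def produce(sensor_readings):
    """Publish raw events onto the queue as they arrive."""
    for reading in sensor_readings:
        events.put(reading)
    events.put(None)  # sentinel marking the end of this stream

def consume():
    """Stream events off the queue and deliver them to storage."""
    while True:
        event = events.get()
        if event is None:
            break
        storage_layer.append(event)

# Producer and consumer run concurrently: delivery begins as soon as
# events arrive, which is the key difference from batch transfer.
readings = [{"sensor": "line-1", "temp": 71}, {"sensor": "line-2", "temp": 68}]
producer = threading.Thread(target=produce, args=(readings,))
consumer = threading.Thread(target=consume)
producer.start()
consumer.start()
producer.join()
consumer.join()
```

The decoupling shown here is the point: the producer knows nothing about storage, and the consumer knows nothing about the sensors, so either side can be changed independently.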
How can organizations effectively transform and direct event data? Formerly, organizations could sequentially land, transform, and process data without diminishing its value. Streaming data pipelines need an agile, flexible deployment model that allows changes to pipeline configuration, implementation, and management while preserving the timely delivery of the data.
How can organizations process and manage event data? In a traditional processing environment, transactional or analytical processing engines were tightly coupled to data storage, which required companies to maintain multiple replicas of data in different locations. Effective data pipelines decouple processing engines from the data management and storage layers, allowing multiple consecutive usage scenarios on a single stream of data. Organizations with real-time fulfillment applications may use this approach for the evaluation, validation, and treatment of potential fraud events. Other pipelines may direct a single stream to multiple concurrent processing engines, each evaluating select components of the data for uses in operations, customer care, and product management.
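The fan-out described above can be sketched as follows: one event stream is evaluated by several independent "engines," none of which owns the underlying data. Everything here is hypothetical for illustration, including the engine names, field names, and the fraud threshold.

```python
from collections import defaultdict

# Hypothetical payment events on a single stream.
stream = [
    {"order_id": 1, "amount": 25.00, "country": "US"},
    {"order_id": 2, "amount": 9500.00, "country": "US"},
    {"order_id": 3, "amount": 120.00, "country": "FR"},
]

# Each "engine" is just a function over the same stream; because none
# of them owns the data, engines can be added or removed independently.
def fraud_engine(events):
    """Flag payments above a (hypothetical) threshold for review."""
    return [e["order_id"] for e in events if e["amount"] > 5000]

def operations_engine(events):
    """Aggregate order counts per country for operations reporting."""
    counts = defaultdict(int)
    for e in events:
        counts[e["country"]] += 1
    return dict(counts)

# A single stream feeds both engines; neither consumes or alters it.
fraud_flags = fraud_engine(stream)
country_counts = operations_engine(stream)
```

Because processing is decoupled from storage, the fraud and operations views are produced from one copy of the stream rather than from separate replicas maintained per engine.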
In each of these situations, the requirements of streaming data and streaming data pipelines differ significantly from traditional data management best practices. However, implementing streaming best practices can be aided by the proven strategic patterns of current data management techniques:
- The collection of technical, semantic, and operational metadata in a single location allows organizations to govern and manage their data streams in accordance with corporate data governance policies and governmental regulations such as PCI and GDPR compliance.
- The automation of common change management and DevOps tasks improves implementation quality and reduces deployment timeframes.
- Encapsulation of technical and schematic complexity lowers implementation risk by allowing organizations to focus on the timely delivery of business objectives rather than on the lengthy search many organizations face for qualified technologists.
Today, in conjunction with the TDWI conference in Las Vegas, WhereScape will announce a new offering, WhereScape® automation with Streaming, giving organizations the ability to automate, integrate and leverage streaming data pipelines within their existing data infrastructure. The offering provides organizations with new opportunities to take advantage of available streaming data, alongside or enriched by the traditional batch-based data currently stored within their existing data warehouses, data vaults or data marts. WhereScape automation speeds the implementation of streaming data pipelines with an intuitive configuration user interface that allows organizations to encapsulate the complexity of connecting to and configuring data from event queuing platforms, powered by StreamSets’ dataflow management technology. Now, organizations can increase the speed, quality and success of streaming data pipeline implementations.
By pairing Streaming capabilities with its WhereScape® RED or WhereScape® Data Vault Express™ automation offerings, WhereScape automation with Streaming can manage the change management and deployment processes streaming data requires. This means data pipelines can be adjusted, validated, and promoted from development and test environments to production with assurance of a high-quality implementation. The WhereScape platform captures the configuration metadata associated with those pipelines, giving organizations the visibility they need into data lineage and the downstream impacts of changes to data formats.
On March 1, 2018, WhereScape CTO Neil Barton and I will host a webcast discussing streaming data’s importance to data-driven organizations. We will examine the unique challenges streaming data pipelines present, and how WhereScape automation with Streaming can speed time to implementation and lower the configuration and maintenance overhead associated with streaming data pipelines. You can register for the webcast here.
You can find more information about WhereScape automation with Streaming here.