Are big data technologies over-hyped? Sure. Do big data technologies lack for substance? Certainly. Why, then, should you care about big data? Because how we generate, consume, and use information is changing -- radically.

Big data is a catch-all term for this ongoing shift to an information economy that uses statistical, numerical, and analytical technologies and methods to capture and model a richer, more contextualized world. An n-dimensional world. It's an economy that requires more data, of different types and varieties, at greater rates, than ever before. It's an economy that constantly requires more and more and more data -- and which expects to consume data as it streams and pulses.

What does this mean? What does it look like in practice? Consider just one example -- that of Canadian National (CN) Railway, which generates and stores petabytes of telemetry and sensor information on a regular basis. First, the obligatory back story: CN operates and maintains tens of thousands of miles of track in both Canada and the United States. It operates and maintains thousands of engines and freight cars -- so-called "rolling stock." CN has several priorities in operating and maintaining its rail and rolling-stock assets; the most important of these, says Alain Bond, business intelligence manager with CN, is safety.

CN's rails and train cars are all but stuffed with sensors, says Bond. CN collects this sensor data to monitor the wear and tear on rails, rolling stock, and other equipment, which permits it to optimize its preventive maintenance and routing schedules. CN also uses this data to improve operating safety for its employees and, even more important, for the people who live along its thousands of miles of track.

"CN has always been known to be a pioneer on the safety side. We just want to keep pushing that envelope. We have 20,000 miles of track, but on the main tracks, you have these wayside sensors every 25 miles. Different types of sensors for hot wheels, cold wheels, bearings, dragging equipment," he explains. Bond uses the example of audio sensors that can detect wear on rail-car wheels. "We have KPIs for the amount of decibels [of sound generated by] the wheels. If there's a flat spot on a wheel, you have a train that will make louder and louder sounds."

This isn't just an issue of preventive maintenance -- i.e., fixing the wheel -- but of identifying deep patterns, says Bond. "You would like to know at some point if the wheels themselves are defective, if they're on certain types of cars -- and if the design of those cars is in some way responsible -- if they're associated with certain kinds of freights or loads, or certain speeds. Maybe cars that are used on certain stretches of track wear out more quickly," he explains.
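The kind of question Bond describes can be sketched as a simple grouping exercise: take wheel-noise readings, flag those above a decibel KPI, and compare how often each car type or stretch of track exceeds it. The sketch below is illustrative only. The readings, the 100-decibel limit, the car types, the segment names, and the `exceedance_rate` helper are all hypothetical and do not come from CN.

```python
from collections import defaultdict

# Hypothetical readings: (car_type, track_segment, decibels).
# Every value here is invented for illustration; none is CN data.
READINGS = [
    ("hopper", "segment-A", 92.0),
    ("hopper", "segment-B", 104.5),
    ("tanker", "segment-A", 88.0),
    ("hopper", "segment-B", 107.0),
    ("tanker", "segment-B", 101.0),
    ("boxcar", "segment-A", 85.5),
]

DECIBEL_LIMIT = 100.0  # hypothetical KPI threshold, not a CN figure


def exceedance_rate(readings, key_index, limit=DECIBEL_LIMIT):
    """Return, per group, the share of readings louder than the limit.

    key_index selects the grouping column: 0 for car type, 1 for segment.
    """
    totals = defaultdict(int)
    over = defaultdict(int)
    for row in readings:
        key = row[key_index]
        totals[key] += 1
        if row[2] > limit:
            over[key] += 1
    return {key: over[key] / totals[key] for key in totals}


by_car_type = exceedance_rate(READINGS, 0)
by_segment = exceedance_rate(READINGS, 1)

for name, rate in sorted(by_car_type.items()):
    print(f"{name}: {rate:.0%} of readings over the limit")
for name, rate in sorted(by_segment.items()):
    print(f"{name}: {rate:.0%} of readings over the limit")
```

A group with a high rate is only a lead, not a diagnosis: in this invented sample the loud readings cluster on one segment, so car type and track are confounded, which is exactly why the additional context Bond lists (loads, speeds, car design) matters.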

In this respect, "big data" describes a complex world of interdependent events, interactions, and reactions. It's the interplay -- the interconnectedness -- of wheels, springs, cars, track, track grades, freight types, speeds, temperatures, even seemingly insignificant variances (slight variations in track width, unexpected braking or slowdowns). Big data also describes the scale of the data CN collects, both on an everyday basis and as part of diagnostic testing (e.g., when diagnostic cars traverse its entire network of thousands of miles of track), says Bond.

"That sensor on that car traveling at 60 mph can know that that spike moved a quarter of a single millimeter. One run over the whole network, 20,000 miles, generates 8 petabytes of data."
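To put those figures in perspective, a rough back-of-the-envelope calculation follows. It assumes decimal units (1 petabyte = 10^15 bytes), that data is generated evenly across the network, and that the whole run happens at the 60 mph Bond mentions. None of these assumptions is confirmed in his remarks, so treat the results as orders of magnitude rather than measurements.

```python
PETABYTE = 10**15  # decimal petabyte; an assumption, the unit is not specified
GIGABYTE = 10**9


def bytes_per_mile(total_bytes, miles):
    """Average data volume per mile, assuming an even spread over the run."""
    return total_bytes / miles


def bytes_per_second(per_mile, mph):
    """Average data rate at a constant speed (mph / 3600 = miles per second)."""
    return per_mile * mph / 3600


per_mile = bytes_per_mile(8 * PETABYTE, 20_000)
per_second = bytes_per_second(per_mile, 60)

print(f"{per_mile / GIGABYTE:.0f} GB per mile")
print(f"{per_second / GIGABYTE:.2f} GB per second at 60 mph")
```

Under these assumptions, a single diagnostic run averages about 400 GB per mile, or roughly 6.7 GB per second of travel.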

CN is collecting all of that data, and it's just beginning to figure out what to do with it. It has defined KPIs to monitor decibel levels, along with other kinds of KPIs and metrics. Ironically, or not so ironically, a data warehouse powers this use case. Here, too, however, CN is innovating: it's using data warehouse automation software from WhereScape Inc. so that it can more easily change, refit, even retrofit its data warehouse to address new requirements.

In this and other ways, big data isn't a license to wipe the slate clean and start over. It builds on and extends an existing technology foundation. In the final analysis, Bond says, so much of what CN is doing is prospective: the use cases -- the algorithms and insights -- will come. What matters is that CN is developing a bigger, richer, more contextual picture of its daily operational world.

That's what big data is all about.

About the Author
Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he’s particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about. You can contact him at

Read article on TDWI Upside site