Is Data Management Broken? Can It Be Fixed?

Author: Steve Swoyer | Upside

Last month, industry luminary Claudia Imhoff kicked off the 15th annual Pacific Northwest BI Summit with a tendentious topic: what's going on -- or going wrong -- with data management? Imhoff described a context in which self-service tools abound, governance is downplayed, and metadata standards are anything but standardized.

As Imhoff noted, she could easily have been describing the late 1980s.

"Where does something like [data governance] fit into this ... all-inclusive [data management] stuff?" she wondered. Imhoff, along with Bill Inmon, helped create and popularize the corporate information factory data warehousing model in the 1990s.

"Here's my fear," she continued. "I've been around long enough that I've seen things come and go. Are we going back down a similar path where we'll end up with all of these silos all throughout the enterprise like we had in the 1980s?"

"I see lots of data all over the place. I wonder if we're going to get to where we were 30 years ago ... where we get people arguing about whose data is 'right' -- and whose data is [poor quality]."

Imhoff's presentation helped set the stage for a big chunk of the discussion at this year's summit.

Today's Metadata Lacks Standards

In his presentation, research analyst Mike Ferguson, president of Intelligent Business Strategies (an information management consultancy based in the UK), discussed the uncertain state of metadata standardization. He noted there are no fewer than three competing open source metadata cataloging standards: Apache Atlas, WhereHows (developed by LinkedIn), and Ground, a new effort championed by Joe Hellerstein of the University of California, Berkeley, and data prep specialist Trifacta.

"Because we're dealing with lots of different data stores, we need the data management software to ... allow you to define something once and run it anywhere," he said, noting that this goal remains as elusive as it's ever been.

"Everything's reliant to some degree on APIs, but there aren't any standards ... and it all depends on whether you have interfaces between Vendor X and Vendor Y. There's another option, however: Can we get back to common metadata again, which is the [reason behind the] emergence of Atlas, WhereHows, and Ground? We're now starting to see the ability to connect to Atlas to harvest the metadata and bring it back."

Confusion Around Data Integration Continues

Imhoff and other attendees struggled to explain the new data integration (DI) technology landscape as it relates to data management.

As Imhoff herself put it, she has an especially hard time explaining the new DI to a very new type of customer: the business user.

"There's what I would call a 'great confusion' [among customers]. I have people come up to me every talk that I give on [data integration] and they ... ask me, 'Do I even need ETL? Do I need data prep? Do I need data quality? If so, when do I need these things? Where does one drop off and the other pick up?' These are the same questions [we used to get from IT], but now they're from business users."

There was even grousing from some attendees that self-service had the potential to wreck data management.

Self-Service Realigning Data Management

Ever the iconoclast, WhereScape CEO Michael Whitehead celebrated the self-service movement as a fundamentally good thing. "This is healthy. It's really healthy," said Whitehead, referring both to the popularity of self-service discovery and data prep tools and the laissez-faire fashion in which these tools are typically used and managed.

"We [in data management] were broken before, so now people are going out and preparing their own analyses and preparing their own data sets without giving much thought to governance or reuse or repeatability. Let's be honest, there will be a helluva lot of mistakes.

"[However,] this is the way we realign data management with the needs and priorities of the business."

Modern Questions Need Context-Dependent Answers

Whitehead's position might sound a little unlikely to anyone familiar with WhereScape, which markets software designed to automate the design, population, and ongoing management of data warehouse systems. He doesn't see this as a contradiction. Instead, Whitehead argued, data management must become more tolerant of what he called "epistemic relativism."

"Let's give our business users some credit. In some cases, it's actually okay to turn up with different sets of information. In other [cases], we need different contextual viewpoints. Think about [a question such as] how many people were there on Flight N from San Francisco to Portland? To an airline, that question has different answers depending on how you define [its parameters]," he said.

This isn't to say we should throw away data management standards. However, we should understand that some questions are more tolerant of relativistic answers than others, Whitehead argued.

Data Management Has to Stop Ignoring Business Needs

"The position some people [in the industry] take is that these things are categorically wrong," Whitehead continued, referring to relative answers and the spread of self-service. "They think we're going to end up in a bad spot, with people essentially writing their own reports, [spinning up] their own Access databases and Excel spreadsheets, much like they did 25 years ago. They think we need to shut this stuff down before [users] can cause harm for themselves or others," he said.

"This doesn't work. The spot we're in right now is our own doing. We're broken because we've ignored [business] needs for so long. We're broken because we've given [business people] every incentive to go around us.

"If our answer to the question 'What's wrong with data management?' doesn't include the ability to provide data very quickly to people to do things with it, then our current answer is wrong. Historically, no one on the inside [of IT] would listen when [business people] tried to express this, so we need people on the outside to point it out to us."