What Does a Data Vault Do?

All right everyone, let's get this thing started. First off, I want to say thank you to everyone who has joined so far. It looks like we still have a few people joining. Is your organization ready for Data Vault? We have two great speakers ahead of us. Some of you are looking to potentially implement Data Vault and see what organizations need to achieve a timely, cost-efficient, and successful implementation. Others might already have Data Vault and be looking for new agile tips, tricks, and approaches from a couple of experts. Let's go over who will be speaking today.

Data Warehousing

First, we don't need too much of a bio. We have Dan Linstedt, probably the reason why you guys are here. Dan is the inventor of the Data Vault architecture. He's a world-renowned expert in data warehousing and business intelligence, with over 25 years of IT experience.

He's helped organizations including Fortune 50 companies like Nike, the U.S. Air Force, the U.S. Department of Defense, the American Automobile Association, Desjardins Bank, and a bunch of others successfully implement business intelligence solutions. Dan offers Data Vault 2.0 certifications and cofounded LearnDataVault.com to help spread the word and education about Data Vault and share his real-world experiences.

After him, we have Kevin Marshbank. Kevin is a WhereScape senior solutions architect. He's helped people and organizations quickly automate their data warehouse initiatives, from build-out to continuous improvement and maintenance. Kevin teaches teams to work in an agile way while increasing engagement with the business and greatly improving their ability to leverage one of their greatest assets: their data.

Data Vault 2.0

The topics that we are going to run through today: first off, what are the benefits of employing Data Vault 2.0 for the enterprise, and specifically for IT? Next, what does IT need to be successful in implementing Data Vault 2.0?

Last, how can Data Vault automation increase IT's ability to succeed? Let's get into the value, and I will pass it over to Dan.

Great. Thank you very much. I'm Dan Linstedt. For those of you that know me or know of me, as I was introduced, I have lots of experience in the field. Let's just get right into it, shall we, and see where we can go here. So why Data Vault? Before we step into the value of Data Vault, you might be wondering, if you've never done one, why should you get into Data Vault at all? Well, you may have lots of issues, ranging from late delivery to non-shareable team resources and non-standardized ways of working. Perhaps you've got global teams or multiple split teams, all working to different standards. They all come up with different results that don't match, don't balance. Some of them have privacy built in, some don't. Some handle GDPR, some don't. You might have some issues that you want to address.

Well, Data Vault 2.0 is a system of business intelligence that addresses people, process and technology. From this perspective, we definitely want to address inconsistent and missing standards, and constant reengineering to fit big data or GDPR or privacy concerns.

Maybe you've got teams putting things into production where you've got a lot of bugs coming along with your production releases, and that ends up being a lot of rework or reengineering in your solution. These can be a constant problem. Maybe you need to parse images and audio, or what do you do with video files? You go to your teams and they say, well, we don't have a standard for that, or it doesn't fit in a common dimension or fact table. Well, we could use Hadoop for that. Well, what does Hadoop do? You really need to think about the people, the process and the technology, all combined. Data Vault is so much more than just a data model. A lot of people look at Data Vault and think, well, it's just a data model, and this is a fallacy that's out there on the web today.

Sure, we have a data model that is a hybrid design between third normal form and star schema, or dimensional modeling, if you know about that. But we have so much more to offer.

Data Vault Business Intelligence

Data Vault really is a system of business intelligence that brings a methodology for your team, for your people, for your processes that's consistent, that's repeatable, that's scalable. We focus on enterprise solutions and big data systems, BI analytics systems that really span or hybridize your environment, ranging from combining Hadoop to relational access to cubes, all the way through, end to end. What you really want is enterprise BI solutions that are end to end, so that you don't have disparate answer sets. In the methodology, we focus on repeatability and pattern-based components, and again, it's beyond just the data model. The reason why we need pattern-based components is because, well, how do you get good at something?

Data Vault 2.0 Methodology

How do you become agile in your workspace? Well, I liken it to learning how to ride a bicycle. When you learn how to ride a bicycle, normally you start with training wheels, and eventually the training wheels come off, but you might fall down a couple of times. You need to know exactly how to ride a bicycle: how much speed you need, how to maintain your balance and how to pedal, right? There's really only one way, but it's a pattern-based learning component. We take these life lessons from the standard universe and we bring them into business intelligence and analytics. The reason why is that we like consistency through the team builds.

Scalable Data Architecture

Now, we also have this architectural feel or architectural component. When I say architecture, everybody immediately thinks only of systems architecture. Architecture is so much more than that. We need scalable architecture, we need data architecture, process architecture, people architecture.

How do we increase our team size on demand? How do we share team resources from an idle team halfway around the world? I've got some insurance companies that have global offices all over: China, Japan, the U.S., Asia, Europe and so on. So how do they coordinate efforts, right? Architecture is a big part of Data Vault 2.0, and then of course the model. Again, when I say model, everybody thinks of Data Vault models, i.e., the common hub-and-spoke architecture there for the data model. It's so far beyond the data model. We have a model for sure, but we also have a process model. We have a model for gathering requirements, we have a model for being agile, for improving the consistency of the team output and improving the performance of the team. Lots of things that we need to think about.

Benefits of Data Vault 2.0

Some of the benefits that we get when we do Data Vault properly, when we stick to the standards, when we leverage the right tooling, when we put the right processes in place: we get multi-team agility, and this is worldwide, cross-spread. We get a 360-degree enterprise vision of all analytics. It's not just data warehousing analytics. This includes your data science programs and your BI analytics, right through deep learning, anything that goes on in the BI and analytics landscape. We also focus on something called gap analysis of business effectiveness.

Gap Analysis

From that perspective, what we're talking about is being able to detect and reduce the gaps that are in your business. For instance, today your business users might say, well, we believe that the business is operating this way, and a common example of something like that would be in a manufacturing system. They'll tell me, or they did tell me at Lockheed Martin anyway, you'll never see new contracts being produced by the manufacturing system.

They should always be produced by the finance system. Why? Because you shouldn't build to a contract that hasn't been financed. Makes sense. It turns out that through the data sets and through Data Vault 2.0 and discovery, we're able to show them not just how many contracts were produced in the manufacturing system, but which ones, what percentage, and how often it happened. This is the gap analysis that we're talking about. Where are the broken business processes in your enterprise? Furthermore, what's the value of that broken business process? On cross-program collaboration, we definitely want to talk about not just BI and analytics. I've got a large bank in Australia who's been using Data Vault for seven years, and they're so successful with their processes and teams that they now have leveraged the Data Vault methodology inside of their operational build life cycles. It doesn't mean that they're using the Data Vault model inside their operational systems, but they're using the methodology, they're using the architecture, they're using the standards, the ways of working.

They have about 400 people in IT, including teams that are split offshore. Accuracy and auditability is a big one. Adaptation to change without reengineering. This all comes from the standards. This all comes from the rigor that we apply in the processes: not only the processes that we build, but the ability to write automated testing or, if you're lucky, generate the automated testing. That leads into the next piece here. Without reengineering, what are we talking about? That means we can absorb a brand new system and new data sets. Whether it's real-time feeds or batch feeds, whether it's Twitter feeds, Google feeds, customer sentiment analysis, it doesn't matter where it comes from. We can absorb these feeds and enrich our existing BI and analytics solutions, including the results of data mining operations in, say, R for example, and enrich everything we have in our analytics without reengineering the model or the processes.

We simply add on, or build incrementally, to produce new BI solutions. Okay, so automation and generation is a big one, and of course big data and NoSQL compatibility. I've been talking about the hybridization of Data Vault. Now if you're wondering, are you the only one? Are you the first one? I just want to say no, you're not the first ones out there to think about Data Vault. You're certainly not the only ones out there. We have ten years of research and design inside of Data Vault 2.0, from 1990 to 2000, plus another 15-plus years from then till now of successful deliveries around the world. We have 300 test cases against the system. Not just the data model, but the processes that go on, the maturity of the processes and the maturity of getting data out, producing cubes, producing analysis results, producing dashboards and on and on.


All of these ways of working are backed by the grandfather of Agile, called SEI CMMI. That is the Software Engineering Institute Capability Maturity Model Integration, and Disciplined Agile Delivery. I partner with Scott Ambler. He is one of the founding fathers of the Agile Manifesto, and he and Mark Lines co-created Disciplined Agile Delivery, which we leverage inside the methodology. Six Sigma is an error reduction strategy, a monitoring and measuring strategy. And then of course TQM, Total Quality Management, is a big part of the methodology, and that isn't just about the processes that we put into production, the errors that get put there and how we fix them. It's also the quality of data, and it's the quality of the business processes. At the end of the day, to provide maximum ROI, we want to focus on how we expose the gaps of the broken business processes and how we fix them.

And we have over 6,000 practitioners worldwide. That in itself really doesn't say a whole lot. It all depends on how good they are, really. We do have an awful lot of practitioners out there that are working in the Data Vault landscape today: banking, insurance, telco, governments, healthcare, manufacturing. If you want to know about specific case studies, feel free to ask and I'll go through them.

Data Vault

We're going to move on now and talk about what your IT team might need to succeed in the Data Vault landscape. The title of this presentation is, are you ready for Data Vault 2.0? I would tend to say, if you've got disparate answers, you're probably ready. If you've got multiple teams worldwide all running to different beats, and none of them are synchronizing, and what they produce doesn't line up, and they're still building silos or things of this nature; if one team is successful in the cloud but another one can't seem to get there; if you've got some issues with the people, some issues with your practices, then Data Vault 2.0 can help.

Okay, so we focus on standards, best practices and skilled agile leadership. We also focus on continuous improvement, continuous build. We're not talking months, we're not talking years, and in some cases we're not even talking about weeks to build a single deliverable. In some cases we can get a single deliverable through the Data Vault 2.0 solution and out to the BI environment, on the dashboard, in a matter of hours or days. Measurement and rigor are a big part of this. How did we do what we do? Did we succeed in what we did? Did we fail? Did our estimates not meet our actuals? Why, and how can we improve it? That's a big part of the agile system. Then of course there's the technology side of this. We're talking about flexible templates, and this is where WhereScape comes in. They've got a number of flexible templates.

We definitely want a set of well-founded standards to work from, and we provide those standards in the Data Vault 2.0 landscape. There are some customizations that need to happen on site for certain sets of data. I mean, templates at the end of the day are great for 90% of what you need to do, and you should stick to the standards as much as possible. There's anywhere between five and ten percent of minor modification that needs to occur on site for certain things.

Metadata Driven Data Vault

You definitely want a metadata-driven approach along with end-to-end control. Having a metadata-driven approach and the repository allows you to leverage these components and execute with version control and wizard-driven generation. That's an accelerator for your team. Just a couple of examples here. I've got a retailer that built in two days what took their outsourcer three months, after they'd been trained in Data Vault 2.0 and engaged in automation.

I've got another insurance corporation that built in one week what used to take them six months: same number of people and same exact amount of work. Instead of six months, we optimized it with automation and they built it in a week. I've got a big bank that now builds in two hours what used to take them two weeks. That is a team of about 20 people that builds in a two-hour lifecycle. How would you like to get from the time you give your IT team a requirement to the time that you release into production? How would you like that turnaround time to be two hours? Do you think your users will try to go around you, or go with you and go through you and actually increase your backlog, if you have a two-hour turnaround? To execute successfully and get to the automation goal, you definitely need a couple of pieces to make that happen.

You need standards and rigor, you need good practices, you need people to follow good practices. You can't build willy-nilly, and you can't afford to have multiple global parallel teams all running around doing different things and following a different drummer. It doesn't work. You also have to have a desire to do things differently. I'm sure you've heard the old adage: if you do the same thing over and over again, you're basically going to end up where you're headed. You're going to recreate the mess that you have today. I'm sad to see that this is actually happening in the industry. People are saying, oh, we'll just load a data lake or data dump and we'll go back to rogue development. We'll let free-form development take place. This is all this self-service BI with no rigor, no governance, no PII or private information protection, no standards, and we'll just stick a team on it.

We'll write some R code and produce a result and we'll call it good enough. This is what led to the problems that we had with all of the star schemas before. If you think about it, this was related to loading a stage area and letting people go willy-nilly, building dashboards on top of stage areas, and we ended up with all kinds of information silos. You have to want to do things differently in order for Data Vault to succeed. It's a culture change. We have collaboration as well.

Data Vault Automation

Perfect. Well, thanks, Dan. I appreciate that. We'll go ahead and jump in here and talk about the automation aspect, the technology part of the three pillars: the methodology, architecture and model. If you go back and think about what Dan was talking about there with the people, the process and the technology, that really applies throughout this process as you walk through it. WhereScape is metadata driven. We have a metadata repository that tracks your objects and your design, all the way down to data types and documentation, either that you've entered or that's extracted from your source system. There are also built-in best practices for Data Vault 2.0. In our partnership with Dan, we've built best practices and industry standards around that, always keeping the documentation and the data lineage up to date. Again, Dan mentioned Scott Ambler and Disciplined Agile Delivery. Part of that process is focusing on consumable solutions, not just potentially shippable software.

That's part of what you'll see during this process: full lifecycle management, moving from just an iteration of code development to actually going and delivering something in each cycle. A couple of customer success stories, starting with Aptus Health, and again going back to the examples that Dan gave.

SQL Server to Snowflake

We have many of those examples. This customer was moving from a SQL Server on-premises solution to Snowflake in the cloud, using Data Vault automation with WhereScape. They were able to create their first Data Vault design in three days and then move into production data within Snowflake in three months. And that's fully documented. That's not just a warehouse that's considered production, where you've then got to go through the process of making it available to your end users. This is fully documented and ready, as you can see on the far right with Aptus, to implement Power BI and Tableau and to give your data scientists access to the documentation so they can move forward with design immediately.

Also Micron, one of the largest, if not the largest, Data Vault 2.0 implementations in the world. Using WhereScape automation on top of Teradata, they implemented the foundation for their global data warehouse within 90 days, and they're able to deliver new data warehouse prototypes in less than an hour. Being in manufacturing, they spin up new warehouses for new lines and things like that. One of the other advantages with WhereScape is that it's not a black box. We expose everything and are fully transparent in the code that it generates. Again, you get consistency and improved speed with your teams. Vodafone is another customer on Data Vault with WhereScape automation on Teradata. They saved costs by being able to decommission an existing ETL solution, which saves on licensing and hardware costs, while at the same time accelerating their time to production from six months to two days.

That's an incredible improvement in turnaround. Also, because WhereScape generates code for the platform that you choose to use, you can take advantage of the power of your platform, and they were able to reduce, for instance, their load time by 90%. We see this over and over again, regardless of the platform that you choose. Just like Data Vault 2.0, we're platform agnostic. We will write the code native to your platform. If it's Snowflake, we'll write SnowSQL. If it's SQL Server, we'll write SQL for SQL Server, Azure DW, DB2, etc., on whichever platform you'd like to use. WhereScape Data Vault Express is a combination of two of our main products, 3D and RED. It generates your hubs, satellites and links automatically, your hash keys, your change keys, all your metadata attributes. It also generates the code to instantiate and load that Data Vault.
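As a rough illustration of what "writing code native to your platform" from metadata can look like, here is a minimal Python sketch. It is not WhereScape's generator or its actual templates: the template text, the column types, and the `hub_<entity>` naming convention are all assumptions made up for this example.

```python
from string import Template

# Hypothetical per-platform templates. The column types and layout are
# illustrative assumptions, not the output of any real tool.
HUB_TEMPLATES = {
    "sqlserver": Template(
        "CREATE TABLE $table (\n"
        "  $hash_key CHAR(32) NOT NULL PRIMARY KEY,\n"
        "  $business_key NVARCHAR(200) NOT NULL,\n"
        "  load_date DATETIME2 NOT NULL,\n"
        "  record_source NVARCHAR(100) NOT NULL\n"
        ");"
    ),
    "snowflake": Template(
        "CREATE TABLE $table (\n"
        "  $hash_key CHAR(32) NOT NULL PRIMARY KEY,\n"
        "  $business_key VARCHAR NOT NULL,\n"
        "  load_date TIMESTAMP_NTZ NOT NULL,\n"
        "  record_source VARCHAR NOT NULL\n"
        ");"
    ),
}


def generate_hub_ddl(entity: str, business_key: str, platform: str) -> str:
    """Render hub DDL for one entity from metadata, using the platform's template."""
    template = HUB_TEMPLATES[platform]
    return template.substitute(
        table=f"hub_{entity}",          # assumed naming standard
        hash_key=f"hub_{entity}_hk",    # assumed naming standard
        business_key=business_key,
    )


sqlserver_ddl = generate_hub_ddl("customers", "company_name", "sqlserver")
snowflake_ddl = generate_hub_ddl("customers", "company_name", "snowflake")
print(sqlserver_ddl)
```

The same metadata (entity name and business key) drives both outputs, and only the template changes per platform, which is the general idea behind template-based generation.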

WhereScape Data Vault Express

So, what we'll do is take a quick look at Data Vault Express. I'll jump right in here to WhereScape 3D, the design modeling tool. Now, I've already modeled a source system, and we can see real quick here the source model. Nice, beautiful source model. Not all of your sources will be like this. We provide wizards to generate your keys and find your relationships. It's also built to profile the data and quickly look at what's going to be of use to your analysts. Here we can see quickly that ship region is mostly null. That won't have a lot of analytical value, but this is a good quick way to take a data-driven approach and see what's in the system. We also have model conversion rules that will automatically generate a Data Vault from this, but that is of limited value, in the sense that it will get you maybe 60% of the way there and give you a good idea of what your vault should look like.
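The profiling step mentioned above (spotting that ship region is mostly null) boils down to a per-column null ratio. A minimal Python sketch follows; the sample rows are invented for illustration, and the 50% threshold is an arbitrary assumption rather than anything the tool prescribes.

```python
def null_fractions(rows):
    """Return {column: fraction of rows where the value is None or empty}."""
    if not rows:
        return {}
    counts = {}
    for row in rows:
        for col, val in row.items():
            counts.setdefault(col, 0)
            if val is None or val == "":
                counts[col] += 1
    return {col: n / len(rows) for col, n in counts.items()}


# Invented sample rows standing in for an orders source table.
orders = [
    {"order_id": 1, "ship_city": "Reims", "ship_region": None},
    {"order_id": 2, "ship_city": "Rio de Janeiro", "ship_region": "RJ"},
    {"order_id": 3, "ship_city": "Lyon", "ship_region": None},
    {"order_id": 4, "ship_city": "Bern", "ship_region": None},
]

profile = null_fractions(orders)
# Flag columns that are empty in more than half the rows (assumed threshold).
mostly_null = [col for col, frac in profile.items() if frac > 0.5]
print(profile, mostly_null)
```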

Really, from a Data Vault 2.0 perspective, you need to be having those conversations with your business, and through them build out your logical model and map your source system into these entities. If we look at the shippers entity here and trace the source, we can see that's coming from the shippers table in the source orders connection and then populating that logical entity. I've had the conversations with the business, I've marked business keys, I've marked units of work as well as my attributes, and I've talked with them. We're missing the customers entity at this moment. I'm going to bring that in, and even with the customer sitting right next to me, I can pull this up and ask them how they view the customer, and what the actual business key is. Now, the source system says the customer ID is the primary key, but that's not necessarily the business key and how the customer views the business.

They've told me, as we walk through this, that the company name is actually the business key, and how they refer to their businesses or their companies, and that's always going to be unique. We've also had discussions about the attributes of the company. They've told me the contact name and the contact title change fairly frequently. I'm going to mark those as medium-volatility attributes, and then the rest of these are low volatility. They don't change very often, especially that fax number there at the bottom. We'll go ahead and mark those, and now we see the colors here showing how we've identified these various attributes. I'm also going to take the company name and link that over to the customers entity. Now we have the logical model split out here to add this additional entity that the customer said they needed. I'm going to go ahead and take this logical model at this point, take an advanced copy of it, create a Data Vault using this logical model, give it a version of 3.0 and apply a model conversion.

Data Vault Processes

Now, part of the power in this tool set around data governance, and really that process that Dan was talking about, is creating these model conversions that apply your standards. Again, you need to have that flexibility. We provide some of these out of the box, but they can all be modified and customized. We have a set of model conversions here that generate your Data Vault for you based on that logical model. You can actually add your own customizations. If you want to apply naming standards to how this is built, you can do that as well. I'm going to go ahead and kick that off and let it run. It's going to go through and read that logical model and make the determinations on what the Data Vault should look like based off the keys and the relationships. If I display this now and then reorganize it so it looks a little prettier on the screen, we now have a Data Vault model based off of our discussions with the business.

We have our satellites and our links with our hash keys defined: our hub hash keys, our link hash keys, etc. If I look down here at the customers, you can see that it's taken the attributes that have been marked as slowly changing and put them in a low-rate-of-change satellite, along with the change hash for that particular set. Then for the medium-rate-of-change attributes, contact name and title, it has split those out into a separate satellite as well. Now, some of our customers choose to split satellites out based on HIPAA data or PII data as well, to be able to secure that off. You can do that. You can apply model conversions to automatically split. You can mark your attribute types and have the model conversions split those satellites out for you, depending on whether it's financial data, health care data, etc.
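The satellite split described above can be pictured as grouping an entity's attributes by the tag they were given in the logical model. This Python sketch assumes a simple tag per attribute and a `sat_<entity>_<tag>` naming convention; both are illustrative choices, not the tool's actual conversion rules, and the attribute names are invented.

```python
from collections import defaultdict


def split_satellites(entity, attribute_tags):
    """Group attributes into one satellite per tag.

    attribute_tags maps attribute name -> tag (for example a rate of change
    such as "low" or "medium", or a classification such as "pii").
    Returns {satellite_name: [attribute, ...]}.
    """
    satellites = defaultdict(list)
    for attribute, tag in attribute_tags.items():
        satellites[f"sat_{entity}_{tag}"].append(attribute)
    return dict(satellites)


# Tags as marked during the modeling conversation with the business.
customer_attributes = {
    "contact_name": "medium",
    "contact_title": "medium",
    "address": "low",
    "city": "low",
    "fax": "low",
}

customer_satellites = split_satellites("customers", customer_attributes)
print(customer_satellites)
```

Swapping the tags for a classification such as "pii" or "financial" gives the security-driven split mentioned in the talk, using the same grouping logic.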

What we're going to do is take this model that we've built off the logical model, for the raw vault, and build a deployment, which is going to generate load and stage objects for us. I'm going to apply this model conversion here. It's going to look at the source and at the target Data Vault model, and it's going to build out the load tables to land that data. It's going to build the stage tables as well as the hash key definitions for your various satellites, for your rates of change across the attributes, as well as your hubs and your hub hash keys, and do all that work ahead of time, so that we can follow this insert-only, no-update approach to loading the vault, which lends itself very well to streaming data and slowly changing data.
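To make the hash key, change hash, and insert-only ideas above concrete, here is a small Python sketch. It assumes MD5 over a trimmed, upper-cased business key and a `||` delimiter for the change hash; those are common conventions but are assumptions here, and none of the function or column names come from the tool.

```python
import hashlib


def hash_key(*business_key_parts):
    """Hash a (possibly composite) business key after trimming and upper-casing."""
    normalized = "||".join(str(part).strip().upper() for part in business_key_parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def change_hash(record, attributes):
    """Hash the descriptive attributes so a change can be detected by comparison."""
    payload = "||".join(
        "" if record.get(attr) is None else str(record[attr]).strip()
        for attr in attributes
    )
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def load_satellite(satellite, staged, business_key, attributes, load_date):
    """Insert-only load: append a new row when the change hash differs from the
    latest row for that hub key. Existing rows are never updated or deleted."""
    latest = {}
    for row in satellite:  # rows are assumed to be in load order
        latest[row["hub_hk"]] = row["change_hash"]
    inserted = 0
    for record in staged:
        hk = hash_key(record[business_key])
        ch = change_hash(record, attributes)
        if latest.get(hk) != ch:
            satellite.append({
                "hub_hk": hk,
                "load_date": load_date,
                "change_hash": ch,
                **{attr: record.get(attr) for attr in attributes},
            })
            latest[hk] = ch
            inserted += 1
    return inserted


# Invented staged record for illustration.
sat_medium = []
attrs = ["contact_name", "contact_title"]
batch = [{"company_name": "Acme Trading", "contact_name": "Maria", "contact_title": "Rep"}]

first = load_satellite(sat_medium, batch, "company_name", attrs, "2024-01-01")
repeat = load_satellite(sat_medium, batch, "company_name", attrs, "2024-01-02")
batch[0]["contact_title"] = "Owner"
changed = load_satellite(sat_medium, batch, "company_name", attrs, "2024-01-03")
print(first, repeat, changed, len(sat_medium))
```

Reloading an unchanged record inserts nothing, and a changed contact title adds a second row while leaving the first row untouched, which is what makes the history auditable.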

All of that is supported in the Data Vault 2.0 methodology. Now that we've built that, if we take a look at what has occurred, it has brought over that Data Vault model that we already looked at. It's also built these stage tables as well as these load tables. Again, this can be on whichever platform you're operating on. If we take this now and export it to WhereScape RED, here's that integration to take you from design and modeling over to your actual build-out. You can have multiple targets, you can have different platforms involved, and there are various options in here, but it identifies that this particular version I'm doing is SQL Server, and it's also applying a mapping to SQL Server. What it's going to do at this point is generate a deployment package. It's going to take all these objects in the model and dump them to an XML deployment on my file system here.

What I can do is load that into my actual metadata repository. I'm going to go ahead and run this, and it's going to walk through the manifest from that 3D model that we built out and tell me what it's going to replace and what it's going to create new. Also, if there are existing items, it tells me what it's going to alter. We have a lot of options here. This is a deployment settings wizard, and you can actually create your settings and export those. You can use Jenkins; some of our customers use Jenkins to automate these deployments. Now what it's going to do is update the metadata repository with all the information from our design. Once it's done with that, RED will start creating those objects in the target platform and also bring over the comments and add those comments into your target platform.

WhereScape RED

However the platform supports embedded comments, it will put those into your target platform as well as store them in your metadata. Now that I've done that, I'll bring up our builder tool, WhereScape RED. If I refresh this and expand it out, I now have my hubs, my links and my satellites actually built on my target platform, along with my load tables and my stage tables. Now, the ones that are in green I've already created. They're on a different target, from web services and Blob storage, etc. If I take my customers table here and load that, it's going to dynamically generate, in my settings, an SSIS package, execute that and load the data, and I can look at the data pulled over from that particular source. If I go to my stage table, I'll go ahead and generate the code. Again, it's generating code for my target platform.

In this case, that's SQL Server. It's building SQL Server stored procedures using templates, which again can be modified. I'm going to execute that. You can see how quickly that executed, because it's executing in my target platform. It just took milliseconds to execute and load the data. Now we're in the warehouse, and you can see that it's generated my hash keys: a hub hash key as well as my change hash keys. I have two of those on here, for the low rate of change and the medium rate of change, as well as some of the Data Vault 2.0 standards, with your source system being pulled through. I'm going to go to my hub customers and generate the code to load it, and then I'll execute that, and that quickly I now have a populated hub table. Then if I look at my satellites, I have my two satellites here, the low-rate-of-change and the medium-rate-of-change satellite.

I'll go and build the code for those, and then I will execute the update and display the data. And now I've loaded a satellite. I've got my hash key with my attributes as well as my change hash and my source columns. So that's all well and good. One of the other things that this is doing for you while we're walking through this is maintaining the documentation. If I do a track-back diagram, I can see here that I have my customers source, my load customers, stage customers and my satellite customers. On top of that, I can also build a diagram of my vault. If I go to my hub table, grab customers for instance, and go out eight levels, look: my links diagram here is my fully created Data Vault set of objects. I have my hub customers, my links, my satellites, et cetera. And this is interactive.

I can double-click on these and actually work on the procedures and the properties, et cetera, within the tool set. I can also build out a job from here to populate this very easily. From here I can go up and create documentation to build out an external website. If I click all objects, it will give me a few options about where to create this. I give it a header and a title, and it's going to walk through and generate that website for me, pull in all of those diagrams, and split out the documentation between user and technical documentation. If I click on user documentation here, we can quickly look at my objects. I've got a glossary now generated with all of my metrics and my attributes defined, and object name defaults. Here again, if we talk about that consistency and pattern-based approach, we have our standards in here for our users to see.

I can look at fact tables. If I'm looking through here, I can look at the star schema and link over to the dimensions. I have some sample queries to start me off, also linking over to my dimensions. What's really great is the technical documentation. If we look here, it's the same documentation, but now if we look at the hubs, for instance, I can look at the hub customers that we were just working on, and we have a lot more information. We have relationships defined, source-to-target mapping, our data lineage, and our spine diagram, which is our hubs and links, embedded in here. There are any indices, too: depending on your platform, it may create indices for you automatically, which you can turn on or off, along with hierarchies and aggregates. But what's really great, again, is that transparency. We can actually look at the code in the documentation that it generates so quickly.

We can have a fully fleshed-out set of documentation, which also helps with employee churn. As employees turn over, you have this full documentation that's up to date, so new people can read through what has been done, use it, and get on board very quickly. Hopefully that gives you a good overview of how automation is going to help cut the complexity of getting to your first Data Vault, or really increase the speed and reduce your project risk by reducing the time to move through your projects and deliver something usable to your data scientists, your report developers and your analysts, and really automate for speed. Again, with everything you saw there, you're not boxed in. You can change the templates and modify them.