Hey everyone. Welcome to another episode of Data Dialogues. I'm Julian Redmond, general manager of Service Insight and the host for the series. For this episode, we've got a really special discussion about to happen. I've got some really great guests. If this is your first time at one of these podcasts, welcome. The Data Vault Innovators community is a community that we've helped create to give people a place to connect for professional development, for discussions and debates around Data Vault 2.0, about the benefits it can bring to organizations, to help people understand what it will do for their organization, and then how to go about doing that. As part of the podcast series, we're doing a series of episodes as a buyer's guide. For today, I have some really special guests from WhereScape joining me: Shree and Shiva.
No, I think it's great and it's showing the global reach of Data Vault. We've got Shiva late at night for you in the US. Shree, you just said the sun is not even up yet in India, so middle of the day for me. Appreciate you both taking the time to jump on. As part of the Buyer's Guide series that we're running, we're talking about, I guess, the types of things that people need to consider when they're actually going about building a technical platform to develop a data vault on top of. So let's start there. What would be the cautions and considerations that you would put in front of someone when they're starting a Data Vault journey? Shree, do you want to jump into that one?
Yeah, that's a great question. That's the key area where a lot of companies struggle with what's ahead of them in terms of Data Vault, right? I believe there are three phases to a Data Vault journey. First is, of course, adoption, second is implementation, and third is operations. During the adoption and implementation phases, companies should be extra cautious about the misalignment that usually happens between business and IT teams. Because when you look back, Data Vault was originally simply a model, structured with agility and flexibility in mind, and then 2.0 came around and added a comprehensive methodology that is consistent, repeatable, and pattern based. So while the model offers the agility, the developers are expected to deliver the agility that's offered by the underlying model, right? When IT teams focus on implementing new enterprise-focused modeling techniques, an over-engineering mindset can leave the IT team a bit behind, or out of sync with, the business teams.
Business teams in that process can feel estranged from any clear understanding of project timelines. Because what's happening in the background is that new technology has come in, and there's this exodus of people trying to figure out how to make it really work, how to put that enterprise-level modeling together. Without the right tools, IT teams are just lost in the woods trying to figure out how to make this work, and business people are getting increasingly impatient, right? Because of all of these things, there can be huge delays in project timelines, particularly for the business solutions. What happens then is business teams start looking for quick-fix solutions from other vendors and lose confidence in the IT teams that they have. What we usually recommend to people during this part of the journey is that they let models and methods fade into the background, with a pure focus on seamless collaboration between business and IT teams.
Right. Because if you don't have that alignment, regardless of the type of technology and the advantages of it, you will never get around to the real meat, right? Where you have to actually implement the model, reap the agility of the model, and things like that. That's also a good point in time for companies, during the implementation phase, to embrace an automation tool. Because automation tools like WhereScape Data Vault Express, which is our Data Vault automation tool, really emphasize joint design and development between business and IT.
It provides an integrated environment where both teams can collaborate, get rapid prototyping done, make decisions with real data, and ensure that they are always in sync, as opposed to operating as two different teams and creating more silos between them. The second important thing to be cautious about is the nature of the environment. It's a very new technical environment when you look at DV 2.0.
Most developers, or data warehouse developers in the industry, come from substantial experience implementing a 3NF or star schema approach to data warehousing, but may not necessarily carry much experience with Data Vault. It's a new thing for many people. Data Vault differs from dimensional modeling in significant ways.
For example, the Data Vault design pattern has a concept of splitting source data, which is actually tables or files, into multiple target objects. Target objects in this case would be the satellite objects. Within the data vault you also need to reduce the amount of data being stored based on rate of change. That's where you get all of these multiple objects being created. The end result is a significant increase in the total number of target objects being constructed, right? When you try to tackle this manually, with the volume of objects and the tedious, error-prone processes, you're not going to make a lot of progress within a short period of time, and you're going to make your business people much more irritated about the fact that nothing is coming out of the door.
All of these technical aspects, in my opinion, can and should be automated to lessen the impact on Data Vault development projects and to reduce the risk. Automation tools like, again, Data Vault Express can ease a lot of things in this process, because they auto-generate the correct syntax, generate the associated processing logic for, let's say, hash and change keys, and accelerate development, eliminating the opportunities for human error along the way, right? That way you ensure that you're making much more progress in less time. Another area of focus or caution that I would particularly highlight here is development inconsistency. As we all know, the Data Vault model and methodology require adherence to an extensive set of rules and recommendations, right? For example, the objects, from hubs, links, and satellites to the lesser-known ones like PITs and bridges, must all adhere to very specific standards and sets of rules to make the vault actually work the way you want it to work.
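To make the hash and change-key logic mentioned above concrete, here is a minimal Python sketch of the kind of boilerplate an automation tool would generate for you. This is an illustration only, not WhereScape's actual generated code; the column names, record layout, and `||` delimiter convention are hypothetical, though hashing normalized business keys and hashing descriptive attributes to detect change are common Data Vault 2.0 patterns:

```python
import hashlib

def hash_key(*business_keys):
    """Build a hub/link hash key: trim and upper-case each business key part,
    join with a delimiter, and hash (MD5 used here for illustration)."""
    normalized = "||".join(str(k).strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

def hash_diff(record, descriptive_columns):
    """Build a satellite hash-diff over a row's descriptive attributes,
    used to decide whether a new satellite version needs to be inserted."""
    payload = "||".join(str(record.get(c, "")).strip() for c in descriptive_columns)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

# Hypothetical source row split into a hub key and a satellite payload
row = {"customer_id": "C-1001", "name": "Ada", "city": "Wellington"}
hub_hk = hash_key(row["customer_id"])
sat_hd = hash_diff(row, ["name", "city"])
```

The point of generating this rather than hand-coding it is consistency: the same normalization and delimiter rules apply to every one of the hundreds of objects a vault produces, so two loads of the same business key always hash identically.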
What happens many times is we see developers trying to reinvent the wheel, and they end up reinventing these structures through ignorance, prior experience, best of intentions, or whatever the reason could be. Problems do arise from such approaches of reinventing based on personal preferences. A lot of rework is demanded, both in the initial build and in the ongoing maintenance. It just keeps on becoming super complex.
The last piece, which is often the most neglected in the data warehousing industry, is the operations and ongoing maintenance side of data warehouses. We would say that companies should be cautious about the operational predicaments as well, because a Data Vault, just like the usual data warehouses with dimensional modeling, needs ongoing maintenance, which is kind of an overhead: scheduling jobs, executing them, and monitoring the data feeds. You always have to tackle failed jobs and restart them while ensuring that everything is processed in the right sequence.
Yeah. Data Vault adds complexity, as you can imagine, on top of the scheduling and management that you have already got, and you need automation to handle all these kinds of operational activities and enhancements, which could free up a significant amount of time. So yeah, at the end of it, in summary, I would say that embracing an automation tool is much more prudent at the beginning stages than figuring it all out by yourself.
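The sequencing and restart concern described above can be sketched as a toy dependency-aware job runner. This is not WhereScape's scheduler; it is a minimal Python illustration, with hypothetical job names, of why "run everything in the right order and retry failures" is worth automating once the object count grows:

```python
from graphlib import TopologicalSorter

def run_jobs(jobs, dependencies, max_retries=2):
    """Run load jobs in an order that respects dependencies, retrying failures.
    jobs: dict of name -> callable; dependencies: dict of name -> set of prerequisites."""
    order = list(TopologicalSorter(dependencies).static_order())
    results = {}
    for name in order:
        for attempt in range(max_retries + 1):
            try:
                results[name] = jobs[name]()
                break  # job succeeded, move on to the next in sequence
            except Exception as exc:
                if attempt == max_retries:
                    raise RuntimeError(f"job {name} failed after retries") from exc
    return results

# A typical ordering: load the hub before the link and satellites that reference it
jobs = {
    "load_hub_customer": lambda: "hub ok",
    "load_link_order": lambda: "link ok",
    "load_sat_customer": lambda: "sat ok",
}
deps = {
    "load_hub_customer": set(),
    "load_link_order": {"load_hub_customer"},
    "load_sat_customer": {"load_hub_customer"},
}
results = run_jobs(jobs, deps)
```

With three objects this is trivial; with the hundreds of hubs, links, and satellites a real vault produces, maintaining this graph by hand is exactly the operational overhead being described.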
I couldn't agree more. I had a chat with some of our friends at Gartner a while ago, and they said they're recommending to their customers that if people want to go down a Data Vault journey, they do not do it without an automation tool. In fact, when they're talking to C-level execs, they say if you don't use automation with Data Vault then you're on about a 21-month path to moving to something else, because you will suffer from the inconsistencies of hand coding and the volume of tables that Data Vault creates. It creates them for a very good reason, as we all know, it decouples all of that stuff, but it becomes a volume you have to manage.
I really like your point around collaboration. I think that's such an important thing to have teams collaborate and get that stuff in place so that you can really use Agile to deliver consistently with that consistency that automation brings.
So, thinking about someone getting started, what recommendations would WhereScape make, given the experience that you have, for getting the most out of an automation tool?
Sure, it's an interesting question, and I think a lot of the recommendations would vary per the use case, but there are some things that are absolutely essential, right? What we typically recommend to businesses before they get onto the journey of data warehouse automation is particularly important here. For example, we always suggest that companies take the DV 2.0 certification course to understand the benefits of Data Vault and gain practical skills. Because when you're doing Data Vault you're not playing a short game, right? You're going to be playing a long game. You need to have the right practical skill set to understand the technology through and through, and ensure that your team moves along nicely with the right skill sets. I pretty much assume that the people who show interest in the training program that you run come with this particular notion: let's get the crew trained on DV 2.0, and then we will figure out what the right tools are.
After that, we try to recommend that companies get their heads wrapped around the idea that data warehouse automation is essentially about improving working lives, as opposed to eliminating manpower on a project. Because that's the number one challenge, in terms of strategy, that companies face. As long as they have a culture internally that fosters the idea of improving the skill sets and lives of working individuals, and using their skills to tackle real-life problems as opposed to just doing day-in, day-out mundane processes, they should be able to make good use of data warehouse automation. The third thing that we typically recommend concerns the point at which you should actually take on data warehouse automation in your DV 2.0 journey, right? We typically say to embrace it at the beginning of a DV 2.0 project to save time and money and reduce project risk.
From my personal experience, we have been seeing two types of prospects who come to Data Vault Express or show interest in the product. These are companies who are either just getting started with their DV 2.0 automation journey, or who are one or two years down the road, implementing everything through textbook knowledge or even just hiring some consulting company to do the job for them, and then throwing in the towel because their business people are not able to see results and they now want automation. I would definitely recommend: don't be the latter set of audience, be proactive. If you have Data Vault 2.0 in the cards to be implemented in the next six months, start your data warehouse automation journey today. You will save a lot of time and money, and you will reduce the risk of the project substantially.
The last thing that I would like to recommend here, in the interest of time, is that while looking for data warehouse automation tools, look for an end-to-end data warehouse automation product. Not like some products that have recently surfaced which claim to be doing Data Vault 2.0 but just present a nicer interface, and then you start working with the product and you are unable to connect the dots: how do we go from this point to the other point, how do we stay connected to the source all the time, how do we make changes at the table level and column level, and things like that. Look for tools that can help you throughout the data warehouse lifecycle, as opposed to just tackling one part of it.
Yeah, I agree completely. We've seen certain companies that have tried to prove the value of vault to their business by doing it manually. They've just proved that it's slow, when in fact, if they'd embraced an automation tool, they could have proved that it was really fast. You kind of miss the boat, and obviously these guys wanted to actually prove the value of vault. It's really important, I think, as you said, to get it in up front. Let's shift gears and jump into talking about when someone's buying a solution, an automation solution. What are the business and technical guidelines they need to think of when deciding what automation tool and approach to take?
Yeah, absolutely. That's a great question. From my perspective, business and IT teams that are starting their Data Vault 2.0 journey should be extra cautious when considering a data warehouse automation tool. Many of the nascent data warehouse automation tools that are specifically built for the Data Vault 2.0 standard offer, I would say, an incomplete set of tools that undermine the agility, the flexibility, and the time to value of these data warehouse projects. We all know of a few tools in the marketplace that live behind the facade of a modern UI and a cloud-native offering. They fail to provide the comprehensive set of features required to not only design, develop, and deploy, but also operate a data vault. I would say a combination of both business and technical requirements must drive the final decision. On the business side, it is important to consider speed, volume, change, and the expansion of your analytics program.
On the technical side, I would say that data volume, data variety, and analytic complexity will all impact your decision. When you look at the business guidelines, the key thing is time to delivery, right? Do your business stakeholders expect fast and frequent delivery of data access, analysis, and business capabilities? Two, what is the project risk? Do your data infrastructure projects experience a high level of risk from poor data quality, lack of source data knowledge, lack of budget, or understaffing? Does the scope continue to change? There are also other factors Shree mentioned earlier; it depends on your organization and company culture. Is your data warehousing team oriented to teamwork and collaboration? Is the IT relationship with the business stakeholders also collaborative? And furthermore, the data management future: are big data, cloud-hosted data, cloud analytics, data science, and AI among the current expectations of your business leaders and data consumers?
Are they on the horizon in the foreseeable future? Just some things that you'd want to consider. Go ahead.
Sorry, I was going to say, I think that's a really good point. These automation platforms, they're not an island, are they? They're connected into a larger architecture, so they've got to fit in with that enterprise view. They've got to support the collaboration and they've got to work with other platforms. So that's really important.
Absolutely. When you take a look at the technical guidelines, the first thing I would consider is the requirements, right? The volatility around the requirements: do you experience frequent changes to requirements, including regular change throughout the development process? Second, is there a backlog of projects? Do you have a long list of projects that are waiting, with new projects continuously being added while others are not being completed? Do you experience competing and conflicting priorities for project funding and staffing? Number three, documentation: is the documentation for your data management processes and databases sparse, dated, and frequently out of sync with implementation and operations? Are the processes and procedures for the operation of your data infrastructure complex, detailed, and time consuming? Are they labor intensive, or fragile when something doesn't work the first time? Also, data infrastructure maintenance: is your data infrastructure maintenance difficult and challenging, or is it dependent upon the knowledge of a few key individuals?
Those are the areas I would definitely take into consideration from the technical guidelines.
Yeah. When you think about it, that's such a huge list of things that people need to consider. It's not just, does it help me automate a bit of code; it's, can I maintain it? Is it going to be supportable in the long run? Is it going to work with all my other tools? Is my team going to be able to keep skilled in it? And the documentation, which is obviously a huge win on the WhereScape side; your documentation capabilities are very well known. I think that's huge. The next bit, I guess: people are probably listening to this thinking, well, how do I actually put together a business case to get my management to approve an investment in this thing? What would be the key elements you'd recommend they focus on when building that business case?
So, when you're building out any business case, you need to understand several items: the project definition, the business requirements, strategic alignment, the benefits, the ROI. Specific to WhereScape, you have Data Vault Express, which automates the design, creation, and operation of enterprise data vaults, enabling our customers to deliver analytics solutions to the business far more quickly, at a much lower cost, and with better success than the do-it-yourself approach. Also, for any business case, the Data Vault 2.0 patterns and best practices are built right into WhereScape's solution, with wizards and templates for ease of use, consistency, and speed. Last but not least, it significantly reduces the time, effort, and cost of learning how to create new data vaults.
Yeah, it's very important. The audience loves hearing stories about where customers have got this right and it's worked well. Does any customer come to mind where you could share a snapshot of what they went through?
Yeah, we have several customers that have gone on this journey with WhereScape. One that comes to mind is a very large health care organization, Aptis Health. They were facing a challenge where they had just over a decade of dramatic growth, both organic and inorganic. Aptis Health found itself saddled with data silos throughout the organization, due to acquired data warehouses and various reporting systems. Their data integration into a very old legacy data warehouse was quite challenging, and as a result the organization did not have centralized data availability or any master data management system. Additionally, very little documentation was available, and the organization struggled to answer any business questions regarding historical data or to run any analysis on historical trends.
What Aptis Health decided to do is they came in and said, we're just going to retire our legacy on-premise data infrastructure and move the data warehouse to the cloud. They selected Snowflake at that time as their data warehousing solution built for the cloud, due to, I would say, its optimization, its innovative platform, and its use of the AWS infrastructure. The company additionally adopted Data Vault 2.0 as their data modeling methodology. They decided to go in that direction to be very responsive to future business and technological change. That's where WhereScape came into play for the automation for Snowflake, and it's helped their IT team fast-track several data infrastructure projects using the product. They now leverage Data Vault Express on Snowflake with the Data Vault 2.0 methodology.
Yeah, wow. I mean, it's great to hear people having success with Data Vault and with automation tools and really getting the benefit. Obviously what Snowflake is doing is really impressive as well. When it all comes together and works, it just solves business problems. So thanks, guys, for sharing your perspective, which is not the traditional let's-dive-into-the-deep-tech around how to build Data Vault, but more, how do we get one started? I think that's a really important thing for this audience to hear. So I really appreciate your insight today.
Obviously, guys, if you've been watching this and you want more information about WhereScape, you can go to the WhereScape website. I know that WhereScape puts out a lot of really good quality content, so definitely do that. You can get in touch through the Service Insight team as well if you'd like.
There's obviously more information on the Data Vault Innovators community site as well. Please head there and make sure you opt in for the notifications so that we can keep communicating with you. If you're watching this on YouTube, then do the like and subscribe thing, and also click the notification bell so you know when the next episode is coming. Guys, thank you so much for your time today. I really appreciate it. We'll definitely be catching up again soon.