Data Warehouse Automation

Well, thank you for joining us today. My name is Simon Mitchell. I've worked in the financial industry for the past 18 years for several different companies ranging from insurance brokers, investment companies, and banks. During this time I've learned a lot of different programming languages and there are a lot of good exciting technologies out there. We live in a world where Excel remains constant. I've developed an indepth knowledge how data warehouse, data integration, reporting, visuals, and analytics. Most recently, I've worked on data strategies and data governance programs. I've been all lone wolf developer, but I've also worked in leading large teams across the globe and now engaging data. I lead a team of highly skilled data consultants where we help clients with a whole range of technical and strategic challenges around data. 

Firstly, I'm currently working with an investment company who are developing their data strategy and a new data warehouse in tandem. Finally, you may have guessed by the picture, I'm also a part time mustache model. Here are a few things that engage in data have as their core principles. Firstly, we approach most of our clients with these principles and it helps us with our thinking and helps with solving problems for them. The first one right at the very top is the people. Having the right knowledge, capability and clear communication with your team is critical. There's a lot of work we do with clients to help that processes is really around how we create standards and governance and controls to make those people and those tools produce value as quickly as possible in the world of data development, which is a rapidly changing one, and now even more so with a COVID pandemic. 


Automated Data Warehousing

We need to have processes that enable teams to adapt and still produce high quality output that is critical to building success. And the final principle is around technology. Using the right technology, using it in the right way to do the right job, and not using the old cliches, picking a sledgehammer to crack a walnut. How does this work inside of Automating data warehouse or data driven reports? My first example is around insurance broker. We were approached to help the trade credit team who had a problem where they had a huge number of people consuming data from the underwriters websites. The UK CEO saw there was issues with this. There's lots of people spending lots of time processing data, all of which was outside of it. He was interested, the CEO was interested in the benefits that aggregating the data could give him and the company, with a combined UK view, being able to speak to underwriters and the market about the trends going on in the market based on all the client data would benefit the existing clients and help to attract new ones. 

Traditionally, sorry, underwriters can only analyze the performance of their own book of business, which may also differ from the way that the market is reacting. Insurance, as an insurance broker who can describe what's going on in the market or provide a better view of the marketplace, would have the potential, the opportunity to create new insurance products, influence insurance rates or expedite claims, which would provide additional benefit to their clients. For those of you who aren't aware of what fake credit is, it's one of the newest types of insurance. When I say newest, it's still quite 30, 40 years old. Being one of the youngest insurance markets in the insurances in the market, you find a lot of companies, underwriters have already developed technology to help them share their own data. There'll be APIs and there'll be plenty of ways of sharing data between brokers and underwriters. 

Also there'd be good web portals that enable data to be shared. That was the start of the problem. That was how all the insurance brokers were collecting the data from the web portal. The end goal was the data warehouse had to have the opportunity to create a highly scalable and reusable data collection process within the data warehouse and help take away some of the strain. This was the reality of what was going on in the office, across all of the offices. This is a picture of a Lego convention in the US where people come together to build wherever they want, like from millions of different bricks or different shapes and sizes. There's no patterns, there's no instructions. It's a free for all. If you imagine it, you can build it. This is the data chaos that we had. The situation with the trade credit insurance broker. 

It turned out that all the brokers were doing very similar things to the children in this picture. They were taking the data, they were crunching it, they were building their own reports using from multiple different files and each client. It took at least a day for each client's report to be created. Plus there were always additional reports on top to add to the time spent crunching data. We started to look at the problem, it actually turned out that in our case, there was at least 90% of the reports that were being built by all the different insurance brokers were the same, regardless of the client, regardless of the office. 90% of the questions were answering was the same questions. There were a few fringe things around the outside, but accidents always happen. The problem being there's always at least one child on that map that wind up Lego up their nose and it's the same thing that was happening in the offices. 

There were some reports that were being built and created at the beginning of the month and then by the end of the month, that same report was being used to answer questions. But the market had changed. There was no health warnings on the data, there was no controls over the quality and the age of the data. Often were using the data in the wrong situation. It was like reading last week's weather report to see what the weather was going to be like this week. However, there are still a few bright sparks on that map in the offices that would be able to create fantastic 80s classic Lego models like the Lego spaceship. We utilize their great ideas along with the expertise of the team to start building some really cool dashboards. So at first we failed. We tried to build using our initial approach was to build using QlikView. 



QlikView offered us a quick, straightforward way of being able to plug into the data and pull the data in from their APIs. However, there were still some gaps there's limitations at the time that prevented us from documenting our ETL, prevented us from documenting our success and failure within the ETL process. Were told by our compliance team at the time this is not suitable, this doesn't create the right rigors around the data, so we want you to try and do something else. Also, the software Clip View didn't allow us to do any kind of data lineage or any other end user documentation. For those of you that have worked with this kind of tool, you'll appreciate how often, how at times consuming all of this and can be very expensive to sit down and build all your data models in one store and then have to do your documentation at the end. 

Everything gets disconnected, everything. You can concentrate on the code, forget to do the documentation, you can do the documentation and it doesn't reflect the code. There's all sorts of different scenarios when things aren't linked together. We ended up breaking down the problem into two. We used Visual Studio to do all the connectivity to the APIs, and that was purely because there was an external company working for the insurance broker who did all the app development and the app development, and their lifecycle was all set up. It was very old school, it was very slow, and documentation was a luxury, but were able to get some of the bits done. Then it all started to fall apart. There were still inconsistencies between data and documentation and weren't getting the right support around the solution. And it was called a failure. That was it. It was done. 



We looked for a different way of doing it. That's where we converted to use WhereScape, because it offered the chance to use all of our existing expertise. Our team of developers could continue to do the stuff they were good at and use a tool that helped to automate some of the mundane things like documentation and allowed them to focus on the skills and things they were good at. Not only did we build the base warehouse with all the documentation, but we managed to get all the error handling, which kept the audit and compliance people extremely happy. The impact of doing this, of changing over and creating or using a tool to automate some of the processes and some of the requirements we had, was quite fantastic, actually. The UK trade credit arm increased its gross written premium in the first year by a million pounds, and that over five years, looked around 20% annual growth. 

They were extremely happy with the way were able to use the data to help grow their business. The UK Arm then started to advertise that they had this lovely analytics. Twins and other countries across Amir also wanted to use the same thing. Because the technology that were using from the underwriters was something that was available in other countries across Amir, we created a conveyor belt where we could automatically turn up to the next country plug into the APIs. Before it, we had an Amir view, not just a UKwide view of what's going on in the insurance market. Each of those countries experienced each of the countries that adopted the analysis saw a similar result, similar return on their investment. The bottom line of the infrastructure, there was an opex improvement. Improving the way and streamlining the insurance brokers, where the insurance brokers created and stored the data, reduced ongoing maintenance by 33% annually. 

Data Warehouse Automation Cost-Savings

The cost savings was just how we stopped saving the Excel files all across the network. That 33% saving of not having Excel files everywhere then contributed to the maintenance cost of the data warehouse. The teams who adopted this approach helped to drive significant growth and enhance customer satisfaction, which led to increased market share across UK and EMEA while promoting the leadership who sponsored this project. An example of how this worked, you may remember back when phones for you went into administration and that impacted affected this insurance broker. What they were able to do with the analytics suite was very quickly look for phones for you within the dashboards, understand all the connected teams and all of the connected clients to that one risk. By 09:00 this, all the teams were coming together, organizing an emergency call and created a plan of action. By 930, their plan of action was how do we get out to our clients, how can we help start processing claims and ensure that our clients are well looked after. 

By 12:00 that day, all of the clients have been contacted and each of the teams were assisting each of the clients. The dashboard acted to join all the brokers together and proactively help clients improving the overall customer service each of the clients received. That's when we got nominated for award for best use of technology. It was fantastic accolade for the team to receive that kind of recognition. Unfortunately, we didn't win it, but you can't win them all. What it did do is proved its worth over the five years. So moving on to automation is king. 


Data Warehouse Developers

It'S not always about having a tool that can automate. Sometimes we need to look at other things that are happening before we can really use automation to its fullest extent. I've recently started looking at a project where I've landed and there's two data warehouses. There's been 3 Data warehouse developers, three project data projects that have helped build these siloed solutions, and an inconsistent development practice. There was no coordination between the developers and everyone was taking their own approach to build a data solution. There was very little documentation because everyone was focused on building the code and documentation can happen afterwards. Also each developer took their own name and conventions and did what they were comfortable with doing. There was also an inconsistent methodology in the way that these solutions were going to be built. It was kind of odd. Some of them were Kimball, some of them were Kimblesk, all of them weren't very good, and that's fine. 

I mean, they were starting to do things, but they just weren't stuck together. And it was quite an expensive setup. However, the data inside of all these solutions were really simple. There was an old application using a DB two database platform which was simple enough for all of the developers to connect into. The way in which all of these solutions had done that was very inconsistent. One day solution was processing data out of it could take a huge amount of strain. The way in which all of the solutions were connected to it was impossible for the support team to understand who was making the biggest impact and slowing down the front end for the business users. The most important thing was that the board had identified the shift in the marketplace and how competitors were using data for things like AI and client portal applications and market analysis that assisted their clients. 

The board backed the overall data strategy and was keen to bring themselves up to par with their competitors. The challenges around this we kind of worked in reverse. We started with how we're going to get the data out to people, how are we going to take these three solutions and actually conform them into one thing? The decision was purely cost based and it looked like a power Bi and Excel world where that would be the digitalization tool and data discovery tools. We wanted to build one single source of truth. Consolidate everything together, bring all these solutions into one thing and the way to do that is to get all the developers to do the same thing. This was the biggest part of the issue. Like Bill Gates highlighted, if you don't get consistency, then all you're doing automation will just highlight how inconsistent you are and make the problem bigger. 

Standardize Data Warehouse Processes

We designed and developed an approach, a development approach for the developers to use that just blended everyone's approach together and met with the culture of the company. We created methodologies so that we could create a consistent output regardless of which developer was building what solution. We also created standards and processes and governance around what we're doing. That's kind of things like naming conventions, let's all use the same naming conventions, let's make sure we use the same release process and we do the same quality control. All of this though, had to fit around the culture of the company, which was quite openminded and quite progressive. Being a small team, they didn't have the luxury of having armies of people to do testing and peer reviews. He just had to do just enough to get it to work. The standards, the code that were trying to develop meant that everyone was doing everything in the same way so that we had less time. 

We could share the work around each of the developers and we could share the knowledge around. Each of the developers didn't have to spend the time learning how the previous developer had built the solution because they already walked in talking the same technical language and they could quickly identify problems and move on. They could still use their expertise in data development. They just spent less time trying to figure out what the last person. We then used WhereScape products to automate all of those things together. There was a lightweight process around the tool that helped us automate all the standards and the processes and the documentation. 


WhereScape 3D

We used WhereScape 3D to discover all of the source systems and help automate the system itself, the relationships of the data, the cardinality of the data, as well as bringing in business terms, so that we ended up creating a rich document of the source system and where the data lived. 

We then continue to use WhereScape 3D, which helps to automate the build of the data warehouse to the end product with full documentation that we could import and we could use bits and pieces from 3D and then publish that to the business so they could understand what the solution did. It took us a bit of time to rebuild small three man squad, two man squad that was originally working on a contractor basis. We were able to change that into a permanent two man development squad that now run the entire organization's platform. We've gone from a contract to a perm model and we've made at least a 70% year on year cost reduction in doing that. These two man squad can cope with everything that's thrown at it. It's now shifted, the company itself is now shifted into where they can do more with fewer people. 

There's lots more data requests being processed by these two developers. While supporting these developers, also supporting the warehouse data is now driving all the decisions for customer services and creating a strategy for fact driven, not gut feel. That's purely because people now trust the data. Because the twoman development guys, they can go in, they can show real value, they can work in an agile way. We're building trust, we're increasing that feel of we can get it right first time. Now business have access to their own data of our Excel. So they've got nice shiny new tools. We've increased adoption of those tools by providing training the relation ships between developers and business stronger than ever. It's not an entirely perfect world, but it's far stronger than it was twelve months ago. There's a whole host of clear and well understood documentation. There might be the occasional spelling mistake in there, but the fact that they have documentation that's readable by the business users because it has all got business terms in it's fantastic. 


Increase Developer Productivity

Like I mentioned, there's a quick turnaround pace now with these developers. Now, more importantly, the company has got a scalable way of doing things with all the standards, with the automation. They can now increase their team, they can decrease their team, they can pass work between developers because there's this clear understanding of what's going to happen, sorry, what the last person has done. Everyone understands the same language. For me, if we all want to build Lego spaceships like this cool one, and I certainly eight year old Simon would, I love to have built that. I think there's some fundamentals that I'd like to offer to you guys. The first one is absolutely understand the value of what you're going to do, what you're going to create. And how will it make it? How will it make a difference? What value are we going to get? 

The three principles are always the way that we approach things. The first one being in people engage with the people, whether it's the business, whether it's your developers, you can have the right tool and the right process in place. If you haven't got people bought in, then the whole thing is not going to work. The process, keep things as simple as possible and focused on the value. Keep it simple. Simon the kiss approach. The moment things get complicated, you increase the risk of the project failing full stop. That more you can do to simplify and take things forward and focus on the end goal. The more success you're likely to have. Technology pick the right tool for the job. Understand the fundamental issue that you're looking to address. Automation has helped me in my career focus on addressing and simplifying the inherent complexity of a difficult task. 


Building a Data Warehouse

Now what's good about it, I want to show you how to actually do it in a tool called WhereScape. Okay, so I've just switched onto the WhereScape 3D screen. I'm going to take the model driven approach to data warehouse design. Today I'm going to model a operational data store type data warehouse from a source system. I'm going to model that, profile it, publish it out into a physical model, and then I'm going to commit that physical model to a snowflake platform and I'm going to build the snowflake tables and all of the necessary code to manage the data in those tables in the 15 minutes. I hope to achieve that probably under that time. I'll take a few questions at the end, but then we have general questions, as Andy has just said at the end of the session. 


Data Model Design

WhereScape 3D pretty much stands for data driven design. In fact, it's actually more model driven design. We're going to work with these connections here. These are source connections, the data we're interested in understanding. Literally this could be the very first time we ever connect this data. All we need is a JDBC or an ODBC connection with read Access to do what I'm about to show you. I'm going to be using a SQL Server source system on premise. I'm going to take that data and I'm going to do the ELT, the extract, the load and the transform to move that data onto the snowflake cloud and have an operational data store in the cloud that we can then use for our end users or for our star schemas or for any other purpose. This trade source system is where I'm going to start. With this layout, you find this workflow here is reactive to whatever you're connected to. 

The workflow is the suggestion of what could happen next, and it tends to be in the right order as well. These buttons will fill up and do this, do this. It tends to suggest the kinds of workflows that we consider to best practice. You don't have to follow those, you don't have to build Odsss. We also build Dead Evolves, we also build Star Schemas, we also build paranormal form and any hybrid approach on any of those. I'm starting here with Traders Source system on discover the connection. I'm going to ask you to document this into my Traders project here. Today I'm going to use the date as my versioning number, sorry. One, two, three. I'm versioning this on the 23rd or the 7th, 2020, and I am going to ask you to look for the DBO schema on that source. Remember, this could be any ODBC or JDB source. 

It could also be flat files, like delimited files, fixed width files, XML files, and to an extent some of the JSON files. We will be covering Avro and parquet in the near future. My filter options here are to say I want to use all the tables, no views at this point, what to include percent being everything and what to exclude from my search. In this particular case, I want to eliminate any of the Sys tables from my view and then that will give me the full discovery that I want. I then say okay to this, and I can now set the discovery running. I'm going to do a full discovery, which is everything, indexes constraints, the lot. Quick is a conceptual model and custom gives you a whole dialogue of checkboxes that you tick and untick what you want to discover and what you don't want to discover. 


Data Modeling

Very quickly there you see a model has appeared in the background and in my overview window here. The second phase of discovery, which you can skip, is to profile the data. I'm going to profile that data now quickly, and this runs a bunch of SQL queries against the actual source data to provide some data quality metrics about that data. I'm going to hit the profile and you'll see here, these are the things that's running against each of the columns as it goes through. Depending on the data type of the column, some metrics will apply and some metrics won't. It won't apply all the metrics, all of the columns, right? So we now have a model. I can use this overview area to see my whole model. We can break the model up into groups if you wanted to, so you can have it in smaller parts because very big models obviously will just fill the screen and it will look something like a blacked off like that in the corner of your screen. 

This particular model, the source model, provides me with SQL Server data types because the SQL Server source and also picked up the names of the attributes and the entities as per the source system. We don't suggest you correct the namings at this stage, although you can. It's more something you do in the transformation layer rather than the load layer, but you can transform a load if you wanted to. Now this is a very good source system because it's provided me with relationships and if I don't like the relationship, for example, this one here, I could flip it around. That doesn't change the source at all, it just changed my documentation to represent how it looks. Also, I can see here that an order ID for order details, an order ID for the order header. It didn't find a relationship here, so the foreign key constraint was not declared. 


Logical Data Model

Now, if there's many situations like this, I could use my foreign key declaration wizard and declare the foreign keys, but I'm not going to, I'm just going to draw in the relationship where it matters. Now, the reason why this source model is important, because this is the blueprint for sorry, let me just tell again, order ID onto order ID. This is the blueprint for the automation. If it knows what the relationships in the source system are, it can work out the joins when we load this data into the data warehouse later on. Okay, so that's my model. I'm going to go with that. If there's any tweaks, we suggest you copy this model into a logical area and then manage the logical model. I'm not going to do that today. In the short time I'm going. To build in 15 minutes. I'm going to go straight to my ODS. 


Data Modeling

Now, at this point, we're going to use a feature of WhereScape 3D called Model conversion rules. These model conversion rules allow me to convert this model to a model of my choosing. It looks for entities, it looks for data types, it looks for attributes. Based on the entities, attributes and data types and other things it finds, it will transform the model to. In this case, I'm going to do a type two ODS model. Down here, you'll see I've got this ODS Traders model. At the moment, I've just got the example model in there. Now I'm going to take this one and using some transformation rules, I'm going to advance copy this to my ODS Traders model. Again, I'm going to use the same naming standard for my version and I'm going to apply some model conversions. Now, these are transformation rules. They transform the model from one type to another case, in this case, I'm going to do a source to red data store and I'm also going to make some commitment towards snowflake. 



At this point, I translate the data types and any column names are non compliant to be snowflake compliant. It's now transforming that source model into an ODS history model, but mapped onto this source. The ELT mapping is maintained at this point between the new model and the source model. Okay, so now we see a different model appearing, the ODS Traders model, 227 23. We can see here the naming standards are applied in the rules is DSS as the prefix. We can also see that the start date, end date, current flag version and update time have been added as the control columns on a type two table. We can see that these DS hiss tables, which are the persistent tables, are getting the data from the stage tables. Down the bottom here in the green and purple area, we'll see the purple stage tables which are getting the data from the load tables. 

We have the load tables which are getting the data from the source, which I just discovered five minutes ago. To see that more graphically, you can use our table trace table source and you can see the lineage of that entity. In this case, I'm sourcing from one source. If there are multiple sources, you'll see them all branching into this flow here. There's the stage table, there's the load table, and there's the original source table on the database that we got it from five minutes ago. This model is now physicalized to an extent towards the snowflake platform. I can still change my mind at this point if I wanted to, and committed to Azure or Teradata or any other platform I choose, I want to continue with the process towards snowflake. This Wescape 3D, it manages us better data on a postgrad SQL database. I'm now going to export this into a Red export model. 


WhereScape RED

This is pretty much like the architect preparing the plans to deliver to the builder. It's like putting the plans into their final representation. At this point, I'm going to be committing them to a snowflake. I take these, this plan and I'm going to prepare for WhereScape Red. WhereScape Red is the builder tool. It's in charge of all the generation, of all the DDL code, the DML code and the data transformation code, and the data workflows. Red is the builder and 3D is the architect. I'm pretty much preparing the final plans for submission to the builder. So prepare for red. I'm going to put it into the Red export category here and I'm going to stick with my naming standard for the version. At this point, I'm going to use the target I've chosen, which is snowflake. Even now I could decide to U turn on that and commit to a different platform. 



I want to commit to the Snowflake platform as per the rule specification. A few things you can tweak on the deployment, on the preparation, and I want to just prepare that. Now, at this point, it's asking me which objects want to go on to which logical schemas. I've got these three logical schemas here, load, Stage and EDW. Load is taking all the Load tables, stage is taking all the Stage tables and EDW is taking the DSS tables. These are set in those rules I just ran, and these are logical representations. When I deploy this to the physical environment, I will need to map these logical onto a physical. If it finds a map by name, it won't ask me. If it doesn't find a match by name, it will ask me to resolve it. Or if it's an ambiguous match, it will ask me to resolve it when it deploys. 


SQL Server to Snowflake

I want to create a new group here today. I'm just going to call it Demo Release. Again, this is a release numbering that you can name according to your naming standards, release naming standards, and I want to include everything I want to deploy in that release. Now, the final thing here is just to check this, I'm going to go from SQL Server to Snowflake. There's the final check on the data types being done and it's now ready to be exported. Now, appearance wise, this model won't have changed much. There'll be a few minor tweaks behind the scenes in terms of getting it ready for snowflake, but it looks pretty much the same as the model I just showed you before. This is now prepared, ready for a snowflake deployment. Now, today I want to deploy into my snowflake environment. Here's my snowflake dashboard. This is the demo Terry Moon I'm using today as the database. 


Database Schemas

And these are the schemas. These first three schemas you see here, which are sorted in alphabetical order. Debbie W dev load and dev stage. That load logical schema is matching onto dev load. The stage logical schema is magic onto dev stage and the EDW logical schema is magic onto DVW. In here I've only got a few things. We provided date dimension at the box that's sitting there. This is just a little sample I loaded earlier to test the load was working. We're going to populate some more stuff into the schema now from WhereScape Red's point of view. So this is the other tool. Now I'm firing up and I want to log on to the Snowflake demo dev. Now this is the metadata repository for my development on Snowflake. I could have a dev proceed, test, proceed, preprod, repositories, prod repositories, any number of different dev repositories. 

I'm going to use this Snowflake demo dev today. This is housed or hosted on a local SQL express, but it also can be hosted on a zoo platform or remote Data Services platform. We can now see here that at the moment I've got just a couple of stage tables. Let's go with the demo one here. I've got just the stage table and the dimension in here. There's no load tables and there's no data store tables in here yet. Okay? Let's just minimize that for a moment and go back to my 3D. Now I'm ready to commit this model across and I'm going to use this one here, which is the same one. Now I'll take this one and map it across. Let's do that. We'll export this one across to where escape Red. This is pretty much putting the plans in the post. In our case, the post is an XML file that represents all the metadata of this model and that's now been deployed across into the XML file. 


Database Deployments

Once it's been put in the post, which has been now I can now use our setup administrator tool to pick that post up and deploy it into my target. I'm going to my applicational folder here. I'm changing to my temp folder where I've just deployed that package to and there's my package sitting there, the traders export with today's date 23 7220 on it. Okay. I'm going to commit that now to my physical Snowflake Red build. I'm going to install it into that same repository I was just showing you. Snowflake demo dev. Okay, that and this is the first deployment of this package. So everything is new. If it wasn't the first deployment, if I was putting a patch release out or change release, then there'd be different, a different summary here. There'll be changes or updates or drop and recreate. We tend to drop and recreate load and state tables, but we just tend to alter persistent tables so we don't destroy any data. 

Metadata Repository

So I'm happy with that term output. I'm going to go here and I want to do a load with the row counts. Here's the situation where one of those logical schemas load is ambiguous. I've got two loads in my target, so it's asking me which of these loads do I want to put it on? It's that load is the physical one I want to commit it to and then off it goes. Now it's deploying the metadata from that XML package into the WhereScape Red metadata repository. 



Once it's deployed the metadata, it starts acting upon that metadata to commit the DDL. It's now creating the objects on my Snowflake. If I was to come over to Snowflake here now and refresh here, we start to see the load tables appear on dev load here. Shortly the stage tables will start to appear on dev stage. If I keep refreshing the screen, we can see the progress here as now putting the stage tables out there. 



If I go back here, we can see it's now finished the stage tables, it's now committing the DS tables to the Snowflake targets. Once it's done the actual DDL and executed that DDL, it will then start to write the DML. It's starting to write the DML scripts after this. Those are the scripts in this case. Because my target is PowerShell, I'm generating, sorry, my target is Snowflake. I'm generating PowerShell scripts. Those PowerShell scripts execute snow SQL or snow pipe commands. Obviously, if I was going to a different platform such as Azure, then we'd be doing PowerShell for Azure and using Transat SQL. If I was going to something like Amazon Redshift, I'll be using Redshift code in either dot bat or PowerShell scripts as well. It's now scripting the load scripts and now scripting the stage scripts and finally it will script the DSS history. 

Here now, if I refresh, I pretty much going to see everything is now populated into these snowflake. Debbie W, dev load and dev stage schemas. All the stuff is pretty much new, just committed now. Okay, so back here in WhereScape Blue, just have a look and see if it's finished. Just doing the DS code now. Here in WhereScape Red, if I refresh, we see that demo release appearing now, that group that I actually created when I exported from 3D, and in here we're seeing the load tables, the stage tables and the data store tables. They have different colors because they're on different schemers and we've set those as different targets. They don't have to be different schemes in the same database, there could be different databases if necessary. Let's just check the progress. There we are, application import complete. That's now a clean bill of health in terms of that. 

This can be kept as part of your release documentation. Send it to a file and you can keep it with your release notes. So that's all good. So let's go back here. We now have a load to extract the data from the source, the stage to convert the data, ready to deployment into my history tables. And my history tables. At the moment, these are just empty tables because nothing's actually executed in terms of code. We've just generated the code. If I was going to load customers here, for example, and look at the properties we'll see here, this is the PowerShell script that you and me would have to have written in order to do this. Now it's been automated. Okay, so very quickly, I'm going to put all of this into a scheduler job. I'm going to put this as a refresh traders ODS. I'm going to give it up to six levels of this could go up to hundreds if you wanted to, depending on how powerful or parallel your server configuration is. 

I want to stick with six on our small build and I'm going to just go with all the stuff in there as we've got it. And okay, that right. We now see a new job here, which is now on our scheduler. If I right click on this job and view the actual dependency diagram, we can see there's the lovely flow that's going to occur when I run this job. All the loads done in parallel, followed by the stages when the dependent load is finished. As it starts to run, we'll see it waits for a scheduler to pick it up. When the scheduler on its polling action picks it up, it will switch to a running state. 


Standardize Data Warehouse Documentation

There we go. And that's running. While it's doing that, I'm going to just go back to the builder here and generate some documentation for this whole thing. I'm going to document it into a self contained HTML set. This can be viewed by anybody with a browser in the world. You don't have to have any landscape software to view it and you don't have to have any license to view it. It's just an extension to the documentation. Try traders. ODS I'm doing today? This here we are in a matter of seconds, the full documentation. And as Simon said, it's all standardized. It's all in sync with the actual metadata, it's in sync with the actual code. You can generate this documentation as frequently, as often as you like. It will always be up to date with the actual metadata. Okay, so once it's generated, you can then view it with any browser. 

All of this can be managed with a style sheet. You can brand this your own way and you can also use your own fonts and colors for headers and text. The simple one is the user documentation. This is the non techy stuff. The thing we have in here is a glossary or data dictionary explaining the fields where we've actually put the commentary in. And that's all good. On the actual demo release here we don't show much in the user documentation, but if I go to technical documentation here and show you the demo release, we can now see the load table. And the load table how? It's actually sourced, where it's sourced from and what extended properties we're using to drive it on snowflake and also which tables it feeds to so which it impacts downstream. If I go to the next table in line here, stage customers, I can see the same thing happening here. 

Stage customers comes from load customers, which came from sequel traders, which was the source we discovered in 3D twenty minutes ago, and then we finished 1520 minutes ago. In data store here, the customer, we can see this is the actual script is currently running. That's the current version of it. All of this code is version control and audit trail. We have here the columns and where it came from and the transformations it went through, the data lineage it was applied to get it to that stage and anything that feeds off it downstream. At the moment, I've not built any downstream, but the next logical thing would be to put a start schema to the right hand side of this. 

We now have a complete set of snowflake terms. We use the put copy to move that data from the onpremise SQL Server to the snowflake cloud. You can see these put copy commands being repeated for each of the tables, the load tables that have been moved across all the way down to the very end here, where you'll start to see when we do the DS tables, eight rows there, 3241 rows there, eight rows there, 8271 rows there. So back to the building here. If I was to go to my DS customers table here and display the data there's, the data now fully into our data warehouse on the ODS, and that was popular in the stage tables. That stays table in the load table. That load table is populated from the source system that I discovered back here in 3D. This source system here that I mapped it through, there's my 20 minutes data warehouse in 20 minutes from a source system built from a SQL Server source system onto a Snowflake cloud. 


ETL Tools

We add more than just the data movements. I know today's demonstration we're showing data movements, but at every step of that stage, you can add your own interpretations around the data. We're more intelligent in things like adding the start, end date, current flag and all those extra columns that are required by the model. If you're building data vault, you need things like record sources and load dates. Rather Control Column, what we call the donkey work that will be required is actually automated into the product as part of the tool that's over and above ETL. Also it provides much more complex, in some ways, transformations, because you're building your transformations on the target platform. We're not taking the data away from the platform, manipulating it, putting it back on the platform. We're keeping the data on the platform transformed in situ, and therefore the operational speed and efficiency is that much better. 

I guess the most complicated bit was speaking to the people, providing the data into us to help us, work with us, for us to be able to connect up and get the data in. There's a lot of contract negotiations that go on. Sometimes the data wasn't free, so that kind of held up a few of the pipelines and things to be plugged in. But generally no. Because once people understood that were going to make life easier, certainly from all the brokers, they're like, yeah, I don't want to be doing this. I want to be talking about insurance, I don't want to be saying a crunching numbers or using Excel. Once we automate all that and put it in place, they save time. They can spend their time doing the things that they are good at. 

Automation Tools

I think also what you find with automation tools is we're not out to replace people, we're just there to make people more productive, what they do. Existing SQL skill sets, existing data warehousing skill sets, existing understanding of Kimball, Dan Linsted and those processes, we're not replacing that. We're just making people those skills already far more productive with those skills. Rather than spend weeks building things, they can build things in days and therefore you get far more success quicker with the same people. 


Becoming a Data Driven Business

Yeah, and we had a question in just a moment ago, which is interesting actually. You seem to allude to the fact that people were quite keen to move to a new way of working once they should see it was easier. But was that initial consultation with them? People are quite resistant to change, after all, especially in traditional industries, I imagine, like insurance. Was that initial consultation with them difficult at all or did they want to listen to people technology kind of thing? 

Your approach, how you do that? Traditionally people turn up with large documents, trying to fill every minutiae detail out to put it into like a waterfall style project. And that doesn't always work. At the time that was an It approach. They were always very waterfall and very slow to deliver. Whereas went in and were more open arms, trying to help and trying to be very honest and transparent about what were there to do. And that helped. That certainly helped, but there was always that resistance. I think it ebbed away over time once the value and when you deliver to one person and they've got a good news story, they tell quite a few people. It doesn't travel as fast as bad news, but it takes a lot longer. That eventually spread its way out. When everyone adopts based on their peers, so if someone else is adopted to a new way of working, then they kind of more open to it.