Data Warehouse Lifecycles

Automate Data Warehouse Lifecycle

Hello, and welcome to today's webinar. We'll get started in just a few moments. Hello, and welcome to today's webinar, hosted by OneBridge and WhereScape, titled Automating and Accelerating Your Data Warehouse Lifecycle: Why OneBridge Chose WhereScape. Before we begin, I will let you know how you can participate in today's event. All attendees have joined in listen-only mode. If you have questions, please enter them into the chat window. This is a 60-minute webinar, which includes a presentation on how OneBridge's evaluation of WhereScape resulted in a new partnership, followed by a product demonstration of WhereScape RED. Our speakers for today are Stephen Estepp, senior data architect for OneBridge; Michael Schmidt, account executive for mid-market and enterprise at WhereScape; and Brian Thomas, solutions architect for WhereScape. A Q&A will follow the product demonstration. We encourage you to make use of the chat window to post your questions, and we'll address as many questions as possible.

This webinar is being recorded, and you'll receive an email copy. Thank you again for attending today's webinar. I'm now delighted to introduce Stephen Estepp from OneBridge.

WhereScape 

Welcome, everyone. My name is Stephen Estepp, and I'm a senior data architect for OneBridge. I built my first database in 1983 and my first data warehouse in 2006. My entire career has been focused on building databases, transferring data, developing complex transformations, and building data warehouses. I recently joined OneBridge, and I'm working on a project to build a client a brand-new data warehouse. Like every other data warehouse, I'm challenged with ingesting multiple data sources across a large number of legal entities. As part of my effort, I'm evaluating different tools to help accelerate development and design. To sort through the noise, OneBridge has a history of evaluating many tools to find which best fits a client's needs. OneBridge has recently partnered with WhereScape, and I jumped at the opportunity to evaluate WhereScape for my project. Historically, my typical data warehouse involves writing database code to extract, load, and transform data.

 

Rarely do I have the opportunity to use ETL tools. The database code I write has the same structure for every table I create and load. The primary differences are the names of the tables and columns and the data types. I usually create a template and then update the code with the current table and column names. I make mistakes. Most of my errors occur because of typos in the names and data types. Also typical: when I deliver the solution, the user has last-minute changes. This involves finding all the affected code segments and making each change from the source down to the target warehouse. As a developer, documentation is usually an afterthought. To be honest, I hate documentation. In my evaluation of WhereScape, they presented the following benefits. I can use drag and drop to select source and target tables to work with; WhereScape reads the source metadata to build the ETL code.

 

ETL 

I don't need to use the default data type sizes recommended by an ETL tool and then update them to match my source's data types. WhereScape can build my ETL code from default templates. These templates are used to create target tables and to load those target tables. WhereScape offers the opportunity to create a custom template, which it will blend with the source metadata to create my own custom code. WhereScape can accelerate development through agile processes: we can add a source column and easily recreate the code to rebuild and reload the target tables. WhereScape says we can build the target tables and queries eight times faster. If our client changes their target database platform, WhereScape can change the code templates to leverage the database language of the new platform. And then WhereScape builds the documentation listing the source tables and columns, the target tables and columns, and the path of the data.
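To make the template idea concrete, here is a rough sketch, with hypothetical table and column names, of the kind of code a default load-table template might generate from source metadata (my own illustration, not WhereScape's actual output):

    -- Target table built from the source table's metadata; the names and
    -- data types come from the source, not from hand-typing.
    CREATE TABLE load_customers (
        customer_id   int         NOT NULL,
        company_name  varchar(40) NULL,
        contact_name  varchar(30) NULL,
        phone         varchar(24) NULL
    );

    -- A simple full refresh ("kill and fill") of the landed data.
    TRUNCATE TABLE load_customers;
    INSERT INTO load_customers (customer_id, company_name, contact_name, phone)
    SELECT customer_id, company_name, contact_name, phone
    FROM   source_db.dbo.customers;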

 

WhereScape RED

Let's discuss what I found when OneBridge evaluated WhereScape RED. As you can see from the screen, WhereScape offers the ability to select the source and target tables and drag and drop them between frame sections of the user interface. You don't connect objects to build the flow; rather, the flow is understood to be from load to stage to dimensions and facts. With each table and column, you can open configuration screens to fine-tune your design. WhereScape is great at reading the source objects and data types and ensuring the target tables use the same metadata. We found the development environment highly configurable. You can define rules for setting target table names, like appending to the name of the source or defaulting the schema. You can also use similar rules for setting the names of the stored procedures used to insert and update the target tables. Some of the development frames can have the color of the objects defaulted.

 

There appear to be more settings we have yet to discover. Adding a source column really is as simple as selecting the column and clicking menu options. These rebuild the code to change your target table definitions and your code to load data. Adding a new source doesn't require much more work than was used in the original design. In fact, WhereScape updates scheduling seamlessly; you don't have to study dependencies to add the new source into the right spots of the schedule. Typical data warehouse design is to use kill-and-fill ETL processes until you determine performance enhancements are needed to load your data. WhereScape offered a modification to the code template to change the data load easily from kill-and-fill to incremental. You can use WhereScape documentation to graphically see and track back the source or sources of a data warehouse dimension table. Note how the target table on the far right shows two source tables on the far left and the steps needed to get the data to the target.
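For readers unfamiliar with the two load patterns, the difference that template change makes is roughly the following, sketched in T-SQL with made-up names (not WhereScape's template output):

    -- Kill and fill: wipe the target and reload everything.
    TRUNCATE TABLE stage_orders;
    INSERT INTO stage_orders (order_id, customer_id, order_date)
    SELECT order_id, customer_id, order_date
    FROM   load_orders;

    -- Incremental: apply only new or changed rows.
    MERGE stage_orders AS tgt
    USING load_orders AS src
       ON tgt.order_id = src.order_id
    WHEN MATCHED AND (tgt.customer_id <> src.customer_id
                   OR tgt.order_date  <> src.order_date) THEN
        UPDATE SET customer_id = src.customer_id,
                   order_date  = src.order_date
    WHEN NOT MATCHED THEN
        INSERT (order_id, customer_id, order_date)
        VALUES (src.order_id, src.customer_id, src.order_date);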

Tracking Source Tables

You can also track a source table forward. Can you back up there? Once again: you can track a source table forward to find all the target tables which use the source. Note the single source on the far left, which is used to set the values of the fact and dimension tables on the far right. These graphics are automatically built and are invaluable in studying changes to your source or in finding the sources of your data warehouse. Okay. As we evaluated WhereScape, we received excellent support for our analysis. Mike routinely set up meetings to review our technical questions and to be sure all our questions were answered. When we had additional questions around best practices and complex data warehouse solutions, Mike scheduled additional meetings with technical experts. I found WhereScape to be not just an ETL tool; I found it to be a tool to help me accelerate the building of my ETL code.

Data Warehouse Tables

This is done by combining templates, default or my own, with the source metadata to build code for creating and loading data warehouse tables. This accelerates completing a data warehouse, and it can also allow me to add a source column to the client's target data warehouse right in front of the end users. Now I'll hand the presentation over to Mike.

Data Warehouse Automation

Thank you, Stephen. My name is Michael Schmidt. My background is in technology software, more specifically database DevOps and now data warehouse automation. I have worked with, and am working with, some of the largest organizations in the US and Canada to help accelerate their development processes and ultimately make their teams more efficient through automation and culture.

WhereScape Data Automation Tools

A bit about WhereScape: the company started as a small consulting shop when data warehousing really started exploding about 20 years ago. At the time they would build out data warehouses for customers. It was very tedious then, and it still is today without automation. So they started developing tools to automate. They were building a data warehouse for a large multinational bank, and the project was done very quickly. The bank asked how they were able to complete the project so quickly; given the size of the team and how big the project was, they shouldn't have been able to.

It just didn't seem doable. The bank mentioned the code looked very standardized and everything looked really good. What's your secret sauce, they asked. WhereScape told them at the time that they had built some software to help automate the repetitive tasks they were doing for every customer. The bank said, fantastic, how much to buy that software? That got us thinking: maybe we should be in the software business. The rest really is history. We have more than 1,000 customers around the globe, representing companies of all sizes: small, mid-market, and enterprise organizations. We have more than 30 customer stories told on wherescape.com that detail project successes, including time and cost savings and process improvement. WhereScape is 20 years old as a company. We've been evangelizing the benefits of data automation for quite some time. Now it's crossing the chasm and becoming mainstream, as companies see the incredible results that digital-native companies achieve with automation and realize that there is a better way to work than hand coding green screens and working weekends.

Data Warehouse Lifecycle

I'll give you a second to take a look at this slide. This should hopefully be a bit familiar. This is what we found to be the conventional way of building out a data warehouse, or the data warehouse lifecycle, if you will. Typically, you'll gather requirements from the business, build out a framework, and then maybe do some type of profiling and modeling. Some companies are doing the model-first approach; some are jumping straight into the architecture. And you have your ETL mapping and developers using tools like SSIS, for example, and pushing things into deployment. Hopefully there's some time to do some documentation and data lineage at the end. You work your way all the way up through this staircase, if you will, and then you present back to the business. Most of the time the business says, fantastic, now that I see what you've built, I have a better idea of what I actually need.

Could you go back and make these changes? Most of the time they expect those changes can be done in a matter of minutes or hours. Typically, the projects they're asking for take a lot longer than that. You start all the way at the bottom, working your way back up again. One of the issues we saw with this conventional way is just the slow speed of delivery. The businesses and executives want to be data driven and make decisions quickly. They need the data, like, yesterday. The IT team says, we're strapped for resources; we don't have enough time and people to get this done at the speed you'd like, and we want to make sure we do this right. The second issue that we see is just messy change management. As you're working your way through the staircase, there's a lot of opportunity, when you're handing off between people or tools, for things to fall through the cracks, or some custom scripting needs to be done.

That's a lot of additional effort, especially when there's no metadata communication between tools or strong documentation between people. Lastly, when we talk to analysts that are following this traditional methodology, the analysts or end users say, hey, we spend upwards of 30 or 40% of our day just trying to map out where the data I'm looking at came from, what's happened to it along the way since it left the source, what kind of transformations were applied, things like that. So that was the staircase. We like to refer to WhereScape as an escalator: you're still taking the same number of steps, but now you're able to do it a whole lot faster. Our secret sauce is that we're built on a robust metadata repository, and we're using best-practice templates to write target-native SQL.

SQL 

If you're on SQL Server, we're writing T-SQL. If you're on Snowflake, we're writing Snowflake SQL. Our tool translates that syntax for you as well. If you're on SQL Server today and you decide Snowflake is nice and you'd like to move in that direction, obviously there are differences in the way SQL is written in Snowflake. Our tool will help translate that for you and does a lot of that work for you. What we see is developers telling us that they're about five to ten times more efficient using RED. We find that because we are metadata driven, it's really easy to iterate, and you can stay agile because changes are now very quick. All of these changes are captured in our metadata, with rollback versions. As Stephen said, in most cases you can make these changes live in front of your end users, instead of starting an entirely new project that would take weeks or months.
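As a small illustration of what target-native SQL means in practice (my own example, not generated output), the same query differs between the two platforms:

    -- SQL Server (T-SQL): TOP, GETDATE(), ISNULL
    SELECT TOP 10 customer_id,
           ISNULL(region, 'N/A') AS region,
           GETDATE() AS loaded_at
    FROM   load_customers;

    -- Snowflake: LIMIT, CURRENT_TIMESTAMP, IFNULL
    SELECT customer_id,
           IFNULL(region, 'N/A') AS region,
           CURRENT_TIMESTAMP AS loaded_at
    FROM   load_customers
    LIMIT  10;

A metadata-driven tool can regenerate the platform-appropriate form of every such statement from templates, rather than asking developers to rewrite each one by hand.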

Automated Documentation

Lastly, because the documentation is automatically generated from the metadata, our analysts report that they're able to save a lot of time, because it's all presented to them: where the data came from, what kind of transformations have been applied. No longer is your team having to worry about going back and inputting additional documentation at the end. Brian will show you how you do that as you build, during the demo. One last slide before I hand it over to Brian Thomas. Among the things people like about our tool, the things that pop up are that we're not a black box. Everything within our tool is very transparent. You can edit the code, you can edit all of the templates, you can see exactly what's happening under the hood.

Data Security and Compliance

The second thing is security. We don't do ETL, we do ELT. We push everything down to your target database to do those transformations.

What that means on our end is that we're not using any proprietary engine or ETL code or anything like that. Everything is captured in our metadata and pushed to SQL Server or Snowflake, for example, for execution. All the data lives within your target database; WhereScape does not hold any of the data itself. All of our security is also passed through from your target database. Whatever kind of authentication you have set up in SQL Server, Snowflake, or whatever database you're using passes through to WhereScape. From a security standpoint, there's no fear around: what does WhereScape do with my data? Are you storing it? Et cetera. That sums up my portion of today's demo.

I'm going to pass it over to Brian Thomas, solutions architect for WhereScape, for a demo of our flagship automation tool, WhereScape RED.

WhereScape RED

Well, thank you, Mike. As he said, my name is Brian Thomas, and I'm a solutions architect for WhereScape, but I've been an Oracle DBA for over 20 years, and I built data warehouses as far back as the 90s. So I've done it before. I understand what Stephen is saying about documentation and typos and all that; I've done all that. Why don't I just get you started with WhereScape RED? I'm going to be showing you our flagship product, WhereScape RED. This is just a diagram that shows how the data would flow in your environment. It shows some of the options that you might have, what your architecture could look like. On the left we have our data sources, right here.

 

Data Sources

Now, we can extract data from pretty much any kind of source: traditional DBMS platforms, XML, JSON files, text files. Really, anything. If you can get an ODBC connection to it, we can use it as a source.

BI Tools

On the far right you have your BI tools, your data visualization stuff: your Tableau, Qlik, Power BI, that stuff. We don't do anything in that area. What we do is the heavy lifting between those two; we go from the red to the red, and we automate that process. We start with extraction from your source. We land the data immediately in your target platform. If your target is Oracle or SQL Server or Snowflake, we land that data there first, into your platform. We make a one-to-one copy of your tables, and we call these the load tables. This is your load layer. The next thing we do is take that data that has landed in your target platform and start applying some rules to it. The rules could be as simple as just joining two tables, or very complex, like a budgeting algorithm that has to go through multiple serial calculations.

The idea is to prepare the data using what we call our stage tables in this staging layer.

Data Lake, Data Vault, Data Store

Once we prepare the data, then we load it into the foundation layer. Now, the foundation layer here is showing the possible options you have. It could be a data store, a data lake, third normal form, or even a data vault. What we're showing here are just examples of what you could do. You don't have to build all of these objects.

Star Schema

Once you get your data into your foundation layer, then you can start building your star schema on top of that. Here you can also apply additional business transformation rules, and you can even build a semantic layer with views and aggregate tables or exports, whatever you want. And that's where we stop. The business users will use their dashboards to connect to the data warehouse and pull their data from that.
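As an illustration of the kind of semantic-layer object he means, a view over the star schema might look like the following (hypothetical names, not part of the demo):

    -- A semantic-layer view that hides the join logic from the BI tool:
    -- business users query sales_summary instead of the fact and dimensions.
    CREATE VIEW sales_summary AS
    SELECT d.calendar_year,
           c.company_name,
           SUM(f.order_value) AS total_order_value,
           SUM(f.quantity)    AS total_quantity
    FROM   fact_orders    AS f
    JOIN   dim_order_date AS d ON d.order_date_key = f.order_date_key
    JOIN   dim_customers  AS c ON c.customer_key   = f.customer_key
    GROUP BY d.calendar_year, c.company_name;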

Metadata

Now, while the developers are building all of these pieces of the data warehouse process, we're capturing all of it in our metadata. As Michael said, this is our secret sauce; this is where a lot of the magic happens. The metadata is how we're able to generate the code, generate dependency diagrams, and do impact analysis, and we also have a self-documenting feature.

Documentation

Like Stephen said, documentation is the worst. We have the ability, at the end of each development cycle, for a developer to push a button and generate that documentation: technical documentation that shows pretty much how the data warehouse is built. We also provide a scheduler. It's a basic orchestration service that manages the flows, jobs, dependencies, email alerts, things like that. From a high-level perspective, we have four components. First, we have the developer interface, which gets installed on a developer machine; that's what I'm going to be showing you for the rest of this.

The second part is the metadata. It resides in a centralized database. The third piece is the scheduler. This is a service that sits in a centralized location, a VM of some sort. You can have the metadata and the scheduler on the same box; it's very lightweight, because it's not actually doing the transformation, it just orchestrates the flow. The fourth piece, of course, is your target platform. Whatever it is, we don't care, as long as that target can handle the load that you put on it. This is the developer interface. On the left-hand side is the object browser. This shows what is in your data warehouse; it's really a browser for the metadata. On the far right is what we call our physical browser. There's a browser for the data warehouse, where you can look at the tables that are in your data warehouse, and a browser for your source system.

Data Tables

You can drill down and look at columns, you can look at row counts, and you can even look at the data if you need to. In the middle is our working pane. This is where we'll build stuff, right here, drag and drop, as Stephen said. At the bottom is our results pane. We are very transparent: everything, all the SQL statements, will show up down there. Indexes created, everything you're doing will show up in the results pane. Why don't I get started and build something? Let's move the data over to your data warehouse. We're going to take this first step, from your source to your target platform, by creating the load tables. You can easily create a load table. I'll just take this customers table, since it's open; let me drag it across. Like Stephen said, I want to type as little as possible, because I'm the guy that spends more time debugging my typos than writing code, I think, most of the time.

Naming Standards

So we have wizards for everything. I wish this tool had been around in the 90s, because I would have definitely used it. We have naming standards: load_ is our default prefix. This is completely customizable to fit your corporate standards. The second thing you'll notice is the target. We handle which target things go to. In this case, I have things going to a stage schema. You can change this; it's completely configurable by you. You can even override it when you're creating something. If I decided, oh, I want to put this in the cloud, I can move this to my stage cloud database.

Data Type Mapping

The last thing is the data type mappings. I believe this was touched on as well as one of our strengths. In this example today, I'm just doing SQL Server to SQL Server on my laptop.

 

Both the data warehouse and the source are on the same platform there. If they're not, it will do the data type mappings for you automatically. Right here we have different options. You can have SQL Server as your data warehouse loading from DB2, and we will do the mapping for you. If you're loading Redshift from SQL Server, it'll do the data type mappings, and the same for Snowflake from Teradata. All of these are available. It's very easy to create a load table with just a few clicks. Again, it shows exactly what we did down here: creating the table, executing it, bringing any comments along from the source system, any extended properties, and truncating the table, that kill-and-fill refresh Stephen was talking about. Now, this could change based on what you want to do; it will change if you're going to something like Snowflake. From a development standpoint, it's all the same: drag, click a few buttons, and add it. So load_customers has now been created.
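To give a flavor of what those automatic mappings do, a DB2 source landing in SQL Server might map along these lines (a sample only; the product ships its own mapping sets):

    -- Illustrative DB2-to-SQL Server type mappings applied to a load table:
    --   DB2 INTEGER       -> int
    --   DB2 VARCHAR(50)   -> varchar(50)
    --   DB2 DECIMAL(12,2) -> decimal(12,2)
    --   DB2 TIMESTAMP     -> datetime2
    CREATE TABLE load_db2_orders (
        order_id    int           NOT NULL,   -- from DB2 INTEGER
        order_ref   varchar(50)   NULL,       -- from DB2 VARCHAR(50)
        order_total decimal(12,2) NULL,       -- from DB2 DECIMAL(12,2)
        created_at  datetime2     NULL        -- from DB2 TIMESTAMP
    );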

Dimensions

Why don't I take that and use it as a basis to create a dimension? I'm just going to skip from the load layer over to the star schema at this point, just to keep you on track with what we're doing. Now I'm going to create a dimension. I'm going to browse my data warehouse. It's easy enough to browse, and I have a filter, because I know tables add up in a data warehouse quickly, so having a filter really helps. I can take this load_customers and create a dimension from it. Again, you see the naming standards, the dim_ prefix, completely customizable by you. Those dimensions belong in the data warehouse, so there's that. We can create a type 1, type 2, type 3, or type 4 dimension. I'm going to go ahead and create a slowly changing dimension because, well, truthfully, we struggled with this when I was doing my first data warehouse.

Data Warehouse Change Tracking

How do we track changes in a data warehouse? It's nice that today, not only do we have standards on how to track changes, but we can automate the creation of these dimensions. The modifications later are just as simple, as I'll show you. This is where you add to the documentation. You can take a few moments to put in a sentence or two, as opposed to spending weeks creating documentation at the end of that staircase. As you're creating each object, you just put a sentence or two in. You can put a description, you can put the grain, you can put the purpose. This is my favorite right here: examples. Put some examples in, and this will cut down on your phone calls from those business users later when they're asking, now, how do I query this dimension? You put examples in, you cut your phone calls in half.

Now I have to build my update procedure. It's going to ask for some information. Like a good developer, I've already talked to the business users, and the business key column they told me was customer ID. Then we look at the change detection. Again, I've talked to my business users, and they said they care about the address fields, company name, and contact name; that's what they really care about. Now we have created this type 2 dimension in just a few clicks. If we display the data, you can see we've added an unknown record for joins, and we've added the type 2 fields: the start date, end date, current flag, and version. All were added automatically for me, as well as indexes and unique constraints, and again, the comments were brought along. Now, as Mike and Stephen said, we have templates. Let's look at what was created via the templates.

Templates

So this is the code. We created 369 lines of code with just a couple of clicks. Truthfully, I never created code that looked this nice when I was building a data warehouse. I was always running around with my hair on fire, trying to meet deadlines, trying to get it going, like I said, with typos, trying to make sure it worked right. This has documentation at the top, inline comments, indentation, color coordination. Everything works right the first time. Down here is the where clause that handles our type 2 changes, including company name through country. All this was created in just a few clicks. Now, what happens in any data warehouse, whether it's an hour later, a month later, or six years later, is that the business users come back and want to make a change. That's the whole reason for data warehouses: they grow along with your business and stay relevant.

Now they've come back and said, oh, we forgot, we really want to track phone number as well. If you were the lucky one who actually built this dimension, you might know where in the code you have to go. But are you really going to remember which indexes and constraints are involved? All of those things you have to go look up and determine. This could take hours for just a simple change like adding phone number. What I'm going to do is just rebuild this. We're going to rebuild the update procedure, come to change detection, and add phone number. They'll probably ask for fax number next week, so I think I'll add that, too. Now we have made the change, and WhereScape was intelligent enough to know: let me drop the constraints, let me drop the indexes, and recreate them.

After the procedure was done, everything was done for me. In fact, let's go look at that code again; we can view the update procedure. Now it's in here, and maybe you ask, what changed? The nice thing is, because we are versioning all of your code, you have the ability to grab the original version of the code. I can look at it and compare it to what's in this code viewer. Now I can see that there were some lines added and deleted. I can scroll down and see that phone number and fax number were added. That's the easiest thing. I can scroll down further and see if there's anything else. Oh yeah, here are some more changes that were made. That's the nice thing: not only is the code there, but when it makes modifications, you quickly find out what it did, what it changed.

These are all done with templates that you can edit yourself; you can make modifications if it doesn't exactly match what you want. It's that easy, as opposed to spending hours and hours just figuring out where the code is.
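For anyone who has never hand-coded a type 2 dimension, the mechanics the template automates look roughly like this. It is a deliberately simplified T-SQL sketch with illustrative column names, far shorter than the 369 generated lines, and it ignores details such as NULL-safe comparisons:

    -- Expire the current row when a tracked attribute changes.
    UPDATE d
    SET    dss_end_date     = GETDATE(),
           dss_current_flag = 'N'
    FROM   dim_customers AS d
    JOIN   stage_customers AS s
      ON   s.customer_id = d.customer_id
    WHERE  d.dss_current_flag = 'Y'
      AND (s.company_name <> d.company_name
        OR s.contact_name <> d.contact_name
        OR s.address      <> d.address);

    -- Then insert a fresh current version for new and changed customers.
    INSERT INTO dim_customers
           (customer_id, company_name, contact_name, address,
            dss_start_date, dss_end_date, dss_current_flag)
    SELECT  s.customer_id, s.company_name, s.contact_name, s.address,
            GETDATE(), '9999-12-31', 'Y'
    FROM    stage_customers AS s
    WHERE   NOT EXISTS (SELECT 1
                        FROM   dim_customers AS d
                        WHERE  d.customer_id = s.customer_id
                          AND  d.dss_current_flag = 'Y');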

Merging Data Tables

Now let me show you a simple transformation, just to show you some additional capabilities of WhereScape. We're going to create a stage table; we skipped that step earlier. Let me go ahead and create a stage table and do a simple transformation for you. We'll join two tables: I'm going to take orders and order details and join them. I'll also add a calculated column; that's how I'll show the transformation. We're going to start with load_orders, and I'm going to use that. Again, the naming standards and location are all done for me; I don't have to remember that.

 

I'm just going to click OK, because I'm going to add some more fields. I'm going to add product ID, unit price, and quantity. Now I've added a few additional fields. At this point WhereScape knows that we're joining two tables. All we've done is update the metadata, but it does know that. We can go in, do an impact analysis, and look at the track-back diagram, and see that we're actually joining from the orders source through the orders and order details load tables into the stage orders table, so the metadata knows what's going on. Now what we need to do is create this table, and then I've got to give it the join; I've got to build that update procedure. I'm going to make this set based. You can use parameters, and this is where maybe you need assistance, deciding, do we want to do an incremental load or a full load?

All of those things are configurable in here. If you are not sure, that's where OneBridge can come in and assist you with how to set things up. Now we're going to quickly join two tables. Here's more intelligence that is built into WhereScape. There are three little dots there; I don't even have to think about it. It says, oh, I think we can join on this particular field, order ID. We can look at the code, and if you need to make modifications, like putting a where clause into the join, we can easily do that. Now that we've built this, I can execute it and display the data. Now we see that we have our fields, our join of those two tables. I could go in and show you the code that was built behind it, just like we did on the customer dimension. Instead, what I'm going to do is go ahead and add a column, because this sometimes is difficult in some tools.

How do you add a column? It's so easy in this tool. What I'm going to do is add order value, and the description: this is going to be price times quantity. We're going to make this money type, numeric and additive. And then I'm going to add the transformation. You can go in and paste in whatever you want here, even complex transformations. I'm the kind of guy, like I said, who likes to type as little as possible, so I'm going to go in and look at the functions I have available. I can do things like uppercase, trim, whatever functions; these are SQL Server functions. If you're using Oracle or Teradata, you can just as easily use their built-in functions here. Let me show you the exciting part. I'm going to say price, then type space, star, space, then quantity. Price times quantity.

That's it. Now I know I haven't made a mistake in those three characters. Now that we've updated the procedure, I just need to recreate the table. I'm going to regenerate the update procedure and then execute it. Then we'll display the data. At this point we have our order value here; order value and all the fields that we need are all done. That was a very simple change to add another column. You can see that WhereScape is dealing with all those little mundane tasks associated with this change that you as a developer would otherwise have to do yourself. If you aggregate all of these little productivity boosts across an entire project, you can see where we get that eight-times gain that Stephen was talking about. Somewhere between five and ten times is the typical leverage you see from using our tool.
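Pieced together, the stage-table logic he just built amounts to something like the following approximation (hypothetical names, not the exact generated procedure):

    -- Join the two landed tables and compute the derived column.
    TRUNCATE TABLE stage_orders;
    INSERT INTO stage_orders
           (order_id, customer_id, order_date, product_id,
            unit_price, quantity, order_value)
    SELECT  o.order_id, o.customer_id, o.order_date, d.product_id,
            d.unit_price, d.quantity,
            d.unit_price * d.quantity AS order_value  -- price times quantity
    FROM    load_orders        AS o
    JOIN    load_order_details AS d
      ON    d.order_id = o.order_id;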

So now I have the data... Brian?

 

Surrogate Keys

Okay, well, that's a great question. I'm going to show surrogate keys in just a minute, and that will hopefully answer your question. We do have the ability to do merges and all those things you asked about; it's all built into the tool. And the nice thing is we're extensible. If there's something it doesn't do out of the box, we can help you with building the templates to do that. Or if not us, OneBridge can assist, especially with something like that. I'll get to the end, where we do scheduling, where you can add in additional scripts if something doesn't fit, but still keep automation across the entire process. The next step in building a star schema is to load the fact table. What we have to do first is look at the other dimensions that are here.

We've got product ID, so we probably want to build a product dimension. There's order date, required date, and ship date, so we probably need some date dimensions. Now, we do come with a date dimension, and I have already pre-built some dimensional views on top of that date dimension for order date, required date, and ship date, and I built the products dimension as well, so that we can get through this quicker. The key to all of this is having a design of some kind, and that's where OneBridge can come in as well and help you design your star schema so that you really are meeting the needs of the users. Why don't we go ahead and do the lookups for those surrogate keys? We recommend doing that in the stage table as well. I'm going to show you how easy that is.

Display the columns. What I'm going to do is grab my dimensions. Again, filters are your friend, so I can come over and, instead of grabbing the entire dimension, it knows I only want the surrogate key in this stage table. So it added the customer key. I'm going to add the product surrogate key, and I'll add order date as well as ship date. Now I've got the four surrogate keys added. We'll need to recreate this table one more time. This time I'm going to do nothing further yet, because I've got to give it that lookup logic. Now what we need to do is build the update code. I'm going to rebuild this set based; we haven't changed the parameters, and we haven't changed the join, really. Now it comes up and says, all right, how do I do these lookups?

So it attempts to match up; this is where some of the intelligence comes in. It attempts to match up between the stage table and the dimension. It matches customer ID with customer ID. Oh, and I forgot that was a type 2 dimension. I probably would have messed up writing that code the first time, but WhereScape knows, hey, that's a type 2 dimension, you need to give me a date. The business had told me order date, so I can do that. Product ID looks fine, but if you wanted to use a different field, I could easily swap it out. Order date, ship date, all of that. Now I just need to execute this. Now we have our data in our stage table, including our surrogate keys. Next I'm going to use this stage table as a basis to build my fact table.
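For reference, the lookup logic being generated here typically resembles the following hedged sketch (illustrative names; the actual generated procedure differs):

    -- Resolve surrogate keys: a plain business-key join for the product
    -- dimension, and a business key plus date-range join for the type 2
    -- customer dimension. Unmatched rows fall back to key 0, the
    -- "unknown" record added when the dimension was built.
    UPDATE s
    SET    customer_key = ISNULL(c.customer_key, 0),
           product_key  = ISNULL(p.product_key, 0)
    FROM   stage_orders AS s
    LEFT JOIN dim_customers AS c
           ON  c.customer_id = s.customer_id
           AND s.order_date >= c.dss_start_date
           AND s.order_date <  c.dss_end_date   -- version current at order time
    LEFT JOIN dim_products AS p
           ON  p.product_id = s.product_id;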

Now we're going to get into the nitty gritty of the fact table. I can switch the filter back to stage tables. What I'm going to do is take the stage table and use that. Now it applies the naming standards, fact_orders, and it goes in the data warehouse because it's a fact table. I like that. I'm not going to add any additional things right now, because I need to build out my fact table. I'm going to grab those lookups, those surrogate keys, grab order date and ship date, and now let's add some measures. I'm going to add unit price as a measure, quantity as a measure, and order value, that calculated field; I'll add that as a measure as well. Right now the fact table exists in RED only. We can go back to that track-back diagram, and I can see now that we go from source to load table to stage table to my fact table.

Now we just need to execute this and recreate it one more time. I'm going to recreate my table and build this. Now, when it comes to building the fact table, there are some additional things that you need to know ahead of time. Again, this is where it helps if you understand how a fact table works. If you do, great; if you don't, you can bring in a company like OneBridge that can assist with designing your data warehouse. I told it customer and order date are what make a row unique. Now I can execute it and display the data. Now I have a very simple fact table. Let's look at that track-back diagram one more time, the impact track-back diagram; in fact, I'm going to go ahead and add the joins in. So this is what we did: we created one dimension, a type 2 dimension, and we went through a couple of iterations of that dim_customers table, changing the type 2 logic.

We created the indexes, we versioned the code, we joined two tables and created a calculated column, we did all the lookups, and we loaded the fact table with all the surrogate keys. Everything is documented and versioned, and the indexes were created; you didn't see me specify any indexes, but they were all created behind the scenes. Basically, this data warehouse I built in 20 minutes is pretty close to what I once spent six months building with MicroStrategy, even with an ETL tool. Of course, WhereScape only handles the automation and building. You still have to have your design, and you have to have your process in place: how often do you want to update it, all those kinds of questions, plus installing and setting this up, deciding where to put your metadata. That's where OneBridge can come in and really help you in that process.

Let me show you: we built it, now let's automate it. I'm going to create a job for this. Again, you have a bunch of fields to fill out; if you don't understand them, get some help, but it's fairly simple. Do you want to run it daily, weekly, monthly? Do you want to run it at a certain time? Do you want it multi-threaded? Do you want a dependent job that has to happen first? You can set up all of these things. How long do you want to retain the logs? Do you want to send out success emails, failure emails? All of this is available on the screen. We show the actual jobs, and I look at this and go, well, I didn't quite get it right; I'd like to load all the load tables at the same time.

I'm going to join those together into a group. Now you can see the 10s will run together, then the 20s, and then the 30s and 40s. Now, if I come over here to our built-in scheduler, we can look at the job, and I can easily start it. Now it's running, and we can look at it. While it is loading the database, I'm going to go ahead and show you how we can create that documentation, because to me that was really one of the wow factors, one of the reasons why I got involved with WhereScape to begin with. I'm going to go in here, and if you'll notice, there were different projects listed over here in our builder tab, like the sales subject area. You can take that and build your documentation just for your sprint. If these were named Sprint One, Sprint Two,

 

Sprint Three, you can do that. You can actually deploy just the sprint. But let me show you the documentation. We just come in here; like I said, at the end of each sprint, you can just push a button and create that documentation. It creates HTML files. You can replace the cascading style sheets with your logos, that kind of thing. That's really it: you just push a button and it creates this documentation. Now we can look at the documentation in a browser, or I can look at it right here in the application. What we've done is create two different types of documentation. It has taken all those little steps along the way, where we were clicking and building things and update procedures, and stored all of that in the metadata. At the end, it takes all that metadata and creates this wonderful documentation for us. I didn't type very much at all in this whole process, and yet we have all of these descriptions of what was happening.

We can look at the fact table; I can look at this one. This fact table has a description, purpose, grain, and concept. For this table right here, we didn't enter anything, yet it shows whether each column is a measure or an attribute, along with the comments that were pulled from the source system. It's got a very nice little star schema diagram here. This is my favorite: the examples. These are examples of how to run queries against it. This will save you phone calls from users that interrupt your process when you're on to the next project, on to the next sprint. And then we also build a glossary. We take all of the columns and all the tables and turn them into a glossary for your data governance. Maybe one of the business users goes, hey, birthday, I saw that somewhere. Maybe we can use that to send out a coupon for our customers' birthdays so they can buy our product at a discount.

They go into the glossary and see, well, birthday, that's for employees, not customers. That would either trigger a modification or maybe they'll go about it some other way. So that's what I have. Now, do we have any other questions?

Integrating with Data Catalogs

Yes, I do have a few. Okay, the first one is: is there any integration with data catalogs such as Collibra?

We don't have direct integration with it, but we have a metadata interface: you can take an export from Collibra and import it into the metadata that turns into this documentation. There might be other ways to do it, but that's what I've seen so far. Again, that's a great question for OneBridge to help you with; I'm sure they can assist with that integration.

And then one more: we have a situation where our system has a start date and a discharge date, in a slowly changing dimension table.

That's easy enough to do; we can handle that. We're getting into more specific questions than I probably have time to answer right now, but it's definitely something you can do with a slowly changing dimension; I just did a very simplistic one. Why don't I go ahead and show you how to deploy this to test and production? That's one of the next things: now that you've built it in your development platform, you need to deploy it to production. All these changes I've been making on the fly are happening in development, in my development desktop database. If we're going to deploy this to production, we can easily pick either all of the objects or just this sprint, and I can build an application for this. This application is going to be a list of files containing the metadata. You can add the different things you need in there and tell it which objects you want.

Maybe one of the objects needs to be changed from the all-objects list; we can go ahead and add that as well over there, and it gets added to the list. If you need to delete any objects, you can do that too. What it does is package this up into some files in this location. This is configurable; this is just the default I have set right now. You can go into our admin tool and execute that: it takes this package, takes all of the metadata that you see over here, and imports it into your test or production set of metadata. You can then execute the job to populate it at your preferred time. Okay, so I don't have anything more.