Data Infrastructure in the Cloud

Moving to the Cloud

All right, let's go ahead and get started for today's webinar. My name is Armand Petrosian and I'll be today's moderator and speaker.

First, I'd love to welcome Claudia Imhoff. Claudia is a thought leader, visionary, and practitioner. She's an internationally recognized expert on analytics, business intelligence, and the architectures to support these initiatives. Dr. Imhoff has co-authored five books on these subjects and has written more than 150 articles for technical and business magazines. Claudia is the president of Intelligent Solutions, Inc. and the founder of the Boulder BI Brain Trust, a consortium of internationally recognized independent analysts and experts. You can follow them on Twitter at #BBBT or become a subscriber at www.bbbt.us. We certainly want to thank you, Claudia, for being here, and I will now pass it off to you so we can begin the presentation.

All right, thank you very much. I appreciate it. Let me go ahead and get rid of that because you don't need that there. All right, thanks everybody. First of all, many thanks to WhereScape for giving me the ability to speak to all of you today. I'm really quite excited about this topic. It's going to be a combination, as you can see from my agenda. I'm going to first talk about the challenges and the benefits of cloud computing. Then I'm going to get into some of the myths. There are many myths out there about cloud computing, and I really want to cover some of those so that we understand exactly what's going on there. Then, of course, because it is WhereScape, we have to talk about automation. Certainly automation is something that is desperately needed as we go to the cloud. Finally, just a summary slide, a couple of slides from me.

Cloud Computing

All right, so let's get started with... there we go. Okay, let me get started with my slide. The year was 2010. I love this quote. It just shows that things are not exactly what they appear to be. "There's no way that company exists in a year." That was Tom Siebel, Siebel CRM Systems, and he was speaking about Salesforce.com. Okay, that was seven years ago. Salesforce today is the world's leading enterprise CRM cloud ecosystem. It is a $10 billion plus company. Siebel, well, I'm sorry, but Siebel was bought by Oracle in 2010 and that pretty much did it. Certainly still a player in the field, but by all means, Salesforce has taken over, hasn't it? So, so much for that. Let's talk about the cloud. Cloud computing is certainly here to stay, but it didn't have a terrific start. In fact, it was pretty rocky. People were very nervous about letting go of their data.

Moving to the Cloud

They were worried about its security in this far-flung space, wherever it was. Back in those days, the costs were unpredictable. People were trying to figure out how to price the cloud and so forth. It became kind of a problem. Today, though, boy oh boy, the cloud is no longer a novelty, is it? It is viable, to say the least. It is a massive business and it is a very trusted solution. The good news is that cloud computing has become a critical part of how companies deal with their data, all of their data analytics, their solutions and so forth. They are rapidly moving to the cloud. The other thing that's kind of interesting is that, boy, everybody jumped on the bandwagon, didn't they? There are many cloud solutions today. Google, Amazon, IBM, you name it. Even Oracle. Yes indeed, the best ones were developed specifically for cloud deployment.

They weren't an afterthought; they were made for this environment. The solutions, though, and this is something I'm going to focus on throughout my presentation, were not simply a forklift of the data and the technology to a new platform, or if they were, they were not very effective. There certainly are still some worries, no doubt about it. Worries about what data to deploy to the cloud, what types of data should go into the cloud, and what must I keep on premises. Certainly GDPR and other security concerns play a big role in that area. Each organization has to weigh its specific needs. What are the strengths and the challenges of each cloud vendor, and what kind of data is appropriate for cloud deployment? However, and again this is something I'm going to stress throughout, the basic tenets for analytics must also be understood and supported.

They didn't go away just because cloud deployment became a possibility. So more on that. This was a survey from Forbes magazine. The link is down at the bottom. Analytic architectures, they're in the midst of dramatic changes; that was a quote from this article. 78%, wow. Four out of five survey respondents are planning to increase their use of the cloud for BI and data management in the next twelve months. This was a 2017 article, so this is very recent. Almost half prefer public cloud platforms for BI, analytics, and data management deployments, which I found kind of interesting. Power users are the ones screaming for it. They're moving to the cloud. They drive analytics usage itself, the more complex use cases and so on. Advanced reporting, advanced dashboard creation, data scientists, for example, are really heavily using the cloud. The main reasons for adopting cloud BI and analytics differ by size of the company.

Cost seems to be the most important, especially to the mid sized companies. Over half of the respondents said cost or the reduction in cost is why we're moving to the cloud. It's an interesting article. Take a look at it.

Cloud Computing Challenges

Let's talk about some of the challenges.

Data Governance

Certainly IT governance and control is still something that is quite worrisome. Without a doubt, IT is still ultimately responsible for the security and privacy policies and for making sure that they are upheld. Therefore IT departments are still kind of leery of letting go of their data. It's not just about the loss of jobs, but also the loss of control over the data. These rank pretty high in IT areas.

Service Level Agreements

Some of the things to look for: make sure that these assets are implemented and used according to agreed-upon policies and procedures. Service level agreements become very important now between the customer and the cloud provider.

They're critical to making sure that the acceptable standards, policies, and procedures are being used. You also have to make sure that these assets are properly controlled and maintained within the cloud infrastructure itself. Finally, determine whether the assets are supporting the company's strategies and business goals. The customer's IT personnel themselves may want some kind of insight into how the data is secured, stored, used and so forth. Make sure that you understand what your roles and responsibilities are when moving to the cloud.

Data Workflows

The next challenge is the changes to the IT workflows. Certainly this is a much more complex environment. If half of your data is on premises and some of it is in the cloud, then obviously compliance and other things, security as well, become much more complicated. You have to make sure you understand where your data is and so forth.

Regulatory Compliance

How does the company determine whether it is in compliance with the regulatory requirements when some of the data is in the cloud and some is on premises? To add to the complexity, a lot of companies are using more than one cloud provider. It could be a mix of private cloud and public cloud. That just adds a little more complexity to this whole environment and to the coordination with on-premises data assets. Let's face it, there are a lot of analysts, especially those data scientists, who want to combine data. Some of the data may be in a data warehouse on premises and some of it may be in a data warehouse in the cloud. How do they get access to these two very different deployments and bring it together to do their analysis? So, always a problem for the data scientists. Next, managing the costs. As I mentioned, at first costs were kind of all over the place.

Cloud Computing Costs

Cloud providers may have different cost plans for their offerings. For example, some may charge by data volume, others by the number of users using the data, and others by the cluster size itself. We need to make sure that we understand the costing mechanisms there. If we have a low number of users but a lot of data, maybe we want to go with the "charge me by the number of users" plan. If we have a small amount of data and massive numbers of users, well, maybe we're going to go the other way around. The on-demand and scalable nature of cloud computing can also make costs difficult to predict. When you scale up, what's it going to cost? When you scale down, do you get the benefits of reducing the size and so forth? Final warning here: watch out for some hidden costs.
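To make that trade-off concrete, here is a minimal sketch in Python that compares the three charging styles just mentioned for a given workload. The rates, the `Workload` fields, and the function names are all hypothetical, invented purely for illustration; no real provider's price list is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    data_tb: float      # stored data volume, in terabytes
    users: int          # number of users querying the data
    cluster_nodes: int  # cluster size needed to serve the workload


# Hypothetical monthly rates, made up for this example only.
RATE_PER_TB = 25.0
RATE_PER_USER = 40.0
RATE_PER_NODE = 600.0


def monthly_costs(w: Workload) -> dict:
    """Monthly cost of the same workload under each charging style."""
    return {
        "by_data_volume": w.data_tb * RATE_PER_TB,
        "by_user": w.users * RATE_PER_USER,
        "by_cluster_size": w.cluster_nodes * RATE_PER_NODE,
    }


def cheapest_plan(w: Workload) -> str:
    """Name of the charging style with the lowest monthly cost."""
    costs = monthly_costs(w)
    return min(costs, key=costs.get)


# A lot of data, few users: per-user charging comes out cheapest.
print(cheapest_plan(Workload(data_tb=200, users=10, cluster_nodes=8)))
# Little data, many users: per-volume charging comes out cheapest.
print(cheapest_plan(Workload(data_tb=2, users=500, cluster_nodes=4)))
```

With real quotes in place of the made-up rates, the same comparison can be rerun whenever the data volume or the user count changes.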

Cloud Performance

Hidden costs include things like customizations or database changes and so forth. The next challenge, of course, is performance. There are two parts to performance. There's your internet connectivity up to the cloud and, oh dear God, what happens if the provider goes down? When the provider goes down, you go down, and all you can do is wait. Of course your own internet bandwidth has a significant impact on performance as well. So think through those things too.

Benefits of Cloud Computing

Lower Operating Costs

The first one, of course, comes to everyone's mind, and that's the lowered operating costs. It is the difference between capital expenses and operating expenses. Basically what you're doing is renting the infrastructure rather than building and maintaining your own version. The system and equipment costs belong to the cloud provider, not to you. That includes the cost of upgrades, new hardware, software, all that kind of stuff.

Your contract needs to cover all the common additional charges like upgrades, expanded storage, customization, and new users. The personnel and the energy costs also belong to the provider, not to you. You're renting the environment at a much cheaper price per month than you would pay if you had to build it up front yourselves.


Another benefit, the construction and maintenance. Oh boy, it's off your backs. Isn't that nice? That's a great benefit. You don't have to worry about the versions, the upgrades and so forth. Bringing in new storage, that's somebody else's problem. You and your company can focus on the usage of the data, not on managing the infrastructure to store it. Very different environment, tremendous benefit.

Ease of use

The next one: ease and agility. People can get a cloud implementation with something as simple as the swipe of their credit card. It's easy to get started. It can take minutes, not months, because the infrastructure is already set up.

All you do is add your data in a sane and rational fashion. All the complexity of configuring and tuning the software, doing all the provisioning of the assets, all of that is taken care of by someone else. It's been configured and tuned for specific purposes. It's not your problem to ensure that.


The next one is the innovation that's going on in the cloud. Boy oh boy. Innovation and performance with new cloud technologies, it's just exploding. It really is. Cloud computing provides businesses with the latest innovations, the latest technologies. There are a lot of very innovative companies like Snowflake, one of the companies that WhereScape has partnered with. Snowflake is a great example of this. It was born and bred for the cloud, and it is a very innovative new technology specifically for analytics.


The next one is scalability, the ability to scale up and down based on current needs. It's very important to remember the down part.

To me, it's one of the most appealing attributes of cloud computing. I don't have to pay for something that I'm not using. I don't have to buy extra computing capacity just in case I may need it later. I don't have to plan well in advance for scaling up. Of course, if I have to scale up for a particularly complex set of analytics, or analyses, then when those are finished, let me scale back down again. Again, scale up, scale down, on demand, on the fly, and hopefully without any disruption.
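The arithmetic behind "I don't have to pay for something that I'm not using" can be sketched in a few lines of Python. This assumes a simple per-node-hour price, which is a made-up billing model and rate for illustration, not any provider's actual terms, and a hypothetical day with a short analytics burst.

```python
def fixed_capacity_cost(hourly_demand, rate_per_node_hour):
    """Cost when capacity is sized for the peak and paid for every hour."""
    return max(hourly_demand) * len(hourly_demand) * rate_per_node_hour


def on_demand_cost(hourly_demand, rate_per_node_hour):
    """Cost when capacity follows demand, scaling up and back down."""
    return sum(hourly_demand) * rate_per_node_hour


# Hypothetical day: 2 nodes for 20 hours, then a 4-hour burst needing 10 nodes.
demand = [2] * 20 + [10] * 4
rate = 0.5  # assumed price per node-hour

print(fixed_capacity_cost(demand, rate))  # 10 nodes * 24 hours * 0.5 = 120.0
print(on_demand_cost(demand, rate))       # (2*20 + 10*4) node-hours * 0.5 = 40.0
```

The gap between the two numbers is the "down part": it only shows up if capacity really is released once the burst is over.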

Data Analytics

Finally, the right cloud provider gives you the ability to handle the diversity of data for all analytics. The cloud data warehouse should let us bring in, yes, the traditional structured data from our operational systems and so forth, but also that wonderful source of data, that variably structured stuff that's usually external to our organizations, the Internet of Things, sensor data, social media, all of that weird data that's out there.

I want to be able to deal with that. I want to be able to analyze it. Again, my data warehouse in the cloud, I hope it supports both fixed schemas for the operational data, the more structured data, and dynamic schemas so that I can do data science, I can do predictive analytics, I can do whatever I want. Okay, so that's the first part. And do submit your questions. By the way, I do hope you have questions; that shows that you're paying attention. Please do put in any question that you have. If I say something you didn't understand, or even if you didn't agree with it, by all means, go ahead and put your question in.

Cloud Computing Myths

Let's talk about dispelling the myths of moving to the cloud, though. This is an important part, so pay attention to this, please. First myth, and I have heard this.

Believe me, I've heard it. "Just throw the data into the cloud and start analyzing it." Oh my God. Really? You think it's magically going to be understood and usable? That is just totally bonkers. No, of course not. You don't make a data dump or a data swamp of your cloud implementation. That's simply a waste of money and time and effort. An analytics environment is a planned environment. It's not a chaotic set of data. To be usable by everybody, we have to have the architecture and the data schemas, and they have to be understood. It can't be just some chaotic dump of data. The manipulation of the data, what we did to it to get it into the cloud, what the integration processes were, what data quality processes, renaming and so forth were applied: these have to be documented and understood. The very definitions of the data have to be understood.

Of course, the lineage has to be made available so that I, as an analyst, trust the data and realize that I'm using the right set of data for this purpose. It is made for my purposes exactly.

Number two: "Just forklift your data warehouse and pop it into the cloud. You don't need to redesign anything." Oh, please. Again, really? This is your chance to blow the dust off. Many data warehouses are years old and they've accumulated barnacles over the years. They have things that we thought we needed. But you know what? We really don't. Let's make sure we understand what's going to go on here. For example, inefficient processes, wasted space from unused assets, like reports that are no longer being used, visualizations that are no longer necessary. These take up space, and they take time to create. That means a lot of the workspace is being used for things that aren't useful, even excess workspace for users who aren't actually using the environment.

They cost you money. Automating a broken process? Please. It does no one any good. Take the time, reassess what the real requirements are. Streamline this environment. Make it efficient and effective. Make sure you get rid of the things that aren't being used anymore. Make sure that the workflows still work, or change them to be more efficient and effective. And then there's the third myth: "Just by changing to the cloud, we will be more productive." Well, ultimately the answer is yes, you probably will be. But you have to realize that things change when you go to the cloud. These platform gains don't equate to productivity gains if you continue to develop and operate using the same outdated types of means and methods. Deploying to the cloud will most likely change your processes and your methodology for your analytics environment. Learning these new processes and methodologies can be frustrating, and it will take some time.

Take that time to retool, learn the new methodologies and so forth, and move on. And then you become much more productive. Okay, let me start winding down so that we can get to the Q and A part. And again, I expect questions. Please do submit your questions.

Benefits of Automation

Let me talk about why I think automation is needed, absolutely critical, in this environment. First off, hybrid environments are complex. They are by nature complex. They have to have consistency, let me put it that way. Consistency of design, development, and control. In other words, if I have to combine data from my on-premises data assets, like my data warehouse or my data lake or whatever data marts you have on premises, with data that's in the cloud, I need consistency between those two environments. If I don't have that, then I have a chaotic environment. And let's face it, the projects for an analytics environment turn into a program.


The analytics environments consist of multiple, hopefully connected and coordinated projects that build upon the previous projects, designs, and knowledge. Well, if we don't have those designs and knowledge available, we've got a problem right off the bat. So, automation supports the program mentality best. It documents the standards, the configurations, the ETL practices, the nomenclature, everything that you're using, so that you can reuse it over and over again as you go from project to project. The second one is the support for the migration to the cloud itself. That may involve a massive amount of data. If it does, automation is your friend here because it will ensure that your migration mechanisms preserve all of those structures and all of the integrity of the data so that you don't lose any of it as you move to the cloud. Again, automation is a tremendous support mechanism for that requirement. The third one is support for ever-changing requirements.

If we've ever done anything with an analytics environment, the one thing we all know is that the analytics environment is going to change. One thing that is guaranteed for implementers is that the environment will change. And that's a good thing. That is a terrific thing. It means that people are using it. It's a sign of a healthy analytics environment when change comes in. Change requires agility and highly productive developers. You have to be ready for that. That's where automation makes these implementers so much more effective, so much more efficient. It ensures that agility for rapid prototyping and rapid deployment.


And then the next one: documentation. Oh boy. Oh boy. I know. Documentation. Nobody likes it. It's always the last thing you think about and it's probably the most critical thing you can be thinking about. But nobody wants to do it. You have to have the documentation because this is a program environment.

Documentation becomes critical to the overall environment's maintenance and sustainability. Developers, let's face it, developers come and go in projects. Data modelers come and go, data architects too. Why they did what they did has to remain. And that's the documentation. That's also the beauty of automation.

With something like WhereScape, the documentation is automatically created and maintained. It is not something that you even have to think about. What you put into the technology stays in the technology, painlessly. It certainly leads to a much better environment, one that is far more sustainable and maintainable over the long haul, which is what our program is all about.


All right, let's talk about two more reasons why I think automation is a good thing. Typically, organizations that adopt a cloud-first strategy are also business cultures that are at the front edge, right? They demand much faster ROI (return on investment). They want agility.

They want it up and running quickly. They swiped their credit card. They expect it to begin working as quickly as that swipe of the credit card. That cultural shift demands retooling and accelerated development. It means agility, doesn't it? Agility, high demand for high quality, and of course consistency and reliability. There's also the agility that comes from using something like an automation tool. That means that these demands get met when they should be. It translates into agility and also innovation.

Reduced costs

Finally, there's that cost consideration as well. Again, one more time: more effective and efficient development. That means reduced costs, increased ROI, everybody's happy. Automation obviously makes everyone more effective and more efficient, and that reduces the overall costs. It means faster deployments, faster prototypes. Get something up and running quickly, make sure it's what you want, and be able to, of course, change it as needed. Hopefully that's given you some idea of why I think automation is probably a mandatory part of your toolkit.


All right, that's it for me. I'm going to wrap it up with a summary now. Please again submit your questions because I think we've got a couple of really good people from WhereScape on the line and they're going to be here to answer questions about what they do. If you have questions about automation, by all means go ahead and submit them. If you have questions about what I've been talking about, problems, benefits, moving to the cloud, anything like that, please do submit your questions. This is pretty much your last chance here. All right, so let's summarize what I talked about. Analytic environments are much more complicated than they were in the past. We have multiple deployment methods. We can deploy on premises. Of course we still can. But we also have multiple deployments in clouds. We can go private cloud, public cloud, multiple public or private clouds.

That causes problems. It creates a hybrid environment that is not going to be going away. We will always have some things on premises and some things in the cloud. We also have multiple analytic requirements. We now have to support traditional production analytics, things like key performance indicators, traditional analytics that run out of our data warehouses. We also have these new and different analytic workflows for data science, experimental things, exploratory types of queries. Of course we have the multiple and, I say unusual, some people say just weird, data sources coming in. You'll notice I haven't used the term big data yet. I think big data is just data. But we do have weird data sources. We have social media, we have Internet of Things, sensor data and so forth. Do take that into account when you talk about your analytic environment. Unfortunately though, the demand for cost reductions continues.

Do more with less, build more with less, and that's where the cloud infrastructure comes into being. We can get something up and running very quickly for much less of a cost. It is, after all, a rental fee, not the full build-out fee. Of course, data warehouse automation, with its ability to make us more efficient, more effective, faster, more agile, certainly does help things as well. Deployment to the cloud, though, does not negate core data warehouse principles and design concepts. You don't just throw stuff in the cloud and stand back. The workflows have to be effective and efficient. They have to be things that make sense. We are still doing extract, transform, and load, or data prep, and we have to have that documented. We have to understand what that means. The schemas have to be understood not just by the implementers, but also by the people using the data for their day-to-day analytics. Then, sustainability and maintainability.

This is a long haul. It's a program. It's a series of coordinated projects over a long time period. That means that the environment has to be sustainable. We have to be able to maintain it as we change the environment. It will change. I guarantee it. And that's where documentation becomes king. Documentation about the data lineage, documentation about where it exists, about who's using it and so forth. The documentation has to be easy or people won't do it. I know it. I've been implementing data warehouses for years and the last thing I wanted to do was document it. Give me a tool that does the documentation for me and I'm a happy girl. Finally, my advice to you: there is no silver bullet replacing sound design and deployment practices. The old garbage in, garbage out adage still holds. If you put garbage into your cloud implementation, you're going to get garbage out.

With that, I'm going to turn it back over to Armand and let's see if we got some questions from folks.

Data Warehouse Automation

Yeah, I think this goes to some of Claudia's earlier points around documentation for data warehousing and the importance of metadata and documentation when you use data warehouse automation to build a data warehouse. That approach lends itself fully to almost dividing the warehouse. If you need to divide your data warehouse across different cloud computing implementations, then you can use a metadata-centric approach, with the help of data warehouse automation, to divide up the data warehouse in different places but still see a single version of the truth in terms of analyzing where your data is coming from and how everything fits together in terms of data lineage in the cloud.

Yeah. I don't know that I have much to add to that, other than hopefully the data warehouse on premises was also created with WhereScape's automation technology, because then it becomes so much simpler. You've got the documentation of both environments, the cloud and the on-premises, and oh boy, does that make it easier, right, Jason?

Yeah, that is right. We have a number of customers now who in fact have a single data warehouse spread across multiple on-premises and cloud implementations, including using two different cloud databases and having data on premises and in other databases as well. Everything actually operates out of a single WhereScape environment with a single set of metadata and a single scheduler. So, yeah, that's a good point. Having data on premises as well is really no different if you've used a data warehouse automation approach like WhereScape with strong metadata and documentation.

Azure Cloud and WhereScape

Phenomenal. Another question coming in. You mentioned the right cloud technology is important to the success of moving data to the cloud. This one's for you, Jason. Based on your experience, can you comment on Azure and how well WhereScape plays with it?

Yes, I can. Again, we have customers using Azure for their data warehouse in the cloud, both the SQL product, which is SQL Server in the cloud, and Azure DW, which is a larger upscaled version of SQL Server specifically designed for building data warehouses. We have customers using one, using the other, and actually using both of those in the same data warehouse implementation. There are some technology differences between the two versions of Azure that we can utilize, and sometimes customers find they actually need to use both cloud platforms to deliver this data warehouse. Again, it comes back to strong metadata and strong documentation and all of the data lineage stuff that you get. It allows customers to deploy pieces of the data warehouse into different parts of Azure or just into one Azure platform if that's what they want to do, and operate it as a single data warehouse.

Data Security

Great. Another one coming in for you, Jason. They're asking how WhereScape enhances data security and how it handles changes in the data source.

It's a good one. I'm going to answer the second part of it first, while I think about the first one. How do we handle changes in data sources? Well, that's actually really easy. Again, it comes back to automated generation of documentation and data lineage capabilities. We are able to take a look at any data source you've got now and work out exactly where in the data warehouse you'll need to make changes based on a data source being changed out. We can go and discover a new source and, at the same time, automate bringing in that new source and replacing the existing source you've got built into your data warehouse design. If you've already built the data warehouse using WhereScape RED and you need to change the data source, it is actually a much simpler thing to do than with almost any other approach you can come up with.
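As a rough illustration of the idea of using lineage metadata to find what a source change touches, here is a minimal Python sketch. The lineage map, the object names, and the `downstream_of` function are all hypothetical and are not WhereScape's metadata model or API; the sketch only shows the general technique of walking lineage from a source to everything loaded from it.

```python
from collections import deque

# Hypothetical lineage metadata: each object maps to the objects loaded from it.
LINEAGE = {
    "crm.customers": ["stage_customer"],
    "erp.orders": ["stage_order"],
    "stage_customer": ["dim_customer"],
    "stage_order": ["fact_sales"],
    "dim_customer": ["fact_sales"],
    "fact_sales": ["rpt_monthly_sales"],
}


def downstream_of(source, lineage):
    """Return every object that directly or indirectly depends on `source`,
    in breadth-first order, each listed once."""
    affected = []
    visited = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in visited:
                visited.add(child)
                affected.append(child)
                queue.append(child)
    return affected


# Which warehouse objects need review if the CRM customer source is swapped out?
print(downstream_of("crm.customers", LINEAGE))
```

Because the answer comes from the recorded metadata rather than from anyone's memory, it stays usable after the original developers have moved on.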

On data security, WhereScape has a lot of really good built-in, best-practice approaches to managing data. We tend to automate database functionality around security, so if the database has strong encryption, then we surface the database's functionality around data encryption. We don't actually have an engine, so we're not adding to the problem by giving you a different thing to worry about in terms of the data being stored somewhere separate during processing. Because we're doing all the data processing in the data warehouse database itself, we're able to fully leverage the database's security strengths, if that makes sense.

Yeah, that definitely does. Let's see another question coming in here. I believe this one is for you, Claudia. Redesign is not seen as moving forward in our organization. How do I prove its value?

What a great question. And yes, I've heard that before, too. "All I'm doing is redesigning. I'm not adding new functionality." First of all, you are adding new functionality because as you redesign, you're going to pick up new needs, new requirements, new things. I think in terms of real hard dollars. By doing a redesign, as I mentioned earlier, you get rid of a lot of unused, ineffective stuff that just grew over the years with your data warehouse. That's probably your best way of justifying a redesign. If you do the redesign, the thing that you must do is make sure you document everything that you got rid of or made more efficient, because somebody's going to ask you for that. And that translates into real dollars. If you don't need the extra space, then you don't need as much storage capacity in the cloud. That reduces the monthly or yearly payments that you have to make and so forth.

I would say that redesign is not a bad thing. It is something that makes you more efficient, more effective. Keep the dollars on that, it gets rid of unused stuff, keep the dollars on that, and so forth. You also pick up new or different requirements that you can easily do with what you've already got implemented. So, again, that's my comments on that. Jason, as always, I invite you to comment if you have something to say about this as well.

I do think I've got a slight extension to it, and that is an old belief of mine that a data warehouse that's static is a data warehouse that's dead and not being used. You can't expect to just keep extending an existing data warehouse forever without occasionally going back and doing a redesign of modules or of the whole data warehouse. Certainly moving into the cloud is a great opportunity to actually do that. When you're replatforming data, actually have a good look at the design that you've got and make some decisions about whether it's right or not. There are bound to be subject areas or parts of the data warehouse that are no longer being used. The business has changed. The requirements for the business have changed. There are going to be pieces that you can turn off if you do a redesign. And understanding what you have now better than you probably did when you originally built the warehouse, you're probably going to do some things differently as well.

It just completely makes sense to me to take the opportunity of moving to the cloud to do some redesign work if it makes sense to do it.

Data Vault 2.0

Right on. Yeah. We have time for one last question. It's asking: does WhereScape have built-in support for the Data Vault 2.0 methodology, and is WhereScape a certified vendor now?

Yeah, we are. We're working closely with Dan Linstedt and we have really good support for Data Vault 2.0. We released a new set of wizards for enhancing that support earlier this year and we've had some fantastic new customer uptake on that particular functionality.