Data Science

Just before we start, though, one of the things that I'm particularly interested in whenever you and I speak is your take on the data industry. In the last ten or so years, you've seen a lot of changes, especially in the positions that you've held, and I was just wondering if you could give an overview, from your perspective, of some of the changes you've seen in the wonderful world of data.

Yeah, thank you, Donna. I think the first of the top three or four things is that most data activities were very much hidden in IT before. That's the starting point, I think, for most people. If you think about data roles, they were very much buried and are now coming out into the business. For me that's the most interesting bit, because the real value is going to be about how you apply data science to business problems, and that really has to reside closer to the business and be less of an engineering or technology project. That's where it has to start, and I think that's good news. I think there's far more diversity of roles: data science roles, data engineering roles, describing roles, artificial intelligence roles, so many different roles, and that diversity has made things much more complicated. Alongside that, there are some basic skills that people have which have been around for a long time, and we shouldn't throw those out.

I'll talk about some of those later, but when we talk about some of the new skills, we also shouldn't forget that although technology has evolved a lot in the past 15 to 20 years, a lot of what we need to use has been around for a long time. Things like data governance and data quality: we should not neglect those. It's about bringing in the new, keeping the old, and keeping it relevant.

Not too much of a challenge there, then. Just before we get into the crux of the presentation, I think what would be really good is a definition of data science. DataOps, data science, there are lots of terms right now. Data science, elevator pitch: how would you describe it?

Well, data science. Yeah, thank you for that one, Donna. Data science really should be science, and I think we're kind of missing some of this. Science by definition is experimental, and I'll pick up on this later. It really should be something where we think: we've got some tools, we've got some skills, we've got some capabilities, and we should go away and figure out where the value can be. When you start comparing that with the traditional IT project, where people are effectively saying, "give me the requirements, go and build," it is very different. That's one of the things I'll pick up in more detail. We have to treat it as a science. It is experimental. You get more value if you treat it as being experimental, as long as you focus on the business problem. If you tell a bunch of data scientists, "I now need you to go away and just be pure developers and develop to the spec," then you're missing the real value in there.

Data science, artificial intelligence, machine learning, developing models, attribution, customer segmentation. There's a lot of stuff in there, it is diverse, and it's a confederation of activities. There's no real one thing that is data science anymore.

Excellent. I've just had a glass of water brought to me. Amazing service at this establishment. Brilliant. Thank you very much. Simon, that actually leads us really nicely into the first part of your presentation. I do just want to remind you that I have already started to see some questions coming through. Rest assured, everyone, I'm seeing those as they come through. We are going to stop in about 15 minutes or so, after this first section, and answer some of those questions. Simon, let's talk about maximizing the value of data science.

Maximizing the Value of Data Science

Yeah, so I've maximized the value of data science and probably minimized the value of the slides. The first thing I would say, just to give it some context, is that when you start out you really need some data strategy. Without that, you will struggle, because, as I'll go through, there is a whole load of activities that you have to do as part of data science.

Data Strategy

If you don't start off with some strategy that says how this aligns to your business goals, then I think you're going to struggle from the outset. I think many firms now have one. I'm not sure many of them are good enough, but that really is the starting point. Your data strategy should be focusing on specific areas. Right now, post-COVID, I think cash management is everyone's top priority. If you're not thinking about cash flow, you're not thinking about getting people off furlough and everything else.

If those things are in your data strategy, it is then about knowing how to address them and making sure you've got the tools and everything else aligned to that. Data strategy is the key one, and that in itself is a whole presentation. Really it just has to say how all the data capabilities align. I'll go through some of those things now. The first thing is making sure all the data resources are aligned to business goals. I think most organizations are missing a trick here, especially those that bring in a few consultants here and there, with different departments doing different things. The way to fix that is a data strategy, with anyone, and I mean anyone, who touches data in the organization aligned to it. That will be your data science team. Your data engineers, who will be building out some of those data sets. The data warehouses.

Data Vaults

Data vaults and everything else. Then your marketing people. Typically they will be doing an awful lot with data, and if they're not rounded up when they're building out a single customer view, or just using that data to drive marketing campaigns, if they're creating their own data sets and it's not aligned, you've got real problems. Then there are those who are generating reports and dashboards. There could be separate BI teams across the business, in finance, in operations teams and wherever. Typically a finance team will get their ERP and they'll start building reports off it. All of these things are fighting against each other, and it just drives me absolutely insane, because you get so many reports and they don't align with each other. What it means is that there's competition to do the same thing. Really what I'm saying is that all those people make up your data community: every area, anyone who collects data.

There are plenty of people in sales or customer service centers who, when they're collecting their data, need to get it right so we don't have to fix it downstream. So that's your data community, and all their activities need to be prioritized. It needs a chief executive to do that, and they can only do that if you've got a data strategy and everything else. If data science is trying to do something on its own without all those other activities lined up, then I promise you it will not be productive. So that's the challenge. I guess the next thing, and I'm sorry, I'm behind on the slides: after ensuring the data resources are aligned, it's evangelizing the value of data science. That needs someone to go out and explain what it can do and how it can do it. It really does need someone to go out to the board, and sometimes to major shareholders, because ultimately they may be investing in the business and its approach, and to make sure that people understand the value.

Typically if that doesn't happen, then people don't know what data scientists do. We'll talk about that in a second. I think we shouldn't forget that it's not just communicating, it is making sure people get it and understand it. Otherwise you're sort of struggling.

Data Science Roadmap

Setting the roadmap for data science, I think, is the bit behind the evangelization. By that I mean, going back to the data strategy, talking about how you're going to use predictive modeling and what problems you've got. It could be forecasting your sales or forecasting your cash flow: what can you use predictive modeling for, and how do you align those activities to the business problems? As I say, there are different skill sets for each one of those. Customer segmentation is, again, a different skill set. It requires mining some of that data and understanding which types of segments and customer personas you want to look at, which, again, if you're doing anything around the customer, you have to have in place.

You do the modeling for that, and then you have to know that it's going to align to your marketing, which means your marketing folks need to use it to do the marketing and drive activity, whether that be reducing churn, increasing revenue, or driving customers to new products and services. Those things need to be done right. The same goes for forecasting.

AI and Machine Learning

I mentioned AI. There is a role for AI. There are specific things where AI can help: it could be improving your data, or it could be a variety of other things, depending on the nature of your organization. And the same with machine learning. People will hear about these things, they'll read about them in the newspaper, they'll know that other companies, especially the likes of Google, Alphabet and Apple, are all using these things, and everyone's talking about it. If you don't have a roadmap for how you can use AI, machine learning and predictive modeling in your organization, then you can't do the evangelization.

No one really knows what problems you're going to be solving. If you dive straight into the problem solving without explaining how you can fix things, then that won't align to your data strategy. The next thing, and I mentioned this earlier on when Donna asked the question, is that the work of data scientists is by definition experimental. You start off with a hypothesis and you test it. It may not be perfect first time around, and you need to explain that to people, because if they don't get that, then the business goes, "Well, it's failed, let's not do it again." Actually, no, it's just the starting point. If you're building customer personas, if you're doing forecasting, all of those things are fairly subjective. You can build those things, but then it has to be a process of continuous improvement. Getting that across to people is one of the things that I'd always try to do: explain.

It is different from engineering. It is experimental. After that, you need to make sure that you're doing things in the right area and validate them. As you're doing that, you then need to work with other people in your data community. One of the failings of data science is that data scientists get brought in to solve a problem and then they do all the engineering, the data quality, the data governance, everything that goes with it. And therefore they get buried. Hence only a small percentage of what they do is about data science, and you're not really getting value from them. As part of the community, when they've done their experimentation, you need a whole support environment for them, which would involve people to test their work. When someone has developed something, you need someone else to test it. You should never mark your own homework. I think we all kind of know that, although with what's going on at universities at the moment, I think that's exactly what they are doing anyway.

But that's a different thing. I think we're in an interesting environment. You need IT people who can take those models and put them in a production environment. I doubt that any data scientist wants to spend their time testing, retesting and building a model straight through to a production system, given the rigor there is. It's not a great thing for them to do, but those people within IT who do it well need to be given the opportunity to do it, lead it, build it, and make sure it happens. Because if you build great models and they're not productionized, if they're not out there every day generating new leads for customers or whatever, you're missing a trick. So that's productionization. I guess the documentation is something that, from my experience of WhereScape, is done quite well, because when you're building those data feeds, you get it fully documented.


So that's a big hallelujah for me. Most data scientists, and I may be discriminating here, are just not good at documenting, and they shouldn't have to be. We need people who will do the documentation, the explainers, as I would call them, those who will explain the work the data scientists have done. That's key from a data privacy and GDPR perspective. If you doubt that, then please do go and read GDPR, and understand that if you've got a model that involves decision making around customers in any shape or form, you have to be able to understand those models, how they work and everything else. I would advocate having people who afterwards look at the models, explain and understand them, and work with data science to do it. There's a whole environment that should wrap around data science to enable data scientists to carry on doing what they do to generate the value, while being properly supported by people who do those other things.

Data Governance

I would include in that, as I mentioned earlier, data governance and data quality. There are experts in the firm who know how to do that. If you don't have them, you need them, because clearly that's part of that community. If a data scientist picks up issues with the data quality, which you'd expect them to do, they should be able to delegate that to the data governance team. Those teams should be supporting data science as well. I think that comes back to the whole data community: you have a variety of people with different roles who bring that capability together, to allow the data scientists to do what they're good at, to enable engineers to do what they're good at, and other people in their roles. If you do all of those things that I've mentioned, then you have a pretty good chance that your data scientists will be successful.

If you don't, then I think you go back to those bad old days where data scientists are brought in to do some really clever things, which they're capable of doing and should be doing, and they end up building BI reports, because that's easy, and it's easy to ask them to do. People specify those requirements, they build reports and dashboards and do very little data science, and therefore they don't stay around long. They're not excited, and they move jobs, which is generally what happens. From a data scientist's point of view, that's not what they're there for. Really it is about bringing that whole organization together to help. So that's kind of my little rant, Donna. All done with that.

Eloquently done, as always. Just reminding people, we're going to take a brief pause here and answer some questions. I was just going to summarize slightly what you've covered. From my perspective, it sounds like strategy is very important, along with communication of that strategy, ensuring that everybody in the business understands the strategy and understands their swim lane, almost, so people know the part that they play in delivering it. My question, before I get onto the other ones: given the number of people that can be involved in that ecosystem, that data environment, how would you advise communicating regularly, and how would you manage that size of team and the communication between those different people?

Yeah, well, I think it's a good question. As a chief data officer, you are responsible for the value that data brings. You're not responsible for managing every single person. That means that you're managing the capability, and it's your responsibility to make sure that those things are done right across the organization. It doesn't mean every single person who has got data in their title reports into you, because in some firms that would be almost everyone. In data-driven firms, it would be different. That means you need to set up that data community. You need to get people who have done a great job to present what they've done to that community and say, "Look, I built this great model," and people can come and see it. You need to organize events to bring the business in and show them those capabilities. You may have done something for marketing; bring operations and finance along to those, and start running those internal workshops and sessions.

The likes of Nationwide, I've seen, do an awful lot of those things. The folks there organize loads of events, and it's a great example of how to do a data community: bringing people in, presenting, showing the tools and their capabilities. And yeah, I think that's really the bottom line, Donna.

Coming to some of the questions, and I was leading the witness slightly, someone has just posed a question that I'm going to do first, which is: what is the best route to becoming a CDO? Is it technology, data governance? I know you have plenty of peers as well, and from those I have met, everybody's journey is rather different, but I'd love to get your take on it.

Well, I think you've just given the answer. I think they are all different, but there's actually a common thread, which is that to be successful as a CDO, you have to start off with the business problems. If you start off by taking a technical solution to the business, that's not going to go well, because it's a bit hit and miss. You say, "Should we do some AI?" They go, "Well, how will you apply it?" You go, "Well, I don't know, we can do this." It's a bit random. You have to be business savvy, and that means that you start off with the business case. For me, I would always start by saying that if you want to do data properly, it's only about four things. It's about your revenues, your costs, reduction of risk, and customer satisfaction. Within risk, you would talk about compliance and all kinds of things.

That's it. So that's your starting point for the conversation. You start saying, "Okay, if you want to generate revenue, what we need to do is either reduce our churn or do whatever," and you start drilling down into the specific things. You've got to be able to talk that language and then understand how data would solve that. If you want to be a CDO, being able to talk that language is great. If your background is data governance, it's relevant, but you can't just talk data governance. If your background is technology and you've been building things, you'll understand some of the challenges. I guess that's a big part of where I came from: building big data sets and getting a good kicking when those data sets were not of a good enough quality.

You learn all about data quality pretty quickly or you fail. When you then start delivering that to customers, you understand the value. When salespeople are saying, "We sold this product, we expect our customers to see quality data, and it's not great," then you have to take responsibility, and that forces you to learn. So I think business-focused and savvy: that's really it. It doesn't matter where you've come from, Donna.

Yeah, communicator, mediator, you name it, you cover them all. I've had a question from someone. We're talking about data science and allowing data scientists the space that they need in order to do their job, and you gave us some terrific examples of the types of people within the organization. I've had one person who said, what about in smaller organizations? Sorry, it's quite a long question. How should smaller organizations look at approaching this? They know that they're going to need to start addressing data science, but don't really know initially how to get on that journey, and they don't have limitless resources. I think I've paraphrased that question. I hope that the asker doesn't mind me doing that.

Yeah, I think there's a big difference between large commercial organizations, small commercial organizations and not-for-profit organizations, and I accept they're all slightly different. I think you can still have a data strategy even if it's two pages long, and even if you write it and end up modifying it, I would say do it. Typically in a larger organization, your data strategy might end up becoming 40 or 50 pages to get into the right detail, but even two pages to say, "this is what we're going to do and how we're going to do it," is worth having. It starts with understanding what the business problem is. Even if you're a small organization, you will understand what some of those challenges are. Those should come from the CEO, the CFO, the COO, whoever it is, or the founders. It really doesn't matter. Once you know what their problem is, then the next bit is to say, okay, how can I use data?

Well, there are only so many tools and things you can do. I think the starting point is then to say, really, how can you deliver value quickest? People talk about quick wins, but the win is the easy bit, because you have to deliver value anyway. Actually doing it quickly is absolutely key. If you start with the most difficult problem and it takes a long time, people will lose faith. Find something that's really quite simple, and it could be as straightforward as doing something like forecasting for people. In the current environment, if you want to do something really clever, you can just look at cash flow forecasting to show, based upon people being furloughed or not, the lack of business, and what it would have been normally, what we need to do. How long can we survive with our cash: the next three months, six months, nine months, twelve months? That is very relevant.

You can create the models to do that, and that stuff will be incredibly valuable. In the current environment, I don't think there are many firms that are not struggling with an absolutely enormous cash flow problem, and the same with forecasting sales and things like that. I would start with that: start building the models, generate some value pretty quickly, and explain how all those things fit together. Explain that the forecasts and models may not be perfect to start with, but you want people to validate them, get some successes, and then you can gradually build more capabilities. You'll get some wins quickly by getting into forecasting.
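As an illustration of the kind of simple cash flow model described here, the following is a minimal sketch, not a model from the talk. The `Scenario` fields, the function names and all the figures are hypothetical, and it assumes flat monthly inflows and outflows, which a real forecast would replace with actual finance data.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical planning scenario with flat monthly cash movements."""
    name: str
    monthly_inflow: float   # assumed: expected receipts per month
    monthly_outflow: float  # assumed: payroll, rent, suppliers per month


def project_cash(opening_cash, scenario, months):
    """Return the projected cash balance at the end of each month."""
    balances = []
    cash = opening_cash
    for _ in range(months):
        cash += scenario.monthly_inflow - scenario.monthly_outflow
        balances.append(cash)
    return balances


def runway_months(opening_cash, scenario, horizon=12):
    """Return the first month (1-based) in which cash goes negative,
    or None if the business survives the whole horizon."""
    balances = project_cash(opening_cash, scenario, horizon)
    for month, balance in enumerate(balances, start=1):
        if balance < 0:
            return month
    return None


if __name__ == "__main__":
    # Illustrative numbers only: compare a reduced-trading scenario
    # against normal trading over the 3, 6, 9 and 12 month checkpoints.
    reduced = Scenario("reduced trading", monthly_inflow=20, monthly_outflow=50)
    normal = Scenario("normal trading", monthly_inflow=60, monthly_outflow=50)
    for scenario in (reduced, normal):
        balances = project_cash(100, scenario, 12)
        checkpoints = {m: balances[m - 1] for m in (3, 6, 9, 12)}
        print(scenario.name, checkpoints, "runway:", runway_months(100, scenario))
```

Even a sketch this small gives the business something to validate and challenge, which fits the point that the first forecast does not have to be perfect.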

Yeah. I guess by almost starting small and building the confidence, as you said, the role almost evolves based on the requirements. That's brilliant. Lovely. Thank you, sir. Questions are coming in.

Well, on the role evolving, I think it's easy to think that what big organizations do is what you should be doing. I think it's right that that's what you should aspire to, but you start off small. I would start off with a very simple strategy. Then they all know what you're trying to achieve on day one.

Yeah, get them in the family early and fast. I'm just conscious about keeping us relatively to time. I just wanted to make you aware that a couple of people have asked for any use cases around specific verticals, insurance being one, but I think you probably will touch on that. Maybe we'll just wait until we get to the end. I just want to make sure there's nothing super urgent. Right. Everyone else, I'll let Simon get on with the second half of his session, and then every single minute that we have left at the end, I think, will be used answering questions.

Okay. As we go through this, I think the starting point is how we maximize the time to availability of data. I think this is the point, because whether you're a small organization or a large organization, your challenge is going to be: okay, so we've got a strategy, we know what we want to do. If you can't get your hands on your data quickly, you are going to be challenged. That's why this is relevant for the second part. I think there are many things that you can think about, but you need access to the data. Otherwise it's all talk, no action. You want to get the data, so how do you do it? For me, there are a couple of things. I think the starting point is not about the access. It's about having a clear process. There are models that have been around for how you manage data science.

Managing Data Science

I mentioned earlier on that it is an experimental thing, and there is something called the CRISP-DM lifecycle. I guess, Donna, we can probably share some of these things later, but it's been around since 1999. For anyone who thinks data science is new: it isn't, by the way, because some of us were playing with artificial intelligence in 1986. It's been around for a long time. The DM in CRISP-DM is all about data mining. It has a very simple model which takes you through understanding the business problem, then going through and testing hypotheses, all the way through to validating the models. So have a look at that. The value of that to me is that everyone then knows where you are in the data science process. If you're saying you're doing a data science project, you should be able to say whether you're at the beginning or the end. You may not know, as you would in a linear project, that you're 10% through or 20% through, but at least you can say, "Well, we're still on that first stage, we're defining the hypothesis."

Data Science Processes

Everyone goes, "Okay, tell me what that means," and then you can talk about it. If you don't have a process, it's all data science in one big bucket and people get lost. Having a clear process really helps you get to value quicker. The next thing is about architecture, really. I think you need a company-wide architecture. I know there are different environments that data architects will look at for data science, and there may be real-time processes, streaming of data, pretty much anything you can think of. If you don't have an architecture where everyone knows where they're getting their data from, what they are going to do with it, how they're going to do it, and with what tools, then you're going to struggle, because it's still vague. Say it's part of your business problem, and you could be talking about forecasting: you need to know where you want to get your data from, so maybe it comes from your finance system and you want to bring it into a tool.

The Role of Data Engineers

If you don't know what that tool is and what the environment is, and if everyone's using a different architecture across the business and they're not joined up, then everyone creates their own data sets. It is bedlam. So my plea is: don't go there. I know lots of data scientists like doing their own extraction or data wrangling, but really the biggest part of that data should be built out by data engineers, who will do it faster and more reliably, and it will be productionized. There's no reason why, from an experimental point of view, data science can't turn around and say, "We've got this quick data set and we've done something with it." The challenge is that a quick, value-driven project is very hard to productionize. Really, at that point you need to hand it over to engineers.

If you haven't involved your data engineers at the beginning, it will all go wrong. That's why I'm saying that architecture needs to be linked back to how your data engineers are getting that data in the first place.

Data Warehouse

It could be that you're building out a warehouse. One of the questions I have is: why are data scientists not taking data from the warehouse? Well, they should be. They should be no different to any other group, in that the fundamental data they're using does come from the warehouse, because the majority of any data that is valuable to a business is structured data. An awful lot of unstructured data is valuable to look at and adds value to structured data, but you need to start with the structured data, especially if you're a small organization, as that question picked out. I think that architecture needs to represent the whole company.

Customer Views

I think, and this may sound a bit obvious, most of the value comes through customers: any revenue will come through your customers. If you don't have a single customer view in place, everyone builds lots of workarounds. Your marketing team wants one, and your finance team wants to get a view across customers so they can manage the risk. Putting one of those in place early on is really important, and that involves taking disparate data sources, bringing those together, and matching and merging that data to form a single record. That is your single source of the truth. That's where you get the benefits from increased revenue, cross-selling, understanding the profitability of customers, providing rewards and reducing risk. For me, if you've got a small business with limited products, a single customer view is a great starting point in building out a data set from which you can do an awful lot very quickly.
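The matching and merging step described above can be sketched as follows. This is a simplified illustration only: the source names, field names and the choice of a normalized email address as the match key are assumptions made for the example, and real single customer view builds typically need fuzzier matching on names, addresses and other identifiers.

```python
def normalize_email(email):
    """Normalize an email so trivial differences do not block a match."""
    return email.strip().lower()


def build_single_customer_view(sources):
    """Merge customer records from several sources into one record each.

    `sources` is a list of (source_name, records) pairs, where each record
    is a dict with at least an "email" field (an assumption for this sketch).
    Records are matched on normalized email; for every other field the first
    non-empty value seen wins.
    """
    view = {}
    for source_name, records in sources:
        for record in records:
            key = normalize_email(record["email"])
            merged = view.setdefault(key, {"email": key, "sources": []})
            for field, value in record.items():
                if field != "email" and value and field not in merged:
                    merged[field] = value
            merged["sources"].append(source_name)
    return view


if __name__ == "__main__":
    # Hypothetical extracts from two departmental systems.
    crm = [{"email": " Ann@Example.com ", "name": "Ann Lee", "phone": ""}]
    billing = [
        {"email": "ann@example.com", "name": "A. Lee", "phone": "0123"},
        {"email": "bob@example.com", "name": "Bob Ray", "phone": "0456"},
    ]
    view = build_single_customer_view([("crm", crm), ("billing", billing)])
    for customer in view.values():
        print(customer)
```

Keeping the list of contributing sources on each merged record is a deliberate choice in this sketch: it leaves a trail for the data quality and governance teams when a merged record looks wrong.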

It is no more than understanding customers across all your products, and across all the channels and brands that you have. Once you've done that, you can generate an awful lot of value very quickly. You start building deeper data on customers. Once you've got a single customer view, you can go off and use all the unstructured data you like, but link it back to an individual customer ID, and you can start looking at conversations, social media, everything, all linked back to that. So that is a key starting point. If you start data science without having a single customer view, it becomes a lot more challenging later on. With one, you end up with a single source of everything for your customers. What I would say is that once you've done that, and a single customer view has been talked about for a long time, that's great, but then you need to go and do it for your suppliers, your employees and your products.

If you don't do it for your products, you don't really understand what happens to them. It could be a bunch of products in a similar category, and understanding how those products are consumed and who consumes them. It's a view around that. For your employees, it's understanding whether you're getting value, whether you've got too many or too few employees in a certain area, whether they've got the right skills, and how much they're paid relative to how much profit they're generating from customers, all those things. And the same with suppliers. I think suppliers are a bit more mature. People clearly want a good view so that they can get more value when they're negotiating with suppliers. A single customer view is only the start; it has to be suppliers, employees and products afterwards. The next thing, and Donna, you're going to like this one, and Charlie too: it needs to be built and automated.

Data Automation

Often we see it with these things: you need to do something, you need to do it well, and then move on. I mentioned how we need to iterate. Once something's up and running, you then need to make sure it's fully automated. As you follow behind the value of data science, it's about getting those data sets so that they work every day. It's fully automated, end to end, through to production systems, and the models are updated overnight or in real time or whatever it is. If that automation isn't done, and it requires hand cranking from data scientists who have to bring their own data sets in and then export that data out to a system afterwards, you'll do two, three, four models and then suddenly you're in maintenance mode. Anyone who's got a technology background will know, hang on a minute, we've all been there with that. That's why we ended up having a proper methodology for building stuff.

We develop, we test, we hand it over, we put it in production, and then it's in a support mode. The same has to happen with data science, and it's generally not done very well at the moment. Automating data science is a big plus. Most data scientists, once they get that, would absolutely love it. There are many ways of doing that, whether it be streaming that data or building a warehouse or your data mart. It does not matter how you do it and what you need. That is part of the value, I guess, of WhereScape: whether you're doing that in a Snowflake environment, which I think was the one Charlie mentioned earlier, or you decide to change your environment, it doesn't matter, because you've got a tool that automates it and puts that data wherever you need it, and you can change over time.

Data Science Automation

For me, automation is absolutely critical for that. And in conclusion, I guess, of all those things. I know Donna mentioned some of those things. I was kind of doing my conclusion right at the end rather than at the end of the first half. But, kind of: data strategy. Leveraging resources across the business, because otherwise you're fighting against each other, and hey, I think passcode. We will need to be a lot friendlier. Maybe we can get that one out of the way quite quickly. You just need to make sure that you get that development. I have no idea what Crown development is. That's a complete mistake, so forgive me on that. Focusing your data scientists on data science: it sounds obvious, but I think the more they're focusing on data science, the better. Okay, actually, that's crowd development, that's my typo. That was part of the title. I think it is about getting everyone to help.

If you've got people in the business who are capturing that data up front, rather than you having to fix it downstream, get them to work on it up front, understand the problem and take ownership of it. The business needs to take ownership of the data. It's not a data team, data science team or data BI team problem. And that's what I mean. It takes the whole organization, with everyone involved, and that is tricky. Creating a process for data science so that people understand how it's happening, so it's not just some magic and it's brought back into line with other things, and then automate what you can automate. And yeah, apologies for that typo.

That was interesting. Crown developing was something completely different, I'm sure, but anyway, it definitely got my brain whirring. I wasn't sure where you were going to go with that, but after this DM, well, we'll make sure we'll get the Chris stuff out as well. I'm just going to go through a couple of these questions with you, Simon. I just need to flick around various windows here. Sorry, I cannot talk and read. I am one of those people. I need to read carefully, so I make sure I get these properly. I was supposed to be reading while we were going through, but I was too busy listening to you. So we've got a question: how to increase business appetite for data science, as it's very volatile. How would you increase business appetite for data science as it's very volatile? I'm assuming, possibly from a corporate political standpoint, it's possibly quite volatile. Is that something that resonates with you, Simon?

Yeah, I think people blow hot and cold on data science. They start off by saying everyone should do it. It's all in the press, very exciting. The reality seems to be quite difficult and different. There are a few things that are done that are quite cool. I think doing something cool actually kind of matters, because people are expecting something other than you just building a simple system and saying that somewhere behind the scenes you've got a single customer view and it's generating more customers. You've got to do something that's quite smart. I think that's the evangelization bit for me. I think the way of doing that is to make sure that you take one problem. I mentioned this before, and I think this is the right approach. One problem: make it work, then go round and show everyone and say, look, we've done it. Go away and do something a bit more clever once you've done it.