How to define business requirements for a successful cloud data & analytics project
Data Platform Virtual Summit 2021
Many data projects fail to deliver the impact they should for a simple reason – they focus on the data.
In this session, James explains a different way of thinking that will set up your data & analytics projects for success.
Using an iterative, action-oriented, insight discovery process, he will demonstrate tools and techniques that will help you to identify, define and prioritize requirements in your own projects so that they deliver maximum value.
He'll also explore the synergy with modern cloud analytics platforms like Azure Synapse, explaining how the process and the architecture actively support each other for fast, impactful delivery.
Transcript
Hi, my name is James Broome and I'm the Director of Engineering at a UK based technology consultancy called endjin. I've spent the last 20 years delivering software solutions and data insights, and providing technology strategy and engineering support to teams and customers in the UK, across the Middle East, and in North America.
Today, I'm going to help you ensure that your data projects are successful: that the time, money and energy that you or your organization are investing in strategic data initiatives are well spent, and that they deliver real business value. I'm going to describe a different way of thinking, a shift in mindset in how to approach data projects, that puts the consumer and the outputs front and center of the process. None of what I'm going to describe is complicated, but putting it into practice can be hard, and it might go against a lot of natural inclinations about how you typically work. I'm going to explain why some of those common and long-established methodologies are actually setting you up for failure, and how this simple change in approach can reverse that and ensure that whatever you deliver will be valuable. I hope that you find it useful, and I look forward to any questions; once the session has been presented, I'll be available to chat live in the Q&A. I'm going to start with a story, one that I'm guessing is familiar to any of you that have worked on data projects for any length of time.
Your business has identified that they could and should be using data to make informed decisions about internal processes, about your customers, about business strategy. They want to become more data-driven, and set about creating an enterprise data warehouse or a centralized data platform that will act as a single source of truth for all your business intelligence needs. And as part of the BI or data team, you've been tasked with designing and implementing this new database. You kick off the project by trying to find out what kind of data you're going to need to get hold of and how it's going to be used. But of course, the business users of the system can't tell you exactly what they want. So they talk about self-service BI and being able to slice and dice the metrics by various different dimensions. So you start designing a data model, a centralized view of the organization, containing all the facts and dimensions the business might care about. It takes a long time. There are lots of questions, lots of unknowns, and lots of compromises needed when consolidating data from different sources into one logical model. And of course, different business users from different teams have different views on how things should be modelled and what things should be called. So development of the model is slow. The enterprise data warehouse technology that you're using, which naturally lends itself to fact and dimension tables, is expensive to run, so you have to make some tough decisions about what granularity of data you can store and how long you can hold on to it. And it's hard to optimize a model for self-service because you still don't know what the exact types of queries are that the users are going to want to perform.
But after some months, maybe a year, you get the first version of the data warehouse out into the business. But guess what? They don't want to use it, because it doesn't give them the information they need in the way they need it. But now they've got something to try, they're able to tell you why it's wrong and how it should be changed. But now it's also really slow to make changes, as any updates to the model need to be considered in the light of the whole thing: one team's requests are going to impact another's and vice versa. So the model becomes more and more of a series of compromises, a long way from the perfect view of the organization, with limitations and trade-offs baked in as you go.
It's hard to explain the rationale and thinking behind some parts of the model, meaning you need to write and maintain extensive documentation and become a gatekeeper for future changes, which slows down innovation with bureaucracy. But eventually the model, the tools used to query it, and the reports that the business needs to be generated start to gain adoption across the organization. Now this should be considered success, but in fact, by the time you get to this point, you have a data warehouse that is slow and costly to develop, expensive to run and maintain, brittle and hard to evolve, and a compromised view of the organization. Sound familiar? So what went wrong? The business users of the system can't tell you exactly what they want. This is the core thing. This is the starting point that we accept all too often, and it leads to the compromised output that we described. The tools and techniques around data modelling that you're all familiar with in some ways indirectly hide this problem from view, because by not addressing it, we're effectively saying that's okay: we can design the perfect model that will allow you to query anything you want in any way that you want. Except of course, that's not the case. You can't do that. No amount of modelling will iterate you towards a perfect result; by definition, it will always be a compromise. So we need to think differently about how we help the business users describe what they need, so that we don't have to build a compromise.
And we need to frame that need in a way that relates back to the objectives of the business, so that when we help them, we're adding real value. In the rest of this talk, I'll help you with that. I'll help you to ask the right questions of the business. I'll help you to understand why this approach works. And I'll explain how underlying architectural and design choices can fully support this process, so that your projects are set up for success. In the story I just told, the business identified that they could and should be using data to make informed decisions. They wanted to become more data-driven. But driven towards what? Decisions need to be made in context. Is the business aspiring to grow, or to introduce efficiencies and cut costs? Maybe it's about growth.
Is this through organic growth or acquisition? If it's about organic growth, is this through new customers, or through improving customer satisfaction so you can sell new products and services to existing customers? Data can help you with all of this. It can help you make the right decisions, but you need the context in order to make them. So the first mindset shift is this: make sure you understand the objectives of the business before you start. Why are they investing in this project, platform or initiative? What are they hoping to achieve, and what is the planned strategy to achieve it? Of course, the objectives may change at different organizational levels; departments or business units might have their own specific objectives, either in line with or in addition to overarching strategies and goals. And that's absolutely fine, but make sure that whatever level you're focusing on, everyone understands these goals.
Here's the thing: people don't care about the data. They care about what the data is telling them, the outputs of the data. They care about the insights and knowledge it gives them. And they care about those things because they help them in making decisions or taking actions, especially when those decisions and actions are tied back to a bigger business goal. So the second mindset shift is this: instead of asking people what data they need, ask them what they are trying to achieve. Specifically, what can they do within their role in order to meet the business goal? Because fundamentally, anyone's job at any level can be boiled down to a series of decisions to be made or actions to be taken. So we can find out what those decisions and actions are, and we can work backwards from that to understand the insights needed from the data. Let's see that with an example. Let's look at a telecommunications provider that has a problem with customer satisfaction. They might have identified that the customer service call center is a key business unit that can influence customer satisfaction.
So the call center managers are a key stakeholder in your new data warehouse. How can you ensure that your data platform delivers value to them? That it provides the right insights that will actually help them? Ask them what they can do that will contribute to that goal. How can a call center manager improve customer satisfaction? They can keep the call queue length as short as possible, so the customers don't get fed up. How can they keep the call queue as short as possible? They can increase the team size for a shift if the phone queues are going to be busier than previously expected. This is a specific action that can be taken that will have a positive impact on a business goal. If you cast your mind back to the original story, we started by looking at the data, the data we had or the data we thought we needed, and built a model around it that would hopefully provide the insights. It was a data-centric, bottom-up approach to building our data platform, and that caused problems. We've now flipped everything around. We haven't even talked about data yet. We've focused on the actions and decisions that the business can take to achieve its goals.
So now that we have an action, we need to define the insights that will help people know whether or when to take it. A good way to think about this is to think about the questions that you might ask. In the case of the call center manager, this might be: how busy are we now? How busy are we likely to be tomorrow? What would the optimal team size be? The questions help us understand what evidence we need in order to provide the insights that could help the call center manager with decisions around staffing. So the next question is: what evidence can be used to answer these questions? There are obvious ones, like the length of the call queue or the number of issues we're resolving per hour. We could also look at things like seasonality patterns. Are there any external factors at play, or any known or planned maintenance events that might impact the service, causing people to call up? And what about more qualitative things, like customer sentiment? Are they angry or are they happy when they finally get through to speak to someone on the phone? This is all evidence that we could use to help answer the questions, which in turn helps us to decide whether or when to take the action.
We've moved away from the data and started talking about actionable insights. If our data platform can deliver this, we already know it's going to be valuable. We've identified the audience, we've captured the specific actions, and we've captured the evidence that could be used to support the decision-making. But an actionable insight doesn't stop there. There are a couple more things that we need to know. What kind of notification do we need? Does this insight need to be delivered in real time, daily, monthly? And through what channel? People always default to some kind of dashboard or report, but that's not the only option. What about an SMS or email notification?
Or an integration into another application? What is the most effective notification mechanism for the audience? And finally, what feedback loop do we need to make sure that the action had the effect we wanted? How can we tell that we're moving in the right direction towards our goals? And if we repeat this process for the different types of audience, for the different actions and the different types of evidence, we've started to build out a backlog of actionable insights. And that's the next mindset shift: you can and should have a delivery backlog for your data platform that's broken down into a series of discrete, actionable insights. You can now report progress and prioritize accordingly. And that's important, because not all insights are equal. It might be that we don't have the data to provide the evidence.
It might be that we have the data, but it's not at the right granularity level, or we can't get hold of it fast enough to support the notification schedule that we need. Let's go back to our call center manager's questions. How busy are we now? How busy are we likely to be tomorrow? What would the optimal team size be? What's interesting here is that each of these questions is different, but all of them would be useful. And if we dig a little deeper, we can see that each question builds on the last to add a higher-level insight. The first question is asking us about current state. It's descriptive: what's already happened?
The second question is asking something about the future state. It's predictive: what do we think is going to happen? And the third question is asking us for an answer. It's prescriptive: what should we do about it? So you can also get to the same insights in different ways and at different levels of sophistication. So you need to prioritize, and there are two axes that are useful here: the business impact and the complexity to deliver. Being able to answer the question 'what would the optimal team size be?' would clearly have a huge business impact. But in order to do that, you may need to create insights that don't currently exist. You may need to capture a lot more data. There may be sophisticated modelling needed. There may be an element of experimentation to see if this is even feasible, whether we can predict it with good enough confidence for it to be useful. So it's probably going to be very complex, and you might want to throw all your resources at this; it may be significant enough to justify the time and money to figure it out. But you need to make that prioritization decision in context with everything else. And that's the next mindset shift: by asking the business what they need, you haven't constrained yourself by what's currently possible, because what's currently possible
might not be what they need at all. But now you can prioritize accordingly. And obviously the best starting point is in the top right: high-impact, low-effort insights. That's where you want to start delivery, as you're now guaranteeing that what you do first will add value straight away, which leads to adoption and further investment. And you just keep going through this process: discover the insights needed, prioritize, deliver. So how do you deliver? Where do you start when you have actionable insight number one, and you've identified the evidence to answer the first question, through the right channel, with the right feedback loops?
In order to know if we should increase the number of staff on a shift, we said we needed to know how busy we were. And one of the pieces of evidence we identified earlier that would tell us how busy we were was the number of issues that we were resolving. So the number of issues resolved in the last hour is the output that we're aiming for. In order to answer this specific question, the optimal structure of data to serve that output would be a count of issues by time. This allows us to easily and efficiently query the number of issues resolved in the last hour. This means we need to get hold of the issue logs and potentially pivot, aggregate, or transform them into a time series.
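To make that concrete, here's a minimal sketch of that transformation in Python, assuming the issue logs arrive as simple records with a resolution timestamp. The column names and values are illustrative, not a real schema.

```python
import pandas as pd

# Illustrative issue log records as they might arrive from the ticketing system.
# Column names and values are assumptions for the sake of the example.
issue_logs = pd.DataFrame({
    "issue_id": [101, 102, 103, 104, 105],
    "resolved_at": pd.to_datetime([
        "2021-06-01 09:12", "2021-06-01 09:48", "2021-06-01 10:05",
        "2021-06-01 10:31", "2021-06-01 10:52",
    ]),
})

# Pivot the raw log into the output-optimized structure: a count of issues by time.
issues_by_hour = (
    issue_logs
    .set_index("resolved_at")
    .resample("1H")                 # one bucket per hour
    .size()
    .rename("issues_resolved")
    .reset_index()
)

# "How busy are we now?" - the number of issues resolved in the most recent hour.
latest = issues_by_hour.iloc[-1]
print(f"{int(latest['issues_resolved'])} issues resolved in the hour starting {latest['resolved_at']}")
```

In a real pipeline that same aggregation might be a SQL query or a Spark job; the point is that the target structure is dictated by the question, not by the shape of the source data.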
And we know that the issue logs can be extracted from our ticketing system. What we've defined is a pipeline. The only difference is we've worked backwards from the output to get to the data, but that's an important difference, as we know that by focusing on delivering this pipeline, we have something valuable. So that's what we do. We don't try and solve all the problems at once. We tackle them one at a time, always delivering incremental pieces of value to the business. And I'm guessing that you've come across pipelines before; they're central to a lot of cloud analytics platforms, for example Azure Data Factory or Synapse Pipelines. Pipelines enable agile, iterative delivery: thin, optimized slices through your data platform, each providing specific pieces of value. Pipelines also allow for polyglot processing,
so you can pick the best choice for the ingestion, transformation and modelling for the specifics of the insight that the pipeline is serving. You're not trying to design for everything upfront, so you don't need to bake in compromise. And because they're aligned to specific pieces of business value, you can also align cost controls and security boundaries to the business units that care about them. The pipeline pattern also scales really well: as you tackle more and more insights with more and more pipelines, you'll find repeatability and reusability along the way. So another mindset shift: the technology can and should support the process. If you adopt an actionable-insights-based approach to data projects, a pipeline-based architecture is a really effective way of delivering them.
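As a rough illustration of that pattern (plain Python rather than an actual Azure Data Factory or Synapse pipeline definition, and with hypothetical step names), a pipeline is just a thin, ordered slice from source to insight, where each step could be swapped for whatever technology suits it best:

```python
import pandas as pd

def extract_issue_logs() -> pd.DataFrame:
    """Ingestion step: a stand-in for copying issue logs from the ticketing system."""
    return pd.DataFrame({
        "issue_id": [201, 202, 203],
        "resolved_at": pd.to_datetime(
            ["2021-06-01 09:10", "2021-06-01 09:55", "2021-06-01 10:20"]),
    })

def to_issues_by_hour(logs: pd.DataFrame) -> pd.DataFrame:
    """Transformation step: reshape the raw log into the issues-by-time projection."""
    return (
        logs.set_index("resolved_at")
            .resample("1H")
            .size()
            .rename("issues_resolved")
            .reset_index()
    )

def notify_latest_hour(projection: pd.DataFrame) -> str:
    """Serving step: format the insight for whatever channel the audience needs."""
    latest = projection.iloc[-1]
    return f"{int(latest['issues_resolved'])} issues resolved in the last hour"

# The whole pipeline is one thin slice: ingest -> transform -> serve.
# Each step could be a different technology (a copy activity, a SQL script,
# a Spark notebook, a notification service) without affecting the others.
logs = extract_issue_logs()
issues_by_hour = to_issues_by_hour(logs)
print(notify_latest_hour(issues_by_hour))
```

Because the slice only exists to serve one insight, its cost and access controls can be scoped to the team that actually needs it.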
The output of one pipeline might be the starting point of another. So rather than a single centralized model, the pipeline-based approach can easily lead to different data representations from the same source to serve different purposes. We call these data projections, or data APIs: reusable data structures that can layer on top of each other to serve targeted use cases, with traceability and lineage all the way from the source. Our issues-by-time data projection, which we used to calculate the number of issues resolved in the last hour, could also be used for other purposes. We could use it to forecast what this might look like for the next hour or day. And we could combine that forecast projection with other data sources, like the planned staffing schedule, to model what-if scenarios against various staffing patterns, which might result in another data projection.
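Sticking with the same illustrative data, here's a sketch of how those projections might layer up: the issues-by-hour projection feeds a deliberately naive forecast projection, which is then joined to a hypothetical planned staffing schedule to give a what-if view. The column names, the rolling-average forecast, and the staffing numbers are all assumptions for illustration.

```python
import pandas as pd

# The issues-by-time projection produced by the earlier pipeline (illustrative values).
issues_by_hour = pd.DataFrame({
    "hour": pd.date_range("2021-06-01 06:00", periods=6, freq="H"),
    "issues_resolved": [4, 7, 12, 15, 11, 9],
})

# Forecast projection: a naive rolling average pushed one hour forward.
# A real pipeline might swap this step for a proper time-series model.
forecast = pd.DataFrame({
    "hour": issues_by_hour["hour"] + pd.Timedelta(hours=1),
    "forecast_issues": issues_by_hour["issues_resolved"]
        .rolling(3, min_periods=1).mean(),
})

# Planned staffing schedule from another source (hypothetical numbers).
staffing = pd.DataFrame({
    "hour": pd.date_range("2021-06-01 07:00", periods=6, freq="H"),
    "planned_staff": [3, 3, 4, 4, 3, 3],
})

# What-if projection: combine the forecast with the staffing plan.
what_if = forecast.merge(staffing, on="hour")
what_if["forecast_issues_per_person"] = (
    what_if["forecast_issues"] / what_if["planned_staff"]
)
print(what_if)
```

Each of these is just another data structure with clear lineage back to the source, rather than another set of tables forced into a single enterprise model.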
It's normally around this point that someone asks if I'm telling you to forget about star schemas and fact and dimension tables altogether, and I want to be clear that I'm definitely not saying that. What I am saying is that a single, centralized, upfront star schema design is, in our experience, going to cause you all the problems that I outlined earlier. But building a star schema as an output that's entirely tied to one or more of the actionable insights is an absolutely valid approach. One of the best examples of this is within a Power BI tabular model. If a Power BI report is the correct output to deliver the actionable insight in the right way, then
great. But remember, the model in the Power BI report is there to serve a purpose: to deliver a specific set of well-defined insights to a specific audience. You could also think of this as a semantic model rather than a centralized model. So the next mindset shift is this: there are many options when it comes to cloud storage, and you don't always need a star schema in a data warehouse to efficiently store your data insights. Because you've worked backwards from the desired outputs, you know exactly what the optimal data structures should be to support them. So you can pick the optimal storage choice for each stage of the pipeline, according to how it's going to be used. I want to show you now what all of this looks like in practice. And to be honest, there's no complex process or technique to follow.
We're just talking about asking questions, but asking different questions to those you might normally be asking. We typically use structured workshops to capture the actionable insights, and there are two ways you could do this. The first way is to cast your net wide and include representatives from different parts of the business in the same workshop. The advantage of this is you'll get contrasting, sometimes conflicting, points of view about what's needed. And that's a good thing, as it promotes discussion and prioritization. However, the downside is it's harder to narrow in and go deeper into a specific area if it's not relevant to the majority of the people in the workshop. So the alternative is to have workshops targeted at specific business units, roles, or whatever makes sense for your organization. You're going to get a narrower, possibly biased, set of opinions, but you can dig deeper into the specifics of the insights needed. Neither is right or wrong, and you should find out what works best for you. I've outlined a fairly simple process so far for discovering the insights we need to deliver, and we take that into our workshops to structure the conversation. We typically start by setting out the business goals. These might be predefined if we're bringing in overarching or strategic objectives, or we might need to understand what they are for that specific group of people. Then we want to capture all the people who can help influence these goals.
This might be based around personas or roles within the team, like the call center manager, or there might be specific individuals. There might also be people external to your organization: external stakeholders, customers, partners, et cetera. For each person or group of people, we then want to capture all the actions they can perform. What can they actually do that will help contribute towards the goal? In the case of the call center manager, this was managing the number of staff. This can sometimes be hard, as you're trying to be very specific about what people can actually do in their jobs, and distilling a job role down into a series of actions and decisions can feel a bit impersonal. But remember, the purpose of this is to help people to be more effective. At this point, you should have a list of people and actions, and it might be quite large. So this is a good time to consolidate and prioritize. We tend to focus in on what we collectively think are the most influential actions that we've captured to take into the next step, where we want to dig a bit deeper, and we want to ensure that we're spending our time on the most valuable things. For each of those actions, we now want to capture the questions that we might want to ask, like 'how busy are we now?' And for each of those questions, what evidence could we use to answer them? For example, the number of issues resolved in the past hour. These are insights that are actionable, and they're tied back to a business goal.
We've probably ended up with another long list, so it's time to consolidate and prioritize again. This time we can use the axes I mentioned previously: the impact to the business and the complexity to deliver. Starting with the highest-impact, lowest-effort insights, we can form a backlog for our data platform. More than likely you'll need to repeat this process again and again; this is not a once-and-done activity. You may need to circle back to dig deeper into specific areas. You may need to combine the outputs of multiple workshops and consolidate and prioritize at a higher level. There's no right or wrong. You can start delivery as soon as you get your first actionable insight, or you can wait until you've got a backlog that's going to keep you busy for the next 12 months. Either way, you should regularly review your backlog to ensure that the prioritization is still correct and meets the business need. It all sounds easy, right? Well, yes. As I said in my introduction, the concept is simple. We're just flipping everything on its head, looking top-down instead of bottom-up. But putting it into practice can be hard for a number of reasons. One: the business objectives that you need might not be clear, or they might not be communicated transparently across the organization. Two: you're asking people what they actually do in their jobs, and they might not be able to, or want to, tell you. And three: your natural inclination is to start with the data, and that's even more so after you consider one and two. But remember, if you follow this process, you're guaranteed to start delivering something of value right from the start. And the technology absolutely supports the process.
It makes it easier to deliver insights. But as you've probably realized by now, none of this is really about the technology; it's all about people. And as you can probably guess, that's the last mindset shift. So that brings us to the end of this session. I hope you've found it useful and you've got some key takeaways for how to approach your own data projects. At endjin, we help businesses go through this process a lot, and whilst it might be hard at first, we've seen time and time again how effective it can be. Using the approach that I've described, we've built modern data platforms from the ground up, delivering productionized insights that have been rolled out across an organization in just a few weeks.
So we know it works. If you want to hear more about how we could help you, then feel free to Drop us a Line. Thank you very much for listening. If you still have questions or want to connect, I'm available in the live chat Q&A, or you can find me afterwards in the virtual networking lounge.
Thank you.