Data-driven approaches: definition and communication

This article proposes a high-level framework for defining a data-driven approach.

When solving a problem that requires data, what approach would you take? Do you really need Artificial Intelligence (AI)?

You think that your problem requires data-driven decision making. Does that mean you need AI at all? What are the other options? Is an AI approach even feasible?

Your choice will depend on the problem you are trying to solve and on the people you work with.

There are different actors around you, including stakeholders who sponsor the initiative and expect a return on their investment. They might ask you to articulate the pros and cons of using AI or, more specifically, machine learning techniques. How do you explain the advantages? Is an AI solution the best fit?

A good starting point is a problem-focused mindset.

The best solution depends on the objectives. A problem-focused mindset starts with the problem to solve; the choice of solution is just a consequence. Keeping the problem in mind, you can truly empathize with your audience. While the concept is simple, applying it is not straightforward. What if your stakeholders think that AI is the solution? Perhaps they want to keep up with market trends, having seen AI applications in the same industry. Rather than simply following their guidance, solving their real problem is more effective, especially if you aim to build a long-term relationship based on trust.

With that in mind, what situations require machine learning?

How can you explain it in simple terms? It is probably not as simple as it sounds. No matter how well you know your stakeholders, there will always be a gap in how different people perceive the same subject. To mitigate this risk, you could start by asking for their view of AI. Then, you can work backwards to your own view and try to bring the two as close as possible.

I tried to prepare a reusable framework to simplify the choice.

To choose and explain the approach, I defined a framework and applied it to different projects at Microsoft. After building some empathy with my audience, I used it to shape my messages into simple explanations.

The following chart summarizes the framework and this article shows each part in more detail.


Key questions to understand the problem

To illustrate the framework, we can use a sales forecasting problem.

The objective is to forecast the daily sales volume based on the available information that is relevant and related to sales.

The problem is analytical in the sense that the data plays a major role. To solve this problem and similar ones, there are three inputs:

Building on these inputs, I defined two key concepts to determine the approach.

Types of solutions

What are the options to solve an analytical problem? There are several and, for clarity, I summarized them into three approaches: rule-based, statistical, and machine learning.

Please note that each approach uses both data and domain expertise. The difference lies only in how the historical data is used.

Identify the most suited approach

To identify the approach, the first question is whether to use historical data at all. If there is little to no historical data, the only feasible approaches are rule-based. Also, if the historical data is very simple to understand, it is possible to inspect it and define rules to apply to the live data. In this case, too, the approach would be rule-based.
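A minimal Python sketch of the rule-based option for the sales example follows. The baseline volume and the uplift factors are hypothetical values standing in for what a domain expert might supply; nothing here is fitted to historical data.

```python
from datetime import date

# Hypothetical rules supplied by a domain expert (not learned from data).
BASELINE_UNITS = 100.0   # assumed typical daily volume
WEEKEND_UPLIFT = 1.3     # assumed: weekends sell 30% more
PROMOTION_UPLIFT = 1.2   # assumed: promotions add 20%


def rule_based_forecast(day: date, on_promotion: bool) -> float:
    """Forecast daily sales by applying fixed, hand-written rules."""
    forecast = BASELINE_UNITS
    if day.weekday() >= 5:  # date.weekday(): 5 = Saturday, 6 = Sunday
        forecast *= WEEKEND_UPLIFT
    if on_promotion:
        forecast *= PROMOTION_UPLIFT
    return forecast


# Saturday 1 June 2024 with a promotion running.
print(rule_based_forecast(date(2024, 6, 1), on_promotion=True))
```

Changing the forecast means editing the rules by hand, which is why this option suits cases with little history or history simple enough to read rules from.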

The two other approaches are statistical and machine learning. The question is whether domain knowledge is sufficient to extract the relevant information from the data. For example, let’s assume that the only relevant factor affecting sales is the day of the week. To forecast the sales for next Tuesday, a statistical approach is to compute the average sales volume on past Tuesdays.
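The statistical approach for the weekday-only case can be sketched in a few lines of Python, assuming sales records are available as (date, units) pairs. The history below is made up for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical history of (date, units sold); real data would span many weeks.
history = [
    (date(2024, 5, 7), 120),   # Tuesday
    (date(2024, 5, 8), 90),    # Wednesday
    (date(2024, 5, 14), 130),  # Tuesday
    (date(2024, 5, 15), 95),   # Wednesday
    (date(2024, 5, 21), 110),  # Tuesday
]


def average_by_weekday(records):
    """Group sales by weekday (0 = Monday) and return the mean for each."""
    by_day = defaultdict(list)
    for day, units in records:
        by_day[day.weekday()].append(units)
    return {weekday: mean(values) for weekday, values in by_day.items()}


averages = average_by_weekday(history)
TUESDAY = 1
print(averages[TUESDAY])  # forecast for next Tuesday
```

The dictionary maps each weekday to its historical mean, so the same function yields a forecast for any day of the week that appears in the history, not just Tuesday.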

What happens if it is not clear what affects the sales volume? Let’s suppose that there are three possible root causes: weekday, time of year, and promotions. Each root cause is captured by the historical data. With 10 years’ worth of historical data, it would be hard to work out how to use that information just by calculating basic statistics. Machine learning algorithms identify patterns in the historical data, in a way defined by the data scientists.
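To illustrate the contrast, here is a minimal sketch in which a plain linear regression, trained by batch gradient descent, stands in for whatever model a data scientist would actually choose. The feature encoding (weekend, December, and promotion flags), the synthetic history, and its effect sizes are all hypothetical. The model is never told those effects; it estimates them from the data.

```python
from itertools import product


def features(weekday: int, month: int, on_promotion: bool) -> list[float]:
    """Hypothetical encoding of the three candidate root causes."""
    return [
        1.0,                           # intercept (baseline volume)
        1.0 if weekday >= 5 else 0.0,  # weekday effect: Saturday/Sunday (0 = Monday)
        1.0 if month == 12 else 0.0,   # time of year: December
        1.0 if on_promotion else 0.0,  # promotion running
    ]


def predict(weights: list[float], row: list[float]) -> float:
    return sum(w * x for w, x in zip(weights, row))


def fit_linear_model(X, y, learning_rate=0.3, epochs=500):
    """Minimise mean squared error with full-batch gradient descent."""
    weights = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for row, target in zip(X, y):
            error = predict(weights, row) - target
            for j, x in enumerate(row):
                gradient[j] += 2.0 * error * x / n
        weights = [w - learning_rate * g for w, g in zip(weights, gradient)]
    return weights


# Synthetic history covering every weekday/month/promotion combination.
# The effect sizes (100, +30, +50, +20) are invented and hidden from the model.
X, y = [], []
for weekday, month, promo in product(range(7), range(1, 13), (False, True)):
    row = features(weekday, month, promo)
    X.append(row)
    y.append(100 + 30 * row[1] + 50 * row[2] + 20 * row[3])

weights = fit_linear_model(X, y)
print([round(w, 2) for w in weights])  # learned baseline and effects
print(round(predict(weights, features(6, 12, True)), 1))  # Sunday, December, promo
```

Because the synthetic data follows an exact linear rule, the learned weights should come out very close to the hidden baseline and effects. With real sales, the choice of features and model is exactly where the data scientists' judgment comes in.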

Another perspective is conceptual: how do we learn from history to predict the future?

The following chart summarizes the recommended technique for each situation.


Please be aware that the chart is oversimplified and is meant to be used with non-technical audiences.
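For technical readers, the decision flow described in this section can be expressed as a small Python function. The parameter names are hypothetical, and, like the chart, the logic is deliberately oversimplified.

```python
from enum import Enum


class Approach(Enum):
    RULE_BASED = "rule-based"
    STATISTICAL = "statistical"
    MACHINE_LEARNING = "machine learning"


def choose_approach(has_history: bool, history_is_simple: bool,
                    domain_knowledge_sufficient: bool) -> Approach:
    """Encode the framework's (oversimplified) decision flow."""
    # Little or no history, or history simple enough to read rules from it.
    if not has_history or history_is_simple:
        return Approach.RULE_BASED
    # Domain knowledge already says which factors matter and how to use them.
    if domain_knowledge_sufficient:
        return Approach.STATISTICAL
    # Otherwise, let an algorithm find the patterns in the history.
    return Approach.MACHINE_LEARNING


print(choose_approach(has_history=True, history_is_simple=False,
                      domain_knowledge_sufficient=False).value)
```

With the earlier examples, the weekday-only case maps to the statistical approach, while the case with three unclear root causes maps to machine learning.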

When to use this framework

The framework described in this article is high-level and oversimplified. For this reason, it is useful for explaining the concepts to project stakeholders in a simple and direct way. Specifically, I used it in three areas.

The first area is designing a new project, specifically any data-related component of it. If the approach is defined during the design phase, the component's objectives and expectations can be set accordingly. Still, the approach may change as more is learned, so an agile delivery methodology helps.

After a project has been designed, the framework can still help improve delivery quality. Specifically, it can help identify risks, dependencies, and assumptions. Some examples are:

The last area is resource estimation. The category of technique can help estimate what is needed to deliver the project in terms of resources, timelines, and required skills.

Conclusions

The framework described in this article is a good fit for high-level conversations. However, because it is oversimplified, it should be used with care when diving into technical details. In reality, the approach is usually hybrid: for example, rules applied to the output of a machine learning technique, or a statistical technique used to define the features of a machine learning model.
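The first hybrid pattern, rules applied to a model's output, might look like the following Python sketch. The closed days and the capacity limit are hypothetical business rules, and `model_forecast` stands for the raw output of any machine learning model.

```python
from datetime import date

# Hypothetical business rules layered on top of a model's raw output.
CLOSED_DAYS = {date(2024, 12, 25)}  # assumed: store closed on Christmas Day
MAX_DAILY_CAPACITY = 500.0          # assumed: cannot sell more than stock allows


def apply_business_rules(day: date, model_forecast: float) -> float:
    """Rule-based post-processing of a machine learning forecast."""
    if day in CLOSED_DAYS:
        return 0.0
    # Sales volume can be neither negative nor above capacity.
    return min(max(model_forecast, 0.0), MAX_DAILY_CAPACITY)


print(apply_business_rules(date(2024, 12, 24), 620.0))
```

Keeping the rules outside the model means they can be changed by the business without retraining.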

Another risk is overlooking key aspects other than data and subject-matter expertise. Examples include:

In summary, this explanatory framework works well as long as the conversation stays high-level. For project delivery, the framework should be adapted to include other key aspects.