What is the role of a data scientist?


Data science has been around for decades, but it has recently grown in popularity among companies. Although the tools and techniques already existed, two things have changed. Digital technologies generate more data, which can drive new advanced analytics use cases. There are also more success stories showcasing the value of data, making companies keener to invest resources in new solutions. Because of the hype, phrases like “Data Science” and “Big Data” have become buzzwords. However, their meaning is loosely defined and remains unclear to many businesses. From a practical perspective, what is data science? Is the role of the data scientist the same as that of the statistician, or are there new challenges?

This article aims to clarify what the role of a data scientist consists of. The role differs considerably across companies and positions, so it’s challenging to come up with a universal profile. However, it’s still possible to understand the role from a high-level perspective. By breaking down the core responsibilities of a data scientist, we can better understand the related skillset.

The starting point of a data science project is “why”. Why does the company need an advanced analytics solution? Why is it willing to allocate a budget? Why will the project have an impact? To address these questions, the data scientist needs to understand the business context, brainstorm solutions, and identify what’s valuable. The related skills are business acumen, experience in applying solutions in a specific industry, and soft skills like stakeholder management and listening. Knowledge of the field is definitely useful, although not mandatory, since data scientists can interact with subject matter experts to get the information they need. Therefore, the core skill is the ability to collect business information and define advanced analytics use cases accordingly. The data scientist should also be able to present the solution and its value, so good presentation skills help.

Once the target is defined, the next question is “what”. What do we need to do to solve the challenge? What techniques can we use? What are the main steps from the current situation to the final solution? To address these questions, the data scientist should be able to define the logical steps to build an end-to-end solution. The required knowledge covers statistics, data processing, machine learning techniques, and model validation. However, knowing the separate steps is not enough: the data scientist needs to design an end-to-end solution that is different every time, depending on the data and on the target. The main challenges are to

Each step requires thinking outside the box and using common sense, in addition to some knowledge of the techniques.

Knowing the logical steps of an advanced analytics solution doesn’t imply being able to build it. The final question is “how”. How can we implement the solution? How can we put the data together? How can we prototype and deploy the solution?

This part is more technical and the skillset is diverse. The main areas are

The technical skills depend a lot on the context, so there is more diversity in the “how” area.

The data science process requires broad expertise, so the data scientist can’t go very deep into each component of the solution. That’s especially true for data science consultants, who join new projects where the customer already has deep knowledge of the context and the tools. To design and build the solution, the data scientist needs to interact with

This article shows what’s common across most data scientists and aims to provide more clarity about the role. In a specific case the skillset can be more detailed, and it varies a lot depending on the industry, seniority level, and team. Also, in larger teams there will be people specialised in different aspects of the solution, so it’s less important for one person to have the full skillset.