Heuristics for a Data Scientist: A common sense approach.
By Silvia Dassiè, Data Scientist at Ryanair.
The first time I encountered the term Data Science and what it promised to be, I was completely fascinated. From my previous experience in IT, I had learned that data is the most valuable asset of an enterprise.
The idea of an algorithmic approach that could extract more insight from the huge amount of available data seemed incredibly powerful. Moreover, data is pervasive, so the horizon of possible applications is potentially infinite.
Many articles have already been written to explain who a Data Scientist is, which skills are required, and how the role differs from similar ones such as Data Analyst or Business Analyst. What I would rather point out is the innovative approach to data that the word science implies: applying clever algorithms and techniques to tasks that human beings cannot handle on their own, in order to transform messy data into augmented knowledge.
However, how to proceed is quite a different matter. From my professional experience as a Data Scientist, and previously in IT as a System Analyst and Team Lead, I have learnt that no matter what your role, skills and duties are, you should never forget a good daily dose of common sense. In particular, during the projects I have been involved in, I have found the following heuristics very useful and illuminating:
- Data is your best friend. Understanding data is a key skill of any successful Data Scientist.
As in a friendship, the initial exploratory phase is tremendously important. It can set you on the right path from the beginning and help you avoid the biggest mistakes. Do not rush straight into a machine learning algorithm; enjoy the discovery phase, become familiar with your data and understand its connections and potential in depth. People who master the knowledge of the domain are the most powerful people inside an organization. They can make a difference and turn the tide of a company.
- Keep it simple. This well-known principle became very popular in Computer Science, and in particular in Web Usability, thanks to Steve Krug (Krug, 2006), but it can be applied to a wide variety of problems and systems. “Simplicity is about subtracting the obvious and adding the meaningful” (Maeda, 2006). A simple model is easier for everybody to understand and easier to communicate and share with marketing and business. It is perceived as more trustworthy than a complex one because people can fully understand it and keep it in mind. They can control the entire process and eventually measure the result more comfortably. As a first model, if possible, prefer a structure that even non-technical staff can interpret without difficulty, such as rule systems, trees or regression coefficients. Moreover, “It turns out that simple rules frequently achieve high accuracy. […] and just one attribute is sufficient to determine the class of an instance quite accurately.” (Witten, Frank and Hall, 2011). It is always possible to enrich the model and improve its accuracy later.
- No free lunch. As the famous “No Free Lunch Theorems for Optimization” (Wolpert and Macready, 1997) suggest, there is generally no always-best strategy, no model that works for every data set. As in IT, in the Data Science world you can meet people who are very fond of a particular technology. However, by focusing only on one class of algorithms or one programming language, they lose sight of the initial problem. They confuse the big picture with the details, the problem with the tools to solve it. The solution has instead to adapt to the kind of data you have and to the problem itself.
Moreover, data and problems can change over time, so you need to be prepared to be more flexible and adaptive than you might initially think.
- Curiosity first. “I have no special talents. I am only passionately curious” is a quote by Albert Einstein that I have always found particularly appropriate to describe a Data Scientist. A genuine curiosity should drive people in Data Science, a desire to dig deep into data and investigate a problem until it is solved. In general, a scientific approach has to be followed, but at the same time you should also take advantage of other discovery techniques: serendipity, that is, finding something unsought and unexpected while looking for something else, or the idea of “the adjacent possible” (Johnson, 2011), according to which in most cases you just need to take information that is already available and recombine it in a different and creative way to solve problems.
- “Errare humanum est”. Errors and failures are words that might evoke negative scenarios, something one should worry about and avoid at any cost. Nevertheless, just as it is impossible to avoid all errors, it is also true that they are a great source of innovative ideas (Johnson, 2011). They force you to evaluate other hypotheses, see the problem from different angles and discover more appropriate insights.
- Emotional data. One of the most important skills a Data Scientist should master is the ability to add emotion to data when creating a story or a product. Analyzing data and discovering insights is not enough. The way insights are presented or used to build a product is equally or even more important for creating value for the company.
If you are not able to reach your audience or users with the right message or product, the discoveries you have made become useless and the entire work loses credibility. Great stories and products are usually strongly correlated with the ability to apply visualization principles and techniques (Tufte, 1990; Norman, 2004) to involve stakeholders and reveal insights when text and statistics are not enough.
- Be enthusiastic and skeptical at the same time. It may seem a contradiction in terms, but we need both the bright and the dark side to face the challenges we meet in our daily work. Sometimes we tend to forget one side in favor of the other and, unfortunately, we lose the right balance. Look at the horizon and at what could be done in the long term, but at the same time critically analyze and question your assumptions at every step of your path.
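To make the “Keep it simple” heuristic a little more concrete, here is a minimal sketch of a one-attribute rule learner, in the spirit of the simple rules mentioned in the quotation there. It is only an illustration under stated assumptions: the function name `one_rule`, the column names and the tiny data set are all invented for this example and do not come from any real project.

```python
from collections import Counter, defaultdict


def one_rule(rows, target):
    """Pick the single attribute whose value -> majority-class rule
    makes the fewest mistakes on the training rows."""
    best = None
    attributes = [name for name in rows[0] if name != target]
    for attr in attributes:
        # Count how often each class appears for each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        # The rule predicts the most frequent class for each attribute value.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for row in rows if rule[row[attr]] != row[target])
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


# Hypothetical toy data, invented purely for illustration.
ROWS = [
    {"channel": "web", "weekday": "mon", "booked": "yes"},
    {"channel": "web", "weekday": "tue", "booked": "yes"},
    {"channel": "app", "weekday": "mon", "booked": "no"},
    {"channel": "app", "weekday": "tue", "booked": "no"},
    {"channel": "web", "weekday": "mon", "booked": "yes"},
    {"channel": "app", "weekday": "mon", "booked": "no"},
]

attr, rule, errors = one_rule(ROWS, target="booked")
print(attr, rule, errors)
```

The resulting rule is a plain dictionary that can be read aloud in a meeting, which is the point of starting simple: a richer model can always come later, and this one gives a baseline to compare it against.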
Someone has defined Data Scientist as the sexiest job of the 21st century. However, I prefer to define the Data Scientist as a data craftsman or artist. It is hard work that requires continuous improvement and involves different hands-on activities, such as cleaning data, intensive analysis and implementation of algorithms, in order to extract insights and communicate them in the best way to solve company problems and gain business value.
Krug, Steve. Don’t Make Me Think: A Common Sense Approach to Web Usability. 2nd ed. Berkeley: New Riders, 2006.
Maeda, John. Le leggi della semplicità. Milan: Bruno Mondadori, 2006.
Witten, Ian H., Frank, Eibe and Hall, Mark A. Data Mining: Practical Machine Learning Tools and Techniques. 3rd ed. Burlington: Morgan Kaufmann, 2011.
Wolpert, David H. and Macready, William G. No Free Lunch Theorems for Optimization. IEEE Transactions on Evolutionary Computation 1, no. 1 (1997): 67-82.
Johnson, Steven. Where Good Ideas Come From: The Seven Patterns of Innovation. London: Penguin Group, 2011.
Tufte, Edward Rolf. Envisioning Information. Cheshire, Connecticut: Graphics Press, 1990.
Norman, Donald Arthur. Emotional Design: Why We Love (or Hate) Everyday Things. New York: Basic Books, 2004.