Why AI/Data Science Projects Fail. Interview with Joyce Weiner
“The most dangerous pitfall is when you solve the wrong problem.” –Joyce Weiner
I have interviewed Joyce Weiner, Principal AI Engineer at Intel Corporation. She recently wrote a book on Why AI/Data Science Projects Fail.
Q1. In your book you start by saying that 87% of Artificial Intelligence/Big Data projects don’t make it into production, meaning that most projects are never deployed. Is this still the case?
Joyce Weiner: I can only provide the anecdotal evidence that it is still a topic of conversation at conferences and an area of concern. A quick search doesn’t provide me with any updated statistics. The most recent data point appears to be the Venture Beat reference (VB Staff, 2019). Back in 2019, Gartner predicted that “Through 2022, only 20% of analytic insights will deliver business outcomes.” (White, 2019)
Q2. What are the common pitfalls?
Joyce Weiner: I specifically address the common pitfalls that are in the control of the people working on the project. Of course, there can be other external factors that will impact a project’s success. But just focusing on what you can control and change:
- The scope of the project is too big
- The project scope increased in size as the project progressed (scope creep)
- The model couldn’t be explained
- The model was too complex
- The project solved the wrong problem
Q3. You mention five pitfalls. Which of the five is the most frequent, and which one is the most dangerous for a project?
Joyce Weiner: Of the five pitfalls, scope creep is the one I have seen most often in my experience. It’s an easy trap to fall into: you want to build the best solution, and there is a tendency to add features as they come to mind without assessing how much value they add, or whether it makes sense to add them right now. The most dangerous pitfall is solving the wrong problem. In that case, not only have you spent time and effort on a solution, but once you realize you solved the wrong problem, you need to go back and redo the project to target the correct one. Clearly, that can be demoralizing for the team working on the project, not to mention the potential business impact from the delay in delivering a solution.
Q4. You suggest five methods to avoid such pitfalls. What are they?
Joyce Weiner: The five methods I discuss in the book to avoid the pitfalls mentioned previously are:
- Ask questions – this addresses the project scope as well as providing information to decide on the amount of explainability required, and most importantly, ensures you are solving the correct problem.
- Get alignment – working with the project stakeholders and end users, starting as early as the project definition and continuing throughout the project, addresses problems with project scope and makes sure you are on track to solve the correct problem.
- Keep it simple – this addresses model explainability and model complexity
- Leverage explainability – obviously directly related to model explainability, and addresses the pitfall of solving the wrong problem
- Have the conversation – continually discussing the project and expected deliverables, and sharing mock-ups and prototypes with your end users as you build the project, addresses all five of the project pitfalls.
Q5. How do you apply these methods in practice, and how do you measure their effectiveness?
Joyce Weiner: Well, the most immediate measurement is if you were able to deploy a solution into production. As a project progresses, you can measure things that will help you stay on track. For example, having a project charter to document and communicate your plans becomes a reference point as you build a project so that you recognize scope creep. A project charter is also useful when having conversations with project stakeholders to document alignment on deliverables.
Q6. Throughout your book you use the term “data science projects” as an all-encompassing term that includes Artificial Intelligence (AI) and Big Data projects. Don’t you think that this is a limitation of your approach, given that Big Data projects might have different requirements and challenges than AI projects?
Joyce Weiner: Well, it is true that Big Data projects do have additional challenges, especially around the data pipeline. The five pitfalls still apply, though, and based on my experience those are the biggest obstacles to getting a project into deployment.
Q7. In your book you recommend as part of the project charter to document the expected return on investment for the project. You write that assessing the business value for your project will help get resources and funding. What metrics do you suggest for this?
Joyce Weiner: I propose several metrics in my book, which depend on the type of project you are delivering. For example, a common data science project is performing data analysis. Deliverables for this type of project are root cause determination, problem solving support, and problem identification. Metrics are productivity, which can be measured as time saved, time to decision which is how long it takes to gather the information needed to make a decision, decision quality, and risk reduction due to improved information or consistency in the information used to make decisions.
Q8. You also write that in acquiring data, there are two cases. One, when the data are available already either in internal systems or from external sources, and two, when you don’t have the data. How do you ensure the quality (and for example the absence of Bias) of the existing data?
Joyce Weiner: The easiest way to ensure you have high-quality data is to automate data collection as much as possible. If you rely on people to provide information, make it easy for them to enter the data. I have found that if you require a lot of fields for data entry, people tend not to fill things in, or they fill them in incompletely. If you can collect the data from a source other than a human, say by ingesting a log file from a program, your data quality is much higher. Checking for data quality by examining the data set before beginning any model building is an important step. You can see if there are a lot of empty fields or gaps, or one-word responses in free-text fields – things that call the quality of the data into question. You also get a sense of how much data cleaning you’ll need to do.
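The kind of pre-modeling inspection described above can be sketched in a few lines. This is a minimal illustration, not a method from the book: the column names and sample values are invented for the example, and the checks (missing-value fractions, one-word free-text responses) mirror the signals mentioned in the answer.

```python
import pandas as pd

# Hypothetical collected data set standing in for real project data.
df = pd.DataFrame({
    "sensor_reading": [1.2, None, 3.4, None, 5.0],
    "comments": ["ok", "", "pump vibrating on startup", None, "bad"],
})

# Fraction of missing values per column: large gaps call data quality into question.
missing_fraction = df.isna().mean()

# Empty or one-word responses in a free-text field suggest low-effort data entry.
text = df["comments"].fillna("")
short_responses = (text.str.split().str.len() <= 1).mean()

print(missing_fraction.to_dict())
print(f"short or empty free-text responses: {short_responses:.0%}")
```

Running a check like this before any modeling gives a quick sense of how much cleaning lies ahead, and whether the collection process itself needs fixing first.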
Bias is something you need to be aware of. For example, if your data set is made solely of failing samples, you have no information on what distinguishes good from bad; you can only examine the bad. Building a model from those data that “predicts” good samples would be wrong. I’ve found that thinking through the purpose of the data, and doing so as early as possible in the process, is key. Although it’s tempting to say, “given these data, what can I do?”, it’s better to start from a problem statement and then ensure you are collecting the proper data related to the problem, to avoid ending up with a biased data set.
Q9. What do you do if you do not have any data?
Joyce Weiner: Well, it makes it very difficult to do a data science project without any data. The first thing to do is to identify what data you would want if you could have them. Then, develop a plan for collecting those data. That might be building a survey or that might mean adding sensors or other instruments to collect data.
Q10. How do you know when an AI/Big Data Project is ready for deployment?
Joyce Weiner: In my experience a project is ready for deployment when you have aligned with the end user and have completed all the items needed to deliver the solution they want. This includes things like a maintenance plan, metrics to monitor the solution, and documentation of the solution.
Q11. Can you predict if a project will fail after deployment?
Joyce Weiner: If a project doesn’t start well, meaning you aren’t thinking about deployment as you build the solution, it doesn’t bode well for the project overall. Without a deployment plan, and without planning for things like maintainability as you build, the project will likely fail after deployment. By failure I include a dashboard that doesn’t get used, or a model that stops working and can’t be fixed by the current team.
Q12. What measures do you suggest to monitor a Big Data/AI project after it is deployed?
Joyce Weiner: The simplest measure is usage. If the solution is a report, are users accessing it? If it’s a model, then also adding predicted values versus actual measurements. In the book, I share a tool called a SIPOC or supplier-input-process-output-customer which helps identify the metrics the customer cares about for a project. Some examples are timeliness, quality, and support level agreements.
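The predicted-versus-actual comparison mentioned above can be automated as a simple drift check. This is a rough sketch under illustrative assumptions: the error metric (mean absolute error), the threshold value, and the sample numbers are all invented for the example, not taken from the book.

```python
# Minimal post-deployment monitoring sketch: compare a deployed model's
# predictions against actual measurements and flag when error drifts
# past an agreed threshold. Metric choice and threshold are assumptions.

def mean_absolute_error(predicted, actual):
    """Average absolute gap between predictions and measured outcomes."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def needs_attention(predicted, actual, threshold=3.0):
    """True when the model's error has drifted past the agreed threshold."""
    return mean_absolute_error(predicted, actual) > threshold

predicted = [10.0, 12.0, 11.0, 14.0]
actual = [11.0, 12.5, 10.0, 30.0]  # the last measurement diverges sharply

print(mean_absolute_error(predicted, actual))
print(needs_attention(predicted, actual))
```

In practice the threshold would come from the conversations with stakeholders described earlier, so that "the model has stopped working" is an agreed, measurable condition rather than a surprise.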
Q13. In your book you did not address the societal and ethical implications of using AI. Why?
Joyce Weiner: I didn’t address the societal and ethical implications of AI for two reasons. First, it isn’t my area of expertise. Second, it is such a big topic that it warrants its own book.
Joyce Weiner is a Principal AI Engineer at Intel Corporation. Her area of technical expertise is data science and using data to drive efficiency. Joyce is a black belt in Lean Six Sigma. She has a BS in Physics from Rensselaer Polytechnic Institute, and an MS in Optical Sciences from the University of Arizona. She lives with her husband outside Phoenix, Arizona.
VB Staff. (2019, July 19). Why do 87% of data science projects never make it into production? Retrieved from VentureBeat: https://venturebeat.com/2019/07/19/why-do-87-of-data-science-projects-never-make-it-into-production/
White, A. (2019, Jan 3). Our Top Data and Analytics Predicts for 2019. Retrieved from Gartner: https://blogs.gartner.com/andrew_white/2019/01/03/our-top-data-and-analytics-predicts-for-2019/