{"id":5395,"date":"2021-05-27T09:18:35","date_gmt":"2021-05-27T09:18:35","guid":{"rendered":"http:\/\/www.odbms.org\/blog\/?p=5395"},"modified":"2021-05-27T09:18:35","modified_gmt":"2021-05-27T09:18:35","slug":"why-aidata-science-projects-fail-interview-with-joyce-weiner","status":"publish","type":"post","link":"https:\/\/www.odbms.org\/blog\/2021\/05\/why-aidata-science-projects-fail-interview-with-joyce-weiner\/","title":{"rendered":"Why AI\/Data Science Projects Fail. Interview with Joyce Weiner"},"content":{"rendered":"<blockquote><p><strong>&#8220;The most dangerous pitfall is when you solve the wrong problem.&#8221; &#8211;Joyce Weiner<\/strong><\/p><\/blockquote>\n<p>I have interviewed <strong>Joyce Weiner<\/strong>, Principal AI Engineer at Intel Corporation. \u00a0She recently wrote a book on \u00a0<span id=\"productTitle\" class=\"a-size-extra-large\">Why AI\/Data Science Projects Fail.<\/span><\/p>\n<p>RVZ<\/p>\n<p><strong>Q1. In your book you start by saying that 87% of Artificial Intelligence\/Big Data projects don\u2019t make it into production, meaning that most projects are never deployed. Is this still the case? <\/strong><\/p>\n<p><strong>Joyce Weiner:\u00a0<\/strong>I can only provide anecdotal evidence that it is still a topic of conversation at conferences and an area of concern. A quick search doesn\u2019t provide me with any updated statistics. The most recent data point appears to be the VentureBeat reference\u00a0(VB Staff, 2019). Back in 2019, Gartner predicted that \u201cThrough 2022, only 20% of analytic insights will deliver business outcomes.\u201d\u00a0(White, 2019)<\/p>\n<p><strong>Q2. What are the common pitfalls?<\/strong><\/p>\n<p><strong>Joyce Weiner:\u00a0<\/strong>I specifically address the common pitfalls that are in the control of the people working on the project. Of course, there can be other external factors that will impact a project\u2019s success. 
But just focusing on what you can control and change:<\/p>\n<ol>\n<li>The scope of the project is too big<\/li>\n<li>The project scope increased in size as the project progressed (scope creep)<\/li>\n<li>The model couldn\u2019t be explained<\/li>\n<li>The model was too complex<\/li>\n<li>The project solved the wrong problem<\/li>\n<\/ol>\n<p><strong>Q3. You mention five pitfalls. Which of the five is most frequent, and which is the most dangerous for a project? <\/strong><\/p>\n<p><strong>Joyce Weiner:\u00a0<\/strong>Of the five pitfalls, scope creep has been the one I have seen the most in my experience. It\u2019s an easy trap to fall into: you want to build the best solution, and there is a tendency to add features as they come to mind without assessing how much value they add, or whether it makes sense to add them right now. The most dangerous pitfall is when you solve the wrong problem. In that case, not only have you spent time and effort on a solution, but once you realize that you solved the wrong problem, you need to go back and redo the project to target the correct problem. Clearly, that can be demoralizing for the team working on the project, not to mention the potential business impact from the delay in delivering a solution.<\/p>\n<p><strong>Q4. You suggest five methods to avoid such pitfalls. What are they? 
<\/strong><\/p>\n<p><strong>Joyce Weiner:\u00a0<\/strong>The five methods I discuss in the book to avoid the pitfalls mentioned previously are:<\/p>\n<ol>\n<li>Ask questions \u2013 this addresses the project scope as well as providing information to decide on the amount of explainability required, and most importantly, ensures you are solving the correct problem.<\/li>\n<li>Get alignment \u2013 working with the project stakeholders and end users, starting as early as the project definition and continuing throughout the project, addresses problems with project scope and makes sure you are on track to solve the correct problem.<\/li>\n<li>Keep it simple \u2013 this addresses model explainability and model complexity.<\/li>\n<li>Leverage explainability \u2013 obviously directly related to model explainability, this addresses the pitfall of solving the wrong problem.<\/li>\n<li>Have the conversation \u2013 continually discussing the project, expected deliverables, and sharing mock-ups and prototypes with your end users as you build the project addresses all five of the project pitfalls.<\/li>\n<\/ol>\n<p><strong>Q5. How do you apply and measure the <em>effectiveness<\/em> of these methods in practice? <\/strong><\/p>\n<p><strong>Joyce Weiner:\u00a0<\/strong>Well, the most immediate measurement is whether you were able to deploy a solution into production. As a project progresses, you can measure things that will help you stay on track. For example, having a project charter to document and communicate your plans becomes a reference point as you build a project so that you recognize scope creep. A project charter is also useful when having conversations with project stakeholders to document alignment on deliverables.<\/p>\n<p><strong>Q6. Throughout your book you use the term \u201cdata science projects\u201d as an all-encompassing term that includes Artificial Intelligence (AI) and Big Data projects. Don&#8217;t you think that this is a limitation to your approach? 
\u00a0 Big Data projects might have different requirements and challenges than AI projects?<\/strong><\/p>\n<p><strong>Joyce Weiner:\u00a0<\/strong>Well, it is true that Big Data projects do have additional challenges, especially around the data pipeline. The five pitfalls still apply, though, and in my experience those are the biggest challenges to getting a project into deployment.<\/p>\n<p><strong>Q7. In your book you recommend documenting the expected return on investment as part of the project charter. You write that assessing the business value of your project will help get resources and funding. What metrics do you suggest for this? <\/strong><\/p>\n<p><strong>Joyce Weiner:\u00a0<\/strong>I propose several metrics in my book, which depend on the type of project you are delivering. For example, a common data science project is performing data analysis. Deliverables for this type of project are root cause determination, problem-solving support, and problem identification. Metrics include productivity, which can be measured as time saved; time to decision, which is how long it takes to gather the information needed to make a decision; decision quality; and risk reduction due to improved information or consistency in the information used to make decisions.<\/p>\n<p><strong>Q8. You also write that in acquiring data, there are two cases: one, when the data are already available either in internal systems or from external sources, and two, when you don\u2019t have the data.\u00a0How do you ensure the quality (and, for example, the absence of bias) of existing data? <\/strong><\/p>\n<p><strong>Joyce Weiner:\u00a0<\/strong>The easiest way to ensure you have high-quality data is to automate data collection as much as possible. If you rely on people to provide information, make it easy for them to enter the data. I have found that if you require a lot of fields for data entry, people tend not to fill things in, or they don\u2019t fill things in completely. 
If you can collect the data from a source other than a human, say by ingesting a log file from a program, your data quality is much higher. Checking data quality by examining the data set before beginning any model building is an important step. You can see if there are a lot of empty fields or gaps, or one-word responses in free-text fields \u2013 things that call the quality of the data into question. You also get a sense of how much data cleaning you\u2019ll need to do.<\/p>\n<p>Bias is something you need to be aware of. For example, if your data set is made solely of failing samples, you have no information on what makes something good or bad; you can only examine the bad. Building a model from those data that \u201cpredicts\u201d good samples would be wrong. I\u2019ve found that thinking through the purpose of the data, and doing so as early as possible in the process, is key. Although it\u2019s tempting to say, \u201cgiven these data, what can I do?\u201d it\u2019s better to start from a problem statement and then ensure you are collecting the proper data related to the problem, to avoid having a biased data set.<\/p>\n<p><strong>Q9. What do you do if you do not have any data? <\/strong><\/p>\n<p><strong>Joyce Weiner:\u00a0<\/strong>Well, it is very difficult to do a data science project without any data. The first thing to do is to identify what data you would want if you could have them. Then, develop a plan for collecting those data. That might mean building a survey, or adding sensors or other instruments to collect data.<\/p>\n<p><strong>Q10. How do you know when an AI\/Big Data project is <em>ready<\/em> for deployment? <\/strong><\/p>\n<p><strong>Joyce Weiner:\u00a0<\/strong>In my experience, a project is ready for deployment when you have aligned with the end user and have completed all the items needed to deliver the solution they want. 
This includes things like a maintenance plan, metrics to monitor the solution, and documentation of the solution.<\/p>\n<p><strong>Q11. Can you predict if a project will fail after deployment? <\/strong><\/p>\n<p><strong>Joyce Weiner:\u00a0<\/strong>If a project doesn\u2019t start well, meaning you aren\u2019t thinking about deployment as you build the solution, it doesn\u2019t bode well for the project overall. Without a deployment plan, and without planning for things like maintainability as you build the project, it is likely the project will fail after deployment. By failure I include a dashboard that doesn\u2019t get used, or a model that stops working and can\u2019t be fixed by the current team.<\/p>\n<p><strong>Q12. What measures do you suggest to monitor a Big Data\/AI project after it is deployed? <\/strong><\/p>\n<p><strong>Joyce Weiner:\u00a0<\/strong>The simplest measure is usage. If the solution is a report, are users accessing it? If it\u2019s a model, you can also track predicted values versus actual measurements. In the book, I share a tool called a <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/en.wikipedia.org\/wiki\/SIPOC');\"  href=\"https:\/\/en.wikipedia.org\/wiki\/SIPOC\" target=\"_blank\">SIPOC, or supplier-input-process-output-customer,<\/a> which helps identify the metrics the customer cares about for a project. Some examples are timeliness, quality, and support level agreements.<\/p>\n<p><strong>Q13. In your book you did not address the societal and ethical implications of using AI. Why? <\/strong><\/p>\n<p><strong>Joyce Weiner:\u00a0<\/strong>I didn\u2019t address the societal and ethical implications of AI for two reasons. First, it isn\u2019t my area of expertise. 
Second, it is such a big topic that it warrants its own book.<\/p>\n<p>&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;<\/p>\n<p><strong><a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.odbms.org\/blog\/wp-content\/uploads\/2021\/05\/JoyceWeiner.jpg');\"  href=\"http:\/\/www.odbms.org\/blog\/wp-content\/uploads\/2021\/05\/JoyceWeiner.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone  wp-image-5396\" src=\"http:\/\/www.odbms.org\/blog\/wp-content\/uploads\/2021\/05\/JoyceWeiner-300x300.jpg\" alt=\"JoyceWeiner\" width=\"162\" height=\"162\" srcset=\"https:\/\/www.odbms.org\/blog\/wp-content\/uploads\/2021\/05\/JoyceWeiner-300x300.jpg 300w, https:\/\/www.odbms.org\/blog\/wp-content\/uploads\/2021\/05\/JoyceWeiner-150x150.jpg 150w, https:\/\/www.odbms.org\/blog\/wp-content\/uploads\/2021\/05\/JoyceWeiner-1024x1024.jpg 1024w, https:\/\/www.odbms.org\/blog\/wp-content\/uploads\/2021\/05\/JoyceWeiner.jpg 1605w\" sizes=\"(max-width: 162px) 100vw, 162px\" \/><\/a><\/strong><\/p>\n<p><strong>Joyce Weiner<\/strong> <em>is a Principal AI Engineer at Intel Corporation. Her area of technical expertise is data science and using data to drive efficiency. Joyce is a black belt in Lean Six Sigma. She has a BS in Physics from Rensselaer Polytechnic Institute, and an MS in Optical Sciences from the University of Arizona. She lives with her husband outside Phoenix, Arizona.<\/em><\/p>\n<p><strong>References<\/strong><\/p>\n<p>VB Staff. (2019, July 19). 
<em>Why do 87% of data science projects never make it into production?<\/em> Retrieved from VentureBeat: <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/venturebeat.com\/2019\/07\/19\/why-do-87-of-data-science-projects-never-make-it-into-production\/');\"  href=\"https:\/\/venturebeat.com\/2019\/07\/19\/why-do-87-of-data-science-projects-never-make-it-into-production\/\" target=\"_blank\">https:\/\/venturebeat.com\/2019\/07\/19\/why-do-87-of-data-science-projects-never-make-it-into-production\/<\/a><\/p>\n<p>White, A. (2019, Jan 3). <em>Our Top Data and Analytics Predicts for 2019.<\/em> Retrieved from Gartner: <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/blogs.gartner.com\/andrew_white\/2019\/01\/03\/our-top-data-and-analytics-predicts-for-2019\/');\"  href=\"https:\/\/blogs.gartner.com\/andrew_white\/2019\/01\/03\/our-top-data-and-analytics-predicts-for-2019\/\" target=\"_blank\">https:\/\/blogs.gartner.com\/andrew_white\/2019\/01\/03\/our-top-data-and-analytics-predicts-for-2019\/<\/a><\/p>\n<p><a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.odbms.org\/blog\/wp-content\/uploads\/2021\/05\/51-5Qe-eEYL._SX404_BO1204203200_.jpg');\"  href=\"http:\/\/www.odbms.org\/blog\/wp-content\/uploads\/2021\/05\/51-5Qe-eEYL._SX404_BO1204203200_.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone  wp-image-5399\" src=\"http:\/\/www.odbms.org\/blog\/wp-content\/uploads\/2021\/05\/51-5Qe-eEYL._SX404_BO1204203200_-244x300.jpg\" alt=\"51-5Qe+eEYL._SX404_BO1,204,203,200_\" width=\"203\" height=\"250\" srcset=\"https:\/\/www.odbms.org\/blog\/wp-content\/uploads\/2021\/05\/51-5Qe-eEYL._SX404_BO1204203200_-244x300.jpg 244w, https:\/\/www.odbms.org\/blog\/wp-content\/uploads\/2021\/05\/51-5Qe-eEYL._SX404_BO1204203200_.jpg 406w\" sizes=\"(max-width: 203px) 100vw, 203px\" \/><\/a><\/p>\n<div class=\"a-row\"><span class=\"a-size-base a-color-base a-text-bold\">ISBN-13:<\/span> <span class=\"a-size-base 
a-color-base\">978-1636390383<\/span><\/div>\n<div class=\"a-row\"><span class=\"a-size-base a-color-base a-text-bold\">ISBN-10:<\/span> <span class=\"a-size-base a-color-base\">1636390382<\/span><\/div>\n<div class=\"a-row\"><span class=\"a-text-bold\">Publisher : <\/span>Morgan &amp; Claypool (December 18, 2020)<\/div>\n<div class=\"a-row\"><a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/ieeexplore.ieee.org\/document\/9316369');\"  href=\"https:\/\/ieeexplore.ieee.org\/document\/9316369\" target=\"_blank\">https:\/\/ieeexplore.ieee.org\/document\/9316369<\/a><\/div>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>&#8220;The most dangerous pitfall is when you solve the wrong problem.&#8221; &#8211;Joyce Weiner I have interviewed Joyce Weiner, Principal AI Engineer at Intel Corporation. \u00a0She recently wrote a book on \u00a0Why AI\/Data Science Projects Fail. RVZ Q1. 
In your book you start by saying that 87% of Artificial Intelligence\/Big Data projects don\u2019t make it into [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[881,66,793,1647,1646],"_links":{"self":[{"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/posts\/5395"}],"collection":[{"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/comments?post=5395"}],"version-history":[{"count":7,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/posts\/5395\/revisions"}],"predecessor-version":[{"id":5404,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/posts\/5395\/revisions\/5404"}],"wp:attachment":[{"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/media?parent=5395"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/categories?post=5395"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/tags?post=5395"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}