{"id":5357,"date":"2021-04-21T11:50:37","date_gmt":"2021-04-21T11:50:37","guid":{"rendered":"http:\/\/www.odbms.org\/blog\/?p=5357"},"modified":"2021-04-21T11:53:36","modified_gmt":"2021-04-21T11:53:36","slug":"on-c-debugging-interview-with-greg-law","status":"publish","type":"post","link":"https:\/\/www.odbms.org\/blog\/2021\/04\/on-c-debugging-interview-with-greg-law\/","title":{"rendered":"On C++ Debugging. Interview with Greg Law"},"content":{"rendered":"<blockquote>\n<h4>&#8220;Like it or not, debugging is part of programming. There is a lot of research and cool technology about preventing bugs (programming language features or design decisions that make certain bugs impossible) or catching bugs very early (through static or dynamic analysis or better testing), and all this is of course laudable and good stuff. But I\u2019ve often been struck by how little attention is placed on making it easier to fix those bugs when they inevitably do happen.&#8221; &#8212;\u00a0<strong>Greg Law<\/strong><\/h4>\n<\/blockquote>\n<p><strong><strong>Q1: You are a prolific speaker at C++ conferences and podcasts. In your experience, who is still using C++?<\/strong><\/strong><\/p>\n<p class=\"normal\"><span lang=\"UZ-CYR\"><strong>Greg Law:<\/strong> C++ is used widely and its use is growing. I see a lot of C++ usage in Data Management, Networking, Electronic Design Automation (EDA), Aerospace, Games, Finance, etc.<\/span><\/p>\n<p class=\"normal\"><span lang=\"UZ-CYR\">It\u2019s probably true that use of some other languages &#8211; particularly JavaScript and Python &#8211; is growing even faster, but those languages are weak where C++ is strong and vice versa. Go is growing a lot and Rust is getting a lot of attention right now and has some very attractive properties. 
10-15 years ago, it felt almost like programming languages were \u201cdone\u201d but these days, we\u2019re seeing a lot of innovation both in terms of new or newish languages, and development of older languages. Even plain old C is seeing a bit of a resurgence. We are going to continue living in a multi-language world; I expect C++ to remain an important language for a long while yet.<\/span><\/p>\n<p class=\"normal\"><strong><a name=\"_ifv1ca3jlid\"><\/a><\/strong><span lang=\"UZ-CYR\"><strong>Q2: In my interview with Bjarne Stroustrup last year, he spoke about the challenge of designing C++ in the face of contradictory demands of making the language simpler, whilst adding new functionality and without breaking people&#8217;s code. What are your thoughts on this?<\/strong><b><\/b><\/span><\/p>\n<p class=\"normal\"><span lang=\"UZ-CYR\"><strong>Greg Law:<\/strong> I totally agree. I think all engineering is about two things &#8211; minimising mistakes and making tradeoffs (i.e. judgements). Mistakes might be a miscalculation when designing a bridge so that it won\u2019t stand up or an off-by-one error in your program &#8211; those are clearly undesirable, we don\u2019t want those. A tradeoff might be between how expensive the bridge is to build and how long it will last, or how long the code takes to write and how fast it runs. <\/span><\/p>\n<p class=\"normal\"><span lang=\"UZ-CYR\">But tradeoffs are relevant when it comes to reducing errors too &#8211; what price should we pay to avoid errors in our programs? How much extra time are we prepared to spend writing or testing it to get the bugs out? How far do we go tracking down those flaky 1-in-a-thousand failures in the test-suite? Are we going to sacrifice runtime performance by writing it in a higher-level and less error-prone language? 
Alternatively, we could choose to make that super-clever optimisation about which it\u2019s hard to be confident it is correct today and even harder to be sure it will remain correct as the code around it changes; but is the runtime performance gain worth it, given the uncertainty that has been introduced? It\u2019s counterintuitive, but actually there is an optimal bugginess for any program &#8211; we inevitably trade off cost of implementation and performance against potential bugs. <\/span><\/p>\n<p class=\"normal\"><span lang=\"UZ-CYR\">It\u2019s probably fair to say however that most programs have more bugs than is optimal! I think it\u2019s also true that human nature means we tend to under-invest in dealing with the bugs early, particularly flaky tests. We always feel \u201cthis week is particularly busy, I\u2019ll park that and take a look next week when I\u2019ll have a bit more time\u201d; and of course next week turns out to be just as bad as this week.<\/span><\/p>\n<p class=\"normal\"><strong><span lang=\"UZ-CYR\">Q3: I understand Undo helps software engineering teams with debugging complex C\/C++ code bases. What is the situation with debugging C\/C++? What are you seeing on the ground?<\/span><\/strong><\/p>\n<p class=\"normal\"><span lang=\"UZ-CYR\"><strong>Greg Law:<\/strong> Like it or not, debugging is part of programming. There is a lot of research and cool technology about preventing bugs (programming language features or design decisions that make certain bugs impossible) or catching bugs very early (through static or dynamic analysis or better testing), and all this is of course laudable and good stuff. But I\u2019ve often been struck by how little attention is placed on making it easier to fix those bugs when they inevitably do happen. 
The situation is not unlike medicine in that prevention is better than cure, and the earlier the diagnosis the better; but no matter what we do, we will always need cure (unlike medicine we have the balance wrong the other way round &#8211; in medicine we spend way too much on cure vs prevention!).<\/span><\/p>\n<p class=\"normal\"><span lang=\"UZ-CYR\">It\u2019s all about tradeoffs again. All else being equal, we\u2019d ensure there are no bugs in the first place; but all else never is equal, and how high a price can we afford on prevention? And actually if you make diagnosis and fixing cheaper, that further reduces how much you need to spend on prevention.<\/span><\/p>\n<p class=\"normal\"><span lang=\"UZ-CYR\">The harsh reality is that close to none of the software out there today is truly understood by anyone. Humans just aren\u2019t very good at writing code, and economic pressure and other factors mean we add and fix tests until our fear of delivering late outweighs our fear of bugs. This is compounded as code ages; people move on from the project, bugs get fixed by adding a quick hack, further increasing the spaghettification. Like frogs in boiling water, we\u2019ve kind of become so used to it that we don\u2019t notice how awful it is any more!<\/span><\/p>\n<p class=\"normal\"><span lang=\"UZ-CYR\">People routinely just disable flaky failing tests because they can\u2019t root-cause them. Over a third of production failures can be traced back directly or indirectly to a test that was failing and was ignored.<\/span><\/p>\n<p class=\"normal\"><strong><span lang=\"UZ-CYR\">Q4: You have designed a time travel debugger for C\/C++. What is it for? <\/span><\/strong><\/p>\n<p class=\"normal\"><span lang=\"UZ-CYR\"><strong>Greg Law:<\/strong> Debugging is really answering one question: \u201cwhat happened?\u201d. I had certain expectations for what my code was going to do and all I know is that reality diverged from those expectations. 
Traditional debuggers are of limited help here &#8211; they don\u2019t tell you what happened, they just tell you what is happening right now. You hit a breakpoint, you can look around and see what state everything is in, and either it looks all good or you can see something wrong. If it\u2019s good, set another breakpoint and continue. If it\u2019s bad\u2026 well, now you want to know what happened, how it became bad. The odds of breaking just at the right point and stepping your code through the badness are pretty long. So you run again, and again, if you\u2019re lucky vaguely the same thing happens each time so you can home in on it; if not, well\u2026 you\u2019re in trouble. <\/span><\/p>\n<p class=\"normal\"><span lang=\"UZ-CYR\">With a time travel debugger like <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/undo.io\/solutions\/products\/udb\/?utm_source=odbms&amp;utm_medium=industrywatchinterview&amp;utm_campaign=udb');\"  href=\"https:\/\/undo.io\/solutions\/products\/udb\/?utm_source=odbms&amp;utm_medium=industrywatchinterview&amp;utm_campaign=udb\">UDB<\/a>, it\u2019s totally different &#8211; you see some piece of state is bad, you can just go backwards to find out why. Watchpoints (aka data breakpoints) are super powerful here &#8211; you can watch the bad piece of data and run backwards and have the debugger take you straight to the line of code that last modified it. We have customers who have been trying to fix something for literally years who with a couple of watch + reverse-continue operations had it nailed in an hour.<\/span><\/p>\n<p class=\"normal\"><span lang=\"UZ-CYR\">Time travel debuggers are really powerful for any bug where a decent amount of time passes between the bug itself and the symptoms (assertion failure, segmentation fault, bad results produced). 
They are particularly useful when there is any kind of non-determinism in the program &#8211; when the bug only occurs one time in a thousand and\/or every time you run the program it fails at a different point or in a different way. Most race conditions are examples of this; so are many memory or state corruption bugs. It can also help to diagnose complex memory leaks. Most leak detectors or static analysis help with the trivial issues (say you returned an error and forgot to add a free) but not the hard ones (for example when you have a reference counting bug and so the reference count never hits zero and the resources don\u2019t get cleaned up).<\/span><\/p>\n<p class=\"normal\"><span lang=\"UZ-CYR\">This <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/info.undo.io\/time-travel-debugging-whitepaper?utm_source=odbms&amp;utm_medium=industrywatchinterview&amp;utm_campaign=udb');\"  href=\"https:\/\/info.undo.io\/time-travel-debugging-whitepaper?utm_source=odbms&amp;utm_medium=industrywatchinterview&amp;utm_campaign=udb\">new white paper<\/a> provides more insight into what kind of bugs time travel debugging helps with *. It\u2019s not uncommon for software engineers to spend half their time debugging, so it\u2019s a must-read for anyone who wants to increase development team productivity.<\/span><\/p>\n<p class=\"normal\"><span lang=\"UZ-CYR\">By the way, Time Travel Debugging is also sometimes known as Replay Debugging or Reverse Debugging.<\/span><\/p>\n<p class=\"normal\"><strong><span lang=\"UZ-CYR\">Q5: Since you say it lets you see what happened, could it help with code exploration too?<\/span><\/strong><\/p>\n<p class=\"normal\"><span lang=\"UZ-CYR\"><strong>Greg Law:<\/strong> Funny you say that. This is a use case it wasn&#8217;t initially designed for, but many engineers are using it to explore unfamiliar codebases they didn&#8217;t write. 
They use it to observe program behaviour by navigating forwards and backwards in the program&#8217;s execution history, examine registers to find the address of an object etc. They say there&#8217;s a huge productivity benefit in being able to go backwards and forwards over the same section of code until you fully understand what it does. Especially as you\u2019re trying to understand a certain piece of code, and there are often millions of lines you don\u2019t care about right now, it\u2019s easy to get lost. When that happens you can go straight back to where you were and continue exploring.<\/span><\/p>\n<p class=\"normal\"><span lang=\"UZ-CYR\">Debugging is about answering \u201cwhat did the code do\u201d (ref. <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/undo.io\/resources\/set-breakpoint-past\/');\"  href=\"https:\/\/undo.io\/resources\/set-breakpoint-past\/\">cpp.chat podcast<\/a> on setting a breakpoint in the past **); but there are other activities that involve asking that same question. As I say, most code out there is not really understood by anyone. <\/span><span lang=\"UZ-CYR\">\u00a0<\/span><\/p>\n<p class=\"normal\"><strong><span lang=\"UZ-CYR\">Q6: What are your tips on how to diagnose and debug complex C++ programs?<\/span><\/strong><\/p>\n<p class=\"normal\"><span lang=\"UZ-CYR\"><strong>Greg Law:<\/strong> The hard part about debugging is figuring out the root cause. Usually, once you&#8217;ve identified what&#8217;s wrong, the fix is quite simple. We once had a bug that sunk literally months of engineering time to root cause, and the fix was a single character &#8211; that&#8217;s extreme but the effect it&#8217;s illustrating is very common.<\/span><\/p>\n<p class=\"normal\"><span lang=\"UZ-CYR\">Identifying the problem is an exercise in figuring out what the code really did as opposed to what you expected. Somewhere reality has diverged from your expectations &#8211; and that point of divergence is your bug. 
If you&#8217;re lucky, the effects manifest soon after the bug &#8211; maybe a NULL pointer is dereferenced and you needed a check for NULL right before it. But more often that pointer should never be NULL, the problem is earlier.<\/span><\/p>\n<p class=\"normal\"><span lang=\"UZ-CYR\">The answer to this is multi-pronged:<\/span><\/p>\n<p class=\"normal\"><span lang=\"UZ-CYR\"><em>1.<\/em> Liberal use of assertions to find problems as close to their root cause as possible. I reckon that 50% of assert fails are just bogus assertions, which is annoying but cheap to fix because the problem is at the very line of code that you notice. The other 50% will save you a lot of time.<\/span><\/p>\n<p class=\"normal\"><span lang=\"UZ-CYR\"><em>2.<\/em> If you see something not right, do not sweep it under the carpet. This is sometimes referred to as &#8216;smelling smoke&#8217;. Maybe it&#8217;s nothing, but you better go and look and see if there&#8217;s a fire. When you&#8217;re smelling smoke, you&#8217;re getting close to the root cause. If you ignore it, chances are that whatever the underlying cause of the weirdness is, it will come back and bite you in a way that gives you much less of a clue as to what&#8217;s wrong, and it&#8217;ll take you a lot longer to fix it. Likewise don&#8217;t paper over the cracks &#8211; if you don&#8217;t understand how that pointer can be NULL, don&#8217;t just put a check for NULL at the point the segv happened. <\/span><\/p>\n<p class=\"normal\"><span lang=\"UZ-CYR\">This most often manifests itself in people ignoring flaky test failures. 82% of software companies report having failing tests that were not investigated that went on to cause production failures *** (the other 18% are probably lying!). Working in this way requires discipline &#8211; following that smell of smoke or fixing that flaky test that you know isn&#8217;t your fault will be a distraction from your proximate goal. 
But when something is not right, or not understood, ignoring it now is going to cost you a lot of time in the long run.<\/span><\/p>\n<p class=\"normal\"><span lang=\"UZ-CYR\"><em>3.<\/em> Provide a way to know what your code is really doing. The trendy term is observability. This can be good old printf or some more fancy logging. An emerging technique is <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/undo.io\/solutions\/products\/software-failure-replay\/');\"  href=\"https:\/\/undo.io\/solutions\/products\/software-failure-replay\/\">Software Failure Replay<\/a>, which is related to Time-Travel Debugging. Here you record the program execution (a failed process), such that a debugger can be pointed at the execution history and you can go back to any line of code that executed and see full program state. This is like the ultimate observability. Discovering where reality diverged from your expectations becomes trivial.<\/span><\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<\/p>\n<p class=\"normal\"><span lang=\"UZ-CYR\"><strong><a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.odbms.org\/blog\/wp-content\/uploads\/2021\/04\/Greg-Law-Headshot-2018.jpg');\"  href=\"http:\/\/www.odbms.org\/blog\/wp-content\/uploads\/2021\/04\/Greg-Law-Headshot-2018.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone  wp-image-5358\" src=\"http:\/\/www.odbms.org\/blog\/wp-content\/uploads\/2021\/04\/Greg-Law-Headshot-2018-300x200.jpg\" alt=\"Greg Law Headshot 2018\" width=\"220\" height=\"146\" srcset=\"https:\/\/www.odbms.org\/blog\/wp-content\/uploads\/2021\/04\/Greg-Law-Headshot-2018-300x200.jpg 300w, https:\/\/www.odbms.org\/blog\/wp-content\/uploads\/2021\/04\/Greg-Law-Headshot-2018-1024x681.jpg 1024w\" sizes=\"(max-width: 220px) 100vw, 220px\" \/><\/a><\/strong><\/span><\/p>\n<p class=\"normal\"><span lang=\"UZ-CYR\"><strong>Dr Greg Law<\/strong> <em>is the founder of <a 
onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/undo.io\/?utm_source=odbms&amp;utm_medium=industrywatchinterview&amp;utm_campaign=udb');\"  href=\"https:\/\/undo.io\/?utm_source=odbms&amp;utm_medium=industrywatchinterview&amp;utm_campaign=udb\">Undo<\/a>, the leading Software Failure Replay platform provider. Greg has 20 years\u2019 experience in the software industry prior to founding Undo and has held development and management roles at companies, including Solarflare and the pioneering British computer firm Acorn. Greg holds a PhD from City University, London, and is a regular speaker at CppCon, ACCU, QCon, and DBTest. <\/em><\/span><\/p>\n<p class=\"normal\"><b><span lang=\"UZ-CYR\">Resources<\/span><\/b><\/p>\n<p class=\"normal\"><b><span lang=\"UZ-CYR\">* <\/span><\/b><span lang=\"UZ-CYR\"><a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/info.undo.io\/time-travel-debugging-whitepaper?utm_source=odbms&amp;utm_medium=industrywatchinterview&amp;utm_campaign=udb');\"  href=\"https:\/\/info.undo.io\/time-travel-debugging-whitepaper?utm_source=odbms&amp;utm_medium=industrywatchinterview&amp;utm_campaign=udb\">White Paper: Increase Development Productivity with Time Travel Debugging<\/a><\/span><\/p>\n<p class=\"normal\"><span lang=\"UZ-CYR\">** <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/undo.io\/resources\/set-breakpoint-past\/');\"  href=\"https:\/\/undo.io\/resources\/set-breakpoint-past\/\">cpp.chat podcast<\/a> &#8211; Setting a Breakpoint in the Past<\/span><\/p>\n<p class=\"normal\"><span lang=\"UZ-CYR\">*** <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/info.undo.io\/software-reliability-report-optimizing-supplier-and-customer-relationship');\"  href=\"https:\/\/info.undo.io\/software-reliability-report-optimizing-supplier-and-customer-relationship\">Freeform Dynamics Analyst Report<\/a> &#8211; Optimizing the software supplier and customer relationship<\/span><\/p>\n<p 
class=\"normal\"><strong>Related Posts<\/strong><\/p>\n<p class=\"normal\">&#8211; <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.odbms.org\/blog\/2020\/07\/thirty-years-c-interview-with-bjarne-stroustrup\/');\"  href=\"http:\/\/www.odbms.org\/blog\/2020\/07\/thirty-years-c-interview-with-bjarne-stroustrup\/\" target=\"_blank\"><strong>Thirty Years C++. Interview with Bjarne Stroustrup<\/strong>.\u00a0by Roberto V. Zicari.ODBMS Industry Watch. July 23, 2020<\/a><\/p>\n<p class=\"normal\"><strong>Follow us on Twitter: <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/twitter.com\/odbmsorg');\"  href=\"https:\/\/twitter.com\/odbmsorg\" target=\"_blank\">@odbmsorg<\/a><\/strong><\/p>\n<p class=\"normal\"><span lang=\"UZ-CYR\">\u00a0<\/span><\/p>\n<!-- AddThis Advanced Settings generic via filter on the_content --><!-- AddThis Share Buttons generic via filter on the_content -->","protected":false},"excerpt":{"rendered":"<p>&#8220;Like it or not, debugging is part of programming. There is a lot of research and cool technology about preventing bugs (programming language features or design decisions that make certain bugs impossible) or catching bugs very early (through static or dynamic analysis or better testing), and all this is of course laudable and good stuff. 
[&hellip;]<!-- AddThis Advanced Settings generic via filter on get_the_excerpt --><!-- AddThis Share Buttons generic via filter on get_the_excerpt --><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[74,85,1640,1641,914],"_links":{"self":[{"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/posts\/5357"}],"collection":[{"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/comments?post=5357"}],"version-history":[{"count":9,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/posts\/5357\/revisions"}],"predecessor-version":[{"id":5367,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/posts\/5357\/revisions\/5367"}],"wp:attachment":[{"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/media?parent=5357"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/categories?post=5357"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/tags?post=5357"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}