{"id":5796,"date":"2025-09-11T04:48:21","date_gmt":"2025-09-11T04:48:21","guid":{"rendered":"https:\/\/www.odbms.org\/blog\/?p=5796"},"modified":"2025-09-11T04:48:22","modified_gmt":"2025-09-11T04:48:22","slug":"on-debugging-with-ai-interview-with-mark-williamson","status":"publish","type":"post","link":"https:\/\/www.odbms.org\/blog\/2025\/09\/on-debugging-with-ai-interview-with-mark-williamson\/","title":{"rendered":"<strong>On Debugging with AI. Interview with Mark Williamson<\/strong>"},"content":{"rendered":"\n<blockquote class=\"wp-block-quote\">\n<p>&#8220;Quality of code (and everything that goes along with it) isn\u2019t talked about enough in AI conversations!&nbsp; There are some obvious facets to this &#8211; does the code do what you intended?&nbsp; Is it fast?&nbsp; Does it crash in the common cases?&#8221;<\/p>\n<\/blockquote>\n\n\n\n<p><strong>Q1. Can AI write <\/strong><strong><em>better <\/em><\/strong><strong>code than humans?<\/strong><\/p>\n\n\n\n<p><strong>Mark Williamson:<\/strong><strong> <\/strong>I don\u2019t think so, at least not today.&nbsp; For one thing, LLM-based AIs are trained on pre-existing code, which was written by fallible humans.&nbsp; So they at least have the potential to make all the mistakes we do.<\/p>\n\n\n\n<p>Despite that, any coding AI you pick will write better frontend JavaScript than I can &#8211; that\u2019s not my area of expertise.&nbsp; But I would back an experienced human (with or without AI <em>assistance<\/em>) to beat an unsupervised AI coder.<\/p>\n\n\n\n<p>Can they beat humans some day?&nbsp; I assume so &#8211; but they\u2019re not doing it today.&nbsp; And when you factor in other aspects of the Software Engineer\u2019s job (such as building the <em>right<\/em> thing) it\u2019s even more challenging.<\/p>\n\n\n\n<p><strong>Q2. 
How do you define &#8220;better&#8221; code?<\/strong><\/p>\n\n\n\n<p><strong>Mark Williamson: <\/strong>Quality of code (and everything that goes along with it) isn\u2019t talked about enough in AI conversations!&nbsp; There are some obvious facets to this &#8211; does the code do what you intended?&nbsp; Is it fast?&nbsp; Does it crash in the common cases?<\/p>\n\n\n\n<p>A lot of the work a human developer does to achieve this actually happens <em>after<\/em> the initial code is typed in.&nbsp; There\u2019s an iterative process of learning about and refining the solution &#8211; understanding what you\u2019ve made and improving on it.&nbsp; A lot of this is really debugging, in the broadest sense of the term: the code doesn\u2019t do what you expected and you need to understand and fix it.<\/p>\n\n\n\n<p>There\u2019s another step beyond that, though &#8211; whether the code <em>fits its intended purpose<\/em>.&nbsp; Getting that fit requires understanding the end user, thinking through the implementation tradeoffs and anticipating future developments.&nbsp; For now, I see AI as freeing up some time so we can create space for those human insights.<\/p>\n\n\n\n<p>Just focusing on how many lines of code we create is a pattern in the industry &#8211; we overvalue simply <em>generating code<\/em> versus all the other things that software engineers actually do.<\/p>\n\n\n\n<p><strong>Q3. Can AI write some types of code faster and with fewer simple errors?<\/strong><\/p>\n\n\n\n<p><strong>Mark Williamson: <\/strong>Yes!<\/p>\n\n\n\n<p>In my experience, I\u2019ve found AI to be extremely useful in three scenarios:<\/p>\n\n\n\n<ul>\n<li>Writing code that is <em>almost<\/em> boilerplate &#8211; where it\u2019s not a copy-paste problem but requires quite routine changes.<\/li>\n\n\n\n<li>Writing code that <em>would be<\/em> boilerplate for a different engineer &#8211; e.g. 
if I want to write JSON serialisation \/ deserialisation code in Python, it\u2019s easier for me to get an AI assistant to show me the shape of a good solution.<\/li>\n\n\n\n<li>Doing refactors that involve restructuring or applying a small fix in a lot of places &#8211; a coding agent can handle the detail while I concentrate on the overall shape.<\/li>\n<\/ul>\n\n\n\n<p>In all these cases, the benefit is in reducing the amount of thinking required to figure out my design approach.&nbsp; In Daniel Kahneman\u2019s book <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/en.wikipedia.org\/wiki\/Thinking,_Fast_and_Slow');\"  href=\"https:\/\/en.wikipedia.org\/wiki\/Thinking,_Fast_and_Slow\"><em>Thinking, Fast and Slow<\/em><\/a>, he describes two modes of thought: System 1 and System 2.&nbsp; System 1 is the stuff you can just answer automatically, whereas System 2 thought requires <em>effort<\/em>.<\/p>\n\n\n\n<p>System 2 is tiring &#8211; you probably can\u2019t manage more than a couple of hours of <em>really hard thinking<\/em> about code in a day.&nbsp; So it\u2019s precious.&nbsp; An agent lets me offload some work so I can focus that effort on exploring solutions to the <em>real<\/em> problem I\u2019m trying to solve.<\/p>\n\n\n\n<p><strong>Q4. Large Language Model (LLM)-based AI code assistants are powerful tools, but they have significant limitations that developers must understand. 
What are these limitations?<\/strong><\/p>\n\n\n\n<p><strong>Mark Williamson: <\/strong>The most obvious limitation is that they don\u2019t know everything.&nbsp; They often act as though they do, which is a trap.&nbsp; \u201c<a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/en.wikipedia.org\/wiki\/Hallucination_(artificial_intelligence)');\"  href=\"https:\/\/en.wikipedia.org\/wiki\/Hallucination_(artificial_intelligence)\">Hallucinations<\/a>\u201d are the best-known consequence of this &#8211; the LLM gives an answer that is confident but ultimately not based in fact.<\/p>\n\n\n\n<p>I like to say that modern AI\u2019s training teaches it <em>what a good answer looks like<\/em> &#8211; it has seen lots of examples of them, after all.&nbsp; So, from an AI\u2019s point of view, a good answer includes attributes like:<\/p>\n\n\n\n<ul>\n<li>Projecting confidence.<\/li>\n\n\n\n<li>Using the right terminology.<\/li>\n\n\n\n<li>Relating suggestions specifically to your question and context.<\/li>\n\n\n\n<li>Being right!<\/li>\n<\/ul>\n\n\n\n<p>If it can satisfy most of those, then it\u2019ll think it\u2019s done a good job.&nbsp; So when it\u2019s asked a question and lacks the facts, an AI will figure \u201c3 out of 4 isn\u2019t bad\u201d and give a dangerously convincing answer that\u2019s not based in reality.<\/p>\n\n\n\n<p>There are two important things we can do to reduce this risk:<\/p>\n\n\n\n<ul>\n<li>Supply high-quality context to the underlying model &#8211; the more <em>relevant<\/em> information available, the better.&nbsp; Supplying insufficient information invites the model to guess, and supplying irrelevant information encourages it to head off on the wrong track.<\/li>\n\n\n\n<li>Verify the model\u2019s answers against a ground truth &#8211; run your tests, have experts review your code, verify that the dynamic behaviour of the application matches what you expected.<\/li>\n<\/ul>\n\n\n\n<p>You want to focus the model\u2019s 
intelligence on solving the real problem (not on guessing), then know when it has actually solved it.<\/p>\n\n\n\n<p><strong>Q5. While LLM-based code assistants are incredibly powerful, there is critical information they lack that limits their effectiveness and makes human oversight essential. Why is this?<\/strong><\/p>\n\n\n\n<p><strong>What does it mean in practice?<\/strong><\/p>\n\n\n\n<p><strong>Mark Williamson: <\/strong>As a CTO, I\u2019ll divide my answer into two parts:<\/p>\n\n\n\n<ul>\n<li><em>As an engineer<\/em>, LLMs don\u2019t know enough about your code to solve all the problems you wish they could solve.&nbsp; They typically don\u2019t have good knowledge of the runtime behaviour of the system, which makes incorrect answers more likely.&nbsp; And they\u2019re not good at inferring design intent, making it harder to fix subtle bugs correctly.<\/li>\n\n\n\n<li><em>As a product manager<\/em>, LLMs lack insight into the true purpose of the software to be built.&nbsp; You cannot rely on them to design code around the needs of end users, long-term evolution and maintenance, and the business tradeoffs required.<\/li>\n<\/ul>\n\n\n\n<p><strong>Q6. LLMs are brilliant at static analysis\u2014interpreting the text of a codebase, logs, and other documents. But they are blind to dynamic behavior. This is the critical information they lack and cannot get. Why? Do you have a solution for this problem?<\/strong><\/p>\n\n\n\n<p><strong>Mark Williamson: <\/strong>Coding agents have a similar weakness to humans: they can\u2019t see what the program <em>really did<\/em> at runtime and it\u2019s hard to reason about <em>why<\/em> things happened.&nbsp; They can get some of this from logs (and LLMs are really good at reading logs!) 
but logging can only capture so much.<\/p>\n\n\n\n<p>There\u2019s a catch-22 here for the developer: <em>if you\u2019d been able to predict precisely what logging you\u2019d need to fix the bug you\u2019re investigating, then you\u2019d have known enough to avoid the bug in the first place.<\/em>&nbsp; There\u2019s no reason to think that\u2019s different for LLMs.<\/p>\n\n\n\n<p>Coding agents can follow the same tedious loop that humans do: adding more logging to a codebase and running stuff again (or perhaps asking a human to obtain more logs some other way).<\/p>\n\n\n\n<p>They can even do this toil more enthusiastically than any human! But the speed you gained from the agent may just disappear into a swamp of rebuilding, attempting to reproduce, finding which logging statements are still missing and then repeating the process.&nbsp; This kind of inefficiency will be bad news for any Engineering department hoping to improve productivity in return for its AI spend.<\/p>\n\n\n\n<p><strong>Q7. It seems that time travel debugging (TTD) directly addresses this limitation. 
Please tell us more.<\/strong><\/p>\n\n\n\n<p><strong>Mark Williamson: <\/strong>Time travel debugging captures a trace of <em>everything<\/em> a program does during execution.&nbsp; The resulting recordings effectively represent the whole state of memory at every machine instruction the program executed.<\/p>\n\n\n\n<p><em>Anything<\/em> you want to know about the program\u2019s runtime behaviour can then be queried from the recording, without needing to re-run or change the code.&nbsp; Rare bugs become fully reproducible and any state can be explored in detail.&nbsp; Moreover, the ability to rewind time makes it easy to explore <em>why<\/em> a bad state arose, not just <em>what<\/em> the state was.<\/p>\n\n\n\n<p>Of course, storing all of memory at every point in execution time would be extremely inefficient!&nbsp; A modern, scalable time travel debugger stores only information that flows into the program (initial memory state, IO from disk and network, system call results, non-deterministic CPU instructions, etc.).&nbsp; This makes it possible to efficiently recompute all other state on demand.&nbsp; Watch the talk \u201c<a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.youtube.com\/watch?v=NiGzdv84iDE');\"  href=\"https:\/\/www.youtube.com\/watch?v=NiGzdv84iDE\">How do Time Travel Debuggers Work?<\/a>\u201d for the full details on how a modern time travel debugger is built.<\/p>\n\n\n\n<p>For an AI, this capability is ideal.&nbsp; Remember that we need <em>high-quality<\/em> context to feed the model and a <em>ground truth<\/em> to make sure its answers are based in reality.&nbsp; With time travel debugging, a coding agent has access to a recording of the program\u2019s dynamic state and can drill down in detail on any suspicious behaviours &#8211; that gives us high-quality context.&nbsp; The ground truth comes from the deterministic nature of the recording and also makes it possible to verify the AI\u2019s 
findings.<\/p>\n\n\n\n<p>These properties mean that AI coding agents get smarter when given access to a time travel debugging system.<\/p>\n\n\n\n<p><strong>Q8. You have released an add-on extension called <\/strong><strong><em>explain<\/em><\/strong><strong>, which integrates with your UDB debugger (part of the Undo Suite). What is it and what is it useful for?<\/strong><\/p>\n\n\n\n<p><strong>Mark Williamson: <\/strong>Good question. Let me first explain what Undo is, to set the context. It\u2019s our time travel debugging technology (which runs on Linux x86 and ARM64) and is mostly used to debug complex enterprise software that makes use of advanced multithreading techniques, shared memory, direct device accesses, etc.<\/p>\n\n\n\n<p>The Undo Suite captures precise recordings of unmodified programs using just-in-time binary instrumentation. The two main components of the Undo Suite are:<\/p>\n\n\n\n<ul>\n<li>LiveRecorder &#8211; which captures program executions into portable recording files.<\/li>\n\n\n\n<li>UDB &#8211; which provides a GDB-compatible interface to debug both live processes and recordings (and also integrates into IDEs such as VS Code).<\/li>\n<\/ul>\n\n\n\n<p>The explain extension is our first step in integrating AI with a time travel debugging system.&nbsp; It provides two pieces of functionality:<\/p>\n\n\n\n<ul>\n<li>An MCP (Model Context Protocol) server &#8211; this exports the functionality of our UDB debugger for use by an AI agent, allowing it to integrate into existing AI workflows including agentic IDEs (such as <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/code.visualstudio.com\/docs\/copilot\/overview');\"  href=\"https:\/\/code.visualstudio.com\/docs\/copilot\/overview\">VS Code with Copilot<\/a>, <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/cursor.com\/');\"  href=\"https:\/\/cursor.com\/\">Cursor<\/a> or <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/windsurf.com\/');\"  href=\"https:\/\/windsurf.com\/\">Windsurf<\/a>).<\/li>\n\n\n\n<li>The explain command itself, which provides additional tight integration with terminal-based coding agents (such as <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.anthropic.com\/claude-code');\"  href=\"https:\/\/www.anthropic.com\/claude-code\">Claude Code<\/a>, <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/ampcode.com\/');\"  href=\"https:\/\/ampcode.com\/\">Amp<\/a> and <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/github.com\/openai\/codex');\"  href=\"https:\/\/github.com\/openai\/codex\">Codex CLI<\/a>) where available.<\/li>\n<\/ul>\n\n\n\n<p>In either case, we\u2019re providing the power of time travel debugging to an AI, so that it can reason about the dynamic behaviour of a program.&nbsp; As the name suggests, this extension has a particular focus on <em>explaining<\/em> program behaviour &#8211; how a given state arose, why the program crashed, etc.<\/p>\n\n\n\n<p>We provide a carefully designed set of tools to the agent so that it can answer these questions effectively. It\u2019s important that the design of the MCP tools guides the actions to be taken by the LLM; otherwise it can easily get overwhelmed by the complexity.<\/p>\n\n\n\n<p>In an agentic IDE, you can connect to the MCP server in a running UDB session &#8211; then ask the agent questions (use the \/explain prompt exported by the server for best results).&nbsp; In UDB itself, you can just type the explain command and we\u2019ll automatically invoke your preferred terminal coding agent and put it to work on your problem.<\/p>\n\n\n\n<p><strong>Q9.&nbsp; Can you show us an example of how time traveling with an AI code assistant works in practice?<\/strong><\/p>\n\n\n\n<p><strong>Mark Williamson: <\/strong>Sure! 
I\u2019d recommend watching these two demo videos:<\/p>\n\n\n\n<ol>\n<li>The cache_calculate demo video on the <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/undo.io\/products\/undo-ai\/');\"  href=\"https:\/\/undo.io\/products\/undo-ai\/\">Undo website<\/a>, which showcases how to use explain to get AI to tell you what has gone wrong in the program.<\/li>\n\n\n\n<li>This <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/youtu.be\/dmH7owoctC4');\"  href=\"https:\/\/youtu.be\/dmH7owoctC4\">YouTube video<\/a> where I use AI + time travel debugging to explore the codebase of the legendary Doom game and understand exactly what the program did when I played it.<\/li>\n<\/ol>\n\n\n\n<p>We have additional demos, showcasing more advanced functionality, which aren\u2019t yet public &#8211; you can book a personalised demo from <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/undo.io\/products\/undo-ai\/');\"  href=\"https:\/\/undo.io\/products\/undo-ai\/\">https:\/\/undo.io\/products\/undo-ai\/<\/a> to see the AI debugging functionality we\u2019re currently building.<\/p>\n\n\n\n<p><strong>Qx. 
Anything else you wish to add?<\/strong><\/p>\n\n\n\n<p><strong>Mark Williamson: <\/strong>The core message here is that AI-augmented Software Engineers still need the right tools to do their jobs well.&nbsp; Our goal is to make AI coding agents more effective at understanding and fixing complex code, improving the return on investment Engineering departments get on their AI stack.<\/p>\n\n\n\n<p>The next big step for us will be designing a UX to be used <em>by AIs<\/em> instead of by humans.&nbsp; Providing time travel debugging to a coding agent is already useful, but to get the best performance we need to work with what LLMs are good at.&nbsp; In other words:<\/p>\n\n\n\n<ul>\n<li>A query-like interface: rather than the statefulness of a debugger, LLMs are happiest when they can ask Big Questions and get a report in answer.&nbsp; Our engine lets us extract detailed information very quickly from a recording so that an AI can start with an overview, then drill down.<\/li>\n\n\n\n<li>Specialised, composable tools: a debugger provides quite general tools (stepping, breakpoints, etc.) for a human developer to apply to any problem.&nbsp; Coding agents can use these, but we believe LLM intelligence is best spent on solving the <em>core problem<\/em> well, rather than diluting it on planning complex tool use.&nbsp; A specialised set of analyses will allow the LLM to focus on what it\u2019s good at &#8211; finding patterns and proposing fixes.<\/li>\n<\/ul>\n\n\n\n<p>On top of these tools and the data contained within our recordings, we are building <strong>Undo AI<\/strong> &#8211; a product to enable agentic debugging at enterprise scale.&nbsp; We\u2019re currently taking applications for our <strong>pilot program<\/strong> &#8211; please get in touch to find out more at <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/undo.io\/');\"  href=\"http:\/\/undo.io\/\">undo.io<\/a>.<\/p>\n\n\n\n<p>\u2026\u2026\u2026\u2026\u2026\u2026\u2026\u2026\u2026&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.odbms.org\/blog\/wp-content\/uploads\/2025\/09\/Mark-Williamson.png');\"  href=\"https:\/\/www.odbms.org\/blog\/wp-content\/uploads\/2025\/09\/Mark-Williamson.png\"><img decoding=\"async\" loading=\"lazy\" width=\"235\" height=\"230\" src=\"https:\/\/www.odbms.org\/blog\/wp-content\/uploads\/2025\/09\/Mark-Williamson.png\" alt=\"\" class=\"wp-image-5800\"\/><\/a><\/figure>\n\n\n\n<p><strong>Mark Williamson, <\/strong>Chief Technical Officer, Undo<\/p>\n\n\n\n<p><em>After a few years as our Chief Software Architect, Mark is now acting as Undo\u2019s CTO. Mark loves developing new technology and getting it to people who can benefit. He is a specialist in kernel-level, low-level Linux and embedded development, with wide experience in cross-disciplinary engineering.<\/em><\/p>\n\n\n\n<p><em>In his previous role, his remit was to align the product\u2019s architecture with the company\u2019s needs, provide technical and design leadership, and lead internal quality work. 
One of his proudest achievements is his quest towards an all-green test suite!<\/em><\/p>\n\n\n\n<p><em>As Undo\u2019s CTO, Mark\u2019s primary responsibility is to scale product-market fit and ensure we take our products in the right direction to meet the needs of a broader spectrum of customers.<\/em><\/p>\n\n\n\n<p><em>Mark is also an author on <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/medium.com\/@mark_undoio');\"  href=\"https:\/\/medium.com\/@mark_undoio\">Medium<\/a>, a <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.youtube.com\/watch?v=to8KkFQn7jE&amp;t=2722s');\"  href=\"https:\/\/www.youtube.com\/watch?v=to8KkFQn7jE&amp;t=2722s\">conference speaker<\/a>, and a new homeowner enjoying the delights of emergency home repairs!<\/em><\/p>\n\n\n\n<p>\u2026\u2026\u2026\u2026\u2026\u2026\u2026\u2026\u2026..<\/p>\n\n\n\n<p><a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/x.com\/odbmsorg');\"  href=\"https:\/\/x.com\/odbmsorg\"><strong>Follow us on X<\/strong><\/a><\/p>\n\n\n\n<p><a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.linkedin.com\/in\/roberto-v-zicari-087863\/');\"  href=\"https:\/\/www.linkedin.com\/in\/roberto-v-zicari-087863\/\"><strong>Follow us on LinkedIn<\/strong><\/a><\/p>\n<!-- AddThis Advanced Settings generic via filter on the_content --><!-- AddThis Share Buttons generic via filter on the_content -->","protected":false},"excerpt":{"rendered":"<p>&#8220;Quality of code (and everything that goes along with it) isn\u2019t talked about enough in AI conversations!&nbsp; There are some obvious facets to this &#8211; does the code do what you intended?&nbsp; Is it fast?&nbsp; Does it crash in the common cases?&#8221; Q1. Can AI write better code than humans? 
Mark Williamson: I don\u2019t think [&hellip;]<!-- AddThis Advanced Settings generic via filter on get_the_excerpt --><!-- AddThis Share Buttons generic via filter on get_the_excerpt --><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[990,1525,1640,327,1756,1813,1812,1432],"_links":{"self":[{"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/posts\/5796"}],"collection":[{"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/comments?post=5796"}],"version-history":[{"count":7,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/posts\/5796\/revisions"}],"predecessor-version":[{"id":5804,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/posts\/5796\/revisions\/5804"}],"wp:attachment":[{"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/media?parent=5796"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/categories?post=5796"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/tags?post=5796"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}