On the Evolution of Python. Q&A with Diego Russo 

Q1. How is Python evolving its infrastructure and tooling to support the explosive growth of AI and machine learning workloads, particularly around vector operations and large-scale data processing?

For AI and ML workloads I see Python’s role in two main areas: the interpreter itself and the basic ecosystem infrastructure around it.

The goal is not to turn CPython into a tensor engine, but to make the runtime a fast, stable and predictable orchestrator for libraries like NumPy, PyTorch and JAX. The work done in recent releases has already shifted the baseline here: Python 3.11 delivered roughly a 25% average speed-up over 3.10 on the standard benchmarks, with many workloads doing better, and 3.12, 3.13 and now 3.14 have continued to reduce overhead. For many AI codebases, simply moving from a 3.9 or 3.10 runtime to 3.14 gives a very noticeable performance improvement without changing the application code, as long as deprecations and removed modules are handled during the upgrade.

On the concurrency side, PEP 703 introduced a free-threaded build of CPython that removes the Global Interpreter Lock (GIL), and this first shipped as an experimental option in Python 3.13. With PEP 779 in Python 3.14, free-threaded mode is now an officially supported configuration. This change is aimed exactly at the class of AI workloads that are CPU bound in Python itself rather than in the underlying native kernels. For people who want to track ecosystem support around this, there is ongoing community work documented at py-free-threading.github.io, focused on scientific computing and ML libraries.
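
If you want to check which runtime you are actually running on, a small probe like the one below can be dropped into a diagnostics script. It is only a sketch: it uses sys._is_gil_enabled() (available from 3.13) and the Py_GIL_DISABLED build flag, and falls back to assuming a classic GIL on older versions.

```python
import sys
import sysconfig

# Was this interpreter built with free-threading support (PEP 703)?
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# Even on a free-threaded build the GIL can be re-enabled at runtime
# (e.g. PYTHON_GIL=1, or importing an incompatible extension), so check both.
gil_enabled = sys._is_gil_enabled() if hasattr(sys, "_is_gil_enabled") else True

print(f"free-threaded build: {free_threaded_build}, GIL currently enabled: {gil_enabled}")
```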

In terms of infrastructure, the PSF maintains the essential services that make large AI stacks usable at all (e.g. PyPI and its hosting). Other activities in this area include platform standards for binary wheels and packaging improvements such as treating system Pythons as “externally managed” and moving installers towards virtual environments by default. Those pieces matter a lot once you start deploying multi-gigabyte AI and data toolchains across developer laptops, clusters and cloud services. Alongside that, uv is a newer, extremely fast, Rust-based package and project manager that combines dependency resolution, virtual environment management and multi-Python support in one place, which is very helpful when you are iterating quickly on complex AI environments.
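
As a rough illustration of the “externally managed” mechanism (PEP 668): the marker is just a file in the interpreter’s standard library directory, which a deployment script can look for before attempting any global pip install. This is a minimal sketch of the check, not how installers implement the full specification.

```python
import sysconfig
from pathlib import Path

# PEP 668: installers treat an interpreter as "externally managed" when this
# marker file exists in its standard library directory.
marker = Path(sysconfig.get_path("stdlib")) / "EXTERNALLY-MANAGED"

if marker.exists():
    print("System Python is externally managed; install into a virtual environment instead.")
else:
    print("Global installs are not blocked for this interpreter.")
```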

Q2. What are the most significant performance improvements or optimizations in recent Python releases that benefit data-intensive applications, and what’s on the roadmap for future versions?

For intensive workloads, the biggest win is simply to move off 3.9/3.10 and onto the 3.11–3.14 line. Python 3.11’s specialising adaptive interpreter (PEP 659) gives roughly a 1.25× speedup over 3.10 on the standard benchmark suite, with individual benchmarks seeing 10–60% gains, and 3.12 and 3.13 continue to remove overhead in the interpreter and runtime.
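
You can watch the specialising interpreter at work on a 3.11+ runtime: after a function has executed enough times to warm up, dis can display the adaptive, specialised instructions it has rewritten the bytecode into. This is only an illustrative probe, and the exact instruction names vary between versions.

```python
import dis

def dot(xs, ys):
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y  # this BINARY_OP specialises for floats after warm-up
    return total

# Run enough iterations for PEP 659's quickening/specialisation to kick in.
for _ in range(1000):
    dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])

# adaptive=True shows specialised forms such as BINARY_OP_MULTIPLY_FLOAT.
dis.dis(dot, adaptive=True)
```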

Python 3.13 adds two important new execution modes: an experimental JIT compiler (PEP 744) and an experimental free-threaded build, which removes the GIL so CPU bound Python can actually scale across cores if your extensions are thread safe. In 3.14, free-threaded Python becomes an officially supported configuration, with the overhead for single-threaded applications kept in the low single digits on standard benchmarks.
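
A minimal sketch of what this changes in practice: fanning a CPU bound pure-Python function out over a ThreadPoolExecutor uses multiple cores on a free-threaded 3.13/3.14 interpreter, while on a standard GIL build the same code effectively runs the threads one at a time. The workload below is purely illustrative.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def count_primes(limit: int) -> int:
    # Deliberately CPU bound pure-Python work.
    return sum(
        1 for n in range(2, limit)
        if all(n % d for d in range(2, int(n ** 0.5) + 1))
    )

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(count_primes, [50_000] * 4))
elapsed = time.perf_counter() - start

# Scales across cores on a free-threaded build; serialised by the GIL otherwise.
print(results, f"{elapsed:.2f}s")
```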

Python 3.14 also introduces a new “tail calling” interpreter variant that relies on compiler support for efficient tail calls. With current LLVM/Clang, that gives a further single-digit to low double-digit percentage speedup on pyperformance; GCC and MSVC already have the needed support implemented, and it will arrive in upcoming compiler releases, so this path will open up more widely as toolchains catch up.

On the roadmap, there are a few clear priorities that matter for data workloads:
    •    Ensure that free-threading has stable performance, predictable behaviour, and broad extension support, so that it can become the standard
    •    Make the JIT faster and compatible with free-threading, so users can have both multi-core scaling and better single core throughput
    •    Introduce Rust in selected parts of CPython to improve memory and thread safety in this more concurrent world (https://github.com/Rust-for-CPython)
    •    Add explicit lazy imports (PEP 810) so large and dependency-heavy applications can cut start-up time and initial memory by deferring big imports until first use; a rough approximation of the idea with today’s importlib is sketched below
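
PEP 810 would make deferred imports a first-class language feature. Until it lands, a rough approximation of the same idea already exists via importlib’s LazyLoader, which postpones executing a module until its first attribute access; the helper below follows the standard library documentation’s recipe, and the PEP’s final syntax and semantics may well differ.

```python
import importlib.util
import sys

def lazy_import(name: str):
    """Return a module whose actual import work is deferred until first use."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # defers real execution until attribute access
    return module

# The (potentially heavy) module is only fully imported when first touched.
json = lazy_import("json")
print(json.dumps({"lazy": True}))
```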

Q3. With NumPy, Pandas, and PyTorch dominating the data science landscape, how does the Python core team coordinate with these major library maintainers to ensure compatibility and performance as the language evolves?

There is no single central authority for NumPy, Pandas, PyTorch and similar projects, but there is a lot of deliberate coordination where it actually matters.

The first layer is people. Many maintainers of the big data libraries are deeply involved in Python’s design process: they participate in PEP discussions, attend Python conferences, and are active on the Python forums. When changes touch the type system, the C API, or concurrency, you can usually assume someone from these projects has been in the room arguing for or against the proposal. If a change would break a major library or make maintainers’ lives significantly harder, that feedback tends to show up very early and can block or reshape the design before it lands.

The second layer is the public interfaces that libraries depend on. The C API Working Group created by PEP 731 exists specifically to oversee and modernise the CPython C API with extension authors in mind, rather than changing it ad hoc. Projects like HPy are exploring a cleaner, more portable extension API that works well with multiple interpreters and future free-threaded builds, which is very relevant to large numeric libraries implemented in C or C++. On the typing side, the separate typing specification and Python Typing Council (PEP 729) mean that new type features are designed with heavy library users directly involved, not just core developers.

On top of that, as defined by PEP 602, every CPython release goes through a period where release candidates are widely tested against upstream packages; issues that show up there are treated as release‑blocking bugs on the Python side or prompt quick fixes on the library side.

Q4. What are the current best practices and common pitfalls you see when enterprises deploy Python for production database systems and real-time data processing at scale?

For production databases and real-time data, Python works well when you use it as the coordinator, not the worker. Let the database or engine do the heavy lifting: push joins, filters and aggregations into SQL or into a columnar engine such as DuckDB or Arrow-backed DataFrames, rather than looping over rows in Python. Use async I/O for high-concurrency services, with proper async database drivers and frameworks, instead of a thread-per-request model that just burns context switches without improving throughput.
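
As a small, hedged example of pushing work into the engine rather than looping in Python: with the duckdb package installed, a filter and aggregation over a parquet file can run entirely inside DuckDB, so only the small result set ever becomes Python objects. The file name and columns here are purely illustrative.

```python
import duckdb

# The filter, aggregation and sort all run inside DuckDB;
# 'events.parquet' and its columns are hypothetical, for illustration only.
result = duckdb.sql("""
    SELECT user_id,
           count(*)        AS n_events,
           avg(latency_ms) AS avg_latency
    FROM 'events.parquet'
    WHERE event_date >= DATE '2025-01-01'
    GROUP BY user_id
    ORDER BY n_events DESC
    LIMIT 10
""").df()  # only the aggregated top-10 rows cross into pandas

print(result)
```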

For CPU bound work, assume that naïve multi-threaded Python will not scale under the classic GIL. Either move hot paths into native code that releases the GIL, use processes for parallelism, or, if you are on a modern release and can audit your stack, evaluate the free-threaded build as a separate engineering project; a small sketch of picking between the two executor models follows below. Around that, treat environments as disposable and reproducible: virtual environments or containers everywhere, pinned dependencies, and no manual pip install into system Pythons, which modern operating systems and pip now actively discourage.
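
That decision can be made explicit in code: the same map over a CPU bound function can pick its executor depending on whether the GIL is actually disabled, defaulting to processes on a classic build. This assumes 3.13+ for sys._is_gil_enabled; on older versions it simply falls back to processes.

```python
import sys
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def transform(chunk: list[int]) -> int:
    # Stand-in for a CPU bound transformation over one chunk of data.
    return sum(x * x for x in chunk)

def parallel_map(func, chunks):
    # Threads only pay off for CPU bound Python when the GIL is disabled;
    # under the classic GIL, processes remain the safe default.
    gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()
    executor_cls = ProcessPoolExecutor if gil_enabled else ThreadPoolExecutor
    with executor_cls(max_workers=4) as pool:
        return list(pool.map(func, chunks))

if __name__ == "__main__":
    chunks = [list(range(i, i + 100_000)) for i in range(0, 400_000, 100_000)]
    print(parallel_map(transform, chunks))
```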

Q5. Looking ahead, what are Python’s key priorities for supporting the next generation of data applications, particularly around concurrency, type safety, and integration with modern data platforms?

Looking ahead, I see a few clear priorities. On concurrency, the focus is to make free-threaded Python the stable, reliable and default choice for workloads that need it. The community now has a supported no-GIL build, and the next steps are to keep overhead low, harden the implementation and bring more C extensions into a thread-safe world. On top of that sits the JIT: the goal is a genuinely performant JIT that is a net win on real code and works cleanly with free-threading, so you get better single-core and multi-core performance in the same runtime.

Independent of the JIT, there is ongoing work to improve the interpreter and standard library themselves. Despite the improved performance of the latest releases, there is still room to optimise bytecode execution, memory management and hot modules. Rust is also being considered for some standard library modules, primarily to gain memory and thread safety as the community pushes harder on parallelism. If you talk about performance, you also need observability, so a better stack of debugging and profiling tools for live, long-running Python processes is very much on the roadmap.

Beyond CPython itself, packaging is a big part of the future for data applications. In particular, the wheelnext.dev initiative is looking at how to make the experience of installing and updating scientific and ML stacks much less painful, for example with better handling of hardware-specific wheels and complex native dependencies. That kind of work at the ecosystem level matters just as much as another few percent in the interpreter if you want Python to stay a practical choice for serious data systems.

……………………………………………

Diego Russo is a CPython core developer and Principal Software Engineer in Arm’s Runtimes team, based in Cambridge, UK. A Python user since 2006 and a contributor since 2023, he focuses on making CPython and its ecosystem work reliably on Arm platforms, improving performance and CI, and collaborating with open source projects that target Arm. He is also a EuroPython organiser and leads the Arm Python Guild, an internal community of more than 1,400 Python developers.

Disclaimer: The views and opinions expressed in these answers are my own and do not represent the official position of my employer or of the Python core development team.
