Inside MySQL: 20 Years of Source Code, Open Source Contributions, and What Comes Next. A Conversation with Marcelo Altmann
Q1. Marcelo, congratulations on being named MySQL Rockstar 2025! You’ve spent nearly 20 years working with MySQL — from the LAMP stack era when InnoDB was “the new kid” to today’s cloud-native, multi-terabyte production environments. How has the practice of MySQL database administration fundamentally changed over this period, and what core principles have remained surprisingly constant? Given MySQL’s 30th anniversary, where do you see the biggest opportunities for MySQL in the next decade?
Thanks. This title is recognition of work that started back in 2006 and continues today. It means a lot to me, and I’m really proud of it.
Back in the day, things we now take for granted were still maturing in the MySQL ecosystem. InnoDB, for example, was a plugin you had to install separately to get the latest features. We were in the transition from 5.0 to 5.1 — row-based replication had just been introduced, but most deployments wouldn’t adopt it for years. As a result, replication could sometimes drift, requiring manual intervention and careful reconciliation. Tools like pt-table-checksum and pt-table-sync were essential in every DBA’s toolkit precisely because of that.
What hasn’t changed is the core focus: making deployments secure, stable, and predictable. That principle holds on both sides — the people building the product and the people running it in production.
We’re generating and consuming more data than ever — data is the gold of this century. And MySQL sits right where that data is born. What excites me most is the binlog. It was designed for replication, but it’s become so much more than that. At Readyset, we consume it to keep a caching layer in sync with the source database, no manual invalidation needed. People are doing the same to feed search indexes, analytics, ML pipelines. Now with the AI wave, every application needs somewhere to store embeddings, manage retrieval, keep state — and MySQL is already deployed everywhere. Nobody wants to adopt a whole new database just to experiment with AI. So to me, the biggest opportunity for the next decade is simple: let people do more with the data where it already lives.
Q2. As the lead developer of Percona XtraBackup, you achieved some remarkable performance improvements — 5x faster streaming with FIFO threads, 50% smaller backups with ZSTD compression, and 530% faster multi-threaded streaming via Named Pipes.
Can you walk us through one of these optimizations from a technical perspective? What were the architectural limitations you had to overcome, and what lessons did you learn about performance optimization in MySQL tooling that would apply to other database systems or backup solutions?
Let’s talk about the FIFO feature. When streaming was introduced in XtraBackup around 2019, the use case was straightforward — push backups to the cloud over a WAN link. But then object stores like S3 evolved, and organizations started running their own object storage locally over the LAN. That shifted the equation. Even though XtraBackup already supported parallel backup threads, all that data still funneled through a single STDOUT channel to xbstream/xbcloud. If I remember correctly, serialization over STDOUT via a mutex capped throughput at around 1.8 Gbps — perfectly fine for a WAN link to your cloud provider, but it became a limiting factor when operating on a 10 or 40 Gbps local network. With FIFO, we replaced that single pipe with multiple named pipes, allowing XtraBackup to fully saturate the local bandwidth. The result was about a 5x reduction in backup time for the same 1TB dataset.
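The single-sink bottleneck described above can be sketched in a few lines of std-only Rust. This is a hedged analogy, not XtraBackup’s actual code: a `Mutex`-guarded `Vec` stands in for the serialized STDOUT pipe, and per-thread `Vec`s stand in for the independent named pipes; the function names are illustrative.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Old design: every copy thread funnels its chunks through one
// mutex-protected sink, so writes serialize at the lock.
fn single_sink(chunks_per_thread: usize, threads: usize) -> usize {
    let sink = Arc::new(Mutex::new(Vec::new()));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let sink = Arc::clone(&sink);
            thread::spawn(move || {
                for _ in 0..chunks_per_thread {
                    sink.lock().unwrap().push([0u8; 16]); // contention point
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let n = sink.lock().unwrap().len();
    n
}

// FIFO design: each thread writes to its own sink ("pipe"), so there is
// no shared lock and the threads scale with available bandwidth.
fn per_thread_sinks(chunks_per_thread: usize, threads: usize) -> usize {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            thread::spawn(move || {
                let mut sink = Vec::new(); // one pipe per thread
                for _ in 0..chunks_per_thread {
                    sink.push([0u8; 16]);
                }
                sink.len()
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

Both variants move the same amount of data; the difference is only where the serialization happens, which is exactly why the single STDOUT channel capped throughput regardless of how many backup threads ran.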
Q3. You’re one of the community contributors whose code has been accepted into MySQL itself. What does it actually take to contribute successfully to MySQL’s codebase — not just technically, but in terms of understanding the project’s culture, standards, and review process? For developers who want to contribute to MySQL or other major open source database projects, what advice would you give about navigating the path from “I found a bug” to “my patch is upstream”?
Contributing to MySQL is often misunderstood — people think you need to land some massive feature to make a difference. My first accepted patch was just fixing a URL in a CMake dependency. That’s it. But that tiny patch taught me how the process actually works: the coding style, how to run tests, how to submit, what reviewers care about. After that I grew more confident and submitted more patches. You build trust with maintainers one patch at a time.
That said, the bar is high, and it should be. Code that “works” isn’t enough — you have to think about edge cases, performance, backward compatibility. Especially in areas like InnoDB, where a subtle mistake can corrupt data. Most of my contributions came from real problems I hit myself.
One thing worth mentioning is that the review process is still evolving: you submit a patch, Oracle integrates it, and if anything needs fixing they handle it internally. That keeps things streamlined, with little back-and-forth, but it also leaves room to improve the contributor learning experience. There are ongoing efforts, in collaboration with Oracle, to make the review process more transparent, which should improve the feedback loop and encourage more people to get involved.
My advice for anyone looking to contribute: just start. A good bug report is already a contribution. Keep your patches small and focused, write tests, and be patient. The MySQL codebase can feel intimidating, but remember — every contributor you admire started with something small. The community needs fresh eyes and new perspectives, so don’t wait until you feel “ready enough.” Jump in.
Q4. At Readyset, you’re working on a fundamentally different approach to database caching — tailing the MySQL binary log to keep cached query results consistent in real-time, without TTLs or application changes. This requires deep understanding of MySQL’s replication internals. Can you explain the core technical challenge of building cache invalidation based on binlog events? What MySQL replication behaviors or edge cases have surprised you most, and how does this approach compare to traditional query caching or application-level caching strategies?
This is a complex problem on both ends — parsing the binlog and determining what to invalidate.
On the binlog side, the wire protocol is hyper-optimized for performance and size, so you really have to understand every bit — and I mean that literally. There’s a lot of low-level stuff you need to get right: bitmaps for column signedness, bitmaps for which fields are NULL, column presence masks.
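Those bitmaps all share one packing convention. As a hedged sketch (the function name is illustrative, not from any real driver), row-event bitmaps are packed LSB-first: column 0 maps to bit 0 of byte 0, column 8 to bit 0 of byte 1, and so on — whether the bitmap encodes NULL-ness, column presence, or signedness:

```rust
// Test whether the bit for column `col` is set in a packed row-event
// bitmap (e.g. the null bitmap or the columns-present bitmap).
// Bits are packed LSB-first: column 0 is bit 0 of byte 0.
fn bit_is_set(bitmap: &[u8], col: usize) -> bool {
    (bitmap[col / 8] >> (col % 8)) & 1 == 1
}
```

Getting the bit order backwards here is one of the classic mistakes: every row decodes without error, but NULLs land on the wrong columns.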
Collations are another layer — CHAR padding math depends on whether you’re dealing with utf8mb4 or latin1, binary collations pad with zeros instead of spaces, and the length encoding in metadata bytes gets interesting with multibyte charsets. Integer types have their own surprises too — a MEDIUMINT is a 24-bit integer, which isn’t a native type in most languages.
Rust, for instance, has i8, i16, i32, i64 — no i24. So you read 3 bytes off the wire and need to sign-extend it correctly into a larger type, otherwise signed values come out wrong.
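The sign-extension step looks like this in Rust. This is a minimal sketch (the function name is illustrative): read the three little-endian wire bytes into the low 24 bits of a `u32`, shift the value up so bit 23 becomes the sign bit of an `i32`, then arithmetic-shift back down:

```rust
// Decode a signed MEDIUMINT (24-bit, little-endian) from three wire
// bytes into an i32, preserving the sign.
fn read_i24_le(b: [u8; 3]) -> i32 {
    let raw = (b[0] as u32) | ((b[1] as u32) << 8) | ((b[2] as u32) << 16);
    // Shift the 24-bit value into the top of the word, then use an
    // arithmetic right shift to extend bit 23 through bits 24..31.
    ((raw << 8) as i32) >> 8
}
```

Skip the sign extension and `0xFFFFFF` decodes as 16777215 instead of -1, which is exactly the kind of silent corruption that only shows up with negative values in production data.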
On the invalidation side, it’s one of those classic, well-known challenges in computer science. A primary key lookup is the easy case — row changes, you know which cached result is affected. But a single row change in one table might invalidate a multi-table JOIN where the connection is indirect. Window functions are worse — an insert can shift partition boundaries and reorder rankings across rows that weren’t even touched. Every layer of complexity (aggregates, subqueries, correlated conditions) makes the tracing harder.
The comparison to traditional approaches is what makes this interesting, though. Application-level caching with TTLs is simpler but you’re always trading consistency — either you accept stale reads or set TTLs so aggressive they negate the benefit. MySQL’s built-in query cache invalidated at the table level — any write blew away every cached query touching that table. We invalidate at the row level through a dataflow graph, which is much more precise but means you need to carefully handle all these edge cases correctly.
Q5. Your “Replication Internals: Decoding the MySQL Binary Log” series goes byte-by-byte through event structures — the kind of deep technical content that only comes from years inside MySQL source code. What motivated you to write this level of technical documentation, and what have you discovered about MySQL’s replication format that is often overlooked or not widely explained in one place? For database professionals who want to truly understand the systems they operate rather than just use them, what’s your recommended path to developing that depth of knowledge?
The motivation came from a practical problem — much of this information isn’t consolidated in a single place. You end up reading bits of the MySQL source code to understand what a specific field means or how a particular encoding works. I originally put this together as a presentation for the MySQL Online Summit, walking through how each binlog event can be manually decoded byte by byte. Once I had it all written down, I realized it would exceed my time slot by a good margin, so I turned that research into the blog post series instead.
And honestly, I’m glad I did — I often find myself going back to my own notes to check the meaning of a specific field. It became my own reference material as much as anyone else’s.
As for things that are often overlooked — Minimal Row-Based Replication is a good example. Most people know it sends only changed columns in updates, but what’s less widely understood is that you can receive an INSERT with an entirely empty row — no columns at all — and the consumer is expected to populate every column with its default value. These are the kinds of details you typically discover by reading the source code or through hands-on experience in production environments.
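The consumer-side logic for that case can be sketched roughly as follows. This is a hedged illustration, not real driver code: assume the columns-present bitmap has been decoded into a `bool` slice, the wire carries values only for the present columns, and the consumer holds each column’s default:

```rust
// Reconstruct a full row from a minimal row image: take values off the
// wire for columns marked present, and fall back to the column default
// for columns the event omitted entirely.
fn materialize_row(present: &[bool], wire_values: &[&str], defaults: &[&str]) -> Vec<String> {
    let mut wire = wire_values.iter();
    present
        .iter()
        .zip(defaults)
        .map(|(&is_present, default)| {
            if is_present {
                wire.next().expect("bitmap/value count mismatch").to_string()
            } else {
                default.to_string()
            }
        })
        .collect()
}
```

In the extreme case from the interview — an INSERT with no columns on the wire at all — `wire_values` is empty and the entire row comes from `defaults`.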
This knowledge also had a very practical application at Readyset. We had to implement a lot of this decoding in the Rust MySQL driver we use, which is an open-source project. So a lot of what I learned reading the MySQL source code turned into actual contributions upstream (not only mine but from other engineers in the Readyset team) — things like correctly handling specific event types, metadata parsing, and edge cases that the driver didn’t cover yet.
For database professionals who want to develop that depth of understanding, my advice would be: don’t be afraid to read the source code. Documentation gives you the “what,” but the source code gives you the “why” and the edge cases. Start with a concrete problem — something you actually need to understand for your work — and trace it through the code. That’s how I learned most of what I know: not by setting out to understand all of replication, but by needing to decode a specific event correctly and following the thread from there.
Q6. You’ve worked across the entire MySQL ecosystem — from DBA work managing production systems to developing open source tooling at Percona to building infrastructure at Readyset. As MySQL’s community and ecosystem celebrate 30 years, what do you see as the most significant contributions the community has made to MySQL’s success? And what do you think will matter most to keep the ecosystem healthy and sustainable over the next decade — collaboration between vendors and users, contributor experience, documentation, tooling, or governance processes?
The MySQL community’s contributions over the past 30 years have been enormous — from tools like Percona Server and XtraBackup, to the countless DBAs and developers who’ve shared knowledge, filed bugs, and pushed the ecosystem forward. MySQL became the most popular open-source database in the world not just because of the technology, but because of the community around it.
I think we’re at a unique moment right now. There’s a real conversation happening between Oracle and the community about how MySQL development should look going forward — public roadmap discussions, community summits, more openness about what’s coming. That’s encouraging, and I think what matters most is that we seize this moment. Both sides are at the table, and this is a great opportunity to strengthen collaboration and make it truly effective. MySQL runs at a scale no other open-source database matches, and that means the stakes are high for everyone.
For me, it comes down to a few things: genuine transparency in development — public worklogs, open bug tracking, visible roadmap discussions. A contributor experience that makes it realistic for people outside Oracle to participate meaningfully. And continued investment in the ecosystem — tooling, connectors, documentation.
MySQL has thrived for 30 years because people cared enough to invest in it. The next decade depends on whether we can channel that energy into real collaboration — vendors, users, and Oracle working together in a more aligned and collaborative way.
……………………………………………..

Marcelo Altmann
Marcelo has been working with MySQL since 2006 — back when LAMP stacks were everywhere and InnoDB was still the new kid on the block. Since then he’s gone from DBA work in Brazil to managing Ireland’s country-code top-level domain (.ie) as a MySQL DBA, to spending years deep inside MySQL’s source code at Percona, to his current role at Readyset building query caching infrastructure in Rust.
Along the way, some of his code made it into MySQL itself. That’s not something that happens often for people outside Oracle — it requires understanding the codebase well enough to fix or improve things that the core team accepts upstream. He’s an Oracle ACE Pro and a MySQL Rockstar, which is Oracle’s formal way of acknowledging someone who’s been genuinely contributing to the ecosystem for years — not just using it.
His longest chapter at Percona was as the lead developer of Percona XtraBackup, the go-to open-source backup tool for MySQL. He rebuilt how streaming worked using FIFO threads — cutting backup times by 5x — added ZSTD compression that cut backup sizes in half, and implemented multi-threaded streaming via Named Pipes that pushed performance up by 530%. He also added InnoDB Buffer Pool Dump support, which can speed up post-restore warm-up time dramatically. These weren’t minor patches — they changed how the tool worked at a fundamental level, and they’re in production at thousands of companies right now.
He speaks at conferences regularly. He’s been at Percona Live, the MySQL Global Forum, the MySQL & HeatWave Summit, preFOSDEM MySQL Belgian Days, and others — covering everything from XtraBackup internals to replication deep dives to how Readyset uses the binary log to keep caches fresh. He’s also been on the Percona HOSS Talks FOSS podcast, where he got into debugging techniques like GDB and Record & Replay.
He writes too — on his personal Medium blog and on the Readyset engineering blog, where he’s currently in the middle of a series called “Replication Internals: Decoding the MySQL Binary Log.” It’s the kind of series that goes byte-by-byte through event structures, the sort of thing that only someone who’s spent years reading MySQL source code could write with any confidence.
At Readyset, all of that comes together. He works on a caching layer that tails the MySQL binary log to keep cached query results consistent with the underlying database — no TTLs, no application changes, just real-time invalidation driven by replication events.
—
github: https://github.com/altmannmarcelo
linkedin: https://www.linkedin.com/in/marcelo-altmann/
Sponsored by MySQL/Oracle