On Using In-Memory Databases. Interview with Jonah H. Harris
“Whether it’s adding features, fixing bugs, or improving performance, it all comes down to the quality of the code.” – Jonah H. Harris
Q1. You are the director of Artificial Intelligence & Machine Learning at The Meet Group. What are your current responsibilities?
Jonah H. Harris: AI and ML research is rapidly growing. Staying on top of those advancements to identify key strategic opportunities and improvements that deliver novel and strategic solutions, which solidify our position as leaders in personal connection, is paramount. While setting direction is important, my primary goal is to shape, grow, and lead an exceptional team of Machine Learning Engineers to research, design, develop, and implement innovative solutions and advance our company’s capabilities across multiple business units. Our focus areas primarily include deep learning, natural language processing, computer vision, recommendation, ranking, and anomaly detection. It’s quite a bit to remain current on these days.
Q2. What do you use Artificial Intelligence & Machine Learning for?
Jonah H. Harris: At The Meet Group, we provide multiple brands and platforms which enable members to identify potential partners for romantic, platonic, and entertainment purposes. While traditional recommendation systems match items (e.g., books, videos, etc.) with a user’s interests, we aim to match people who are mutually interested in and likely to communicate with each other. While recommendation is a critical component of our business, additional work is required to perform abuse prevention and improve monetization – all of which are enhanced using a combination of data science, machine learning, and artificial intelligence. Our team employs many different techniques and technologies to accomplish each area mentioned above as quickly and efficiently as possible.
Q3. You have been working previously as the VP of Architecture and Lead DBA, overseeing high performance data access. What were your most important projects?
Jonah H. Harris: Now paired with Parship, The Meet Group is a worldwide leader in personal connection with a globally distributed workforce. When I joined as the Lead DBA in 2008, however, it was a small social network named myYearbook based in New Hope, Pennsylvania. Through multiple acquisitions and stages of the company, from private to NASDAQ-listed and private once again, I’ve been fortunate enough to grow with the organization and hold various positions from individual technologist to Chief Technology Officer. I’ve always enjoyed challenging work and my current position, overseeing AI/ML, is no different.
When I think of all the projects I’ve architected or developed over the years, one of the most fun and architecturally challenging was the reciprocal matchmaking system designed for a game called BlindDate.
BlindDate was a questionnaire-based matchmaking system that allowed members to select questions about themselves, supply their own answers, and identify their desired partner’s answers. To be “matched,” other members would need to answer the same questions along with the desired answers bi-directionally. One important implementation caveat was that we did not want to precompute these matches – they had to be done in (soft) real-time. We found many members would submit hundreds or even thousands of questions. While we did our best to partition this problem into an optimal search space, performing this reciprocal match was a performance challenge.
For our MVP, we initially designed this to use a relational database. Early on, however, we found this began to take around eight hundred milliseconds per request. As the game scaled, this would never work as initially designed. This led us to look at eXtremeDB.
Thanks to its new (at the time) multi-version concurrency control (MVCC) transaction manager and the ability to control the low-level data structure format, we were able to design a bitwise-optimized matching algorithm. As a result, the eXtremeDB-based implementation dropped the response time of a single request down to seventy-six microseconds on the exact same hardware; it also reduced memory usage by two-thirds.
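Editor's illustration: the interview does not describe the actual BlindDate algorithm or its eXtremeDB schema, so the following is only a minimal sketch of how a reciprocal questionnaire match could be reduced to bitwise operations. The names Profile, bit, satisfies, and reciprocal_match, as well as the fixed number of answer options per question, are assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumption for this sketch: every question has the same number of options.
OPTIONS_PER_QUESTION = 4


def bit(question: int, answer: int) -> int:
    """Return the single-bit mask for one (question, answer) pair."""
    return 1 << (question * OPTIONS_PER_QUESTION + answer)


@dataclass
class Profile:
    """Hypothetical member profile packed into integers used as bitsets."""

    own: int = 0          # exactly one bit per question the member answered
    wanted: int = 0       # acceptable partner answers (one or more per question)
    n_questions: int = 0  # how many questions the member selected

    def add(self, question: int, answer: int, acceptable: list[int]) -> None:
        """Record the member's answer and the partner answers they accept."""
        self.own |= bit(question, answer)
        for a in acceptable:
            self.wanted |= bit(question, a)
        self.n_questions += 1


def satisfies(seeker: Profile, candidate: Profile) -> bool:
    """True if the candidate gave an acceptable answer to every seeker question."""
    hits = seeker.wanted & candidate.own
    # candidate.own holds at most one bit per question, so the number of set
    # bits equals the number of seeker questions answered acceptably.
    return bin(hits).count("1") == seeker.n_questions


def reciprocal_match(a: Profile, b: Profile) -> bool:
    """A match requires both members to satisfy each other."""
    return satisfies(a, b) and satisfies(b, a)


alice, bob = Profile(), Profile()
alice.add(question=0, answer=1, acceptable=[2, 3])
alice.add(question=1, answer=0, acceptable=[0])
bob.add(question=0, answer=2, acceptable=[1])
bob.add(question=1, answer=0, acceptable=[0, 1])
print(reciprocal_match(alice, bob))
```

In this sketch each direction of the check is one AND plus a population count, regardless of how many questions a member selected. That is the general reason a bit-packed layout can be much cheaper than joining rows of answers, though it says nothing about how the production system was built.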
Q4. What are the main challenges you have encountered to achieve high performance data access?
Jonah H. Harris: A primary challenge is defining the appropriate structure to store and query data. Relational databases are great for general-purpose data management. On the other hand, NoSQL-oriented systems are great for flexibility. Similarly, systems such as Redis provide a unique ability to perform tasks that can’t easily be done with great performance in a traditional database management system. When designing an application, you have to choose the best tool for the job and make trade-offs where necessary. In some cases, this requires utilizing multiple data management technologies or sacrificing performance on one task in favor of another. It’s hard to find a system that’s as flexible as it is fast: eXtremeDB is really the only contender in that category I’ve found.
Q5. Can you tell us about some of the work you have done with eXtremeDB?
Jonah H. Harris: In addition to the BlindDate case mentioned above, we experimented with storing a graph database structure in eXtremeDB – it was highly performant and gave us the ability to store the graph in an optimal form while also making it queryable via SQL.
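Editor's illustration: the interview gives no detail on how the graph was laid out in eXtremeDB. As a rough sketch of the general idea (a graph stored as plain edge records that remain queryable via SQL), the example below uses Python's built-in sqlite3 module with an in-memory database as a stand-in. It does not use eXtremeDB or its API, and the table name edge and the function reachable are assumptions made for this example.

```python
import sqlite3

# Stand-in only: an in-memory SQLite database, not eXtremeDB.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE edge ("
    " src INTEGER NOT NULL,"
    " dst INTEGER NOT NULL,"
    " PRIMARY KEY (src, dst))"
)
conn.executemany(
    "INSERT INTO edge (src, dst) VALUES (?, ?)",
    [(1, 2), (2, 3), (3, 4), (1, 5), (4, 1)],  # includes a cycle 1 -> ... -> 1
)


def reachable(conn: sqlite3.Connection, start: int) -> list:
    """Return every node reachable from start, excluding start itself."""
    rows = conn.execute(
        """
        WITH RECURSIVE walk(node) AS (
            SELECT ?
            UNION  -- UNION removes duplicates, so cycles terminate
            SELECT edge.dst FROM edge JOIN walk ON edge.src = walk.node
        )
        SELECT node FROM walk WHERE node <> ? ORDER BY node
        """,
        (start, start),
    )
    return [row[0] for row in rows]


print(reachable(conn, 1))
```

The edge table is the storage form and the recursive query is the SQL view of the same data. A purpose-built in-memory layout would likely look quite different, so treat this only as a picture of the concept.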
eXtremeDB is so good that I have personally licensed it to develop and test out my own ideas and implementations of various systems. I’ve built everything from a Redis-compatible service to real-time recommender systems based on eXtremeDB.
I’m actually in the process of writing a book for Apress, Realtime Recommendation Systems: Building Responsive Recommenders from the Ground Up, and testing out several of those algorithms with eXtremeDB as well. Compared to several well-known open-source recommenders, my eXtremeDB-based versions consistently demonstrate several hundred percent improvements in performance. This is due to eXtremeDB’s highly-optimized in-memory implementation, which doesn’t force me to sacrifice on-disk capabilities as other systems do. Additionally, I’ve always licensed the eXtremeDB source code, which is rare for a company to offer. With that, I’ve been able to gain a solid understanding of internals and compile-time optimizations, enabling me to make even better performance gains. The code is immaculate, and McObject is equally great about accepting patches for additional functionality.
Q6. Why did you choose eXtremeDB?
Jonah H. Harris: If my earlier answers haven’t already praised its modularity and flexibility enough, I’ll state it more clearly: in over twenty years of professional experience, not only administering and developing against databases but also working on their internals, eXtremeDB is the only system I’ve found that gives developers the ability to build almost anything with very few constraints.
Likewise, McObject’s support is exceptional. You can ask as detailed a question as you can imagine and get a solid answer, in many cases from the engineers themselves.
Q7. You have implemented a number of features for commercial and open-source databases. What are the main lessons you have learned?
Jonah H. Harris: Whether it’s adding features, fixing bugs, or improving performance, it all comes down to the quality of the code. Unfortunately, most open-source database code is abysmal. Postgres, InnoDB (proper), and Redis are exceptions. That said, you’d expect commercial implementations to be so much better – but they’re usually not. It’s sad, really.
While I didn’t know it initially, part of the team behind eXtremeDB was also behind the old Raima Database Manager (RDM). In the late nineties, I used RDM quite a bit and had a source code license for its code as well. Aside from the MASM-based NetBIOS lock manager implementation, which I believe they acquired from a third-party developer, it was an extremely well-written system with great documentation. So, when I found out eXtremeDB was a brand new, from the ground-up, in-memory-optimized system with very similar developer-friendly embedded database design goals, I was sold!
Sure, I’ve worked on the internals of many different database systems. But, I have no problem understanding the code to eXtremeDB at all. It’s all well-organized and straightforward, which is hard to do for a system that supports multiple transaction managers and is optimized for both in-memory and on-disk operations.
Q8. You are an active open-source contributor. Which open-source projects do you currently contribute to?
Jonah H. Harris: As of late, I haven’t had a great deal of time to do much open-source work. Database-wise, my latest contributions are to Redis, adding a few useful commands and performance optimizations. The rest are generally bug fixes or feature additions in libraries I frequently use.
Q9. What is your experience of using open source software for mission critical applications?
Jonah H. Harris: I’ve always been a big advocate of open-source. I remember first using FreeBSD and Linux in the mid-90s when I was in middle school. That said, I’m huge on choosing the best tool for the job at hand. Sometimes that’s open-source, and sometimes it’s not.
In the early 2000s, I was hired to lead the development of a Johnson & Johnson brand’s rewrite of their CFR Part 11 quality system ERP module from PowerBuilder to Apache+PHP. We used a good amount of open-source, but it still ran on top of HP-UX and Oracle. Did it need to? No. But that’s what they were comfortable with and, to be honest, those were a better choice stability-wise at the time.
These days, when I’m building a general back-end web-based API, I default to Node.js+NGINX, Postgres, and Redis. As most things are containerized on top of a Linux distribution these days, it’s hard to beat that stack. Language-wise, I like TypeScript, though I do see cases for Rust and Go in the future.
That said, when I’m building a performance-optimized system, I still prefer C with libuv for networking. For data management, I’ll use eXtremeDB when I need MVCC or dual in-memory/on-disk functionality. There’s no need to reinvent that, and nothing is nearly as fast. Otherwise, I’ll use klib data structures for simple single-threaded apps.
Open source is great, and it’s come a long way. But, there are still valid cases for using commercial systems.
Qx. Anything else you wish to add?
Jonah H. Harris: For the most part, IMDB systems have always been considered a niche: you either know about them or you don’t. eXtremeDB is an IMDB-optimized system, but its functionality far surpasses its competitors in every aspect. It can be used locally or distributed, with and without SQL, in-memory only or as an on-disk hybrid, in-process and as a server, with high availability, vector-optimized operations, real-time embeddability, source code, and many compile-time optimizations. More people really should know about it; it’s a genuinely fantastic system.
Jonah H. Harris, Director of Artificial Intelligence & Machine Learning, The Meet Group.
Leader. Entrepreneur. Technologist. NEXTGRES Founder. Former CTO at The Meet Group. OakTable Member. Open Source Contributor. Founding Member of the Forbes Technology Council.