Do we still have an impedance mismatch problem? – Interview with José A. Blakeley and Rowan Miller.
“The impedance mismatch problem has been significantly reduced, but not entirely eliminated” – José A. Blakeley.
“Performance and overhead of ORMs has been and will continue to be a concern. However, in the last few years there have been significant performance improvements” – José A. Blakeley, Rowan Miller.
Do we still have an impedance mismatch problem in 2012?
Not an easy question to answer. To get a sense of where we are now, I have interviewed José A. Blakeley and Rowan Miller. José is a Partner Architect in the SQL Server Division at Microsoft, and Rowan is the Program Manager for the ADO.NET Entity Framework team at Microsoft.
The focus of the interview is on ORM (object-relational mapping) technology and the new release of Entity Framework (EF 5.0). Entity Framework is an object-relational mapper developed by Microsoft that enables .NET developers to work with relational data using domain-specific objects.
Q1. Do we still have an impedance mismatch problem in 2012?
Blakeley: The impedance mismatch problem has been significantly reduced, but not entirely eliminated.
Q2. In the past there have been many attempts to remove the impedance mismatch. Is ORM (object-relational mapping) technology really the right solution for that in your opinion? Why? What other alternative solutions are feasible?
Blakeley: There have been several attempts to remove the impedance mismatch. In the late ’80s, early ’90s, object databases and persistent programming languages made significant progress in persisting data structures built in languages like C++, Smalltalk, and Lisp almost seamlessly. For instance, Persistent C++ languages could persist structures containing untyped pointers (e.g., void*). However, to succeed over relational databases, persistent languages needed to also support declarative, set-oriented queries and transactions.
Object database systems failed because they didn’t have strong support for queries, query optimization, and execution, and they didn’t have strong, well-engineered support for transactions. At the same time, relational databases grew their capabilities by adding extended relational features, which reduced the need for persistent languages, and so the world continued to gravitate around relational database systems. Object-relational mapping (ORM) systems, introduced in the last decade together with programming languages like C# that added built-in query capabilities (i.e., Language Integrated Query – LINQ) to the language, are the latest attempt to eliminate the impedance mismatch.
Object-relational mapping technology, like the Entity Framework, aims at providing a general solution to the problem of mapping database tables to programming language data structures. ORM technology is the right layer for bridging the complex mapping problem between tables and programming constructs. For instance, the queries needed to map a set of tables to a class inheritance hierarchy can be quite complex. Similarly, propagating updates from the programming language structures to the tables in the database is a complex problem. Applications can build these mappings by hand, but the process is time-consuming and error-prone. Automated ORMs can do this job correctly and faster.
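To make the inheritance-mapping problem concrete, here is a minimal Python sketch of one common strategy, storing a whole class hierarchy in a single table with a discriminator column. It is not Entity Framework code and does not show how EF implements its mappings. The class names, the `people` table, the `kind` column and the `MAPPING` dictionary are all hypothetical, chosen only for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Person:
    id: int
    name: str


@dataclass
class Student(Person):
    major: str


@dataclass
class Employee(Person):
    salary: float


# Hypothetical mapping: discriminator value -> (class, subclass-specific columns).
MAPPING = {
    "person": (Person, []),
    "student": (Student, ["major"]),
    "employee": (Employee, ["salary"]),
}


def create_schema(conn):
    # One table holds every class; subclass columns are nullable.
    conn.execute(
        "CREATE TABLE people (id INTEGER PRIMARY KEY, kind TEXT NOT NULL, "
        "name TEXT NOT NULL, major TEXT, salary REAL)"
    )


def save(conn, obj):
    # Find the discriminator for the object's exact class.
    kind = next(k for k, (cls, _) in MAPPING.items() if type(obj) is cls)
    _, extra = MAPPING[kind]
    columns = ["id", "kind", "name"] + extra
    values = [obj.id, kind, obj.name] + [getattr(obj, c) for c in extra]
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(
        f"INSERT INTO people ({', '.join(columns)}) VALUES ({placeholders})",
        values,
    )


def load_all(conn):
    # Rebuild each row as an instance of the class named by its discriminator.
    rows = conn.execute(
        "SELECT id, kind, name, major, salary FROM people ORDER BY id"
    ).fetchall()
    result = []
    for id_, kind, name, major, salary in rows:
        cls, extra = MAPPING[kind]
        available = {"major": major, "salary": salary}
        result.append(cls(id_, name, *[available[c] for c in extra]))
    return result
```

Even in this toy form, the hand-written version has to keep the schema, the insert logic and the load logic in sync for every class, which is the bookkeeping an automated ORM takes over.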
Q3. What are the current main issues with O/R mappings?
Blakeley, Miller: In the area of functionality, enabling a complete ORM covering all programming constructs is a challenge. For example, up until its latest release EF lacked support for enum types. It’s also hard for an ORM to support the full range of concepts supported by the database. For example, support for spatial data types has been available in SQL Server since 2008 but native support has only just been added to EF. This challenge only gets harder when you consider most ORMs, including EF, support multiple database engines, each with different capabilities.
Another challenge is performance. Any time you add a layer of abstraction there is a performance overhead, and this is certainly true for ORMs. One critical area of performance is the time taken to translate a query into SQL that can be run against the database. In EF this involves taking the LINQ query that a user has written and translating it to SQL. In EF5 we made some significant improvements in this area by automatically caching and re-using these translations. The quality of the SQL that is generated is also key to performance: there are many different ways to write the same query, and the performance difference can be huge. Things like unnecessary casts can cause the database not to use an index. With every release of EF we improve the SQL that is generated.
Adding a layer of abstraction also introduces another challenge. ORMs make it easy to map a relational database schema and have queries constructed for you, but because this translation is handled internally by the ORM, it can be difficult to debug when things don’t behave or perform as expected. There are a number of great tools, such as LINQPad and Entity Framework Profiler, which can help debug such scenarios.
Q4. What is special about Microsoft’s ORM (object-relational mapping) technology?
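The idea of caching and re-using query translations can be sketched in a few lines of Python. This is only an illustration of the general technique, under the assumption that a translation can be keyed by the shape of the query (table plus filtered columns) rather than by its parameter values; the interview does not describe how EF keys or stores its cache. The functions `translate` and `build_query` are hypothetical names.

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def translate(table, filter_columns):
    """Translate a query shape into parameterised SQL text.

    Stands in for the expensive translation step. The cache key is the
    shape only (table name and a tuple of column names), never the values.
    """
    sql = f"SELECT * FROM {table}"
    if filter_columns:
        sql += " WHERE " + " AND ".join(f"{c} = ?" for c in filter_columns)
    return sql


def build_query(table, **filters):
    # Sort the column names so equivalent queries share one cache entry.
    columns = tuple(sorted(filters))
    sql = translate(table, columns)
    params = [filters[c] for c in columns]
    return sql, params
```

Two calls such as `build_query("customers", city="Seattle")` and `build_query("customers", city="Hobart")` produce the same SQL text with different parameters, so the second call is served from the cache. `translate.cache_info()` reports the hit and miss counts.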
Miller: Arguably the biggest differentiator of EF isn’t a single technical feature but how deeply it integrates with the other tools and technologies that developers use, such as Visual Studio, LINQ, MVC and many others. EF also provides powerful mapping capabilities that allow you to solve some big impedance differences between your database schema and the shape of the objects you want to write code against. EF also gives you the flexibility of working in a designer (Model & Database First) or purely in code (Code First). There is also the benefit of Microsoft’s agreement to support and service the software that it ships.
Q5. Back in 2008 LINQ was a brand-new development in programming languages. What is the current status of LINQ? What is LINQ used for in practice?
Miller: LINQ is a really solid feature and while there probably won’t be a lot of new advancements in LINQ itself we should see new products continuing to take advantage of it. I think that is one of the great things about LINQ, it lends itself to so many different scenarios. For example there are LINQ providers today that allow you to query in-memory objects, relational databases and xml files, just to name a few.
Q6. The original design of the Entity Framework dates back to 2006. Now, EF version 5.0 is currently available in Beta. What’s in EF 5.0?
Miller: Before we answer that question let’s take a minute to talk about EF versioning. The first two releases of EF were included as part of Visual Studio and the .NET Framework and were referred to using the version of the .NET Framework that they were included in. The first version (EF or EF3.5) was included in .NET 3.5 SP1 and the second version (EF4) was included in .NET 4. At that point we really wanted to release more often than Visual Studio and the .NET Framework released so we started to ship ‘out-of-band’ using NuGet. Once we started shipping out-of-band we adopted semantic versioning (as defined at http://semver.org ). Since then we’ve released EF 4.1, 4.2, 4.3 and EF 5.0 is currently available in Beta.
EF has come a long way since it was first released in Visual Studio 2008 and .NET 3.5. As with most v1 products there were a number of important scenarios that weren’t supported in the first release of EF.
EF4 was all about filling in these gaps and included features such as Model First development, support for POCO classes, customizable code generation, the ability to expose foreign key properties in your objects, improved support for unit testing applications built with EF and many other features.
In EF 4.1 we added the DbContext API and Code First development. The DbContext API was introduced as a cleaner and simpler API surface over EF that simplifies the code you write and allows you to be more productive. Code First gives you an alternative to the designer and allows you to define your model using just code. Code First can be used to map to an existing database or to generate a new database based on your code. EF 4.2 was mainly about bug fixes and adding some components to make it easier for tooling to interact with EF. The EF4.3 release introduced the new Code First Migrations feature that allows you to incrementally change your database schema as your Code First model evolves over time.
EF 5.0 is currently available in Beta and introduces some long-awaited features including enum support, spatial data types, table-valued function support and some significant performance improvements. In Visual Studio 11 we’ve also updated the EF designer to support these new features as well as multiple diagrams within a model and allowing you to apply coloring to your model.
Q7. What are the features that did not make it into EF 5.0 and that you consider important to add in a future release?
Miller: There are a number of things that our customers are asking for that are on the top of our list for the upcoming versions of EF. These include asynchronous query support, improved support for SQL Azure (automatic connection retries and built-in federation support), the ability to use Code First to map to stored procedures and functions, pluggable conventions for Code First and better performance for the designer and at runtime. If we get a significant number of those done in EF6 I think it will be a really great release. Keep in mind that because we now also ship in between Visual Studio releases you’re not looking at years between EF releases any more.
Q8. If your data is made of Java objects, would Entity Framework be useful? And if yes, how?
Blakeley: Unfortunately not. The EF ORM is written in C# and runs on the .NET Common Language Runtime (CLR). To support Java objects, we would need to have a .NET implementation of Java like Sun’s Java.Net.
Q9. EF offers different Entity Data Model design approaches: Database First, Model First, Code First. Why do you need three different design approaches? When would you recommend using each of these approaches?
Miller: This is a great question and something that confuses a lot of people. Whichever approach you choose, the decision only impacts the way in which you design and maintain the model; once you start coding against the model there is no difference. Which one to use boils down to two fundamental questions. Firstly, do you want to model using boxes and lines in a designer or would you rather just write code? Secondly, are you working with an existing database or are you creating a new database?
If you want to work with boxes and lines in a designer then you will be using the EF Designer that is included in Visual Studio. If you’re targeting an existing database then the Database First workflow allows you to reverse engineer a model from the database; you can then tweak that model using the designer. If you’re going to be creating a new database then the Model First workflow allows you to start with an empty model and build it up using the designer. You can then generate a database based on the model you have created. Whether you choose Model First or Database First, the classes that you will code against are generated for you. This generation is customizable, though, so if the generated code doesn’t suit your needs there is plenty of opportunity to customize it.
If you would rather forgo the designer and do all your modeling in code then Code First is the approach you want. If you are targeting an existing database you can either hand code the classes and mapping or use the EF Power Tools (available on Visual Studio Gallery) to reverse engineer some starting point code for you. If you are creating a new database then Code First can also generate the database for you and Code First Migrations allows you to control how that database is modified as your model changes over time. The idea of generating a database often scares people but Code First gives you a lot of control over the shape of your schema. Ultimately if there are things that you can’t control in the database using the Code First API then you have the opportunity to apply them using raw SQL in Code First Migrations.
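The idea behind incremental schema migrations, including the escape hatch of applying raw SQL, can be sketched as follows. This is a generic Python illustration using SQLite, not the Code First Migrations API. The `MIGRATIONS` list, the `schema_history` table and the `blogs` table are hypothetical names introduced for the example.

```python
import sqlite3

# Hypothetical ordered migration steps, each expressed as raw SQL.
MIGRATIONS = [
    (1, "CREATE TABLE blogs (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE blogs ADD COLUMN url TEXT"),
    (3, "CREATE INDEX ix_blogs_name ON blogs (name)"),
]


def applied_version(conn):
    # A bookkeeping table records which steps have already been applied.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_history (version INTEGER PRIMARY KEY)"
    )
    row = conn.execute("SELECT MAX(version) FROM schema_history").fetchone()
    return row[0] or 0


def migrate(conn, target=None):
    """Apply every pending step up to `target` (or all of them) in order."""
    current = applied_version(conn)
    applied = []
    for version, sql in MIGRATIONS:
        if version <= current or (target is not None and version > target):
            continue
        conn.execute(sql)
        conn.execute(
            "INSERT INTO schema_history (version) VALUES (?)", (version,)
        )
        applied.append(version)
    conn.commit()
    return applied
```

Calling `migrate(conn)` repeatedly is safe: steps already recorded in `schema_history` are skipped, so the schema only moves forward as new steps are added to the list.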
Q10. There are concerns about the performance and the overhead generated by ORM technology. What is your opinion on that?
Blakeley: Performance and overhead of ORMs has been and will continue to be a concern. However, in the last few years there have been significant performance improvements in reducing the code path for the mapping implementations. Relational query optimizers continue to get better at handling extremely complex queries. Finally, processor technology continues to improve, and there is abundant RAM, allowing for larger object caches that speed up the mapping.
José Blakeley is a Partner Architect in the SQL Server Division at Microsoft where he works on server programmability, database engine extensibility, query processing, object-relational functionality, scale-out database management, and scientific database applications. José joined Microsoft in 1994. Some of his contributions include the design of the OLE DB data access interfaces, the integration of the .NET runtime inside the SQL Server 2005 products, the development of many extensibility features in SQL Server, and the development of the ADO.NET Entity Framework in Visual Studio 2008. Since 2009 José has been building the SQL Server Parallel Data Warehouse, a scale-out MPP SQL Server appliance. José has authored many conference papers, book chapters and journal articles on design aspects of relational and object database management systems, and data access. Before joining Microsoft, José was a member of the technical staff with Texas Instruments where he was a principal investigator in the development of the DARPA funded Open-OODB object database management system. José became an ACM Fellow in 2009. He received a Ph.D. in computer science from University of Waterloo, Canada on materialized views, a feature implemented in all main commercial relational database products.
Rowan Miller works as a Program Manager for the ADO.NET Entity Framework team at Microsoft. He speaks at technical conferences and blogs. Rowan lives in Seattle, Washington with his wife Athalie. Prior to moving to the US he resided in the small state of Tasmania in Australia.
Outside of technology Rowan’s passions include snowboarding, mountain biking, horse riding, rock climbing and pretty much anything else that involves being active. The primary focus of his life, however, is to follow Jesus.
For further readings