Orleans, the technology behind Xbox Halo 4 and Halo 5. Interview with Phil Bernstein
“Orleans is an open-source programming framework for .NET that simplifies the development of distributed applications, that is, ones that run on many servers in a datacenter.” – Phil Bernstein
I have interviewed Phil Bernstein, a well-known database researcher and Distinguished Scientist at Microsoft Research, where he has worked for over 20 years. We discussed his latest project, “Orleans”.
Q1. With the project “Orleans” you and your team invented the “Virtual Actor abstraction”. What is it?
Phil Bernstein: Orleans is an open-source programming framework for .NET that simplifies the development of distributed applications, that is, ones that run on many servers in a datacenter. In Orleans, objects are actors, by which we mean that they don’t share memory.
In Orleans, actors are virtual in the same sense as virtual memory: an object is activated on demand, i.e. when one of its methods is invoked. If an object is already active when it’s invoked, the Orleans runtime will use its object directory to find the object and invoke it. If the runtime determines that the object isn’t active, the runtime will choose a server on which to activate the object, invoke the object’s constructor on that server to load its state, invoke the method, and update the object directory so it can direct future calls to the object.
Conversely, an object is deactivated when it hasn’t been invoked for some time. In that case, the runtime calls the object’s deactivate method, which does whatever cleanup is needed before freeing up the object’s runtime resources.
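The activation and deactivation cycle described above can be sketched as a toy model. This is an illustration only, written in Python rather than .NET, and it does not use or imitate the Orleans API; `ToyRuntime`, `CounterActor`, `collect_idle`, and the `idle_seconds` threshold are all hypothetical names chosen for the sketch.

```python
import time


class CounterActor:
    """Hypothetical actor type for illustration; not part of Orleans."""

    def __init__(self, key):
        self.key = key
        self.value = 0  # a real actor would load its saved state here

    def increment(self):
        self.value += 1
        return self.value

    def deactivate(self):
        # Cleanup hook, called before the runtime frees the object.
        pass


class ToyRuntime:
    """Toy stand-in for a runtime that keeps an object directory."""

    def __init__(self, servers, idle_seconds=60.0):
        self.servers = list(servers)
        self.idle_seconds = idle_seconds
        self.directory = {}  # key -> (server, actor, last_used)

    def invoke(self, actor_cls, key, method, *args, now=None):
        now = time.monotonic() if now is None else now
        entry = self.directory.get(key)
        if entry is None:
            # Not active: choose a server and construct the object there.
            server = self.servers[len(self.directory) % len(self.servers)]
            actor = actor_cls(key)
        else:
            server, actor, _ = entry
        result = getattr(actor, method)(*args)
        # Update the directory so future calls find the active object.
        self.directory[key] = (server, actor, now)
        return result

    def collect_idle(self, now=None):
        now = time.monotonic() if now is None else now
        for key, (_, actor, last_used) in list(self.directory.items()):
            if now - last_used >= self.idle_seconds:
                actor.deactivate()
                del self.directory[key]
```

In this sketch the caller never creates or destroys an actor; it only names one by key. The placement rule (round-robin on directory size) is a placeholder assumption, since the interview does not say how a server is chosen.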
Q2. How is it possible to build distributed interactive applications, without the need to learn complex programming patterns?
Phil Bernstein: The virtual actor model hides distribution from the developer. You write code as if your program runs on one machine. The Orleans runtime is responsible for distributing objects across servers, which is something that doesn’t affect the program logic. Of course, there are performance and fault tolerance implications of distribution. But Orleans is able to hide them too.
Q3. Building interactive services that are scalable and reliable is hard. How do you ensure that Orleans applications scale out and are reliable?
Phil Bernstein: The biggest impediment to scaling out an app across servers is to ensure no server is a bottleneck. Orleans does this by evenly distributing the objects across servers. This automatically balances the load.
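One common way to spread objects evenly is to hash each object's key onto a server. The sketch below shows that idea under the assumption of hash-based placement; the interview does not say which placement policy Orleans uses, and `place` and the `player-N` keys are hypothetical.

```python
import hashlib
from collections import Counter


def place(key, servers):
    """Map an object key to a server using a stable hash (assumed policy)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


servers = ["s1", "s2", "s3", "s4"]
# Count how many of 10,000 hypothetical objects land on each server.
load = Counter(place(f"player-{i}", servers) for i in range(10_000))
```

With a well-mixed hash, each server ends up with close to a quarter of the objects, so no single server becomes the bottleneck.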
As for reliability, the virtual actor model makes this automatic. If a server fails, then of course all of the objects that were active on that server are gone. No problem. The Orleans runtime detects the server failure and knows which objects were active on the failed server. So the next time any of those objects is invoked, it takes its usual course of action, that is, it chooses a server on which to activate the object, loads the object, and invokes it.
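The recovery path described above can be modeled in a few lines. This is a toy sketch, not Orleans code: `ToyCluster`, `fail_server`, and the least-loaded placement rule are assumptions made for illustration.

```python
class ToyCluster:
    """Toy model of a directory that forgets objects on a failed server."""

    def __init__(self, servers):
        self.live = list(servers)
        self.directory = {}  # object key -> server currently hosting it
        self.activations = 0

    def _place(self, key):
        # Assumed policy: pick the live server hosting the fewest objects.
        load = {s: 0 for s in self.live}
        for server in self.directory.values():
            load[server] += 1
        return min(self.live, key=lambda s: load[s])

    def invoke(self, key):
        # The usual course of action: activate on demand if not active.
        if key not in self.directory:
            self.directory[key] = self._place(key)
            self.activations += 1
        return self.directory[key]

    def fail_server(self, server):
        # The runtime knows which objects were active on the failed server
        # and drops them from the directory.
        self.live.remove(server)
        self.directory = {
            k: s for k, s in self.directory.items() if s != server
        }
```

After `fail_server`, the next `invoke` for an affected key takes the same path as a first-time activation, which is why no special recovery code is needed in the caller.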
Q4. What about the object’s state? Doesn’t that disappear when its server fails?
Phil Bernstein: Yes, of course all of the object’s main memory state is lost. It’s up to the object’s methods to save object state persistently, typically just before returning from a method that modifies the object’s state.
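The save-before-return pattern can be sketched as follows. `DictStorage` and `PlayerActor` are hypothetical stand-ins (an in-memory dict plays the role of persistent storage), and nothing here reflects the actual Orleans persistence API.

```python
class DictStorage:
    """Stand-in for a persistent store; a real one would be a database."""

    def __init__(self):
        self.rows = {}

    def read(self, key):
        row = self.rows.get(key)
        return dict(row) if row is not None else None

    def write(self, key, state):
        self.rows[key] = dict(state)


class PlayerActor:
    """Hypothetical actor that saves its state after each modification."""

    def __init__(self, key, storage):
        self.key = key
        self.storage = storage
        # Load saved state on activation, or start fresh.
        self.state = storage.read(key) or {"score": 0}

    def add_score(self, points):
        self.state["score"] += points
        # Save just before returning from the state-modifying method.
        self.storage.write(self.key, self.state)
        return self.state["score"]
```

If the in-memory `PlayerActor` is lost, a fresh activation with the same key reads the last saved state back from storage.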
Q5. Is this transactional?
Phil Bernstein: No, not yet. We’re working on adding a transaction mechanism. Coming soon.
Q6. Can you give us an example of an Orleans application?
Phil Bernstein: Orleans is used for developing large-scale on-line games. For example, all of the cloud services for Halo 4 and Halo 5, the popular Xbox games, run on Orleans. Example object types are players, game consoles, game instances, weapons caches, and leaderboards. Orleans is also used for Internet of Things, communications, and telemetry applications. All of these applications are naturally actor-oriented, so they fit well with the Orleans programming model.
Q7. Why does the traditional three-tier architecture with stateless front-ends, stateless middle tier and a storage layer have limited scalability?
Phil Bernstein: The usual bottleneck is the storage layer. To solve this, developers add a middle tier to cache some state and thereby reduce the storage load. However, this middle tier loses the concurrency control semantics of storage, and now you have the hard problem of distributed cache invalidation. To enforce storage semantics, Orleans makes it trivial to express cached items as objects. And to avoid concurrency control problems, it routes requests to a single instance of each object, which is ordinarily single-threaded.
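To illustrate routing every request to a single instance that handles one call at a time, here is a small thread-based sketch. It is an assumption-laden toy: `Router`, `bump`, and the per-object lock are illustrative choices, not how Orleans schedules work internally.

```python
import threading


class Router:
    """Routes calls for a key to one object and serializes them."""

    def __init__(self):
        self._objects = {}
        self._locks = {}
        self._guard = threading.Lock()

    def call(self, key, fn):
        with self._guard:
            obj = self._objects.setdefault(key, {"count": 0})
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:  # only one call runs against this object at a time
            return fn(obj)


def bump(obj):
    obj["count"] += 1
    return obj["count"]


def worker(router, key, times):
    for _ in range(times):
        router.call(key, bump)


router = Router()
threads = [
    threading.Thread(target=worker, args=(router, "cache-item", 1000))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
final_count = router.call("cache-item", lambda obj: obj["count"])
```

Because there is exactly one copy of the object and its calls never overlap, no update is lost and no cache-invalidation protocol is needed between copies.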
Also, a middle-tier cache does data shipping to the storage servers, which can be inefficient. With Orleans, you have an object-oriented cache and do function shipping instead.
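The difference between data shipping and function shipping can be shown with a toy leaderboard. `Leaderboard`, `dump`, and `top` are hypothetical names for this sketch; in a real deployment the two sides would be separated by a network.

```python
class Leaderboard:
    """Hypothetical object that owns a set of player scores."""

    def __init__(self, scores):
        self.scores = dict(scores)

    def dump(self):
        # Data shipping: the caller pulls every entry and computes locally.
        return dict(self.scores)

    def top(self, n):
        # Function shipping: the computation runs where the data lives,
        # and only the small result travels back.
        ranked = sorted(self.scores.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:n]


board = Leaderboard({f"player{i}": i for i in range(1000)})

shipped = board.dump()
top_via_data = sorted(shipped.items(), key=lambda kv: kv[1], reverse=True)[:3]
top_via_function = board.top(3)
```

Both approaches give the same answer, but the first moves 1,000 entries to the caller while the second moves 3.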
Q8. How does Orleans differ from other Actor platforms such as Erlang and Akka?
Phil Bernstein: In Erlang and Akka, the developer controls actor lifecycle. You explicitly create an actor and choose the server on which it’s activated. Fixing the actor’s location at creation time prevents automating load balancing, actor migration, and server failure handling. For example, if an actor fails, you need code to catch the exception and resurrect the actor on another server. In Orleans, this is all automatic.
Another difference is the communications model. Orleans uses asynchronous RPCs. Erlang and Akka use one-way messages.
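The two communication styles can be contrasted with Python's `asyncio`. This is a conceptual sketch only; it does not reproduce the Orleans, Erlang, or Akka APIs, and `Greeter`, `rpc_style`, and `one_way_style` are hypothetical names.

```python
import asyncio


class Greeter:
    """Hypothetical actor with one asynchronous method."""

    async def hello(self, name):
        await asyncio.sleep(0)  # stands in for network latency
        return f"hello, {name}"


async def rpc_style(actor):
    # Asynchronous RPC: the call yields an awaitable that resolves to a reply.
    return await actor.hello("world")


async def one_way_style():
    # One-way messaging: the sender drops a message in a mailbox and gets
    # no return value; any reply would be a separate message.
    mailbox = asyncio.Queue()
    received = []

    async def actor_loop():
        while True:
            msg = await mailbox.get()
            if msg is None:  # sentinel used here to stop the loop
                break
            received.append(msg)

    task = asyncio.create_task(actor_loop())
    mailbox.put_nowait(("hello", "world"))
    mailbox.put_nowait(None)
    await task
    return received


rpc_reply = asyncio.run(rpc_style(Greeter()))
one_way_log = asyncio.run(one_way_style())
```

In the RPC style the caller's code reads like an ordinary method call with a result; in the one-way style the sender must arrange its own reply protocol if it needs an answer.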
Q9. Database people sometimes focus exclusively on the data model and query language, and don’t consider the problem of writing a scalable application on top of the database. How is Orleans addressing this issue?
Phil Bernstein: In a database-centric view, an app is a set of stored procedures with a stateless front-end and possibly a middle-tier cache. To scale out the app with this design, you need to partition the database into finer slices every time you want to add servers. By contrast, if your app runs on servers that are separate from the database, as it does with Orleans, you can add servers to scale out the app without scaling out the storage. This is easier, more flexible, and less expensive. For example, you can run with more app servers during the day when there’s heavier usage and fewer servers at night when the workload dies down. This is usually infeasible at the database server layer, since it would require migrating parts of the database twice a day.
Q10. Why did you transfer the core Orleans technology to 343 Industries?
Phil Bernstein: Orleans was developed in Microsoft Research starting in 2009. Like any research project, after several years of use in production, it was time to move it into a product group, which can better afford the resources to support it. Initially, that was 343 Industries, the biggest Orleans user, which ships the Halo game. After Halo 5 shipped, the Orleans group moved to the parent organization, Microsoft Game Studios, which provides technology to Halo and many other Xbox games.
In Microsoft Research, we are still working on Orleans technology and collaborate closely with the product group. For example, we recently published code to support geo-distributed applications on Orleans, and we’re currently working on adding a transaction mechanism.
Q11. The core Orleans technology was also made available as open source in January 2015. Are developers actively contributing to this?
Phil Bernstein: Yes, there is a lot of activity, with contributions from developers both inside and outside Microsoft. You can see the numbers on GitHub – roughly 25 active contributors and over 25 more occasional contributors – with fully-tested releases published every couple of months. After the core .NET runtime and Roslyn compiler projects, Orleans is the next most popular .NET Foundation project on GitHub.
Phil Bernstein is a Distinguished Scientist at Microsoft Research, where he has worked for over 20 years. Before Microsoft, he was a product architect and researcher at Digital Equipment Corp. and a professor at Harvard University. He has published over 150 papers and two books on the theory and implementation of database systems, especially on transaction processing and data integration, which are still the major areas of his work. He is an ACM Fellow, a winner of the ACM SIGMOD Innovations Award, a member of the Washington State Academy of Sciences and a member of the U.S. National Academy of Engineering. He received a B.S. degree from Cornell and M.Sc. and Ph.D. from University of Toronto.