Comments on: Big Data Analytics at Netflix. Interview with Christos Kalantzis and Jason Brown.

By: Brady Wetherington

Brady Wetherington — Mon, 25 Feb 2013 22:25:55 +0000

Mongo is not without its flaws – they are numerous and substantial, and I have written about them extensively – but this is not an apples-to-apples comparison of architectures.

If you want to scale writes, you use sharding. If you want to survive failures, you use replica sets. You could argue it’s nicer to have nodes that run partitions of data which are replicated around on nodes (I certainly would) but the mongo architecture is very simple and very easy to administer.

By: Christos Kalantzis

Christos Kalantzis — Sun, 24 Feb 2013 21:12:14 +0000

Hi Dwight,
Let me preface this comment with a clarification, that the intention of the article was never to do a hatchet job on MongoDB. I am sharing our experiences with the technology and how it didn’t fit into our use case of multi-datacenter, mutil-directional heavy writes.

That being said, I’m glad you clarified that in today’s MongoDB a master can be chosen automatically to fill that role and that a new master will then be chosen as needed, in case of failure.

However, that is still a Master-Slave scenario, where ALL rights only go to the single Master (of that shard/replica-set) and eventually will get replicated to the Slaves. That architecture limits your write throughput to a single server. This subsequently limits the efficiency of your hardware usage, since adding more servers, doesn’t increase both write and read throughput, unlike in C* where adding a new node to the cluster linearly increases both write and read throughput (Reference)

“A MongoDB replica set is a cluster of mongod instances that replicate amongst one another and ensure automated failover. Most replica sets consist of two or more mongod instances with at most one of these designated as the primary and the rest as secondary members. Clients direct all writes to the primary, while the secondary members replicate from the primary asynchronously. . . .If you’re familiar with other database systems, you may think about replica sets as a more sophisticated form of traditional master-slave replication. In master-slave replication, a master node accepts writes while one or more slave nodes replicate those write operations and thus maintain data sets identical to the master. For MongoDB deployments, the member that accepts write operations is the primary, and the replicating members are secondaries.”
http://docs.mongodb.org/manual/core/replication/

Furthermore, setting up multi-directional mutil-datacenter replication, with MongoDB, is not supported under it’s single master per shard/replica-set architecture. That was a very important use case for us and was a major reason for considering C*, which handles that scenario with minimal operational complexity.

Cheers,
Christos

By: Dwight Merriman

Dwight Merriman — Tue, 19 Feb 2013 16:20:31 +0000

Hi, one point of clarification, in MongoDB one doesn’t declare a master. The replica set members are peers and perform consensus elections to elect a primary at any given point-in-time.

So for example to set up a three node replica set one would run:*

$ mongo –host host0
> cfg =
{
_id : “mysetname”,
members : [
{_id : 0, host : “host0”},
{_id : 1, host : “host1”},
{_id : 2, host : “host2”},
]
}
> rs.initiate(cfg)

One might declare certain nodes as being secondary-only in some multi data center topologies, or in cases where one wants an analytics workload to only hit a certain slice of the cluster to assure a high QOS for the operational side of the application.

We do get requests from MongoDB users with very large clusters for new cluster management/operational features, that is certainly a high priority on the project roadmap.