Deploying Virtualized Hadoop® Systems with VMware vSphere® Big Data Extensions™
About the Authors
Xinhui Li is a senior member of the development staff at VMware. Her work focuses on the design and optimization of distributed systems, most recently in the HPC and big data fields. She works on the adoption of new technologies on vSphere and provides technical contributions for the vSphere Big Data Extensions product. She writes technical documents, blog articles, and academic papers regularly.
Justin Murray is a senior technical marketing manager at VMware. He has worked at the company since 2007 in various roles, with a main focus on helping customers and partners use VMware products for deploying their applications on Hadoop and other platforms. To this end, he creates technical material for consumption by architects in the virtualization and application spaces and gives talks regularly on these subjects.
VMware, Inc. 3401 Hillview Avenue Palo Alto CA 94304 USA Tel 877-486-9273 Fax 650-427-5001 www.vmware.com
Copyright © 2014 VMware, Inc. All rights reserved.
This document introduces the concepts behind the VMware vSphere® Big Data Extensions™ technology. Big Data Extensions is part of the vSphere product family as of vSphere 5.5 and can also be used with installations of vSphere 5.0. The document also presents a set of deployment patterns and reference architectures that adopters of Big Data Extensions can implement to apply the technology to their own Hadoop® work.
Big Data Extensions provides an integrated set of management tools to help enterprises deploy, run, and manage Hadoop workloads executing in virtual machines on a common infrastructure. Through the VMware vCenter™ user interface, users can manage and scale Hadoop clusters more easily. Big Data Extensions complements and integrates with the vSphere platform through use of a vCenter plug-in and a separate VMware Serengeti™ Management Server running within a virtual appliance.
This document can be used as a starting point for a new installation or for rearchitecting an existing environment. The examples given here can be adapted to suit the environment, based on requirements and the available resources.
In this document, we confine the discussion to Hadoop versions earlier than 2.0, so Hadoop 2.0 YARN technology is not covered here.
This document addresses the following topics:
• An overview of the Hadoop and Big Data Extensions technologies
• Considerations for deploying Hadoop on vSphere with Big Data Extensions
• Architecture and configuration of Hadoop systems on vSphere
Sponsored by VMware