Getting Up to Speed on Hadoop and Big Data
By Cynthia M. Saracco, Senior Solution Architect, IBM Silicon Valley Lab
It’s a question I’ve often heard during my 4 years of working on Big Data projects: How can someone get up to speed quickly on Big Data technologies, particularly those involving Hadoop?
While there’s no way to go from novice to expert in a few hours or a few days, there are a number of good options to help you build skills quickly. I’ve outlined my top 3 recommendations below.
1. Understand the concepts.
Spend a bit of time learning the basic concepts: what the technology is designed to do, how it’s being used, how experts expect it to evolve, and so on. You can go about this in several ways. The Apache site has ample technical information on Hadoop and its related projects; various online courses are available from Big Data University, Udacity, and vendor sites; several publishers offer digital and hard copy books on the Hadoop ecosystem, including O’Reilly, Manning, McGraw Hill, and others; and numerous Web sites publish technical articles.
If you prefer presentations and videos, vendor and conference Web sites offer a good selection. For example, you’ll find a range of videos on Hadoop and Big Data topics on IBM’s site.
It doesn’t hurt to explore market dynamics and trends impacting Big Data technologies. Some industry analysts (such as Gartner Group, IDC, and Forrester) publish free summaries of their market analysis research or offer free attendance at certain Web seminars.
2. Get hands-on experience.
Reading about a technology, watching videos, and playing recorded demos will only get you so far. If you really want to learn about Big Data technologies, plan to spend some time using them. In the Hadoop arena, just about every vendor that offers a packaged distribution provides some sort of free sandbox environment.
IBM, for example, offers a free VMware image, a free software image for installing on your private cluster, and a free cloud service via Bluemix (look for the Analytics for Hadoop service).
Find a sandbox that you like, and get to work. Start simple, but focus on the technologies you think are most relevant to your organization. Open source community sites, particularly Apache project sites, often contain samples and introductory materials to help you get started. And many vendor sites provide sample tutorials or application scenarios, too.
In fact, I’ve had so many requests for such materials that I’ve published some in the tutorials section of Hadoop Dev, IBM’s Web site for Hadoop developers.
3. Network with your peers.
Finally, look for ways to learn from your peers. Many regions have active Meetup communities focused on Big Data, where you can attend a brief lecture on a technical topic, see a live demo, or simply network with others who are working on Big Data projects. In addition, conferences and workshops, both academic and commercial, offer opportunities for candid discussions with more advanced users.
If you don’t live in an area where periodic face-to-face networking opportunities are readily available, visit online forums dedicated to the technologies that interest you. And, once you have some practical experience, don’t forget to share what you’ve learned wit