The Definitive Guide to Doing Data Science for Good
by Tobias Pfaff, founder of DataLook.
This article has benefited from comments by Katharine Bierce, Christian Bracher, Quentin Dumont, Daniel Kirsch, Eric Liu and Miriam Young.
You are a fully equipped (or aspiring) data scientist and want to use your precious skills to solve problems that really matter to the world? Welcome to the club. The good news is that there are many ways for data scientists to do good. However, the path is not always well trodden, and you might need to show some initiative. This article gives you some insight into how you can get involved: through group meetings and events, as a volunteer, or in paid positions.
Getting started — online data science competitions
Online data science competitions are a good place to start (without even having to leave your couch!). They let you sharpen your skills and get familiar with different problem types before you actually get involved.
The home of data science competitions is certainly Kaggle. Watch out for competitions that tackle social problems, such as the diabetic retinopathy detection competition or the Africa soil property prediction challenge.
DrivenData is a rather new competition platform that focuses solely on social challenges. This makes it a perfect place to test your skills while doing good.
Occasionally, you will find other data science for good competitions. The IBM Big Data for Social Good Challenge was one of them (but beware: you are not free to choose your own tools there).
Another great way to get started is to replicate one of the projects in our #openimpact shortlist (magic ball icon = predictive analytics inside!).
Group meetings and events
A good opportunity to mingle with like-minded folks in person is attending (or starting) a meetup. The following table lists data science meetups around the world with a focus on social good:
| Name | Creation year | Members | Past events | Location |
|---|---|---|---|---|
| Data for Good – Data Scientists & Devs doing GOOD | 2012 | 661 | 13 | Toronto |
| DataKind NYC | 2012 | 2041 | 22 | New York |
| Data for Good – Calgary | 2013 | 358 | 14 | Calgary |
| Data for Good Montreal – Data Scientists & Devs doing GOOD | 2013 | 140 | 1 | Montréal |
| Brussels Data Science Meetup | 2014 | 1279 | 35 | Brussels |
| Data for Good | 2014 | 581 | 2 | Paris |
Source: Own compilation. Member and event counts were retrieved from meetup.com at the time of writing.
You should also keep your eyes and ears open for dedicated hackathons. Past examples include the Thorn hackathon in San Francisco and the Bayes Impact hackathon, which takes place annually (also in San Francisco).
DataKind is a true pioneer in the field and does a phenomenal job of getting volunteers excited about harnessing the power of data science in the service of humanity. If you live close to one of the DataKind Chapters, you can attend their meetups and engage further in the following ways:
- Attend a DataDive:
DataDives are weekend-long, marathon-style events where dozens of volunteers rally together to help 3-4 social change organizations do initial data analysis, exploration, and prototyping. These events are free for organizations, open to volunteers of all skill levels and take place around the world.
- Get selected for a DataCorps team:
DataCorps is DataKind’s signature program that brings together teams of pro bono data scientists with social change organizations on long-term projects that use data science to transform their work and their sector. These projects last between one and six months and are structured so that volunteers can work in their spare time.
DataKind also hosts a neat “Data Do-Gooding Calendar”.
Do you live in Brazil? Then you might want to check out Data4Good. This initiative is building a network of volunteers, produces educational content on using data for social good (mostly infographics), and provides consulting services for social organizations (more about Data4Good in this blog post).
What if you are not so much into meetups, or if you are living on a remote farm and all you have is a cat, an internet connection and “The Elements of Statistical Learning”?
Well, one thing you can do is look for skilled data volunteer postings on LinkedIn. However, at the time of writing I got 0 results for “volunteer data scientist” and 1 result for “volunteer data analyst”. If “volunteer data entry” is what you are looking for, on the other hand, there is plenty to do.
If LinkedIn doesn’t hook you up with an exciting problem, you should check out the Digital Humanitarian Network. They leverage digital networks for humanitarian response to crises and disasters. It took me a while to understand their “activation facilitation process”, but it’s a great idea (this diagram helps). You can volunteer through their member organizations, which offer data science and coding tasks of varying complexity (check out this diagram to see the members’ services).
Some people are even thinking about virtual marketplaces that match non-profits or local governments with volunteer data scientists. In the same vein, we are currently thinking about how we can match up parties on datalook.io. On one side are non-profit organizations or government agencies that see a project on DataLook and think it could be replicated to solve their own problem, but lack the necessary skills in-house. On the other side are local or remote data scientists who would be interested in helping to realize the project. If you think this is a great idea or want to discuss it with us, please get in touch.
You see that there are quite a few opportunities for volunteering in the field. But what if you need some dough to pay the bills?
Paid jobs (temporary / part-time)
Temporary and part-time positions are usually organized as fellowships. The most prominent fellowship in the field is probably the Data Science for Social Good Fellowship at the University of Chicago. It was started in 2013 and runs as a 3-month summer program in which fellows, working in small teams, partner with non-profits and local government authorities to tackle socially relevant problems using data science. The fellowship is sponsored by the Eric and Wendy Schmidt Foundation. [$11-16k, 12 weeks]
The program has a smaller sibling in Atlanta: Data Science for Social Good Atlanta. The summer internship program was launched in 2014. Students in the program work as paid interns on projects coming from the City of Atlanta and local non-profits. [$8k, 10 weeks]
If you are a college student in the New York City area, you are eligible for the Microsoft Research Data Science Summer School. In the past, students taking part in the summer school have worked on NYC-related challenges. [$5k, 8 weeks]
Code for America fellows are usually web/app developers, but a few of the fellows are data scientists working on problems in different U.S. cities. [$50k, 11 months]
All these fellowships are run by organizations that partner with non-profits and the government. There are also non-profits that offer their own fellowships. An example is the Thorn Innovation Lab, where data scientists help fight child sexual exploitation. [$100k, 1 year]
Apart from fellowships, you might become what I call a “data angel”: a full-time data scientist working at a company that partners with a non-profit. You help the non-profit for a limited time while receiving your salary from your company. Companies that offer such Corporate Social Responsibility (CSR) programs include Pivotal, Teradata, Cloudera, Palantir, and Informatica.
If your company wants to establish such a CSR program in Germany, get in touch with us.
Paid jobs (permanent / full-time)
DataKind announced in 2014 that it would create a full-time, in-house Data Science Team for Good in New York City. Their first data scientist was hired in early 2015 (see here), and you should check out DataKind’s careers page for upcoming positions. @DataKind also occasionally tweets “Data for Good” job openings in general.
Bayes Impact is a Y Combinator backed non-profit in San Francisco. They launched in 2014 and their approach is to take on a few large projects at a time rather than spreading their resources across many smaller projects. Their vision is to build operational data science solutions for large-scale problems that affect millions of people. Project partners are large NGOs and the federal government. Bayes Impact is always looking for big-hearted data scientists, data engineers and software engineers. You can apply here.
As non-profits are beginning to understand that data science can help them achieve their goals, a few of them have already created full-time positions for data scientists. Examples are Change.org, Big Mountain Data, and Crisis Text Line.
The government sector, too, is slowly beginning to open positions for data scientists. At the local level, the Mayor’s Office of Data Analytics in New York City has achieved some impressive impact with its projects. At the federal level, the White House recently appointed data science veteran DJ Patil as U.S. Chief Data Scientist.
You might also want to look for jobs at for-profit companies whose mission is to use cutting-edge data science to solve pressing societal problems. One example is Enlitic in San Francisco, which wants to revolutionize diagnostic healthcare with deep learning. Another is Edgeflip in Chicago, which wants to help non-profits and issue-based groups reach their online communities more effectively using data science. You should also have a look at consultancies like Real Impact Analytics (Brussels), SocialCops (New Delhi) or Civis Analytics (Chicago) that have interesting social good projects in their portfolios. These are just examples; there are many more out there.
And then, of course, there is the vast field of academia and science, with opportunities to apply your data science skills to the greater good. Fields that produce enormous amounts of data, like astronomy, have a huge demand for data scientists. Check out the article by Jake Vanderplas for detailed thoughts on data science in academia.
Off the beaten path
From my German perspective, it seems that the vast majority of opportunities to apply your skills for social good are in the U.S. I became interested in the field in 2013, and I couldn’t find an organization in my city that would let me use my skills for social good. Instead of giving up, I tried to convince a federal authority to use predictive analytics for prioritizing food safety inspections. That didn’t work, so I founded DataLook. Through DataLook, I’m now in touch with many people in Germany and abroad who share my interests. It’s a long road, and we are still looking for non-profits and government agencies as partners to realize projects (get in touch!). I hope that this article helps some of you connect with existing initiatives, or start your own and leave the beaten path in order to do what you want to do: use data science to tackle real problems.
Originally posted at DataLook blog.