Predicting Mood Disorder Risk — A Data Blending and Machine Learning Approach
MASTER’S THESIS
by Alex Klein and Florian Reifschneider
A thesis submitted in fulfillment of the requirements for the degree of Master of Science
Frankfurt Big Data Lab, Department of Computer Science, Goethe-University
Abstract
The ability to predict the risk of developing a mood disorder for a given individual with sufficient accuracy could be of substantial help in diagnosing and treating mood disorders before they become a major health issue.
This thesis is based on work conducted as part of the Geisinger Health Collider Project, a collaboration between Geisinger Health Systems and UC Berkeley that ran from fall 2015 to spring 2016. In this context, access was given to anonymized real-world clinical data on mood disorder patients, including their detailed patient history.
Using a data blending and machine learning approach, this thesis seeks to develop and evaluate a prediction model centered around the patient history that is able to predict the individualized risk for mood disorder development in order to facilitate early diagnosis in undiagnosed individuals. As part of data blending, the clinical data provided by Geisinger Health Systems was combined with multi-disciplinary data acquired from various sources, such as the U.S. Census Bureau.
The data blending strategy focused on examining the relationship between an individual’s personal environment and their mood disorder risk. The prediction model was trained on the blended data using several machine learning algorithms and evaluated thoroughly, both to validate the hypothesis that the personal environment influences mood disorder risk and to prove that an individual’s patient history can be used to accurately predict that risk.