On happiness and AI-powered journaling. Interview with Alon Halevy
” An AI powered assistant can give you much better advice the more it knows about you and if it can collect data without burdening you. While this challenge creates the obvious but surmountable privacy issues, there is an interesting data integration challenge here to collect data from the digital breadcrumbs we leave all over, such as posts on social media, photos, data from wearables. Reconciling all these data sets into a meaningful and useful signal is a fascinating research problem!”–Alon Halevy
I have interviewed Alon Halevy, CEO of Megagon Labs. We talked about happiness, AI-powered journaling and the HappyDB database.
Q1. What is HappyDB?
Alon Halevy: HappyDB is a crowd-sourced text database of 100,000 answers to the following question: what made you happy in the last 24 hours (or 3 months)? Half of the respondents were asked about the last 24 hours and the other half about the last 3 months.
We collected HappyDB as part of our research agenda on technology for wellbeing. At a basic level, we’re asking whether it is possible to develop technology to make people happier. As part of that line of work, we are developing an AI-powered journaling application in which the user writes down the important experiences in their day. The goal is that the smart journal will understand over time what makes you happy and give you advice on what to do. However, to that end, we need to develop Natural Language Processing technology that can understand better the descriptions of these moments (e.g., what activity did the person do, with whom, and in what context). HappyDB was collected in order to create a corpus of text that will fuel such NLP research by our lab and by others.
Q2. The science of happiness is an area of positive psychology concerned with understanding what behaviors make people happy in a sustainable fashion. How is it possible to advance the state of the art of understanding the causes of happiness by simply looking at text messages?
Alon Halevy: One of the main observations of the science of happiness is that a significant part of people’s wellbeing is determined by the actions they choose to do on a daily basis (e.g., encourage social interactions, volunteer, meditate, etc). However, we are often not very good at making choices that maximize our sustained happiness because we’re focused on other activities that we think will make us happier (e.g., make more money, write another paper).
Because of that, we believe that a journaling application can give advice based on personal experiences that the user has had. The user of our application should be able to use text, voice or even photos to express their experiences.
The text in HappyDB is meant to facilitate the research required to understand texts given by users.
Q3. What are the main findings you have found so far?
Alon Halevy: The happy moments we see in HappyDB are not surprising in nature — they describe experiences that are known to make people happy, such as social events with family and friends, achievements at work and enjoying nature and mindfulness. However, given that these experiences are expressed in so many different ways in text, the NLP challenge of understanding the important aspects of these moments are quite significant.
Q4. The happy moments are crowd-sourced via Amazon’s Mechanical Turk. Why?
Alon Halevy: That was the only way we could think of getting such a large corpus. I should note that we only get 2-3 replies from each worker, so this is not a longitudinal study about how people’s happiness changes over time.
The goal is just to collect text describing happy moments.
Q5. You mentioned that HappyDB is a collection of happy moments described by individuals experiencing those moments. How do you verify if these statements reflect the true state of mind of people?
Alon Halevy: You can’t verify such a corpus in any formal sense, but when you read the moments you see they are completely natural. We even have a moment from one person who was happy for getting tenure!
Q6. What is a reflection period?
Alon Halevy: A reflection period is how far back you look for the happy moment. For example, moments that cover a reflection period of 24 hours tend to mention a social event or meal, while moments based on a reflection of 3 months tend to mention a bigger event in life such as the birth of a child, promotion, or graduation.
Q7. The HappyDB corpus, like any other human-generated data, has errors and requires cleaning. How do you handle this?
Alon Halevy: We did a little bit of spell correcting and removed some moments that were obviously bogus (too long, too short). But the hope is that the sheer size of the database is its main virtue and the errors will be minor in the aggregate.
Q8. What are the main NLP problems that can be studied with the help of this corpus?
Alon Halevy: There are quite a few NLP problems. The most basic is to figure out what is the activity that made the person happy (and distinguish the words describing the activity from all the extraneous text). Who are the people that were involved in the experience? Was there anything in the context that was critical (e.g, a sunset). We can ask more reflective questions, such as was the person happy from the experience because of a mismatch between their expectations and reality? Do men and women express happy experiences in different ways? Finally, can we create an ontology of activities that would cover the vast majority of happy moments and reliably map text to one or more of these categories.
Q9. What analysis techniques did you use to analyse HappyDB? Were you happy with the existing NLP techniques? or is there a need for deeper NLP techniques?
Alon Halevy: We clearly need new NLP techniques to analyze this corpus and ones like it. In addition to standard somewhat shallow NLP techniques, we are focusing on trying to define frame structures that capture the essence of happy moments and to develop semantic role labeling techniques that map from text to these frame structures and their slots.
Q10. Is HappyDB open to the public?
Qx Anything else you wish to add?
Alon Halevy: Yes, I think developing technology for wellbeing raises some interesting challenges for data management in general. An AI powered assistant can give you much better advice the more it knows about you and if it can collect data without burdening you. While this challenge creates the obvious but surmountable privacy issues, there is an interesting data integration challenge here to collect data from the digital breadcrumbs we leave all over, such as posts on social media, photos, data from wearables. Reconciling all these data sets into a meaningful and useful signal is a fascinating research problem!
Dr. Alon Halevy is a computer scientist, entrepreneur and educator. He received his Ph.D. in Computer Science at Stanford University in 1993. He became a professor of Computer Science at the University of Washington and founded the Database Research Group at the university.
He founded Nimble Technology Inc., a company providing an Enterprise Information Integration Platform, and TransformicInc., a company providing access to deep web content. Upon the acquisition of Transformicby Google Inc., he became responsible for research on structured data as a senior staff research scientist at Google’s head office and was engaged in research and development, such as developing Google Fusion Tables. He has served as CEO of Megagon Labs since 2016.
Dr. Halevy is a Fellow of the Association of Computing Machinery (ACM Fellow) and received the VLDB 10-year best paper award in 2006.
–Paper: HappyDB: A Corpus of 100,000 Crowdsourced Happy Moments , Akari Asai, Sara Evensen, Behzad Golshan, Alon Halevy, Vivian Li, Andrei Lopatenko, Daniela Stepanov, Yoshihiko Suhara, Wang-Chiew Tan, Yinzhan Xu
–Software: BigGorilla is an open-source data integration and data preparation ecosystem (powered by Python) to enable data scientists to perform integration and analysis of data. BigGorilla consolidates and documents the different steps that are typically taken by data scientists to bring data from different sources into a single database to perform data analysis. For each of these steps, we document existing technologies and also point to desired technologies that could be developed.
The different components of BigGorilla are freely available for download and use. Data scientists are encouraged to contribute code, datasets, or examples to BigGorilla. We hope to promote education and training for aspiring data scientists with the development, documentation, and tools provided through BigGorilla.
–Software: Jo Our work is inspired by psychology research, especially a field known as Positive Psychology. We are developing “Jo” – an agent that helps you record your daily activities, generalizes from them, and helps you create plans that increase your happiness. Naturally, this is no easy feat. Jo raises many exciting technical challenges for NLP, chatbot construction, and interface design: how can we build an interface that’s useful but not intrusive. Read more about Jo!
– Data Integration: From Enterprise Into Your Kitchen, Alon Halevy – SIGMOD/PODS Conference 2017
Follows us on Twitter: @odbmsorg