Quantifying Self-Reported Adverse Drug Events on Twitter: Signal and Topic Analysis
By
Vassilis Plachouras, Thomson Reuters Research & Development, 1 Mark Square, London, EC2A 4EG, UK
Jochen L. Leidner, Thomson Reuters Research & Development, 1 Mark Square, London, EC2A 4EG, UK
Andrew G. Garrow, Thomson Reuters IP & Science, 77 Hatton Garden, London, EC1N 8JS, UK
andrew.garrow@tr.com vassilis.plachouras@tr.com jochen.leidner@tr.com
ABSTRACT
When a marketed drug exhibits side effects, a well-functioning ecosystem of pharmaceutical drug suppliers includes regulators and pharmaceutical companies that respond to them.
Existing systems for monitoring adverse drug events, such as the FDA Adverse Event Reporting System (FAERS) in the US, have shown limited effectiveness due to the lack of incentives for healthcare professionals and patients to report. While social media present opportunities to mine information about adverse events in near real time, important questions must still be answered to understand their impact on pharmacovigilance. First, it is not known how many relevant social media posts appear per day on platforms like Twitter, i.e., whether there is "enough signal" for a post-market pharmacovigilance program based on Twitter mining. Second, it is not known what other topics users discuss in posts mentioning pharmaceutical drugs.
In this paper, we outline how social media can be used as a human sensor for drug use monitoring. We introduce a large-scale, near real-time system for computational pharmacovigilance and use it to estimate the order of magnitude of the daily volume of tweets self-reporting pharmaceutical drug side effects. The processing pipeline comprises a set of cascaded filters followed by a supervised machine learning classifier. The cascaded filters quickly reduce the volume to a manageable sub-stream, from which a Support Vector Machine (SVM) classifier identifies adverse events using a rich set of features that capture surface-textual properties as well as domain knowledge about drugs, side effects and the Twitter medium. On a dataset of 10,000 manually annotated tweets, an SVM classifier achieves F1 = 60.4% and AUC = 0.894. For a drug universe comprising 2,600 keywords, the classifier yields 721 tweets per day. We also investigate what other topics are discussed in posts mentioning pharmaceutical drugs. We conclude by suggesting an ecosystem in which regulators and pharmaceutical companies use social media to obtain feedback about the consequences of pharmaceutical drug use.
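The cascade-then-classify architecture summarized above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the lexicons, the three filters, the features and the weights are all hypothetical. In particular, the final stage is a hand-set linear decision function w·x + b, the form a trained linear SVM applies at prediction time, standing in for a model actually trained on annotated tweets.

```python
import re
from typing import Callable, Iterable, Iterator

# Hypothetical lexicons for illustration only (the paper's drug universe
# comprises about 2,600 keywords).
DRUGS = {"ibuprofen", "prozac", "lipitor"}
SIDE_EFFECTS = {"headache", "nausea", "dizzy", "insomnia"}
FIRST_PERSON = {"i", "me", "my", "i'm"}

TOKEN = re.compile(r"[a-z']+")


def tokens(text: str) -> list[str]:
    """Lowercase word tokens."""
    return TOKEN.findall(text.lower())


# Cheap filters, each dropping tweets early so later stages see less volume.
def is_not_retweet(text: str) -> bool:
    return not text.startswith("RT ")


def mentions_drug(text: str) -> bool:
    return any(t in DRUGS for t in tokens(text))


def first_person(text: str) -> bool:
    return any(t in FIRST_PERSON for t in tokens(text))


FILTERS: list[Callable[[str], bool]] = [is_not_retweet, mentions_drug, first_person]


def cascade(stream: Iterable[str]) -> Iterator[str]:
    """Yield only tweets that pass every filter; all() short-circuits,
    so later filters run only on tweets that survived earlier ones."""
    for text in stream:
        if all(f(text) for f in FILTERS):
            yield text


def features(text: str) -> dict[str, float]:
    """Hypothetical binary features mixing domain and medium knowledge."""
    toks = tokens(text)
    return {
        "has_side_effect": float(any(t in SIDE_EFFECTS for t in toks)),
        "has_mention": float("@" in text),
        "has_url": float("http" in text),
    }


# Hand-set weights standing in for a trained linear SVM: score = w.x + b.
WEIGHTS = {"has_side_effect": 2.0, "has_mention": -0.5, "has_url": -1.0}
BIAS = -1.0


def decision(text: str) -> float:
    x = features(text)
    return sum(WEIGHTS[k] * v for k, v in x.items()) + BIAS


def classify(stream: Iterable[str]) -> list[str]:
    """Run the filter cascade, then keep tweets with a positive score."""
    return [t for t in cascade(stream) if decision(t) > 0]


if __name__ == "__main__":
    tweets = [
        "ibuprofen gave me a headache lol",
        "RT ibuprofen made me dizzy",
        "Lipitor price drop http://example.com",
        "my prozac refill is ready",
    ]
    print(classify(tweets))
```

Placing the cheapest and most selective filters first keeps the comparatively expensive feature extraction and classification off most of the stream, which is the motivation for a cascade in a high-volume setting such as Twitter.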
Full paper (PDF): Plachouras-Leidner-Garrow-2016-SMSoc