Fighting Covid-19 with Graphs. Interview with Alexander Jarasch
“There is an enormous number of applications that we can provide. Just to mention a few of them: Scanning literature and patents for genes, proteins, targets and drugs. Finding information in clinical trials, such as which drugs are used and what inclusion/exclusion criteria exist. Automatically finding and querying genes with their synonyms, their connected gene functions and the tissues in which they are expressed.” –Alexander Jarasch
I have interviewed Dr. Alexander Jarasch, head of the Data and Knowledge Management department at the German Center for Diabetes Research (DZD). Alexander is a team member of The CovidGraph Project.
CovidGraph is a non-profit collaboration of researchers, software developers, data scientists and medical professionals.
The aim of the project is to help researchers quickly and efficiently find their way through COVID-19 datasets and to provide tools that use artificial intelligence, advanced visualization techniques, and intuitive user interfaces.
Q1. What is CovidGraph?
Alexander Jarasch: CovidGraph is a knowledge graph that connects text data from scientific literature and intellectual property with clinical trials, drugs and entities from biomedical research such as genes, proteins, their function and regulation.
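To make the idea of a knowledge graph concrete, here is a minimal sketch in Python. It is not the CovidGraph implementation; the class TinyGraph, the node ids, the relationship names MENTIONS and CODES, and the example records are all hypothetical and chosen only for illustration. It shows how a paper, a gene and a protein can be stored as labeled nodes and connected by typed edges, so that one traversal leads from a publication to the proteins behind the genes it mentions.

```python
from collections import defaultdict


class TinyGraph:
    """A toy labeled property graph (illustrative only, not CovidGraph code)."""

    def __init__(self):
        self.nodes = {}                 # node id -> (label, properties)
        self.edges = defaultdict(list)  # source id -> [(relation, target id)]

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = (label, props)

    def add_edge(self, source, relation, target):
        self.edges[source].append((relation, target))

    def neighbors(self, source, relation):
        """Return ids reachable from `source` over edges of type `relation`."""
        return [t for r, t in self.edges.get(source, []) if r == relation]


def build_example_graph():
    """Build a three-node graph from made-up records."""
    g = TinyGraph()
    g.add_node("paper:1", "Paper", title="Hypothetical article on cell entry")
    g.add_node("gene:ACE2", "Gene", symbol="ACE2")
    g.add_node("protein:ACE2", "Protein", name="Angiotensin-converting enzyme 2")
    g.add_edge("paper:1", "MENTIONS", "gene:ACE2")
    g.add_edge("gene:ACE2", "CODES", "protein:ACE2")
    return g


def proteins_for_paper(g, paper_id):
    """Follow Paper -MENTIONS-> Gene -CODES-> Protein and collect protein names."""
    names = []
    for gene_id in g.neighbors(paper_id, "MENTIONS"):
        for protein_id in g.neighbors(gene_id, "CODES"):
            names.append(g.nodes[protein_id][1]["name"])
    return names


if __name__ == "__main__":
    graph = build_example_graph()
    print(proteins_for_paper(graph, "paper:1"))
```

In a graph database the same two-hop question would typically be written as a single pattern-matching query instead of nested loops.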
Q2. What is the aim of this project?
Alexander Jarasch: To provide researchers with Covid-19-relevant information that is connected and easy to query. Usually, this requires a lot of manual, tedious and error-prone work that can be sped up by using CovidGraph.
Q3. Who is this project aimed at?
Alexander Jarasch: Researchers and medical doctors, regardless of their field of research. Since Covid-19 has many complications, CovidGraph is a valuable resource for getting connected data.
Q4. What data sets do you use?
Alexander Jarasch: Public datasets from literature, patents, case numbers, clinical trials, therapeutic targets, genes, transcripts, proteins, gene ontology, gene expression data, pathways. There are more to come. (See list at the end of the interview)
Q5. How do you check the quality and reliability of the COVID-19 datasets you use in your project?
Alexander Jarasch: The data sources are well-established databases that have been used and cited for years.
Q6. Why use knowledge graphs?
Alexander Jarasch: Research data, especially in healthcare, is highly connected, very heterogeneous and often unstructured. Today, these data sources are siloed and the connections between them are not available. Connecting the data sources enables a more comprehensive view of the data. Simply by connecting the data, insights emerge that were hidden before.
Q7. Which tools are you developing to explore papers, patents, existing treatments and medications around the family of the corona viruses?
Alexander Jarasch: On the one hand, we provide user interfaces for interactive data browsing and querying. For example, users can use Linkurious, Graphiken, derive GmbH and Neo4j Bloom. On the other hand, we are developing a more specific UI for users from biomedical research together with yWorks.
Q8. What are the applications that The CovidGraph project provides?
Alexander Jarasch: There is an enormous number of applications that we can provide. Just to mention a few of them: Scanning literature and patents for genes, proteins, targets and drugs. Finding information in clinical trials, such as which drugs are used and what inclusion/exclusion criteria exist. Automatically finding and querying genes with their synonyms, their connected gene functions and the tissues in which they are expressed.
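As a rough illustration of the last application, the sketch below resolves a gene name or synonym to one canonical record and returns its functions and tissues. The records (GENE_A, GENE_B, their synonyms, functions and tissues) are placeholders invented for this example, and the function names are hypothetical; the real CovidGraph draws this information from the databases listed at the end of the interview.

```python
# Placeholder records; none of these values come from a real database.
GENES = {
    "GENE_A": {
        "synonyms": ["GA1", "alpha-gene"],
        "functions": ["example function 1"],
        "tissues": ["tissue X", "tissue Y"],
    },
    "GENE_B": {
        "synonyms": ["GB7"],
        "functions": ["example function 2"],
        "tissues": ["tissue Y"],
    },
}


def build_synonym_index(genes):
    """Map every symbol and synonym (case-insensitive) to its canonical symbol."""
    index = {}
    for symbol, record in genes.items():
        index[symbol.lower()] = symbol
        for synonym in record["synonyms"]:
            index[synonym.lower()] = symbol
    return index


def lookup_gene(name, genes, index):
    """Resolve a name or synonym; return symbol, functions and tissues, or None."""
    symbol = index.get(name.lower())
    if symbol is None:
        return None
    record = genes[symbol]
    return {
        "symbol": symbol,
        "functions": list(record["functions"]),
        "tissues": list(record["tissues"]),
    }


INDEX = build_synonym_index(GENES)

if __name__ == "__main__":
    # A synonym in any letter case resolves to the canonical record.
    print(lookup_gene("ga1", GENES, INDEX))
```

Building the index once keeps each lookup a single dictionary access, regardless of how many synonyms a gene has.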
Q9. Who is maintaining the data stored in the Knowledge Graph? Is it centralized or distributed?
Alexander Jarasch: It’s maintained by a community of volunteers and data is stored on a publicly accessible server.
Q10. The COVID*Graph should provide the data basis for understanding the processes involved in a coronavirus infection. What did you learn so far?
Alexander Jarasch: During data integration and preliminary data analysis, we found indications that Covid-19 affects more than just lung cells. Other researchers support this finding, which is reported in several articles. We also found that ACE2 (angiotensin-converting enzyme 2) is the gene mentioned most often in scientific articles. This seems obvious, since it is the receptor the coronavirus uses to enter cells.
Qx. Anything else you wish to add?
Alexander Jarasch: We are a private-public partnership, with volunteers from several companies working with graph technology. We are a non-profit community and hope to support researchers and doctors in finding a cure for Covid-19 / SARS-CoV-2 and related diseases.
Dr. Alexander Jarasch is the head of the Data and Knowledge Management department at the German Center for Diabetes Research (DZD). His team supports scientists from basic research and clinical research with IT solutions from data management to data analysis. New insights from diabetes research and its complications are stored in a knowledge graph connecting data from basic research, animal models and clinical trials.
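The kind of count behind the ACE2 observation can be sketched as follows. This is an assumption about the general approach, not the project's actual pipeline: the function count_gene_mentions is hypothetical, the three texts are invented sentences, and GENE_B is a placeholder symbol. The sketch counts, for each gene symbol, how many texts mention it at least once, matching whole words only.

```python
import re
from collections import Counter


def count_gene_mentions(texts, gene_symbols):
    """Count, per gene symbol, how many texts mention it at least once."""
    # Word boundaries keep a short symbol from matching inside a longer one.
    patterns = {
        symbol: re.compile(r"\b" + re.escape(symbol) + r"\b")
        for symbol in gene_symbols
    }
    counts = Counter()
    for text in texts:
        for symbol, pattern in patterns.items():
            if pattern.search(text):
                counts[symbol] += 1
    return counts


# Invented sentences for illustration; not taken from any real article.
EXAMPLE_TEXTS = [
    "ACE2 is discussed as a receptor in this made-up abstract.",
    "This made-up abstract mentions both GENE_B and ACE2.",
    "This made-up abstract mentions no gene symbol at all.",
]

if __name__ == "__main__":
    counts = count_gene_mentions(EXAMPLE_TEXTS, ["ACE2", "GENE_B"])
    print(counts.most_common(1))
```

Plain symbol matching ignores synonyms and ambiguous abbreviations, so a real analysis would first need to resolve names to canonical genes.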
Dr. Jarasch received his PhD in structural bioinformatics and biochemistry from Ludwig-Maximilians University (LMU) in Munich and has a master’s degree in bioinformatics from the LMU and the Technical University of Munich.
He completed his postdoctoral training on behalf of Evonik Industries AG and Roche Diagnostics GmbH.
bioRxiv (pronounced “bio-archive”) is a free online archive and distribution service for unpublished preprints in the life sciences. It is operated by Cold Spring Harbor Laboratory, a not-for-profit research and educational institution. By posting preprints on bioRxiv, authors are able to make their findings immediately available to the scientific community and receive feedback on draft manuscripts before they are submitted to journals.
medRxiv (pronounced “med-archive”) is a free online archive and distribution server for complete but unpublished manuscripts (preprints) in the medical, clinical, and related health sciences.
The Lens is building an open platform for Innovation Cartography. Specifically, the Lens serves nearly all of the patent documents in the world as open, annotatable digital public goods that are integrated with scholarly and technical literature along with regulatory and business data. The Lens will allow document collections, aggregations, and analyses to be shared, annotated, and embedded to forge open mapping of the world of knowledge-directed innovation.
New UniProt portal for the latest SARS-CoV-2 coronavirus protein entries and receptors, updated independent of the general UniProt release cycle.
RefSeq: NCBI Reference Sequence Database. A comprehensive, integrated, non-redundant, well-annotated set of reference sequences including genomic, transcript, and protein.
The Gene Ontology resource. The mission of the GO Consortium is to develop a comprehensive, computational model of biological systems, ranging from the molecular to the organism level, across the multiplicity of species in the tree of life.
The GTEx Portal. The Genotype-Tissue Expression (GTEx) project is an ongoing effort to build a comprehensive public resource to study tissue-specific gene expression and regulation.
REACTOME is an open-source, open access, manually curated and peer-reviewed pathway database. Our goal is to provide intuitive bioinformatics tools for the visualization, interpretation and analysis of pathway knowledge to support basic and clinical research, genome analysis, modeling, systems biology and education.
ClinicalTrials.gov is a resource provided by the U.S. National Library of Medicine.
COVID-19 Response United Nations
COVID-19 Resources Johns Hopkins University.
COVID-19 Open Research Dataset (CORD-19)
In response to the COVID-19 pandemic, the Allen Institute for AI has partnered with leading research groups to prepare and distribute the COVID-19 Open Research Dataset (CORD-19), a free resource of over 44,000 scholarly articles, including over 29,000 with full text, about COVID-19 and the coronavirus family of viruses for use by the global research community.
The Lens COVID-19 Datasets
The Lens has assembled free and open datasets of patent documents, scholarly research works metadata and biological sequences from patents, and deposited them in a machine-readable and explorable form.
Ensembl Genome Browser
Ensembl is a genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation and transcriptional regulation. Ensembl annotates genes, computes multiple alignments, predicts regulatory function and collects disease data. Ensembl tools include BLAST, BLAT, BioMart and the Variant Effect Predictor (VEP) for all supported species. http://www.ensembl.org
NCBI Gene Database
Gene integrates information from a wide range of species. A record may include nomenclature, Reference Sequences (RefSeqs), maps, pathways, variations, phenotypes, and links to genome-, phenotype-, and locus-specific resources worldwide. https://www.ncbi.nlm.nih.gov/gene
The Gene Ontology Resource
The Gene Ontology (GO) knowledgebase is the world’s largest source of information on the functions of genes. This knowledge is both human-readable and machine-readable, and is a foundation for computational analysis of large-scale molecular biology and genetics experiments in biomedical research. http://geneontology.org
2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE
This is the data repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). It is also supported by the ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab (JHU APL). https://github.com/CSSEGISandData/COVID-19
United Nations World Population Prospects 2019
The 2019 Revision of World Population Prospects is the twenty-sixth round of official United Nations population estimates and projections that have been prepared by the Population Division of the Department of Economic and Social Affairs of the United Nations Secretariat. https://population.un.org/wpp/