On gaining Knowledge of Diabetes using Graphs. Interview with Alexander Jarasch
“The challenge is that we have to combine lots of different types of data, simultaneously, depending on genetics, epigenetics, different subject matter areas such as lipidomics, metabolomics, the lifestyle and behaviour of the patient and looking at people in different cultures and environments. The variety of data we need to analyse is a major challenge, which is why from a data perspective we use graph. It is here we can make the links to answer biomedical queries.” –Alexander Jarasch.
I have interviewed Alexander Jarasch, head of data and knowledge management at the German Center for Diabetes Research (DZD). We discussed the main challenges in trying to understand more about diabetes, and how diabetes researchers are using graph database technology to create knowledge graphs and find hidden connections in medical data.
Q1. You are the head of data and knowledge management at the German Center for Diabetes Research (DZD). What are your main tasks?
Alexander Jarasch: My team fulfils several responsibilities within DZD. These include IT infrastructure, which can encompass databases and data transfer services, and, as the second part of our remit, data management and knowledge management.
Q2. Diabetes is one of the most widespread diseases worldwide. What are the main challenges in trying to understand more about diabetes?
Alexander Jarasch: Diabetes is a metabolic disease, and a complex area to understand. It is not yet obvious what causes type 2 diabetes, but it is clearly linked to obesity. Here, we try to understand the molecular mechanisms, where diabetes starts and how we can try and prevent it. The challenge is that we have to combine lots of different types of data, simultaneously, depending on genetics, epigenetics, different subject matter areas such as lipidomics, metabolomics, the lifestyle and behaviour of the patient and looking at people in different cultures and environments.
All these dependencies are connected to each other. Metabolism is connected to the environment, genetics, epigenetics and so forth. The big challenge is to see this not just from one perspective, but from as many perspectives at the same time as we can get.
From a data management point of view it is not easy bringing all this patient-related data together with basic research data, and then to combine it with publicly available data, all held in disparate data stores and sources. We need to bring this heterogeneous data together and connect it in a very clear way.
Q3. What is the status of research in treating and preventing the disease?
Alexander Jarasch: Diabetes is not currently curable. We have to distinguish between type 1 and type 2 diabetes. Preventing type 1 is not relevant, as it is genetic and one inherits it. Preventing type 2 is very complicated. Obviously it is suggested that patients lead a healthier life, play more sport and drink less alcohol. But some patients don’t respond to lifestyle interventions.
The research itself is very complex and diverse. You can look at it from the patient side, the basic research side or the animal model side. Preventing diabetes is a complicated field and the research is ongoing. There is no clear outcome for the patient at present.
Q4. How do you gain knowledge of diabetes from the datasets and the databases you already have?
Alexander Jarasch: We have different types of data: patient data from clinical trials, animal models, basic research; basically all the data from the various omics. We analyse this to gain more knowledge by connecting this data and viewing it all simultaneously. We also look to gain knowledge from the large data sets by applying machine learning. On the database side we have introduced graph databases in the form of Neo4j in order to create knowledge graphs.
Q5. If you look at the characteristics of Big Data: Volume, Variety, Velocity, Veracity; which ones are relevant for you?
Alexander Jarasch: I would not highlight any one of these, as they all have the same level of importance. If we don’t have enough data, we don’t have statistical significance; if we don’t have a variety of data, we can’t distinguish between its different states. If we don’t have high quality data we can’t keep up with the velocity necessary to answer the questions. The variety of data we need to analyse is a major challenge, which is why from a data perspective we use graph. It is here we can make the links to answer biomedical queries.
Q6. What are the main benefits when you start connecting patients’ data?
Alexander Jarasch: The main benefit of connecting a patient’s data, which could also incidentally be an animal model, is that you can see the data from a number of perspectives. The more parameters you have the more complete the puzzle can be. The benefit here is being able to see the patient from many different sides. One discipline is not sufficient to answer the biomedical questions or help in the prevention of diabetes.
We can also connect between different centers. Diabetes, for example, has co-complications with other diseases. These include cancer, cardiovascular disease and Alzheimer’s. We can now connect and look at these different types of data and better understand how symptoms and causes interconnect.
Q7. In one of your use cases you have studied prediabetes, using graphs to connect data from animal models, genetics, metabolomics and literature to deduce causes of prediabetes in humans. What results have you obtained so far?
Alexander Jarasch: We have connected different types of public data and our own data. One result is the hypothesis of seven metabolites that overlap between human genomic data and that seen in a prediabetes pig model. This is now under further investigation and we will dig deeper. The question is which pathways do these metabolites follow and how are they regulated in the body? It is in itself a very complex question.
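As a rough illustration of how such an overlap could be found once the data sets are connected, here is a minimal Python sketch. It is not DZD's actual pipeline or data model: the node names, the relation types `ASSOCIATED_WITH` and `ALTERS_LEVEL_OF`, and all edges are hypothetical, and a small in-memory edge list stands in for a graph database.

```python
from collections import defaultdict

# Hypothetical edges (source, relation, target); none of these are DZD data.
EDGES = [
    ("human_gene_A", "ASSOCIATED_WITH", "metabolite_1"),
    ("human_gene_A", "ASSOCIATED_WITH", "metabolite_2"),
    ("human_gene_B", "ASSOCIATED_WITH", "metabolite_3"),
    ("pig_model_prediabetes", "ALTERS_LEVEL_OF", "metabolite_2"),
    ("pig_model_prediabetes", "ALTERS_LEVEL_OF", "metabolite_3"),
    ("pig_model_prediabetes", "ALTERS_LEVEL_OF", "metabolite_4"),
]


def targets_by_relation(edges):
    """Group target nodes by the relation type that points at them."""
    grouped = defaultdict(set)
    for _source, relation, target in edges:
        grouped[relation].add(target)
    return grouped


def overlapping_metabolites(edges):
    """Return metabolites reached from both the human and the pig-model side."""
    grouped = targets_by_relation(edges)
    return sorted(grouped["ASSOCIATED_WITH"] & grouped["ALTERS_LEVEL_OF"])


print(overlapping_metabolites(EDGES))
```

In a graph database the same question would be a pattern match over two relationship types meeting at a shared metabolite node; the set intersection here plays that role.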
Q8. What is your experience so far in using graph technology and specifically Neo4j?
Alexander Jarasch: We are now at a point with graph databases where we can easily connect different types of data, and where the drawings and brainstorming sessions with researchers come very close to the data model. This makes it much easier to query data, so that even non-computer scientists can answer questions. When it comes to Neo4j it is easy to install and implement. The query language, Cypher, is easy to understand and the visualisation software is again very promising for non-computer scientists. Essentially, it makes it far easier for us to combine different types of data.
Q9. What are the main benefits in using graph technology in your area of work?
Alexander Jarasch: The main benefit of graph technology is its ability to connect heterogeneous data across different locations and species. This is possible with relational databases, but it is very complicated. We do still use relational databases as they are connected to different devices and recurring processes and are fit for purpose in these roles. It is in combining and connecting heterogeneous data that graph technology has the greatest impact. This is a situation where relational databases are rather limited.
Q10. Do you think that connecting data and applying modern machine learning techniques will help scientists get closer to understanding this complex disease and hopefully help to care for patients in the future?
Alexander Jarasch: Yes, I would definitely agree with this. Connecting different types of data is key to modern data analysis, especially in the life science and health care industries. Of course this makes the process much more complex and far bigger. Applying machine learning techniques can help to cope with this and to gain knowledge from many data sources. This provides us with a better understanding of diseases in general, I would say. We are applying ML techniques on our big data sets. One example would be to cluster patient groups in order to identify different subtypes of diabetes.
The question is how we can distinguish between patients (or patient groups) and treat people individually when they get diabetes. Some people, for example, don’t react to lifestyle intervention when it comes to diabetes. We have tall, lean people who have diabetes, obese people with diabetes but also obese people who don’t have diabetes. Obviously, the mechanisms behind that must be quite different from each other, and thus a single therapy or prevention for all people will most likely not work. That’s why we are connecting data sources and trying to cluster our patients into subgroups to come up with individual treatments or suggested interventions. Graph technology provides us with a way of connecting relevant data sources.
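To make the idea of clustering patients into subgroups concrete, here is a minimal sketch using plain k-means. The interview does not say which clustering method DZD uses, so the algorithm is an assumption, and so are the two features (body mass index and fasting glucose in mmol/L) and all patient values, which are invented for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points serve as initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical patients as (BMI, fasting glucose in mmol/L); not real data.
patients = [
    (22.0, 7.9), (35.0, 8.1), (21.5, 8.3),
    (34.0, 7.8), (36.0, 8.4), (23.0, 8.0),
]

labels, centroids = kmeans(patients, k=2)
print(labels)
```

With these toy values the lean and obese patients fall into separate groups even though their glucose levels are similar. A real analysis would use many more parameters, would need to scale the features so that no single one dominates the distance, and would have to choose the number of clusters carefully.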
Alexander Jarasch currently works at the German Center for Diabetes Research and is responsible for data and knowledge management. Before that, he worked at “Pharma Research and Early Development (pRED)” at Roche. Alexander does research in Computing in Mathematics, Natural Science, Engineering and Medicine, Databases and Data Mining.