GeCo (Data-Driven Genomic Computing) focuses on tertiary analysis for genomic data integration. It approaches the field as a new data-driven basic science built on a simple driving principle: data should express high-level properties of DNA regions and samples, and high-level data management languages should express biological questions through simple, powerful, orthogonal abstractions.
Although this idea is very simple, putting it into practice is far from trivial, as it requires a radical change to the dominant approach. Following these principles, it is possible to build a progressive revolution in genomic computing, leading to important outcomes such as integrated access to large repositories of sequences and an Internet of genomic computing services that provides Google-like processing and search.
During the five years of the ERC project, the system will be enriched with data analysis tools and environments and will be made increasingly efficient. One objective of the project is the creation of an “open source” system available to biological and clinical research. The GeCo project itself will provide public services that use only public data (anonymized and made available for secondary use, i.e., knowledge discovery), while use of the GeCo system within protected clinical contexts will enable personalized medicine, i.e., the adaptation of therapies to the specific genetic features of patients. The most ambitious objective is the development, during the five-year ERC project, of an “Internet for Genomics”, i.e., a protocol for collecting data from Consortia and individual researchers, and a “Google for Genomics”, supporting indexing and search over huge collections of genomic datasets.
- European GeCo Project Aims to Replace Dominant Genomic Computing Methods, GenomeWeb, 29-08-2016
- Kaitoua A, Pinoli P, Bertoni M, Ceri S. Framework for Supporting Genomic Operations. IEEE-TC, 2016 (in press). DOI: 10.1109/TC.2016.2603980
- Masseroli M, Pinoli P, Venco F, Kaitoua A, Jalili V, Palluzzi F, Muller H, Ceri S. GenoMetric Query Language: A novel approach to large-scale genomic data management. Bioinformatics 2015; 31(12): 1881-1888. DOI: 10.1093/bioinformatics/btv048
- Masseroli M, Kaitoua A, Pinoli P, Ceri S. Modeling and interoperability of heterogeneous genomic big data for integrative processing and querying. Methods, 2016 (in press). DOI: 10.1016/j.ymeth.2016.09.002
- Ceri S, Kaitoua A, Masseroli M, Pinoli P, Venco F. Data management for heterogeneous genomic datasets. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2016 (in press). DOI: 10.1109/TCBB.2016.2576447
- Fernandez JD, Lenzerini M, Masseroli M, Venco F, Ceri S. Ontology-based search of genomic metadata. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2016; 13(2): 233-247. DOI: 10.1109/TCBB.2015.2495179
- Montanari P, Bartolini I, Ciaccia P, Patella M, Ceri S, Masseroli M. Pattern Similarity Search in Genomic Sequences. IEEE Transactions on Knowledge and Data Engineering 2016; 28(11): 3053-3067. DOI: 10.1109/TKDE.2016.2595582
- Jalili V, Matteucci M, Masseroli M, Ceri S. Indexing Next-Generation Sequencing data. Information Sciences, Elsevier, 2016 (in press).
- Bertoni M, Ceri S, Kaitoua A, Pinoli P. Evaluating Cloud Frameworks on Genomic Applications. IEEE Big Data Conference, Santa Clara, Nov. 2015.
- Cumbo F, Fiscon G, Ceri S, Masseroli M, Weitschek E. TCGA2BED: extracting, extending, integrating, and querying The Cancer Genome Atlas. BMC Bioinformatics 2017; 18:6. DOI: 10.1186/s12859-016-1419-5