
Bioinformatic tools for Big Data in Omic studies with application to genomic inversion calling and multiomic data integration

dc.contributor Universitat de Vic - Universitat Central de Catalunya. Facultat de Ciències i Tecnologia
dc.contributor Universitat de Vic - Universitat Central de Catalunya. Màster Universitari en Anàlisi de Dades Òmiques
dc.contributor.author Pelegrí Sisó, M. Dolors
dc.date.accessioned 2021-01-08T17:48:13Z
dc.date.available 2021-01-08T17:48:13Z
dc.date.created 2020-09
dc.date.issued 2020-09
dc.identifier.uri http://hdl.handle.net/10854/6415
dc.description Academic year 2019-2020 es
dc.description.abstract Motivation: The diversity and sheer volume of omics data have brought biology and biomedicine research and applications into a big data era. Most of the statistical methods currently used to analyze omic data are not designed to deal with big data. Principal component analysis and multivariate methods for integrating multi-omic data are examples. Efficient and scalable functions are therefore required to exploit the large amount of omic data that is currently available. Results: We developed a library called BigDataStatMeth, which includes functions to perform basic matrix operations and linear algebra on big matrices using HDF5 and Bioconductor's DelayedArray infrastructure. We tested its performance by comparing its computation time with that of R base functions. Our results showed that our implementation outperforms existing functions and that the improvement grows as sample size increases. This package can serve as the basis for implementing the statistical methods required for omic data with a large number of samples or features. As a proof of concept, we implemented PCA and Lasso regression within the same package, and we also created another Bioconductor package, mgcca, which implements Generalized Canonical Correlation Analysis (GCCA), a method used in multi-omic data integration. We implemented an algorithm that allows individuals to be missing from one or more tables. The implemented methods have been used to analyze real omic data. We first used PCA to call genomic inversions in more than 400K individuals from UK Biobank. Then, data from TCGA were used to integrate multiple omic layers using GCCA. es
dc.format application/pdf es
dc.format.extent 11 p. es
dc.language.iso eng es
dc.rights All rights reserved es
dc.subject.other Bioinformatics es
dc.subject.other Big data es
dc.title Bioinformatic tools for Big Data in Omic studies with application to genomic inversion calling and multiomic data integration es
dc.type info:eu-repo/semantics/masterThesis es
dc.description.version Supervisor: Calle Rosingana, M. Luz
dc.rights.accessRights info:eu-repo/semantics/openAccess es
