DSpace Repository

Development of a machine learning algorithm classification tool to improve strain detection in whole genome metagenomics dataset

dc.contributor Universitat de Vic - Universitat Central de Catalunya. Facultat de Ciències i Tecnologia
dc.contributor Universitat de Vic - Universitat Central de Catalunya. Màster Universitari en Anàlisi de Dades Òmiques
dc.contributor.author Squitieri, Alessia
dc.date.accessioned 2021-01-08T16:01:17Z
dc.date.available 2021-01-08T16:01:17Z
dc.date.created 2020-05-15
dc.date.issued 2020-05-15
dc.identifier.uri http://hdl.handle.net/10854/6410
dc.description Academic year 2019-2020 es
dc.description.abstract Metagenomics is a pioneering branch of bioinformatics that uses genomic techniques, such as DNA sequencing, to obtain important information about microorganisms. In recent years, scientists have focused strongly on this innovative field, highlighting its importance in both clinical and environmental settings. In this respect, the lack of user-friendly software for metagenome analysis has become an important issue. GAIA is a bioinformatics tool, developed by Sequentia Biotech, that performs functional and taxonomic analyses of metagenomics data from both amplicon and whole genome sequencing. Like other software, GAIA can analyze data at the strain level. However, one limitation of GAIA is the high number of false positives that can arise during this type of analysis, owing to the high similarity between the genomes of different strains of the same species. We therefore worked on GAIA's ability to taxonomically classify bacterial strains from their sequences. We benchmarked different machine learning classification models. We also had to handle the imbalanced data problem, a common machine learning issue, by testing different methods and comparing them with each other. Finally, we identified the best model through hyperparameter tuning. The results show a significant improvement in the accuracy of GAIA's predictions. es
dc.format application/pdf es
dc.format.extent 32 p. es
dc.language.iso eng es
dc.rights All rights reserved es
dc.subject.other Genomics es
dc.subject.other Genetic algorithms es
dc.subject.other Bioinformatics es
dc.title Development of a machine learning algorithm classification tool to improve strain detection in whole genome metagenomics dataset es
dc.type info:eu-repo/semantics/masterThesis es
dc.description.version Supervisor: Serrat Jurado, Josep Maria
dc.rights.accesRights info:eu-repo/semantics/closedAccess es
