Please use this identifier to cite or link to this item:
https://dspace.uzhnu.edu.ua/jspui/handle/lib/70221
Title: | Integrating Data Mining, Deep Learning, and Gene Ontology Analysis for Gene Expression-Based Disease Diagnosis Systems |
Authors: | BABICHEV, SERGII; LIAKH, IGOR; ŠKVOR, JIŘÍ |
Keywords: | Gene expression data, gene ontology analysis, clustering, biclustering, convolutional neural network, Bayes optimization, classification, Alzheimer’s disease, cancer disease |
Issue Date: | 29-Jan-2025 |
Publisher: | IEEE Access |
Series/Report no.: | Technical sciences; Volume 13, 2025 |
Abstract: | The manuscript details the outcomes of a comprehensive study on the application of cluster-bicluster analysis, gene ontology analysis, and a convolutional neural network (CNN) to diagnosing cancer and Alzheimer’s disease from gene expression data obtained by both DNA microarray experiments and mRNA sequencing. It outlines a conceptual framework and provides a block diagram of the stepwise procedure for analyzing gene expression data, aiming to enhance the accuracy and objectivity of disease diagnosis. The methodology involves initial gene ontology analysis, followed by the Self Organizing Tree Algorithm (SOTA) for clustering gene expression profiles, an ensemble algorithm for biclustering, and a CNN for sample classification. Bayesian optimization was used to determine the optimal hyperparameters for all models. The simulation results demonstrate the high efficacy of the proposed approach. Specifically, for the Alzheimer’s data, the number of genes analyzed was reduced from 44,662 to 4,004. Subsequent cluster-bicluster analysis divided these data into two subsets containing 1,158 and 2,846 genes, respectively. Classification accuracy for samples within these subsets reached 89.8% and 91.8%, respectively. For the cancer data, the gene count was reduced from 60,660 to 10,422, with 3,955 and 6,467 genes in the first and second clusters, respectively. The classification accuracy for these subsets was 97.4% and 97.6%, respectively. In our view, implementing this model promises to significantly improve the efficacy of early diagnosis systems for complex diseases. |
Type: | Text |
Publication type: | Article |
URI: | https://dspace.uzhnu.edu.ua/jspui/handle/lib/70221 |
Appears in Collections: | Scientific publications of the Department of Informatics and Physical and Mathematical Disciplines |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Integrating_Data_Mining_Deep_Learning_and_Gene_Ontology_Analysis_for_Gene_Expression-Based_Disease_Diagnosis_Systems.pdf | Article | 2.9 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.