Severity classification and biomarker identification in COVID-19: exome analysis of patients using support vector machines (SVM) with a linear kernel

Date
2025-02-24
Authors
Zetum, Aléxia Stefani Siqueira
Publisher
Universidade Federal do Espírito Santo
Abstract
Introduction: SARS-CoV-2 infection presents a wide spectrum of clinical manifestations, and genetic variation may influence the host's response to the virus. Machine Learning (ML) has shown promise in identifying genetic biomarkers and individuals who may develop severe forms of the disease. Objective: To develop an ML model that uses exome data to predict clinical outcomes in COVID-19 patients and to identify genes potentially associated with disease severity. Methodology: The study used data from 239 COVID-19 patients classified as "Non-severe" or "Severe". DNA sequencing and ancestry analysis were performed. A Support Vector Machine (SVM) model with a linear kernel was developed to predict COVID-19 severity, using Recursive Feature Elimination (RFE) to select the most influential variants. The model was evaluated with the area under the receiver operating characteristic curve (AUC-ROC), accuracy, F1 score, sensitivity, and specificity. Subsequently, logistic regression (LR) analysis was performed with the variants selected by SVM-RFE and confounding variables. Results and Discussion: The SVM model with a linear kernel achieved an AUC-ROC of 0.81, an accuracy of 83%, and an F1 score of 0.78, indicating a good capacity to discriminate between "Severe" and "Non-severe" cases of COVID-19. Fifteen variants were selected by the model, of which seven were significantly associated with disease severity in the LR analysis. Risk variants include WSCD1 (rs2302837 "A/A" or "A/G", OR: 3.09, 95% CI: 1.32–7.24, P < 0.01), PTPRS (rs1143700 "A/A" or "A/G", OR: 3.30, 95% CI: 1.54–7.07, P < 0.01), ARVCF (rs2073744 "A/A" or "A/G", OR: 2.88, 95% CI: 1.31–6.30, P < 0.01), and LVRN (rs10078759 "G/G" or "G/C", OR: 2.08, 95% CI: 1.07–4.31, P = 0.04). Conversely, protective variants include ALDH4A1 (rs6426813 "G/G" or "G/A", OR: 0.48, 95% CI: 0.23–0.93, P = 0.02), ARHGAP22 (rs10776601 "C/C" or "C/T", OR: 0.23, 95% CI: 0.09–0.56, P < 0.01), and C3 (rs423490 "A/A" or "A/G", OR: 0.32, 95% CI: 0.14–0.70, P < 0.01). The results demonstrated that the SVM with a linear kernel is effective in predicting COVID-19 severity from exome data. The protein-protein interaction (PPI) network analysis identified biological pathways associated with the immune system, inflammatory response, and blood coagulation. Genes such as C3, PTPRS, and LVRN stood out in functions related to immune response regulation and inflammation modulation, suggesting that these pathways are directly linked to adverse COVID-19 outcomes. The network also revealed the interconnection between cellular signaling processes and stress response mechanisms, which may explain the variability in clinical responses observed among patients. Conclusion: On our data, the SVM with a linear kernel proved effective in predicting COVID-19 severity. This study highlights the importance of integrative approaches to better understand the disease. Identifying genetic biomarkers can aid in the treatment and management of future pandemics.
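The SVM-RFE step described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic genotypes, not the study's data; the 0/1/2 allele coding, the number of candidate variants, the train/test split, and the values of `C` and `step` are all illustrative assumptions, not the author's actual pipeline.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score, f1_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for exome data (assumption): 239 patients x 200 variants,
# each genotype coded as 0, 1, or 2 copies of the alternate allele.
n_patients, n_variants = 239, 200
X = rng.integers(0, 3, size=(n_patients, n_variants)).astype(float)

# Hypothetical ground truth: the first five variants drive severity.
signal = X[:, :5].sum(axis=1) + rng.normal(0.0, 1.0, n_patients)
y = (signal > np.median(signal)).astype(int)  # 1 = "Severe", 0 = "Non-severe"

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# RFE repeatedly fits the linear SVM and discards the variants with the
# smallest-magnitude weights (10% of the initial set per round) until 15 remain.
selector = RFE(SVC(kernel="linear", C=1.0), n_features_to_select=15, step=0.1)
selector.fit(X_train, y_train)

selected = np.flatnonzero(selector.support_)  # column indices of kept variants
y_pred = selector.predict(X_test)
y_score = selector.decision_function(X_test)

# The same five metrics named in the abstract, computed on the held-out split.
metrics = {
    "auc_roc": roc_auc_score(y_test, y_score),
    "accuracy": accuracy_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "sensitivity": recall_score(y_test, y_pred, pos_label=1),
    "specificity": recall_score(y_test, y_pred, pos_label=0),
}

print("selected variant indices:", selected)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

A linear kernel is what makes this selection possible: the fitted model exposes one weight per variant, which RFE can rank directly. The numbers printed here come from random data and say nothing about the study's reported results.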
Keywords
Biomarkers, Machine learning, Genetics