
Dissertation Defense: Franklin Nunes


Event Details


Student: Franklin Vinny Medina Nunes
Advisor: Marcio Dorn

Title: A Framework for Model Selection and Interpretation in Single-Cell Transcriptomic Data
Research Area: Machine Learning, Knowledge Representation and Reasoning

Date: 03/06/2026 (DD/MM/YYYY)
Time: 15:30
Location: This defense will be held remotely. Public access is available at https://meet.google.com/unq-qrdo-ogf.

Examination Committee:
-Lucinéia Heloisa Thom (UFRGS)
-Charley Christian Staats (UFRGS)
-Glaucia Maria Bressan (UTFPR)

Committee Chair: Marcio Dorn

Abstract: Traditional transcriptomic classification approaches can obscure cell-type-specific disease signals when they treat heterogeneous cellular populations as a single analytical unit. Single-cell RNA sequencing addresses this limitation by providing cellular resolution and allowing researchers to investigate transcriptional programs within specific cell populations. However, single-cell transcriptomic datasets often contain high-dimensional, sparse, and imbalanced measurements. They also create a risk of information leakage when researchers perform feature selection or model optimization outside the training data. This study presents a framework that combines feature selection, supervised learning, multiobjective model selection, and SHAP-based interpretation for single-cell transcriptomic analysis. The study evaluates the framework on a chronic obstructive pulmonary disease single-cell RNA-sequencing dataset and restricts the analysis to male samples to reduce sex-related transcriptional confounding. After global filtering, the dataset retained 69,366 cells and 31,902 genes, and 27 cell populations met the criteria for downstream modeling. For each cell population, the framework performs Wilcoxon-based feature selection within cross-validation folds and evaluates the accuracy, sensitivity, specificity, and F1-score of Logistic Regression, Support Vector Machine, Random Forest, K-Nearest Neighbors, and Multilayer Perceptron classifiers. The analysis also compares the proposed strategy with alternative feature-selection methods, including ANOVA, Random Forest importance, and recursive feature elimination. Model performance varied across cell populations, indicating that no classifier was a universal solution across all classification tasks.
In the CD4-positive, alpha-beta T-cell case study, the Multilayer Perceptron achieved the most balanced metric profile, formed the only Pareto-optimal solution, and maintained comparable performance in the final hold-out evaluation. SHAP-based interpretation identified influential genes, and enrichment analysis associated the selected features with translational, stress-response, and regulatory processes.
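
Illustrative note: the abstract refers to multiobjective model selection and a Pareto-optimal solution. The sketch below shows one common way to read that idea, assuming each candidate model is summarized by the four metrics named above and that higher is better for all of them. The function names, model labels, and metric values are hypothetical placeholders and are not taken from the dissertation or its results.

```python
from typing import Dict, List

Metrics = Dict[str, float]


def dominates(a: Metrics, b: Metrics) -> bool:
    """Return True if `a` is at least as good as `b` on every metric
    and strictly better on at least one (higher is better)."""
    keys = a.keys()
    return all(a[k] >= b[k] for k in keys) and any(a[k] > b[k] for k in keys)


def pareto_front(scores: Dict[str, Metrics]) -> List[str]:
    """Return the names of models that no other model dominates."""
    return [
        name
        for name, metrics in scores.items()
        if not any(
            dominates(other, metrics)
            for other_name, other in scores.items()
            if other_name != name
        )
    ]


# Hypothetical placeholder values, NOT results from the study.
example_scores = {
    "model_a": {"accuracy": 0.80, "sensitivity": 0.78, "specificity": 0.82, "f1": 0.79},
    "model_b": {"accuracy": 0.75, "sensitivity": 0.85, "specificity": 0.65, "f1": 0.77},
    "model_c": {"accuracy": 0.74, "sensitivity": 0.70, "specificity": 0.64, "f1": 0.72},
}

front = pareto_front(example_scores)
print(front)
```

In this toy example, model_c is dominated by both other models, while model_a and model_b trade accuracy against sensitivity and therefore both remain on the front. A front containing a single model, as the abstract reports for the case study, would arise when one model is at least as good as every other on all metrics.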

Keywords: single-cell RNA sequencing; feature selection; interpretable machine learning; multiobjective optimization; chronic obstructive pulmonary disease