Academic Master's Thesis Defense
Student: Carlos Eduardo Antônio Ferreira
Advisor: Joel Luis Carbonera
Title: Investigating an in-context learning approach for SPARQL query generation
Research Line: Machine Learning, Knowledge Representation and Reasoning
Date: 10/12/2025
Time: 13:30
Location: The defense will be held remotely. Public access is available at https://mconf.ufrgs.br/webconf/00179534.
Examination Committee:
- Dennis Giovani Balreira (UFRGS)
- Luan Fonseca Garcia (PUCRS)
- Sandro Rama Fiorini (IBM Research)
Committee Chair: Joel Luis Carbonera
Abstract: Large Language Models (LLMs) have demonstrated strong performance across a wide range of natural language processing tasks. Among adaptation strategies, fine-tuning is effective but computationally expensive, while in-context learning (ICL) offers a cheaper and more flexible alternative, particularly appealing for enterprise settings where fine-tuning is often impractical. Knowledge Graph Question Answering (KGQA) aims to generate factual answers to natural language (NL) questions by querying structured data in knowledge graphs (KGs). A central challenge is translating NL questions into accurate SPARQL queries for a given KG, a task referred to as text-to-SPARQL. Despite the growing interest in ICL, its effectiveness for SPARQL query generation remains underexplored. This study investigates the viability of an ICL approach with an instruction-tuned LLM for text-to-SPARQL, focusing on how different prompt example selection strategies impact performance. We conducted experiments using the LC-QuAD 1.0 benchmark and a 70-billion-parameter LLM. We analyze results by running the generated queries, comparing answers using the Mean F1-score and BLEU metrics, and investigating the causes of errors. Additionally, we introduce RSE (Restricted Structural Equivalence), a metric designed to assess the equivalence of SPARQL queries under a set of structural and semantic criteria. Our results show that even a few prompt examples significantly improve performance, but gains saturate beyond 10 examples. Similarity-based example selection outperformed diversity-based selection, while providing gold-standard URIs yielded the most substantial improvements. These findings highlight entity disambiguation as the main challenge for LLM-based SPARQL generation and reinforce the importance of example relevance over quantity.
Keywords: Knowledge Graph Question Answering; Large Language Models; SPARQL Generation; In-context Learning.