Current research topics (page not updated since 2008)

My broad research area is databases. Most of my work focuses on the problem of data integration. My publications give a good overview of the work done in my group in recent years.

At this time (September 2008) we are working on the following subjects, which are at various stages of development:

Schemaless querying of Web and P2P sources
This work aims at developing a method for querying Web and P2P sources. The central point of our approach is that we do not require a mediated or global schema for the sources. This makes the approach suitable for highly dynamic environments, where new sources may be added at any time and existing sources may become unavailable. This work is being done by Sergio Mergen (PhD student).

Complex data matching
The problem in data matching is how to decide whether two different data instances represent the same real-world entity. We work specifically on matching complex or aggregate data instances such as records and XML trees.
In this context, Carina Dorneles (former PhD student) has devised an approach to record matching that combines similarity scores from different similarity functions (see paper here).
This approach is being extended by Marcos Nunes (MSc Student).
Felipe Levin (MSc Student) is working on an approach for record deduplication that takes into account not only the attributes of a record, but also the attributes of related records.
Euler Taveira (MSc Student) is working on the problem of implementing similarity functions in a relational database.
Adrovane Kade (PhD student) is developing a method for matching XML instances that may not conform to the same DTD (for the first results see this paper).
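
To make the idea of combining scores from different similarity functions more concrete, here is a minimal sketch. It is not the method from the paper mentioned above: the weighted average, the two similarity functions, the attribute names and the weights are all assumptions chosen only for illustration.

```python
from difflib import SequenceMatcher


def edit_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], based on difflib's ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def jaccard_similarity(a: str, b: str) -> float:
    """Token-set Jaccard similarity in [0, 1]."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty values are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Hypothetical configuration: attribute -> (similarity function, weight).
ATTRIBUTE_CONFIG = {
    "name": (edit_similarity, 0.6),
    "address": (jaccard_similarity, 0.4),
}


def record_similarity(r1: dict, r2: dict, config=ATTRIBUTE_CONFIG) -> float:
    """Weighted average of the per-attribute similarity scores."""
    total_weight = sum(weight for _, weight in config.values())
    score = sum(
        weight * fn(r1.get(attr, ""), r2.get(attr, ""))
        for attr, (fn, weight) in config.items()
    )
    return score / total_weight


if __name__ == "__main__":
    a = {"name": "John Smith", "address": "12 Main St"}
    b = {"name": "Jon Smith", "address": "12 Main Street"}
    print(record_similarity(a, b))
```

Two records would then be considered a match when the combined score exceeds a threshold, which has to be chosen per dataset.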

Evaluation and selection of similarity functions
This work aims at developing methods for the semi-automatic selection of similarity functions that are adequate for a specific dataset. We have already developed a method for the semi-automatic estimation of recall and precision for a given similarity function. This work was part of the PhD thesis of Raquel Stasiu. Juliana Bonato dos Santos further developed this approach in order to completely eliminate human intervention (see details in this paper).
Raquel Stasiu also devised a quality measure specifically for the evaluation of similarity functions, called discernability. Francisco Krieser (MSc student) developed SimEval, a software tool that employs discernability to compare similarity functions, and is working on new variations of this quality measure.
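
For readers unfamiliar with the quantities involved, the sketch below shows how precision and recall of a similarity function are computed at a given threshold. It assumes that a fully labelled set of pairs is available, which is exactly the manual effort that the estimation methods above try to avoid. The function name and the sample data are hypothetical.

```python
def precision_recall(scored_pairs, threshold):
    """Compute (precision, recall) for a similarity threshold.

    scored_pairs is an iterable of (score, is_true_match) tuples, where
    score is the value returned by the similarity function for a pair and
    is_true_match says whether the pair really denotes the same entity.
    """
    tp = fp = fn = 0
    for score, is_match in scored_pairs:
        predicted_match = score >= threshold
        if predicted_match and is_match:
            tp += 1
        elif predicted_match:
            fp += 1
        elif is_match:
            fn += 1
    # Convention (an assumption): an empty denominator yields 1.0.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Hypothetical labelled sample: (similarity score, is a true match).
LABELLED_PAIRS = [
    (0.95, True),
    (0.90, True),
    (0.80, False),
    (0.60, True),
    (0.30, False),
]

if __name__ == "__main__":
    for t in (0.5, 0.7, 0.9):
        print(t, precision_recall(LABELLED_PAIRS, t))
```

Lowering the threshold typically raises recall at the cost of precision, which is why the choice of similarity function and threshold depends on the dataset.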

Former research topics

In the past we have worked on the following subjects:

Schema matching
Sergio Mergen has developed a method for matching XML and relational schemata and also a method for translating data from one ontology to another.
Importing and exporting XML views from relational databases
This work aims at developing a method that allows XML views to be constructed from a relational database, exported (e.g. to a mobile device), modified off-line and reimported into the relational database. Recent results were published in the following paper. This research is part of a CNPq-sponsored project together with Vanessa Braganholo (UFRJ). Vanessa is a former PhD student in our group and has developed a method for updating relational databases through XML views.
Querying XML sources through a mediated schema
In this work we developed a query decomposition algorithm that allows queries against a conceptual schema to be decomposed over several XML schemata. This work was part of a project with Ronaldo Mello (UFSC). Ronaldo is a former PhD student in our group and has developed a method for the bottom-up construction of a conceptual schema from several XML schemata. In this project, Sandro Camillo developed a method for mapping a query on a conceptual schema into an XPath query on a single XML source. Felipe Victolla extended Sandro's method, allowing the query to be decomposed over several XML sources (see this paper).
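
The general idea of translating a conceptual query into one XPath query per source can be sketched as follows. This is a strongly simplified illustration and not the algorithm from the paper: the mapping tables, source names, sample documents and function names are all hypothetical, and only a single equality selection is supported.

```python
import xml.etree.ElementTree as ET

# Hypothetical mappings from a conceptual concept and its attributes
# to the element names used by each XML source.
MAPPINGS = {
    "source_a": {"concept": "book",
                 "attrs": {"title": "title", "author": "author"}},
    "source_b": {"concept": "publication",
                 "attrs": {"title": "name", "author": "writer"}},
}

# Hypothetical XML sources, inlined here to keep the sketch self-contained.
SOURCES = {
    "source_a": "<library><book><title>DB Systems</title>"
                "<author>Ann</author></book></library>",
    "source_b": "<catalog><publication><name>XML Views</name>"
                "<writer>Ann</writer></publication>"
                "<publication><name>Other</name>"
                "<writer>Bob</writer></publication></catalog>",
}


def to_xpath(mapping, select_attr, where_attr, where_value):
    """Translate a conceptual selection into an XPath string for one source.

    The value is not escaped, so it must not contain quote characters.
    """
    attrs = mapping["attrs"]
    return (f".//{mapping['concept']}"
            f"[{attrs[where_attr]}='{where_value}']/{attrs[select_attr]}")


def run_query(select_attr, where_attr, where_value):
    """Run the translated query on every source and merge the results."""
    results = []
    for name, xml_text in SOURCES.items():
        xpath = to_xpath(MAPPINGS[name], select_attr, where_attr, where_value)
        root = ET.fromstring(xml_text)
        results.extend(element.text for element in root.findall(xpath))
    return results


if __name__ == "__main__":
    print(run_query("title", "author", "Ann"))
```

The sketch relies on the limited XPath subset supported by Python's ElementTree; a real implementation would also have to deal with structural differences between sources, not just renamed elements.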
For older projects please take a look at this page.