topics (page not updated since 2008)
My broad research area is databases. Most of my
work focuses on the problem of data
integration. A good overview of the work done in my group in recent years is given by my publications.
At this time (September 2008) we are
working at various stages of development on the following subjects:
- Schemaless querying of Web and P2P sources
This work aims at developing a method for querying Web and P2P
sources. The central point of our approach
is that we do not require a mediated or global schema for
the sources. This makes our approach suitable
for highly dynamic environments, where
new sources may be added at any time and existing
sources may become unavailable. This work is being done
by Sergio Mergen (PhD student).
The problem in data matching is how to assess whether two different data instances
represent the same real-world entity. We work specifically on
the problem of matching complex or aggregate data instances such as
records and XML trees.
In this context, Carina Dorneles (former PhD student) has devised
an approach to record matching that combines similarity scores from
different similarity functions (see paper here).
This approach is being extended by Marcos Nunes (MSc Student).
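As a rough illustration of the general idea of combining scores from different similarity functions, the sketch below averages two string similarity measures per attribute and then takes a weighted mean over the attributes of a record. It is not the approach from the paper mentioned above: the function names, the choice of measures (token Jaccard and the `difflib` ratio) and the attribute weights are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of the lower-cased whitespace tokens of two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty values are treated as identical
    return len(ta & tb) / len(ta | tb)


def edit_ratio(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] from the standard library."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_similarity(r1: dict, r2: dict, weights: dict) -> float:
    """Weighted mean of per-attribute scores (hypothetical combination rule).

    Each attribute score is the plain average of the two functions above;
    `weights` maps attribute names to non-negative weights.
    """
    total = sum(weights.values())
    score = 0.0
    for attr, w in weights.items():
        v1, v2 = r1.get(attr, ""), r2.get(attr, "")
        attr_score = (token_jaccard(v1, v2) + edit_ratio(v1, v2)) / 2
        score += w * attr_score
    return score / total
```

Two records would then be considered a match when `record_similarity` reaches some threshold, which has to be chosen for the dataset at hand.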
Felipe Levin (MSc Student) is working on an approach for record deduplication that takes into account not only the
record attributes, but also the attributes of related records.
Euler Taveira (MSc Student) is working on the problem of implementing similarity functions in a
Adrovane Kade (PhD student) is developing a method for matching XML instances
that may not conform to the same DTD (for the first results see this paper).
and selection of similarity functions
This work aims at developing methods for the semi-automatic selection
of similarity functions that are adequate for a specific dataset. We
have already developed a method for the semi-automatic estimation of recall and precision for a given similarity function. This work was part of the PhD Thesis of Raquel Stasiu.
Juliana Bonato dos Santos further developed this approach in order to completely eliminate human intervention
(see details in this paper).
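For reference, the two quantities being estimated can be computed directly when the true match status of each pair is known. The sketch below does only that; it is not the semi-automatic estimation method described above, which is aimed at the case where such labels are not fully available. The function name, the input format and the threshold convention (a pair is reported as a match when its score is at or above the threshold) are assumptions.

```python
def precision_recall(scored_pairs, threshold):
    """Precision and recall of a similarity function at a given threshold.

    `scored_pairs` is an iterable of (score, is_match) tuples, where
    `is_match` is the known ground truth for the pair (assumed available).
    """
    tp = fp = fn = 0
    for score, is_match in scored_pairs:
        predicted = score >= threshold
        if predicted and is_match:
            tp += 1  # reported as a match and truly a match
        elif predicted and not is_match:
            fp += 1  # reported as a match but not one
        elif not predicted and is_match:
            fn += 1  # true match that was missed
    # By convention here, an empty denominator yields 1.0.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall
```

Evaluating this over a range of thresholds shows the usual trade-off: raising the threshold tends to increase precision and lower recall.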
further devised a quality measure specifically for the evaluation of
similarity functions, called discernability.
Francisco Krieser (MSc student) developed SimEval, a software tool that employs
discernability to compare similarity functions and is working on new variations of this
In the past
we have worked on the following subjects: