2012 - present
|
Multi-Match – multilingual matching
Multilingual information is available in many different sources and
formats. This fact has motivated research aimed at finding mappings between
data represented in different languages in the areas of Information
Retrieval, Natural Language Processing, and, more recently, Databases. This
project researches and proposes new methods for matching multilingual data
in different scenarios. Our goals are: (i) collecting parallel corpora on
the web, (ii) finding correspondences in multilingual Wikipedia, and
(iii) detecting multilingual plagiarism. The project will contribute to the
fields of Information Retrieval and Natural Language Processing by providing
parallel corpora, which are important resources for the advancement of these
areas. The wide availability of multilingual data also facilitates
plagiarism, so multilingual plagiarism detection is also targeted in this
project. On this topic, our main contribution will be the analysis of
citations and references, which is essential for confirming plagiarism.
Funded by CNPq (Edital Universal)
|
2010 - present
|
Cameleon
The goal of this project is to investigate, propose, experiment with,
apply and validate automatic and collaborative techniques for the development
of lexical and ontological resources that can be useful in the context of
multilingual applications, particularly for French, Portuguese and English.
Project HomePage: http://cameleon.imag.fr
Funded by CAPES
Coordinated by Aline Villavicencio
|
2010 - 2012
|
DP-ML Cross-Language Plagiarism Detection
With the dissemination of the Web, millions of people have gained access to
information from several areas of knowledge. The amount of digital
information available has grown immensely. However, despite its many
benefits, the Web is one of the easiest means to enable plagiarism.
Plagiarism is one of the most serious forms of academic misconduct. It is
defined as “the use of another person's written work without acknowledging
the source”. Recent research has shown that this type of misconduct is
increasingly frequent in the academic world. This fact has motivated the
development of techniques to automate plagiarism detection. This project
focuses on cross-language plagiarism detection, in which the contents of a
document are translated without making any reference to their source. Our
aim is to develop an efficient method for detecting cross-language
plagiarism.
Funded by CNPq (Edital Universal)
|
2009 -
|
INCT Web
INWeb was created to study the various phenomena related to the Web. It is
an institute composed of a network of researchers from four Brazilian
universities. The mission of this institute is to develop models, algorithms
and technologies that contribute to the integration of the Web with society.
As a result, we expect more effective and secure distribution of
information, and more efficient and useful applications, so that the Web can
become a vector for social and economic change in our country. The
institute's activities include research, training of human resources, and
knowledge transfer to society and companies. Our research proposal plans to
improve the state of the art for the three layers of networks. Specifically,
we aim to develop solutions for three great challenges defined on the
unified view of the Web: (i) identification, characterization and modeling
of people's interests and behavior patterns on the Web and of the networks
established among them; (ii) treatment of information that circulates
through the Web layers, considering the activities of crawling, extracting
and processing information; (iii) delivery of information in a satisfying
way regardless of time and place.
Coordinated by Virgílio Almeida (UFMG).
Website: http://www.inweb.org.br/
Funded by: CNPq, MCT, and Fapemig
|
2008 - 2010
|
Cross-Language Information Retrieval
The aim of this project is to contribute to the development of
Cross-Language Information Retrieval (CLIR) involving Portuguese. The
motivation is the growing need to explore documents in foreign
languages, experienced by more and more people. CLIR research has
been developing quickly since the late 1990s. Despite recent advances,
there are still many aspects left unexplored, especially regarding the
use of Portuguese. In this project we hope to develop a system that accepts
queries in Portuguese and searches for documents in English. In addition, the
following aspects will be investigated: (i) development of stemming
algorithms for Portuguese; (ii) proposal of new techniques for mapping
concepts between languages through the analysis of parallel and comparable
corpora; (iii) study of the process of relevance feedback in a CLIR
environment; and (iv) development of techniques for the identification of
multi-word expressions.
Funded by CNPq (Edital Universal)
|
2008 - Present
|
GPU Cluster
The aim of this project is to build a computer cluster based on Graphics
Processing Units (GPUs) at the Institute of Informatics of UFRGS. The
cluster consists of 6 computers with quad-core processors (4 CPUs), each
connected via PCI-X to an external unit containing 4 GPUs. Thus, the cluster
will have 24 CPUs and 24
GPUs connected by high-speed InfiniBand switches. Given that each GPU is
composed internally of 128 processors, there will be effectively 3072 internal
processors with a computational power of approximately 12 TFLOPS. The
computational resources provided by this cluster will allow the processing of
computationally complex tasks and will be vital for the research to be
developed at the university in the next few years.
Funded by CNPq (Edital Jovens Pesquisadores)
Coordinated by João Luiz Dihl Comba
|
2007 - Present
|
ApproxMatch
Approximate Data Matching aims at deciding whether two data instances
represent the same real-world entity. This technique is employed in many
data management applications, such as record deduplication, similarity
querying, similarity joining and schema integration. This project aims at
addressing three open problems in Approximate Data Matching: (i) defining
adequate similarity functions for complex objects such as XML trees;
(ii) developing quantitative measures to compare the quality of similarity
functions; (iii) studying how query decomposition methods should behave in
environments where schema matching happens at query time.
Funded by CNPq (Edital Universal)
Coordinated by Carlos A. Heuser
|
2007 - Present
|
Managing Large Volumes of Textual Data
This project is within the scope of the challenges set by the Brazilian
Computer Society, namely the management of large volumes of distributed
multimedia data. In the context of this challenge, this project deals
specifically with the management of textual data, such as web pages or
electronic documents, created by public or private organisations. One of the
central problems is to establish relations and associations between
documents. In this project, two types of relationships are considered:
(i) versioning of documents, aiming at determining groups of documents that
can be considered different versions of the same information; and
(ii) content similarity, aiming at clustering documents that deal with the
same subject.
Funded by CNPq (Edital Grandes Desafios)
Coordinated by J. Palazzo M. de Oliveira
|
2005 - Present
|
Integrating Information Retrieval Techniques into Database Systems
In the classical view, the areas of Information Retrieval (IR) and Databases
(DB) have little in common. DB systems normally deal with structured data,
while IR deals with unstructured documents, typically in the form of free
text. Considering that the data stored by most organisations is both
structured and unstructured, and that users frequently have to query data in
both formats, there is a growing need to integrate the two areas. The aim of
this project is to apply IR concepts to DB systems to facilitate the process
of solving imprecise queries. The challenge is to modify the processing of
queries to include the notions of similarity and relevance.
Funded by CAPES-PRODOC
|
2003 - 2003
|
Cell Assemblies
Reverberating circuits of neurons can explain many psychological phenomena;
as the neural representation of concepts, they may be the basis of thought.
While evidence exists for neural Cell Assemblies (CAs), there has been very
little work on the computational modelling of CAs. The goal of this project
is to explore models, metrics and uses of CAs. My contribution was to adapt
the CA model to perform Information Retrieval.
Funded by EPSRC - England
Coordinated by Christian Huyck
|