
Thesis Proposal of Sérgio Montazzolli Silva

Event Details

Student: Sérgio Montazzolli Silva
Advisor: Prof. Dr. Claudio Rosito Jung

Title: Fast Contextual Text Recognition With Deep Convolutional Neural Networks
Research Area: Image Processing, Computer Vision, and Pattern Recognition

Date: August 10, 2018
Time: 9:00 a.m.
Location: Building 43412 – Room 215 (videoconference room), Instituto de Informática

Examining Committee:
– Prof. Dr. David Menotti Gomes (UFPR – via videoconference)
– Prof. Dr. Jacob Scharcanski (UFRGS)
– Prof. Dr. Eduardo Simões Lopes Gastal (UFRGS)

Committee Chair: Prof. Dr. Claudio Rosito Jung

Abstract: In this work we explore Deep Learning techniques to effectively recognize text in images given some context, a task we call Contextualized Text Recognition (CTR). CTR has many applications, such as Automatic License Plate Recognition (ALPR) and Racing Bib Number Recognition. With the rise of Deep Learning, many computer vision results have improved in recent years. Its remarkable recognition capacity has enhanced existing applications and enabled new and challenging ones, such as speech recognition, self-driving cars, and black-and-white image colorization, to name a few. However, this analytical power comes at a price: the networks have a large number of parameters, so a considerable amount of data is needed to train them. To overcome this difficulty in tasks where little data is available, the first part of this work proposes clever uses of data augmentation and small adaptations of the fastest models found in the literature. The results are shown in the context of ALPR, where we demonstrate an approach capable of processing images at around 70 FPS while still achieving state-of-the-art performance. Going further, we noticed a lack of unified ALPR datasets encompassing license plates from different regions and scenarios. Also, no dataset explores multiple regions and challenging scenarios in which the plates are oblique and highly distorted. Thus, in the second part, we propose a dataset containing challenging images and develop a novel CNN that regresses the affine parameters used to rectify license plates, allowing text recognition with high accuracy when compared to state-of-the-art methods. Finally, this monograph presents the next steps toward the completion of the PhD, focusing mostly on Racing Bib Number Recognition.

Keywords: Deep Learning, Computer Vision, License Plate.