### 5<sup>th</sup> IEEE CASS Rio Grande do Sul Workshop Porto Alegre, Brazil Instituto de Informática, UFRGS

October 22-23, 2015





### proceedings

### www.inf.ufrgs.br/cassw











# **Proceedings**

5<sup>th</sup> IEEE CASS Rio Grande do Sul Workshop

**CASSW 2015** 

**October 22<sup>nd</sup> to 23<sup>rd</sup>, 2015** 

Porto Alegre, Rio Grande do Sul, Brazil

Cover art design and art production by Ricardo Reis.

October 22-23, 2015 Porto Alegre, Brazil

### **Table of Contents**

### **Introductory Section**

Foreword

Committees

Organization Committee

Technical Program Committee

### **Invited Talks**

| Title and speaker | Page |
|---|---|
| Rethinking Memory System Design for Data-Intensive Computing<br>Onur Mutlu, Carnegie Mellon University, USA | 2 |
| 5G transceiver: RFIC Design by Mathematics<br>François Rivet, Université de Bordeaux, France | 3 |
| Fast Prototyping: A Must in Current Electronic System Design Methodology<br>Victor Grimblatt, Synopsys Chile R&D Center, Chile | 4 |
| New Developments in state-of-the-art Video Coding<br>Luis Alberto da Silva Cruz, Universidade de Coimbra, Portugal | 5 |
| Accelerating Bioinformatics Algorithms with Reconfigurable Devices<br>Ricardo Jacobi, Universidade de Brasília, Brazil | 6 |
| IBM Design Closure Flow for High Performance Microprocessors<br>Gi-Joon Nam, IBM Yorktown Heights, USA | 7 |
| IC Physical Implementation Challenges in sub-20nm CMOS Nodes<br>Andrew Kahng, University of California at San Diego, USA | 8 |
| Timing-Driven Placement<br>José Güntzel, Universidade Federal de Santa Catarina, Brazil | 9 |


### **Poster Session 1**

| No. | Title and authors | Page |
|---|---|---|
| 1.1. | Logic Synthesis to Automatic Cell Layout Generation<br>Calebe Conceição and Ricardo Reis, UFRGS | 11 |
| 1.2. | Logic Minimization by Gate Merging<br>Luciana Mendes Da Silva, Calebe Micael de Oliveira Conceição, Guilherme Bontorin and Ricardo Reis, UFRGS | 12 |
| 1.3. | Local search techniques for incremental timing-driven placement<br>Mateus Fogaça, Guilherme Flach, Marcelo Johann, Ricardo Reis and Jucemar Monteiro, UFRGS | 13 |
| 1.4. | Test Solutions for NAND Flash Products - eMMC Test Solution<br>Elcio Kondo, Magrit Krug, Marcio Da Silva, Lucio Prade, Celso Peter and Fabiano Colling, Unisinos | 14 |
| 1.5. | 3D Sound Perception using Stereo Headphones<br>Joel A. Luft and Altamiro A. Susin, UFRGS | 15 |
| 1.6. | Asynchronous VLSI Design: Circuit Templates, Cell Libraries and Synthesis Flows<br>Matheus Moreira and Ney Calazans, PUCRS | 16 |
| 1.7. | Automatic Synthesis of Layout with ASTRAN<br>Gisell Moura, Adriel Ziesemer and Ricardo Reis, UFRGS | 17 |
| 1.8. | A Multi-Standard Interpolation Hardware Solution for H.264 and HEVC<br>Guilherme Paim, Henrique Maich, Vladimir Afonso, Luciano Agostini, Bruno Zatt and Marcelo Porto, UFPel | 18 |
| 1.9. | Stereo Matching and Sensor Fusion Technique for Image Depth Estimation<br>Fabio Pereira and Altamiro Susin, UFRGS | 19 |
| 1.10. | Jezz: An Efficient Legalization Algorithm<br>Julia Puget, Guilherme Flach, Marcelo Johann and Ricardo Reis, UFRGS | 20 |
| 1.11. | PHiCIT - Improving Hierarchical Networks-on-chip through 3D Silicon Photonics Integration<br>Cezar Rodolfo Wedig Reinbrecht, Martha J. Sepúlveda and Altamiro Amadeu Susin, UFRGS | 21 |
| 1.12. | An Evaluation of BTI Degradation of 32nm Standard Cells<br>Rafael Schivittz, Cristina Meinhardt and Paulo F. Butzen, FURG | 22 |
| 1.13. | Energy-Efficient Architectures for Sum of Squared Differences Calculation<br>Ismael Seidel, Marcio Monteiro and Jose Luis Güntzel, UFSC | 23 |
| 1.14. | SATD Hardware Architecture for HEVC Encoder<br>Bianca Silveira, Claudio Diniz, Eduardo Da Costa and Mateus Fonseca, UCPel | 24 |
| 1.15. | Design Method for CML Topology-Based Divide-by-2 Circuit with Unbalanced Loads<br>Raphael Souza and Agord Matos, Programa CI-Brasil | 25 |
| 1.16. | Run-time of the Data Dependency Detector for Harvesting Parallelism for Global Routing<br>Diego Tumelero, Guilherme Bontorin and Ricardo Reis, UFRGS | 26 |


### **Poster Session 2**

| No. | Title and authors | Page |
|---|---|---|
| 2.1. | High Throughput SAD Architecture for Quality HEVC Encoding<br>Brunno Abreu, Mateus Grellert and Sergio Bampi, UFRGS | 28 |
| 2.2. | A tool for Fault Insertion Simulation in CMOS Circuits<br>Ygor Aguiar<sup>1</sup>, Alexandra Lackmann Zimpeck<sup>2</sup> and Cristina Meinhardt<sup>1</sup>, FURG<sup>1</sup>, UFRGS<sup>2</sup> | 29 |
| 2.3. | Evaluation of different SRAM cell topologies in 32nm technology<br>Roberto Almeida, Paulo Butzen and Cristina Meinhardt, FURG | 30 |
| 2.4. | Low Latency Izhikevich's Simple Neuron Model on FPGA<br>Vitor Bandeira, Vivianne L. Costa, Guilherme Bontorin and Ricardo Reis, UFRGS | 31 |
| 2.5. | Integration of the uCLinux on the TVD-SoC Architecture for the Brazilian Digital TV<br>Ana Luiza Brod, Cezar Rodolfo Wedig Reinbrecht and Altamiro Amadeu Susin, UFRGS | 32 |
| 2.6. | An Optimization-Based Design Methodology for Fully Differential Amplifiers<br>Arthur Campos de Oliveira<sup>1</sup>, Paulo de Aguirre<sup>2</sup>, Lucas Compassi Severo<sup>2</sup> and Alessandro Girardi<sup>2</sup>, UFRGS<sup>1</sup>, UNIPAMPA<sup>2</sup> | 33 |
| 2.7. | Development of a DSP module in VHDL with use of SIS/SIL techniques<br>Bruna Fernandes Flesch, Rodrigo Marques Figueiredo, Lucio Rene Prade, Marcio Rosa Da Silva and Bianca Brand, Unisinos | 34 |
| 2.8. | Generating a Multiple Program Transport Stream for SBTVD<br>Jefferson Johner, Cezar Rodolfo Wedig Reinbrecht and Altamiro Amadeu Susin, UFRGS | 35 |
| 2.9. | Integration of ISDB-T NIM Tuner on TVD-SoC for Brazilian Digital TV Set-top Boxes<br>Paulo Kipper, Cezar Rodolfo Wedig Reinbrecht and Altamiro Amadeu Susin, UFRGS | 36 |
| 2.10. | Adjusting Video Tiling to Available Resources in a Per-frame Basis in HEVC<br>Giovani Malossi<sup>1</sup>, Daniel Palomino<sup>2</sup>, Cláudio Diniz<sup>2</sup>, Sergio Bampi<sup>1</sup> and Altamiro Susin<sup>1</sup>, UFRGS<sup>1</sup>, UFPel<sup>2</sup> | 37 |
| 2.11. | Profile and Analysis of Memory Hierarchies for High Efficiency Video Coding – HEVC<br>Ana Mativi, Eduarda Monteiro and Sergio Bampi, UFRGS | 38 |
| 2.12. | A Reconfigurable Operational Amplifier in 180nm CMOS Technology<br>Mateus C. S. Oliveira, Paulo César C. de Aguirre, Lucas C. Severo and Alessandro Girardi, UNIPAMPA | 39 |
| 2.13. | A Educational Tool for VLSI Global Placement<br>Gabriel Porto, Cristina Meinhardt and Paulo Francisco Butzen, FURG | 40 |
| 2.14. | Set-top Box Interface Software<br>Pedro Portugal and Altamiro Susin, UFRGS | 41 |
| 2.15. | Evaluating Devices Behavior in CMOS and FinFET Technologies<br>Giane Ulloa and Cristina Meinhardt, FURG | 42 |


### Foreword

The IEEE Circuits and Systems Workshop will be held for the fourth time in Porto Alegre, on October 22-23, 2015, at the Instituto de Informática of the Universidade Federal do Rio Grande do Sul (UFRGS). The event is intended to foster academic exchange between national and foreign researchers. The speakers are renowned researchers from institutions with significant work in the field of Circuits and Systems. The event will last two days, and the program will consist of a series of tutorials and poster sessions. The invited speakers are Andrew Kahng - UCSD (USA), Gi-Joon Nam - IBM Yorktown (USA), François Rivet - IMS (France), Onur Mutlu - CMU (USA), Victor Grimblatt - Synopsys (Chile), Ricardo Jacobi - UnB (Brazil), Luis Alberto da Silva Cruz - Univ. of Coimbra (Portugal) and José Güntzel - UFSC (Brazil). The poster sessions include the presentation of 31 posters, which were peer-reviewed prior to publication. For this invaluable contribution, we would like to thank all the reviewers who participated in the process. We would also like to extend our gratitude to the members of the organization committee. Finally, we would like to thank the IEEE CAS Society for its support in the scope of the Outreach Call, as well as CNPq and CAPES. We wish all participants an excellent workshop and fruitful exchanges.

Ricardo Reis General Chair

Marcelo Johann Program Chair

Raphael Brum Poster Session Chair


### Committees

### **General Chair**

Ricardo Reis, UFRGS, Brazil

### **Program Chair**

Marcelo Johann, UFRGS, Brazil

### **Poster Session Chair**

Raphael Brum, UFRGS, Brazil

### **Finance Chair**

Gracieli Posser, UFRGS, Brazil

### **Publication Chair**

Carolina Metzler, UFRGS, Brazil

### **Web Chairs**

Tania Ferla, UFRGS, Brazil

Gabriel Ribeiro, UFRGS, Brazil

### IEEE Circuits and Systems (CAS) Society Liaison:

Ricardo Reis, UFRGS, Brazil

### Student Branch IEEE UFRGS:

Prof. Marcelo Soares Lubaszewski

Cezar Rodolfo Wedig Reinbrecht

Jefferson Johner

Paulo Kipper

Magnun Furtado

Ana Luiza Brodt

### Sponsors

IEEE Circuits and Systems Society (CASS)

Brazilian Computer Society (SBC)

Brazilian Microelectronics Society (SBMicro)

### Organization

Universidade Federal do Rio Grande do Sul (UFRGS)


### Paper Awards

### CASSW 2015 Best Graduate Student Poster Award

### Local Search Techniques for Incremental Timing-Driven Placement

Mateus Fogaça, Guilherme Flach, Marcelo Johann, Ricardo Reis and Jucemar Monteiro, UFRGS

### CASSW 2015 Best Undergraduate Student Poster Award

### High Throughput SAD Architecture for Quality HEVC Encoding

Brunno Abreu, Mateus Grellert and Sergio Bampi, UFRGS


### **Technical Program Committee**

Joao Azevedo, Catena Radio Design

Sérgio Bampi, UFRGS

Alexsandro Bonatto, IFRS

Guilherme Bontorin, UFRGS

Thiago Both, UFRGS

Raphael Brum, UFRGS

Paulo Butzen, FURG

Bárbara Canto, UFRGS

Paulo Comassetto, Unipampa

Anelise Kologeski, UFRGS

Jody Matos, UFRGS

Cristina Meinhardt, FURG

Carolina Metzler, UFRGS

Matheus Moreira, PUCRS

Cícero Nunes, UFRGS

Samuel Pagliarini, CMU

Gracieli Posser, UFRGS

Ricardo Reis, UFRGS

Felipe Sampaio, UFRGS / IFRS

Erik Schuler, IFRS

Jorge Tonfat, UFRGS

Pablo Vaz, UFRGS

Alexandra Zimpeck, UFRGS


### **Invited Talks**

### Rethinking Memory System Design for Data-Intensive Computing

Onur Mutlu, Carnegie Mellon University, USA

**Abstract:** The memory system is a fundamental performance and energy bottleneck in almost all computing systems. Recent system design, application, and technology trends that require more capacity, bandwidth, efficiency, and predictability out of the memory system make it an even more important system bottleneck. At the same time, DRAM and flash technologies are experiencing difficult technology scaling challenges that make the maintenance and enhancement of their capacity, energy-efficiency, and reliability significantly more costly with conventional techniques.

In this talk, we examine some promising research and design directions to overcome challenges posed by memory scaling. Specifically, we discuss three key solution directions: 1) enabling new memory architectures, functions, interfaces, and better integration of the memory and the rest of the system, 2) designing a memory system that intelligently employs multiple memory technologies and coordinates memory and storage management using non-volatile memory technologies, 3) providing predictable performance and QoS to applications sharing the memory/storage system. If time permits, we might also briefly touch upon our ongoing related work in combating scaling challenges of NAND flash memory.

An accompanying paper can be found here: http://users.ece.cmu.edu/~omutlu/pub/memory-systems-research_superfri14.pdf

**Short Bio:** Onur Mutlu is the Strecker Early Career Professor at Carnegie Mellon University. His broader research interests are in computer architecture and systems, especially in the interactions between languages, system software, compilers, and microarchitecture, with a major current focus on memory systems. He obtained his PhD and MS in ECE from the University of Texas at Austin and BS degrees in Computer Engineering and Psychology from the University of Michigan, Ann Arbor. Prior to Carnegie Mellon, he worked at Microsoft Research, Intel Corporation, and Advanced Micro Devices. He was a recipient of the IEEE Computer Society Young Computer Architect Award, Intel Early Career Faculty Award, faculty partnership awards from various companies, a number of best paper recognitions at various top computer systems venues, and a number of "computer architecture top pick" paper selections by the IEEE Micro magazine.

### **5G transceiver: RFIC Design by Mathematics**

François Rivet, Université de Bordeaux, France

**Abstract:** Wireless system designers have been facing a continuously increasing demand for the high data rates and mobility required by new wireless applications, and have therefore started research on a new generation of wireless systems that are expected to be deployed beyond 2020. 5G wireless networks will support a 1,000-fold gain in capacity, connections for at least 100 billion devices, and a 10 Gbps individual user experience with extremely low latency and response times. Deployment of these networks will emerge between 2020 and 2030. New solutions are clearly required. The focus of this presentation will be on the RFIC Design by Mathematics of 5G transceivers, exploring novel approaches along with a thorough discussion of advanced techniques for these receivers and transmitters, towards a revolution in RF integrated circuits and systems design.

Design by Mathematics is a disruptive way of thinking in RFIC design. It uses mathematical properties for signal processing in RF signal conditioning, from baseband to RF Front-End. These mathematical properties are integrated in silicon to achieve the best trade-off in terms of power consumption, dynamic range, wide bandwidth, frequency agility and modulation schemes. The work is carried out at a high system level and makes it possible to relax constraints compared to traditional RF architectures.

**Short Bio:** Dr. François Rivet received the Master's degree in 2005 from the Electrical Engineering Graduate School of Bordeaux (ENSEIRB), in the southwest of France, and the PhD degree in 2009 from the University of Bordeaux, France. He joined the French Research Agency (CNRS) in 2005 as a PhD student. His PhD activities took place at IMS, the microelectronics laboratory of the University of Bordeaux. His research is focused on the design of RFICs with a dedicated methodology ("Design by Mathematics"). He is a member of the STMicroelectronics-IMS joint research laboratory. Dr. Rivet has publications in top-ranked journals, international conferences and national conferences, and holds 9 patents. He received the Best Paper Award at the Software Defined Radio Forum in 2008 in Washington DC, USA. He is a member of several Technical Program Committees (RFIC, MWSCAS, SBCCI, ...). Since June 2010, he has been a tenured Associate Professor at IMS Lab and Bordeaux Institute of Technology. In 2014, he founded the "Circuits and Systems" team at IMS Lab.

### Fast Prototyping: A Must in Current Electronic System Design Methodology

Victor Grimblatt, Synopsys Chile R&D Center, Chile

**Abstract:** The role of electronics in our life has changed dramatically over the last decade. This change started almost 10 years ago with the introduction of the smartphone, when applications moved from desktop to mobile devices. A similar electronic revolution is happening in the automotive industry. Fuel consumption and emissions are driving new hybrid and electric vehicles; the market is also interested in safety and is pushing for new concepts for advanced driver assistance systems (ADAS). Self-driving cars are becoming a reality. We are seeing comparable boosts of electronics in other markets such as consumer and industrial applications. Finally, the emergence of the Internet of Things (IoT) will take the involvement of electronics in our lives to a whole new level. All those devices are processing information and communicating with the surrounding environment. The advances in silicon complexity, together with the software running on those devices, make this possible. The increase in the amount and complexity of the software content is putting more pressure on the entire supply chain to meet time to market, differentiation, and quality expectations. Companies have been adapting their processes to provide more functionality through software and to improve the impact of software on performance and power consumption. At the same time, they are reducing the dependency of the software schedule on hardware availability through prototyping. During the presentation we will review the prototyping methodologies and how the dependency on hardware is mitigated. We will also review how prototyping helps with early architecture exploration and selection, software development, hardware-software integration, and system validation.

**Short Bio:** Victor Grimblatt was born in Viña del Mar, Chile. He has an engineering diploma in microelectronics from Grenoble INP (France) and an electronic engineering diploma from Universidad Tecnica Federico Santa Maria (Chile). He is currently R&D Group Director and General Manager of Synopsys Chile, a leader in EDA. He opened the Synopsys Chile R&D Center in 2006. He has expertise and knowledge in business and technology and understands the trends of the electronic industry very well; therefore he is often consulted on new technological business development. Before joining Synopsys he worked for different Chilean and multinational companies, such as Motorola Semiconductors, Honeywell Bull, VLSI Technology Inc., and Compass Design Automation Inc. He started to work in EDA in 1988 at VLSI Technology Inc., where he developed synthesis tools, becoming one of the pioneers of this new technology. He also worked in embedded systems development at Motorola Semiconductors. In 1990 he was invited by Professor McCluskey to present his work in Logic Synthesis at the CRC, Stanford University. He has published several papers in EDA and embedded systems development, and since 2007 he has been invited to several Latin American conferences to talk about Circuit Design, EDA, and Embedded Systems. From 2006 to 2008 he was a member of the "Chilean Offshoring Committee" organized by the Minister of Economy of Chile. In 2010 he was named "Innovator of the Year in Services Export". In 2012 he was nominated for best engineer of Chile. He is also a member of several Technical Program Committees on Circuit Design and Embedded Systems. Since 2012 he has been chair of the IEEE Chilean chapter of the CASS. Since 2002, Victor Grimblatt has been a professor of Electronics and IC Design at Universidad de Chile and Universidad de los Andes.

### New Developments in state-of-the-art Video Coding

Luis Alberto da Silva Cruz - Universidade de Coimbra, Portugal

**Abstract:** In 2013 the current state-of-the-art video coder, H.265/HEVC version 1, reached final standard status. Although H.265/HEVC provided enormous coding efficiency gains in comparison to its predecessor, H.264/AVC, its development did not stop, and after the incorporation of several new tools to handle, e.g., 3D video, a new version 2 is now available. The emergence of new signal formats like HDR video, plenoptic video and point-cloud 3D video poses new challenges to video coding technology. To address these challenges, explorations of improvements to HEVC/H.265 have already begun, with impressive preliminary results. This talk will briefly cover the history of video coding technology, mostly since MPEG-2, and then describe the latest generation codec (HEVC) in some detail. After that the speaker will introduce new video coding technology. The talk will end with a summary of recent research and development results in the field.

**Short Bio:** Luis A. da Silva Cruz (M'11) received the Licenciado and M.Sc. degrees in Electrical Engineering from the University of Coimbra, Portugal, in 1989 and 1993, and an M.Sc. degree in Mathematics and a Ph.D. degree in Electrical, Computer and Systems Engineering from Rensselaer Polytechnic Institute (RPI), Troy, NY, US, in 1997 and 2000, respectively. He has been with the Department of Electrical and Computer Engineering of the University of Coimbra in Portugal since 1990, first as a Teaching Assistant and, since 2000, as an Assistant Professor. He is a researcher at the Institute for Telecommunications, Portugal, where he has been working on video processing and coding, mainly video codec technology, wireless communications, and medical image and video processing for automatic diagnostic applications.

### Accelerating Bioinformatics Algorithms with Reconfigurable Devices

Ricardo Jacobi - Universidade de Brasília, Brazil

**Abstract:** High Performance Computing (HPC) can be achieved through a variety of technologies. From the classic supercomputers to today's heterogeneous platforms combining high performance processors, GPUs, many-core systems and FPGAs, the evolution of HPC is related to the state of the art in semiconductor technology and architectures. The use of FPGAs to accelerate algorithms is gaining momentum due to the large amount of parallelism they provide and the power reduction obtained by migrating algorithms to dedicated hardware. Bioinformatics is a research field dedicated to the processing of biological data. Sequence comparison and sequence alignment are two basic operations that aim to find the similarity between two genomic sequences and the alignment that produces the best match between them. Since sequences can be huge, HPC is needed to speed up the process. Some dedicated architectures to tackle these problems are presented, based on wavefront processing vectors.

**Short Bio:** Ricardo Pezzuol Jacobi received a PhD in Applied Science from the Université Catholique de Louvain in 1993. He was a professor at the UFRGS Informatics Institute from 1989 to 1998, when he joined UnB. He is currently an associate professor in the Computer Science Department at the University of Brasilia. He was Director of the Institute of Exact Sciences at UnB from 2004 to 2007 and Vice-Director of the Campus UnB Gama from 2008 to 2012. His research areas are reconfigurable architectures and applications, hardware and software co-design and dedicated architectures for high performance computing.

### IBM Design Closure Flow for High Performance Microprocessors

Gi-Joon Nam - IBM Yorktown Heights, USA

**Abstract:** As VLSI technology scales down further to meet the demands of Moore's law, interconnect delays become the dominant factor in timing optimization. Coupled with conflicting optimization objectives such as delay, area, routability and design for manufacturability, the design closure problem of complex VLSI designs becomes almost intractable. This presentation will introduce the IBM design closure methodology and address the engineering concepts that shape a modern layout synthesis flow. The IBM design closure methodology has demonstrated significant success for high performance microprocessor designs in IBM flagship products such as P/Z servers. I will conclude the talk by arguing that this is still an exciting time to be a computer or electrical engineer, with tremendous opportunities lying ahead in the VLSI and architecture areas.

**Short Bio:** Gi-Joon Nam is a research staff member and manager at the IBM T. J. Watson Research Center. He currently manages the Physical Design department. His group is conducting research on various design automation techniques for high performance computing IBM products such as IBM's P/Z microprocessors and server chips. Prior to this, he managed the Optimized Analytics System department at the IBM Austin Research Lab, working on workload-optimized systems for big data applications. Gi-Joon has been involved with leading-edge high performance VLSI designs for 15+ years, from 130 nm technology nodes to sub-20 nm technologies.

### IC Physical Implementation Challenges in sub-20nm CMOS Nodes

Andrew Kahng - University of California at San Diego, USA

**Abstract:** IC physical implementation is where "rubber meets the road" for power, performance, area and cost in leading-edge CMOS nodes. This talk will highlight new challenges, as well as promising optimization levers, for physical implementation in sub-20nm process technologies. The list of challenges includes (i) BEOL resistivity and variability; (ii) greater discreteness in sizing due to fewer fins and threshold voltages; (iii) a "race to the end of the roadmap" which causes too-hasty design enablement; (iv) a growing loss of model-hardware correlation; and (v) the breakdown of old algorithms and methodologies in the face of today's explosion of signoff modes and corners. Available levers to meet these challenges include (i) on-chip adaptivity; (ii) holistic margin recovery; (iii) improved design signoff criteria; (iv) "closing the loop" in the performance analyses that drive circuit optimizations; and (v) 3-dimensional integration.

**Short Bio:** Andrew B. Kahng is Professor of CSE and ECE at UC San Diego, where he holds the endowed chair in High-Performance Computing. He has served as visiting scientist at Cadence (1995-1997) and as founder, chairman and CTO at Blaze DFM (2004-2006). He is the coauthor of 3 books and over 400 journal and conference papers, holds 30 issued U.S. patents, and is a fellow of ACM and IEEE. He has served as general chair of DAC, ISQED, ISPD and other conferences. He has also been international chair/co-chair of the Design technology working group, and recently of the System Integration focus team, in the ITRS since 2000. His research interests include IC physical design and performance analysis, the IC design-manufacturing interface, combinatorial algorithms and optimization, and the roadmapping of systems and technology.

### **Timing-Driven Placement**

Jose Güntzel - UFSC, Brazil

**Abstract:** Timing closure is currently one of the most challenging tasks in the design of VLSI circuits. Several techniques, such as gate sizing, buffer insertion, timing-driven routing and timing-driven placement, are applied iteratively along the physical design flow to meet the timing constraints. Among these techniques, timing-driven placement (TDP) is probably the one with the highest timing optimization potential, since it finds new legal locations for standard cells based on quite accurate circuit delay information, which generally results in shorter interconnect delays. This talk reviews some of the most important TDP techniques found in the literature, pointing out their main features. Although the quality of global placement has advanced significantly in recent years, there is still a lack of efficient techniques to address the TDP problem. Therefore, this talk also presents a Lagrangian Relaxation formulation for TDP that compresses both late and early slack histograms while preserving the placement quality.

**Short Bio:** José Luís Güntzel received the Electrical Engineering degree from the Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Brazil, in 1990. He received both the M.Sc. and the Ph.D. degrees in Computer Science, also from UFRGS, in 1993 and 2000, respectively. Since 2007, Dr. Güntzel has been an Associate Professor at the Department of Informatics and Statistics of the Federal University of Santa Catarina (Florianopolis, Brazil). His research interests include physical design automation, timing analysis, memory optimization for low-power embedded computing systems and energy-efficient VLSI architectures for video compression. Dr. Güntzel is a member of the IEEE/IEEE-CAS, the Brazilian Microelectronics Society and the Brazilian Computer Society.

5<sup>th</sup> IEEE CASS Rio Grande do Sul Workshop October 22-23, 2015 Porto Alegre, Brazil

### Poster Session 1: Graduate Track

## Logic Synthesis to Automatic Cell Layout Generation

### Calebe Conceição, Ricardo Reis

### Introduction

UFRGS

In standard cell methodology, the small number of logic functions in most cell libraries restricts the search for better optimization of the transistor count, since the circuit description must fit into the set of logic functions available [1]. ASTRAN achieves high-quality layouts for any transistor network [2][3], and may fill this gap by producing, on demand, a customized set of cells for the needs of each circuit.



Av. Bento Gonçalves, 9500 – 91501-970 Porto Alegre/RS, Brazil Contact: {cmoconceicao,reis}@inf.ufrgs.br



Luciana Mendes da Silva, Calebe Micael de Oliveira Conceição, Guilherme Bontorin, Ricardo Reis

### Introduction

The main objective is to reduce the number of transistors and interconnections between cells of a circuit. Our methodology uses a library-free approach with the use of complex gates to reduce the number of transistors. The methodology is evaluated by using the benchmark b02 from ITC 99.



### Logical Optimization Methodology





### **Conclusion and Future Works**

The ITC99 benchmark b02 has 22 cells, which represent 206 transistors. From these, 11 combinational gates with unitary fan-out were selected, using a total of 62 transistors. The same logic can be implemented with 5 complex gates, using 54 transistors. These results were obtained with the Gate Merging Index and Iterations set to 2. This shows that our method has significant potential to reduce the number of transistors and interconnections.
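The reported counts can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated above; the assumption that the cells that were not selected keep their transistor count (so the whole-circuit total drops by the same amount) is ours, not the authors'.

```python
# Transistor counts reported for ITC99 b02 (taken from the text above).
total_before = 206      # all 22 cells
selected_before = 62    # 11 combinational gates with unitary fan-out
selected_after = 54     # same logic implemented with 5 complex gates

saved = selected_before - selected_after
# Assumption: cells that were not selected are left unchanged.
total_after = total_before - saved


def reduction_pct(before, after):
    """Relative reduction, in percent."""
    return 100.0 * (before - after) / before


print(f"transistors saved: {saved}")
print(f"selected gates: {reduction_pct(selected_before, selected_after):.1f}% fewer")
print(f"whole circuit:  {reduction_pct(total_before, total_after):.1f}% fewer")
```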

Universidade Federal do Rio Grande do Sul Programa de Pós-Graduação em Microeletrônica Av. Bento Gonçalves,9500 – 91501970 Porto Alegre/RS, Brazil

PGMICRO Contact: {lmsilva, cmoconceicao, gbontorin, reis}@inf.ufrgs.br



Mateus Fogaça, Jucemar Monteiro, Guilherme Flach, Marcelo Johann and Ricardo Reis

### 1. Introduction

Timing closure becomes more and more challenging as technology scales. In physical design, placement is a key step to achieve routing and timing constraints. This work presents 4 local search techniques to reduce timing violations during placement. An algorithm using the proposed techniques was applied to 5 IBM benchmarks.

### 2. Proposed techniques

#### Early violations removal:

Reduce early violations by spreading critical cells subject to a max displacement.



#### Path straightening:

Compute the **weighted** average position of sinks/driver and place cell in their **bounding boxes**.
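A minimal sketch of the path-straightening idea, assuming each connected pin carries a non-negative weight (for example, a criticality score). The `Pin` type, the `weight` field and the function names are hypothetical, and legalization of the resulting position is not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pin:
    x: float
    y: float
    weight: float = 1.0  # hypothetical criticality weight (assumption)


def bounding_box(pins):
    """Smallest axis-aligned rectangle (x0, y0, x1, y1) containing all pins."""
    xs = [p.x for p in pins]
    ys = [p.y for p in pins]
    return min(xs), min(ys), max(xs), max(ys)


def straightened_position(pins):
    """Weighted average of the driver/sink positions, kept inside their bounding box."""
    total = sum(p.weight for p in pins)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    x = sum(p.x * p.weight for p in pins) / total
    y = sum(p.y * p.weight for p in pins) / total
    x0, y0, x1, y1 = bounding_box(pins)
    # With non-negative weights the average already lies in the box;
    # the clamp only guards against floating-point rounding.
    return min(max(x, x0), x1), min(max(y, y0), y1)


driver = Pin(0.0, 0.0, weight=1.0)
critical_sink = Pin(10.0, 0.0, weight=3.0)
print(straightened_position([driver, critical_sink]))
```

The heavier weight on the critical sink pulls the cell toward it, which shortens the critical connection at the expense of the less critical one.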



#### **Buffer alignment:**

Place **buffers** between their **drivers** and **sinks**.



#### Net load reduction:

Move non-critical cells closer to reduce net load.



### 3. Placement Algorithm



### 4. Experimental results

| Circuit     | Early TNS | Early Impr. | Late TNS (10⁵) | Late Impr. |
|-------------|-----------|-------------|----------------|------------|
| superblue16 | -51.61    | 59.06%      | 6.20           | 20.05%     |
| superblue18 | -62.14    | 87.05%      | -9.18          | 11.33%     |
| superblue4  | -99.32    | 84.52%      | -34.31         | 1.32%      |
| superblue10 | -32.04    | 94.84%      | -331.18        | 0.10%      |
| superblue7  | -1941.07  | 2.25%       | -17.84         | 3.94%      |



Universidade Federal do Rio Grande do Sul Programa de Pós-Graduação em Microeletrônica Av. Bento Gonçalves, 9500 – 91501-970 Porto Alegre/RS, Brazil Contact: {mpfogaca, jucemar.monteiro, gaflach, johann, reis}@inf.ufrgs.br





### Test Solutions for NAND Flash Products <u>eMMC Test Solution</u>

UNISINOS itt CHIP test group - Electrical Test Laboratory Kondo, E.; Krug, M.; da Silva, M.; Prade, L.; Colling, F.; Peter, C.

### Introduction

Non-Volatile Memories (NVM) are becoming more common in our lives, and NAND Flash is the most popular kind of NVM. USB drives (pen drives), Secure Digital (SD) cards, Solid State Drives (SSD) and Embedded Multi Media Cards (eMMC) are used in our cell phones and tablets. An eMMC consists of a memory controller and NAND memory in the same package.

### **Objectives**

Study, research, develop and train people on NAND Flash architecture and operation. All these efforts will converge in a local test solution for functional testing of NAND Flash products using an FPGA (Field Programmable Gate Array).

The local test solution for the eMMC product consists of adapting existing production DRAM burn-in and sorter machines. A burn-in board is a high-parallelism board used to test ICs during burn-in test.

1. Main Objectives

- Perform all eMMC tests locally.
- Increase test capacity with installed equipment.

2. Specific Objectives:

- Adapt DRAM burn in board (BIB) in order to increase parallelism.
- Adapt sorter machine to automate BIB load and unload.
- Prototype FPGA board to perform all eMMC tests at BIB.



eMMC IC package 153ball FBGA capacity 16GB - up to 400MBps (DDR)



#### Unisinos – Universidade do Vale do Rio dos Sinos Itt CHIP – Instituto Tecnológico em Semicondutores Av. Unisinos, 950 – 93022-000 – São Leopoldo/RS, Brazil Contact: ittchip@unisinos.br | www.unisinos.br

### Methodology

Adaptor board designed at Unisinos Modelab with Altium<sup>™</sup> design software.





Prototype adaptor board for eMMC socket



Prototype adaptor board for eMMC mounted at DRAM BIB



Prototype FPGA board to run eMMC tests on BIB



The eMMC controller is developed in the VHDL hardware description language on an FPGA, which will allow access to all eMMC resources for testing.

### **Final Considerations**

This project is still under development and aims to deliver a final product for industry. It is helping us to better understand the memory test process, and even if it is not possible to have a competitive product at the end, the process of making it is increasing the team's knowledge of the whole process.






### 3-D Sound Perception Using Stereo Headphones

Joel A. Luft and Altamiro A. Susin

#### Introduction

The fundamental objective in 3D-audio is to implement the three-dimensional audio reproduction to create a natural spatial sound perception by the listener. This work attempts to create binaural real life listening experiences using traditional headphones. The main data used for spatial audio reproduction are the Head-Related Impulse Responses (HRIRs) and Binaural Room Impulse Response (BRIRs) [1].

#### Method

The 3-D perception is obtained by filtering the sound with the HRIR or BRIR and applying the result to the headphones, as shown in Figure 1.

Figure 1. Spatial sound synthesis.

The HRIRs used are from the CIPIC database [2]. The HRIRs have different responses depending on the azimuth, elevation and subject. Figure 2 presents some examples of HRIRs and HRTFs from the CIPIC database. The HRTF (Head-Related Transfer Function) is the Fourier transform of the HRIR.
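The filtering step can be illustrated with a small sketch: a mono source is convolved with the left and right HRIR of one direction, and the HRTF is obtained as the discrete Fourier transform of the HRIR. The impulse responses below are toy placeholders, not CIPIC data, and the function names are ours.

```python
import cmath


def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def binaural(mono, hrir_left, hrir_right):
    """Filter a mono source with the HRIR pair measured for one direction."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


def hrtf(hrir):
    """HRTF as the discrete Fourier transform of the HRIR."""
    n = len(hrir)
    return [
        sum(h * cmath.exp(-2j * cmath.pi * k * m / n) for m, h in enumerate(hrir))
        for k in range(n)
    ]


# Toy placeholder responses: the right ear receives the sound one sample later.
mono = [1.0, 0.0, 0.0]
left, right = binaural(mono, hrir_left=[1.0, 0.5], hrir_right=[0.0, 1.0])
print(left, right)
```

A real implementation would load the measured HRIR pair for the desired azimuth and elevation and would normally use an FFT-based convolution for long signals.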



Figure 2. HRIR and HRTF example for azimuth  $60^\circ$  and elevation  $0^\circ$.



The BRIR is obtained using the same CIPIC database, with the room response simulated by MCRoomSim [3], which simulates reflections in the room (reverberation) (Figure 3). Several sources were placed in different positions and the response (BRIR) was obtained for each situation.

Figure 3. Room and energy reflections representation [1].

#### Results

As expected, in preliminary and informal tests the perception differs for each subject, because the database HRIR used does not match the subject's own HRIR. Since the anthropometric data has not been evaluated so far, the head shapes of the model and the subjects may differ. The next step of the work is to test the perception using head models specific to each subject.

#### References

[1] M. Vorländer. Auralization Fundamentals of Acoustics, Modelling, Simulation, Algorithms and Acoustic Virtual Reality. Springer - Verlag Berlin, 2008.

[2] V. R. Algazi, R. O. Duda, D. M. Thompson and C. Avendano. The CIPIC HRTF Database. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics 2001. New Paltz, New York.

[3] A. Wabnitz, N. Epain, C. Jin and A. Schaik. Room acoustics simulation for multichannel microphone arrays. Proceedings of the International Symposium on Room Acoustics. ISRA 2010. Melbourne, Australia.

#### Universidade Federal do Rio Grande do Sul Programa de Pós-Graduação em Engenharia Elétrica Av. Osvaldo Aranha, 103 – 90035-190 Porto Alegre/RS, Brazil Contact: joel.luft@ufrgs.br | http://www.lapsi.eletro.ufrgs.br

### Asynchronous VLSI Design: Circuit Templates, Cell Libraries and Synthesis Flows

#### Matheus Trevisan Moreira

### Introduction and Motivation

- Synchronous circuits  $\rightarrow$  Global clock
- Asynchronous circuits → Local handshaking

- Shift to asynchronous / GALS approaches is inevitable  $\rightarrow$  ITRS

- Limited support for asynchronous design

### The ASCEnD-A Flow

- Automatic design of async cell libraries
- Tools from PUCRS, UFRGS and Cadence
- Used in different technologies  $\rightarrow$  180nm, 65nm, 45nm and FD-SOI 28nm



### ASCEnD Libraries

- Large library available in 65nm (921 cells)
  - NCL, NCL+, C-elements and MUTEXes
- Other libraries being designed for FreePDK45 and IBM 130nm
- New cells and optimizations
  - Differential design, DFT and low power



### **Circuit Templates**

- Return-to-One design
- DIMxS, NCL+ and SDDS-NCL
- $\downarrow$ Static power (~2x),  $\downarrow$ Energy (~2x),
- $\uparrow$ Performance (~1.5x),  $\downarrow$ Area (~1.8x)
  - Better design space exploration
- Blade
  - Asynchronous design + resiliency
  - Can reach 1.8x performance improvement
  - Area overhead of ~10%



### **Final Remarks**

- Async. can help solve VLSI problems
  - $\downarrow$ Static power,  $\downarrow$ Energy,  $\uparrow$ Performance
  - Voltage scaling friendly
  - Robustness against PVT variations
- More support for async design required
- Well accepted work
  - 3 journal papers
  - 43 conference papers (35 B1+)

Pontifícia Universidade Católica do Rio Grande do Sul Programa de Pós-Graduação em Ciência da Computação Av. Ipiranga, 6681, B. 32, room 727, Porto Alegre/RS, Brazil Contact: matheus.moreira@acad.pucrs.br

### Automatic Layout Synthesis using ASTRAN



### Gisell Borges Moura, Adriel Ziesemer Jr., Ricardo Reis

### Introduction



### Methodology

Cells with any sizing and any network of transistors can be used in the layout synthesis. The cell layouts generated by ASTRAN are added to the cell library. The impact of using any transistor network will be evaluated for a set of benchmarks in **power**, **area** and **delay**.



Design of any transistor network using:

#### **Complex Gates**

Contribute to reduce the number of transistors, interconnections and vias.

#### Extra Sizing

The choice of **any size** in addition to those that the cell library offers.

#### Synthesis Flow





Fig.1: Comparison of area for the buffer cell between sizes available by Free Cell Library of FreePDK45 (X1, X2, X4, X8, X16, X32) and extra sizes (X3, X6, X12, X24) generated by ASTRAN.

#### Conclusion

The flexibility of the ASTRAN tool makes it possible to use any logic function and any simplification in any transistor network, aiming at reductions in power, area and timing.

Universidade Federal do Rio Grande do Sul Programa de Pós-Graduação em Microeletrônica Av. Bento Gonçalves, 9500 Porto Alegre, RS - Brazil Contact: {gbmoura, amziesemerj, reis}@inf.ufrgs.br

### A Multi-Standard Interpolation Hardware Solution for H.264 and HEVC

Guilherme Paim, Henrique Maich, Vladimir Afonso, Luciano Agostini, Bruno Zatt, Marcelo Porto

### Introduction

- The previous standard, H.264/AVC, remains dominant in the current market;
- HEVC provides a 39.9% reduction in bit rate for the same video quality;
- The migration, however, occurs gradually because H.264/AVC is already present in most devices;
- This work presents a multi-standard fractional interpolator architecture for H.264/AVC and HEVC.

### Architecture

- The input is decomposed into 4x4 blocks;
- The H.264/AVC 6-tap and HEVC 8-tap filters may be factored to share common sub-expressions;
- The architecture (Fig. 1a) is composed of four parallel IP cores (Fig. 1b);
- The Multi-Standard Filter is adaptable to the desired standard (Fig. 1c).
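For orientation, the sketch below applies the half-sample luma interpolation taps defined by the two standards (6 taps for H.264/AVC, 8 taps for HEVC) to a row of 8-bit samples. It is a software illustration of a single 1-D pass with simplified rounding and clipping; it does not reproduce the shared sub-expression factoring of the architecture, which is not detailed here, and the function name is ours.

```python
# Half-sample luma filter taps of each standard.
H264_HALF = (1, -5, 20, 20, -5, 1)             # taps sum to 32
HEVC_HALF = (-1, 4, -11, 40, 40, -11, 4, -1)   # taps sum to 64


def half_sample(samples, standard):
    """Interpolate the half-sample position at the centre of `samples`.

    `standard` selects the filter: "h264" (6 taps) or "hevc" (8 taps).
    """
    if standard == "h264":
        taps, shift = H264_HALF, 5
    elif standard == "hevc":
        taps, shift = HEVC_HALF, 6
    else:
        raise ValueError("standard must be 'h264' or 'hevc'")
    if len(samples) != len(taps):
        raise ValueError(f"expected {len(taps)} samples")
    acc = sum(t * s for t, s in zip(taps, samples))
    value = (acc + (1 << (shift - 1))) >> shift   # round, then normalise
    return max(0, min(255, value))                # clip to the 8-bit range


# A step edge between 0 and 255, interpolated halfway across the edge.
print(half_sample([0, 0, 0, 255, 255, 255], "h264"))
print(half_sample([0, 0, 0, 0, 255, 255, 255, 255], "hevc"))
```

Selecting the taps with a mode input mirrors, at a very high level, the idea of one filter that adapts to the desired standard.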



### Results

- Described in VHDL;
- Synthesized with the Synopsys DC tool;
- Power analysis with a 1 V supply and 50% switching activity;
- The gate count is calculated based on 2-input NANDs;
- Table I presents the results and the main related works.

| Related          | Liu [1]      | Wang [2]     | Developed    |
|------------------|--------------|--------------|--------------|
| Standard         | H.264        | HEVC         | H.264 / HEVC |
| Technology       | UMC 130nm    | TSMC 90nm    | TSMC 65nm    |
| Frequency (MHz)  | 350          | 280          | 482          |
| Gates (K)        | 75.74        | 64.7         | 166.8        |
| Total Power (mW) | -            | -            | 80.69        |
| Max. Throughput  | 2160p @30fps | 4320p @30fps | 4320p @30fps |
### Conclusion

- A Multi-Standard filter;
- Unique Multi-Standard interpolator solution for:
  - ✓ MC & FME
  - ✓ HEVC & H.264/AVC
- Optimized critical path;
- High performance:
  - ✓ 4320p@30fps

### References

[1] J. Liu, X. Chen, Y. Fan and X. Zeng, "A full-mode FME VLSI architecture based on 8x8/4x4 adaptive Hadamard transform for QFHD H.264/AVC encoder", 19th VLSI-SoC, 2011.

[2] S. Wang, D. Zhou and S. Goto, "Motion compensation architecture for 8K UHDTV HEVC decoder," IEEE ICME, 2014.





Universidade Federal de Pelotas Programa de Pós-Graduação em Computação Grupo de Arquiteturas e Circuitos Integrados

{gppaim, hdamaich, vafonso, zatt, agostini, porto}@inf.ufpel.edu.br



### Stereo Matching and Sensor Fusion Technique for Image Depth Estimation

Fabio I. Pereira, Altamiro A. Susin



- Kalman filter and particle filter are alternatives being tried for camera position estimation;
- Rough but promising results from predefined image datasets;
- Motion artifacts and camera orientation still challenging in real video.





### Jezz: A Legalization Algorithm Using Linear Cost Function

Julia C. Puget, Guilherme Flach, Marcelo Johann, Ricardo Reis





### PHiCIT – Improving Hierarchical Networks-on-Chip through 3D Silicon Photonics Integration

#### Cezar Reinbrecht, Martha Sepúlveda and Altamiro Susin.

✓ Networks-on-Chip (NoCs) have been proposed as an appropriate solution for supporting MPSoC communication.

✓ PHiCIT follows the concept of arranging different topologies in a hierarchy. For intra-cluster communication, our approach aims to achieve very high performance through an optical fully connected crossbar. For inter-cluster communication, we aim for a low-complexity (low area and low power) and flexible architecture. Hence, we use an electrical 2D mesh NoC.

✓ This work also proposes a novel implementation strategy for 3D optical NoCs, regarding their floorplanning.

✓ Since our cluster level uses photonics, the IPs can be arranged at any layer of the 3D stack without compromising performance, allowing the designer to achieve the best chip area.





# An evaluation of BTI degradation of 32nm standard cells

Fig. 2. Recovery and stress phases of NBTI [1]: ΔV<sub>th</sub> (mV) versus time (s), data and model, for a 0.13 µm PMOS at T = 100 °C, V<sub>gs</sub> = -2.5 V, T<sub>ox</sub> = 1.3 nm.

#### Rafael B. Schivittz; Cristina Meinhardt; Paulo F. Butzen

**Introduction:** This work presents a tool that estimates the delay degradation due to the BTI effect in CMOS logic gates. This information is used to identify the gates that are most sensitive to this aging effect.

### BTI (Bias Temperature Instability):

- major aging mechanism in nanometer circuits;
- increases the transistor threshold voltage;
- reduces the system operating frequency;
- two phases: recovery and stress.

### ADDES – Aging Delay Degradation EStimator



Table I. Input Parameters

| Parameter          | Value     |
|--------------------|-----------|
| Technology         | 32 nm [2] |
| Supply Voltage     | 1 V       |
| Temperature        | 100 °C    |
| Vth_nominal        | 340 mV    |
| A                  | 0.002342  |
| n                  | 0.166667  |
| C <sub>NMOS</sub>  | 0.79      |
| C <sub>PMOS</sub>  | 1.08      |
| CR <sub>NMOS</sub> | 0.16      |
| CR <sub>PMOS</sub> | 0.15      |

#### Table II. Gate Delay Degradation output

Relative gate delay degradation (%):

| Logic gate | 1 year | 3 years | 5 years | 7 years | 10 years |
|------------|--------|---------|---------|---------|----------|
| AOI21      | 11.0   | 13.2    | 14.4    | 15.2    | 16.2     |
| AOI22      | 11.4   | 13.7    | 14.9    | 15.8    | 16.7     |
| AOI211     | 11.2   | 13.5    | 14.7    | 15.5    | 16.5     |
| AOI221     | 11.5   | 13.8    | 15.1    | 15.9    | 17.0     |
| INV        | 10.2   | 12.3    | 13.3    | 14.1    | 15.0     |
| NAND2      | 10.8   | 13.0    | 14.1    | 14.9    | 15.8     |
| NAND3      | 11.3   | 13.6    | 14.8    | 15.6    | 16.6     |
| NAND4      | 11.8   | 14.1    | 15.4    | 16.3    | 17.3     |
| NOR2       | 10.7   | 12.8    | 13.9    | 14.7    | 15.6     |
| NOR3       | 11.0   | 13.3    | 14.4    | 15.3    | 16.2     |
| NOR4       | 11.4   | 13.7    | 14.9    | 15.8    | 16.7     |
| OAI21      | 11.1   | 13.3    | 14.5    | 15.3    | 16.3     |
| OAI22      | 11.5   | 13.8    | 15.1    | 15.9    | 16.9     |
| OAI33      | 11.9   | 14.3    | 15.5    | 16.4    | 17.4     |
| OAI211     | 11.5   | 13.8    | 15.0    | 15.8    | 16.8     |
| OAI221     | 11.6   | 14.0    | 15.2    | 16.1    | 17.1     |

Fig. 1. User interface

The ADDES tool is implemented in Java. To estimate the BTI degradation, it needs:

- circuit description and simulation parameters
- input probability to be 0 or 1



#### The output is shown in Table II.

Fig. 3. Top five most degraded logic gates

### Temporal analysis:

Evaluate the delay degradation progress over the years. The degradation of benchmarks is computed for 1, 3, 5, 7, and 10 years. To calibrate the tool, simulations using *NGSPICE* were adopted.
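Table I lists an exponent n = 0.166667, which suggests a power-law time dependence. Assuming (this is our reading, not a statement of how the tool computes its output) that the relative degradation grows as t^n, the 1-year value of a gate can be extrapolated and compared with the values reported in Table II. The sketch below does this for the inverter; the function name is hypothetical.

```python
N_EXPONENT = 0.166667  # parameter n from Table I


def extrapolate(degradation_1_year, years, n=N_EXPONENT):
    """Scale a 1-year degradation to `years`, assuming a t**n dependence."""
    return degradation_1_year * years ** n


# Reported relative delay degradation (%) of the INV gate, from Table II.
reported_inv = {1: 10.2, 3: 12.3, 5: 13.3, 7: 14.1, 10: 15.0}

for years, reported in reported_inv.items():
    estimate = extrapolate(reported_inv[1], years)
    print(f"{years:>2} years: extrapolated {estimate:.2f}%, reported {reported:.1f}%")
```

Under this assumption the extrapolated values stay within about 0.1 percentage point of the reported ones, which is consistent with a t^(1/6) trend in the table.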

### **Final Remarks**

Estimating the aging degradation of standard cells makes aging analysis possible in the early stages of the design flow, which can result in circuits less susceptible to those effects.

[1] Vattikonda, R.; Wang, W.; Cao, Y. "Modeling and Minimization of PMOS NBTI Effect for Robust Nanometer Design". DAC 2006.

[2] W. Zhao, Y. Cao, "New generation of Predictive Technology Model for sub-45nm early design exploration," IEEE Trans. on Electron Devices, 2006.

Grupo de Sistemas Digitais e Embarcados - <u>www.gsde.c3.furg.br</u> Universidade Federal do Rio Grande PPGCOMP / FURG




### **Energy-Efficient Architectures for** Sum of Squared Differences Calculation

#### Ismael Seidel, Marcio Monteiro, José Luís Güntzel

### 2. Architectures

[Figure: SSD architecture datapath. Inputs $O_{i,j}$ and $C_{i,j}$ feed an absolute difference $|A - B|$, followed by the square operation $A^2$ and an accumulator with PSSD and SSD outputs. Total: 34 cycles/SSD.]

Square operation implementation:

- Standard HDL multiplication;
- Logic conjunction;
- Selection of pre-calculated data.

### 1. Introduction

Motion Estimation is the most complex video coding tool because it requires a huge number of similarity calculations [1] during the Block Matching Algorithm:

Sum of Absolute Differences (SAD):  $\operatorname{SAD}(O, C) = \sum_{i=1}^{M} \sum_{j=1}^{N} |O_{i,j} - C_{i,j}|$

Sum of Squared Differences (SSD):  $\operatorname{SSD}(O, C) = \sum_{i=1}^{M} \sum_{j=1}^{N} (O_{i,j} - C_{i,j})^2$

The square operation is the reason why SSD:

- provides **better coding efficiency** than SAD [2];
- is less energy-efficient than SAD.

Goal: Design energy-efficient SSD architectures to improve coding efficiency;
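As a software reference for the two similarity metrics defined above, the following sketch computes SAD and SSD for a pair of equally sized blocks given as lists of rows. It only illustrates the definitions and says nothing about the hardware architectures.

```python
def sad(original, candidate):
    """Sum of Absolute Differences between two equally sized blocks."""
    return sum(
        abs(o - c)
        for row_o, row_c in zip(original, candidate)
        for o, c in zip(row_o, row_c)
    )


def ssd(original, candidate):
    """Sum of Squared Differences between two equally sized blocks."""
    return sum(
        (o - c) ** 2
        for row_o, row_c in zip(original, candidate)
        for o, c in zip(row_o, row_c)
    )


original = [[10, 20], [30, 40]]
candidate = [[12, 17], [30, 44]]
print(sad(original, candidate), ssd(original, candidate))
```

The only difference between the two functions is the square, which is the operation the architectures implement in different ways.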

### 3. Method

- Architectures described in Verilog;
- Synthesized with the Synopsys® Design Compiler® tool in Topographical Mode;
- Simulated in Synopsys® VCS® using 10 million blocks from a 1080p video sample.

Energy<sub>SSD</sub> = Time<sub>SSD</sub> × Power<sub>SSD</sub>, where Time<sub>SSD</sub> = Cycles<sub>SSD</sub> × Period<sub>SSD</sub>

### 5. Conclusions

- Before simulation: Vedic is the best option when using clock gating;
  - Worst: pre-calculated data...
- After simulation: The best option now is the use of precalculated data!
- Compared with SAD (w/o clock gating):
  - 11.58pJ/SSD vs. 6.7pJ/SAD [4];

### 4. Synthesis Results

Columns marked "CG" are with clock gating; the other columns are without clock gating.

| Estimation    | Architecture* | Dynamic (mW) | Leakage (mW) | Total (mW) | Energy (pJ) | Dynamic, CG (mW) | Leakage, CG (mW) | Total, CG (mW) | Energy, CG (pJ) |
|---------------|---------------|--------------|--------------|------------|-------------|------------------|------------------|----------------|-----------------|
| non-simulated | standard      | 112.469      | 7.143        | 119.612    | 10.00       | 103.81           | 6.07             | 109.88         | 6.99            |
| non-simulated | pre-calc      | 197.094      | 13.247       | 210.341    | 13.37       | 133.94           | 10.93            | 144.87         | 9.21            |
| non-simulated | conjunction   | 148.344      | 11.148       | 161.492    | 10.27       | 125.21           | 7.79             | 133.00         | 8.46            |
| non-simulated | vedic [3]     | 142.054      | 8.790        | 150.844    | 12.79       | 47.24            | 3.22             | 50.46          | 3.21            |
| simulated     | standard      | 190.667      | 8.467        | 199.134    | 12.66       | 159.362          | 7.252            | 166.614        | 10.59           |
| simulated     | pre-calc      | 168.454      | 13.730       | 182.184    | 11.58       | 142.301          | 9.314            | 151.615        | 9.64            |
| simulated     | conjunction   | 216.568      | 11.785       | 228.353    | 14.52       | 161.540          | 8.763            | 170.303        | 10.83           |
| simulated     | vedic [3]     | 219.589      | 11.785       | 231.374    | 14.71       | 172.458          | 8.750            | 181.208        | 11.52           |

*SSD architectures are named after the used square implementation.

### References

- [1] F. Bossen et al. HEVC complexity and implementation analysis. IEEE Trans. Circuits Syst. Video Technol., 22(12):1685–1696, 2012.
- [2] G. Sanchez et al. Efficiency evaluation and architecture design of SSD unities for the H.264/AVC standard. In Southern Programmable Logic Conference (SPL), pages 171-174, March 2010.
- [3] J.M. Rudagi et al. Design and implementation of efficient multiplier using vedic mathematics. In International Conference on Advances in Recent Technologies in Communication and Computing (ARTCom), pages 162-166, Nov 2011.
- [4] I. Seidel et al. Towards optimal use of pel decimation to trade off quality for energy. Analog Integrated Circuits and Signal Processing, 85(1):107-128, 2015.



Contact: ismaelseidel@inf.ufsc.br; marcio.m@grad.ufsc.br; j.guntzel@ufsc.br


Universidade Federal de Santa Catarina



### SATD Hardware Architecture for HEVC Encoder

Bianca Silveira, Cláudio Diniz, Mateus Fonseca, Eduardo Costa

### Introduction

- The most recent video compression standard is High Efficiency Video Coding (HEVC);

- Sum of Absolute Transformed Differences (SATD) is a metric to estimate the distortion between two video blocks in video encoders;

- This work proposes a hardware architecture for SATD based on 8x8 Hadamard Transform.



The 2-D 8x8 Hadamard Transform is divided into two stages of one dimensional (1-D) Hadamard Transform. The two stages are connected by a set of sequential/parallel registers and multiplexers.



The circuit in Fig.1 represents the operation performed by the horizontal Hadamard algorithm.
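The two-stage decomposition can be sketched in software as follows: a 1-D Hadamard transform built from add/subtract butterflies is applied to every row of the difference block (horizontal stage) and then to every column (vertical stage), and the absolute values of the coefficients are summed. This is a reference model under our own assumptions, not the proposed hardware; the final normalisation, which encoders typically apply, is omitted.

```python
def hadamard_1d(values):
    """Unnormalised 1-D Hadamard transform using add/subtract butterflies.

    The length of `values` must be a power of two (8 for an 8x8 block).
    """
    v = list(values)
    half = 1
    while half < len(v):
        for start in range(0, len(v), 2 * half):
            for k in range(start, start + half):
                a, b = v[k], v[k + half]
                v[k], v[k + half] = a + b, a - b
        half *= 2
    return v


def satd_8x8(original, candidate):
    """Sum of Absolute Transformed Differences of two 8x8 blocks."""
    diff = [[o - c for o, c in zip(ro, rc)] for ro, rc in zip(original, candidate)]
    horizontal = [hadamard_1d(row) for row in diff]            # first 1-D stage
    vertical = [hadamard_1d(col) for col in zip(*horizontal)]  # second 1-D stage
    return sum(abs(coeff) for col in vertical for coeff in col)


block_a = [[1] * 8 for _ in range(8)]
block_b = [[0] * 8 for _ in range(8)]
print(satd_8x8(block_a, block_b))
```

The transposition between the two stages (`zip(*horizontal)`) plays the role that the registers and multiplexers between the stages play in a hardware datapath.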



**Results** 



### **Conclusions and Future Work**

- This work presented a dedicated architecture for SATD;
- The whole architecture was synthesized to a 45 nm ASIC technology using the Cadence environment;



- Results showed that the sequential/parallel registers presented the largest total power consumption;

- As future work we intend to optimize the SATD architecture by exploiting different levels of parallelism in the Hadamard Transform.

Universidade Católica de Pelotas Mestrado em Engenharia Eletrônica e Computação Rua Gonçalves Chaves, 373 – 96015-560 Pelotas/RS, Brazil Contact: bijb77@gmail.com | http://pos.ucpel.tche.br/ppgeec



### Design Method for CML Topology-Based Divide-by-2 Circuit with Unbalanced Loads

#### Raphael Ronald Noal Souza, Agord de Matos Pinto Jr.


#### Introduction

This work describes the design method applied to the divide-by-2 circuit (FD2) of a frequency synthesizer (Fig. 1) integrated in an RFID protocol-based transceiver. Following the ISO/IEC 18000-4 standard, the system was implemented in CMOS-based XFAB 0.18 µm technology (EDA tool: Cadence Virtuoso Analog Environment) and has the following technical features: (1) Frequency range: 2.4 GHz to 2.475 GHz; (2) Number of channels: 16; (3) Channel Spacing: 5 MHz; (4) Modulation: OOK.
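The stated features are mutually consistent: 16 channels spaced 5 MHz apart, starting at 2.4 GHz, end at 2.475 GHz. A minimal sketch of that channel plan follows, assuming channels are numbered from 1 and channel 1 sits at the lower band edge (the function name is ours).

```python
F_START_MHZ = 2400       # lower edge of the stated frequency range
CHANNEL_SPACING_MHZ = 5
NUM_CHANNELS = 16


def channel_frequency_mhz(channel):
    """Centre frequency of a PLL channel (1..16), assuming channel 1 = 2.4 GHz."""
    if not 1 <= channel <= NUM_CHANNELS:
        raise ValueError("channel must be between 1 and 16")
    return F_START_MHZ + CHANNEL_SPACING_MHZ * (channel - 1)


for ch in (1, 16):
    print(f"channel {ch}: {channel_frequency_mhz(ch) / 1000:.3f} GHz")
```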



Transceiver architecture and connection with FD2 (Fig. 2):

Homodyne Receiver RX (Fig. 1 - block diagram top): 1 pair of differential signals for each Mixer.

Transmitter TX (Fig. 1 - block diagram bottom): one connection (LO\_Ip) for driving the single-ended Modulator.

Frequency Synthesizer SX (Fig. 1 - block diagram middle): 1 pair of differential signals (LO\_Qp / LO\_Qn) for loop divider on feedback.

#### Design Method

Effects considered in the employed design techniques:

Emulation\_1: input capacitance C<sub>COMP</sub> at LO\_In (diff. signal) to balance LO\_Ip (Modulator).

Emulation\_2: capacitive effects from the tracks (block / top level) at FD2 output signals.

Considering design goals (Fig. 4), resulting analog design flow is applied in sequence for each FD2 latch (Fig. 5).









FD2 is composed of 2 latches in a master-slave configuration (Fig. 3 (A)). Each latch is designed using a Current Mode Logic (CML)-based topology (Fig. 3 (B)).
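The divide-by-2 behaviour of a master-slave latch pair can be illustrated with an idealised logic-level model: the master latch is transparent while the clock is low, the slave while it is high, and the inverted slave output is fed back to the master input. This sketch ignores all analog CML effects (swing, loading, delay), and its names are hypothetical.

```python
def divide_by_2(clock_samples):
    """Idealised master-slave divider; returns (slave, master) output traces."""
    master = 0
    slave = 0
    out_slave, out_master = [], []
    for clk in clock_samples:
        if clk == 0:
            master = 1 - slave   # master transparent: D = inverted slave output
        else:
            slave = master       # slave transparent: copies the master
        out_slave.append(slave)
        out_master.append(master)
    return out_slave, out_master


clock = [0, 1] * 4               # four input clock periods
out_i, out_q = divide_by_2(clock)
print(out_i)
print(out_q)
```

Each output repeats every four samples while the clock repeats every two, so the frequency is halved. The two latch outputs are shifted by a quarter of the output period, which is why such dividers are commonly used to generate quadrature signals.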





Table I: Set of Loads and Impedances

Load capacitances (C = connection) per FD2 output; load columns: MOD (60 fF), LD (110 fF) and Receiver (10 fF, 10 fF).

| FD2 Out | C<sub>TOTAL</sub> (fF) |
|---------|------------------------|
| LO\_Ip  | 70                     |
| LO\_In  | 70                     |
| LO\_Qp  | 120                    |
| LO\_Qn  | 120                    |

Fig 2: General connections diagram from FD2

#### Results

Fig 6 shows the final FD2 top level layout.

- Sub-blocks: Latch 1 (lower left), latch 2 (lower right), C<sub>COMP</sub> (upper left), and external routing lines.
- Output waveforms (PLL - channel 16): FD2 Outputs (LO\_Ip, LO\_In, LO\_Qp, LO\_Qn).

Table II compares the final area for each sub-block.



Fig 6: FD2 - Final layout representation and output waveforms

#### Table III: Output Amplitude Variation

Differential output voltage amplitude:

| PLL Channel (GHz) | Swing LO\_Ip/In | Var 1 | Swing LO\_Ip/In | Var 1 | Var 2 |
|-------------------|-----------------|-------|-----------------|-------|-------|
| 1 (2.4)           | 319 mV          | 4%    | 347 mV          | 6%    | 7.8%  |
| 16 (2.475)        | 306 mV          | 4%    | 326 mV          | 0 %   | 6%    |

- Var 1 (variation %): distinct PLL channels, same differential pair.
- Var 2 (variation %): distinct diff. pairs, same PLL channel.



Effective solution with:

- Optimized performance for a customized solution.
- Residual unbalancing in the output signals (hard to remove!).
- Unbalanced latches impacting the symmetry of layout structures.

Centro de Tecnologia da Informação Renato Archer CTI

Programa CI-Brasil - Centro de Treinamento 2 (CT2)

Rodovia Dom Pedro I (SP-65), Km 143,6 - Amarais - Campinas, SP Contact: raphael.souza@cti.gov.br




### Run-time of the Data Dependency Detector for Harvesting Parallelism for Global Routing

Diego Tumelero, Guilherme Bontorin, Ricardo Reis

### Introduction

Global Routing (GR) is an NP-hard problem, so its running time grows quickly as circuits get larger. It is therefore necessary to perform the Global Routing step more efficiently, using parallel computing techniques.

We verified that 61% of the nets in the ISPD'08 benchmarks are shorter than 128 length units. The nets are therefore predominantly short and sufficiently distant from each other. This demonstrates the potential of a net-level approach to enable GR processing on massively parallel architectures.

We demonstrate that separating the nets into clusters according to their length and processing them in parallel can reduce the running time for collision detection by 67 times, compared with a sequential non-clustered analysis.

### Net Level Approach for Parallel GR

Each net can most likely be routed inside its bounding-box region, so nets can be processed in different threads.
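The net-level idea can be sketched in a few lines of Python. This is a minimal illustration, not the authors' detector: it assumes that two nets are data-dependent when the bounding boxes of their pins overlap, and that nets are clustered by half-perimeter length using the 128-unit threshold mentioned above. The names `bounding_box`, `boxes_overlap`, `find_collisions`, and `cluster_by_length` are hypothetical.

```python
from itertools import combinations


def bounding_box(pins):
    """Return (xmin, ymin, xmax, ymax) enclosing all (x, y) pins of a net."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return min(xs), min(ys), max(xs), max(ys)


def boxes_overlap(a, b):
    """True when two axis-aligned boxes share at least one grid point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def half_perimeter(box):
    """Half-perimeter of a box, used here as the net length estimate."""
    return (box[2] - box[0]) + (box[3] - box[1])


def find_collisions(nets):
    """Sequential all-pairs check: pairs of net names whose boxes overlap."""
    boxes = {name: bounding_box(pins) for name, pins in nets.items()}
    return {
        (a, b)
        for a, b in combinations(sorted(boxes), 2)
        if boxes_overlap(boxes[a], boxes[b])
    }


def cluster_by_length(nets, threshold=128):
    """Split nets into short and long clusters (the threshold is an assumption)."""
    clusters = {"short": {}, "long": {}}
    for name, pins in nets.items():
        key = "short" if half_perimeter(bounding_box(pins)) < threshold else "long"
        clusters[key][name] = pins
    return clusters


# Hypothetical nets, each given as a list of pin coordinates.
nets = {
    "n1": [(0, 0), (2, 2)],
    "n2": [(1, 1), (3, 3)],
    "n3": [(10, 10), (12, 11)],
    "n4": [(0, 0), (200, 150)],
}
clusters = cluster_by_length(nets)
# Collisions are searched only inside one cluster, which shrinks the search space.
short_collisions = find_collisions(clusters["short"])
```

In this sketch, nets of a cluster that appear in no collision pair could be dispatched to separate threads, while colliding nets would have to be ordered.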



Source: Y. Han et al., "Exploring High-Throughput Computing Paradigm for Global Routing" IEEE T. VLSI Syst., Jan. 2014.



Overhead in absolute values for the three data dependency detector implementations:

- No clustering, with sequential collision detection;
- Clustering the nets by length, with sequential collision detection;
- Clustering the nets by length, with parallel processing and collision detection.

Reducing the search space and using parallelism yields a significant improvement.

There is a trade-off between the cost of the data dependency analysis and the running time saved in GR.

Sources: G. Nam et al., "The ISPD global routing benchmark suite," in Proceedings of the 2008 International Symposium on Physical Design - ISPD '08, 2008. Tumelero, Diego. Exploração de Paralelismo no Roteamento Global de Circuitos VLSI. Master Thesis - UFRGS 2015.



Universidade Federal do Rio Grande do Sul Programa de Pós-Graduação em Microeletrônica Instituto de Informática – Campus do Vale Av. Bento Gonçalves, 9500 – 91501-970 Porto Alegre/RS, Brazil Contact: dtumelero@inf.ufrgs.br

### Poster Session 2: Undergraduate Track

# High Throughput SAD Architecture for Quality HEVC Encoding

#### Brunno A. Abreu, Mateus Grellert, Sergio Bampi

### Introduction

- The HEVC standard demands a large computing effort
- Motion Estimation is the most time-consuming step due to extensive computations, like the Sum Of Absolute Differences (SAD)
- Solutions typically require SIMD and dedicated hardware architectures
- The proposed SAD architecture is based on adder trees
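As a software reference for the operation, the SAD of a 4x4 block can be written as below. This is only a behavioral sketch in Python, not the VHDL design: the pairwise reduction loop mimics the levels of an adder tree, and the function name `sad_adder_tree` and the pixel values are illustrative.

```python
def sad_adder_tree(current, reference):
    """SAD of two equally sized blocks, summed level by level like an adder tree."""
    # Level 0: absolute differences of co-located pixels (16 values for 4x4).
    level = [
        abs(c - r)
        for cur_row, ref_row in zip(current, reference)
        for c, r in zip(cur_row, ref_row)
    ]
    # Each pass adds neighbouring pairs, like one level of hardware adders.
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]


current = [[10, 12, 14, 16], [10, 12, 14, 16], [10, 12, 14, 16], [10, 12, 14, 16]]
reference = [[11, 12, 13, 16], [10, 10, 14, 18], [10, 12, 14, 16], [9, 12, 14, 16]]
sad = sad_adder_tree(current, reference)  # 7 for these sample blocks
```

For a 4x4 block the loop runs four times (16, 8, 4, 2 values), which is where pipeline registers could be placed between tree levels in hardware.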

#### Method

- Hardware description using VHDL, with ISE Design Suite and ISim
- Python scripts implemented to generate random valid inputs and outputs
- Based on latency results, we selected the best pipeline configuration, which has 8 stages



## References

[1] B. Bross, W. J. Han, J. R. Ohm, G. J. Sullivan, T. Wiegand, "High Efficiency Video Coding (HEVC) text specification draft 7", 2012.

[2] B. Abreu, M. Grellert, S. Bampi. "High Throughput SAD Architecture for Quality HEVC Encoding". *30° Simpósio Sul de Microeletrônica*, 2015.

[3] X. Yuan, L. Jinsong, G. Liwei, Z. Zhi and R. Teng, "A high performance VLSI architecture for integer motion estimation in HEVC", *IEEE 10th International Conference on ASIC (ASICON)*, 2013.

[4] P. Nalluri, L. N. Alves, A. Navarro, "A novel SAD architecture for variable block size motion estimation in HEVC video coding", *IEEE International Symposium on System on Chip* (SoC), 2013.



#### Results

- The architecture was designed using medium-sized 4x4-pixel SAD blocks
- Balance between input bandwidth, frequency, hardware area and throughput

|                         | 45nm<br>Virtex-6 | 65nm<br>Virtex-5 | 40nm<br>Virtex-6<br><b>[3]</b> | 65nm<br>Virtex-5<br><b>[4]</b> |
|-------------------------|------------------|------------------|--------------------------------|--------------------------------|
| Max. Freq.<br>(MHz)     | 511.7            | 416.67           | 110                            | 171.9                          |
| #Registers              | 2440             | 2484             | 19744                          | 20736                          |
| #LUTs                   | 2271             | 2215             | 55346                          | 15453                          |
| Throughput<br>(@4K UHD) | 159              | 128              | 109739                         | 5310                           |
| Bandwidth               | 1024b            | 1024b            | 4096b                          | 4096b                          |
| BD-Rate<br>Penalty      | NO               | NO               | YES<br>(7.3%)                  | YES<br>(6.8%)                  |

### Conclusions

- Capability of achieving real-time UHD 4K encoding, even at 120 FPS
- Published in 30° Simpósio Sul de Microeletrônica
- Main future goal is to describe an architecture for TZ Search (part of the Motion Estimation)

Universidade Federal do Rio Grande do Sul Instituto de Informática Caixa Postal 15064 | 91501-970 Porto Alegre - RS - Brasil Contact: baabreu@inf.ufrgs.br

# A tool for Fault Insertion Simulation in CMOS Circuits

#### Ygor Q. de Aguiar<sup>1</sup>, Alexandra L. Zimpeck<sup>2</sup> and Cristina Meinhardt<sup>1</sup>

<sup>1</sup>Universidade Federal do Rio Grande – FURG - C3 <sup>2</sup>Universidade Federal do Rio Grande do Sul – UFRGS - PPGC

In nanoscale technologies, the occurrence of faults such as Stuck-Open, Stuck-On, and Single Event Transient has increased considerably. As a result, tools that support integrated circuit design and assess its robustness against faults are indispensable. This work presents a tool that evaluates the behavior of CMOS circuits under the aforementioned faults and calculates the fault coverage for each circuit.
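To make the notion of fault coverage concrete, the sketch below injects Stuck-Open and Stuck-On faults into a switch-level model of a 2-input CMOS NAND gate. It is a simplified illustration and not the tool described here: it assumes a purely static model in which a floating (`Z`) or contended (`X`) output counts as a detected fault, whereas real Stuck-Open detection usually needs two-pattern tests. All names are hypothetical.

```python
from itertools import product

# Transistors of a 2-input CMOS NAND: (name, type, index of controlling input).
TRANSISTORS = [("PA", "p", 0), ("PB", "p", 1), ("NA", "n", 0), ("NB", "n", 1)]


def conducts(name, kind, value, fault):
    """Switch-level conduction of one transistor under an optional fault."""
    if fault == (name, "stuck-open"):
        return False  # never conducts, whatever the gate signal is
    if fault == (name, "stuck-on"):
        return True  # always conducts, whatever the gate signal is
    return value == 1 if kind == "n" else value == 0


def nand2(inputs, fault=None):
    """Return '0', '1', 'Z' (floating) or 'X' (contention)."""
    on = {n: conducts(n, k, inputs[i], fault) for n, k, i in TRANSISTORS}
    pull_up = on["PA"] or on["PB"]  # PMOS devices in parallel
    pull_down = on["NA"] and on["NB"]  # NMOS devices in series
    if pull_up and pull_down:
        return "X"
    if pull_up:
        return "1"
    if pull_down:
        return "0"
    return "Z"


def fault_coverage():
    """Fraction of faults that change the output for at least one input vector."""
    faults = [(n, f) for n, _, _ in TRANSISTORS for f in ("stuck-open", "stuck-on")]
    vectors = list(product((0, 1), repeat=2))
    detected = sum(
        any(nand2(v, fault) != nand2(v) for v in vectors) for fault in faults
    )
    return detected / len(faults)


coverage = fault_coverage()
```

Under these assumptions every one of the eight transistor faults disturbs the output for some input vector; an electrical simulation would additionally capture delays and intermediate voltage levels.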

#### **FAULT INJECTION**



**TOOL DEVELOPMENT** 

Independently of the signal applied at the gate terminal.

Pulse can be captured by a memory element



#### CONCLUSIONS

This tool evaluates the circuit behavior under faults and determines the robustness of the evaluated circuits. In this way, the software can also help identify the most adequate fault tolerance techniques to apply to logic gates.





# Evaluation of different SRAM 1 bit cell topologies in 32nm technology

#### ALMEIDA, Roberto; BUTZEN, Paulo F.; MEINHARDT, Cristina

#### Introduction

Computing systems need to process and store data and instructions, which are generally kept in cache memory. SRAM (Static Random Access Memory) technology is a good alternative for implementing a fast cache memory.

This work evaluates different topologies of 1-bit SRAM cells. The results show the power consumption and write delay observed for each topology.

### Methodology

- 6 most often used topologies [1-5]
- NGSpice Simulator [6]
- 32nm PTM LP (Low Power) and HP (High Performance) models [7]
- For all devices: L = 32nm, W<sub>pmos</sub> = 200nm, W<sub>nmos</sub> = 100nm.
- Observing: Write Delay and Power Consumption, $P = \frac{\int_{t_0}^{t_1} i \, dt}{t_1 - t_0} \cdot V$
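The power expression above (the time integral of the current, divided by the elapsed time, times the supply voltage) can be evaluated from sampled simulation data roughly as follows. This is a sketch under assumptions: the trapezoidal rule, the function name `average_power`, the sample values, and the 0.9 V supply are all illustrative and not taken from the poster.

```python
def average_power(times, currents, vdd):
    """Average power: trapezoidal integral of i(t) over elapsed time, times V."""
    charge = sum(
        0.5 * (currents[k] + currents[k + 1]) * (times[k + 1] - times[k])
        for k in range(len(times) - 1)
    )
    return charge / (times[-1] - times[0]) * vdd


# Hypothetical samples: a triangular 10 uA current pulse over 2 ns at 0.9 V.
times = [0.0, 1e-9, 2e-9]
currents = [0.0, 10e-6, 0.0]
power = average_power(times, currents, vdd=0.9)  # about 4.5 uW
```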







4T Loadless [2]











Universidade Federal do Rio Grande – FURG Brasil, Rio Grande do Sul, Rio Grande Campus Carreiros: Av. Itália km 8 Bairro Carreiros - Fone (53)3233.6500





## Conclusions

The results show that the 4T LL (Loadless) cell has good performance with low power consumption. Future work will include the evaluation of a complete SRAM architecture and the observation of more parameters, such as read delay and SNM (static noise margin).

#### References

[1] Weste, N. H. E.; Harris, D. M. (2011). CMOS VLSI Design: A Circuits and Systems Perspective (4th ed.). Editora Pearson.

[2] Sandeep R, Narayan T Deshpande, and A R Aswatha, "Design and Analysis of a New Loadless 4T SRAM Cell in Deep Submicron CMOS Technologies", Second International Conference on Emerging Trends in Engineering and Technology, ICETET-09.

[3] L. Chang et al., "Stable SRAM Cell Design for the 32 nm Node and Beyond," Proc. Symp. VLSI Tech., IEEE Press, 2005, pp. 128-129.

[4] B.H. Calhoun and A.P. Chandrakasan, "A 256-kb 65-nm Sub-threshold SRAM Design for Ultra-Low-Voltage Operation," IEEE J. Solid-State Circuits, vol. 42, no. 3, 2007, pp. 680-688.

[5] I.J. Chang et al., "A 32 kb 10T Sub-threshold SRAM Array with Bit-Interleaving and Differential Read Scheme in 90 nm CMOS," IEEE J. Solid-State Circuits, vol. 44, no. 2, 2009, pp. 650-658.

[6] NGSpice. Available at: http://ngspice.sourceforge.net/

[7] ZHAO, W.; CAO, Y. New generation of Predictive Technology Model for sub-45nm early design exploration. IEEE Trans. on Electron Devices, vol. 53, no. 11, pp. 2816-2823, Nov. 2006.



Grupo de Sistemas Digitais e Embarcados - GSDE www.gsde.furg.br

# Low Latency Izhikevich's Simple Neuron Model on FPGA

#### Vitor Bandeira, Vivianne Costa, Guilherme Bontorin, and Ricardo Reis

#### Abstract

The Izhikevich Simple Model (ISM) for neural activity presents a good compromise between waveform quality and computational cost. Field-Programmable Gate Arrays (FPGAs) are powerful, flexible, and inexpensive digital hardware that can implement such a model. We present an FPGA implementation of the ISM whose latency is up to 56 times smaller than those reported in the literature.
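For reference, the ISM consists of two coupled equations, $v' = 0.04v^2 + 5v + 140 - u + I$ and $u' = a(bv - u)$, with the reset $v \leftarrow c$, $u \leftarrow u + d$ whenever $v \geq 30$ mV. The floating-point sketch below integrates them with a forward-Euler step. It only illustrates the model and is not the fixed-point FPGA datapath of this work; the parameter values are the commonly used regular-spiking set, and the step size and input current are assumptions.

```python
def simulate_izhikevich(a=0.02, b=0.2, c=-65.0, d=8.0,
                        current=10.0, dt=0.5, duration=1000.0):
    """Forward-Euler simulation; returns spike times (ms) and the v trace (mV)."""
    v, u = c, b * c
    spikes, trace = [], []
    for step in range(int(duration / dt)):
        dv = 0.04 * v * v + 5.0 * v + 140.0 - u + current
        du = a * (b * v - u)
        v += dt * dv
        u += dt * du
        if v >= 30.0:  # spike: record it, then reset v and increment u
            spikes.append(step * dt)
            trace.append(30.0)
            v, u = c, u + d
        else:
            trace.append(v)
    return spikes, trace


spikes, trace = simulate_izhikevich()
```

Each neuron update needs only a handful of multiplications and additions, which is why the model maps well onto FPGA arithmetic resources.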



These data were obtained from the FPGA running our implementation, using the SignalTap II tool in the Quartus II® software.

[1] E. M. Izhikevich, "Simple model of spiking neurons," IEEE, vol. 14, pp. 1569-1572, 2003

[1] E. M. Izhikevich, "Simple model of spiking neurons," IEEE, vol. 14, pp. 1569–1572, 2003.
 [2] A. Cassidy and A. Andreou, "Dynamical digital silicon neurons," in Biomedical Circuits and Systems Conference, 2008. BioCAS 2008. IEEE, Nov 2008, pp. 289–292.
 [3] M.Ambroise, T. Levi, Y.Bornat, and S. Saighi, "Biorealistic Spiking Neural Network on FPGA," in Information Sciences and Systems (CISS), 2013 47th Annual Conference on, March 2013, pp. 1–6.
 [4] A. Cassidy, S. Denham, P. Kanold, and A. Andreou, "FPGA Based Silicon Spiking Neural Array," in Biomedical Circuits and Systems Conference, 2007. BIOCAS 2007. IEEE, Nov 2007, pp. 75–78.
 [5] A. Cassidy and A. Andreou, "Dynamical digital silicon neurons," in Biomedical Circuits and Systems Conference, 2008. BioCAS 2008. IEEE, Nov 2008, pp. 289–292.



#### **Comparison with the Literature**



#### Conclusions

Our implementation is best suited for hybrid network systems and presents fair performance for artificial-only networks. The low latency of the circuit will allow us to reuse the same neuron multiple times.


#### Universidade Federal do Rio Grande do Sul Instituto de Informática

Av. Bento Gonçalves, 9500 - Campus do Vale. Bloco IV CP15064, 91501-970- Porto Alegre-Brazil

Contact: {vvbandeira,reis}@inf.ufrgs.br




# Integration of the uCLinux on the TVD-SoC Architecture for the Brazilian Digital TV

#### Ana Luiza P. Brod, Cezar R. Reinbrecht, Altamiro A. Susin

#### Introduction

- Brazilian Digital TV set-top boxes need an Operating System (OS) compatible with Ginga (middleware) and capable of managing all Smart TV features.
- This work presents the integration of a Linux distribution (ucLinux) into the TVD-SoC Architecture, with the Leon3 processor.
- The system provides all the utilities needed by the prototype, for example peripheral drivers for the remote control, mouse, and keyboard.

**System Architecture**



#### Figure 1. Linux Cross-compilation Workflow: The blocks necessary to achieve the linux boot image.

[Screenshot: cross-compilation setup menu in Buildroot. The left pane lists the configuration sections (toolchain, build options, system configuration, package selection for the target, host utilities, filesystem images, bootloaders, kernel, and Linux configuration). The right pane lists the target architectures (ARM, AArch64, MIPS, PowerPC, SPARC, x86_64, and others) and the target architecture variant (v8).]

#### Figure 2. Cross-compilation Setup Menu





Figure 4. TVD-SoC Architecture

#### Linux Requirements

Access main peripherals:

- I2C
- External Memory
- IR
- Ethernet

Develop custom drivers:

- GPIO Tuner Configuration
- Decoder Configuration
- Integration with the Set-top Box Interface Software.

#### **Experiments**

- I2C: control the FPGA fan
- External Memory: read/write a file
- IR: read remote control signals
- Ethernet: initiate a TCP/IP connection
- GPIO Tuner: change channels
- Decoder Configuration: start a process

### Conclusions

- The cross-compilation is challenging due to its many configuration parameters.
- The feature requirements can be prohibitive if there is a memory size constraint.
- The I2C communication will allow controlling the Tuner/Demodulator and the communication between the board components.
- Future work consists of developing all the features required for a fully connected device, in line with the IoT concept.



Universidade Federal do Rio Grande do Sul Av. Osvaldo Aranha, 103 - 90035-190 Porto Alegre/RS, Brazil Contact: analuiza@brod.com.br

# An Optimization-Based Design Methodology for Fully Differential Amplifiers

Arthur Oliveira, Lucas Severo, Paulo Aguirre and Alessandro Girardi

#### Introduction

Due to their better linearity and high common-mode rejection, fully differential amplifiers are used in applications that require high performance, such as analog-to-digital converters and active filters. Since this kind of amplifier is widely employed, it is desirable to have a reduced design time and an optimized solution. A CAD tool, called UCAF, was developed for the automatic synthesis of analog building blocks. A methodology for the automatic design of fully differential amplifiers using a no-capacitor feed-forward (NCFF) compensation scheme in 130nm CMOS technology is presented. To avoid the slow-settling components inserted by the compensation scheme, a pole-zero matching constraint is proposed.

#### Automatic Sizing Tool

This work uses Simulated Annealing (SA) as the optimization heuristic
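The general shape of a Simulated Annealing loop is sketched below in Python. It is a generic textbook version under assumptions, not the UCAF implementation: the geometric cooling schedule, the parameter defaults, and the one-variable toy cost function (standing in for a circuit cost that would combine power and constraint penalties) are all illustrative.

```python
import math
import random


def simulated_annealing(cost, initial, neighbour, t_start=1.0, t_end=1e-3,
                        cooling=0.995, seed=0):
    """Generic SA loop: accept worse moves with probability exp(-delta / T)."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    temperature = t_start
    while temperature > t_end:
        candidate = neighbour(current, rng)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temperature *= cooling  # geometric cooling schedule
    return best, best_cost


# Toy stand-in for a sizing problem: one variable with its optimum at x = 3.
best_x, best_cost = simulated_annealing(
    cost=lambda x: (x - 3.0) ** 2,
    initial=0.0,
    neighbour=lambda x, rng: x + rng.gauss(0.0, 0.5),
)
```

In a sizing tool, the state would be a vector of transistor dimensions and bias values, and each cost evaluation would involve an electrical simulation.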



#### **Evaluation**



#### Methodology



Federal University of Pampa, Alegrete-RS Computer Architecture and Microelectronics Group - GAMA Contact: arthuroliveira@alunos.unipampa.edu.br

#### Two-Stage NCFF Fully Differential Amplifier



#### **Pole-Zero Pair Matching Function**




A pole-zero pair matching  $(PZ_m)$  constraint is implemented in order to minimize the degradation of the settling-time:

$$PZ_m = \min\left(\frac{\partial A_{v0}}{\partial f}\right)$$


The mismatch is characterized if $PZ_m$ is larger than the ideal decay rate of -20 dB/dec. The difference between the obtained and the ideal decay rate defines a constraint on how much of this mismatch is acceptable.
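One way to evaluate such a function numerically is to take the steepest slope of the gain curve between simulated frequency points. The sketch below assumes that the derivative in $PZ_m$ is read as a slope in dB per decade (suggested by the comparison with -20 dB/dec) and uses ideal one-pole and two-pole responses as test data; the function names and pole frequencies are hypothetical.

```python
import math


def pole_zero_matching(freqs, gains_db):
    """Steepest (most negative) gain slope in dB per decade between samples."""
    slopes = [
        (gains_db[k + 1] - gains_db[k])
        / (math.log10(freqs[k + 1]) - math.log10(freqs[k]))
        for k in range(len(freqs) - 1)
    ]
    return min(slopes)


def gain_db(freq, poles):
    """Magnitude in dB of a unity-DC-gain response with the given real poles."""
    return -sum(10.0 * math.log10(1.0 + (freq / p) ** 2) for p in poles)


freqs = [10 ** (k / 10) for k in range(0, 81)]  # 1 Hz to 100 MHz, log spaced
single_pole = pole_zero_matching(freqs, [gain_db(f, [1e3]) for f in freqs])
two_poles = pole_zero_matching(freqs, [gain_db(f, [1e3, 1e5]) for f in freqs])
```

The one-pole response approaches -20 dB/dec, while the second pole pushes the slope toward -40 dB/dec. A deviation of the measured slope from the ideal value is what a matching constraint of this kind would penalize.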

#### Results

Comparison between the results with and without the pole-zero matching constraint

| Specification      | Required   | Without $PZ_m$ | With $PZ_m$ |
|--------------------|------------|----------------|-------------|
| $A_{v0}$ (dB)      | $\geq 50$  | 64.8           | 50.5        |
| GBW (MHz)          | $\geq 256$ | 461.8          | 256.0       |
| PM (°)             | $\geq 50$  | 89.0           | 86.8        |
| $P_{diss} (\mu W)$ | Minimize   | 327.4          | 227.1       |
| $CM_e \ (mV)$      | $\leq 5$   | 1.5            | 3.6         |

#### Conclusion

An optimization-based design methodology for fully differential amplifiers was presented. The methodology partitions the design into the main amplifier and the common-mode feedback (CMFB) circuit, which are designed independently. A two-stage fully differential amplifier using a feed-forward compensation scheme with no capacitor was designed using the proposed methodology. A pole-zero matching constraint was also proposed to avoid the unnecessary slow-settling components inserted by the compensation scheme. Simulation results show that the obtained solution can satisfy a set of high-performance constraints. In addition, the proposed pole-zero matching constraint reduces the power consumption while satisfying all the imposed constraints.




# Development of a DSP module in VHDL with use of SIS/SIL techniques

Bruna F. Flesch, Msc. Rodrigo M. Figueiredo, Msc. Lúcio R. Prade, Postdoc. Márcio R. da Silva, Bianca Brand

# Introduction

The aim of this proposal is to add fault tolerance against Single Event Upsets (SEUs) to a configurable DSP module described in VHDL and designed for a Spartan 3E FPGA. It executes basic operations on integers of up to 18 bits (due to the primitives of the target device) by applying the 1oo3 architecture described in [1]. Therefore, Safety Instrumented Functions (SIFs) are inserted in most of the sequential and logic elements of the circuit.

Similar approaches are presented in [2] and [3], in which Triple Modular Redundancy (TMR) is presented as a suitable option to mitigate SEUs.
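The masking effect of TMR can be illustrated with a bitwise majority voter over three redundant copies of an 18-bit word. This Python sketch shows the standard TMR voter only; how the voting is actually arranged in the authors' VHDL module is not stated here, and the names and the sample word are hypothetical.

```python
WIDTH = 18  # operand width mentioned in the text
MASK = (1 << WIDTH) - 1


def majority_vote(a, b, c):
    """Bitwise 2-out-of-3 majority of three redundant WIDTH-bit words."""
    return ((a & b) | (a & c) | (b & c)) & MASK


def flip_bit(word, bit):
    """Model an SEU by inverting one bit of a stored word."""
    return word ^ (1 << bit)


golden = 0x2A5F3 & MASK
replicas = [golden, flip_bit(golden, 7), golden]  # one replica is upset
voted = majority_vote(*replicas)  # the upset is masked
```

A single upset in any one replica is masked, but the same bit flipped in two replicas defeats the voter, which is consistent with "most", rather than all, upsets being mitigated.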

# SIS/SIL techniques implemented

## TMR architecture for throughput logic



#### **3-bit selector Multiplexer**



#### Sources:

[1] International Electrotechnical Commission (IEC), "Functional safety of electrical/electronic/programmable electronic safety-related systems – part 6: guidelines on the application of IEC 61508-2 and IEC 61508-3" (IEC 61508), Geneva, 2010.

[2] F.L. Kastensmidt, et al., "On the Optimal Design of Triple Modular Redundancy Logic for SRAM-based FPGAs," *IEEE Design, Automation and Test in Europe*, 1290-1295, 2005.

[3] F. G. L. Kastensmidt, et al., "Designing fault-tolerant techniques for SRAM-based FPGAs", IEEE Design & Test of Computers, 552-562, 2004.



## Results



Data obtained by using ISE and ISIM Simulators from Xilinx.

# Conclusion

- Significant reduction in the number of errors;
- The significance of the area increase depends on the target device used;
- Most Single Event Upsets (SEUs) were mitigated.

# **Future work**

Study of SIS/SIL architectures applied to the design of memory elements in VHDL.

#### Universidade do Vale do Rio dos Sinos

Laboratório de Prototipação Digital e Sistemas Embarcados Av. Unisinos, 950– 93022-000 São Leopoldo/RS, Brazil

Contact:brunaf.flesch@gmail.com | marquesf@unisinos.br | luciorp@unisinos.br | marcior@unisinos.br | bianca sb@hotmail.com



# Generating a Multiple Program Transport Stream for SBTVD

Jefferson Johner, Cezar Reinbrecht, Altamiro A. Susin



### **Ffmpeg Framework**

- Ffmpeg is a set of tools able to decode, encode, transcode, multiplex, demultiplex, stream, and play almost all available types of multimedia.
- It is an open-source project managed by an organization.
- There is no feature for multi-program TS generation according to the Brazilian standard ISDB-TB.
- The target is the *libavformat* library, which is responsible for multiplexing input data.
- All tables present in the Brazilian standard will be included in the Transport Stream file being generated.
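At the packet level, an MPEG-2 Transport Stream is a sequence of 188-byte packets that start with the sync byte 0x47 and carry a 13-bit PID; the Program Association Table on PID 0 is what lets a single stream announce several programs. The Python sketch below builds and parses such headers. It is an illustration only and not libavformat code; the helper names and the PIDs 0x0100 and 0x0200 are hypothetical.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47
PAT_PID = 0x0000  # the Program Association Table always uses PID 0


def build_ts_packet(pid, counter, payload=b""):
    """Build a payload-only packet, padded with 0xFF stuffing bytes."""
    header = bytes([
        SYNC_BYTE,
        0x40 | ((pid >> 8) & 0x1F),  # payload_unit_start flag + PID high bits
        pid & 0xFF,  # PID low bits
        0x10 | (counter & 0x0F),  # payload only + continuity counter
    ])
    body = payload[:TS_PACKET_SIZE - 4]
    return header + body + b"\xff" * (TS_PACKET_SIZE - 4 - len(body))


def parse_ts_header(packet):
    """Decode the 4-byte header of one Transport Stream packet."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid TS packet")
    return {
        "payload_unit_start": bool(packet[1] & 0x40),
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        "continuity_counter": packet[3] & 0x0F,
    }


# One PAT packet followed by packets of two hypothetical programs.
packets = [build_ts_packet(pid, 0) for pid in (PAT_PID, 0x0100, 0x0200)]
pids = [parse_ts_header(p)["pid"] for p in packets]
```

A multi-program multiplexer interleaves packets with different PIDs and emits tables that map each program to its PIDs, which is the part the standard-specific tables extend.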



#### **Conclusion and Future Works**

Transport Stream (TS): creating a TS containing one program from some TS packets



Figure 2: Feature currently available in Ffmpeg. The software is capable of generating Single Program Transport Streams.



Figure 3: Multiple Program Transport Stream Structure. This feature is currently being implemented in Ffmpeg, aiming to create MPTS Compatible with the SBTVD ("Sistema Brasileiro de Televisão Digital") standard.

- The Ffmpeg framework provides a great variety of transcoding processes. However, the Brazilian standard is not fully supported.
- The source code structure is very complex, which requires a deep understanding of multimedia processes and programming skills.
- Contributions of this work will be made available to the developer community, as open-source files, inside the Ffmpeg Project.
- Future work aims to explore the live streaming feature with ffmpeg.
- Integrate all Brazilian standard tables inside our framework patch.





# Integration of ISDB-T NIM Tuner on TVD-SoC for Brazilian Digital TV Set-top Boxes

#### Paulo G. Kipper, Cezar R. Reinbrecht, Altamiro A. Susin

### Introduction

- TVD-SoC is a system responsible for implementing the main functionalities of DTV.
- It is composed of the Leon-3 SoC with a DDR memory and Video and Audio Decoders.
- To be complete, TVD-SoC needs to be interfaced with a NIM (Network Interface Module) Tuner device in order to receive the data stream from Digital TV transmissions.

# TVD-SoC



# Integration Architecture



# **Experimental Results**

Terminal capture of the tuner menu (translated from Portuguese). Menu options: channel list [L], tune [S], data [D], auto-programming [P]. After tuning to channel 34, the tuner reports:

| Parameter        | Value     |
|------------------|-----------|
| Signal quality   | 31.595316 |
| Code rate        | 1         |
| Demodulator mode | 1         |
| hab error        | 1         |
| pre_BER          | 0.302734  |
| post_BER         | 0.006678  |

Figure 3. Software Configuration of NIM Tuner



Figure 4. Raw data of NIM Tuner



Figure 5. Sync Interface Behavior



Figure 6. Demultiplexer Detecting Video Packet

# Conclusions

- The integration of external devices into a project involves not only their protocols but also a full understanding of their functionalities.
- The I2C protocol, although very widespread, has its own complications and difficulties when implemented in hardware and software.



Universidade Federal do Rio Grande do Sul Av. Osvaldo Aranha, 103 – 90035-190 Porto Alegre/RS, Brazil Contact: paulo.kipper@ufrgs.br

# Adjusting Video Tiling to Available Resources in a Per-frame Basis in HEVC

Giovani Malossi, Daniel Palomino, Cláudio Diniz, Sergio Bampi and Altamiro Susin

#### Introduction

- Increasing resolutions lead to more computational effort to compress
- Using parallelism is a good option because multi-cores are everywhere
- However, data dependencies limit the speedup and breaking contexts result in compression efficiency losses

# Challenge

- The number of cores available to the encoder software may vary over time
- This situation is not addressed by default and causes problems:
  1. Idle resources, or
  2. Excessive tiling, causing coding efficiency loss without good speedup



#### Method

- We propose to adjust tiling according to the number of available cores at the start of each frame – dynamic tiling (DT)
- We tested the method using three synthetic availability situations
- Speedup is maintained when cores are available; less coding efficiency is lost when they are not
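A per-frame decision of this kind could look like the sketch below. The poster does not state how the tile grid is derived from the core count, so this is purely an assumption: the hypothetical `choose_tiling` uses one tile per available core and picks the most square grid that yields exactly that many tiles.

```python
def choose_tiling(available_cores):
    """Pick a (rows, columns) tile grid with one tile per available core."""
    cores = max(1, available_cores)
    rows = int(cores ** 0.5)
    while cores % rows:  # find the largest divisor not above sqrt(cores)
        rows -= 1
    return rows, cores // rows


# Hypothetical core availability sampled at the start of each frame.
availability = [8, 8, 4, 1, 6]
tilings = [choose_tiling(n) for n in availability]
# tilings == [(2, 4), (2, 4), (2, 2), (1, 1), (2, 3)]
```

When only one core is free, the frame is left untiled, which avoids the coding efficiency loss of unnecessary tile boundaries.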




#### Conclusions

- Our method achieves its goal:
  1. reduce compression efficiency loss when few cores are available
  2. sustain speedup from parallelization

#### Universidade Federal do Rio Grande do Sul Instituto de Informática

Av. Bento Gonçalves, 9500 CEP 91509-900 Porto Alegre/RS, Brazil Contact: gmmalossi@inf.ufrgs.br



#### Ana Mativi, Eduarda Monteiro and Sergio Bampi

#### Introduction

#### **HEVC Encoder:**

- Requires 40%-70% higher computation effort and >2x more memory accesses when compared to H.264 [1]
- Accesses to main memory have a great impact on energy consumption
- Strongly relies on the cache hierarchy to enhance overall performance

#### Methodology

- Python script runs the tools, parses and refines results
- Callgrind tool [2] provides a summary of HEVC's memory behavior (on HM 16.2 [3])
- Cacti tool provides the cost of read/write in a given cache configuration



- Latency estimation is modeled to reduce the set of candidate cache memory configurations

 $Latency = (L1_{hits} \times L1_{lat}) + (LL_{hits} \times LL_{lat}) + (LL_{misses} \times RAM_{lat})$ 
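The latency model above is a weighted sum, so ranking cache configurations reduces to a few multiplications per candidate. The Python sketch below applies it to two made-up configurations; the hit and miss counts and the per-level latencies are hypothetical placeholders for what the profiling and cache modeling tools would provide.

```python
def estimated_latency(l1_hits, ll_hits, ll_misses, l1_lat, ll_lat, ram_lat):
    """Total latency: hits and misses weighted by the latency of each level."""
    return l1_hits * l1_lat + ll_hits * ll_lat + ll_misses * ram_lat


# Hypothetical counts and latencies (in cycles) for two cache configurations.
configs = {
    "small_L1": dict(l1_hits=900, ll_hits=90, ll_misses=10,
                     l1_lat=2, ll_lat=20, ram_lat=200),
    "large_L1": dict(l1_hits=950, ll_hits=45, ll_misses=5,
                     l1_lat=3, ll_lat=20, ram_lat=200),
}
latencies = {name: estimated_latency(**c) for name, c in configs.items()}
best = min(latencies, key=latencies.get)  # "large_L1": 4750 vs. 5600 cycles
```

In this example the larger L1 wins despite its slower access, because the reduction in last-level hits and RAM accesses outweighs the extra cycle per L1 hit.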

#### **Conclusions and future work**

- The best cache configuration shows positive results (reduced latency) for this video application
- L1 hits are up to 95%
- LL global misses are less than 0.0012%
- All HEVC Encoder modules have more than 70% reads
- The proposed methodology provides new ways to analyse the encoder's features and could be used for any other application
- Next step will be changing the coding parameters to analyse the impact on the memory hierarchy



#### Results

- Generated results for the HEVC encoder on 54 different cache configurations
- Used the best cache (L1 8K-4, LL 8MB-2) to generate detailed HEVC results (8 frames, class D video, QP 32)





#### References

[1] Muhammad Shafique, Jörg Henkel. *Low Power Design of the Next-Generation High Efficiency Video Coding.* ASP-DAC, pages 274-281, 2014.

[2] Nicholas Nethercote and Julian Seward. *Valgrind: a framework for heavyweight dynamic binary instrumentation.* PLDI, pages 89–100, 2007.

[3] *HM16.2*, High Efficiency Video Coding Test Model (HM) Encoder, Strasbourg, 2014.

Instituto de Informática Universidade Federal do Rio Grande do Sul Caixa Postal 15064 | 91501-970 Porto Alegre - RS - Brasil Contact: ana.mativi@inf.ufrgs.br | inf.ufrgs.br/~acmsouza




# A Reconfigurable Operational Amplifier in 180nm CMOS Technology

Mateus C. S. Oliveira, Paulo César C. de Aguirre, Lucas C. Severo, Alessandro Girardi

### Introduction

- The fast evolution of communication standards requires the use of flexible devices.
- To fulfill this demand, it is convenient to use reconfigurable blocks in these devices, considering power efficiency and low cost.
- A reconfiguration strategy for operational amplifiers is therefore presented here, using a folded-cascode topology in 180nm CMOS technology.

# Methodology

- The reconfiguration strategy is based on the relation between transconductance and gain-bandwidth product (GBW), given by:

$$GBW = \frac{N g_{m0}}{2\pi C_L}$$

where:

- $N$ is the number of amplifiers in parallel;
- $g_{m0}$ is the transconductance of the transistors in the differential pair;
- $C_L$ is the capacitive load.

Amplifiers in parallel can be activated through switches controlled by digital inputs.
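The relation can be checked with a few lines of Python. The sketch assumes the control-bit encoding listed in the results table (00 to 11 switching on one to four amplifiers); the values chosen for $g_{m0}$ and $C_L$ are hypothetical and are not the design values.

```python
import math


def gbw_hz(n_amps, gm0, c_load):
    """GBW = N * gm0 / (2 * pi * C_L), in Hz."""
    return n_amps * gm0 / (2.0 * math.pi * c_load)


# Control bits select how many parallel amplifiers are switched on.
AMPS_ON = {"00": 1, "01": 2, "10": 3, "11": 4}

GM0 = 300e-6  # hypothetical transconductance, in siemens
C_LOAD = 1e-12  # hypothetical load capacitance, in farads
gbw = {bits: gbw_hz(n, GM0, C_LOAD) for bits, n in AMPS_ON.items()}
```

The ideal expression scales GBW linearly with $N$; the reported schematic and layout values grow somewhat less than linearly, which this first-order relation does not capture.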

The strategy:



The N-Folded Cascode Amplifier:



## Results

Results for VDD = 1.8 V:

| Control bits | Amps on | Schematic Av (dB) | Schematic GBW (MHz) | Schematic PM (°) | Schematic PWR (mW) | Layout Av (dB) | Layout GBW (MHz) | Layout PM (°) | Layout PWR (mW) |
|--------------|---------|-------------------|---------------------|------------------|--------------------|----------------|------------------|---------------|-----------------|
| 00           | 1       | 57.62             | 45.57               | 82.31            | 1.0254             | 45.93          | 40.47            | 84.34         | 1.0189          |
| 01           | 2       | 57.62             | 87.43               | 71.81            | 1.9314             | 33.99          | 73.19            | 74.65         | 1.8986          |
| 10           | 3       | 57.62             | 127.24              | 62.26            | 2.8368             | 37.03          | 104.44           | 65.82         | 2.7132          |
| 11           | 4       | 57.62             | 161.26              | 54.43            | 3.7434             | 34.13          | 127.24           | 60.76         | 3.5491          |

Frequency response:





The layout:



Dimensions: 170.0µm x 152.2µm

#### Conclusions

- The designed reconfigurable amplifier can be used in Software Defined Radios (SDR) and multistandard communication devices, to achieve high flexibility and power efficiency.
- Layout optimization is needed to reduce parasitic effects.
- In future work, the frequency bands should be chosen according to specific communication standards.

#### Universidade Federal do Pampa

**Grupo de Arquitetura de Computadores e Microeletrônica** Av. Tiarajú, 810 – 97546-550 Alegrete/RS, Brazil Contact: mateusoliveira@alunos.unipampa.edu.br



# An Educational Tool for VLSI Global Placement


Gabriel Soares Porto, Cristina Meinhardt, Paulo Francisco Butzen



Universidade Federal do Rio Grande – FURG Grupo de Sistemas Digitais e Embarcados – GSDE www.gsde.furg.br Contact: gabrielporto@furg.br



**Introduction** The short time-to-market for integrated circuits makes the use of EDA (Electronic Design Automation) tools fundamental. Developing EDA tools for educational purposes is essential to keep pace with this technological trend in the semiconductor industry. The goal of this project is to provide a study aid and an entry point into microelectronics and EDA tool development.

#### Method

This tool is developed in JAVA.

It is integrated with **Uplace** [4], a software developed by UFRGS, to visualize the circuit; **PlaceUtils** [5] to make the legalization step.

EduPlace implements two algorithms for Global Placement:

**Analytical Quadratic Placement Model** [1]: a new answer file is generated at every step, which enables comparisons and shows the impact of the parameters;

**Simulated Annealing** [2]: it can be run in step-by-step mode, showing the swaps made and the actions taken by the algorithm;

**Extra:** an **ISCAS 85** [3] to **BookShelf** parser. ISCAS 85 was chosen because its circuits are smaller than the BookShelf ones and are therefore easier to visualize.
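The simulated annealing mode described above can be illustrated with a minimal sketch. This is not EduPlace's code (the tool itself is written in Java): the function names `hpwl` and `anneal`, the parameter names, and the default values are assumptions chosen for illustration. The sketch swaps the positions of two randomly chosen cells and accepts or rejects each swap based on the change in half-perimeter wirelength (HPWL).

```python
import math
import random

def hpwl(nets, pos):
    """Half-perimeter wirelength: sum of bounding-box half-perimeters of all nets."""
    total = 0.0
    for net in nets:
        xs = [pos[cell][0] for cell in net]
        ys = [pos[cell][1] for cell in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def anneal(nets, pos, temperature=1000.0, cooling=0.5, iters_per_temp=20,
           t_min=1e-3, seed=0):
    """Swap-based simulated annealing. Returns (best placement, best HPWL).

    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    pos = dict(pos)                      # work on a copy of the placement
    cells = sorted(pos)
    cost = hpwl(nets, pos)
    best, best_cost = dict(pos), cost
    while temperature > t_min:
        for _ in range(iters_per_temp):
            a, b = rng.sample(cells, 2)
            pos[a], pos[b] = pos[b], pos[a]          # candidate move: swap two cells
            new_cost = hpwl(nets, pos)
            delta = new_cost - cost
            # Always accept improvements; accept worse moves with
            # probability exp(-delta / T).
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = dict(pos), cost
            else:
                pos[a], pos[b] = pos[b], pos[a]      # undo the rejected swap
        temperature *= cooling                        # geometric cooling schedule
    return best, best_cost
```

A step-by-step mode such as the one the tool offers could be obtained by turning the inner loop into a generator that yields each accepted or rejected swap; that variation is left out to keep the sketch short.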

#### Conclusion

[Figure: screenshots of the EduPlace interface, with tabs for the ISCAS to BookShelf parser, Quadratic Placement (HPWL and QP runs, spreading forces) and Simulated Annealing (temperature, cooling rate, number of iterations), next to UPlace displaying circuit c3540 (width 317.00, height 320.00, density 0.99; WL 4.08320e+05).]

The tool meets some of the needs of beginner users, such as previewing the steps of each algorithm and how they work, and so serves as a study aid.

It is an ongoing project; more features will be added in the final version.

### **References:**

[1] Brenner, U., Vygen, J. Analytical Methods in VLSI Placement. In: Handbook of Algorithms for VLSI Physical Design Automation, 2009.

[2] Rutenbar, R. Simulated Annealing Algorithms: An Overview. IEEE Circuits and Devices Magazine, 1989.

[3] ISCAS85 Combinational Benchmark Circuits. https://filebox.ece.vt.edu/~mhsiao/iscas85.html.

[4] Flach, G. A. et al. UPlace: A Graphics User Interface-Enabled Placement Tool. DAC, 2014.

[5] Executable Placement Utilities. http://vlsicad.eecs.umich.edu/BK/PlaceUtils/.



Author: Pedro Ochsendorf Portugal

Advisor: Altamiro Amadeu Susin

#### Introduction

The TVD-SoC architecture is a platform that requires interface software so that users can interact with it. Such a program should identify the user's actions and translate them for the system so that it can generate the appropriate response. Finally, it reports back to the user that the changes were successful.
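The three steps in this paragraph (identify the action, translate it for the system, report back) can be sketched as a single key handler. The project itself is written in C; the Python below is only an illustration, and the menu entries, key names, and the `system` dictionary standing in for the platform state are all hypothetical.

```python
MENU = ["Brightness", "Volume", "Channel"]  # hypothetical menu entries

def handle_key(selected, key, system):
    """Process one key press and return (new_selected, feedback_message).

    selected: index of the highlighted menu entry
    key:      "UP", "DOWN" or "OK" (hypothetical remote-control key names)
    system:   dict standing in for the platform state
    """
    if key == "UP":
        selected = (selected - 1) % len(MENU)
        return selected, f"Highlighted {MENU[selected]}"
    if key == "DOWN":
        selected = (selected + 1) % len(MENU)
        return selected, f"Highlighted {MENU[selected]}"
    if key == "OK":
        name = MENU[selected]
        system[name] = system.get(name, 0) + 1       # translate the action for the system
        return selected, f"{name} set to {system[name]}"  # report success to the user
    return selected, f"Ignored key {key}"
```

Keeping the handler free of drawing code means the same logic could be driven by a remote control or by a virtual keyboard, both of which appear in the requirements of this project.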

# **TVD-SoC** Architecture



### **Functional Requirements**

- Software developed in C
- Compatibility with custom peripherals
- Internal systick based on interrupts
- Graphical user interface containing:
  - Menu system
  - Remote control integration
  - Illustrative icons
  - Virtual keyboard

# **Class Diagram**

The following diagram illustrates what the project is aiming toward. The current version has some of its functions implemented, but they are not yet appropriately organized and standardized. The graphical functions have progressed the most.





# **Current Experimental Results:**



# **Conclusion and Future Work**

1. Conclusions:
   - The significant number of features the menu system contains requires a high level of organization.
   - The standardization of functions greatly simplifies the overall project.
2. Future work:
   - Integration with the peripherals
   - Integration with embedded Linux

Universidade Federal do Rio Grande do Sul Programa de Pós-Graduação em Engenharia Elétrica Av. Osvaldo Aranha, 103 – 90035-190 Porto Alegre/RS, Brazil Contact: pedro.portugal@ufrgs.br | http://lapsi.eletro.ufrgs.br/



#### ULLOA, Giane; MEINHARDT, Cristina;

ABSTRACT: The aim of this paper is to study the electrical characteristics of bulk CMOS and FinFET devices and to compare the results.

#### MOTIVATION

Bulk CMOS technology is the most widely used in the manufacture of transistors. However, with the miniaturization of these devices, bulk CMOS technology is no longer able to keep up with Moore's Law [1].

FinFET technology is seen as the main alternative to replace bulk CMOS technology, since it has the same manufacturing process as conventional CMOS transistors [2].

#### METHODOLOGY

I-V characteristic curves were simulated for PMOS and NMOS devices in bulk CMOS technology and for PFET and NFET devices in FinFET technology.

This work also evaluates the impact of the W and L parameters on the current and the threshold voltage.

The simulations use the NGSPICE and HSPICE tools with sub-20nm technologies [5].

#### MULTIGATE DEVICE

- More than one gate terminal per device
- Low power consumption
- Better control of short channel effects
- Lower leakage current
- Better control of dynamic current
- Higher yield [1]

#### FINFET



- Non-planar transistors;
- Fig. 1 shows the geometric structure of a FinFET multigate device [3];
- Conducting channel wrapped by a thin layer of silicon (fin);
- To increase the value of W in a FinFET device, simply increase the number of fins [4].

Grupo de Sistemas Digitais e **Embarcados – GSDE** www.gsde.c3.furg.br



#### RESULTS

Table 1: W impact in NMOS devices at 16nm

| W (nm)                | 32   | 64   | 128  | 196    | 256    | 512    | 1024 |
|-----------------------|------|------|------|--------|--------|--------|------|
| I<sub>off</sub> (pA)  | 2.1  | 3.8  | 7.2  | 10.8   | 13.9   | 27.5   | 54.7 |
| I<sub>on</sub> (μA)   | 14.6 | 33.9 | 72.1 | 113.65 | 149.21 | 302.44 | 607  |

Table 2: Number of fins impact in PFET devices at 16nm

| Fin                   | 1    | 2    | 3     | 4     | 5     | 10    | 50     |
|-----------------------|------|------|-------|-------|-------|-------|--------|
| I<sub>off</sub> (pA)  | 5.8  | 11.8 | 17.7  | 23.7  | 29.6  | 59.2  | 296.2  |
| I<sub>on</sub> (μA)   | 45.7 | 91.4 | 137.1 | 182.8 | 228.5 | 457.1 | 2285.3 |

Table 3: Number of fins impact in NFET devices at 16nm

| Fin                   | 1    | 2     | 3     | 4     | 5     | 10    | 50     |
|-----------------------|------|-------|-------|-------|-------|-------|--------|
| I<sub>off</sub> (pA)  | 5.8  | 11.6  | 17.4  | 23.2  | 29.1  | 58.1  | 290.6  |
| I<sub>on</sub> (μA)   | 51.2 | 102.5 | 153.8 | 205.1 | 256.3 | 512.7 | 2563.5 |

Table 1 confirms that the larger W is, the larger the current. For FinFET devices, as shown in Tables 2 and 3, the larger the number of fins, the higher the current.
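The trend in the fin tables can be checked with a few lines of arithmetic. The sketch below uses the NFET values reported in Table 3 and divides each current by the number of fins; the helper names `per_fin` and `max_relative_spread` are illustrative only. If the current scales linearly with the number of fins, the per-fin values should be nearly constant.

```python
# NFET values at 16nm, taken from Table 3.
fins = [1, 2, 3, 4, 5, 10, 50]
ion_nfet_uA = [51.2, 102.5, 153.8, 205.1, 256.3, 512.7, 2563.5]
ioff_nfet_pA = [5.8, 11.6, 17.4, 23.2, 29.1, 58.1, 290.6]

def per_fin(values, counts):
    """Divide each measured current by the corresponding number of fins."""
    return [value / count for value, count in zip(values, counts)]

def max_relative_spread(values):
    """Largest deviation between values, relative to the smallest one."""
    return (max(values) - min(values)) / min(values)

ion_per_fin = per_fin(ion_nfet_uA, fins)
ioff_per_fin = per_fin(ioff_nfet_pA, fins)

print("I_on per fin (uA):", [round(v, 2) for v in ion_per_fin])
print("I_off per fin (pA):", [round(v, 2) for v in ioff_per_fin])
print("I_on spread:", round(max_relative_spread(ion_per_fin), 4))
print("I_off spread:", round(max_relative_spread(ioff_per_fin), 4))
```

For these data both per-fin currents vary by well under 1% between 1 and 50 fins, which is consistent with both the on and off currents growing in proportion to the number of fins.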

#### REFERENCES

[1] KING, T. J. FinFETs for nanoscale CMOS digital integrated circuits. Int. Conf. on Computer-Aided Design, p. 207-210.

[2] ITRS. The International Technology Roadmap for Semiconductors. 2015. Available at: http://www.itrs.net

[3] ALIOTO, M. Comparative Evaluation of Layout Density in 3T, 4T and MT FinFET Standard Cells. IEEE Trans. on Very Large Scale Integration (VLSI) Systems, v. 19, n. 5, May 2011.

[4] HUANG, X., et al. Sub 50-nm FinFET: PMOS. International Electron Devices Meeting Technical Digest, p. 67, Dec. 1999.

[5] PTM. Predictive Technology Model. 2015. Available at: http://ptm.asu.edu/




Universidade Federal do Rio Grande - FURG Centro de Ciências Computacionais – C3 Grupo de Sistemas Digitais e Embarcados - GSDE Av Itália Km8 – Bairro Carreiros – Rio Grande/RS, Brazil Contact: {gianeulloa, cristinameinhardt}@furg.br