Embedded systems have become pervasive with the spread of smartphones, wearables, and many other smart gadgets. Such systems execute many kinds of applications, from web browsers to video decoders, which together form a wide and heterogeneous workload. Furthermore, these systems must be energy efficient, which is hard to achieve with traditional superscalar processors. To address this, the industry has adopted Multiprocessor System-on-Chip (MPSoC) designs, which contain many accelerators to efficiently execute common embedded workloads, such as media encoding/decoding and DSP applications. The downside of MPSoCs is that each accelerator has its own instruction set architecture (ISA), which tightly binds MPSoC systems to specialized compilers and toolchains and directly affects the time-to-market of new processors. Reconfigurable organizations have been proposed as an alternative to both superscalars and MPSoCs, as they can emulate many accelerators in a single circuit. These systems adapt themselves to the application at hand, reconfiguring their datapaths to improve execution. Implementation strategies are adopted to optimize the handling of data dependencies and maximize the exploitation of Instruction Level Parallelism (ILP), providing substantial performance improvements and energy savings over classic processors[*] [*].
We have been developing a reconfigurable architecture that overcomes the inefficiency of superscalar processors when exploiting ILP. It is coupled with a binary translator that transforms the code, at run time, to execute on the reconfigurable system, maintaining code compatibility with the general-purpose processor (GPP). We have also shown that this architecture can run on single-core[*] and multicore systems (both homogeneous[*] and heterogeneous[*]). Our latest work shows how this architecture can be coupled to a superscalar processor, increasing the latter's performance and energy efficiency.