While the increasing complexity of multithreaded applications demands more computing power, energy consumption has become a primary concern: most embedded devices are mobile and heavily dependent on batteries (e.g., smartphones and tablets), while general-purpose processors (GPPs) are constrained by thermal design power (TDP) limits. Therefore, the main objective when designing a new processor is to improve performance with minimal impact on energy consumption. Performance improvements can be achieved by exploiting Thread-Level Parallelism (TLP), in which multiple processors simultaneously execute parts of the same program, exchanging data at runtime through shared memory regions. However, these memory regions are farther from the processor (e.g., the L3 cache and main memory) and have higher latency and power consumption than memories closer to it (e.g., registers and the L1 and L2 caches) [*]. Given this scenario, one can infer that the more communication a parallel application performs, the more energy it will consume. On the other hand, parallelization increases performance, which may in turn reduce energy consumption. However, this performance increase is not linear and sometimes does not scale with the number of threads, due to synchronization overhead and the limited bandwidth of the communication bus [*].
In this research area, we are assessing the influence of the most widely used parallel programming interfaces on the energy efficiency of parallel applications with different behaviors, on both embedded processors and GPPs [*]. We are also investigating the best combination of processors, communication models, and levels of TLP exploitation to achieve the best results in performance, energy, and energy-delay product (EDP) for a given parallel application [*]. Finally, we are working on an automatic and transparent approach to improving the energy efficiency of OpenMP applications [*].