Ahmadpour, Seyed-Sajad; Zohaib, Muhammad; Rasmi, Hadi; Jafari Navimipour, Nima
2026-02-15
2026-02-15
2026
0920-8542
1573-0484
https://doi.org/10.1007/s11227-025-08220-8
https://hdl.handle.net/20.500.12469/7745

The rapid growth of high-performance computing (HPC) and supercomputing applications necessitates hardware architectures that provide both high computational performance and strong energy efficiency under real-time and massively parallel workloads. However, conventional complementary metal-oxide semiconductor (CMOS) technologies face fundamental challenges, including excessive power consumption, leakage currents, and severe scaling limitations, which restrict their suitability for future exascale systems. To overcome these limitations, emerging nanotechnologies such as Quantum-dot Cellular Automata (QCA) have gained significant attention due to their ultra-low power consumption and high device density. In this work, we present a high-performance and low-power Quantum-Dot Multiply-Accumulate (Q-Dot MAC) unit, where MAC denotes a fundamental arithmetic operation combining multiplication and accumulation, extensively used in scientific computing and signal processing. The proposed QCA-based architecture is specifically designed to satisfy the high-frequency (HF) operational demands of modern HPC environments, enabling sustained high-throughput computation. The main objective of this design is to realize a compact, energy-efficient, and physically stable MAC unit suitable for large-scale deployment in energy-constrained supercomputing platforms. Exploiting the inherent parallelism and high-density layout characteristics of QCA, the proposed MAC architecture efficiently executes key computational kernels required in HPC workloads, including large-scale matrix multiplication, convolution operations, and scientific simulations.
The proposed QCA-based circuits demonstrate significant performance and area efficiency improvements compared with the best existing designs in the literature. Specifically, the half adder (HA) achieves a 20.51% reduction in cell count and a 25% reduction in area, and the complete MAC unit provides a 22.84% decrease in cell count, a 9.03% reduction in occupied area, and a 14.25% reduction in delay. These results confirm the efficiency and scalability of the proposed design. The small area footprint enables the integration of large arrays of MAC units, facilitating the scalable systolic and Single Instruction Multiple Data (SIMD) architectures required in supercomputing environments.

en
info:eu-repo/semantics/closedAccess
High-Performance Computing (HPC); Supercomputing; Parallel Processing; Multiplier; MAC Unit; Quantum-Dot Technology
High-Performance and Low-Power Quantum-Dot Multiply-Accumulate Design for Next-Generation Supercomputing Platforms
Article
10.1007/s11227-025-08220-8
2-s2.0-105028888974
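
The abstract names two building blocks, the half adder and the multiply-accumulate operation, without showing their logic. The following is a minimal behavioural sketch in Python, assuming the textbook QCA primitives (a three-input majority gate and an inverter). It is not the authors' cell layout and models no clocking, area, or delay; all function names are hypothetical and chosen for illustration only.

```python
def majority(a, b, c):
    """Three-input majority vote, the basic logic primitive in QCA."""
    return (a & b) | (a & c) | (b & c)


def and_gate(a, b):
    # Fixing one majority input to 0 yields AND.
    return majority(a, b, 0)


def or_gate(a, b):
    # Fixing one majority input to 1 yields OR.
    return majority(a, b, 1)


def inverter(a):
    return a ^ 1


def half_adder(a, b):
    """Return (sum, carry) for two single bits.

    carry = a AND b
    sum   = (a OR b) AND NOT carry, which equals a XOR b
    """
    carry = and_gate(a, b)
    total = and_gate(or_gate(a, b), inverter(carry))
    return total, carry


def mac(acc, a, b):
    """One multiply-accumulate step: acc + a * b."""
    return acc + a * b


def dot_product(xs, ys):
    """Chain MAC steps, as in a matrix-multiplication or convolution kernel."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc
```

For example, `half_adder(1, 1)` returns `(0, 1)`, and `dot_product([1, 2, 3], [4, 5, 6])` returns `32`. Repeating the `mac` step across many operand pairs is the pattern that arrays of MAC units parallelize in systolic and SIMD designs.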