Authors: Shabestari, Fatemeh; Navimipour, Nima Jafari
Record dates: 2024-06-23; 2024-06-23
Year: 2024
ISSN: 2473-2400
DOI: https://doi.org/10.1109/TGCN.2023.3347276
Handle: https://hdl.handle.net/20.500.12469/5749
ORCID: Shabestari, Fatemeh/0000-0003-1926-4674

Abstract: Apache Spark is a popular framework for processing big data. Running Spark on Hadoop YARN allows Spark workloads to be scheduled alongside other data-processing frameworks on Hadoop. When an application is deployed in a YARN cluster, resources are allocated to it without considering energy efficiency. Furthermore, there is no way to enforce user-specified deadline constraints. To address these issues, we propose a new deadline-aware resource management system and a scheduling algorithm that minimize the total energy consumption of Spark on YARN in heterogeneous clusters. First, a deadline-aware, energy-efficient model of the problem is proposed. Then, executors are assigned to applications using a locality-aware method. This algorithm sorts the nodes by the performance per watt (PPW) metric, the number of application data blocks on each node, and rack locality. It also offers three ways to choose executors from different machines: greedy, random, and Pareto-based. Finally, the proposed heuristic task scheduler schedules tasks on executors to minimize total energy and tardiness. We evaluated the proposed algorithm in terms of energy efficiency and Service Level Agreement (SLA) satisfaction. The results show that the method outperforms popular algorithms in both energy consumption and meeting deadlines.

Language: en
Access rights: info:eu-repo/semantics/closedAccess
Keywords: Sparks; Yarn; Task analysis; Resource management; Energy efficiency; Energy consumption; Clustering algorithms; Distributed computing; energy management; resource management; scheduling
Title: An Energy-Aware Resource Management Strategy Based on Spark and YARN in Heterogeneous Environments
Type: Article
Pages: 635-644
Issue: 2
Volume: 8
Web of Science ID: WOS:001230177900019
Scopus ID: 2-s2.0-85181573774
Unlabeled record values: 0; N/A; Q1
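
The executor-assignment step described in the abstract (ranking nodes by PPW, local data blocks, and rack locality, then choosing among them greedily, randomly, or from a Pareto front) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Node` fields, the lexicographic order of the sort key, the dominance rule, and all function names are assumptions made for illustration only.

```python
from dataclasses import dataclass
import random


@dataclass(frozen=True)
class Node:
    name: str
    ppw: float          # performance per watt (higher is better)
    local_blocks: int   # application data blocks stored on this node
    same_rack: bool     # True if the node shares a rack with the data


def _key(node):
    # Assumed criteria order: PPW first, then data blocks, then rack locality.
    return (node.ppw, node.local_blocks, node.same_rack)


def rank_nodes(nodes):
    """Sort nodes best-first by the assumed lexicographic key."""
    return sorted(nodes, key=_key, reverse=True)


def dominates(a, b):
    """True if a is no worse than b on every criterion and differs on one."""
    ka, kb = _key(a), _key(b)
    return all(x >= y for x, y in zip(ka, kb)) and ka != kb


def choose_nodes(nodes, count, strategy="greedy", seed=None):
    """Pick up to `count` nodes to host executors."""
    nodes = list(nodes)
    if strategy == "greedy":
        return rank_nodes(nodes)[:count]
    if strategy == "random":
        return random.Random(seed).sample(nodes, min(count, len(nodes)))
    if strategy == "pareto":
        # Keep only nodes that no other node dominates, then rank them.
        front = [n for n in nodes if not any(dominates(m, n) for m in nodes)]
        return rank_nodes(front)[:count]
    raise ValueError(f"unknown strategy: {strategy}")
```

With three hypothetical nodes, one with high PPW but few local blocks, one with lower PPW but many local blocks, and one worse on every criterion, the Pareto strategy discards only the third node, whereas the greedy strategy simply takes the top of the ranked list.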