Enhancing Malware Classification: A Comparative Study of Feature Selection Models with Parameter Optimization

dc.authorscopusid: 58530614500
dc.authorscopusid: 6507328166
dc.contributor.author: Dağ, Hasan
dc.contributor.author: Dag,H.
dc.date.accessioned: 2024-06-23T21:39:24Z
dc.date.available: 2024-06-23T21:39:24Z
dc.date.issued: 2024
dc.department: Kadir Has University [en_US]
dc.department-temp: Curebal F., Information Science University at Albany, Albany, NY, United States; Dag H., Management Information Systems, Kadir Has University, Istanbul, Turkey [en_US]
dc.description.abstract: This study assesses the impact of seven feature selection algorithms (Minimum Redundancy Maximum Relevance (MRMR), Mutual Information (MI), Chi-Square (Chi), Leave One Feature Out (LOFO), Feature Relevance-based Unsupervised Feature Selection (FRUFS), A General Framework for Auto-Weighted Feature Selection via Global Redundancy Minimization (AGRM), and BoostARoota) across two malware datasets (Microsoft and API call sequences) using three machine learning models (Extreme Gradient Boosting (XGBoost), Random Forest, and Histogram-Based Gradient Boosting (Hist Gradient Boosting)). The analysis reveals that no feature selection algorithm uniformly outperforms the others, as their effectiveness varies with the dataset and model characteristics. Specifically, BoostARoota demonstrated significant compatibility with the Microsoft dataset, especially after parameter optimization, whereas its performance varied with the API call sequences dataset, suggesting the need for customized parameter selection. This study highlights the necessity of tailored feature selection approaches and parameter adjustments to optimize machine learning model performance across different datasets. © 2024 IEEE. [en_US]
dc.identifier.citation: 0
dc.identifier.doi: 10.1109/SIEDS61124.2024.10534669
dc.identifier.endpage: 516 [en_US]
dc.identifier.isbn: 979-835038514-4
dc.identifier.scopus: 2-s2.0-85195324534
dc.identifier.scopusquality: N/A
dc.identifier.startpage: 511 [en_US]
dc.identifier.uri: https://doi.org/10.1109/SIEDS61124.2024.10534669
dc.identifier.uri: https://hdl.handle.net/20.500.12469/5873
dc.identifier.wosquality: N/A
dc.language.iso: en [en_US]
dc.publisher: Institute of Electrical and Electronics Engineers Inc. [en_US]
dc.relation.ispartof: 2024 Systems and Information Engineering Design Symposium, SIEDS 2024 -- 2024 Systems and Information Engineering Design Symposium, SIEDS 2024 -- 3 May 2024 -- Charlottesville -- 199691 [en_US]
dc.relation.publicationcategory: Conference Item - International - Institutional Faculty Member [en_US]
dc.rights: info:eu-repo/semantics/closedAccess [en_US]
dc.subject: Feature selection [en_US]
dc.subject: Machine learning [en_US]
dc.subject: Malware classification [en_US]
dc.subject: Parameter optimization [en_US]
dc.title: Enhancing Malware Classification: A Comparative Study of Feature Selection Models with Parameter Optimization [en_US]
dc.type: Conference Object [en_US]
dspace.entity.type: Publication
relation.isAuthorOfPublication: e02bc683-b72e-4da4-a5db-ddebeb21e8e7
relation.isAuthorOfPublication.latestForDiscovery: e02bc683-b72e-4da4-a5db-ddebeb21e8e7
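
Illustrative note on the abstract: Mutual Information (MI) is one of the seven feature selection algorithms the study compares. The sketch below shows, under stated assumptions, how an MI filter might rank discrete features against class labels and keep the top k. It is not the authors' implementation; the function names (`mutual_information`, `select_top_k`) and the toy data are hypothetical and chosen only for illustration, and no malware dataset or result from the paper is reproduced.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    count_x = Counter(feature)
    count_y = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (count / n) * math.log(count * n / (count_x[x] * count_y[y]))
    return mi


def select_top_k(rows, labels, k):
    """Return (indices of the k highest-MI features, all MI scores)."""
    n_features = len(rows[0])
    scores = [
        mutual_information([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


# Made-up toy data: three binary features per sample and a binary label.
# Feature 0 copies the label, feature 1 is constant, and feature 2 is
# statistically independent of the label in this sample.
rows = [
    (1, 0, 0),
    (1, 0, 1),
    (0, 0, 0),
    (0, 0, 1),
    (1, 0, 0),
    (0, 0, 1),
    (1, 0, 1),
    (0, 0, 0),
]
labels = [1, 1, 0, 0, 1, 0, 1, 0]

selected, scores = select_top_k(rows, labels, k=1)
print("selected feature indices:", selected)
print("MI scores:", scores)
```

In a comparison like the one the abstract describes, the selected columns would then be passed to each classifier, and k would be one of the parameters to tune per dataset.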
