Enhancing Malware Classification: a Comparative Study of Feature Selection Models With Parameter Optimization

dc.contributor.author Curebal,F.
dc.contributor.author Dag,H.
dc.contributor.other Management Information Systems
dc.contributor.other 03. Faculty of Economics, Administrative and Social Sciences
dc.contributor.other 01. Kadir Has University
dc.date.accessioned 2024-06-23T21:39:24Z
dc.date.available 2024-06-23T21:39:24Z
dc.date.issued 2024
dc.description.abstract This study assesses the impact of seven feature selection algorithms (Minimum Redundancy Maximum Relevance (MRMR), Mutual Information (MI), Chi-Square (Chi), Leave One Feature Out (LOFO), Feature Relevance-based Unsupervised Feature Selection (FRUFS), A General Framework for Auto-Weighted Feature Selection via Global Redundancy Minimization (AGRM), and BoostARoota) across two malware datasets (Microsoft and API call sequences) using three machine learning models (Extreme Gradient Boosting (XGBoost), Random Forest, and Histogram-Based Gradient Boosting (Hist Gradient Boosting)). The analysis reveals that no feature selection algorithm uniformly outperforms the others, because their effectiveness varies with the dataset and model characteristics. Specifically, BoostARoota proved well suited to the Microsoft dataset, especially after parameter optimization, whereas its performance varied on the API call sequences dataset, suggesting the need for customized parameter selection. This study highlights the necessity of tailored feature selection approaches and parameter adjustments to optimize machine learning model performance across different datasets. © 2024 IEEE. en_US
dc.identifier.citationcount 0
dc.identifier.doi 10.1109/SIEDS61124.2024.10534669
dc.identifier.isbn 979-835038514-4
dc.identifier.scopus 2-s2.0-85195324534
dc.identifier.uri https://doi.org/10.1109/SIEDS61124.2024.10534669
dc.identifier.uri https://hdl.handle.net/20.500.12469/5873
dc.language.iso en en_US
dc.publisher Institute of Electrical and Electronics Engineers Inc. en_US
dc.relation.ispartof 2024 Systems and Information Engineering Design Symposium, SIEDS 2024 -- 3 May 2024 -- Charlottesville -- 199691 en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject Feature selection en_US
dc.subject Machine learning en_US
dc.subject Malware classification en_US
dc.subject Parameter optimization en_US
dc.title Enhancing Malware Classification: a Comparative Study of Feature Selection Models With Parameter Optimization en_US
dc.type Conference Object en_US
dspace.entity.type Publication
gdc.author.institutional Dağ, Hasan
gdc.author.scopusid 58530614500
gdc.author.scopusid 6507328166
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C5
gdc.coar.access metadata only access
gdc.coar.type text::conference output
gdc.description.department Kadir Has University en_US
gdc.description.departmenttemp Curebal F., Information Science, University at Albany, Albany, NY, United States; Dag H., Management Information Systems, Kadir Has University, Istanbul, Turkey en_US
gdc.description.endpage 516 en_US
gdc.description.publicationcategory Conference Item - International - Institutional Faculty Member en_US
gdc.description.startpage 511 en_US
gdc.identifier.openalex W4398174349
gdc.oaire.diamondjournal false
gdc.oaire.impulse 0.0
gdc.oaire.influence 2.5942106E-9
gdc.oaire.isgreen false
gdc.oaire.popularity 2.9478422E-9
gdc.oaire.publicfunded false
gdc.openalex.fwci 1.501
gdc.openalex.normalizedpercentile 0.67
gdc.opencitations.count 0
gdc.plumx.mendeley 3
gdc.plumx.scopuscites 3
gdc.scopus.citedcount 3
relation.isAuthorOfPublication e02bc683-b72e-4da4-a5db-ddebeb21e8e7
relation.isAuthorOfPublication.latestForDiscovery e02bc683-b72e-4da4-a5db-ddebeb21e8e7
relation.isOrgUnitOfPublication ff62e329-217b-4857-88f0-1dae00646b8c
relation.isOrgUnitOfPublication acb86067-a99a-4664-b6e9-16ad10183800
relation.isOrgUnitOfPublication b20623fc-1264-4244-9847-a4729ca7508c
relation.isOrgUnitOfPublication.latestForDiscovery ff62e329-217b-4857-88f0-1dae00646b8c
