Predicting User Purchases From Clickstream Data: A Comparative Analysis of Clickstream Data Representations and Machine Learning Models

dc.authorscopusid: 57208397666
dc.authorscopusid: 6506785648
dc.contributor.author: Aylin Tokuc, A.
dc.contributor.author: Dag, T.
dc.date.accessioned: 2025-04-15T23:42:53Z
dc.date.available: 2025-04-15T23:42:53Z
dc.date.issued: 2025
dc.department: Kadir Has University [en_US]
dc.department-temp: [Aylin Tokuc A.] Kadir Has University, Fatih, Department of Computer Engineering, Istanbul, 34083, Turkey, Valinor AI, London, SE9 4HA, United Kingdom; [Dag T.] American University of the Middle East, College of Engineering and Technology, Egaila, 54200, Kuwait [en_US]
dc.description.abstract: Predicting purchase events from e-commerce clickstream data is a critical challenge with significant implications for optimizing marketing strategies and enhancing customer experience. This study addresses this challenge by systematically evaluating and comparing multiple data representations - aggregated session attributes, recent user actions, and hybrid combinations - which bridges gaps in the existing literature and demonstrates the superiority of hybrid approaches. Unlike prior research, which typically focuses on single representations, our approach combines aggregated session-level summaries with granular, sequential user actions to capture both long-term and short-term behavioral patterns. Through comprehensive experimentation, we compared multiple machine learning models, including LightGBM, decision trees, gradient boosting, SVC, and logistic regression, using real-world e-commerce clickstream data. Notably, the hybrid representation with LightGBM achieved superior predictive performance, significantly outperforming alternative methods. Feature importance analysis revealed key factors influencing purchase likelihood, such as time since the last event, session duration, and product interactions. This study provides actionable insights into real-time marketing interventions by demonstrating the practical utility of hybrid data representations and efficient tree-based models. Our findings offer a scalable and interpretable framework for e-commerce platforms to enhance purchase predictions and optimize marketing strategies. © 2013 IEEE. [en_US]
dc.identifier.doi: 10.1109/ACCESS.2025.3548267
dc.identifier.endpage: 43817 [en_US]
dc.identifier.issn: 2169-3536
dc.identifier.scopus: 2-s2.0-105001064548
dc.identifier.scopusquality: Q1
dc.identifier.startpage: 43796 [en_US]
dc.identifier.uri: https://doi.org/10.1109/ACCESS.2025.3548267
dc.identifier.uri: https://hdl.handle.net/20.500.12469/7283
dc.identifier.volume: 13 [en_US]
dc.identifier.wosquality: Q2
dc.language.iso: en [en_US]
dc.publisher: Institute of Electrical and Electronics Engineers Inc. [en_US]
dc.relation.ispartof: IEEE Access [en_US]
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Clickstream Data [en_US]
dc.subject: Customer Behavior Modeling [en_US]
dc.subject: Data Representations [en_US]
dc.subject: E-Commerce [en_US]
dc.subject: Feature Importance [en_US]
dc.subject: Gradient Boosting [en_US]
dc.subject: LightGBM [en_US]
dc.subject: Machine Learning [en_US]
dc.subject: Model Selection [en_US]
dc.subject: Purchase Prediction [en_US]
dc.title: Predicting User Purchases From Clickstream Data: A Comparative Analysis of Clickstream Data Representations and Machine Learning Models [en_US]
dc.type: Article [en_US]
dspace.entity.type: Publication
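
The abstract describes a hybrid representation that joins aggregated session attributes with the most recent user actions. The following is a minimal sketch of that idea, not the authors' implementation: this record does not give the paper's actual feature set, so the `Event` type, the action labels ("view", "cart"), the feature names, and the window size `k` are all assumptions chosen for illustration, loosely following the factors the abstract names (time since the last event, session duration, product interactions).

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical event record; field names and units are assumptions.
@dataclass
class Event:
    timestamp: float   # seconds, any consistent clock
    action: str        # e.g. "view" or "cart" (illustrative labels)
    product_id: str

PAD = "none"  # placeholder used when a session has fewer than k events


def aggregated_features(events, now):
    """Session-level summaries (the 'aggregated session attributes')."""
    counts = Counter(e.action for e in events)
    return {
        "session_duration": events[-1].timestamp - events[0].timestamp,
        "time_since_last_event": now - events[-1].timestamp,
        "n_events": len(events),
        "n_views": counts["view"],
        "n_carts": counts["cart"],
        "n_distinct_products": len({e.product_id for e in events}),
    }


def recent_action_features(events, k=3):
    """The last k actions, most recent first, left-padded with PAD."""
    recent = [e.action for e in events[-k:]]
    padded = [PAD] * (k - len(recent)) + recent
    return {
        f"action_last_{i}": action
        for i, action in enumerate(reversed(padded), start=1)
    }


def hybrid_features(events, now, k=3):
    """Concatenate both views of one session into a single feature dict."""
    if not events:
        raise ValueError("a session needs at least one event")
    ordered = sorted(events, key=lambda e: e.timestamp)
    return {**aggregated_features(ordered, now),
            **recent_action_features(ordered, k)}


session = [
    Event(0.0, "view", "p1"),
    Event(30.0, "view", "p2"),
    Event(90.0, "cart", "p2"),
]
features = hybrid_features(session, now=100.0, k=3)
```

One such dict per session would form a row of a training table. The categorical `action_last_*` columns would still need encoding before being passed to a tree-based classifier such as LightGBM.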
