Predicting User Purchases From Clickstream Data: A Comparative Analysis of Clickstream Data Representations and Machine Learning Models

dc.contributor.author Tokuc, A. Aylin
dc.contributor.author Dag, Tamer
dc.date.accessioned 2025-04-15T23:42:53Z
dc.date.available 2025-04-15T23:42:53Z
dc.date.issued 2025
dc.description.abstract Predicting purchase events from e-commerce clickstream data is a critical challenge with significant implications for optimizing marketing strategies and enhancing customer experience. This study addresses the challenge by systematically evaluating and comparing multiple data representations: aggregated session attributes, recent user actions, and hybrid combinations of the two. The comparison bridges gaps in the existing literature and demonstrates the superiority of hybrid approaches. Unlike prior research, which typically focuses on a single representation, our approach combines aggregated session-level summaries with granular, sequential user actions to capture both long-term and short-term behavioral patterns. Through comprehensive experimentation, we compared multiple machine learning models, including LightGBM, decision trees, gradient boosting, SVC, and logistic regression, on real-world e-commerce clickstream data. The hybrid representation with LightGBM achieved the best predictive performance, significantly outperforming the alternative methods. Feature importance analysis revealed key factors influencing purchase likelihood, such as time since the last event, session duration, and product interactions. By demonstrating the practical utility of hybrid data representations and efficient tree-based models, this study provides actionable insights for real-time marketing interventions. Our findings offer a scalable and interpretable framework for e-commerce platforms to enhance purchase predictions and optimize marketing strategies. en_US
dc.identifier.doi 10.1109/ACCESS.2025.3548267
dc.identifier.issn 2169-3536
dc.identifier.scopus 2-s2.0-105001064548
dc.identifier.uri https://doi.org/10.1109/ACCESS.2025.3548267
dc.language.iso en en_US
dc.publisher IEEE-Inst Electrical Electronics Engineers Inc en_US
dc.relation.ispartof IEEE Access
dc.rights info:eu-repo/semantics/openAccess en_US
dc.subject Predictive Models en_US
dc.subject Data Models en_US
dc.subject Hidden Markov Models en_US
dc.subject Electronic Commerce en_US
dc.subject Computational Modeling en_US
dc.subject Data Visualization en_US
dc.subject Analytical Models en_US
dc.subject Real-Time Systems en_US
dc.subject Random Forests en_US
dc.subject Machine Learning Algorithms en_US
dc.subject Clickstream Data en_US
dc.subject Customer Behavior Modeling en_US
dc.subject Data Representations en_US
dc.subject Feature Importance en_US
dc.subject Gradient Boosting en_US
dc.subject E-Commerce en_US
dc.subject LightGBM en_US
dc.subject Machine Learning en_US
dc.subject Model Selection en_US
dc.subject Purchase Prediction en_US
dc.title Predicting User Purchases From Clickstream Data: A Comparative Analysis of Clickstream Data Representations and Machine Learning Models en_US
dc.type Article en_US
dspace.entity.type Publication
gdc.author.wosid Tokuç, A. Aylin/Ixn-5337-2023
gdc.author.wosid Dag, Tamer/K-7830-2014
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C5
gdc.coar.access open access
gdc.coar.type text::journal::journal article
gdc.collaboration.industrial false
gdc.description.department Kadir Has University en_US
gdc.description.departmenttemp [Tokuc, A. Aylin] Kadir Has Univ, Dept Comp Engn, TR-34083 Fatih, Istanbul, Turkiye; [Tokuc, A. Aylin] Valinor AI, London SE9 4HA, England; [Dag, Tamer] Amer Univ Middle East, Coll Engn & Technol, Egaila 54200, Kuwait en_US
gdc.description.endpage 43817 en_US
gdc.description.publicationcategory Article - International Refereed Journal - Institutional Faculty Member en_US
gdc.description.scopusquality Q1
gdc.description.startpage 43796 en_US
gdc.description.volume 13 en_US
gdc.description.woscitationindex Science Citation Index Expanded
gdc.description.wosquality Q2
gdc.identifier.openalex W4408426526
gdc.identifier.wos WOS:001445086900045
gdc.index.type WoS
gdc.index.type Scopus
gdc.oaire.accesstype GOLD
gdc.oaire.diamondjournal false
gdc.oaire.impulse 1.0
gdc.oaire.influence 2.5933282E-9
gdc.oaire.isgreen false
gdc.oaire.keywords feature importance
gdc.oaire.keywords Clickstream data
gdc.oaire.keywords e-commerce
gdc.oaire.keywords customer behavior modeling
gdc.oaire.keywords data representations
gdc.oaire.keywords Electrical engineering. Electronics. Nuclear engineering
gdc.oaire.keywords gradient boosting
gdc.oaire.keywords TK1-9971
gdc.oaire.popularity 3.528061E-9
gdc.oaire.publicfunded false
gdc.openalex.collaboration International
gdc.openalex.fwci 9.88758557
gdc.openalex.normalizedpercentile 0.92
gdc.openalex.toppercent TOP 10%
gdc.opencitations.count 1
gdc.plumx.mendeley 22
gdc.plumx.newscount 1
gdc.plumx.scopuscites 3
gdc.scopus.citedcount 3
gdc.virtual.author Dağ, Tamer
gdc.wos.citedcount 0
relation.isAuthorOfPublication 6e6ae480-b76e-48a0-a543-13ef44f9d802
relation.isAuthorOfPublication.latestForDiscovery 6e6ae480-b76e-48a0-a543-13ef44f9d802
relation.isOrgUnitOfPublication fd8e65fe-c3b3-4435-9682-6cccb638779c
relation.isOrgUnitOfPublication 2457b9b3-3a3f-4c17-8674-7f874f030d96
relation.isOrgUnitOfPublication b20623fc-1264-4244-9847-a4729ca7508c
relation.isOrgUnitOfPublication.latestForDiscovery fd8e65fe-c3b3-4435-9682-6cccb638779c
