INTEGRATING BIG DATA AND CRM FOR SMES: A HYBRID RFM AND LEXICON-BASED CLUSTERING APPROACH FOR CUSTOMER SEGMENTATION

Authors

S. A. Shukor, S. H. Zulkefly

DOI:

https://doi.org/10.35631/AIJBES.828022

Keywords:

Big Data, CRM, Customer Segmentation, K-Means Clustering, Lexicon Approach, RFM Model

Abstract

In the era of Big Data, integrating Customer Relationship Management (CRM) with advanced analytics is essential for businesses to maintain a competitive edge. This study focuses on customer segmentation, a core CRM strategy, using a public dataset from a giftware retailer. Although previous CRM segmentation studies have extensively applied the RFM model and clustering algorithms, most concentrate on transactional metrics and do not examine the product-category purchasing behaviours embedded in unstructured product descriptions. Existing studies also rely heavily on raw product identifiers, which produce high-dimensional, sparse data and limit the interpretability of customer preferences. Consequently, little research has integrated semantic product categorisation with RFM-based clustering to uncover cluster-specific purchasing behaviours in SME retail environments. To bridge this gap, this study proposes a methodology that combines the RFM (Recency, Frequency, Monetary) model with a lexicon-based approach that categorises products during data preprocessing. Grouping individual product descriptions into a small set of categories reduces the dimensionality of the product range and allows a more meaningful analysis of buying behaviour. Customers were then clustered with the unsupervised K-means algorithm on RFM features engineered from the transactional records. The optimal number of clusters was determined with the Elbow Method, and the model's validity was confirmed with the Silhouette Index and business logic. The results identified four distinct customer segments, ranked by monetary value: Platinum, Gold, Silver, and Bronze. The findings indicate that lexicon-based categorisation substantially improves the interpretability of purchasing patterns within each cluster.
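The abstract names the preprocessing steps but not their implementation, so the following is only a minimal sketch under stated assumptions. It assumes each transaction is a tuple `(customer, invoice, date, description, quantity, unit_price)`, measures Recency in days from a chosen snapshot date, Frequency as the number of distinct invoices, and Monetary as total spend; these are common RFM conventions, not details confirmed by the source. `LEXICON`, `categorise`, `rfm_table` and `category_spend` are hypothetical names, and the keyword-to-category mapping is illustrative rather than the study's own lexicon.

```python
import re
from collections import defaultdict
from datetime import date

# Hypothetical keyword lexicon (keyword -> product category). Illustrative only;
# it is NOT the category scheme used in the study.
LEXICON = {
    "candle": "Home Decor",
    "lantern": "Home Decor",
    "mug": "Kitchenware",
    "teacup": "Kitchenware",
    "bag": "Bags",
    "card": "Stationery",
    "notebook": "Stationery",
}


def categorise(description, lexicon=LEXICON):
    """Return the category of the first description token found in the lexicon."""
    for token in re.findall(r"[a-z]+", description.lower()):
        # Naive plural handling: also try "bags" -> "bag".
        forms = [token, token[:-1]] if token.endswith("s") else [token]
        for form in forms:
            if form in lexicon:
                return lexicon[form]
    return "Other"


def rfm_table(transactions, snapshot):
    """Per customer: Recency (days since last purchase), Frequency (distinct
    invoices) and Monetary (total spend)."""
    last_seen, invoices, spend = {}, defaultdict(set), defaultdict(float)
    for customer, invoice, day, _desc, qty, price in transactions:
        last_seen[customer] = max(last_seen.get(customer, day), day)
        invoices[customer].add(invoice)
        spend[customer] += qty * price
    return {
        c: ((snapshot - last_seen[c]).days, len(invoices[c]), round(spend[c], 2))
        for c in last_seen
    }


def category_spend(transactions):
    """Spend per customer per lexicon category: a few columns instead of one per product ID."""
    table = defaultdict(lambda: defaultdict(float))
    for customer, _inv, _day, desc, qty, price in transactions:
        table[customer][categorise(desc)] += qty * price
    return {c: dict(cats) for c, cats in table.items()}


if __name__ == "__main__":
    sample = [
        ("C1", "INV1", date(2011, 12, 1), "SCENTED CANDLE SET", 2, 5.0),
        ("C1", "INV2", date(2011, 12, 5), "RED SPOT TEACUP", 4, 2.5),
        ("C2", "INV3", date(2011, 11, 20), "JUMBO BAGS RED", 10, 2.0),
    ]
    print(rfm_table(sample, snapshot=date(2011, 12, 10)))
    # {'C1': (5, 2, 20.0), 'C2': (20, 1, 20.0)}
    print(category_spend(sample))
    # {'C1': {'Home Decor': 10.0, 'Kitchenware': 10.0}, 'C2': {'Bags': 20.0}}
```

The point of the lexicon step shows up in `category_spend`: each customer is described by a handful of category columns rather than one sparse column per product identifier, which is what keeps per-cluster purchasing patterns readable.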
By integrating transactional analytics with semantic product categorisation, this research offers SMEs a scalable framework for customer profiling, targeted marketing, inventory optimisation, and strategic CRM decision-making.
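The clustering stage can likewise be sketched in plain Python using only the standard library. This is an illustration under assumptions, not the authors' code: RFM columns are z-score standardised before clustering, K-means uses a deterministic farthest-point (maximin) initialisation chosen here for reproducibility (scikit-learn's `KMeans`, for comparison, defaults to k-means++), the Elbow Method is represented by the within-cluster sum of squares (WCSS) for each candidate k, and clusters are named by ranking their mean raw Monetary value, following the abstract's Platinum/Gold/Silver/Bronze ordering. All function names are hypothetical.

```python
import math
from statistics import mean, pstdev

TIERS = ["Platinum", "Gold", "Silver", "Bronze"]  # highest to lowest mean Monetary


def standardise(rows):
    """Z-score each column so Recency, Frequency and Monetary share a comparable scale."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]
    return [tuple((x - m) / s for x, (m, s) in zip(row, stats)) for row in rows]


def init_farthest(points, k):
    """Deterministic maximin seeding: start at the first point, then keep adding
    the point farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means. Returns (labels, centroids, WCSS)."""
    centroids = init_farthest(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(tuple(map(mean, zip(*members))) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    wcss = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centroids, wcss


def elbow(points, k_values):
    """WCSS per candidate k; the 'elbow' is where the curve stops falling steeply."""
    return {k: kmeans(points, k)[2] for k in k_values}


def silhouette(points, labels):
    """Mean silhouette coefficient s = (b - a) / max(a, b); needs at least two clusters."""
    clusters = set(labels)
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        same = [math.dist(p, q) for j, (q, lab) in enumerate(zip(points, labels))
                if lab == own and j != i]
        if not same:
            scores.append(0.0)  # singleton cluster: silhouette taken as 0
            continue
        a = mean(same)  # mean distance to own cluster
        b = min(  # mean distance to the nearest other cluster
            mean(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            for other in clusters - {own}
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


def name_segments(rfm_rows, labels):
    """Rank clusters by mean raw Monetary value and attach tier names."""
    spend = {}
    for (_r, _f, m), lab in zip(rfm_rows, labels):
        spend.setdefault(lab, []).append(m)
    order = sorted(spend, key=lambda lab: mean(spend[lab]), reverse=True)
    return {lab: TIERS[i] if i < len(TIERS) else f"Tier {i + 1}" for i, lab in enumerate(order)}
```

In use, one would compute `z = standardise(rfm_rows)`, inspect `elbow(z, range(1, 11))` for the point where WCSS stops falling steeply, check `silhouette(z, labels)` for the chosen k (values near 1 indicate compact, well-separated clusters), and map cluster ids to tiers with `name_segments(rfm_rows, labels)`. The pairwise silhouette computation here is O(n²), which is fine for a sketch but slow for a large customer base.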

Published

2026-06-15

How to Cite

Shukor, S. A., & Zulkefly, S. H. (2026). Integrating Big Data and CRM for SMEs: A hybrid RFM and lexicon-based clustering approach for customer segmentation. Advanced International Journal of Business, Entrepreneurship and SME’s (AIJBES), 8(28), 327–341. https://doi.org/10.35631/AIJBES.828022