FRAUDULENT CREDIT CARD TRANSACTION DETECTION USING LOGISTIC REGRESSION
DOI:
https://doi.org/10.35631/JISTM.1038012
Keywords:
Logistic Regression, Fraud Detection, Transactions Detection, Credit Card
Abstract
Credit card fraud poses a significant threat to financial institutions and individuals, leading to substantial losses and undermining trust in digital payments. This study aimed to identify fraudulent transactions using a logistic regression-based machine learning model, develop a fraud detection prototype, and evaluate its accuracy using the Precision-Recall Area Under the Curve (PR-AUC). The methodology comprised three phases: Preliminary, Design, and Evaluation. In the Preliminary Phase, a literature review identified research gaps, and the September 2013 European credit card fraud dataset from Kaggle was preprocessed using robust scaling. The Design Phase involved constructing the system architecture, creating flowcharts, designing a user interface, and developing logistic regression pseudocode. During the Evaluation Phase, the study balanced the dataset using undersampling, conducted 5-fold cross-validation, and split the data into training, testing, and validation sets in a 70:30 ratio. The logistic regression model was trained and evaluated using precision, recall, F1-score, and PR-AUC. The model achieved a PR-AUC score of 99.57% on the 10% validation set, which consisted of 52 fraudulent and 48 normal transactions, demonstrating high discriminatory power and reliability. The developed prototype enhances security and trust in digital payment systems. The use of robust scaling to reduce the influence of outliers, undersampling to balance the dataset, and comprehensive evaluation metrics provides valuable insights for future research and practical applications in fraud detection systems. This study contributes to mitigating credit card fraud and improving the integrity of financial transactions. Future work should encourage collaboration between financial institutions, regulatory bodies, and researchers to share various types of anonymised transaction data and best practices, which could lead to more robust and generalisable models.
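The pipeline summarised in the abstract (robust scaling, undersampling, logistic regression, and PR-AUC evaluation) can be illustrated with a minimal sketch. This is not the study's implementation. It assumes synthetic two-feature data in place of the Kaggle dataset, a plain gradient-descent logistic regression, a single 70:30 hold-out split without the 5-fold cross-validation, and average precision as the PR-AUC estimate. All function names, the learning rate, and the epoch count are hypothetical choices made for illustration, and the scaling is applied before splitting only to mirror the order given in the abstract.

```python
import math
import random
import statistics


def robust_scale(values):
    """Centre on the median and divide by the interquartile range (IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    centre = statistics.median(values)
    iqr = (q3 - q1) or 1.0  # guard against a zero IQR
    return [(v - centre) / iqr for v in values]


def undersample(rows, labels, rng):
    """Randomly drop majority-class rows until both classes are equal in size."""
    fraud = [i for i, y in enumerate(labels) if y == 1]
    normal = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = sorted((fraud, normal), key=len)
    keep = minority + rng.sample(majority, len(minority))
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(model, rows):
    """Return the estimated fraud probability for each row."""
    weights, bias = model
    return [sigmoid(sum(w * x for w, x in zip(weights, row)) + bias) for row in rows]


def fit_logistic(rows, labels, lr=0.5, epochs=300):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        probs = predict_proba((weights, bias), rows)
        errors = [p - y for p, y in zip(probs, labels)]
        for j in range(len(weights)):
            grad = sum(e * row[j] for e, row in zip(errors, rows)) / len(rows)
            weights[j] -= lr * grad
        bias -= lr * sum(errors) / len(rows)
    return weights, bias


def average_precision(labels, scores):
    """Estimate PR-AUC as the mean precision at the rank of each true positive.

    Tied scores are ordered arbitrarily by the sort.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


rng = random.Random(0)
# Synthetic stand-in data (NOT the Kaggle dataset): about 2% fraud.
rows = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(2000)]
rows += [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(40)]
labels = [0] * 2000 + [1] * 40

# Scale each feature column, then rebuild the rows.
columns = [robust_scale(list(col)) for col in zip(*rows)]
scaled_rows = [list(row) for row in zip(*columns)]

# Balance the classes, then hold out 30% of the balanced rows.
bal_rows, bal_labels = undersample(scaled_rows, labels, rng)
cut = int(0.7 * len(bal_rows))
model = fit_logistic(bal_rows[:cut], bal_labels[:cut])
scores = predict_proba(model, bal_rows[cut:])
pr_auc = average_precision(bal_labels[cut:], scores)
print(f"PR-AUC (average precision) on held-out rows: {pr_auc:.3f}")
```

Because undersampling discards most normal transactions, a PR-AUC measured on a balanced hold-out set reflects a far higher fraud rate than the original data, so it should not be read as the precision to expect in production.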