Detecting Phishing URLs Using Machine Learning & Lexical Feature-based Analysis
Phishing URLs are one of the greatest threats facing cybersecurity professionals and practitioners. Countering them requires a coordinated effort and the use of current technology to help identify phishing URLs and control the spread of this threat. Many researchers have investigated various machine learning techniques to tackle this threat; however, applying machine learning involves many difficulties and obstacles. The proposed approach detects phishing URLs by analyzing each URL to extract lexical features, and then applies machine learning to the extracted features. The dataset was collected from different sources and includes four attack scenarios: Defacement, Spam, Phishing, and Malware. In this research, however, the focus was on phishing URLs. The dataset was used as input to various machine learning and statistical detection models (RF: Random Forest, DT: Decision Tree Classifier, GNB: Gaussian Naive Bayes, KNN: k-Nearest Neighbour, Logistic Regression, SVC: Support Vector Classifier, QDA: Quadratic Discriminant Analysis, Perceptron), along with SMOTE (Synthetic Minority Oversampling Technique). These models were employed to predict phishing URLs based on their lexical features. The results indicate relatively good accuracy. The Random Forest (RF) model produced the best accuracy (98%) of the detection models, as well as the best precision and recall (98% each).
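To make the idea of lexical feature extraction concrete, the following is a minimal sketch in Python. It is not the feature set used in the study, which the abstract does not enumerate. The function name `lexical_features`, the individual features, and the `SUSPICIOUS_TOKENS` list are all assumptions chosen for illustration.

```python
import re
from urllib.parse import urlparse

# Hypothetical keyword list; the study's actual features are not given in the abstract.
SUSPICIOUS_TOKENS = ("login", "verify", "secure", "account", "update")


def lexical_features(url: str) -> dict:
    """Compute a few illustrative lexical features from the URL string alone."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    lowered = url.lower()
    return {
        # Overall and per-component lengths
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parsed.path),
        # Character counts over the whole URL
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        # Binary indicators
        "has_at_symbol": int("@" in url),
        "host_is_ip": int(bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host))),
        # Number of distinct hypothetical keywords present in the URL
        "suspicious_tokens": sum(tok in lowered for tok in SUSPICIOUS_TOKENS),
    }


if __name__ == "__main__":
    for example in ("https://example.com/", "http://192.168.0.1/login-verify?id=123"):
        print(example, lexical_features(example))
```

Each URL becomes a fixed-length numeric vector without fetching the page, which is what makes lexical analysis cheap. Such vectors could then be passed to a classifier, for example scikit-learn's `RandomForestClassifier`, although the abstract does not state which implementation was used.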
Publishing Year
2020