Özet:
In this paper, Big Data and Deep Learning Techniques are integrated to improve the performance of intrusion detection systems. Three classifiers are used to classify network traffic datasets, and these are Deep Feed-Forward Neural Network (DNN) and two ensemble techniques, Random Forest and Gradient Boosting Tree (GBT). To select the most relevant attributes from the datasets, we use a homogeneity metric to evaluate features. Two recently published datasets UNSW NB15 and CICIDS2017 are used to evaluate the proposed method. 5-fold cross validation is used in this work to evaluate the machine learning models. We implemented the method using the distributed computing environment Apache Spark, integrated with Keras Deep Learning Library to implement the deep learning technique while the ensemble techniques are implemented using Apache Spark Machine Learning Library. The results show a high accuracy with DNN for binary and multiclass classification on UNSW NB15 dataset with accuracies at 99.16% for binary classification and 97.01% for multiclass classification. While GBT classifier achieved the best accuracy for binary classification with the CICIDS2017 dataset at 99.99%, for multiclass classification DNN has the highest accuracy with 99.56%.