| Abstract: | An Intrusion Detection System (IDS) is very useful for detecting malicious activities and warning of impending threats. When an IDS monitoring network traffic detects a policy violation or threat, it triggers an alert. A typical IDS process therefore consists of three steps: monitoring, detection, and alerting. Past studies have shown that IDSs face many challenges. One is data imbalance. The data an IDS collects are often unevenly distributed, and normal traffic vastly outnumbers malicious or attack traffic. Such imbalanced datasets tend to produce biased models that perform poorly at detecting minority classes, that is, the actual attacks. Another challenge is that IDS datasets contain much noisy or irrelevant data, which can degrade the performance of Machine Learning (ML) models in detecting intrusions. Researchers have proposed many ML models and solutions to overcome these challenges. Data imbalance remains an active research problem, and data preprocessing is usually the critical step in handling it. Approaches such as under-sampling, data augmentation, and oversampling reduce the skew in the class distribution and thereby improve the ML model's ability to detect malicious intrusions. Feature Selection (FS) methods are equally critical for identifying important features, because they reduce noise in the dataset, lower data dimensionality, and improve both computational efficiency and model Accuracy. It follows that a good IDS should combine an appropriate data preprocessing technique, a proper FS method, and high-performance ML algorithm(s). This research proposes a general Multi-Dynamic Feature Selection (MDFS) technique to reduce data dimensionality and improve the performance of ML models in IDS. MDFS is combined with ML models and evaluated on three quite different network datasets: KDD Cup 99, CICIDS 2017, and UNSW NB15. The experimental results show that the proposed integrated model detects attacks effectively, with Accuracy and F1-Score values better than those reported in other studies. This indicates that MDFS improves IDS performance and, on all three datasets, copes with their differing class imbalance and noise. |