Generative AI and Cognitive Computing-Driven Intrusion Detection System in Industrial CPS

This section describes the experimental setup, the datasets, and the preprocessing steps. We then present the metrics used to evaluate the proposed model’s performance. Finally, we evaluate the performance of the proposed IDS and discuss the results.

Experimental Setup

The experiments are conducted on a PowerEdge R940xa rack server equipped with two Intel Xeon Gold 6240 processors running at 2.6 GHz, 256 GB of RAM, and eight NVIDIA Ampere A100 80 GB passive GPUs. The server runs Windows Server 2019 Standard. The deep learning models are built using TensorFlow 2.16 and Keras 3. To select the most suitable parameters, we conducted multiple experiments (approximately 5 to 7 iterations) guided by the performance metrics. The final parameters are listed in Table 1. For Decision Tree (DT), Random Forest (RF), and Naive Bayes (NB), we use the default parameters of the scikit-learn Python library.

Table 1 Experimental setup for each stage

Dataset and Preprocessing

We employed two publicly available datasets, namely ToN-IoT [34] and Edge-IIoTset [35], to evaluate the performance of the proposed IDS. ToN-IoT is a significant resource for IT security research. It is designed to facilitate the study of IDS by providing a comprehensive set of network traffic data that simulates a variety of cyber-attacks and normal traffic scenarios, which makes it instrumental in developing and evaluating IDS models. The Edge-IIoTset dataset, on the other hand, focuses on edge computing environments within the CPS ecosystem. It provides data on CPS devices operating in edge computing scenarios, including network traffic, device behavior, and security threats specific to such environments, and thus supports the development of security and monitoring solutions optimized for edge computing. In this work, we consider one normal class and nine attack classes of the ToN-IoT dataset (e.g., DDoS, Backdoor, and MiTM), and one normal class and fourteen attack classes of the Edge-IIoTset dataset. Because preprocessing affects model performance [36], we applied several preprocessing steps. First, we imputed missing values and removed incomplete rows from both datasets. Second, we converted all categorical variables to numerical values using one-hot encoding. Third, we normalized the data with the Min-Max scaler. Finally, we divided each dataset into training and testing data: the model was trained on 70% of the data and validated and tested on the remaining 30%. Table 2 provides the number of instances in the training and testing sets of both datasets.
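The preprocessing steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors’ pipeline: the toy fields and values are hypothetical, missing numeric values are filled with the column mean (the paper does not state its imputation strategy), and the removal of incomplete rows is omitted. In practice, scikit-learn’s `OneHotEncoder`, `MinMaxScaler`, and `train_test_split` cover the same steps.

```python
import random
from typing import List, Optional, Sequence, Tuple


def impute_mean(column: Sequence[Optional[float]]) -> List[float]:
    """Replace missing (None) entries with the mean of the observed values.

    Assumption: mean imputation; the paper does not specify its strategy.
    """
    observed = [x for x in column if x is not None]
    fill = sum(observed) / len(observed)
    return [fill if x is None else x for x in column]


def one_hot(column: Sequence[str]) -> Tuple[List[str], List[List[float]]]:
    """One-hot encode a categorical column; returns (sorted categories, encoded rows)."""
    categories = sorted(set(column))
    index = {c: i for i, c in enumerate(categories)}
    encoded = []
    for value in column:
        row = [0.0] * len(categories)
        row[index[value]] = 1.0
        encoded.append(row)
    return categories, encoded


def min_max_scale(column: Sequence[float]) -> List[float]:
    """Rescale a numeric column to [0, 1]; a constant column maps to 0."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [(x - lo) / span if span else 0.0 for x in column]


def split_70_30(rows: List[list], seed: int = 0) -> Tuple[List[list], List[list]]:
    """Shuffle rows reproducibly and split them 70% train / 30% test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical records: a categorical protocol field and a byte count with a gap.
protocols = ["tcp", "udp", "tcp", "icmp"]
byte_counts = [100.0, None, 300.0, 500.0]

categories, proto_encoded = one_hot(protocols)
bytes_scaled = min_max_scale(impute_mean(byte_counts))
features = [enc + [b] for enc, b in zip(proto_encoded, bytes_scaled)]
train, test = split_70_30(features)
print(categories, features[1], len(train), len(test))
```

The sketch scales before splitting to mirror the order described above; fitting the scaler on the training split only, and then applying it to the test split, keeps test-set minima and maxima from influencing the training features.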

Evaluation Metrics

In this study, we evaluated the performance of the proposed IDS using Accuracy (Acc), Precision (Pr), Recall (Re), and F1-score (F1). For additional assessment, we used the confusion matrix and the Receiver Operating Characteristic (ROC) curve. Acc, Pr, Re, and F1 are calculated as follows [37]:

$$Acc = \frac{T_P + T_N}{T_P + T_N + F_P + F_N} \tag{16}$$

$$Pr = \frac{T_P}{T_P + F_P} \tag{17}$$

$$Re = \frac{T_P}{T_P + F_N} \tag{18}$$

$$F1 = 2 \times \frac{Pr \times Re}{Pr + Re} \tag{19}$$
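As a quick check of Eqs. (16)-(19), the sketch below computes the four metrics from confusion counts. The counts are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Acc, Pr, Re, and F1 from confusion counts (Eqs. 16-19); empty denominators give 0."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"Acc": acc, "Pr": precision, "Re": recall, "F1": f1}


# Hypothetical counts for one class treated as "positive".
m = classification_metrics(tp=90, tn=80, fp=10, fn=20)
print(m)
```

For these counts, Acc = 170/200 = 0.85 and Pr = 90/100 = 0.90. Substituting Eqs. (17) and (18) into Eq. (19) also gives F1 = 2T_P / (2T_P + F_P + F_N), here 180/210.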

where \(T_P\), \(T_N\), \(F_P\), and \(F_N\) denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. For the overall multi-class analysis, we use weighted averages. The precision of each class is weighted by the proportion of true instances of that class in the dataset, and the overall weighted precision is the sum of these weighted per-class precisions.

$$Pr_{\mathrm{weighted}} = \sum_{i=1}^{N} w_i \times Pr_i \tag{20}$$

where \( w_i \) is the proportion of true instances for class \( i \) in the dataset, and \( Pr_i \) is the precision for class \( i \). \( N \) is the total number of classes. The recall for each class is weighted by the proportion of true instances for that class. The overall weighted recall is the sum of these individual weighted recalls.

$$Re_{\mathrm{weighted}} = \sum_{i=1}^{N} w_i \times Re_i \tag{21}$$

where \( w_i \) is as defined above, and \( Re_i \) is the recall for class \( i \). The F1-score for each class is computed and then weighted by the proportion of true instances for that class. The overall weighted F1-score is the sum of these individual weighted F1-scores.

$$F1_{\mathrm{weighted}} = \sum_{i=1}^{N} w_i \times F1_i \tag{22}$$
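The weighted averages in Eqs. (20)-(22) can be sketched as follows: each class’s metrics are computed one-vs-rest from label lists and weighted by \(w_i\), the share of true instances of class \(i\). The labels are hypothetical; scikit-learn’s `precision_recall_fscore_support(y_true, y_pred, average="weighted")` computes the same quantities.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def per_class_prf(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                  label: Hashable) -> Tuple[float, float, float]:
    """One-vs-rest precision, recall, and F1 for a single class (Eqs. 17-19)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def weighted_prf(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Tuple[float, float, float]:
    """Weighted Pr, Re, F1 (Eqs. 20-22) with w_i = share of true instances of class i."""
    support = Counter(y_true)
    n = len(y_true)
    totals = [0.0, 0.0, 0.0]
    for label, n_i in support.items():
        w_i = n_i / n
        for k, value in enumerate(per_class_prf(y_true, y_pred, label)):
            totals[k] += w_i * value
    return totals[0], totals[1], totals[2]


# Hypothetical labels for three classes.
y_true = ["Normal", "Normal", "Normal", "DDoS", "DDoS", "XSS"]
y_pred = ["Normal", "Normal", "DDoS", "DDoS", "DDoS", "Normal"]
pr_w, re_w, f1_w = weighted_prf(y_true, y_pred)
print(f"Pr_w = {pr_w:.4f}, Re_w = {re_w:.4f}, F1_w = {f1_w:.4f}")
```

Here the weighted recall equals the overall accuracy (4/6). This holds in general, because each term \(w_i \times Re_i\) reduces to the number of correct predictions for class \(i\) divided by the total number of instances.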

Fig. 1

Fig. 2

Performance Evaluation of the Proposed IDS

We evaluate the performance of the proposed IDS in this subsection. First, we present the accuracy and loss curves of the proposed model to assess its fit. Figure 1 depicts the training and validation accuracy together with the training and validation loss for the ToN-IoT dataset, while Fig. 2 presents the corresponding curves for the Edge-IIoTset dataset. As shown in Fig. 1, the proposed model achieved a training Acc of 99.85% and a validation Acc of 99.95% on the ToN-IoT dataset, with a training loss of 0.64% and a validation loss of 0.41%. On the Edge-IIoTset dataset, it achieved a training Acc of 95.32% and a validation Acc of 95.35%, with training and validation losses of 9.35% and 9.30%, respectively. The close agreement between the training and validation curves on both datasets indicates that the model is neither overfitting nor underfitting. We also use the confusion matrix, also known as an error matrix, for evaluation. In the confusion matrix, each row denotes a true class and each column denotes a predicted class, so the cell in row i and column j counts the instances of true class i that the model assigned to class j; the diagonal cells therefore count correct predictions. Figure 3 depicts the confusion matrix for the ToN-IoT dataset, and Fig. 4 presents the confusion matrix for the Edge-IIoTset dataset. On the ToN-IoT dataset, the model correctly predicted 90,036 instances of the Normal class, 6,031 of the DoS class, 6,014 of the DDoS class, and so on, whereas the class-wise results in Table 4 show considerably more confusion for several Edge-IIoTset classes, such as Password, SQL Injection, and XSS. Moreover, the Receiver Operating Characteristic (ROC) curve is another important evaluation tool: it plots the true positive rate of a classifier against its false positive rate across decision thresholds. An area under the ROC curve (AUC) close to 1 indicates strong performance, an AUC of 0.5 corresponds to random guessing, and values below 0.5 are worse than random.
Figs. 5 and 6 show the ROC curves of the proposed model for the ToN-IoT and Edge-IIoTset datasets, respectively. The proposed model achieves a micro-averaged AUC of 0.99999 and a macro-averaged AUC of 0.99998 on the ToN-IoT dataset, and micro- and macro-averaged AUCs of 0.99931 and 0.99522, respectively, on the Edge-IIoTset dataset. Both averages are close to 1 on both datasets, which further indicates the strong performance of the proposed IDS.
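The micro- and macro-averaged AUC values reported above can be reproduced along these lines. This sketch assumes a one-vs-rest ROC analysis over per-class probability scores: the macro average is the mean of the per-class AUCs, while the micro average pools every (instance, class) pair into a single binary problem. Each AUC is computed in its rank form, the probability that a random positive scores higher than a random negative with ties counted as one half, which equals the area under the trapezoidal ROC curve. The labels and scores are hypothetical, and the pairwise loop is quadratic, so it only suits small examples; `sklearn.metrics.roc_auc_score` is the practical choice for full datasets.

```python
from typing import List, Sequence, Tuple


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-form AUC: P(random positive outscores random negative), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_micro_auc(y_true: Sequence[int],
                    probs: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """One-vs-rest macro AUC (mean of per-class AUCs) and micro AUC (pooled pairs)."""
    n_classes = len(probs[0])
    per_class: List[float] = []
    pooled_labels: List[int] = []
    pooled_scores: List[float] = []
    for c in range(n_classes):
        labels_c = [1 if y == c else 0 for y in y_true]  # class c vs. the rest
        scores_c = [row[c] for row in probs]
        per_class.append(binary_auc(labels_c, scores_c))
        pooled_labels.extend(labels_c)
        pooled_scores.extend(scores_c)
    macro = sum(per_class) / n_classes
    micro = binary_auc(pooled_labels, pooled_scores)
    return macro, micro


# Hypothetical 3-class example: the last instance (true class 1) scores highest for class 0.
y_true = [0, 1, 2, 1]
probs = [
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
    [0.5, 0.3, 0.2],
]
macro, micro = macro_micro_auc(y_true, probs)
print(f"macro AUC = {macro:.4f}, micro AUC = {micro:.4f}")
```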

Fig. 3

This confusion matrix provides an in-depth evaluation of the model’s classification performance for the classes in the ToN-IoT dataset. The x-axis represents predicted labels, while the y-axis represents true labels.

Fig. 4

This confusion matrix provides an in-depth evaluation of the model’s classification performance for the classes in the Edge-IIoTset dataset. The x-axis represents predicted labels, while the y-axis represents true labels.
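A confusion matrix with the orientation used in Figs. 3 and 4 (rows are true labels, columns are predicted labels) can be built as below, and the per-class false positive rate discussed alongside Tables 3 and 4 follows directly from it, assuming the one-vs-rest definition \(F_P / (F_P + T_N)\). The class names and predictions are hypothetical toy values.

```python
from typing import Hashable, List, Sequence


def confusion_matrix(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                     labels: Sequence[Hashable]) -> List[List[int]]:
    """Rows index the true class and columns the predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_fpr(matrix: List[List[int]]) -> List[float]:
    """One-vs-rest false positive rate FP / (FP + TN) for each class."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    rates = []
    for c in range(k):
        tp = matrix[c][c]
        fp = sum(matrix[r][c] for r in range(k)) - tp  # predicted c, but true class differs
        fn = sum(matrix[c]) - tp  # true class c, but predicted differently
        tn = total - tp - fp - fn
        rates.append(fp / (fp + tn) if fp + tn else 0.0)
    return rates


labels = ["Normal", "DoS", "DDoS"]
y_true = ["Normal", "Normal", "DoS", "DoS", "DDoS", "DDoS"]
y_pred = ["Normal", "Normal", "DoS", "DDoS", "DDoS", "DDoS"]
cm = confusion_matrix(y_true, y_pred, labels)
print(cm, per_class_fpr(cm))
```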

Fig. 5

In this ROC curve, each class of the ToN-IoT dataset is evaluated by its False Positive Rate (FPR), shown on the x-axis, and its True Positive Rate (TPR), shown on the y-axis.

Fig. 6

In this ROC curve, each class of the Edge-IIoTset dataset is evaluated by its False Positive Rate (FPR), shown on the x-axis, and its True Positive Rate (TPR), shown on the y-axis.

Table 3 Class-wise results (%) for ToN-IoT dataset

We also report the class-wise performance of the proposed IDS in terms of Pr, Re, and F1. Table 3 presents the class-wise results on the ToN-IoT dataset. The proposed IDS achieves a Pr of 100% for the Backdoor, Normal, and Scanning classes and between 98.07% and 99.96% for the other classes. It achieves 100% Re for the Normal, Ransomware, Scanning, and XSS classes and between 99.58% and 99.96% for the remaining classes. In terms of F1, it achieves 99.88% for Backdoor, 99.78% for DDoS, 99.83% for DoS, 99.64% for Injection, 98.39% for MiTM, 99.90% for Password, 99.89% for Ransomware, 99.94% for XSS, and 100% for the Normal and Scanning classes. Regarding the false positive rate (FPR), the proposed IDS achieves its lowest FPR of 0.00 for the MiTM class and an FPR between 0.000001 and 0.00005 for the other classes. Table 4 presents the class-wise performance on the Edge-IIoTset dataset. The proposed IDS achieves a Pr of 100% for the Normal class, 99.98% for DDoS UDP, 99.97% for DDoS ICMP, 46.31% for SQL Injection, 86.41% for DDoS TCP, 95.67% for Vulnerability Scanner, 29.73% for Password, 94.78% for DDoS HTTP, 67.69% for Uploading, 99.43% for Backdoor, 95.63% for Port Scanning, 47.33% for XSS, 99.96% for Ransomware, 99.48% for Fingerprinting, and 99.06% for MITM. It achieves 100% Re for the DDoS UDP, DDoS TCP, and MITM classes; for the other classes, Re ranges from a minimum of 21.07% (Password) to a maximum of 97.77% (Ransomware).
It achieves an F1 of 99.99% for the Normal and DDoS UDP classes, 99.98% for DDoS ICMP, 60.87% for SQL Injection, 92.71% for DDoS TCP, 60.38% for Vulnerability Scanner, 24.66% for Password, 87.54% for DDoS HTTP, 60.48% for Uploading, 98.88% for Backdoor, 74.41% for Port Scanning, 58.85% for XSS, 98.34% for Ransomware, 81.85% for Fingerprinting, and 99.53% for MITM. Finally, the proposed IDS achieves its lowest FPR of 0.00 for the Normal and MITM classes and an FPR between 0.000001 and 0.03186 for the other classes.
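Because F1 is the harmonic mean of Pr and Re (Eq. 19), several of the Edge-IIoTset values reported above can be cross-checked directly. The snippet recomputes F1 for three classes whose Pr, Re, and F1 all appear in the text; each recomputed value agrees with the reported one to two decimal places.

```python
def f1_from_pr_re(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (Eq. 19), in the same units as the inputs."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# (class, reported Pr %, reported Re %, reported F1 %) as stated for Edge-IIoTset.
reported = [
    ("DDoS TCP", 86.41, 100.0, 92.71),
    ("Password", 29.73, 21.07, 24.66),
    ("MITM", 99.06, 100.0, 99.53),
]
for name, pr, rec, f1 in reported:
    print(f"{name}: recomputed F1 = {f1_from_pr_re(pr, rec):.2f}% (reported {f1:.2f}%)")
```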

Table 4 Class-wise results (%) for Edge-IIoTset dataset

Fig. 7

Cognitive refinement of confidence scores for the Password attack in the ToN-IoT dataset

Fig. 8

Cognitive refinement of confidence scores for the DDoS_UDP attack in the Edge-IIoTset dataset

Analysis of Generalized Cognitive Refinement of Confidence Scores

Figures 7 and 8 compare the initial confidence scores (Stage 2) with the scores refined by the algorithm (Stage 3) for the Password attack in the ToN-IoT dataset and the DDoS_UDP attack in the Edge-IIoTset dataset. The same analysis can be applied to every attack class in the datasets. The notable differences between the two stages, particularly the reduced confidence scores of several instances, reflect the algorithm’s conservative treatment of instances with low initial confidence. This effect is most evident where the initial confidence scores are substantially reduced after refinement, in line with the algorithm’s criterion of scaling down scores that fall below the threshold. This outcome suggests that the cognitive refinement process enhances the reliability of the intrusion detection system: its predictions are not based solely on initial assessments but are re-evaluated through a lens that mimics human-like skepticism and caution. Consequently, the method can help reduce false positives and thereby strengthen the system’s ability to distinguish genuine threats from benign activities.
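To make the thresholding criterion concrete, the following is a hypothetical sketch of a refinement step that leaves scores at or above a threshold unchanged and scales lower scores down. It is not the paper’s algorithm: the threshold (0.7), the scaling factor (0.5), the function name `refine_confidence`, and the scores are all illustrative assumptions.

```python
from typing import List, Sequence


def refine_confidence(scores: Sequence[float], threshold: float = 0.7,
                      alpha: float = 0.5) -> List[float]:
    """Hypothetical refinement rule: scores below `threshold` are scaled by `alpha`.

    Both defaults are illustrative placeholders, not values taken from the paper.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1] so refinement never raises a score")
    return [s if s >= threshold else s * alpha for s in scores]


stage2 = [0.95, 0.62, 0.81, 0.40]  # hypothetical initial (Stage 2) scores
stage3 = refine_confidence(stage2)  # refined (Stage 3) scores
print(stage3)
```

With a scaling factor of at most 1, refinement never raises a score, which matches the conservative, reduction-only behavior described above.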

Interpretation of Activation Values

In Figs. 9 and 10, the x-axis represents the individual units of the LSTM layer. Since the LSTM layer was defined with 64 units and is bidirectional, it effectively has 128 units (64 in each direction). The y-axis values represent the mean activation of each LSTM unit over the subset of data processed. Activation values of LSTM units can be negative or positive and indicate the extent to which each unit is activated by the input data: values close to 0 suggest minimal activation, while larger absolute values (positive or negative) indicate stronger activation. By comparing the two panels of Figs. 9 and 10 for the ToN-IoT and Edge-IIoTset datasets, we can gain insight into how different types of RNN units (LSTM vs. GRU) process the same data and whether they capture and respond to patterns in the data differently. Understanding the activation patterns can also help diagnose the model’s behavior. For example, units that are consistently inactive (mean activation close to 0) may contribute little to the model’s performance. Moreover, high variation in activation values across units may indicate that different units pick up different features or aspects of the input data, which can provide insight into which features are most relevant to the model’s predictions.
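The unit-level statistics plotted in Figs. 9 and 10 can be sketched as follows: average each unit’s activation over a batch, flag units whose mean magnitude is near zero, and measure the spread of the means across units. The activation matrix, the cut-off `eps = 0.01`, and the function names are hypothetical. With a trained Keras model, such a matrix could be obtained from a sub-model such as `keras.Model(inputs=model.inputs, outputs=model.get_layer(name).output)`, where `name` is the (assumed) name of the bidirectional LSTM layer; with Keras’ default concatenating merge mode, a `Bidirectional(LSTM(64))` layer yields 128 columns.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def unit_means(activations: Sequence[Sequence[float]]) -> List[float]:
    """Mean activation of each unit over a batch (rows = samples, columns = units)."""
    n_units = len(activations[0])
    return [mean(row[u] for row in activations) for u in range(n_units)]


def near_zero_units(means: Sequence[float], eps: float = 0.01) -> List[int]:
    """Indices of units whose mean activation magnitude is below the assumed cut-off eps."""
    return [u for u, m in enumerate(means) if abs(m) < eps]


# Hypothetical activations: 3 samples x 4 units (a Bidirectional(LSTM(64)) layer gives 128 units).
batch = [
    [0.50, -0.20, 0.001, 0.90],
    [0.30, -0.40, -0.002, 0.70],
    [0.40, -0.30, 0.001, 0.80],
]
means = unit_means(batch)
spread = pstdev(means)  # variation of the mean activation across units
print(means, near_zero_units(means), spread)
```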

Fig. 9

Mean activation values for the ToN-IoT dataset

Fig. 10

Mean activation values for the Edge-IIoTset dataset

Fig. 11

Comparison of algorithm performance on ToN-IoT dataset

Fig. 12

Comparison of algorithm performance on Edge-IIoTset dataset

Comparison with Baselines

Finally, we compare the proposed IDS with several baseline approaches, namely Decision Tree (DT), Random Forest (RF), Naive Bayes (NB), Long Short-Term Memory (LSTM), and Bidirectional Long Short-Term Memory (BiLSTM). Fig. 11 provides the comparison on the ToN-IoT dataset. The proposed IDS obtains an Acc of 99.95%, with Pr, Re, and F1 each at 99.94%. In contrast, DT has an Acc, Pr, Re, and F1 of 95.34%, 74.72%, 80.00%, and 76.33%, respectively. RF, NB, LSTM, and BiLSTM achieve Acc values of 97.81%, 90.69%, 82%, and 84.49%; Pr values of 87.55%, 77.68%, 78.00%, and 83.98%; Re values of 85.43%, 77.70%, 81.45%, and 81.56%; and F1 values of 76.41%, 72.43%, 81.49%, and 81.20%, respectively. The proposed IDS therefore outperforms all baseline classifiers in Acc, Pr, Re, and F1 on the ToN-IoT dataset.

We further compare the proposed IDS against these baselines on the Edge-IIoTset dataset, as shown in Fig. 12. The proposed IDS achieves an Acc of 94.20%, Pr of 95.06%, Re of 94.19%, and F1 of 94.07%. In terms of Acc, DT achieves 92.20%, RF 92.50%, NB 92.00%, LSTM 92.80%, and BiLSTM 93.00%. In terms of Pr, DT achieves 93.06%, while RF, NB, LSTM, and BiLSTM achieve 93.36%, 92.86%, 93.6%, and 93.86%, respectively. The Re values of DT, RF, NB, LSTM, and BiLSTM are 92.19%, 92.49%, 91.99%, 92.79%, and 92.99%, respectively. Finally, in terms of F1, DT and RF achieve 92.07% and 92.37%, while NB, LSTM, and BiLSTM achieve 91.87%, 92.67%, and 92.87%, respectively. The proposed IDS thus also outperforms all baselines on the Edge-IIoTset dataset, although by a smaller margin than on ToN-IoT; for example, its Acc exceeds that of the strongest baseline, BiLSTM, by 1.20 percentage points.
