Question about wget dataset results #13

Open
jiangdie666 opened this issue May 23, 2024 · 32 comments

@jiangdie666

I evaluated the wget dataset in two ways: with a pkl I trained myself using your project's code, and with the pre-trained pkl that comes with your project. Neither result is satisfactory. Could it be that I didn't set some other parameters?
Evaluation results with the pkl from my own training:

python3 eval.py --dataset wget --device 0
Loading processed wget dataset...
[n_graph, n_node_feat, n_edge_feat]: [150, 8, 4]
Loading processed wget dataset...
[n_graph, n_node_feat, n_edge_feat]: [150, 8, 4]
AUC: 0.41680000000000006
F1: 0.6666666662222221
PRECISION: 0.5
RECALL: 1.0
TN: 0
FN: 0
TP: 25
FP: 25
#Test_AUC: 0.4168±0.0000

These are the results with the pre-trained pkl that comes with your project:

python3 eval.py --dataset wget --device 0
Loading processed wget dataset...
[n_graph, n_node_feat, n_edge_feat]: [150, 8, 4]
Loading processed wget dataset...
[n_graph, n_node_feat, n_edge_feat]: [150, 8, 4]
AUC: 0.47440000000000004
F1: 0.6666666662222221
PRECISION: 0.5
RECALL: 1.0
TN: 0
FN: 0
TP: 25
FP: 25
#Test_AUC: 0.4744±0.0000
@jiangdie666 (Author)

Sorry to also report a zero-dimensional tensor error while training on the DARPA dataset. Have you run into this problem during training?

[screenshot: error traceback]

@Jimmyokok (Collaborator)

  • Your problem on the wget dataset is unclear; more information may be needed.
  • The "iteration over a 0-d tensor" issue is new to me, but someone else has also encountered it.
    I'm going to work on these two issues tomorrow.

@jiangdie666 (Author)

I wonder whether my environment differs from yours. I ran both evaluations in the following environment:
Python 3.10.13
pytorch==2.1.0
torchvision==0.16.0
torchaudio==2.1.0
pytorch-cuda=12.1
conda install -c dglteam/label/th21_cu121 dgl
I installed these because I couldn't find a matching DGL build when trying to satisfy your requirement of dgl==1.0.0. Could you share how you installed DGL 1.0.0?

@Jimmyokok (Collaborator)

I have tried to evaluate wget under your environment setting (pytorch==2.1.0 and dgl==2.0.0). I'm getting the same results as using dgl==1.0.0 both with and without the pre-trained pkls.

@Jimmyokok (Collaborator)

Did you obtain the graphs.pkl from parsing the raw logs or from the pkl provided by MAGIC?

@jiangdie666 (Author)

This time I evaluated directly with your pre-trained pkl and still had problems with the results:

[screenshot: evaluation results]

@Jimmyokok (Collaborator)

What is your k (i.e. num_neighbors)? Using k == 1 on the wget dataset could be the cause.

@Jimmyokok (Collaborator)

The "zero-dimension" error is simply a bug. Modifying loss, _ = model(g) to loss = model(g) fixes the bug.

@jiangdie666 (Author)

Sorry for asking about such a simple code issue; I can't believe I missed it. Thankfully, training on the DARPA data is now running successfully!

[screenshot: training running]

I didn't touch k; as I read the code, the default value is 2 if you don't change it.

[screenshot: default argument definition]
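For reference, this is roughly how such a default looks in an argparse-based entry point. The flag name --num_neighbors is taken from the discussion above, but the parser code below is only an illustrative sketch, not the repository's actual script:

```python
import argparse

# Illustrative only: not the repository's actual argument parser.
parser = argparse.ArgumentParser(description="MAGIC wget evaluation (sketch)")
# k, i.e. the number of neighbors used during detection; defaults to 2.
parser.add_argument("--num_neighbors", type=int, default=2)
args = parser.parse_args()
print(args.num_neighbors)  # prints 2 unless overridden, e.g. --num_neighbors 1
```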

@Jimmyokok (Collaborator)

Yes. I'm getting normal evaluation results when k == 2 but results like yours when k == 1.

@jiangdie666 (Author)

I am very sorry, I tried changing the value of k, but the result is still strange; the scores look odd.

[screenshot: evaluation results]

@Jimmyokok (Collaborator)

If your graphs.pkl is not the provided one, make sure that the node type at index 2 is 'task'.

@jiangdie666 (Author)

I found the problem. Earlier I said I used your original data, but I had only used your checkpoint.pkl; I forgot that graphs.pkl is also provided in the graphs.zip archive. The following results come from the project's own checkpoint.pkl together with a graphs.pkl that I produced from scratch, step by step, following the project's pipeline:

[screenshot: results with my own graphs.pkl]

Then I unzipped graphs.zip and used the project's own graphs.pkl with its own checkpoint, which gives the expected results. So the problem is probably either in my initial data processing with the wget_parser.py script, or in the call to the load_rawdata function that generates graphs.pkl.

[screenshot: results with the provided graphs.pkl]

I'll look into it again myself. Thanks for the reply.

@Jimmyokok (Collaborator)

Does your version of graphs.pkl match the size of the provided one? If not, what is your data source?
Most importantly, make sure that the node type at index 2 is 'task'; this matters a lot for detection performance. If it is not, find the index for 'task' and modify line 28 of ./model/eval.py to out = pooler(g, out, [index_for_task]).cpu().numpy()
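A hedged sketch of the adjustment described above: look up the index of the 'task' node type and pass it to the pooler call on line 28 of ./model/eval.py. The NODE_TYPE_DICT mapping and the wrapper function below are hypothetical; adapt them to however your graphs.pkl actually encodes node types:

```python
# Hypothetical mapping from node-type name to integer index; adapt this to however
# your wget_parser.py output actually encodes node types.
NODE_TYPE_DICT = {"file": 0, "socket": 1, "task": 2}

def pool_graph_embedding(pooler, g, out, type_name="task"):
    """Pool node embeddings over a single node type, mirroring line 28 of ./model/eval.py."""
    index_for_task = NODE_TYPE_DICT[type_name]
    # i.e. out = pooler(g, out, [index_for_task]).cpu().numpy()
    return pooler(g, out, [index_for_task]).cpu().numpy()
```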

@jiangdie666 (Author)

I re-processed and re-trained the dataset, added some code to wget_parser.py to print the node-type categories, and found that 'task' is indeed at index 3. So I changed the index in the eval code as you said, but the result is still incorrect, and my graphs.pkl is exactly the same size as the graphs.pkl in your zip. Very strange.

[screenshots: node-type indices and evaluation results]

@Jimmyokok (Collaborator)

Is it possible that the order of the raw logs is different, which results in incorrect labeling during data loading and, as a byproduct, shifts the node type indices?

@jiangdie666 (Author)

I just tried indexes 0-7, and none of them worked well. I'll start over by re-downloading the data this afternoon and try building it again. It's a really strange problem.

@Jimmyokok (Collaborator)

I forget whether the attack logs should be the first 25 or the last 25 logs to be parsed, but this absolutely matters.

@jiangdie666 (Author)

Your comment woke me up. I had been so focused on ruling out my environment and code changes that I forgot there might be a problem in how the dataset was processed in the first place. The original code lists the 150 graph logs with ls directly, so the first 25 logs processed did not correspond to the ATTACK data. I modified the code to confirm that this was the problem.

[screenshot: modified parsing code]
Run before the modification:
[screenshot: results before the fix]
Run after the modification:
[screenshot: results after the fix]

So the task index shifted because the data wasn't processed in the right order; in the end the eval code doesn't need to change, and the task index stays at 2.
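A hedged sketch of the kind of ordering fix discussed here: enumerate the raw log files explicitly so the 25 attack logs come first and the 125 benign logs follow, instead of relying on an unordered directory listing. The is_attack_log predicate and directory argument are hypothetical; adapt them to the actual wget log filenames:

```python
import os

def ordered_log_files(raw_dir, is_attack_log):
    """Return raw wget log filenames with the 25 attack logs first, then the 125 benign logs.

    `is_attack_log` is a hypothetical predicate on the filename, since the exact
    naming convention of the raw logs is dataset-specific.
    """
    names = sorted(os.listdir(raw_dir))            # deterministic base order, unlike a bare ls
    attack = [n for n in names if is_attack_log(n)]
    benign = [n for n in names if not is_attack_log(n)]
    assert len(attack) == 25 and len(benign) == 125, (len(attack), len(benign))
    return attack + benign
```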
Thank you so much for answering my questions over and over again!

@SaraDadjouy

@jiangdie666 @Jimmyokok
Hello. Thank you for sharing. I had the same problem.

I have another question. If I'm not wrong, in the original paper the results for wget were reported as follows:
[screenshot: wget results reported in the paper]

I have done the Quick Evaluation and got the following results:

[n_graph, n_node_feat, n_edge_feat]: [150, 8, 4]
Loading processed wget dataset...
[n_graph, n_node_feat, n_edge_feat]: [150, 8, 4]
AUC: 0.9359999999999999
F1: 0.9056603768600924
PRECISION: 0.8571428571428571
RECALL: 0.96
TN: 21
FN: 1
TP: 24
FP: 4
#Test_AUC: 0.9360±0.0000

I also saw that the last results @jiangdie666 shared were close to mine. What might be the reason for the different results for Precision, F1, and AUC?

@Jimmyokok (Collaborator)

> [...] What might be the reason for the different results for Precision, F1, and AUC?

I have rerun the Quick Evaluation with exactly the same data, checkpoints and code as in this repository, which gives me this:
AUC: 0.96
F1: 0.9599999994999999
PRECISION: 0.96
RECALL: 0.96
TN: 24
FN: 1
TP: 24
FP: 1
#Test_AUC: 0.9600±0.0000
Then, I modified the code to repeat the evaluation with random seeds 0 to 49 and report the average, which gives me this:
AUC: 0.952864±0.013846093456278552
F1: 0.9595209114984354±0.016390904351784117
PRECISION: 0.9663880341880342±0.031628609309857315
RECALL: 0.9536±0.018521339044464343
TN: 24.14±0.8248636250920511
FN: 1.16±0.463033476111609
TP: 23.84±0.463033476111609
FP: 0.86±0.8248636250920512
#Test_AUC: 0.9529±0.0138
This is extremely strange, since I have never seen as many as 4 FPs. Meanwhile, I'm sure I have n_neighbor == 2, which is the standard setting, and I have tried these evaluations with both PyTorch 1.x and 2.x, which yield the same result.
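For anyone who wants to reproduce this kind of multi-seed average, a minimal sketch of the procedure described above (seeds 0 to 49, report mean and standard deviation). evaluate_once(seed) is a hypothetical wrapper around the repository's evaluation routine and is assumed to handle its own seeding:

```python
import numpy as np

def average_over_seeds(evaluate_once, n_seeds=50):
    """Call `evaluate_once(seed)` (hypothetical: returns an AUC float) for seeds 0..n_seeds-1
    and report mean ± standard deviation, matching the #Test_AUC output format above."""
    aucs = np.array([evaluate_once(seed) for seed in range(n_seeds)])
    print(f"#Test_AUC: {aucs.mean():.4f}±{aucs.std():.4f}")
    return aucs.mean(), aucs.std()
```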

Jimmyokok reopened this on Jun 10, 2024
@Jimmyokok (Collaborator)

With seed 2022, which aligns with the repository code, I'm getting this:
AUC: 0.9616
F1: 0.9795918362349021
PRECISION: 1.0
RECALL: 0.96
TN: 25
FN: 1
TP: 24
FP: 0
#Test_AUC: 0.9616±0.0000

@jiangdie666 (Author)

Actually, using the original checkpoint-wget.pt that comes with your project, my FP count is still 4, far worse than your results above.

[screenshot: evaluation results]

@Jimmyokok (Collaborator)

Could you try averaging over multiple seeds? Maybe seed 2022 just happens to perform very badly on other devices?

@jiangdie666 (Author)

Using the graphs.pkl extracted from the project's own graphs.zip, the results are satisfactory, which means there is still a small bug in the wget preprocessing code.

@m-shayan73

Hello, thank you for pointing out the issues in this thread. I have corrected the file-name issue, so the first 25 files processed are attack logs and the next 125 are normal logs. I am using the dependencies mentioned in the repo and k=2.

For the wget dataset, the graphs.pkl and checkpoint provided in the repo give me the following results:

[n_graph, n_node_feat, n_edge_feat]: [150, 8, 4]
AUC: 0.9616
F1: 0.9795918362349021
PRECISION: 1.0
RECALL: 0.96
TN: 25
FN: 1
TP: 24
FP: 0
#Test AUC: 0.9616±0.0000

However, when I parse the raw logs myself, then train and evaluate, I get the following results (closer to @SaraDadjouy's):

[n_graph, n_node_feat, n_edge_feat]: [150, 8, 4]
AUC: 0.9184
F1: 0.8799999995
PRECISION: 0.88
RECALL: 0.88
TN: 22
FN: 3
TP: 22
FP: 3
#Test_AUC: 0.9184±0.0000

I would appreciate any guidance on what might be causing the difference in metrics. I have also tried different seeds and am not getting better results.

@Jimmyokok (Collaborator)

Jimmyokok commented Oct 11, 2024

@m-shayan73 I repeated my data processing from scratch and noticed that the for loop at line 790 of wget_parser.py does not process files in the desired order (the first 25 being attack logs, the next 125 being normal logs). After sorting the filenames, the resulting new graphs.pkl gives me:
AUC: 0.9616
F1: 0.9795918362349021
PRECISION: 1.0
RECALL: 0.96
TN: 25
FN: 1
TP: 24
FP: 0
#Test_AUC: 0.9616±0.0000

If I re-train the model, the result becomes:
AUC: 0.9567999999999999
F1: 0.9599999994999999
PRECISION: 0.96
RECALL: 0.96
TN: 24
FN: 1
TP: 24
FP: 1
#Test_AUC: 0.9568±0.0000

You mentioned that you 'have corrected the file name issue and now the first 25 files processed are attack logs then next 125 files are normal logs', and still get unsatisfactory results. Does that mean you have already observed and corrected the above issue, and the problem still exists?

@m-shayan73

m-shayan73 commented Oct 11, 2024

Yes, I have added the following code to sort the filenames:

[screenshot: filename-sorting code]

@Jimmyokok (Collaborator)

I will try your code then. My version simply sorts the filenames in string order, which may differ from yours in the indices it assigns to node and edge types.
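To illustrate why plain string sorting can give a different file order than a numeric sort (and hence different node/edge-type indices if types are assigned in encounter order), a small example with made-up filenames:

```python
names = ["log-2", "log-10", "log-1"]

# Plain string order compares character by character, so "log-10" sorts before "log-2".
print(sorted(names))                                            # ['log-1', 'log-10', 'log-2']

# Sorting on the trailing number gives the intuitive numeric order instead.
print(sorted(names, key=lambda n: int(n.rsplit("-", 1)[1])))    # ['log-1', 'log-2', 'log-10']
```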

@Jimmyokok (Collaborator)

@m-shayan73 I'm able to reproduce your result. However, it's a seed issue. Measuring the average AUC over 100 random seeds gives me AUC ~0.9450 under both file orders.

I have also found a trick that improves the wget detection AUC (from ~0.9450 to ~0.9650), which I have just pushed in the '1.0.6' commit.

@m-shayan73

m-shayan73 commented Oct 12, 2024

Thanks a lot for the help. It was indeed a seed issue: I changed the seed in one file (./eval.py), but it was being overridden in another file (./model/eval.py). The results are better, especially with the updates in 1.0.6.

Just to confirm: instead of pooling only over the task (index 2) nodes, we now pool over 5 different node types? Is there a reason to select these 5 node types rather than all 8, or some other combination?
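On the seed being set in ./eval.py but overridden in ./model/eval.py: one way to avoid that class of bug is to route all seeding through a single helper that both scripts import. A minimal sketch, with names that are illustrative rather than the repository's actual layout:

```python
# seeding.py -- hypothetical shared module, imported by both ./eval.py and ./model/eval.py
import random

import numpy as np
import torch

def set_global_seed(seed: int) -> None:
    """Set every relevant RNG in one place so neither script silently overrides the other."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
```

Both entry points would then call set_global_seed(args.seed) once instead of each hard-coding its own seed.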

@Jimmyokok (Collaborator)

> Is there a reason to select these 5 node types rather than all 8, or some other combination?

The reason is that when I tested the full combination yesterday, it yielded NaN values in the prediction results. I have just fixed the related bug, and now the result is AUC ~0.9750.
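On the NaN issue when pooling over all node types: one common cause of NaNs in this kind of per-type mean pooling is a node type with zero nodes in a given graph (a mean over an empty set). This is only an assumption about what the fixed bug was; a hedged, plain-PyTorch sketch of per-type mean pooling that guards against empty types:

```python
import torch

def per_type_mean_pool(node_emb, node_types, selected_types):
    """Mean-pool node embeddings per node type and concatenate the per-type means.

    node_emb:       (N, d) float tensor of node embeddings
    node_types:     (N,) long tensor of node-type indices
    selected_types: iterable of type indices to pool over
    """
    pooled = []
    for t in selected_types:
        mask = node_types == t
        if mask.any():
            pooled.append(node_emb[mask].mean(dim=0))
        else:
            # A type with zero nodes would give 0/0 = NaN; substitute zeros so the
            # output dimensionality stays fixed across graphs.
            pooled.append(torch.zeros(node_emb.size(1),
                                      dtype=node_emb.dtype, device=node_emb.device))
    return torch.cat(pooled)
```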
