Question about wget dataset results #13

Open
jiangdie666 opened this issue May 23, 2024 · 32 comments

@jiangdie666

I evaluated the wget dataset in two ways: with a pkl I trained myself using your project's code, and with the pre-trained pkl that comes with your project. Neither result is satisfactory. Could it be that I didn't set some other parameters?
Evaluation results with the pkl from my own training:

python3 eval.py --dataset wget --device 0
Loading processed wget dataset...
[n_graph, n_node_feat, n_edge_feat]: [150, 8, 4]
Loading processed wget dataset...
[n_graph, n_node_feat, n_edge_feat]: [150, 8, 4]
AUC: 0.41680000000000006
F1: 0.6666666662222221
PRECISION: 0.5
RECALL: 1.0
TN: 0
FN: 0
TP: 25
FP: 25
#Test_AUC: 0.4168±0.0000

These are the results with the pre-trained pkl that comes with your project:

python3 eval.py --dataset wget --device 0
Loading processed wget dataset...
[n_graph, n_node_feat, n_edge_feat]: [150, 8, 4]
Loading processed wget dataset...
[n_graph, n_node_feat, n_edge_feat]: [150, 8, 4]
AUC: 0.47440000000000004
F1: 0.6666666662222221
PRECISION: 0.5
RECALL: 1.0
TN: 0
FN: 0
TP: 25
FP: 25
#Test_AUC: 0.4744±0.0000
@jiangdie666 (Author)

Sorry to also report a zero-dimensional tensor error while training on the DARPA dataset. Have you run into this problem during training?

[screenshot: error traceback]

@Jimmyokok (Collaborator)

  • Your problem on the wget dataset is unclear; more information may be needed.
  • The "iteration over a 0-d tensor" issue is new to me, but someone else has also encountered it.
    I'm going to work on these two issues tomorrow.

@jiangdie666 (Author)

I wonder whether my environment differs from yours. I ran both evaluations in the following environment:
Python 3.10.13
pytorch==2.1.0
torchvision==0.16.0
torchaudio==2.1.0
pytorch-cuda=12.1
conda install -c dglteam/label/th21_cu121 dgl
I installed these because I couldn't find a matching DGL build when trying to satisfy your requirement of dgl==1.0.0. Could you share how you installed DGL 1.0.0?

@Jimmyokok (Collaborator)

I have tried to evaluate wget under your environment setting (pytorch==2.1.0 and dgl==2.0.0). I'm getting the same results as using dgl==1.0.0 both with and without the pre-trained pkls.

@Jimmyokok (Collaborator)

Did you obtain the graphs.pkl from parsing the raw logs or from the pkl provided by MAGIC?

@jiangdie666 (Author)

This time I evaluated directly with your pre-trained pkl and still had problems with the results:

[screenshot: evaluation results]

@Jimmyokok (Collaborator)

What is your k (i.e. num_neighbors)? Using k == 1 on the wget dataset could be the cause.

@Jimmyokok (Collaborator)

The "zero-dimension" error is simply a bug. Modifying loss, _ = model(g) to loss = model(g) fixes the bug.

@jiangdie666 (Author)

Sorry for asking about such a simple code issue; I can't believe I missed it. Thankfully, training on the DARPA data is now running successfully!

[screenshot: training running]

I didn't touch k; as I read the code, the default value is 2 if you don't change it.

[screenshot: default argument definition]
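For reference, this is roughly how such a default looks in an argparse-based entry point. The flag name --num_neighbors is taken from the discussion above, but the parser code below is only an illustrative sketch, not the repository's actual script:

```python
import argparse

# Illustrative only: not the repository's actual argument parser.
parser = argparse.ArgumentParser(description="MAGIC wget evaluation (sketch)")
# k, i.e. the number of neighbors used during detection; defaults to 2.
parser.add_argument("--num_neighbors", type=int, default=2)
args = parser.parse_args()
print(args.num_neighbors)  # prints 2 unless overridden, e.g. --num_neighbors 1
```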

@Jimmyokok (Collaborator)

Yes. I'm getting normal evaluation results when k == 2 but results like yours when k == 1.

@jiangdie666 (Author)

I am very sorry, I tried changing the value of k, but the result is still strange; the scores look odd.

[screenshot: evaluation results]

@Jimmyokok (Collaborator)

If your graphs.pkl is not the provided one, make sure that the node type at index 2 is 'task'.

@jiangdie666 (Author)

I found the problem. Earlier I said I used your original data, but I had only used your checkpoint.pkl; I forgot that graphs.pkl is also provided in the graphs.zip archive. The following results come from the project's own checkpoint.pkl together with a graphs.pkl that I produced from scratch, step by step, following the project's pipeline:

[screenshot: results with my own graphs.pkl]

Then I unzipped graphs.zip and used the project's own graphs.pkl with its own checkpoint, which gives the expected results. So the problem is probably either in my initial data processing with the wget_parser.py script, or in the call to the load_rawdata function that generates graphs.pkl.

[screenshot: results with the provided graphs.pkl]

I'll look into it again myself. Thanks for the reply.

@Jimmyokok (Collaborator)

Does your version of graphs.pkl match the size of the provided one? If not, what is your data source?
Most importantly, make sure that the node type at index 2 is 'task'; this matters a lot for detection performance. If it is not, find the index for 'task' and modify line 28 of ./model/eval.py to out = pooler(g, out, [index_for_task]).cpu().numpy()
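A hedged sketch of the adjustment described above: look up the index of the 'task' node type and pass it to the pooler call on line 28 of ./model/eval.py. The NODE_TYPE_DICT mapping and the wrapper function below are hypothetical; adapt them to however your graphs.pkl actually encodes node types:

```python
# Hypothetical mapping from node-type name to integer index; adapt this to however
# your wget_parser.py output actually encodes node types.
NODE_TYPE_DICT = {"file": 0, "socket": 1, "task": 2}

def pool_graph_embedding(pooler, g, out, type_name="task"):
    """Pool node embeddings over a single node type, mirroring line 28 of ./model/eval.py."""
    index_for_task = NODE_TYPE_DICT[type_name]
    # i.e. out = pooler(g, out, [index_for_task]).cpu().numpy()
    return pooler(g, out, [index_for_task]).cpu().numpy()
```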

@jiangdie666 (Author)

I re-processed and re-trained the dataset, added some code to wget_parser.py to print the node-type categories, and found that 'task' is indeed at index 3. So I changed the index in the eval code as you said, but the result is still incorrect, and my graphs.pkl is exactly the same size as the graphs.pkl in your zip. Very strange.

[screenshots: node-type indices and evaluation results]

@Jimmyokok (Collaborator)

Is it possible that the order of the raw logs is different, which results in incorrect labeling during data loading and, as a byproduct, shifts the node type indices?

@jiangdie666 (Author)

I just tried indexes 0-7, and none of them worked well. I'll start over by re-downloading the data this afternoon and try building it again. It's a really strange problem.

@Jimmyokok (Collaborator)

I forget whether the attack logs should be the first 25 or the last 25 logs to be parsed, but this absolutely matters.

@jiangdie666 (Author)

Your comment woke me up. I had been so focused on ruling out my environment and code changes that I forgot there might be a problem in how the dataset was processed in the first place. The original code lists the 150 graph logs with ls directly, so the first 25 logs processed did not correspond to the ATTACK data. I modified the code to confirm that this was the problem.

[screenshot: modified parsing code]
Run before the modification:
[screenshot: results before the fix]
Run after the modification:
[screenshot: results after the fix]

So the task index shifted because the data wasn't processed in the right order; in the end the eval code doesn't need to change, and the task index stays at 2.
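A hedged sketch of the kind of ordering fix discussed here: enumerate the raw log files explicitly so the 25 attack logs come first and the 125 benign logs follow, instead of relying on an unordered directory listing. The is_attack_log predicate and directory argument are hypothetical; adapt them to the actual wget log filenames:

```python
import os

def ordered_log_files(raw_dir, is_attack_log):
    """Return raw wget log filenames with the 25 attack logs first, then the 125 benign logs.

    `is_attack_log` is a hypothetical predicate on the filename, since the exact
    naming convention of the raw logs is dataset-specific.
    """
    names = sorted(os.listdir(raw_dir))            # deterministic base order, unlike a bare ls
    attack = [n for n in names if is_attack_log(n)]
    benign = [n for n in names if not is_attack_log(n)]
    assert len(attack) == 25 and len(benign) == 125, (len(attack), len(benign))
    return attack + benign
```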
Thank you so much for answering my questions over and over again!

@SaraDadjouy

@jiangdie666 @Jimmyokok
Hello. Thank you for sharing. I had the same problem.

I have another question. If I'm not wrong, in the original paper the results for wget were reported as follows:
[screenshot: wget results reported in the paper]

I have done the Quick Evaluation and got the following results:

[n_graph, n_node_feat, n_edge_feat]: [150, 8, 4]
Loading processed wget dataset...
[n_graph, n_node_feat, n_edge_feat]: [150, 8, 4]
AUC: 0.9359999999999999
F1: 0.9056603768600924
PRECISION: 0.8571428571428571
RECALL: 0.96
TN: 21
FN: 1
TP: 24
FP: 4
#Test_AUC: 0.9360±0.0000

I also saw that the last results @jiangdie666 shared were close to mine. What might be the reason for the different results for Precision, F1, and AUC?

@Jimmyokok (Collaborator)

> [...] What might be the reason for the different results for Precision, F1, and AUC?

I have rerun the Quick Evaluation with exactly the same data, checkpoints and code as in this repository, which gives me this:
AUC: 0.96
F1: 0.9599999994999999
PRECISION: 0.96
RECALL: 0.96
TN: 24
FN: 1
TP: 24
FP: 1
#Test_AUC: 0.9600±0.0000
Then, I modified the code to repeat the evaluation with random seeds 0 to 49 and report the average, which gives me this:
AUC: 0.952864±0.013846093456278552
F1: 0.9595209114984354±0.016390904351784117
PRECISION: 0.9663880341880342±0.031628609309857315
RECALL: 0.9536±0.018521339044464343
TN: 24.14±0.8248636250920511
FN: 1.16±0.463033476111609
TP: 23.84±0.463033476111609
FP: 0.86±0.8248636250920512
#Test_AUC: 0.9529±0.0138
This is extremely strange, since I have never seen as many as 4 FPs. Meanwhile, I'm sure I have n_neighbor == 2, which is the standard setting, and I have tried these evaluations with both PyTorch 1.x and 2.x, which yield the same result.
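For anyone who wants to reproduce this kind of multi-seed average, a minimal sketch of the procedure described above (seeds 0 to 49, report mean and standard deviation). evaluate_once(seed) is a hypothetical wrapper around the repository's evaluation routine and is assumed to handle its own seeding:

```python
import numpy as np

def average_over_seeds(evaluate_once, n_seeds=50):
    """Call `evaluate_once(seed)` (hypothetical: returns an AUC float) for seeds 0..n_seeds-1
    and report mean ± standard deviation, matching the #Test_AUC output format above."""
    aucs = np.array([evaluate_once(seed) for seed in range(n_seeds)])
    print(f"#Test_AUC: {aucs.mean():.4f}±{aucs.std():.4f}")
    return aucs.mean(), aucs.std()
```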

Jimmyokok reopened this on Jun 10, 2024
@Jimmyokok (Collaborator)

With seed 2022, which aligns with the repository code, I'm getting this:
AUC: 0.9616
F1: 0.9795918362349021
PRECISION: 1.0
RECALL: 0.96
TN: 25
FN: 1
TP: 24
FP: 0
#Test_AUC: 0.9616±0.0000

@jiangdie666 (Author)

Actually, using the original checkpoint-wget.pt that comes with your project, my FP count is still 4, far worse than your results above.

[screenshot: evaluation results]

@Jimmyokok (Collaborator)

Could you try averaging over multiple seeds? Maybe seed 2022 just happens to perform very badly on other devices?

@jiangdie666 (Author)

Using the graphs.pkl extracted from the project's own graphs.zip, the results are satisfactory, which means there is still a small bug in the wget preprocessing code.

@m-shayan73

Hello, thank you for pointing out the issues in this thread. I have corrected the file-name issue, so the first 25 files processed are attack logs and the next 125 are normal logs. I am using the dependencies mentioned in the repo and k=2.

For the wget dataset, the graphs.pkl and checkpoint provided in the repo give me the following results:

[n_graph, n_node_feat, n_edge_feat]: [150, 8, 4]
AUC: 0.9616
F1: 0.9795918362349021
PRECISION: 1.0
RECALL: 0.96
TN: 25
FN: 1
TP: 24
FP: 0
#Test AUC: 0.9616±0.0000

However, when I parse the raw logs myself, then train and evaluate, I get the following results (closer to @SaraDadjouy's):

[n_graph, n_node_feat, n_edge_feat]: [150, 8, 4]
AUC: 0.9184
F1: 0.8799999995
PRECISION: 0.88
RECALL: 0.88
TN: 22
FN: 3
TP: 22
FP: 3
#Test_AUC: 0.9184±0.0000

I would appreciate any guidance on what might be causing the difference in metrics. I have also tried different seeds and am not getting better results.

@Jimmyokok (Collaborator)

Jimmyokok commented Oct 11, 2024

@m-shayan73 I repeated my data processing from scratch and noticed that the for loop at line 790 of wget_parser.py does not process files in the desired order (the first 25 being attack logs, the next 125 being normal logs). After sorting the filenames, the resulting new graphs.pkl gives me:
AUC: 0.9616
F1: 0.9795918362349021
PRECISION: 1.0
RECALL: 0.96
TN: 25
FN: 1
TP: 24
FP: 0
#Test_AUC: 0.9616±0.0000

If I re-train the model, the result becomes:
AUC: 0.9567999999999999
F1: 0.9599999994999999
PRECISION: 0.96
RECALL: 0.96
TN: 24
FN: 1
TP: 24
FP: 1
#Test_AUC: 0.9568±0.0000

You mentioned that you 'have corrected the file name issue and now the first 25 files processed are attack logs then next 125 files are normal logs', and still get unsatisfactory results. Does that mean you have already observed and corrected the above issue, and the problem still exists?

@m-shayan73

m-shayan73 commented Oct 11, 2024

Yes, I have added the following code to sort the filenames:

[screenshot: filename-sorting code]

@Jimmyokok (Collaborator)

I will try your code then. My version simply sorts the filenames in string order, which may differ from yours in the indices it assigns to node and edge types.
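To illustrate why plain string sorting can give a different file order than a numeric sort (and hence different node/edge-type indices if types are assigned in encounter order), a small example with made-up filenames:

```python
names = ["log-2", "log-10", "log-1"]

# Plain string order compares character by character, so "log-10" sorts before "log-2".
print(sorted(names))                                            # ['log-1', 'log-10', 'log-2']

# Sorting on the trailing number gives the intuitive numeric order instead.
print(sorted(names, key=lambda n: int(n.rsplit("-", 1)[1])))    # ['log-1', 'log-2', 'log-10']
```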

@Jimmyokok (Collaborator)

@m-shayan73 I'm able to reproduce your result. However, it's a seed issue. Measuring the average AUC over 100 random seeds gives me AUC ~0.9450 under both file orders.

I have also found a trick that improves the wget detection AUC (from ~0.9450 to ~0.9650), which I have just pushed in the '1.0.6' commit.

@m-shayan73

m-shayan73 commented Oct 12, 2024

Thanks a lot for the help. It was indeed a seed issue: I changed the seed in one file (./eval.py), but it was being overridden in another file (./model/eval.py). The results are better, especially with the updates in 1.0.6.

Just to confirm: instead of pooling only over the task (index 2) nodes, we now pool over 5 different node types? Is there a reason to select these 5 node types rather than all 8, or some other combination?
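On the seed being set in ./eval.py but overridden in ./model/eval.py: one way to avoid that class of bug is to route all seeding through a single helper that both scripts import. A minimal sketch, with names that are illustrative rather than the repository's actual layout:

```python
# seeding.py -- hypothetical shared module, imported by both ./eval.py and ./model/eval.py
import random

import numpy as np
import torch

def set_global_seed(seed: int) -> None:
    """Set every relevant RNG in one place so neither script silently overrides the other."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
```

Both entry points would then call set_global_seed(args.seed) once instead of each hard-coding its own seed.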

@Jimmyokok (Collaborator)

> Is there a reason to select these 5 node types rather than all 8, or some other combination?

The reason is that when I tested the full combination yesterday, it yielded NaN values in the prediction results. I have just fixed the related bug, and now the result is AUC ~0.9750.
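On the NaN issue when pooling over all node types: one common cause of NaNs in this kind of per-type mean pooling is a node type with zero nodes in a given graph (a mean over an empty set). This is only an assumption about what the fixed bug was; a hedged, plain-PyTorch sketch of per-type mean pooling that guards against empty types:

```python
import torch

def per_type_mean_pool(node_emb, node_types, selected_types):
    """Mean-pool node embeddings per node type and concatenate the per-type means.

    node_emb:       (N, d) float tensor of node embeddings
    node_types:     (N,) long tensor of node-type indices
    selected_types: iterable of type indices to pool over
    """
    pooled = []
    for t in selected_types:
        mask = node_types == t
        if mask.any():
            pooled.append(node_emb[mask].mean(dim=0))
        else:
            # A type with zero nodes would give 0/0 = NaN; substitute zeros so the
            # output dimensionality stays fixed across graphs.
            pooled.append(torch.zeros(node_emb.size(1),
                                      dtype=node_emb.dtype, device=node_emb.device))
    return torch.cat(pooled)
```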
