What's the difference between the data in HDFS.npz and the data transformed in load_HDFS from the full HDFS.log_structed.csv #76

Shine21497 · 2020-08-08T08:15:04Z

Hi, I tried to reproduce the benchmark results using the full HDFS log data. I used logparser/Drain to generate the full HDFS.log_structed.csv, which has the same structure as HDFS_100k.log_structed.csv. I then loaded the full HDFS.log_structed.csv and the label file in HDFS_benchmark.py, just as you did in the demo, but the results for PCA and IM are very different from those shown in the README. (The LR, SVM, and DT results are similar.)
It seems that the data in HDFS.npz differ from the data generated from the full HDFS.log_structed.csv by the load_HDFS function. Even with HDFS.npz in hand, it is hard to use without knowing what this difference is.
Many thanks
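
One way to narrow down the difference described above is to compare the stored arrays directly. The following is only a minimal sketch: the helper names `summarize_npz` and `compare_summaries` are hypothetical and not part of loglizer, and it makes no assumption about which array names HDFS.npz contains. It lists whatever arrays an .npz archive holds, with their shapes and dtypes, so that two archives can be compared side by side.

```python
import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype)} for every array in an .npz archive.

    Hypothetical helper for illustration; not part of loglizer.
    """
    summary = {}
    # allow_pickle=True is required when the archive stores object arrays
    # (for example variable-length event sequences). Only use it on files
    # you trust, because unpickling can execute arbitrary code.
    with np.load(path, allow_pickle=True) as archive:
        for name in archive.files:
            arr = archive[name]
            summary[name] = (arr.shape, str(arr.dtype))
    return summary


def compare_summaries(a, b):
    """Return the sorted array names whose (shape, dtype) differ between a and b."""
    names = sorted(set(a) | set(b))
    return [name for name in names if a.get(name) != b.get(name)]
```

For instance, the output of load_HDFS on the full CSV could be written out with `np.savez` and summarized next to HDFS.npz. Arrays that differ in shape or dtype would point to a different windowing or feature-extraction step, although this sketch does not compare the values themselves.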
