-
Ensemble framework of some log based anomaly detection work.
-
It is the basic thought with feature engineering to analyse raw logs and finally report the potential malicious logs based on a series of processings.
- dvc experiments
- dvc dags
- convert the logs to structured pandas framework
- extract the log keys from raw logs
- analyse the log key exeuction path
- analyse the paramaters in log key
- analyse the time series data generated from window size and time interval by PCA.
- online learning for feedbacks
For the dataset, I have given some examples and you can put your own data into that folder.
# in order to match the libraries versions, please run and build the project in virtual environment
virtualenv env
pip3 install -r requirement.txt
When the data format is in csv, we need translate them into txt files and split them into batches.
python3 csv_txt_trans.py
You will get notice on inputing the source location and output location.
we use the logparser tool to transform the source txt log files into structured csv files under a folder, the folder is named by the start and end time. (Find the Lenma_demo under the logparser/logparser/demo)
(use Lenma_demo.py with python2) ---> The python3 version is not provided here. You need to set the locations first:
input_dir = '../../Dataset/Linux/Clear/' # set the location to yours
output_dir = '../../Dataset/Linux/Clear_Separate_Structured_Logs/' # set the location to yours
Then you can execute the demo file with python 2.x:
python Lenma_demo.py
In the stage, we calculate the EventTemplate for every log.
The log_value_vector.py will be used to generate the csv file, which will be used to implement the anomaly detection later.
(and has been integrated into models already in demo)
Basiclly, we have two modules for DeepLog
-
Whereas, before implementing the modules, we will first see whether there is obvious malicious logs, we will report them first.
-
After that, we will first implement execution path anomaly detection with Execution_Path_Anomaly.py
-
Finally, we will implement parameter values anomaly detection with Parameter_value_performance_anomaly.py
-
As a plus, there is the ML model using PCA in loglizer.
# go to the folder of model
python3 Execution_Path_Anomaly.py
# go to the folder of model
python3 Parameter_Value_Vector.py
- The model is based on off-line work, the online real-time detection is not available.
- The loglizer and logparser are open source tools, author's rights are reserved.
- I enriched the two tools in the project, notice the differences from the original version.
1.Execution Anomaly Detection in Distributed Systems through Unstructured Log Analysis
2.DeepLog: Anomaly Detection and Diagnosis from System Logs
3.Incremental Construction of LSTM Recurrent Neural Network