bitterzzZZ/Bert-malware-classification

Bert-malware-classification

Multi-class malware classification using BERT for embeddings combined with a BiLSTM.

Steps

1. Create the save_tensor, save_model, and bert directories in the current directory. Download a BERT model and place it under ./bert/.

# Vocab file downloads
'bert-base-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt",
'bert-large-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt",
'bert-base-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-vocab.txt",
'bert-large-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased-vocab.txt",
'bert-base-multilingual-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-uncased-vocab.txt",
'bert-base-multilingual-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-vocab.txt",
'bert-base-chinese': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese-vocab.txt",

# Pretrained model weight downloads
'bert-base-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz",
'bert-large-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased.tar.gz",
'bert-base-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased.tar.gz",
'bert-large-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased.tar.gz",
'bert-base-multilingual-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-uncased.tar.gz",
'bert-base-multilingual-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased.tar.gz",
'bert-base-chinese': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese.tar.gz",
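
Step 1 can be scripted. The following is a minimal sketch, not part of the repository: it creates the three directories and fetches the bert-base-uncased vocab file and weights archive from the URLs listed above. The function names `make_dirs` and `fetch_bert`, the local file name `vocab.txt`, and unpacking the archive directly into ./bert/ are all assumptions; check which paths class.py actually reads, and note that these older S3 URLs may no longer resolve.

```python
import tarfile
import urllib.request
from pathlib import Path

# URLs copied from the lists above; only bert-base-uncased is shown.
VOCAB_URL = "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt"
WEIGHTS_URL = "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz"

# Directory names taken from step 1.
REQUIRED_DIRS = ("save_tensor", "save_model", "bert")


def make_dirs(root="."):
    """Create the three directories from step 1 and return their paths."""
    paths = [Path(root) / name for name in REQUIRED_DIRS]
    for path in paths:
        path.mkdir(parents=True, exist_ok=True)
    return paths


def fetch_bert(root="."):
    """Download the vocab file and weights archive into <root>/bert/ and unpack the archive."""
    bert_dir = Path(root) / "bert"
    bert_dir.mkdir(parents=True, exist_ok=True)
    vocab_path = bert_dir / "vocab.txt"  # assumed local file name
    archive_path = bert_dir / "bert-base-uncased.tar.gz"
    urllib.request.urlretrieve(VOCAB_URL, str(vocab_path))
    urllib.request.urlretrieve(WEIGHTS_URL, str(archive_path))
    # Unpack the weights next to the vocab file (assumed layout).
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(bert_dir)
    return bert_dir
```

Calling `make_dirs()` followed by `fetch_bert()` from the repository root would cover step 1 under these assumptions. To use a different model, swap in the matching pair of URLs from the lists above.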

2. Extract the dataset. Unzip mal-api-2019.zip into the current directory. The dataset comes from https://github.com/ocatak/malware_api_class
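
If you would rather not use a command-line unzip tool, step 2 can be done with the Python standard library. This is a small sketch: the function name `extract_dataset` is hypothetical, and it assumes mal-api-2019.zip has already been downloaded into the current directory.

```python
import zipfile
from pathlib import Path


def extract_dataset(archive="mal-api-2019.zip", dest="."):
    """Unpack the dataset archive into dest and return the names of its members."""
    archive_path = Path(archive)
    if not archive_path.is_file():
        raise FileNotFoundError(
            f"{archive_path} not found; download it from the dataset repository first"
        )
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest)
        return zf.namelist()
```

The returned list of member names is a quick way to see which files the archive produced before running the classifier.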

3. Install dependencies

torch
pytorch_pretrained_bert
scikit-learn
numpy
tqdm
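
Before running the program, it can help to confirm that the packages above are importable. The sketch below is an optional check, not something the repository ships; the helper name `missing_packages` is hypothetical. The only non-obvious mapping is that scikit-learn is imported as `sklearn`.

```python
from importlib.util import find_spec

# Package name -> top-level module name used at import time.
REQUIRED_MODULES = {
    "torch": "torch",
    "pytorch_pretrained_bert": "pytorch_pretrained_bert",
    "scikit-learn": "sklearn",
    "numpy": "numpy",
    "tqdm": "tqdm",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the package names whose import module cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]
```

An empty list from `missing_packages()` means every dependency was found; any names it returns still need to be installed.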

4. Run the program: python3 class.py

Notes

This is only a demo of applying BERT to text classification. If you find any mistakes, corrections are welcome.
