diff --git a/content/all_datasets.md b/content/all_datasets.md
index 37ad122..8e91263 100644
--- a/content/all_datasets.md
+++ b/content/all_datasets.md
@@ -9,8 +9,8 @@ before-content: gh_buttons.html
 | Name | Network/Host Data | TL;DR | Year | Setting | OS Type | Labeled?¹ | Data Type/Source | Packed Size | Unpacked Size |
 |----------------------------------------------------------------------------------------------------|:-----------------:|-------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------:|---------------|-----------------------|:---------:|--------------------------------------------------------------------------------------|------------:|--------------:|
 | [AIT Alert Dataset](../datasets/ait_alert_dataset) | Both | Alerts generated from the AIT log dataset, including labels. Only caveat is the lack of Windows machines | 2023 | Enterprise IT | Linux | 🟩 | Wazuh, Suricata and AMiner alerts | 96 MB | 2,9 GB |
-| [AIT Log Dataset](../datasets/ait_log_dataset) | Both | Huge variety of labeled logs collected from multiple simulation runs of an enterprise network under attack. With user emulation. but only Linux machines | 2023 | Enterprise IT | Linux | 🟩 | pcaps, Suricata alerts, misc. logs (Apache, auth, dns, vpn, audit, suricata, syslog) | 130 GB | 206 GB |
 | [OTFR Security Datasets - LSASS Campaign](../datasets/otfr_lsass_campaign) | Both | Very small simulation focusing on exploiting Windows' LSASS.exe. Lacking documentation, no labels and no user behavior | 2023 | Single OS | Windows | 🟥 | pcaps, Windows events, Zeek logs | 423 MB | 1 GB |
+| [AIT Log Dataset](../datasets/ait_log_dataset) | Both | Huge variety of labeled logs collected from multiple simulation runs of an enterprise network under attack. With user emulation. but only Linux machines | 2022 | Enterprise IT | Linux | 🟩 | pcaps, Suricata alerts, misc. logs (Apache, auth, dns, vpn, audit, suricata, syslog) | 130 GB | 206 GB |
 | [CLUE-LDS](../datasets/clue_lds) | Host | Database of real user behavior without known attacks, for evaluation of methods detecting shifts in user behavior | 2022 | Subsystem | Undisclosed | 🟥 | Custom event logs | 640 MB | 14,9 GB |
 | [EVTX to MITRE ATT&CK](../datasets/evtx_to_mitre_attck) | Host | Small dataset providing various events corresponding to certain MITRE tactics/techniques | 2022 | Single OS | Windows | 🟩 | Windows events | <1 GB | <1 GB |
 | [OTFR Security Datasets - Atomic](../datasets/otfr_atomic) | Both | Various small datasets, each corresponding to a specific MITRE tactic/technique. Lacks user simulation / underlying scenario and does not provide explicit labels | 2019-2022 | Single OS | Windows, Linux, Cloud | 🟨 | pcaps, Windows events, auditd logs, AWS CloudTrail logs | 125 MB | - |
@@ -23,7 +23,7 @@ before-content: gh_buttons.html
 | [DAPT 2020](../datasets/dapt2020) | Both | Focuses on attacks mimicking those of an APT group, executed in a rather small environment | 2020 | Enterprise IT | Undisclosed | 🟩 | NetFlows, misc. logs (DNS, syslog, auditd, apache, auth, various services) | 460 MB | - |
 | [OpTC](../datasets/optc) | Both | Huge amount of data and interesting attacks, but possibly hard to use due to uncommon event format and requiring semi-manual labeling | 2020 | Enterprise IT | Windows | 🟨 | Custom event logs, Zeek events | - | 1 TB |
 | [OTFR Security Datasets - APT 29](../datasets/otfr_apt_29) | Both | Replication of APT29 evaluation developed by MITRE. Well made and documented, but without labels or user behavior | 2020 | Enterprise IT | Windows, Linux | 🟥 | pcaps, Windows events, Zeek events | 126 MB | 2 GB |
-| [CIC-DDoS2019](../datasets/cic_ddos) | Network | Dataset focusing on various DDoS attacks, covering a broad range of categories. Includes benign behavior, but only for Pcaps, not NetFlows | 2019 | Enterprise IT | Windows, Linux | 🟩 | Pcaps, NetFlows, Windows events, Ubuntu events | 24,4 GB | - |
+| [CICDDoS2019](../datasets/cic_ddos) | Network | Dataset focusing on various DDoS attacks, covering a broad range of categories. Includes benign behavior, but only for Pcaps, not NetFlows | 2019 | Enterprise IT | Windows, Linux | 🟩 | Pcaps, NetFlows, Windows events, Ubuntu events | 24,4 GB | - |
 | [DARPA TC5](../datasets/darpa_tc5) | Host | Custom event logs from network under attack from APT groups, designed to facilitate provenance tracking | 2019 | Undisclosed | Undisclosed | 🟨 | Custom event logs | - | - |
 | [LID-DS 2019](../datasets/lids_ds_2019) | Host | Contains system calls + associated data/metadata for a variety of Linux exploits, includes normal behavior | 2019 | Single OS | Linux | 🟨 | Sequences of syscalls with extended information | 13 GB | - |
 | [OTFR Security Datasets - APT 3](../datasets/otfr_apt_3) | Host | Replication of APT3 evaluation developed by MITRE. Lacking documentation, no labels and no user behavior | 2019 | Enterprise IT | Windows, Linux | 🟥 | Windows events | 30 MB | 855 MB |
diff --git a/content/datasets/cic_ddos.md b/content/datasets/cic_ddos.md
index 548b1c8..3b22015 100644
--- a/content/datasets/cic_ddos.md
+++ b/content/datasets/cic_ddos.md
@@ -1,5 +1,5 @@
 ---
-title: CIC-DDos2019
+title: CICDDos2019
 ---

 - [Overview](#overview)
@@ -32,7 +32,7 @@ title: CIC-DDos2019
 ***

 ### Overview
-The CIC-DDos2019 dataset, developed by the Canadian Institute for Cybersecurity (CIC), was created to enable evaluation of new DDoS detection methods, which, according to the authors, was not possible with previously existing datasets containing DDoS attacks.
+The CICDDos2019 dataset, developed by the Canadian Institute for Cybersecurity (CIC), was created to enable evaluation of new DDoS detection methods, which, according to the authors, was not possible with previously existing datasets containing DDoS attacks.
 The dataset is accompanied by a newly proposed taxonomy for DDoS attacks, dividing them into several subclasses.
 These attacks are then executed within a small testbed, consisting of a victim network performing benign behavior and a separate attacker network.
 This simulation was run on two separate days, namely training and testing day;
diff --git a/content/datasets/otfr_apt_29.md b/content/datasets/otfr_apt_29.md
index 9877812..c3aa9c7 100644
--- a/content/datasets/otfr_apt_29.md
+++ b/content/datasets/otfr_apt_29.md
@@ -7,7 +7,6 @@ title: OTFR APT 29
 - [Activity](#activity)
 - [Contained Data](#contained-data)
 - [Links](#links)
-- [Related Entries](#related-entries)
 - [Data Examples](#data-examples)

 | | |
@@ -66,10 +65,6 @@ However, they are not labeled.
 - [MITRE Emulation Plan Day 1](https://github.com/mitre-attack/attack-arsenal/tree/master/adversary_emulation/APT29/Emulation_Plan/Day%201)
 - [MITRE Emulation Plan Day 2](https://github.com/mitre-attack/attack-arsenal/tree/master/adversary_emulation/APT29/Emulation_Plan/Day%202)

-### Related Entries
-
-- [OTFR Security Datasets](../collections/security_datasets.md)
-
 ### Data Examples

 Snippet of day 1 Windows event logs taken from `apt29/day1/apt29_evals_day1_manual_2020-05-01225525.json`
diff --git a/content/datasets/otfr_apt_3.md b/content/datasets/otfr_apt_3.md
index 812860b..95be9ef 100644
--- a/content/datasets/otfr_apt_3.md
+++ b/content/datasets/otfr_apt_3.md
@@ -7,7 +7,6 @@ title: OTFR APT 3
 - [Activity](#activity)
 - [Contained Data](#contained-data)
 - [Links](#links)
-- [Related Entries](#related-entries)
 - [Data Examples](#data-examples)

 | | |
@@ -67,10 +66,6 @@ Labels are not provided.
 - [APT3 Info Page](https://attackevals.mitre-engenuity.org/enterprise/apt3/)
 - [APT3 Environment](https://attackevals.mitre-engenuity.org/enterprise/apt3/environment)

-### Related Entries
-
-- [OTFR Security Datasets](../collections/security_datasets.md)
-
 ### Data Examples

 Snippet of winlogbeat events from CALDERA-based dataset, taken
diff --git a/content/datasets/otfr_atomic.md b/content/datasets/otfr_atomic.md
index faed106..4a6ed32 100644
--- a/content/datasets/otfr_atomic.md
+++ b/content/datasets/otfr_atomic.md
@@ -7,7 +7,6 @@ title: OTFR Atomic Security Datasets
 - [Activity](#activity)
 - [Contained Data](#contained-data)
 - [Links](#links)
-- [Related Entries](#related-entries)
 - [Data Examples](#data-examples)

 | | |
@@ -72,10 +71,6 @@ None go into detail regarding any labeling, presumably expecting all logs to be
 - [Linux Datasets](https://securitydatasets.com/notebooks/atomic/linux/intro.html)
 - [Windows Datasets](https://securitydatasets.com/notebooks/atomic/windows/intro.html)

-### Related Entries
-
-- [OTFR Security Datasets](../collections/security_datasets.md)
-
 ### Data Examples

 Metadata for the scenario "Stopping Event Log Service via Modification of Start Up Type" (
diff --git a/content/datasets/otfr_golden_saml.md b/content/datasets/otfr_golden_saml.md
index bddc16d..7d88b20 100644
--- a/content/datasets/otfr_golden_saml.md
+++ b/content/datasets/otfr_golden_saml.md
@@ -7,7 +7,6 @@ title: SimuLand Golden SAML Dataset
 - [Activity](#activity)
 - [Contained Data](#contained-data)
 - [Links](#links)
-- [Related Entries](#related-entries)
 - [Data](#data)

 | | |
@@ -68,10 +67,6 @@ Labels are not provided, presumably expecting all logs to be considered maliciou
 - [AAD Hybrid Identity: AD FS Environment](https://github.com/Azure/SimuLand/blob/main/docs/environments/aadHybridIdentityADFS/README.md)
 - [(Blog Post) Sharing the first SimuLand dataset to expedite research and learn about adversary trade-craft](https://www.microsoft.com/en-us/security/blog/2021/08/05/sharing-the-first-simuland-dataset-to-expedite-research-and-learn-about-adversary-tradecraft/)

-### Related Entries
-
-- [SimuLand](../frameworks/simuland.md)
-
 ### Data

 AAD audit events
diff --git a/content/datasets/otfr_log4shell.md b/content/datasets/otfr_log4shell.md
index f355464..184ec1a 100644
--- a/content/datasets/otfr_log4shell.md
+++ b/content/datasets/otfr_log4shell.md
@@ -7,7 +7,6 @@ title: OTFR Log4Shell
 - [Activity](#activity)
 - [Contained Data](#contained-data)
 - [Links](#links)
-- [Related Entries](#related-entries)
 - [Data Examples](#data-examples)

 | | |
@@ -59,10 +58,6 @@ Their origin is not documented, labeling information in general is also not prov
 - [GitHub](https://github.com/OTRF/Security-Datasets/tree/master/datasets/compound/Log4Shell)
 - [Setup Docs](https://github.com/Cyb3rWard0g/log4jshell-lab/blob/main/research-notes/2021-12-11_01-CVE-2021-44228-simulation.md)

-### Related Entries
-
-- [OTFR Security Datasets](../collections/security_datasets.md)
-
 ### Data Examples

 Snippet of Windows Sysmon logs taken
diff --git a/content/datasets/otfr_lsass_campaign.md b/content/datasets/otfr_lsass_campaign.md
index d6c4835..0b1fcd9 100644
--- a/content/datasets/otfr_lsass_campaign.md
+++ b/content/datasets/otfr_lsass_campaign.md
@@ -7,7 +7,6 @@ title: OTFR LSASS Campaign
 - [Activity](#activity)
 - [Contained Data](#contained-data)
 - [Links](#links)
-- [Related Entries](#related-entries)
 - [Data Examples](#data-examples)

 | | |
@@ -61,10 +60,6 @@ Labels are not provided.
 - [Metadata](https://github.com/OTRF/Security-Datasets/tree/master/datasets/compound/_metadata)
 - [Datasets](https://github.com/OTRF/Security-Datasets/tree/master/datasets/compound)

-### Related Entries
-
-- [OTFR Security Datasets](../collections/security_datasets.md)
-
 ### Data Examples

 Snippet of Windows events taken from `datasets/compound/lsass_campaign_01/metasploit_procdump_lsass_memory_dump.json`
diff --git a/content/datasets/unraveled.md b/content/datasets/unraveled.md
index 1647234..d7220ac 100644
--- a/content/datasets/unraveled.md
+++ b/content/datasets/unraveled.md
@@ -102,6 +102,7 @@ Notably, only very few of the provided processed host log files seem to contain
 - [Homepage on Gitlab](https://gitlab.com/asu22/unraveled)
 - [Raw Data Download Guide](https://dapt2021.s3.amazonaws.com/README.txt)
+ - While not included in this guide, consider using the `--no-sign-request` option to avoid having to provide AWS credentials

 ### Related Entries