AFL synchronization mode #327

GrosQuildu · 2020-02-19T12:45:47Z

Currently, the AFL executor (deepstate-afl) always runs the fuzzer in master mode (-M argument). Code here.

We should detect when to run it in slave mode (-S), thus skipping the deterministic step. Consult the AFL documentation (parallel_fuzzing.txt) for more information.


Travmatth · 2020-03-15T02:19:19Z

So I've looked into this some, with an eye towards proposing a pull request to close the issue. From AFL's parallel_fuzzing.txt:

The difference between the -M and -S modes is that the master instance will
still perform deterministic checks; while the secondary instances will
proceed straight to random tweaks. If you don't want to do deterministic
fuzzing at all, it's OK to run all instances with -S. With very slow or complex
targets, or when running heavily parallelized jobs, this is usually a good plan.

As I understand it, DeepState does not support running individual fuzzers in parallel, so a normal deepstate-afl job should run afl in master mode. The only place I can imagine running afl in slave mode is the ensembler, although I could also imagine letting a user set a deepstate-afl --afl-nondeterministic option to give them the flexibility. If there was some heuristic we could use to determine the slowness or complexity of the target we could configure deepstate-afl to call afl in slave mode automatically (is there such a heuristic that could be used?).

Proposal:
Add a --afl-nondeterministic flag to deepstate-afl and deepstate-ensemble that would allow the user to run afl in slave mode. If there is some heuristic that can be used to determine a target's slowness or complexity, use it to enable this option automatically.
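
A minimal sketch of how such a flag could pick between the two modes. The function name `afl_mode_args` and its `nondeterministic` parameter are hypothetical and not part of DeepState; only the `-M` and `-S` switches come from AFL itself.

```python
from typing import List


def afl_mode_args(fuzzer_name: str, nondeterministic: bool = False) -> List[str]:
    """Return the afl-fuzz synchronization switch for one instance.

    Hypothetical helper: `nondeterministic` stands in for the proposed
    --afl-nondeterministic option.
    """
    # -M: master instance, still performs the deterministic checks.
    # -S: secondary instance, proceeds straight to random tweaks.
    switch = "-S" if nondeterministic else "-M"
    return [switch, fuzzer_name]
```

With the default value the existing behavior (master mode) is unchanged, so the option would be purely opt-in.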

GrosQuildu · 2020-03-15T20:04:11Z

I think we shouldn't add a new --afl-nondeterministic flag; it's an unnecessary complication. We could instead detect whether the user specified -S in fuzzer_args. But idk, maybe the option would be better...

Anyway, implementing the ensembler first is a priority afaik. Then we can auto-detect when to use -S in the ensembler only and leave the default -M for single AFL.
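
A rough sketch of the detection idea, assuming `fuzzer_args` reaches the executor as a flat list of strings (that shape, the helper name `resolve_sync_args`, and the default instance name are assumptions for illustration):

```python
from typing import List, Sequence


def resolve_sync_args(fuzzer_args: Sequence[str],
                      default_name: str = "the_fuzzer") -> List[str]:
    """Return the sync switch to add, or nothing if the user already chose one.

    Hypothetical helper: only exact "-M" / "-S" tokens are recognized, so a
    fused form such as "-Sname" would not be detected by this sketch.
    """
    if any(arg in ("-M", "-S") for arg in fuzzer_args):
        # The user picked a mode explicitly; do not add a second one.
        return []
    # No explicit choice: keep the current default (master mode).
    return ["-M", default_name]
```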

If there was some heuristic we could use to determine the slowness or complexity of the target we could configure deepstate-afl to call afl in slave mode automatically (is there such a heuristic that could be used?)

Don't know any, but would be useful.

jscriven-digimarc · 2020-03-24T20:23:46Z

@GrosQuildu - My team has implemented an ensemble afl class for deepstate that we currently use for fleet fuzzing with AFL. It toggles -M and -S depending on args. It also uses tgz files for syncing, just like the afl parallel fuzzing guide suggests. Are you interested in this as a pull request?

GrosQuildu · 2020-03-24T22:09:12Z

It's rather a question for @agroce. But I think yeah, we may see the PR.

agroce · 2020-03-24T22:39:09Z

Sounds interesting for sure!

jscriven-digimarc · 2020-03-25T23:52:04Z

OK, our team is wrapping up work on our fuzzer system for now. I'll put it in as a pull req for your consideration at the end of the week.

agroce · 2020-04-04T19:49:21Z

Let us know when this is ready!

jscriven-digimarc · 2020-04-06T22:19:04Z

Sorry for the delay. You may want to take a quick look at this commit:
jscriven-digimarc@82b80b1

This contains the afl-ensemble class we use. A few notes/thoughts for review.
This system relies on a network share that contains a tgz file for each fuzzer node. A node is a machine that may run multiple fuzzers. Each tgz must contain the queue and fuzzer stats for each 'fuzzer' on each node. We run 4 nodes with 4 fuzzers (all in -S mode). Each node must have a primary fuzzer (fuzzer id 0), which does the seed synchronization using the tgz files. We rely on AFL's built-in ability to pull interesting seeds from neighboring queues. The system extracts the queues of all other fuzzer nodes alongside the fuzzers that are running. The running fuzzers then steal seeds from their neighbors and add the interesting ones.
We have some scripting around this that drives deepstate. It runs N instances of deepstate on each node on a separate thread. We also have scripting that runs a minimization cycle on all queues on the share. I would be happy to answer any questions for you guys. Let me know if you really would like a pull req for this and I can tidy up the code (add headers, etc).
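
To make the tgz cycle described above concrete, here is a sketch of what the primary fuzzer on a node might do. It is not the code from the linked commit: the archive name `node_<id>.tgz`, the `fuzzer_<node>_<id>` directory pattern, and both function names are assumptions, and the archives on the share are assumed to be trusted.

```python
import tarfile
from pathlib import Path


def pack_node(sync_dir: Path, node_id: int, share_dir: Path) -> Path:
    """Pack queue/ and fuzzer_stats of every fuzzer of this node into one tgz."""
    archive = share_dir / f"node_{node_id}.tgz"
    tmp = archive.with_name(archive.name + ".tmp")
    with tarfile.open(tmp, "w:gz") as tar:
        for fuzzer_dir in sorted(sync_dir.glob(f"fuzzer_{node_id}_*")):
            for item in ("queue", "fuzzer_stats"):
                path = fuzzer_dir / item
                if path.exists():
                    # Keep the fuzzer directory name so queues stay separate.
                    tar.add(path, arcname=f"{fuzzer_dir.name}/{item}")
    # Rename into place so other nodes do not read a half-written archive.
    tmp.replace(archive)
    return archive


def unpack_other_nodes(sync_dir: Path, node_id: int, share_dir: Path) -> None:
    """Extract every other node's archive next to the local fuzzer directories."""
    for archive in sorted(share_dir.glob("node_*.tgz")):
        if archive.name == f"node_{node_id}.tgz":
            continue  # skip our own archive
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(sync_dir)
```

After `unpack_other_nodes` runs, the remote queues sit beside the local ones in the sync directory, which is where AFL looks for neighbors to pull from.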

GrosQuildu · 2020-04-08T12:28:25Z

Added a few comments in your repo.

In general, there is too much code repetition imho. The new class should inherit from AFL, or the AFL class should be extended.

The fuzzer_node_id and fuzzer_id arguments seem to be redundant. The idea was that every fuzzer instance has its own output_dir, so there is no need to maintain ids. And the node (machine) should be handled implicitly; maybe name the stuff that is sent across the network "local/remote" or something like that? So only ensemble_mode seems necessary. And we could add it under the ensemble_group args and name it something like is_master?

I also have some concerns about the directory structure. After a sync cycle, each fuzzer outdir will have its own directory (fuzzername_node_id) and one directory for each other node. The idea was to push all testcases that a fuzzer should sync under SYNC_DIR.

As for tars: the afl docs suggest packing testcases and sending them via ssh or something like that, to avoid using network shares. The tars seem to be an unnecessary complication if sync_dir is a network share.

So in its current state it points in the direction of what the ensembler could be, but for AFL only. To merge it, we should handle all supported fuzzers, just replacing the current ensembler class ;) (plus the comments above ofc).

jscriven-digimarc · 2020-04-08T22:04:02Z

Thanks Pawel for the feedback.

I will explain a few details that may help. Regarding code repetition, I agree. If the deepstate team was strongly considering merging this class, I would replace the existing afl.py with this one and merge some of the existing checks and behaviors into this class. For example, the compile step is removed in the ensemble version, but could easily be put back. In our code base, we simply use this one instead. It can be used to run a single fuzzer or several.

The fuzzer_node_id and fuzzer_id arguments are not really redundant. The problem they solve is having multiple fuzzers on the same 'node', each with a distinct queue. The original afl class sets up a local sync directory (output directory), but hardcodes 'the_fuzzer' as the name of every fuzzer on that system. In order to share queues between running fuzzers, they must share a local sync directory, but each must have a unique name for the subdirectory with its queue and fuzzer stats. The fuzzer_id allows this. The node ID allows queues to be synced across fuzzer nodes. For example, node 0 will pull the queues for nodes 1 and 2 and extract them to a sync dir. The seeds must remain in individual fuzzer queues. So after a sync cycle, the local sync_dir will look like this:
/blah/blah/workspace/myLocal/SyncDir/fuzzer_0_0/
/blah/blah/workspace/myLocal/SyncDir/fuzzer_0_1/
/blah/blah/workspace/myLocal/SyncDir/fuzzer_1_0/
/blah/blah/workspace/myLocal/SyncDir/fuzzer_1_1/
/blah/blah/workspace/myLocal/SyncDir/fuzzer_2_0/
/blah/blah/workspace/myLocal/SyncDir/fuzzer_2_1/

Each fuzzer directory has a queue and fuzzer stats. On this system, there are 2 fuzzers running (node id 0, fuzzer id 0 and fuzzer id 1). The other queues are automatically pulled from by afl. Fuzzers on any node cannot share a queue. Every 'output_dir' is the same on all fuzzers and nodes (/blah/blah/workspace/myLocal/SyncDir).
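
As an illustration of that layout, the sketch below builds the per-fuzzer name and the matching afl-fuzz invocation, and separates local from remote queues. The `fuzzer_<node_id>_<fuzzer_id>` pattern is inferred from the listing above; the function names and the `target_cmd` parameter are hypothetical.

```python
from pathlib import Path
from typing import List, Sequence, Tuple


def fuzzer_dir_name(node_id: int, fuzzer_id: int) -> str:
    """Unique subdirectory name inside the shared sync dir."""
    return f"fuzzer_{node_id}_{fuzzer_id}"


def afl_command(input_dir: str, sync_dir: str, node_id: int, fuzzer_id: int,
                target_cmd: Sequence[str]) -> List[str]:
    """Every instance shares sync_dir (-o) but gets its own -S name."""
    return ["afl-fuzz", "-i", input_dir, "-o", sync_dir,
            "-S", fuzzer_dir_name(node_id, fuzzer_id), "--", *target_cmd]


def split_local_remote(sync_dir: Path, node_id: int) -> Tuple[List[str], List[str]]:
    """Split the fuzzer directories into this node's and other nodes'."""
    local: List[str] = []
    remote: List[str] = []
    for name in sorted(p.name for p in sync_dir.iterdir() if p.is_dir()):
        parts = name.split("_")
        if len(parts) != 3 or parts[0] != "fuzzer":
            continue  # not a fuzzer directory under the assumed naming
        (local if parts[1] == str(node_id) else remote).append(name)
    return local, remote
```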

The original ensembler code was attempting to synchronize seeds, not queues. This is not consistent with how afl generates new test cases from seeds. Once the fuzzer is running, you cannot simply drop seeds into the existing queues of running fuzzers without renaming them to follow the correct scheme (id:00xxxx,...); otherwise you have to restart the fuzzer to allow it to re-order the input seeds. By sharing queues, the fuzzers pull from their neighbors when they find interesting new seeds. From this perspective, this class doesn't really follow the model of the other "ensemble" fuzzers, so yes, you are correct, this strategy is 'AFL' only.
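
To show what the naming scheme implies, here is a small sketch that reads the numeric id out of a queue file name and finds the next free one. It assumes queue entries start with `id:` followed by a zero-padded number, as in the `id:00xxxx,...` pattern mentioned above; the helper names are made up for illustration.

```python
import re
from pathlib import Path
from typing import Optional

# Queue entries are assumed to start with "id:" plus a zero-padded number.
_ID_RE = re.compile(r"^id:(\d{6,})")


def queue_entry_id(name: str) -> Optional[int]:
    """Return the numeric id of a queue file name, or None if it has none."""
    match = _ID_RE.match(name)
    return int(match.group(1)) if match else None


def next_queue_id(queue_dir: Path) -> int:
    """Return one past the highest id found in queue_dir (0 if empty)."""
    ids = [i for i in (queue_entry_id(p.name) for p in queue_dir.iterdir())
           if i is not None]
    return max(ids) + 1 if ids else 0
```

A file dropped in without such a prefix would have no id at all here, which is the practical reason for sharing whole queues instead of copying loose seeds.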

I am happy to help code up any direction the team would like to take, but please note my commit on the branch was meant to help others in a similar situation to my team (run a whole bunch of AFL fuzzers on one target, share seeds, and use deepstate to drive it). As an aside, our system works great! =)

Also regarding tars, it was just a convenient mechanism for quick file transfer. We began with rsync (like deepstate ensemble) but ran into several issues with missing and extraneous seeds. We also have routines (outside of deepstate) that run minimize and other seed analysis, so it worked well within our infrastructure.

So, all that said, is this a good candidate to just replace the afl class? Or is there work to do with the core ensembler to understand this strategy?

GrosQuildu added enhancement good first issue fuzzing AFL front-ends python labels Feb 19, 2020
