AFL synchronization mode #327

Open
GrosQuildu opened this issue Feb 19, 2020 · 10 comments

@GrosQuildu
Contributor

Currently, the AFL executor (deepstate-afl) always runs the fuzzer in master mode (the -M argument). Code here.

We should detect when to run it in slave mode (-S) instead, which skips the deterministic step. See the AFL documentation (parallel_fuzzing.txt) for more information.

@Travmatth

So I've looked into this some, with an eye towards proposing a pull request to close the issue. From AFL's parallel_fuzzing.txt:

The difference between the -M and -S modes is that the master instance will
still perform deterministic checks; while the secondary instances will
proceed straight to random tweaks. If you don't want to do deterministic
fuzzing at all, it's OK to run all instances with -S. With very slow or complex
targets, or when running heavily parallelized jobs, this is usually a good plan.

As I understand it, DeepState does not support running parallelized individual fuzzers, so a normal deepstate-afl job should run afl in master mode. The only time I could imagine running afl in slave mode is in the ensembler, although I could also imagine giving users a deepstate-afl --afl-nondeterministic option for flexibility. If there were some heuristic we could use to determine the slowness or complexity of the target, we could configure deepstate-afl to call afl in slave mode automatically (is there such a heuristic that could be used?).
Proposal:
Add a --afl-nondeterministic flag to deepstate-afl and deepstate-ensemble that would allow the user to run afl in slave mode. If there is some heuristic that can be used to determine a target's slowness or complexity, use it to enable this option automatically.
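To make the proposal concrete, here is a minimal sketch of how such a flag could map onto the afl-fuzz command line. The function names, the `nondeterministic` parameter, and the command layout are illustrative assumptions, not DeepState's actual executor code; only the `-i`, `-o`, `-M`, and `-S` options are taken from afl-fuzz itself.

```python
from typing import List


def afl_sync_args(fuzzer_name: str, nondeterministic: bool = False) -> List[str]:
    """Return the afl-fuzz sync-mode arguments for one instance.

    -M: master instance, still performs the deterministic checks.
    -S: secondary instance, proceeds straight to random tweaks.
    """
    mode_flag = "-S" if nondeterministic else "-M"
    return [mode_flag, fuzzer_name]


def build_afl_command(input_dir: str, output_dir: str, binary: str,
                      fuzzer_name: str = "the_fuzzer",
                      nondeterministic: bool = False) -> List[str]:
    """Assemble a hypothetical afl-fuzz invocation (not DeepState's real one)."""
    cmd = ["afl-fuzz", "-i", input_dir, "-o", output_dir]
    cmd += afl_sync_args(fuzzer_name, nondeterministic)
    cmd += ["--", binary]
    return cmd
```

With `nondeterministic=False` the command keeps today's `-M` behavior, so the flag would be purely opt-in.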

@GrosQuildu
Contributor Author

I think we shouldn't add a new --afl-nondeterministic flag; it's an unnecessary complication. Instead, we could detect whether the user specified -S in fuzzer_args. But idk, maybe the option would be better...

Anyway, implementing the ensembler first is a priority afaik. Then we can auto-detect when to use -S in the ensembler only and leave the default -M for single AFL.
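As a rough illustration of the "detect -S in fuzzer_args" idea, the sketch below scans a list of extra arguments for an explicit `-M` or `-S` and falls back to `-M` otherwise. It assumes `fuzzer_args` arrives as a flat list of strings, which may not match how DeepState actually parses that option; `split_sync_mode` and the default name are hypothetical.

```python
from typing import List, Tuple


def split_sync_mode(fuzzer_args: List[str],
                    default_name: str = "the_fuzzer") -> Tuple[str, str, List[str]]:
    """Return (mode_flag, fuzzer_name, remaining_args).

    An explicit -M/-S from the user wins; otherwise default to master mode.
    The sync flag is removed from the remaining args so it is not passed twice.
    """
    mode, name = "-M", default_name
    rest: List[str] = []
    i = 0
    while i < len(fuzzer_args):
        arg = fuzzer_args[i]
        if arg in ("-M", "-S"):
            mode = arg
            # The next token is the instance name unless it looks like a flag.
            if i + 1 < len(fuzzer_args) and not fuzzer_args[i + 1].startswith("-"):
                name = fuzzer_args[i + 1]
                i += 1
        else:
            rest.append(arg)
        i += 1
    return mode, name, rest
```

This keeps the default unchanged for single-AFL runs while letting a user (or the ensembler) request secondary mode without a new option.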

If there was some heuristic we could use to determine the slowness or complexity of the target we could configure deepstate-afl to call afl in slave mode automatically (is there such a heuristic that could be used?)

Don't know any, but would be useful.

@jscriven-digimarc
Contributor

@GrosQuildu - My team has implemented an ensemble afl class for deepstate that we currently use for fleet fuzzing with AFL. It toggles -M and -S depending on args. It also uses tgz files for syncing, just like the afl parallel fuzzing guide suggests. Are you interested in this as a pull request?

@GrosQuildu
Contributor Author

It's rather a question for @agroce. But I think yeah, we could take a look at the PR.

@agroce
Collaborator

agroce commented Mar 24, 2020

Sounds interesting for sure!

@jscriven-digimarc
Contributor

Ok, our team is wrapping up work on our fuzzer system for now, I'll put it in as a pull req for your consideration at the end of the week.

@agroce
Collaborator

agroce commented Apr 4, 2020

Let us know when this is ready!

@jscriven-digimarc
Contributor

Sorry for the delay. You may want to take a quick look at this commit:
jscriven-digimarc@82b80b1

This contains the afl-ensemble class we use. A few notes/thoughts for review.
This system relies on a network share that contains a tgz file for each fuzzer node. A node is a machine that may run multiple fuzzers. Each tgz must contain the queue and fuzzer stats for each 'fuzzer' on that node. We run 4 nodes with 4 fuzzers (all in -S mode). Each node must have a primary fuzzer (fuzzer id 0), which does the seed synchronization using the tgz files. We rely on AFL's built-in ability to pull interesting seeds from neighboring queues: the system extracts the queues of all other fuzzer nodes alongside the fuzzers that are running, and the running fuzzers then steal seeds from their neighbors and add the interesting ones.
We have some scripting around this that drives deepstate. It runs N instances of deepstate on each node, each on a separate thread. We also have scripting that runs a minimization cycle on all queues on the share. I would be happy to answer any questions for you guys. Let me know if you really would like a pull request for this and I can tidy up the code (add headers, etc).
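The tgz exchange described above could look roughly like the following sketch. It is not the code from the linked commit: the archive name `node_<id>.tgz`, the `fuzzer_<node>_<fuzzer>` directory pattern, and both function names are assumptions made for illustration.

```python
import tarfile
from pathlib import Path, PurePosixPath
from typing import List


def pack_node(sync_dir: Path, share_dir: Path, node_id: int) -> Path:
    """Pack queue/ and fuzzer_stats of every fuzzer of this node into one tgz."""
    archive = share_dir / f"node_{node_id}.tgz"
    tmp = archive.with_suffix(".tmp")
    with tarfile.open(str(tmp), "w:gz") as tar:
        for fuzzer_dir in sorted(sync_dir.glob(f"fuzzer_{node_id}_*")):
            for item in ("queue", "fuzzer_stats"):
                path = fuzzer_dir / item
                if path.exists():
                    tar.add(str(path), arcname=f"{fuzzer_dir.name}/{item}")
    # Rename at the end so other nodes never read a half-written archive.
    tmp.replace(archive)
    return archive


def pull_other_nodes(sync_dir: Path, share_dir: Path, node_id: int) -> List[str]:
    """Extract every other node's archive next to the local fuzzer directories."""
    pulled = []
    for archive in sorted(share_dir.glob("node_*.tgz")):
        if archive.name == f"node_{node_id}.tgz":
            continue  # skip our own archive
        with tarfile.open(str(archive), "r:gz") as tar:
            # Refuse absolute paths and parent-directory escapes.
            safe = [m for m in tar.getmembers()
                    if not m.name.startswith("/")
                    and ".." not in PurePosixPath(m.name).parts]
            tar.extractall(path=str(sync_dir), members=safe)
        pulled.append(archive.name)
    return pulled
```

In this sketch only the primary fuzzer on a node would call these two functions on each sync cycle; the AFL instances themselves just see extra neighbor directories appear in their shared output directory.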

@GrosQuildu
Contributor Author

Added a few comments in your repo.

In general, there is too much code repetition imho. The new class should inherit from AFL, or the AFL class should be extended.

The arguments fuzzer_node_id and fuzzer_id seem to be redundant. The idea was that every fuzzer instance has its own output_dir, so there is no need to maintain ids. And the node (machine) should be handled implicitly; maybe name the stuff that is sent across the network "local/remote" or something like that? So only ensemble_mode seems necessary. And we may add it under the ensemble_group args and name it something like is_master?

I also have some concerns about the directory structure. After a sync cycle, each fuzzer outdir will have its own directory (fuzzername_node_id) and one directory for each other node. The idea was to push all testcases that a fuzzer should sync under SYNC_DIR.

As for tars: the afl docs suggest packing testcases and sending them via ssh or something like that, to avoid using network shares. So the tars seem to be an unnecessary complication if sync_dir is a network share.

So in its current state it's heading in the direction of what the ensembler could be, but for AFL only. To merge it, we should handle all supported fuzzers, just replacing the current ensembler class ;) (plus the comments above ofc).

@jscriven-digimarc
Contributor

Thanks Pawel for the feedback.

I will explain a few details that may help. Regarding code repetition, I agree. If the deepstate team were strongly considering merging this class, I would replace the existing afl.py with this one and merge some of the existing checks and behaviors into this class. For example, the compile step is removed in the ensemble version, but could easily be restored. In our code base, we simply use this one instead. It can be used to run a single fuzzer or several.

The arguments fuzzer_node_id and fuzzer_id are not really redundant. The problem they solve is having multiple fuzzers on the same 'node', each with a distinct queue. The original afl class sets up a local sync directory (output directory), but hardcodes 'the_fuzzer' as the name of every fuzzer on that system. In order to share queues, running fuzzers must share a local sync directory, but each needs a uniquely named subdirectory holding its queue and fuzzer stats. The fuzzer_id allows this. The node ID allows queues to be synced across fuzzer nodes. For example, node 0 will pull the queues for nodes 1 and 2 and extract them to a sync dir. The seeds must remain in individual fuzzer queues. So after a sync cycle, the local sync_dir will look like this:
/blah/blah/workspace/myLocal/SyncDir/fuzzer_0_0/
/blah/blah/workspace/myLocal/SyncDir/fuzzer_0_1/
/blah/blah/workspace/myLocal/SyncDir/fuzzer_1_0/
/blah/blah/workspace/myLocal/SyncDir/fuzzer_1_1/
/blah/blah/workspace/myLocal/SyncDir/fuzzer_2_0/
/blah/blah/workspace/myLocal/SyncDir/fuzzer_2_1/

Each fuzzer directory has a queue and fuzzer stats. On this system, there are 2 fuzzers running (node id 0, fuzzer id 1 and fuzzer id 2). The other queues are automatically pulled from by afl. Fuzzers on any node cannot share a queue. The 'output_dir' is the same on all fuzzers and nodes (/blah/blah/workspace/myLocal/SyncDir).
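For illustration, the naming scheme in the listing can be generated as below. This is a sketch under the assumption that the directory names encode the node id first and the fuzzer id second, which the thread does not state explicitly; the helper names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import List


def fuzzer_dir_name(node_id: int, fuzzer_id: int) -> str:
    """Unique per-fuzzer subdirectory name inside the shared sync dir."""
    return f"fuzzer_{node_id}_{fuzzer_id}"


def expected_layout(sync_dir: str, nodes: int, fuzzers_per_node: int) -> List[str]:
    """All fuzzer directories a node should see after a full sync cycle."""
    return [str(PurePosixPath(sync_dir) / fuzzer_dir_name(n, f))
            for n in range(nodes)
            for f in range(fuzzers_per_node)]


def is_local(dir_name: str, node_id: int) -> bool:
    """True if the directory belongs to a fuzzer running on this node."""
    prefix, node, _fuzzer = dir_name.split("_")
    return prefix == "fuzzer" and int(node) == node_id
```

Calling `expected_layout("/blah/blah/workspace/myLocal/SyncDir", 3, 2)` yields the six paths from the listing (without trailing slashes), and `is_local` separates the directories written by local fuzzers from the ones extracted from other nodes.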

The original ensembler code was attempting to synchronize seeds, not queues. This is not consistent with how afl generates new test cases from seeds. Once the fuzzer is running, you cannot simply drop seeds into the existing queues of running fuzzers without renaming them to follow the correct scheme (id:00xxxx,...); otherwise you have to restart the fuzzer to allow it to re-order the input seeds. By sharing queues, the fuzzers pull from their neighbors when they find interesting new seeds. From this perspective, this class doesn't really follow the model of the other "ensemble" fuzzers, so yes, you are correct, this strategy is 'AFL' only.
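The renaming step mentioned above can be sketched as follows. The snippet copies a seed into a queue directory under the next free `id:NNNNNN` number; it assumes the target is a separate neighbor-style queue directory (not the live queue of a running fuzzer), and the `,orig:<name>` suffix and helper names are illustrative choices rather than anything taken from the commit.

```python
import re
import shutil
from pathlib import Path

_ID_RE = re.compile(r"^id:(\d{6})")


def next_queue_id(queue_dir: Path) -> int:
    """Return one more than the highest id:NNNNNN found in queue_dir."""
    highest = -1
    for entry in queue_dir.iterdir():
        match = _ID_RE.match(entry.name)
        if match:
            highest = max(highest, int(match.group(1)))
    return highest + 1


def add_seed(seed: Path, queue_dir: Path) -> Path:
    """Copy seed into queue_dir using an AFL-style 'id:NNNNNN,...' file name."""
    queue_dir.mkdir(parents=True, exist_ok=True)
    target = queue_dir / f"id:{next_queue_id(queue_dir):06d},orig:{seed.name}"
    shutil.copyfile(str(seed), str(target))
    return target
```

Keeping ids monotonically increasing matters because a syncing fuzzer remembers the last id it imported from each neighbor and only looks at higher-numbered entries.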

I am happy to help code up any direction the team would like to take, but please note my commit on the branch was meant to help others in a similar situation to my team (run a whole bunch of AFL fuzzers on one target, share seeds, and use deepstate to drive it). As an aside, our system works great! =)

Also regarding tars, it was just a convenient mechanism for quick file transfer. We began with rsync (like deepstate ensemble) but ran into several issues with missing and extraneous seeds. We also have routines (outside of deepstate) that run minimize and other seed analysis, so it worked well within our infrastructure.

So, all that said, is this a good candidate to just replace the afl class? Or is there work to do with the core ensembler to understand this strategy?


4 participants