AngrTracerError: Could not step to the first address of the trace - state split #81

Open
TheBlueMatt opened this issue Dec 11, 2019 · 14 comments

@TheBlueMatt

When testing against a real Rust target (albeit an incredibly simple one: just a tiny message deserialization test, though the same happens on much more complicated targets too), after cle loads the binary and does a run, angr errors out immediately upon calling simgr.use_technique(t) (see the stack trace below).

WARNING | 2019-12-11 03:05:52,461 | cle.loader | The main binary is a position-independent executable. It is being loaded with a base address of 0x400000.
WARNING | 2019-12-11 03:05:54,099 | cle.loader | The main binary is a position-independent executable. It is being loaded with a base address of 0x400000.
Traceback (most recent call last):
  File "run_driller.py", line 68, in <module>
    main()
  File "run_driller.py", line 55, in main
    for _, new_input in Driller(binary, seed).drill_generator():
  File "/root/driller/venv/lib/python3.7/site-packages/driller/driller_main.py", line 101, in drill_generator
    for i in self._drill_input():
  File "/root/driller/venv/lib/python3.7/site-packages/driller/driller_main.py", line 131, in _drill_input
    simgr.use_technique(t)
  File "/root/driller/venv/lib/python3.7/site-packages/angr/sim_manager.py", line 188, in use_technique
    tech.setup(self)
  File "/root/driller/venv/lib/python3.7/site-packages/angr/exploration_techniques/tracer.py", line 192, in setup
    raise AngrTracerError("Could not step to the first address of the trace - state split")
angr.errors.AngrTracerError: Could not step to the first address of the trace - state split
@rhelmot
Member

rhelmot commented Dec 11, 2019

This is generally because the binary or one of its shared libraries uses input which is not concretized before reaching the entry point. Can you post a testcase to reproduce this, including all the dependent shared objects?

@TheBlueMatt
Author

Sure! The ldd output is below, but give me a sec and I can push something that you can easily cargo build.

root@fuzzer:~/driller# ldd ./rust-lightning/fuzz/target/release/msg_ping_target
	linux-vdso.so.1 (0x00007fff58bef000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fa05dd2d000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa05dd0c000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fa05dcf2000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa05db31000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fa05dd66000)

@rhelmot
Member

rhelmot commented Dec 11, 2019

I don't have a Rust compiler installed and don't want to install one. Can you just zip and upload the binaries?

@TheBlueMatt
Author

Sure. The simplest binary is at http://web.bluematt.me/msg_ping_fuzz_target_for_driller

@rhelmot
Member

rhelmot commented Dec 11, 2019

You need to provide a full testcase, that is, your script and all the inputs, including all the shared libraries, since one of the inputs will be a trace including shared object addresses.

@TheBlueMatt
Author

TheBlueMatt commented Dec 11, 2019

The simplified input and a simplified crash-demonstrating script are at bug81.tar.gz

I'm running against dependencies installed with:

pip install cle angr archinfo
pip install git+https://github.com/angr/tracer.git#egg=tracer
pip install git+https://github.com/shellphish/driller

All the dependencies are Debian buster system libraries (sysdeps.tar.gz if you don't have them handy).

@rhelmot
Copy link
Member

rhelmot commented Dec 11, 2019

The program is attempting to use sigaction (among other things) in the initializers before the entry point. Because we can't correctly identify sections of the trace which correspond to initializers, we have to run through these blind, without knowing how to resolve branches. Our simulated sigaction syscall is just a stub, meaning it provides a symbolic return value, so any attempt to branch on its return value will split the state.
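To make the failure mode concrete, here is a toy model of why a stubbed syscall splits the state. It is not angr code: `ToyState`, `branch_on_ret`, `step_blind`, `ToyTracerError` and the variable name `sigaction_ret` are all hypothetical, and angr's real symbolic variables are named differently. A concrete return value produces one successor, while an unconstrained (symbolic) one satisfies both sides of the branch, which is exactly what the tracer refuses while stepping blind toward the first trace address.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class ToyTracerError(Exception):
    """Hypothetical stand-in for angr's AngrTracerError."""


@dataclass
class ToyState:
    """Toy stand-in for a symbolic state: an address plus path constraints."""
    addr: int
    constraints: List[str] = field(default_factory=list)


def branch_on_ret(state: ToyState, ret: Optional[int],
                  if_zero: int, if_nonzero: int) -> List[ToyState]:
    """Model `if (sigaction(...) == 0)`. A concrete `ret` gives one successor;
    a symbolic one (modelled as None, like a stub's unconstrained return value)
    satisfies both sides, so the state forks."""
    if ret is not None:
        target = if_zero if ret == 0 else if_nonzero
        return [ToyState(target, list(state.constraints))]
    return [
        ToyState(if_zero, state.constraints + ["sigaction_ret == 0"]),
        ToyState(if_nonzero, state.constraints + ["sigaction_ret != 0"]),
    ]


def step_blind(successors: List[ToyState]) -> ToyState:
    """Mimic the tracer's requirement that stepping toward the first trace
    address never produces more than one active state."""
    if len(successors) > 1:
        raise ToyTracerError(
            "Could not step to the first address of the trace - state split")
    if not successors:
        raise ToyTracerError("no successors (toy check)")
    return successors[0]
```

Note that the last constraint on each forked state names the offending variable, which is the same idea as inspecting `state.solver.constraints[-1]` on a real split state.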

To fix this, we will either need to a) implement the sigaction syscall in our environment model or b) implement trace following for shared library initializers.

@TheBlueMatt
Copy link
Author

TheBlueMatt commented Dec 11, 2019 via email

@rhelmot
Copy link
Member

rhelmot commented Dec 11, 2019

For the sigaction route: assuming you know how to manipulate basic angr objects the docs pages you want are:

Sigaction will most certainly not be the last piece you run into which causes issues. In order to tell what the problem causing a split is, you should get a postmortem pdb shell at the crash site and examine state.solver.constraints[-1] for one of the split states. It will include a variable whose name is hopefully descriptive enough of where it came from.
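One way to get that postmortem shell is to run the script under `python -m pdb run_driller.py`, which drops into post-mortem debugging on an uncaught exception. Alternatively, a small wrapper around `main()` can do it from inside the script. This is a minimal sketch; `run_with_postmortem` is a hypothetical helper, and only `pdb.post_mortem` (which accepts a traceback object) is standard library.

```python
import pdb
from typing import Callable, Optional


def run_with_postmortem(main: Callable[[], None],
                        debugger: Callable = pdb.post_mortem) -> Optional[BaseException]:
    """Run `main`; if it raises, open a post-mortem debugger on the failing frame.

    Returns the exception (or None) so callers can still inspect it afterwards.
    The `debugger` parameter exists only so the hook can be swapped out.
    """
    try:
        main()
    except Exception as exc:
        # exc.__traceback__ leads down to the frame that raised, e.g. the tracer's setup()
        debugger(exc.__traceback__)
        return exc
    return None
```

At the `(Pdb)` prompt you can walk to the tracer's `setup` frame and print the last constraint of one of the split states, e.g. something like `p simgr.active[0].solver.constraints[-1]`, assuming the simulation manager is in scope as `simgr` (as it is passed into `setup` in the traceback above); local names may differ between angr versions.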

For the initializer tracing route: the tracer code is here: https://github.com/angr/angr/blob/master/angr/exploration_techniques/tracer.py

What needs to happen is that the part in setup commented as "step to entry point" is removed and replaced with something more similar to the part commented as "calc ASLR slide for main binary and find the entry point", run for each initializer (project.loader.initializers). Then, we need to store a list that lets you determine, for a given initializer, which index in the trace corresponds to it. Then, in _update_state_tracking, we need to add a clause that checks whether we just ran the LinuxLoader SimProcedure, and if so, figures out which initializer (or the entry point) we're about to jump into and adjusts the current trace index accordingly.
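The bookkeeping in that last step can be sketched in isolation. This is a hypothetical helper, not angr API: `adjust_trace_index` and `start_indices` are made-up names, `start_indices` is assumed to map each initializer address (and the entry point) in angr's address space to the trace index where it begins, and `prev_was_loader` stands in for however the real code detects that the LinuxLoader SimProcedure just ran.

```python
from typing import Dict


def adjust_trace_index(prev_was_loader: bool, next_addr: int,
                       start_indices: Dict[int, int], current_index: int) -> int:
    """Return the trace index to resume tracking from.

    If the LinuxLoader SimProcedure just ran, the next block is either an
    initializer or the entry point; jump the trace index to where that code
    starts in the trace. Otherwise, keep the current index unchanged.
    """
    if not prev_was_loader:
        return current_index
    # Unknown targets leave the index alone rather than guessing.
    return start_indices.get(next_addr, current_index)
```

For example, with `start_indices = {0x401000: 10, 0x401200: 57, 0x400500: 120}`, returning from LinuxLoader into `0x401200` moves tracking to trace index 57.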

@TheBlueMatt
Copy link
Author

Well, I went the "easy" route and instead exported the Rust code as a static library and called it from a C wrapper, which got past it. Sorry for the lack of a useful contribution. I'm now getting what appears to be #80.

@clampz

clampz commented Jul 10, 2020

Hey @rhelmot, is there more you can say about the loop that should replace the part commented as "step to entry point"? Sorry! I'm just sitting here trying to figure out how I can use driller for my project and ran into this same error. I would like to write the fix you're talking about and see if it works for me. I'm just a bit confused about what we're looking for with this loop. Is it to create the list you mentioned, or something else? Any more info or thoughts would likely help me. Thanks!

@rhelmot
Copy link
Member

rhelmot commented Jul 10, 2020

Yes, the goal is to create the list - a list which indicates that for the nth initializer, its presence in the trace starts at the specified index.

So, for example, the trace looks like this:

---------------------------------------------------------------------
   ^initializer 1         ^initializer 2                     ^entry point

And what we want to find out is what indices correspond to each of those points so we can correctly keep track of where in the trace our execution corresponds to when we're executing with angr's simplified model of running initializers (the LinuxLoader simprocedure).

We already use heuristics to determine the entry point trace index, we just need to do the same thing to figure out where the initializers are, too. The result of that computation will be a list of trace indices, one corresponding to each initializer. Then, we need to store that list and do the thing I described earlier (only 7 months ago? yikes) where we use it to update the current index while executing.
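A minimal sketch of building that list, under simplifying assumptions: the initializer addresses are already in the trace's address space (e.g. a non-PIE binary, or after applying the ASLR slide), exact address matches are good enough, and `starts` lists the initializers in the order they run, followed by the entry point. `find_start_indices` is a hypothetical name; the search only moves forward, mirroring the left-to-right layout in the diagram.

```python
from typing import List, Sequence


def find_start_indices(trace: Sequence[int], starts: Sequence[int]) -> List[int]:
    """For each address in `starts` (initializers in run order, then the entry
    point), return the first trace index at or after the previous match where
    the trace reaches that address."""
    indices: List[int] = []
    pos = 0
    for addr in starts:
        try:
            # Sequence.index(value, start) searches from `start` onward
            pos = trace.index(addr, pos)
        except ValueError:
            raise ValueError(f"{addr:#x} not found in trace after index {pos}") from None
        indices.append(pos)
        pos += 1  # the next start must come strictly later in the trace
    return indices
```

If the addresses do not match the trace exactly (different load bases), the exact-match lookup would have to be replaced by a base-independent heuristic like the one used for the entry point.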

@clampz
Copy link

clampz commented Jul 14, 2020

Thanks for the guidance, @rhelmot! I've gotten started, and I think I get what you mean. I'm not finding any of the project.loader.initializers addresses in _trace, though! When I look through the symbols in my binary for things labeled init, I am able to find them in _trace, but they aren't in the list of initializers mentioned above. Not sure what I'm missing here; it doesn't seem right to me. Maybe I'm over-simplifying this programming problem, or using a poor example binary. I was just using the binary I am currently fuzzing, which is a wrapper program around some poppler API calls.

I've attached my files so you can see what I'm working on. Lemme know if I missed something or if you need more information to see what's going on for me. Thanks for all your help

driller-testcase-clampz.pt1.tar.gz
driller-testcase-clampz.pt2.tar.gz
driller-testcase-clampz.pt3.tar.gz
driller-testcase-clampz.pt4.tar.gz
driller-testcase-clampz.pt5.tar.gz

[edit] - my archive was too big, so I split it into pieces; you should be able to just cat them together
[edit2] - for context, my binary was compiled with -no-pie -g
[edit3] - I guess maybe my question is this: addresses from the QEMU trace and from cle are not matching up, which may be expected, but how might you go about resolving this so we can match up our initializer addresses and trace addresses? My thought is that I need a memory map for the process loaded in QEMU, and then I need to rebase all the addresses in the list of initializers based on that memory map. I tried using a QEMU option to set the base address to the same one used in cle, but it remained the same.

@rhelmot
Copy link
Member

rhelmot commented Jul 14, 2020

The point of the algorithm I referenced that identifies the entry point (the one under the "find the entry point" comment) is that it works regardless of the base address used by QEMU, by assuming that the page alignment (the address's offset within its page) must be the same and that the block prior to it will be very far away, i.e. a jump from a different mapped image.
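A sketch of that heuristic as described here (not a copy of angr's code): match the low 12 bits of the address, require that the previous trace entry is far away, and derive the slide from the match. `find_image_start`, `slide_for`, and the `0x100000` "far jump" threshold are assumptions for illustration. The returned slide can then be added to each cle-side initializer address of the same object to rebase it into trace addresses, without needing QEMU's memory map.

```python
from typing import Optional, Sequence

PAGE_MASK = 0xFFF      # low 12 bits survive page-aligned relocation
FAR_JUMP = 0x100000    # assumed threshold for "came from a different mapped image"


def find_image_start(trace: Sequence[int], addr: int, start: int = 0,
                     far: int = FAR_JUMP) -> Optional[int]:
    """Find the first trace index >= start whose address has the same page
    offset as `addr` and whose predecessor block is far away. Returns None if
    no index qualifies."""
    for i in range(start, len(trace)):
        if (trace[i] & PAGE_MASK) != (addr & PAGE_MASK):
            continue
        if i == 0 or abs(trace[i] - trace[i - 1]) >= far:
            return i
    return None


def slide_for(trace: Sequence[int], addr: int, start: int = 0) -> Optional[int]:
    """Offset between where QEMU mapped the code and where cle loaded it."""
    i = find_image_start(trace, addr, start)
    return None if i is None else trace[i] - addr
```

For example, if cle loads a PIE at 0x400000 with its entry at 0x401040 and QEMU's trace jumps into 0x555555555040 from a libc address, the computed slide is 0x555555154000 (page-aligned, as expected).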
