Matching some expressions out of an expression array #1338

Open: wants to merge 3 commits into master


@tomaskender (Contributor) commented Aug 16, 2020

Usage:
X of [ true, false, true ]

It does what you would expect: if enough items in the array evaluate to true, then the whole statement is true. X can be a number, 'any' or 'all', just like in for loops. The array evaluation structure reuses loop indexing for memory organization (saving internal variables), and the algorithm is very similar too, so it should cause no confusion.

It supports short-circuit evaluation: once enough array items have evaluated to true, evaluation of the array stops and the remaining items are skipped.
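The semantics described above can be sketched in C, the language YARA itself is written in. This is a minimal, hypothetical model, not the code from this PR: the names `quantifier`, `eval_of`, `item_true` and `item_false`, and the use of lazily evaluated callbacks for array items, are assumptions made for illustration. It shows X resolving to a threshold ('any' as 1, 'all' as the array length) and evaluation stopping as soon as that threshold is reached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding of X: a number, 'any' or 'all' (as in for loops). */
typedef enum { QUANT_NUMBER, QUANT_ANY, QUANT_ALL } quant_kind;

typedef struct {
  quant_kind kind;
  size_t number; /* only meaningful when kind == QUANT_NUMBER */
} quantifier;

/* Array items are evaluated lazily, so skipped items are never computed. */
typedef bool (*lazy_bool)(void);

/* Counts how many items were actually evaluated (for demonstration only). */
static size_t evaluations = 0;

static bool item_true(void)  { evaluations++; return true; }
static bool item_false(void) { evaluations++; return false; }

/* X of [items]: true once at least X items evaluate to true. */
static bool eval_of(quantifier q, const lazy_bool items[], size_t n) {
  assert(items != NULL || n == 0);

  size_t needed;
  switch (q.kind) {
    case QUANT_ANY: needed = 1; break;
    case QUANT_ALL: needed = n; break;
    default:        needed = q.number; break;
  }

  /* Assumption of this sketch: a threshold of zero is trivially satisfied. */
  if (needed == 0) return true;

  size_t matches = 0;
  for (size_t i = 0; i < n; i++) {
    /* Short circuit: stop as soon as enough items were true. */
    if (items[i]() && ++matches >= needed) return true;
  }
  return false;
}
```

For example, `2 of [true, true, false]`, modelled as `{item_true, item_true, item_false}`, makes `eval_of` return true after evaluating only the first two items; the third is never evaluated.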

Why we need this:

Analysts frequently use rules such as

for 2 i in (1..4): (
(i == 1 and cuckoo.filesystem.file_write(/.../i)) or
(i == 2 and cuckoo.filesystem.file_write(/.../i)) or
(i == 3 and cuckoo.filesystem.file_write(/../i)) or
(i == 4 and cuckoo.network.http_request(/.../i))
)

The aim is to simplify such rules to something that can be read and written with more ease:

2 of [
cuckoo.filesystem.file_write(/.../i),
cuckoo.filesystem.file_write(/.../i),
cuckoo.filesystem.file_write(/../i),
cuckoo.network.http_request(/.../i)
]

@googlebot

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


@tomaskender (Contributor, Author)

@googlebot I signed it!

@googlebot

CLAs look good, thanks!


@plusvic (Member) commented Oct 19, 2020

I like the idea of implementing this feature, but I'm not sure this is the most appropriate way to do it. Instead of generating VM code specifically for expressions of the form <for_expression> of [<boolean_array_expression>], we should try to do something more general, like <for_expression> of <iterator>, where iterator can be an array of expressions or some other array of booleans (for example, an array returned by some module).

I'm going to give this feature some more thought. I also have in mind implementing <expression> in <iterator>, which is somewhat related.

@tomaskender (Contributor, Author)

If my understanding is correct, then 'in' and 'of' are going to be pretty much the same piece of code. The only difference is that 'of' is going to search for X matches of bool@true expressions and short-circuit after finding that many matches, while 'in' is going to search for 1 expression of the provided type@value and short-circuit after that.

That means that if I make bool_arrays accept any expression, and then check the type and value that will be stored in memory, I will effectively create the iterator you were referring to.

Is my assumption correct? Is there any other reason you want to generalize bool_arrays into an iterator, or is it only about reusing the same code? Is there a use case for supporting types other than booleans in arrays for the 'of' operator?
If I'm going to make these changes, I might as well add the 'in' operator too if you want, as there isn't going to be much difference between the two constructs.
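The observation that 'of' and 'in' differ only in what they search for and when they stop can be sketched as one shared loop. This is a hypothetical C model, not YARA's VM: `tagged_value`, `find_matches`, `of_op` and `in_op` are illustrative names, and array items are assumed to be already evaluated values carrying a type tag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical typed value, loosely modelled on a VM stack slot. */
typedef enum { VAL_BOOL, VAL_INT } val_type;

typedef struct {
  val_type type;
  int64_t value; /* booleans are stored as 0 or 1 */
} tagged_value;

/* Shared loop: scans items until `needed` of them match `target` in both
   type and value, then stops (short circuit). */
static bool find_matches(tagged_value target, size_t needed,
                         const tagged_value items[], size_t n) {
  assert(items != NULL || n == 0);
  size_t matches = 0;
  for (size_t i = 0; i < n && matches < needed; i++) {
    if (items[i].type == target.type && items[i].value == target.value)
      matches++;
  }
  return matches >= needed;
}

/* X of [items]: search for X items equal to boolean true. */
static bool of_op(size_t x, const tagged_value items[], size_t n) {
  const tagged_value bool_true = { VAL_BOOL, 1 };
  return find_matches(bool_true, x, items, n);
}

/* value in [items]: search for a single item equal to value. */
static bool in_op(tagged_value value, const tagged_value items[], size_t n) {
  return find_matches(value, 1, items, n);
}
```

Here `of_op(2, items, n)` searches for two boolean true items, while `in_op(v, items, n)` searches for a single item equal to `v` in both type and value; both stop scanning as soon as their count is reached.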

@tomaskender (Contributor, Author)

@plusvic, do you think I should pursue the changes I described, or should I put this whole thing on hold?

@plusvic (Member) commented Nov 23, 2020

Put it on hold. I think this should be part of a larger, more ambitious change that I have in mind.

@metthal (Contributor) commented Nov 25, 2020

@plusvic Do you think you could share some details about this ambitious change with us? :) @tomaskender is working in my team on some improvements to YARA itself, and we'd like to start using them internally while also sharing them with upstream. Having insight into the plans for YARA would help us a lot in steering our design decisions in the future.

@plusvic (Member) commented Nov 26, 2020

I don't have the full picture yet, but the plan is generalizing your proposal to something that could accept expressions like...

2 of some_module.some_array

...where some_array is an array of booleans. So the feature you propose would be actually something like:

<for_expression> of <iterator>

Also, I want to implement an in operator like <expression> in <iterator>, which would return true if the value of <expression> is contained in the <iterator>.

In all these cases, <iterator> should be anything that is iterable, including a list of expressions like:

[
cuckoo.filesystem.file_write(/.../i),
cuckoo.filesystem.file_write(/.../i),
cuckoo.filesystem.file_write(/../i),
cuckoo.network.http_request(/.../i)
]

This, for example, would be perfectly valid:

true in [
cuckoo.filesystem.file_write(/.../i),
cuckoo.filesystem.file_write(/.../i),
cuckoo.filesystem.file_write(/../i),
cuckoo.network.http_request(/.../i)
]

The overall idea is making all these constructs orthogonal, in the sense that you have simple pieces that you can combine in a flexible way. That may require some large refactoring of the existing code.
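The <for_expression> of <iterator> generalization can be illustrated by writing the operators once against a small iterator interface, so that items may come from a literal list or from some other source. This is a hypothetical C sketch, not a design from the YARA codebase: the `iterator` struct, its `next` callback, and the bitmask source standing in for a module-provided array of booleans are all assumptions, and items are modelled as already evaluated booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical iterator interface: next() stores the next item in *out and
   returns false once the iterator is exhausted. */
typedef struct iterator {
  bool (*next)(struct iterator *it, bool *out);
  void *state; /* source-specific state, hidden from the operators */
} iterator;

/* Source 1: a literal list such as [e1, e2, e3], already evaluated. */
typedef struct {
  const bool *items;
  size_t len;
  size_t pos;
} array_state;

static bool array_next(iterator *it, bool *out) {
  array_state *s = it->state;
  if (s->pos >= s->len) return false;
  *out = s->items[s->pos++];
  return true;
}

/* Source 2: the low `count` bits of a mask (count must be smaller than the
   bit width of unsigned), a stand-in for a module-provided boolean array. */
typedef struct {
  unsigned mask;
  unsigned count;
  unsigned pos;
} bits_state;

static bool bits_next(iterator *it, bool *out) {
  bits_state *s = it->state;
  if (s->pos >= s->count) return false;
  *out = (s->mask >> s->pos) & 1u;
  s->pos++;
  return true;
}

/* X of <iterator>: consumes items only until X of them are true. */
static bool of_iter(size_t x, iterator *it) {
  size_t matches = 0;
  bool item;
  while (matches < x && it->next(it, &item)) {
    if (item) matches++;
  }
  return matches >= x;
}

/* <expression> in <iterator>: true if some item equals the value. */
static bool in_iter(bool value, iterator *it) {
  bool item;
  while (it->next(it, &item)) {
    if (item == value) return true;
  }
  return false;
}
```

The same `of_iter` and `in_iter` calls work on both sources, which is the kind of orthogonality described above: the operator does not need to know where its items come from.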
