Matching some expressions out of an expression array #1338

tomaskender · 2020-08-16T20:56:34Z

Usage:
X of [ true, false, true ]

It does what you would expect: if enough items in the array evaluate to true, the whole statement is true. X can be a number, 'any', or 'all', just like in for loops. Array evaluation reuses the loop indexing scheme for memory organization (saving internal variables), and the algorithm is very similar too, so it should cause no confusion.

It supports short-circuit evaluation: once enough array items have evaluated to true, evaluation of the array stops and the remaining items are skipped.
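As a rough illustration of the semantics described above (not the VM code from this PR), here is a minimal Python sketch. The names `of` and `probe` are hypothetical, and each array item is modelled as a zero-argument callable so that skipped items are visibly never evaluated. The sketch only models stopping early on success; whether the real implementation also stops early when the quantifier can no longer be met is not stated here.

```python
from typing import Callable, List, Sequence, Union

Quantifier = Union[int, str]  # a number, "any", or "all"


def of(quantifier: Quantifier, items: Sequence[Callable[[], bool]]) -> bool:
    """Return True if at least `quantifier` items evaluate to True.

    Items are evaluated left to right; once enough are true,
    the remaining items are skipped (short-circuit).
    """
    if quantifier == "any":
        needed = 1
    elif quantifier == "all":
        needed = len(items)
    else:
        needed = int(quantifier)

    matches = 0
    for item in items:
        if matches >= needed:
            break  # enough matches already, skip the rest
        if item():
            matches += 1
    return matches >= needed


# Hypothetical helper that records which items were actually evaluated.
evaluated: List[str] = []


def probe(name: str, value: bool) -> Callable[[], bool]:
    def thunk() -> bool:
        evaluated.append(name)
        return value
    return thunk


result = of(2, [probe("a", True), probe("b", False),
                probe("c", True), probe("d", True)])
print(result, evaluated)
```

In this run the fourth item is never evaluated, because two matches have already been found by the time the third item returns.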

Why we need this:

Analysts frequently use rules such as

for 2 i in (1..4): (
(i == 1 and cuckoo.filesystem.file_write(/.../i)) or
(i == 2 and cuckoo.filesystem.file_write(/.../i)) or
(i == 3 and cuckoo.filesystem.file_write(/../i)) or
(i == 4 and cuckoo.network.http_request(/.../i))
)

The aim is to simplify such rules into something that is easier to read and write:

2 of [
cuckoo.filesystem.file_write(/.../i),
cuckoo.filesystem.file_write(/.../i),
cuckoo.filesystem.file_write(/../i),
cuckoo.network.http_request(/.../i)
]
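To check that the two forms above express the same condition, here is a small Python sketch that compares them over every combination of four boolean results. It assumes the four sub-expressions have already been evaluated to plain booleans `a`, `b`, `c`, `d` (hypothetical names), so it models only the truth value, not evaluation order or short-circuiting.

```python
from itertools import product


def for_loop_form(a: bool, b: bool, c: bool, d: bool) -> bool:
    # for 2 i in (1..4): ((i == 1 and a) or (i == 2 and b) or ...)
    hits = sum(
        1
        for i in range(1, 5)
        if (i == 1 and a) or (i == 2 and b) or (i == 3 and c) or (i == 4 and d)
    )
    return hits >= 2


def of_form(a: bool, b: bool, c: bool, d: bool) -> bool:
    # 2 of [a, b, c, d]
    return sum([a, b, c, d]) >= 2


# Compare both forms on all 16 combinations of the four results.
mismatches = [
    combo
    for combo in product([False, True], repeat=4)
    if for_loop_form(*combo) != of_form(*combo)
]
print(mismatches)
```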

googlebot · 2020-08-16T20:56:40Z

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.

What to do if you already signed the CLA

Individual signers

It's possible we don't have your GitHub username or you're using a different email address on your commit. Check your existing CLA data and verify that your email is set on your git commits.

Corporate signers

Your company has a Point of Contact who decides which employees are authorized to participate. Ask your POC to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the Google project maintainer to go/cla#troubleshoot (Public version).
The email used to register you as an authorized contributor must be the email used for the Git commit. Check your existing CLA data and verify that your email is set on your git commits.
The email used to register you as an authorized contributor must also be attached to your GitHub account.

ℹ️ Googlers: Go here for more info.

tomaskender · 2020-08-16T21:21:28Z

@googlebot I signed it!

googlebot · 2020-08-16T21:21:32Z

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

plusvic · 2020-10-19T09:13:40Z

I like the idea of implementing this feature, but I'm not sure this is the most appropriate way to do it. Instead of generating VM code specifically for expressions of the form <for_expression> of [<boolean_array_expression>], we should try to do something more general, like <for_expression> of <iterator>, where the iterator can be an array of expressions or some other array of booleans (for example, an array returned by some module).

I'm going to give this feature a second thought. I also have in mind implementing <expression> in <iterator>, which is somewhat related.

tomaskender · 2020-10-22T19:25:46Z

If my understanding is correct, then 'in' and 'of' are going to be pretty much the same piece of code. The only difference is that 'of' searches for X expressions that evaluate to boolean true and short-circuits after finding that many matches, while 'in' searches for 1 expression of the provided type and value and short-circuits after that.

That means that if I make bool_arrays accept any expression and then check the type and value stored in memory, I will effectively create the iterator you were referring to.

Is my assumption correct? Is there any other reason you want to generalize bool_arrays into an iterator, or is it only to reuse the same code? Is there a use case for supporting types other than booleans in arrays for the 'of' operator?
If I make these changes, I might as well add the 'in' operator if you want, as there isn't going to be much of a difference between the two constructs.
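The shared-code idea in this comment could look roughly like the Python sketch below. It is only an illustration of the reasoning, not YARA's implementation: `scan`, `of_op`, and `in_op` are hypothetical names, and items are modelled as zero-argument callables so that evaluation is lazy. Both operators run the same loop and differ only in what counts as a match and how many matches are required.

```python
from typing import Any, Callable, Iterable


def scan(items: Iterable[Callable[[], Any]],
         accept: Callable[[Any], bool],
         needed: int) -> bool:
    """Evaluate items in order until `needed` of them are accepted."""
    found = 0
    for item in items:
        if found >= needed:
            break  # short-circuit: enough matches found
        if accept(item()):
            found += 1
    return found >= needed


def of_op(count: int, items: Iterable[Callable[[], Any]]) -> bool:
    # 'of': look for `count` items whose value is boolean true.
    return scan(items, lambda value: value is True, count)


def in_op(wanted: Any, items: Iterable[Callable[[], Any]]) -> bool:
    # 'in': look for one item with the same type and value as `wanted`.
    return scan(
        items,
        lambda value: type(value) is type(wanted) and value == wanted,
        1,
    )


two_of = of_op(2, [lambda: True, lambda: False, lambda: True])
three_in = in_op(3, [lambda: 1, lambda: 3, lambda: 5])
true_in_ints = in_op(True, [lambda: 1, lambda: 2])
print(two_of, three_in, true_in_ints)
```

The explicit type check matters in this sketch because Python treats `True == 1` as true; a type-and-value comparison keeps a boolean from matching an integer.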

tomaskender · 2020-11-23T10:50:32Z

@plusvic Do you think I should pursue the changes I described, or should I put this whole thing on hold?

plusvic · 2020-11-23T10:59:25Z

Put it on hold. I think this should be part of a larger, more ambitious change that I have in mind.

metthal · 2020-11-25T23:44:06Z

@plusvic Do you think you could share some details about this ambitious change with us? :) @tomaskender is working in my team on some improvements to YARA itself, and we'd like to start using them internally while also sharing them upstream. Having some insight into the plans for YARA would help us a lot with steering our design decisions in the future.

plusvic · 2020-11-26T12:28:04Z

I don't have the full picture yet, but the plan is to generalize your proposal into something that could accept expressions like...

2 of some_module.some_array

...where some_array is an array of booleans. So the feature you propose would actually be something like:

<for_expression> of <iterator>

Also, I want to implement an in operator like <expression> in <iterator>, which would return true if the value of <expression> is contained in the <iterator>. That would be useful.

In all these cases <iterator> should be anything that is iterable, including a list of expressions like:

[
cuckoo.filesystem.file_write(/.../i),
cuckoo.filesystem.file_write(/.../i),
cuckoo.filesystem.file_write(/../i),
cuckoo.network.http_request(/.../i)
]

This, for example, would be perfectly valid:

true in [
cuckoo.filesystem.file_write(/.../i),
cuckoo.filesystem.file_write(/.../i),
cuckoo.filesystem.file_write(/../i),
cuckoo.network.http_request(/.../i)
]

The overall idea is to make all these constructs orthogonal, in the sense that you have simple pieces that you can combine in a flexible way. That may require some large refactoring of the existing code.
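One way to picture this orthogonality is the Python sketch below, where `of_op` and `in_op` accept any iterable, whether it is a ready-made array or a lazily evaluated list of expressions. Everything here is an assumption made for illustration: `some_array` stands in for the hypothetical `some_module.some_array`, and `check` stands in for module calls such as the cuckoo ones above. It does not describe how YARA does or will implement this.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List


def of_op(count: int, iterator: Iterable[Any]) -> bool:
    # islice stops pulling from the iterator once `count` true values are seen.
    hits = islice((v for v in iterator if v is True), count)
    return sum(1 for _ in hits) == count


def in_op(value: Any, iterator: Iterable[Any]) -> bool:
    # any() stops at the first item equal to `value`.
    return any(item == value for item in iterator)


# Source 1: a plain array of booleans, as a module might return.
some_array = [True, False, True]

# Source 2: a list of expressions, evaluated lazily one at a time.
calls: List[str] = []


def check(name: str, outcome: bool) -> bool:
    calls.append(name)  # record that this expression was evaluated
    return outcome


def expression_list() -> Iterator[bool]:
    yield check("file_write_1", False)
    yield check("file_write_2", True)
    yield check("file_write_3", True)
    yield check("http_request", True)


from_array = of_op(2, some_array)
from_expressions = of_op(2, expression_list())
calls_for_of = list(calls)

calls.clear()
true_in_expressions = in_op(True, expression_list())
calls_for_in = list(calls)
print(from_array, from_expressions, calls_for_of, true_in_expressions, calls_for_in)
```

Because both operators only consume an iterable, the same `expression_list()` source works with either one, and evaluation stops as soon as the answer is known.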

tomaskender added 2 commits August 16, 2020 16:59

arrays updated to work with new yara

8b69050

removing redundant memory usage

d6a9808

tests

2d4063c

plusvic added the feature-request label Sep 4, 2022

wxsBSD mentioned this pull request Sep 8, 2022

Iterating over constant strings in yara conditions block #1765

Open
