-
I think we should aim for the 'online' tests. Although they make the tests somewhat more troublesome to run, they should increase the trust we have in the overall package. For the Lucene backend, these are the most important resources to take a look at:
Perhaps a wise first step would be to identify the common field types of the fields we want to search. Ideally, we would have test cases for each field type. I also agree with @andurin that the tests should cover a wide range of Sigma modifiers, so that various aspects are exercised. Specifically, boolean fields and date fields are not covered by the current test cases. Another important consideration is that pySigma-backend-elasticsearch is fairly meaningless on its own; it only becomes meaningful in combination with, for example, pySigma-pipeline-sysmon and the rules in the main Sigma repo. It might therefore be a good idea to develop some kind of integration test. One particular issue, which I tried to address in #43 but which is currently not covered by tests, is queries that contain both wildcards and spaces. We should also include tests with special characters that must be escaped (e.g. quotes) and with field types that are currently not tested.
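To make the "wildcards plus spaces" and escaping cases concrete, here is a minimal Python sketch of how a value might be turned into a single unquoted Lucene term. The helper name escape_lucene_value is hypothetical, the reserved-character set is assumed to mirror Lucene's classic QueryParser.escape(), and it assumes spaces are escaped with a backslash rather than by quoting, so that * and ? keep working as wildcards.

```python
# Characters escaped by Lucene's classic QueryParser.escape().
LUCENE_SPECIAL = set('\\+-!():^[]"{}~*?|&/')


def escape_lucene_value(value: str, keep_wildcards: bool = True) -> str:
    """Hypothetical helper: escape a value for use as one unquoted Lucene term.

    Whitespace is escaped with a backslash so the value is not split into
    several terms. With keep_wildcards=True, '*' and '?' stay unescaped and
    keep acting as wildcards.
    """
    out = []
    for ch in value:
        if keep_wildcards and ch in "*?":
            out.append(ch)  # keep wildcard semantics
        elif ch in LUCENE_SPECIAL or ch.isspace():
            out.append("\\" + ch)  # escape reserved character or whitespace
        else:
            out.append(ch)
    return "".join(out)


# Usage: a "contains" search for a value with spaces and quotes.
print("process.command_line:" + escape_lucene_value('*cmd /c "whoami"*'))
# prints: process.command_line:*cmd\ \/c\ \"whoami\"*
```

Test cases along these lines (wildcard + space, wildcard + quote) would have caught the #43 problem regardless of how the backend ends up implementing the escaping.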
-
On the matter of "to quote, or not to quote", I think the answer is quite simple: we should strive to never quote. To elaborate, according to the Lucene documentation I mentioned above:
In other words, if the Lucene backend is to support wildcard searches, which are an essential part of the Sigma syntax, we should not generate phrases and hence should not quote. The only exception I can think of at the moment is searching for a field that contains an empty string (note that this is different from asserting that a field exists).
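A minimal sketch of the "never quote" rule, assuming the common Sigma modifiers map onto leading and trailing wildcards. sigma_to_lucene is a hypothetical helper, not pySigma's API: it backslash-escapes reserved characters and whitespace (Lucene's QueryParser.escape() set minus * and ?) and only falls back to quotes for the empty-string case.

```python
import re
from typing import Optional

# Lucene QueryParser.escape() characters minus '*' and '?', plus whitespace.
_RESERVED = re.compile(r'([\\+\-!():^\[\]"{}~|&/\s])')


def sigma_to_lucene(field: str, value: str, modifier: Optional[str] = None) -> str:
    """Hypothetical helper: build an unquoted Lucene clause for a Sigma value.

    Quoting would turn the value into a phrase, in which '*' and '?' are
    literal, so the term is escaped instead. Any '*' or '?' already in the
    value is assumed to be an intended wildcard.
    """
    if value == "":
        # The only case that needs quotes: matching an empty string value.
        return f'{field}:""'
    term = _RESERVED.sub(r"\\\1", value)  # backslash-escape reserved chars
    if modifier == "contains":
        term = f"*{term}*"
    elif modifier == "startswith":
        term = f"{term}*"
    elif modifier == "endswith":
        term = f"*{term}"
    elif modifier is not None:
        raise ValueError(f"unsupported modifier: {modifier}")
    return f"{field}:{term}"


print(sigma_to_lucene("process.command_line", "cmd /c", "contains"))
# prints: process.command_line:*cmd\ \/c*
```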
-
Moreover, I have come to realize what causes all the problems we are having with Sigma and the Lucene backend. Sigma promises to translate its rules, which may contain regular expressions, into backend queries, whereas regular expressions are a feature Lucene simply cannot support. Lucene only allows certain wildcards and is less expressive than the Sigma syntax suggests. A Sigma query such as … Of course, this becomes more complicated when queries specify character groups, lookahead, exclusion, or the number of matches. Perhaps this is something the team behind Sigma should take a look at, because it looks like a contradiction at Sigma's core to me. EDIT: This means that we should also revisit previous issues and PRs such as #9. In the commit that addressed it, a new test was also added: 563c565#diff-8e673d84136778434f31a4b9af2fc02d9afcb5c2bbac2698f57017989f65943aR142-R156
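To illustrate how narrow the overlap between regular expressions and wildcards is, here is a hypothetical Python sketch (not part of pySigma) that translates only a trivial regex subset (literals, ".", ".*", escaped metacharacters, ^ and $ anchors) into a Lucene wildcard term and gives up by returning None on everything else. It assumes Sigma regexes are matched unanchored, so a missing ^ or $ becomes a leading or trailing *.

```python
from typing import Optional

# Regex metacharacters; any unescaped one other than '.' is unsupported.
_REGEX_META = set(".^$*+?()[]{}|\\")
# Characters that must be backslash-escaped in a Lucene term (QueryParser.escape()).
_LUCENE_SPECIAL = set('\\+-!():^[]"{}~*?|&/')


def regex_to_wildcard(pattern: str) -> Optional[str]:
    """Translate a tiny regex subset into a Lucene wildcard term, else None."""
    anchored_start = pattern.startswith("^")
    anchored_end = pattern.endswith("$") and not pattern.endswith("\\$")
    body = pattern[int(anchored_start):len(pattern) - int(anchored_end)]

    tokens = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == ".":
            if body[i + 1:i + 2] == "*":
                tokens.append("*")  # '.*' -> any sequence of characters
                i += 2
            else:
                tokens.append("?")  # '.' -> any single character
                i += 1
            continue
        if ch == "\\":
            # Only escaped metacharacters are literals; \d, \w, \s etc. are not.
            if i + 1 >= len(body) or body[i + 1] not in _REGEX_META:
                return None
            ch = body[i + 1]
            i += 2
        elif ch in _REGEX_META:
            return None  # classes, groups, quantifiers, alternation, lookaround
        else:
            i += 1
        tokens.append("\\" + ch if ch in _LUCENE_SPECIAL or ch.isspace() else ch)

    # Unanchored regexes match anywhere, but a wildcard term must match the whole value.
    if not anchored_start and (not tokens or tokens[0] != "*"):
        tokens.insert(0, "*")
    if not anchored_end and (not tokens or tokens[-1] != "*"):
        tokens.append("*")
    return "".join(tokens)


print(regex_to_wildcard("powershell.*-enc"))  # prints: *powershell*\-enc*
print(regex_to_wildcard("[a-z]+"))            # prints: None
```

Anything beyond this subset (the character groups, lookahead, exclusion and match counts mentioned above) has no wildcard equivalent, which is exactly the gap described here.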
-
Hi Community,
this is my very first discussion, and I would like to ask for your help in having a constructive discussion.
#43 (as well as #25 and #36) revealed some issues with whether indexed data can be searched "as intended" using this Elasticsearch backend, especially with the Lucene backend.
Some points are unclear to me, since I'm not using ES on a daily basis (at the moment). For example: "To quote or not to quote?", "What about asterisk (*) searches against wildcard fields?", etc. I would like to cover those points with tests in this project, so everyone interested is invited to help.
I'm thinking about test cases for the following:
process.command_line (as keyword): <test data>
process.command_line.text (as text): <test data>
process.command_line.wildcard (as wildcard): <test data>
each with modifiers such as contains, endswith, etc.
Starting point for writing tests:
I'm hoping for a lot of PRs for new and valuable test cases.
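One way to organise the cases listed above is a matrix of field type, modifier and value. The sketch below is hypothetical: the mapping assumes process.command_line is a keyword field with text and wildcard multi-fields (as in the list above), and the sample values are invented to cover spaces, quotes and backslashes.

```python
from itertools import product
from typing import List, Optional, Tuple

# Assumed Elasticsearch mapping: one source field indexed three ways via
# multi-fields (keyword, .text, .wildcard). Not taken from this repository.
MAPPING = {
    "properties": {
        "process": {
            "properties": {
                "command_line": {
                    "type": "keyword",
                    "fields": {
                        "text": {"type": "text"},
                        "wildcard": {"type": "wildcard"},
                    },
                }
            }
        }
    }
}

# Field path -> field type, matching MAPPING above.
FIELDS = {
    "process.command_line": "keyword",
    "process.command_line.text": "text",
    "process.command_line.wildcard": "wildcard",
}
# None means "no modifier" (exact match).
MODIFIERS: List[Optional[str]] = [None, "contains", "startswith", "endswith"]
# Illustrative values: plain word, spaces, quotes, backslashes.
VALUES = ["whoami", "cmd /c whoami", 'echo "quoted"', "C:\\Windows\\System32"]

Case = Tuple[str, str, Optional[str], str]


def build_test_matrix() -> List[Case]:
    """Return every (field, field_type, modifier, value) combination."""
    return [
        (field, ftype, modifier, value)
        for (field, ftype), modifier, value in product(FIELDS.items(), MODIFIERS, VALUES)
    ]


print(len(build_test_matrix()))  # prints: 48
```

Each tuple could then be passed to pytest.mark.parametrize, with the expected query (and, for online tests, the expected hits) written down per field type.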
Regards,
@andurin