Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

json-parser: recognize SARIF format for semgrep output #162

Merged
merged 2 commits into from
Feb 19, 2024

Conversation

rhyw
Copy link
Contributor

@rhyw rhyw commented Feb 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you tell me how to create this stdout, thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My attempt was very tricky - I don't find from anywhere the helpers on generating files like this.

What I did was to use the plugin semgrep to scan a source rpm, so I get a .sarif file and a .js(json) file. I then tweak the sarif file as stdin and the scan-results-all.js as stdout. I did some investigation and guess that's how these text files were feeded by other plugins(snyk, gitleaks for example). If I understand it correctly, it's used by csgrep(used in the filter_hook) to test if after processing the output is as expected.

By tweaking the sarif file, I mean the original scan may contain tens of thousands of lines, thus I need to remove some redundant lines, for example I randomly chose a gzip source build, there're 700+ semgrep rules claimings with approx 70k+ lines, we definitely don't want to include all of them. Other tweaks includes for example to alter some of the lines produced, for example the unintended suspicous temporary dir prefix in the rules path. I think the process is quite time consuming without a guide. Also I doubt if there's a better way of doing this.

cc @kdudka

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hanchuntao There is a script to sync the expected output of tests: https://github.com/csutils/csdiff/blob/main/tests/csgrep/sync.sh

@kdudka
Copy link
Member

kdudka commented Feb 12, 2024

@rhyw OSH-57 should be referred to as Related: rather than Resolves: because this pull request does not resolve the issue on its own.

Copy link
Member

@lzaoral lzaoral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rhyw
Copy link
Contributor Author

rhyw commented Feb 19, 2024

@rhyw OSH-57 should be referred to as Related: rather than Resolves: because this pull request does not resolve the issue on its own.

Thanks. Commit message / PR description updated.

Copy link
Member

@kdudka kdudka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rhyw Looks good. Thanks for the update!

@kdudka kdudka closed this in 0a73880 Feb 19, 2024
@kdudka kdudka merged commit 0a73880 into csutils:main Feb 19, 2024
33 checks passed
@rhyw rhyw deleted the semgrep-recognition branch February 19, 2024 09:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants