diff --git a/README.md b/README.md
index bdbdd6b..a42d5bb 100644
--- a/README.md
+++ b/README.md
@@ -1,18 +1,26 @@
 # phylum-analyze-pr-action
 A GitHub Action to automatically analyze Pull Requests for changes to package manager lockfiles using Phylum.
 
-Phylum provides a complete risk analyis of "open-source packages" (read: untrusted software from random Internet strangers). Phylum evolved forward from legacy SCA tools to defend from supply-chain malware, malicious open-source authors, and engineering risk, in addtion to software vulnerabilities and license risks. To learn more, please see [our website](https://phylum.io)
+Phylum provides a complete risk analysis of "open-source packages" (read: untrusted software from random Internet
+strangers). Phylum evolved beyond legacy SCA tools to defend against supply-chain malware, malicious open-source
+authors, and engineering risk, in addition to software vulnerabilities and license risks. To learn more, please see
+[our website](https://phylum.io)
 
-This action enables users to configure thresholds for each of Phylum's five risk domain scores. If a package risk domain score is below the threshold, the action will fail the check on the pull request. When packages fail the risk analysis, a comment is created on the PR to summarize the issues.
+This action enables users to configure thresholds for each of Phylum's five risk domain scores. If a package risk
+domain score is below the threshold, the action will fail the check on the pull request. When packages fail the risk
+analysis, a comment is created on the PR to summarize the issues.
 
 ## Features
 - configurable risk domain thresholds
-- uses [peter-evans/create-or-update-comment](https://github.com/marketplace/actions/create-or-update-comment) to add comments to PRs
+- uses [peter-evans/create-or-update-comment](https://github.com/marketplace/actions/create-or-update-comment)
+  to add comments to PRs
 
 ## Getting Started
-1. Create a workflow in a repository that uses the workflow definition listed below as an example.
-2. Be sure to include the base branches you use for development, the defaults are set to `master` and `main`.
-3. Define risk domain thresholds using `vul_threshold`, `mal_threshold`, etc to define a score requirement. A Phylum project score requirement of 60 is defined as `0.6`, for example.
+1. Create a workflow in a repository that uses the workflow definition listed below as an example
+2. Be sure to include the base/default branches used for development; the defaults are set to `master` and `main`
+3. Define risk domain thresholds using `vul_threshold`, `mal_threshold`, etc. to define a score requirement
+   1. For example, a Phylum project score requirement of 60 is defined as `0.6`
+4. Additional inputs can be used - see [action.yml](action.yml) for the full list
 
 ```yaml
 on:
@@ -26,9 +34,9 @@ jobs:
     runs-on: ubuntu-latest
     name: A job to analyze PR with phylum
     steps:
-      - uses: actions/checkout@v2
+      - uses: actions/checkout@v3
       - id: analyze-pr-test
-        uses: phylum-dev/phylum-analyze-pr-action@v1.4
+        uses: phylum-dev/phylum-analyze-pr-action@v1
         with:
          vul_threshold: 0.6
          mal_threshold: 0.6
@@ -39,34 +47,39 @@ jobs:
 ```
 
 ### Supported lockfiles
-- requirements.txt (Python PyPI)
-- package-lock.json (JavaScript/TypeScript NPM)
-- yarn.lock (JavaScript/TypeScript NPM)
-- Gemfile.lock (Ruby Rubygems/Bundler)
+- `requirements.txt` (Python PyPI)
+- `poetry.lock` (Python PyPI)
+- `package-lock.json` (JavaScript/TypeScript NPM)
+- `yarn.lock` (JavaScript/TypeScript NPM)
+- `Gemfile.lock` (Ruby Rubygems/Bundler)
 
 ### Requirements
 - active Phylum account ([Register here](https://app.phylum.io/auth/registration))
-- GitHub repository secret defined: PHYLUM_TOKEN (extracted from Phylum CLI configuration file "offline_access")
-  1. Ensure you've updated the Phylum CLI on a local installation to a version >= `1.2.0`
-  2. Successfully authenticate using Phylum CLI. This will ensure the token is populated in the phylum config file `~/.phylum/settings.yaml` in stanza `offline_access`
-  3. Copy the token value in the `offline_access` stanza
-  4. Create a new GitHub secret in the desired repository. This can be done through the GitHub web UI or using the gh command line tool: `gh secret set PHYLUM_TOKEN -b `
-  5. Optionally, you can remove the vestigial `PHYLUM_USER` and `PHYLUM_PASS` GitHub secrets as they are no longer used.
-- concrete package versions (only applicable for requirements.txt)
+- GitHub repository secret defined: `PHYLUM_TOKEN`
+  1. Ensure you've updated the Phylum CLI on a local installation to a version >= `2.0.1`
+  2. Successfully authenticate using the Phylum CLI to ensure the token is populated and correct
+  3. Copy the token value from the output of the `phylum auth token` command
+  4. Create a new GitHub secret named `PHYLUM_TOKEN` in the desired repository, through the GitHub web UI or using the gh command line tool: `gh secret set PHYLUM_TOKEN -b `
+- concrete package versions (only applicable for `requirements.txt`)
 - existing Phylum project for repository (`.phylum_project` must be present)
 
 ### Known Issues
-~~1. Incomplete packages: if Phylum hasn't yet analyzed a package requested by this action, the action will fail with an exit code of 5. This is momentarily preferable than waiting.~~
+- [Issue tracker](https://github.com/phylum-dev/phylum-analyze-pr-action/issues)
+- [Open bugs](https://github.com/phylum-dev/phylum-analyze-pr-action/labels/%F0%9F%95%B7%EF%B8%8F%20bug)
 
 ### Incomplete Packages
-Sometimes, users will request risk analysis information for open-source packages Phylum has not yet processed. When this occurs, Phylum cannot reasonably provide risk scoring information until those packages have been processed.
+Sometimes, users will request risk analysis information for open-source packages Phylum has not yet processed.
+When this occurs, Phylum cannot reasonably provide risk scoring information until those packages have been processed.
 
-New in `v1.4`, `phylum-analyze-pr-action` will:
+Starting with `v1.4.0`, `phylum-analyze-pr-action` will:
 1. Detect the case of incomplete packages
-2. Return an exit code of 0 (a "passing" mark in GitHub Action parlance). This is to avoid failing a check in the PR with incomplete information.
-3. Add a comment to the PR indicating that there were incomplete packages. The comment will advise users to wait 30m and re-run the check on the Pull Request. This will give Phylum sufficient time to download, process and analyze the incomplete packages.
-4. When the check is run a second time, another comment will be added to the Pull Request noting the result of the risk analysis operation.
+2. Return an exit code of 0 (a "passing" mark in GitHub Actions parlance)
+   1. This is to avoid failing a check in the PR with incomplete information
+3. Add a comment to the PR indicating that there were incomplete packages
+   1. The comment will advise users to wait 30m and re-run the check on the Pull Request
+   2. This will give Phylum sufficient time to download, process, and analyze the incomplete packages
+4. When the check is run a second time, another comment will be added to the Pull Request noting the result of the
+   risk analysis operation.
 ### Example comment
 ![image](https://user-images.githubusercontent.com/132468/140830714-24acc278-0102-4613-b006-6032a62b6896.png)
-
diff --git a/action.yml b/action.yml
index bb60ffa..4eb66f8 100644
--- a/action.yml
+++ b/action.yml
@@ -1,9 +1,10 @@
-#                 _
-#  __ _ _ __   __ _| |_   _ _______
-# / _` | '_ \ / _` | | | | |_ / _ \
-#| (_| | | | | (_| | | |_| |/ / __/
-# \__,_|_| |_|\__,_|_|\__, /___\___|
-#                     |___/
+#                  _
+#   __ _ _ __   __ _| |_   _ _______
+#  / _` | '_ \ / _` | | | | |_ / _ \
+# | (_| | | | | (_| | | |_| |/ / __/
+#  \__,_|_| |_|\__,_|_|\__, /___\___|
+#                      |___/
+---
 name: 'Analyze PR'
 description: 'Analyze Pull request'
 inputs:
@@ -42,27 +43,27 @@ inputs:
 runs:
   using: "composite"
   steps:
-    - id: phylum-test
-      uses: phylum-dev/install-phylum-latest-action@v1.3
+    - name: Install phylum CLI
+      uses: phylum-dev/install-phylum-latest-action@v1
       with:
         phylum_token: ${{ inputs.phylum_token }}
        phylum_version: ${{ inputs.phylum_version }}
 
-    - name: Check for previous comment
+    - name: Check for INCOMPLETE comment
       uses: peter-evans/find-comment@v1
       id: fc
       with:
        issue-number: ${{ github.event.pull_request.number }}
        body-includes: INCOMPLETE
 
-    - name: Store result of id=fc in environment
+    - name: Record presence of Phylum INCOMPLETE comment
      shell: bash
      if: "contains(steps.fc.outputs.comment-body, 'Phylum')"
      run: |
        echo "storing PREVIOUS_INCOMPLETE"
        echo PREVIOUS_INCOMPLETE=1 >> $GITHUB_ENV
 
-    - name: Check for existing project
+    - name: Check for existing .phylum_project
      shell: bash
      run: |
        result=$(find . -maxdepth 1 -iname ".phylum_project")
@@ -73,29 +74,31 @@ runs:
          exit 11
        fi
 
-    - name: Generate Phylum label
+    - name: Generate PHYLUM_LABEL
      shell: bash
-      run: |
-        echo PHYLUM_LABEL="GHA-PR${{ github.event.number }}-${GITHUB_HEAD_REF}" >> $GITHUB_ENV
+      run: echo PHYLUM_LABEL="GHA-PR${{ github.event.number }}-${GITHUB_HEAD_REF}" >> $GITHUB_ENV
 
-    - uses: actions/setup-python@v2
+    - name: Setup Python
+      uses: actions/setup-python@v3
      with:
        python-version: '3.x'
 
-    - name: install python dependencies
+    - name: Install python dependencies
      shell: bash
      run: |
-        pip install unidiff
+        python -m pip install -U pip setuptools
+        python -m pip install unidiff packaging
 
-    - name: run analyze.py pr_type
+    - name: Determine the PR type
      shell: bash
      run: python $GITHUB_ACTION_PATH/analyze.py "pr_type" $GITHUB_REPOSITORY ${{ github.event.number }}
 
-    - name: cat prtype
+    - name: Display the PR type
      shell: bash
      run: cat ~/prtype.txt
 
-    - id: get-prtype
+    - name: Make PR type available for future steps
+      id: get-prtype
      shell: bash
      run: |
        ret="$(cat ~/prtype.txt)"
@@ -104,42 +107,45 @@ runs:
        ret="${ret//$'\r'/'%0A'}"
        echo "::set-output name=prtype::$ret"
 
-    - id: should-proceed
+    - name: Bail when no updates to analyze
+      id: should-proceed
      shell: bash
      if: "contains(steps.get-prtype.outputs.prtype, 'NA')"
      run: |
        echo 'exiting with 0 since package dependency files were not modified'
        echo '0' > $HOME/returncode.txt
 
-    - name: Analyze project lockfile
+    - name: Analyze project lockfile with phylum CLI
      shell: bash
      if: "!contains(steps.get-prtype.outputs.prtype, 'NA')"
      run: |
        export PATH="$HOME/.phylum:$PATH"
        pushd $GITHUB_WORKSPACE || exit 11
-        phylum analyze -l $PHYLUM_LABEL ${{ steps.get-prtype.outputs.prtype }} --verbose --json > ~/phylum_analysis.json
-        echo "[*] Analyzed ${{ steps.get-prtype.outputs.prtype }} under label ${PHYLUM_LABEL} and wrote results to ~/phylum_analysis.json"
+        phylum analyze -l $PHYLUM_LABEL ${{ steps.get-prtype.outputs.prtype }} --verbose --json > ~/phylum_analysis.json
+        echo "[*] Analyzed ${{ steps.get-prtype.outputs.prtype }} under label ${PHYLUM_LABEL} and wrote results to ~/phylum_analysis.json"
        popd
 
-
-    # - name: tmate
-    #   uses: mxschmitt/action-tmate@v3
-
-    - name: invoke test matrix
+    - name: Invoke test matrix
      shell: bash
      if: "contains(inputs.invoke_test_matrix, 'true')"
-      run: |
-        python $GITHUB_ACTION_PATH/test_matrix.py
+      run: python $GITHUB_ACTION_PATH/test_matrix.py
 
-    - name: python script analyze.py
+    - name: Compare added dependencies in PR to analysis results
      shell: bash
      if: "!contains(steps.get-prtype.outputs.prtype, 'NA')"
-      run: python $GITHUB_ACTION_PATH/analyze.py "analyze" $GITHUB_REPOSITORY ${{ github.event.number }} ${{ inputs.vul_threshold }} ${{ inputs.mal_threshold }} ${{ inputs.eng_threshold }} ${{ inputs.lic_threshold }} ${{ inputs.aut_threshold }}
-
-    # - name: tmate
-    #   uses: mxschmitt/action-tmate@v3
-
-    - id: get-returncode
+      run: >
+        python $GITHUB_ACTION_PATH/analyze.py
+        "analyze"
+        $GITHUB_REPOSITORY
+        ${{ github.event.number }}
+        ${{ inputs.vul_threshold }}
+        ${{ inputs.mal_threshold }}
+        ${{ inputs.eng_threshold }}
+        ${{ inputs.lic_threshold }}
+        ${{ inputs.aut_threshold }}
+
+    - name: Get return code
+      id: get-returncode
      shell: bash
      run: |
        ret="$(cat ~/returncode.txt)"
@@ -149,14 +155,15 @@ runs:
        echo "::set-output name=ret::$ret"
 
    # This will catch SUCCESS cases
-    - name: return 0 for success
+    - name: Return 0 for success
      shell: bash
      if: "contains(steps.get-returncode.outputs.ret, '0')"
      run: |
-        echo 'exiting with 0 for success'
+        echo "exiting with 0 for success"
        exit 0
 
-    - id: get-comment-body
+    - name: Get comment body
+      id: get-comment-body
      # this will have to check for 1 or 5 AND if on the second run
      # if: "contains(steps.get-returncode.outputs.ret, '1')"
      if: "steps.get-returncode.outputs.ret > 0"
@@ -171,15 +178,15 @@ runs:
 
    - name: Set comment
      # This will have to check for 1 or 5
      # Could check for > 0 ?
-      #if: "contains(steps.get-returncode.outputs.ret, '1')"
+      # if: "contains(steps.get-returncode.outputs.ret, '1')"
      if: "steps.get-returncode.outputs.ret > 0"
-      uses: peter-evans/create-or-update-comment@v1
+      uses: peter-evans/create-or-update-comment@v2
      with:
        issue-number: ${{ github.event.pull_request.number }}
        body: ${{ steps.get-comment-body.outputs.body }}
 
    # This will catch INCOMPLETE and COMPLETE_SUCCESS
-    - name: handle ret values of 4 or 5
+    - name: Handle return values of 4 or 5
      shell: bash
      if: "steps.get-returncode.outputs.ret >= 4"
      run: |
@@ -187,10 +194,9 @@ runs:
        exit 0
 
    # This will catch FAILURE and COMPLETE_FAILURE
-    - name: return 1 for risk analysis failure
+    - name: Handle risk analysis failures
      shell: bash
      if: "contains(steps.get-returncode.outputs.ret, '1')"
      run: |
-        echo 'exiting with 1 for risk analysis failure'
+        echo "exiting with 1 for risk analysis failure"
        exit 1
-
diff --git a/analyze.py b/analyze.py
index 981f52a..dc5add1 100644
--- a/analyze.py
+++ b/analyze.py
@@ -1,17 +1,29 @@
 #!/usr/bin/env python3
-import os
-import sys
+"""Analyze a GitHub PR with Phylum.
+
+States on returncode:
+0 = No comment
+1 = FAILED_COMMENT
+5 = INCOMPLETE_COMMENT then:
+    4 = COMPLETE_SUCCESS_COMMENT
+    1 = COMPLETE_FAILED_COMMENT
+"""
 import json
-import re
-from unidiff import PatchSet
+import os
 import pathlib
+import re
+import sys
 from subprocess import run
+
+from packaging.utils import parse_sdist_filename, parse_wheel_filename
+from unidiff import PatchSet
+
 import parse_yarn
 
 ENV_KEYS = [
-    "GITHUB_SHA", # for get_PR_diff; this is the SHA of the commit for the branch being merged
-    "GITHUB_BASE_REF", # for get_PR_diff; this is the target branch of the merge
-    "GITHUB_WORKSPACE", # for get_PR_diff; this is where the Pull Request code base is
+    "GITHUB_SHA",  # for get_PR_diff; this is the SHA of the commit for the branch being merged
+    "GITHUB_BASE_REF",  # for get_PR_diff; this is the target branch of the merge
+    "GITHUB_WORKSPACE",  # for get_PR_diff; this is where the Pull Request code base is
 ]
 
 FILE_PATHS = {
@@ -21,15 +33,6 @@
     "pr_comment": "/home/runner/pr_comment.txt",
 }
 
-'''
-    States on returncode
-    0 = No comment
-    1 = FAILED_COMMENT
-    5 = INCOMPLETE_COMMENT then:
-        4 = COMPLETE_SUCCESS_COMMENT
-        1 = COMPLETE_FAILED_COMMENT
-'''
-
 # Headers for distinct comment types
 DETAILS_DROPDOWN = "<details>\n<summary>Background</summary>\n<br />\nThis repository uses a GitHub Action to automatically analyze the risk of new dependencies added via Pull Request. An administrator of this repository has set score requirements for Phylum's five risk domains.<br /><br />\nIf you see this comment, one or more dependencies added to the package manager lockfile in this Pull Request have failed Phylum's risk analysis.\n</details>\n\n"
@@ -46,11 +49,10 @@
 COMPLETE_SUCCESS_COMMENT += DETAILS_DROPDOWN
 
 FAILED_COMMENT = "## Phylum OSS Supply Chain Risk Analysis\n\n"
-FAILED_COMMENT +=DETAILS_DROPDOWN
+FAILED_COMMENT += DETAILS_DROPDOWN
 
-
-class AnalyzePRForReqs():
+class AnalyzePRForReqs:
     def __init__(self, repo, pr_num, vul, mal, eng, lic, aut):
         self.repo = repo
         self.pr_num = pr_num
@@ -66,21 +68,21 @@ def __init__(self, repo, pr_num, vul, mal, eng, lic, aut):
         self.env = dict()
         self.get_env_vars()
 
-
     def get_env_vars(self):
         for key in ENV_KEYS:
             temp = os.environ.get(key)
             if temp is not None:
                 self.env[key] = temp
             else:
-                print(f"[ERROR] could not get value for required env variable os.environ.get({key})")
+                print(
+                    f"[ERROR] could not get value for required env variable os.environ.get({key})"
+                )
                 sys.exit(11)
         if os.environ.get("PREVIOUS_INCOMPLETE"):
             self.previous_incomplete = True
         return
 
-    def new_get_PR_diff(self):
-        pr_commit_sha = self.env.get("GITHUB_SHA")
+    def new_get_pr_diff(self):
         target_branch = self.env.get("GITHUB_BASE_REF")
         diff_target = f"origin/{target_branch}"
 
@@ -90,7 +92,7 @@ def new_get_PR_diff(self):
 
         git_fetch_res = run("git fetch origin".split(" "))
         if git_fetch_res.returncode != 0:
-            print(f"[ERROR] failed to git fetch origin")
+            print("[ERROR] failed to git fetch origin")
             sys.exit(11)
 
         cmd = [
@@ -100,68 +102,79 @@ def new_get_PR_diff(self):
         ]
         result = run(cmd, capture_output=True)
         if result.returncode != 0:
-            print(f"[ERROR] failed to git diff")
+            print("[ERROR] failed to git diff")
             sys.exit(11)
 
         os.chdir(prev)
         return result.stdout
 
-
-    ''' Determine which changes are present in the diff.
-        If more than one package manifest file has been changed, fail as we can't be sure which Phylum project to analyze against '''
     def determine_pr_type(self, diff_data):
-        patches = PatchSet(diff_data.decode('utf-8'))
-        '''
-        Types = [
-            requirements.txt,
-            yarn.lock,
-            package-lock.json,
-            poetry.lock, #?
-        ]
-        '''
+        """Determine which changes are present in the diff.
+
+        If more than one package manifest file has been changed, fail as we can't be sure which Phylum project to
+        analyze against. Supported package dependency / lock files include:
+
+        * Python
+          * requirements.txt
+          * poetry.lock
+        * Javascript
+          * yarn.lock
+          * package-lock.json
+        * Ruby
+          * Gemfile.lock
+        """
+        patches = PatchSet(diff_data.decode("utf-8"))
         pr_type = None
-        lang = None
-        conflict = False
-        changes = list()
 
         for patchfile in patches:
             # TODO: add poetry.lock
-            if 'requirements.txt' in patchfile.path:
+            if "requirements.txt" in patchfile.path:
                 if not pr_type:
-                    pr_type = 'requirements.txt'
-                    lang = 'python'
+                    pr_type = "requirements.txt"
                 else:
-                    if pr_type != 'requirements.txt':
-                        print(f"[ERROR] PR contains changes from mulitple packaging systems - cannot determine changeset")
-            if 'yarn.lock' in patchfile.path:
+                    if pr_type != "requirements.txt":
+                        print(
+                            "[ERROR] PR contains changes from multiple packaging systems - cannot determine changeset"
+                        )
+            if "poetry.lock" in patchfile.path:
                 if not pr_type:
-                    pr_type = 'yarn.lock'
-                    lang = 'javascript'
+                    pr_type = "poetry.lock"
                 else:
-                    if pr_type != 'yarn.lock':
-                        print(f"[ERROR] PR contains changes from mulitple packaging systems - cannot determine changeset")
-            if 'package-lock.json' in patchfile.path:
+                    if pr_type != "poetry.lock":
+                        print(
+                            "[ERROR] PR contains changes from multiple packaging systems - cannot determine changeset"
+                        )
+            if "yarn.lock" in patchfile.path:
                 if not pr_type:
-                    pr_type = 'package-lock.json'
-                    lang = 'javascript'
+                    pr_type = "yarn.lock"
                 else:
-                    if pr_type != 'package-lock.json':
-                        print(f"[ERROR] PR contains changes from mulitple packaging systems - cannot determine changeset")
-            if 'Gemfile.lock' in patchfile.path:
+                    if pr_type != "yarn.lock":
+                        print(
+                            "[ERROR] PR contains changes from multiple packaging systems - cannot determine changeset"
+                        )
+            if "package-lock.json" in patchfile.path:
                 if not pr_type:
-                    pr_type = 'Gemfile.lock'
-                    lang = 'ruby'
+                    pr_type = "package-lock.json"
                 else:
-                    if pr_type != 'Gemfile.lock':
-                        print(f"[ERROR] PR contains changes from mulitple packaging systems - cannot determine changeset")
+                    if pr_type != "package-lock.json":
+                        print(
+                            "[ERROR] PR contains changes from multiple packaging systems - cannot determine changeset"
+                        )
+            if "Gemfile.lock" in patchfile.path:
+                if not pr_type:
+                    pr_type = "Gemfile.lock"
+                else:
+                    if pr_type != "Gemfile.lock":
+                        print(
+                            "[ERROR] PR contains changes from multiple packaging systems - cannot determine changeset"
+                        )
 
         print(f"[DEBUG] pr_type: {pr_type}")
         return pr_type
 
-
-    ''' Build a list of changes from diff hunks based on the PR_TYPE '''
     def get_diff_hunks(self, diff_data, pr_type):
-        patches = PatchSet(diff_data.decode('utf-8'))
+        """Build a list of changes from diff hunks based on the PR_TYPE."""
+        patches = PatchSet(diff_data.decode("utf-8"))
         changes = list()
 
         for patchfile in patches:
@@ -173,43 +186,42 @@ def get_diff_hunks(self, diff_data, pr_type):
         print(f"[DEBUG] get_reqs_hunks: found {len(changes)} changes for {pr_type}")
         return changes
 
-    ''' Parse package-lock.json diff to generate a list of tuples of (package_name, version) '''
     def parse_package_lock(self, changes):
+        """Parse package-lock.json diff to generate a list of tuples of (package_name, version)."""
         cur = 0
-        name_pat     = re.compile(r".*\"(.*?)\": \{")
-        version_pat  = re.compile(r".*\"version\": \"(.*?)\"")
-        resolved_pat = re.compile(r".*\"resolved\": \"(.*?)\"")
+        name_pat = re.compile(r".*\"(.*?)\": \{")
+        version_pat = re.compile(r".*\"version\": \"(.*?)\"")
+        resolved_pat = re.compile(r".*\"resolved\": \"(.*?)\"")
         pkg_ver = list()
 
-        while cur < len(changes)-2:
+        while cur < len(changes) - 2:
             name_match = re.match(name_pat, changes[cur])
-            if version_match := re.match(version_pat, changes[cur+1]):
-                if resolved_match := re.match(resolved_pat, changes[cur+2]):
+            if version_match := re.match(version_pat, changes[cur + 1]):
+                if resolved_match := re.match(resolved_pat, changes[cur + 2]):
                     name = name_match.groups()[0]
                     ver = version_match.groups()[0]
-                    pkg_ver.append((name,ver))
-            cur +=1
+                    pkg_ver.append((name, ver))
+            cur += 1
 
         print(f"[DEBUG]: pkg_ver length: {len(pkg_ver)}")
         return pkg_ver
 
-    ''' Parse yarn.lock diff to generate a list of tuples of (package_name, version) '''
     def parse_yarn_lock(self, changes):
+        """Parse yarn.lock diff to generate a list of tuples of (package_name, version)."""
         pkg_ver = parse_yarn.parse_yarn_lock_changes(changes)
         print(f"[DEBUG]: pkg_ver length: {len(pkg_ver)}")
         return pkg_ver
 
     def parse_gemfile_lock(self, changes):
         cur = 0
-        name_ver_pat = re.compile(r"\s{4}(.*?)\ \((.*?)\)")
+        name_ver_pat = re.compile(r"\s{4}(.*?)\ \((.*?)\)")
         pkg_ver = list()
 
         while cur < len(changes):
             if name_ver_match := re.match(name_ver_pat, changes[cur]):
                 name = name_ver_match.groups()[0]
                 ver = name_ver_match.groups()[1]
-                pkg_ver.append((name,ver))
+                pkg_ver.append((name, ver))
             cur += 1
 
         print(f"[DEBUG]: pkg_ver length: {len(pkg_ver)}")
@@ -224,110 +236,131 @@ def parse_requirements_txt(self, changes):
             if name_ver_match := re.match(name_ver_pat, changes[cur]):
                 name = name_ver_match.groups()[0]
                 ver = name_ver_match.groups()[1]
-                pkg_ver.append((name,ver))
+                pkg_ver.append((name, ver))
             cur += 1
 
         print(f"[DEBUG]: pkg_ver length: {len(pkg_ver)}")
         return pkg_ver
 
+    def parse_poetry_lock(self, changes):
+        """Parse lines added to a poetry.lock file to identify package names and versions."""
+        file_name_pat = re.compile(
+            r"""^       # match beginning of string
+            \s{4}       # start with four spaces
+            {file       # start of the mapping for a file
+            \s=\s       # whitespace separated mapping assignment operator
+            "(.*?)"     # non-greedy capture group for the file name
+            """,
+            re.VERBOSE,
+        )
+        pkg_ver = set()
+
+        for change in changes:
+            if pattern_match := re.match(file_name_pat, change):
+                filename = pattern_match.groups()[0]
+                if filename.endswith(".tar.gz"):
+                    name, ver = parse_sdist_filename(filename)
+                    pkg_ver.add((name, str(ver)))
+                elif filename.endswith(".whl"):
+                    name, ver, *_ = parse_wheel_filename(filename)
+                    pkg_ver.add((name, str(ver)))
+
+        print(f"[DEBUG]: pkg_ver length: {len(pkg_ver)}")
+        return list(pkg_ver)
 
-    ''' Parse requirements.txt to generate a list of tuples of (package_name, version) '''
     def generate_pkgver(self, changes, pr_type):
-        if pr_type == 'requirements.txt':
-            # pat = re.compile(r"(.*)==(.*)")
-            pkg_ver_tup = self.parse_requirements_txt(changes)
-            return pkg_ver_tup
-        elif pr_type == 'yarn.lock':
-            pkg_ver_tup = self.parse_yarn_lock(changes)
-            return pkg_ver_tup
-        elif pr_type == 'package-lock.json':
-            pkg_ver_tup = self.parse_package_lock(changes)
-            return pkg_ver_tup
-        elif pr_type == "Gemfile.lock":
-            pkg_ver_tup = self.parse_gemfile_lock(changes)
-            return pkg_ver_tup
-
-        # shouldn't get here
-        return pkg_ver_tup
-
-    ''' Read phylum_analysis.json file '''
+        """Parse dependency file to generate a list of tuples of (package_name, version)."""
+        if pr_type == "requirements.txt":
+            return self.parse_requirements_txt(changes)
+        if pr_type == "poetry.lock":
+            return self.parse_poetry_lock(changes)
+        if pr_type == "yarn.lock":
+            return self.parse_yarn_lock(changes)
+        if pr_type == "package-lock.json":
+            return self.parse_package_lock(changes)
+        if pr_type == "Gemfile.lock":
+            return self.parse_gemfile_lock(changes)
+        return None
+
     def read_phylum_analysis(self, filename):
+        """Read phylum_analysis.json file."""
         if not pathlib.Path(filename).is_file():
             print(f"[ERROR] Cannot find {filename}")
             sys.exit(11)
-        with open(filename,'r') as infile:
+        with open(filename, "r", encoding="utf-8") as infile:
             data = infile.read()
             phylum_analysis_json = json.loads(data)
         print(f"[DEBUG] phylum_analysis: read {len(data)} bytes")
         return phylum_analysis_json
 
-    ''' Parse risk packages in phylum_analysis.json
-        Ensure packages are in "complete" state; If not, fail
-        Call check_risk_scores on individual package data '''
     def parse_risk_data(self, phylum_json, pkg_ver):
-        phylum_pkgs = phylum_json.get('packages')
+        """Parse risk packages in phylum_analysis.json file.
+
+        Packages that are in a completed analysis state will be included in the risk score report.
+        Packages that have not completed analysis will be included with other incomplete packages
+        and the overall PR will be allowed to pass, but with a note about re-running again later.
+        """
+        phylum_pkgs = phylum_json.get("packages")
         risk_scores = list()
-        for pkg,ver in pkg_ver:
-            for elem in phylum_pkgs:
-                if elem.get('name') == pkg and elem.get('version') == ver:
-                    if elem.get('status') == 'complete':
-                        risk_scores.append(self.check_risk_scores(elem))
-                    elif elem.get('status') == 'incomplete':
-                        self.incomplete_pkgs.append((pkg,ver))
+        for pkg, ver in pkg_ver:
+            for phylum_pkg in phylum_pkgs:
+                if phylum_pkg.get("name") == pkg and phylum_pkg.get("version") == ver:
+                    if phylum_pkg.get("status") == "complete":
+                        risk_scores.append(self.check_risk_scores(phylum_pkg))
+                    elif phylum_pkg.get("status") == "incomplete":
+                        self.incomplete_pkgs.append((pkg, ver))
                         self.gbl_incomplete = True
 
         return risk_scores
 
-    ''' Check risk scores of a package against user-provided thresholds
-        If a package has a risk score below the threshold, set the fail bit and
-        Generate the markdown output for pr_comment.txt '''
     def check_risk_scores(self, package_json):
-        riskvectors = package_json.get('riskVectors')
+        """Check risk scores of a package against user-provided thresholds.
+
+        If a package has a risk score below the threshold, set the fail bit and
+        generate the markdown output for pr_comment.txt.
+        """
+        riskvectors = package_json.get("riskVectors")
         failed_flag = 0
-        vuln_flag = 0
         issue_flags = list()
         fail_string = f"### Package: `{package_json.get('name')}@{package_json.get('version')}` failed.\n"
-        fail_string += f"|Risk Domain|Identified Score|Requirement|\n"
-        fail_string += f"|-----------|----------------|-----------|\n"
-
-
-        pkg_vul = riskvectors.get('vulnerability')
-        pkg_mal = riskvectors.get('malicious_code')
-        pkg_eng = riskvectors.get('engineering')
-        pkg_lic = riskvectors.get('license')
-        pkg_aut = riskvectors.get('author')
+        fail_string += "|Risk Domain|Identified Score|Requirement|\n"
+        fail_string += "|-----------|----------------|-----------|\n"
+
+        pkg_vul = riskvectors.get("vulnerability")
+        pkg_mal = riskvectors.get("malicious_code")
+        pkg_eng = riskvectors.get("engineering")
+        pkg_lic = riskvectors.get("license")
+        pkg_aut = riskvectors.get("author")
 
         if pkg_vul <= self.vul:
             failed_flag = 1
-            vuln_flag = 1
-            issue_flags.append('vul')
+            issue_flags.append("vul")
             fail_string += f"|Software Vulnerability|{pkg_vul*100}|{self.vul*100}|\n"
         if pkg_mal <= self.mal:
             failed_flag = 1
-            issue_flags.append('mal')
+            issue_flags.append("mal")
             fail_string += f"|Malicious Code|{pkg_mal*100}|{self.mal*100}|\n"
         if pkg_eng <= self.eng:
             failed_flag = 1
-            issue_flags.append('eng')
+            issue_flags.append("eng")
             fail_string += f"|Engineering|{pkg_eng*100}|{self.eng*100}|\n"
         if pkg_lic <= self.lic:
             failed_flag = 1
-            issue_flags.append('lic')
+            issue_flags.append("lic")
             fail_string += f"|License|{pkg_lic*100}|{self.lic*100}|\n"
        if pkg_aut <= self.aut:
            failed_flag = 1
-            issue_flags.append('aut')
+            issue_flags.append("aut")
            fail_string += f"|Author|{pkg_aut*100}|{self.aut*100}|\n"
 
        fail_string += "\n"
        fail_string += "#### Issues Summary\n"
-        fail_string += f"|Risk Domain|Risk Level|Title|\n"
-        fail_string += f"|-----------|----------|-----|\n"
+        fail_string += "|Risk Domain|Risk Level|Title|\n"
+        fail_string += "|-----------|----------|-----|\n"
         issue_list = self.build_issues_list(package_json, issue_flags)
-        for rd,rl,title in issue_list:
+        for rd, rl, title in issue_list:
             fail_string += f"|{rd}|{rl}|{title}|\n"
 
-        # return fail_string if failed_flag else None
         if failed_flag:
             self.gbl_failed = True
             return fail_string
@@ -338,7 +371,6 @@ def build_issues_list(self, package_json, issue_flags: list):
         issues = list()
         pkg_issues = package_json.get("issues")
 
-
         for flag in issue_flags:
             for pkg_issue in pkg_issues:
                 if flag in pkg_issue.get("risk_domain"):
@@ -349,24 +381,22 @@ def build_issues_list(self, package_json, issue_flags: list):
 
         return issues
 
-
     def get_project_url(self, phylum_json):
         project_id = phylum_json.get("project")
         url = f"https://app.phylum.io/projects/{project_id}"
         return url
 
     def run_prtype(self):
-        diff_data = self.new_get_PR_diff()
+        diff_data = self.new_get_pr_diff()
         pr_type = self.determine_pr_type(diff_data)
         if pr_type is None:
             pr_type = "NA"
-        # with open('/home/runner/prtype.txt','w') as outfile:
-        with open(FILE_PATHS.get("pr_type"),'w') as outfile:
+        with open(FILE_PATHS.get("pr_type"), "w", encoding="utf-8") as outfile:
             outfile.write(pr_type)
         sys.exit(0)
 
     def run_analyze(self):
-        diff_data = self.new_get_PR_diff()
+        diff_data = self.new_get_pr_diff()
         pr_type = self.determine_pr_type(diff_data)
         changes = self.get_diff_hunks(diff_data, pr_type)
         pkg_ver = self.generate_pkgver(changes, pr_type)
@@ -376,11 +406,12 @@ def run_analyze(self):
         returncode = 0
         output = ""
 
-        # Write pr_comment.txt only if the analysis failed and all pkgvers are completed(self.gbl_result == 1)
-        if self.gbl_failed == True and self.gbl_incomplete == False:
+        # Write pr_comment.txt only if the analysis failed and all pkgvers are completed
+        if self.gbl_failed and not self.gbl_incomplete:
             returncode = 1
-            # if this is a repeated test of previously incomplete packages, set the comment based on states of failed, not incomplete and previous
-            if self.previous_incomplete == True:
+            # if this is a repeated test of previously incomplete packages,
+            # set the comment based on states of failed, not incomplete and previous
+            if self.previous_incomplete:
                 output = COMPLETE_FAILED_COMMENT
             else:
                 output = FAILED_COMMENT
@@ -391,30 +422,35 @@ def run_analyze(self):
                 output += line
 
         # If any packages are incomplete, add 5 to the returncode so we know the results are incomplete
-        if self.gbl_incomplete == True:
+        if self.gbl_incomplete:
             returncode = 5
-            print(f"[DEBUG] {len(self.incomplete_pkgs)} packages were incomplete as of the analysis job")
-            output = INCOMPLETE_COMMENT.replace("TKTK",str(len(self.incomplete_pkgs)))
+            print(
+                f"[DEBUG] {len(self.incomplete_pkgs)} packages were incomplete as of the analysis job"
+            )
+            output = INCOMPLETE_COMMENT.replace("TKTK", str(len(self.incomplete_pkgs)))
 
-        if self.gbl_failed == False and self.gbl_incomplete == False and self.previous_incomplete == True:
+        if not self.gbl_failed and not self.gbl_incomplete and self.previous_incomplete:
             returncode = 4
-            print(f"[DEBUG] failed=False incomplete=False previous_incomplete=True")
+            print("[DEBUG] failed=False incomplete=False previous_incomplete=True")
             output = COMPLETE_SUCCESS_COMMENT
 
-        with open(FILE_PATHS.get("returncode"),'w') as resultout:
+        with open(FILE_PATHS.get("returncode"), "w", encoding="utf-8") as resultout:
            resultout.write(str(returncode))
            print(f"[DEBUG] returncode: wrote {str(returncode)}")
 
-        with open(FILE_PATHS.get("pr_comment"),'w') as outfile:
+        with open(FILE_PATHS.get("pr_comment"), "w", encoding="utf-8") as outfile:
             outfile.write(output)
             outfile.write(f"\n[View this project in Phylum UI]({project_url})")
             print(f"[DEBUG] pr_comment.txt: wrote {outfile.tell()} bytes")
 
+
 if __name__ == "__main__":
     argv = sys.argv
     if argc := len(sys.argv) < 4:
-        print(f"Usage: {argv[0]} ACTION:(analyze|pr_type) GITHUB_REPOSITORY PR_NUM VUL_THRESHOLD MAL_THRESHOLD ENG_THRESHOLD LIC_THRESHOLD AUT_THRESHOLD")
+        print(
+            f"Usage: {argv[0]} ACTION:(analyze|pr_type) GITHUB_REPOSITORY PR_NUM VUL_THRESHOLD MAL_THRESHOLD ENG_THRESHOLD LIC_THRESHOLD AUT_THRESHOLD"
+        )
         sys.exit(11)
 
     action = argv[1]
diff --git a/parse_yarn.py b/parse_yarn.py
index 9d097a9..0e45333 100644
--- a/parse_yarn.py
+++ b/parse_yarn.py
@@ -1,88 +1,90 @@
 #!/usr/bin/env python3
-import io
 import re
 import sys
 from pathlib import Path
 
+
 def search(pkg_ver, search):
-    for a,b in pkg_ver:
+    for a, b in pkg_ver:
         if search in a:
             print(f"({a},{b})")
 
+
 def parse_yarnpkg(line: str):
     ret_str = ""
     # resolved "https://registry.yarnpkg.com/window-size/-/window-size-0.1.0.tgz#5438cd2ea93b202efa3a19fe8887aee7c94f9c9d"
     if "yarnpkg.com" in line:
-        line = line.replace("https://registry.yarnpkg.com/","")
+        line = line.replace("https://registry.yarnpkg.com/", "")
         yarnpkg_match = re.match(r"(.*?)(?=/)", line)
         ret_str = yarnpkg_match.group()
     # resolved "https://registry.npmjs.org/@types/styled-jsx/-/styled-jsx-2.2.8.tgz#b50d13d8a3c34036282d65194554cf186bab7234"
     elif "npmjs" in line:
-        line = re.sub(r'https://registry.npmjs....','',line)
+        line = re.sub(r"https://registry.npmjs....", "", line)
         npmpkg_match = re.match(r"(.*?)(?=/)", line)
         ret_str = npmpkg_match.group()
-    else: #should only be npm
+    else:  # should only be npm
         print("[ERROR] yarn_parse:parse_yarnpkg found a registry link that is unknown")
     return ret_str
 
+
 def parse_yarnv2_lock(changes):
     cur = 0
-    name_pat      = re.compile(r"[\"]?(@?.*?)(?=@).*:")
-    version_pat   = re.compile(r".*version: (.*)")
-    resolved_pat  = re.compile(r".*resolution: \"(.*?)\"")
-    integrity_pat = re.compile(r".*checksum.*")
+    name_pat = re.compile(r"[\"]?(@?.*?)(?=@).*:")
+    version_pat = re.compile(r".*version: (.*)")
+    resolved_pat = re.compile(r".*resolution: \"(.*?)\"")
     pkg_ver = list()
 
-    while cur < len(changes)-3:
+    while cur < len(changes) - 3:
         name_match = re.match(name_pat, changes[cur])
-        if version_match := re.match(version_pat, changes[cur+1]):
-            if resolved_match := re.match(resolved_pat, changes[cur+2]):
-                # if integrity_match := re.match(integrity_pat, changes[cur+3]):
+        if version_match := re.match(version_pat, changes[cur + 1]):
+            if resolved_match := re.match(resolved_pat, changes[cur + 2]):
                 if name_match:
                     name = name_match.groups()[0]
                 else:
-                    #print(f"No name - need to parse {resolved_match.groups()[0]}")
+                    # print(f"No name - need to parse {resolved_match.groups()[0]}")
                     name = parse_yarnpkg(resolved_match.groups()[0])
                 ver = version_match.groups()[0]
-                pkg_ver.append((name,ver))
+                pkg_ver.append((name, ver))
         cur += 1
 
     return pkg_ver
 
+
 def parse_yarnv1_lock(changes):
     cur = 0
-    name_pat      = re.compile(r"[\"]?(@?.*?)(?=@).*:")
-    version_pat   = re.compile(r".*version \"(.*?)\"")
-    resolved_pat  = re.compile(r".*resolved \"(.*?)\"")
-    integrity_pat = re.compile(r".*integrity.*")
+    name_pat = re.compile(r"[\"]?(@?.*?)(?=@).*:")
+    version_pat = re.compile(r".*version \"(.*?)\"")
+    resolved_pat = re.compile(r".*resolved \"(.*?)\"")
+    integrity_pat = re.compile(r".*integrity.*")
     pkg_ver = list()
 
     # the parser breaks if the changeset only has 1 package upgraded from one version to another
-    if len(changes) < 4: #only 1 package was upgraded in the changeset that will trigger this
+    if (
+        len(changes) < 4
+    ):  # only 1 package was upgraded in the changeset that will trigger this
         if version_match := re.match(version_pat, changes[cur]):
-            if resolved_match := re.match(resolved_pat, changes[cur+1]):
-                if integrity_match := re.match(integrity_pat, changes[cur+2]):
+            if resolved_match := re.match(resolved_pat, changes[cur + 1]):
+                if integrity_match := re.match(integrity_pat, changes[cur + 2]):
                     name = parse_yarnpkg(resolved_match.groups()[0])
                     ver = version_match.groups()[0]
-                    pkg_ver.append((name,ver))
+                    pkg_ver.append((name, ver))
     else:
-        while cur < len(changes)-3:
+        while cur < len(changes) - 3:
             name_match = re.match(name_pat, changes[cur])
-            if version_match := re.match(version_pat, changes[cur+1]):
-                if resolved_match := re.match(resolved_pat, changes[cur+2]):
re.match(resolved_pat, changes[cur+2]): - if integrity_match := re.match(integrity_pat, changes[cur+3]): + if version_match := re.match(version_pat, changes[cur + 1]): + if resolved_match := re.match(resolved_pat, changes[cur + 2]): + if integrity_match := re.match(integrity_pat, changes[cur + 3]): if name_match: name = name_match.groups()[0] else: - #print(f"No name - need to parse {resolved_match.groups()[0]}") + # print(f"No name - need to parse {resolved_match.groups()[0]}") name = parse_yarnpkg(resolved_match.groups()[0]) ver = version_match.groups()[0] - pkg_ver.append((name,ver)) + pkg_ver.append((name, ver)) cur += 1 return pkg_ver @@ -98,19 +100,20 @@ def parse_yarn_lock_changes(changes): lockfile_version = 2 if lockfile_version == 1: - print(f"Parsed yarn v1 lockfile") + print("Parsed yarn v1 lockfile") return parse_yarnv1_lock(changes) else: - print(f"Parsed yarn v2 lockfile") + print("Parsed yarn v2 lockfile") return parse_yarnv2_lock(changes) -''' Take a file name and call the relevant yarn parser''' + def parse_yarn_lockfile(filename): + """Take a file name and call the relevant yarn parser.""" if not Path(filename).is_file(): print("[ERROR] filename is not a file") sys.exit(1) - with open(filename, 'r') as infile: + with open(filename, "r") as infile: changes = infile.read().splitlines() lockfile_version = 1 candidate_line = "" @@ -122,10 +125,10 @@ def parse_yarn_lockfile(filename): lockfile_version = 2 if lockfile_version == 1: - print(f"Parsed yarn v1 lockfile") + print("Parsed yarn v1 lockfile") return parse_yarnv1_lock(changes) else: - print(f"Parsed yarn v2 lockfile") + print("Parsed yarn v2 lockfile") return parse_yarnv2_lock(changes) diff --git a/test_matrix.py b/test_matrix.py index a8d34eb..640c6d9 100644 --- a/test_matrix.py +++ b/test_matrix.py @@ -1,10 +1,18 @@ #!/usr/bin/env python +"""Test matrix for validating various phylum analysis output results. 
+Return Codes: +0 = FAIL +1 = INCOMPLETE +2 = COMPLETE_FAIL +3 = COMPLETE_SUCCESS +4 = SUCCESS +""" + +import hashlib import os -import sys -from pathlib import Path import shutil -import hashlib +from pathlib import Path ENV_KEYS = [ "GITHUB_RUN_ATTEMPT", @@ -17,17 +25,12 @@ "FAIL_FILE": Path(GHAP + "/testing/fail_phylum.json").resolve(), "INCOMPLETE_FILE": Path(GHAP + "/testing/incomplete_phylum.json").resolve(), "COMPLETE_FAIL_FILE": Path(GHAP + "/testing/complete_fail_phylum.json").resolve(), - "COMPLETE_SUCCESS_FILE": Path(GHAP + "/testing/complete_success_phylum.json").resolve(), + "COMPLETE_SUCCESS_FILE": Path( + GHAP + "/testing/complete_success_phylum.json" + ).resolve(), "SUCCESS_FILE": Path(GHAP + "/testing/success_phylum.json").resolve(), } -''' -0 = FAIL -1 = INCOMPLETE -2 = COMPLETE_FAIL -3 = COMPLETE_SUCCESS -4 = SUCCESS -''' class TestMatrix: def __init__(self): @@ -40,7 +43,7 @@ def get_env_vars(self): if temp is not None: self.env[key] = temp - def swap_phylum_file(self,filename): + def swap_phylum_file(self, filename): file = FILES.get(filename) home = Path.home() dest = home.joinpath("phylum_analysis.json") @@ -50,7 +53,6 @@ def swap_phylum_file(self,filename): md5 = hashlib.md5(open(dest, "rb").read()).hexdigest() print(f"MD5 of target: {md5}") - def run(self): state = int(self.env.get("GITHUB_RUN_ATTEMPT")) % 5 print(f"state: {state}")