Skip to content

Steps for updating dependency generation for new languages

Vincent Raymond edited this page Mar 15, 2024 · 3 revisions

skema/program_analysis/module_locate.py

extract_imports

Modify the extract_imports function to support extracting the names of imported modules for your language. For Python we use regex on the raw source code, but depending on the particular language you could also use Gromet or tree-sitter.

For example, the following would return ["time", "tree_sitter"]:

import time
from tree_sitter import Tree

Note that we are only returning the module name and not the object or function being imported. This is because module is currently the smallest unit of ingestion for Gromet.

module_locate

Update module_locate function to support locating the source for the modules. This task is more opened ended depending on your particular language. For Python we check:

  1. Python directory for built-in modules
  2. site-packages for installed modules
  3. pypi for all other module

skema/program_analyis/multi_file_ingester.py

clean_dependencies

The clean_dependencies function is used to postprocess the dependency references generated by the module_locate module. Some of these steps will likely be relevant between languages such as removing duplicate dependencies. Others like removing submodules if a parent module is present will only be relevant for certain languages like Python. Additionally, there may need to be further steps added for other languages.

Clone this wiki locally