Adding New Active Site Source

Steve Teplica edited this page May 2, 2019 · 1 revision

Current Sources of Truth

Currently, the sources of truth for active site information are:

  1. The Catalytic Site Atlas (CSA), from which we use curated_data.csv
  2. Curated data from ProMol

We use a version-locked copy of the CSA file so that upstream format changes cannot break our parser. We recommend version-locking future sources of truth the same way: download the file and add it to the repository.

Adding a New Source of Truth

Moltimate has a pattern for adding new active site parsers. It is built around an interface, ActiveSiteParser.java, under src/main/java/org/moltimate/moltimatebackend/parser/activesite. To add a parser:

  1. Create another class in this package, similar to one of the other custom parsers, which implements this interface
  2. Implement List<ActiveSite> parseMotifs() so that it parses your data file and returns the active sites it finds
  3. Add a new instance of this new parser to the ORDERED_PARSERS list in ActiveSiteUtils.java.
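
The steps above can be sketched roughly as follows. This is a minimal illustration, not Moltimate's actual code: the ActiveSite and ActiveSiteParser types here are simplified stand-ins (only the parseMotifs() signature comes from this page), and ExampleCsvParser with its "pdbId,residue" line format is entirely hypothetical. A real parser would read the version-locked file from the repository instead of an in-memory string.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ExampleParserSketch {

    // Simplified stand-in for Moltimate's ActiveSite (assumption: the real class differs).
    static final class ActiveSite {
        final String pdbId;
        final List<String> residues;

        ActiveSite(String pdbId, List<String> residues) {
            this.pdbId = pdbId;
            this.residues = residues;
        }
    }

    // Stand-in for the ActiveSiteParser interface; only the method signature is from the wiki.
    interface ActiveSiteParser {
        List<ActiveSite> parseMotifs();
    }

    // Hypothetical parser for an invented format: one "pdbId,residue" pair per line.
    static final class ExampleCsvParser implements ActiveSiteParser {
        private final String data;

        ExampleCsvParser(String data) {
            this.data = data;
        }

        @Override
        public List<ActiveSite> parseMotifs() {
            // Group residues by PDB ID, preserving the order in which IDs first appear.
            Map<String, List<String>> residuesByPdbId = new LinkedHashMap<>();
            try (BufferedReader reader = new BufferedReader(new StringReader(data))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] fields = line.trim().split(",");
                    if (fields.length < 2) {
                        continue; // skip blank or malformed lines
                    }
                    residuesByPdbId
                            .computeIfAbsent(fields[0].trim(), key -> new ArrayList<>())
                            .add(fields[1].trim());
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }

            List<ActiveSite> sites = new ArrayList<>();
            for (Map.Entry<String, List<String>> entry : residuesByPdbId.entrySet()) {
                sites.add(new ActiveSite(entry.getKey(), entry.getValue()));
            }
            return sites;
        }
    }
}
```

With a parser like this in place, step 3 would amount to adding `new ExampleCsvParser(...)` to ORDERED_PARSERS at the position that reflects how much you trust the new source relative to the existing ones.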

Active sites are keyed by PDB ID, so two parsers can produce an entry for the same protein. When that happens, the parser that appears earlier in the list takes priority: its active site is kept, and the duplicate from any later parser is discarded.
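
One way to picture this priority rule is a first-writer-wins merge keyed by PDB ID. The sketch below is an assumption about the behavior described here, not the code in ActiveSiteUtils.java; the mergeByPriority helper, the simplified ActiveSite class, and its source field are all hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ParserPrioritySketch {

    // Simplified stand-in for ActiveSite; "source" exists only to show which parser won.
    static final class ActiveSite {
        final String pdbId;
        final String source;

        ActiveSite(String pdbId, String source) {
            this.pdbId = pdbId;
            this.source = source;
        }
    }

    // Stand-in for the ActiveSiteParser interface.
    interface ActiveSiteParser {
        List<ActiveSite> parseMotifs();
    }

    // Hypothetical helper: walks the parsers in list order and keeps the first
    // active site seen for each PDB ID, so earlier parsers have higher priority.
    static List<ActiveSite> mergeByPriority(List<ActiveSiteParser> orderedParsers) {
        Map<String, ActiveSite> byPdbId = new LinkedHashMap<>();
        for (ActiveSiteParser parser : orderedParsers) {
            for (ActiveSite site : parser.parseMotifs()) {
                // putIfAbsent ignores the site if an earlier parser already supplied this ID.
                byPdbId.putIfAbsent(site.pdbId, site);
            }
        }
        return new ArrayList<>(byPdbId.values());
    }
}
```

Under this reading, placing a new parser at the end of ORDERED_PARSERS means it only contributes active sites for PDB IDs that no existing source covers, which is the conservative choice for a source that has not yet been vetted.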