Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Describe versioned TRS URIs #202

Merged
merged 3 commits into from
Aug 4, 2022
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
76 changes: 69 additions & 7 deletions 2.0.0-Data-Model.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,22 +12,84 @@ We are starting with a read-only API due to potentially different views and appr

Multiple formats for descriptors such as [CWL](https://github.com/common-workflow-language/common-workflow-language) and [WDL](https://github.com/broadinstitute/wdl) are permitted.

#### TRS URIs

For convenience, including potentially when passing content references to a WES server, we define a URI syntax for TRS-accessible content. Strings of the form `trs://<server>/<id>` mean _“you can fetch the content with TRS id `<id>` from the TRS server at `<server>` "_.

For example, if a WES server was asked to process `trs://trs.example.org/123456323`, it would know that it could issue a GET request to `https://trs.example.org/api/ga4gh/trs/v2/tools/123456323` to learn how to fetch that object.
#### TRS Tool and TRS Tool Version IDs

Each implementation of TRS can choose its own identifier scheme, as long as it
follows these guidelines:

* TRS Tool and TRS Tool Version IDs are strings made up of uppercase and
lowercase letters, decimal digits, hyphen, period, underscore and tilde
(`[A-Za-z0-9.-_~]`). See [RFC 3986 §
2.3](https://datatracker.ietf.org/doc/html/rfc3986#section-2.3). TRS Tool and
TRS Tool Version IDs MAY further contain percent characters (`%`) whenever
they are used in API calls, but only if they were introduced through
percent-encoding of any non-valid characters (see next bullet point).
* TRS Tool and TRS Tool Version IDs can contain other characters, but they MUST
be percent-encoded (see [RFC 3986 §
2.4](https://datatracker.ietf.org/doc/html/rfc3986#section-2.4)) into valid
TRS Tool and TRS Tool Version IDs as per the previous rule whenever they are
used in API calls. This is because non-encoded IDs may interfere with the
interpretation of routes, e.g., for the `/tools/{id}/versions/{version_id}`
endpoint.
* Any given TRS Tool or TRS Tool Version ID MUST always identify the same
resource (tool or tool version, respectively) on a given TRS implementation.
This constraint aids with reproducibility.
* TRS implementations MAY have more than one TRS Tool or TRS Tool Version ID
mapping to the same resource (tool or tool version, respectively).

This recommendation is intended to mirror the discussion that went into the [DRS URI scheme](https://ga4gh.github.io/data-repository-service-schemas/preview/develop/docs/#_drs_uris).
#### TRS URIs

For informational purposes, we recommend that TRS implementations add themselves to https://github.com/ga4gh/tool-registry-service-schemas/blob/develop/registry.json to provide for the possibility of creating a global indexing service and to allow others to more easily discover a TRS implementation.
To conveniently pass content references to TRS resources, e.g., to advise a
[WES](https://github.com/ga4gh/workflow-execution-service-schemas) or
[TES](https://github.com/ga4gh/task-execution-schemas) service which tool
(version) to use, we define a URI syntax for TRS-accessible content. Strings of
the form `trs://<server>/<id>` (unversioned) and
`trs://<server>/<id>/<version_id>` (versioned) mean _“you can fetch the content
with TRS Tool ID `<id>` and, if provided, TRS Tool Version ID `<version_id>`
from the TRS server at `<server>` "_.

For example, if a TRS client was asked to process
`trs://trs.example.org/tool_ABC/v1.2.3`, it would know that it could, e.g.,
issue `GET` requests to
`https://trs.example.org/api/ga4gh/trs/v2/tools/tool_ABC` and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, how this will work with versioning?

What happens when TRS v3 comes out and a new product only implements TRS v3 and doesn't implement v2?

That's assuming TRS v3 will use a new path segment v3, which I suppose it doesn't have to. But I assume it would.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Essentially very primitive content negotiation where the user agent would need to try the API versions it would be familiar with?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think versioning APIs is an important topic and I think there's most definitely a need for discussion and harmonization across GA4GH APIs (e.g., Stian mentioned HATEOAS in #160), but as this PR does not introduces the issue nor is designed to address it, I don't think it should be blocked by it.

Besides, I don't think that TRS URIs, as identifiers of resources, should be concerned with API versioning at all: IMO, when bumping API versions, care must be taken (both on the spec- and implementation-side) not to introduce ambiguities in the way that the different responses in, e.g., TRS v2 and v3 can be interpreted by a client. And if changes to the internal representation of resources are required that would break backward compatibility, then I think those resources should be cloned and given new identifiers, such that reproducibility is not negatively affected. As long as these considerations/constraints are honored, it shouldn't matter which API version is used to access a given resource.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, I have clarified this/addressed these concerns now in lines 56/57 and 69-72

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question was not about API/schema versioning, which is a very complex subject and I'm no expert in. It was about the example that if you have a URI trs://trs.example.org/tool_ABC/v1.2.3, it says you can fetch the content https://trs.example.org/api/ga4gh/trs/v2/tools/tool_ABC (note the v2 in the path).

If TRS v3 comes along, and a new TRS implementation only supports that v3, then the v2 in the path means they would return a 404. Maybe that's OK, or maybe the requirement is that all TRS implementations need to implement v2 and above. Otherwise a client will need to iterate over paths with all potential versions.

Maybe I'm overthinking this, and it is a real edge case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think this is an API/schema versioning topic because the API version is precisely what v2 refers to. In any case, there anyway needs to be some sort of negotiation between a service and a client, because not every client will implement every version of the API specs, and so there will be cases where clients and servers won't match, regardless of whether the client will iterate or whether we specify the version in the TRS URI. One possibility to facilitate this negotation would be to define the supported API versions in the service info, but of course, the Service Info API is versioned, too...

I agree this is an important issue and needs to be discussed on a larger scale within the Cloud WS in order to come up with a sensible strategy that covers DRS and TRS URIs as well as other artefacts/problems with API versioning. A related issue I faced is also that the path (e.g., ga4gh/trs/v2) defined in the schema may not always be possible or desirable to implement. For example, as far as I know, Dockstore does not. The path info could perhaps also be defined in the service info for any given service (although the same problem applies: how do you get to the service info in the first place; perhaps via the service registry...).

Still, I don't think this PR makes this problem worse than it is already, given the current definitions of DRS and TRS URIs, so I don't think it should block.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think this is an API/schema versioning topic

Fair. I was thinking more of different schemas in different versions of the API when I made my comment, but you're right.

Still, I don't think this PR makes this problem worse than it is already, given the current definitions of DRS and TRS URIs

I think the explicitness that you can take a URI and translate it to a specific path is a new wrinkle, at least for TRS. You're right about DRS -- I hadn't looked until now and they do the same thing, with v1 in the path.

so I don't think it should block.

I had already approved, so it is not blocked.

Copy link
Collaborator Author

@uniqueg uniqueg Sep 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @coverbeck

So @denis-yuen, others: shall we bring up this important discussion to the Cloud WS?

Also perhaps to see if people agree that it might be useful to specify TRS URIs as inputs to WES (to access workflows) and inside workflows (to access containers through WES/TES implementations)? Because this might require changes on WES and TES specs as well, without which TRS URIs might not be all that useful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposal at Cloud WS was to break this discussion into a separate ticket #221

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@uniqueg This example is a bit awkward and likely points to a deficiency in the current approach for defining trs identifiers (and potentially drs identifiers as well). The TES specification defines the root path of the API as /ga4gh/trs/<version>.

Putting aside specific trs versioning issues (from the conversation above), I am now left with another dilema: TRS identifiers as defined only provide the hostname and a toolId and possibly a version id. This approach does not leave ANY room for path values that may come before the /ga4gh/... prefix. Given this, how am I supposed to resolve the actual trs URL for the example?

By convention, any tooling I create would resolve trs://trs.example.org/tool_ABC/v1.2.3 to https://trs.example.org/ga4gh/trs/v2/tools/tool_ABC/versions/v1.2.3, which is not what is presented in this example (note the missing API). I would not be so concerned over this if it was not already present in the wild in the Dockstore api (@denis-yuen). So my question then is how can we possibly resolve the path components without introducing ambiguity into the trs identifier.

One approach could be just adding the prefix paths prior to the toolId trs://trs.example.org/api/tool_ABC. Unfortuantely this is ambiguous since this could be referring to either a tool with the path prefix api and version tool_ABC or a tool tool_ABC with a path api. If we do not allow the path prefix to be in the identifier, any tooling is limited to the trs implementations that It already knows how to resolve, effectively limiting the usefulness of the trs id in general.

We essentially need a reliable way to extract the toolId and versionId without any ambiguity, while still allowing preceding fragments in the path. Some examples are below:

trs://trs.example.org/<prefix-before-ga4gh-api>/<toolId>:<versionId>
trs://trs.example.org/<prefix-before-ga4gh-api>/<toolId>/v/<versionId>
trs://trs.example.org/<prefix-before-ga4gh-api>/<toolId>/version/<versionId>

Copy link
Member

@denis-yuen denis-yuen May 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One could probably calculate the prefixes on the fly from https://github.com/ga4gh/tool-registry-service-schemas/blob/develop/registry.json but it probably does make more sense to change the URI format since we're not using it yet(?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@denis-yuen I would advise against relying on prefixes in the registry.json file, which automatically enforces that file is the authoratative source of all tool registries. It does not really allow for new registries to dynamically be created (ie in a multitenant system)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my personal preference for approaches would likely be: trs://trs.example.org/<prefix-before-ga4gh-api>/<toolId>:<versionId>. The side effect of this is that : is now a restricted character that cannot be used in the identifier itself, but I think that is a reasonable restriction

Copy link
Collaborator Author

@uniqueg uniqueg May 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tl;dr

I agree with @patmagee's suggestion provided that DRS folks agree that it will work for them as well.


Good catch @patmagee - thanks a lot! I had actually realized this issue before when implementing a TRS client that understands TRS URIs and having to introduce an optional base_path parameter so that it would work with Dockstore. It seems I had forgotten about it though. And I agree, the TRS URI loses its utility if this has to be known beforehand.

From what I can see, @patmagee's suggestion should do the trick.

But thinking this through a little more, maybe there are a couple of things that we should discuss outside of this issue before merging:

  • To make sure that whatever is decided here has the perspective of aligning with whatever is decided for DRS URIs in the future, I think this discussion should include the DRS community. @patmagee's other worry of being able to include basically any character in TRS Tool/Version IDs could be brought up there as well.
  • I am not quite sure whether API versioning could have any effect on a client's ability to properly resolve a TRS URI. I feel that it should not. But without explicit guidelines on API versioning, outlining expected behaviors and discussing their impact on reproducibility, I feel that it is hard to be sure. So perhaps it might be worth to discuss specifying such guidelines somewhere, possibly in a place that includes other work streams as well.

Besides that, it is possibly a good idea to aim for a consensus of how servers are defined in the various Cloud WS APIs in the mid to long term. And whatever is decided here, should not collide with our definitions of TRS and DRS URIs. Currently, the situation is like this:

TRS

servers:
  - url: /ga4gh/trs/v2

DRS

servers:
  - url: https://{serverURL}/ga4gh/drs/v1
    variables:
      serverURL:
        default: drs.example.org
        description: >
          DRS server endpoints MUST be prefixed by the '/ga4gh/drs/v1' endpoint
          path

WES

servers:
- url: https://{defaultHost}/{basePath}/{apiVersion}
  variables:
    defaultHost:
      default: www.example.com
    basePath:
      default: ga4gh/wes
    apiVersion:
      default: v1

TES

servers:
- url: /ga4gh/tes/v1

So, it looks like TRS and TES do not care where the API is deployed, as long as endpoint paths are immediately preceded by /ga4gh/trs/v2 and /ga4gh/tes/v1, respectively. DRS is very similar but a little more prescriptive in that it requires the use of the https schema. WES, on the other hand, requires https as well, but otherwise only seems to require that there be at least two slashes included in the URL before the endpoint path (weird?). In particular, it does not require a specific base path suffix like /ga4gh/wes/v1 (though the default does include it).

Now, there is currently no consensus across those definitions. But perhaps it seems to be going in the direction of the DRS definition, i.e., start with https and end with a specific base path suffix of the form /ga4gh/{api_short_name}/{api_version}. @patmagee's definition would account for that (having the client fill in the appropriate base path suffix). But even with a more relaxed definition that does not prescribed a specific base path for GA4GH Cloud APIs, the suggested solution should still work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposal at Cloud WS was to also break this into a separate ticket, but to prioritize it for discussion before 2.0.1 release and/or the GA4GH plenary
#220

`https://trs.example.org/api/ga4gh/trs/v2/tools/tool_ABC/versions/v1.2.3` to
fetch information about tool `tool_ABC` and its version `v1.2.3` from a `v2`
TRS API hosted at `https://trs.example.org/`, respectively.

> Note that clients issuing requests to TRS services MUST NOT encode forward
> the special characters slashes separating the `trs`, `<server>`, `<id>` and
> `<version_id>` components of TRS URIs. However, [TRS Tool IDs and TRS Tool
> Version IDs](#trs-tool-and-trs-tool-version-ids) containing
> non-valid characters MUST be encoded _individually_ before constructing TRS
> URIs. For example, for a TRS Tool ID `tool#1` and a TRS Version ID `(1)`, the
> correct TRS URI for server `trs.example.org` would be
> `trs://trs.example.org/tool%231/%281%29`, where `tool%231` and `%281%29` are
> the percent-encoded TRS Tool and TRS Tool Version IDs, respectively.
>
> Also note that to ensure reproducibility, servers implementing multiple
> versions of the TRS API specification MUST ensure that, within the limits of
> schema differences across different API versions, corresponding endpoints
> return consistent responses.

This recommendation is intended to mirror the discussion that went into the
[DRS URI
scheme](https://ga4gh.github.io/data-repository-service-schemas/preview/develop/docs/#_drs_uris),
with the necessary additions to account for versioned TRS URIs.

### Misc

The entire schema is shown below, but a more useful form is the Swagger editor to view our [schema in progress](https://editor.swagger.io/?url=https://raw.githubusercontent.com/ga4gh/tool-registry-service-schemas/develop/openapi/ga4gh-tool-discovery.yaml)

Note that the swagger editor itself can kickstart a project by generating servers and clients in a variety of languages.

#### Central GA4GH Service Registry

For informational purposes, we recommend that TRS implementations add
themselves to
<https://github.com/ga4gh/tool-registry-service-schemas/blob/develop/registry.json>
to provide for the possibility of creating a global indexing service and to
allow others to more easily discover a TRS implementation.

#### Outstanding Questions

##### Authentication and Authorization
Expand Down