Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

radicle-surf: design document #38

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
339 changes: 339 additions & 0 deletions radicle-surf/docs/design.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,339 @@
# radicle-surf

## Overview

The main goal for the `radicle-surf` is to provide an API for
accessing a `git` repository and providing a code browsing
experience. This experience can be likened to GitHub or GitLab's
project browsing pages. It does not aim to be an UI layer, but rather
provides the functionality for a UI layer to built on top of it. With
that in mind, this document sets out to define the main components of
`radicle-surf` and a high-level design of the API.

Note that this is the second iteration of designing this library --
where the first can be found in the
[denotational-design.md][denotational] document. Since part of this
work is refactoring the original design there will be mention of
previous artefacts that are being changed. The original source code
can be found [here][radicle-surf], if you wish to study some history.

## Motivation

The `radicle-surf` crate aims to provide a safe and easy-to-use API that
supports the following features:

1. Code browsing: given a specific revision, browse files and directories.
2. Getting the difference between two revisions.
3. Retrieve the history of the commits, files, and directories.
4. Retrieve the references stored in the `git` project, e.g. branches,
remote branches, tags, etc.
5. Retrieve specific `git` objects, in a user-friendly structure.

The main goals of the refactoring are:

* Reviewing the previous API and making it simpler where possible.
* Address open issues in the original [`radicle-surf`] project as much
as possible.
* In contrast to the previous implementation, be `git` specific and
not support other VCS systems.
* Hide away `git2` in the exposed API. The use of `git2` should be an
implementation detail.

## API Review

Before defining the future design of the API, this document intends to
review the previous API to provide guidelines for building the future
version of the API.

### Remove `Browser`

The `Browser` started out succinct but became a kitchen sink for
functionality. Some if its problems include:

* It is not a source of truth of any information. For example,
`list_branches` method is just a wrapper of
`Repository::list_branches`.
* It takes in `History`, but really works at the `Snapshot` level.
* It is mutable but the state it holds is minimal and does not provide
any use, beyond switching the `History`.

Going forward the removal of `Browser` is recommended. Some ways the
API will change with this removal are:

* For iteratoring the history, use `History`.
* For generating `Directory`, use the repository storage directly
given a revision.
* For accessing references and objects use the repository storage
directly.

### Remove `Snapshot`

A `Snapshot` was previously a function that converted a `History` into
a `Directory`. Since we can assume we are working in `git` this can be
simplified to a single function that can take a revision, that
resolves to a `Commit`, and produces a `Directory`.

## Remove `Vcs` trait

The `Vcs` trait was introduced to support different version control
backends, for example both `git` and `pijul`, and potentially
others. However, since this port is part of `radicle-git` repo, we are
only supporting `git` going forward. We no longer need another layer
of indirection defined by `Vcs` trait.

## Components

The `radicle-surf` library can be split into a few main components for
browsing a `git` repository, each of which will be discussed in the
following subsections.

Note that any of the API functions defined are _sketches_ and may not
be the final form or signature of the functions themselves. The traits
defined are recommendations, but other solutions for these
representations may be discovered during implementation of this design.

### Revisions

Before describing the next components, it is important to first
describe [revisions][git-revisions]. A revision in `git` is a way of
specifying an `Oid`. This can be done in a multitude of ways. One can
also specify a range of `Oid`s (think `git log`). The API will support
taking revisions as parameters where an `Oid` is expected. It will
not, however, permit ranges (at least for the time being) and so a
revision will be scoped to any string that can resolve to a single
`Oid`, e.g. an `Oid` string itself, a reference name, `@{date}`, etc.
The aim will be to have a trait similar to:


```rust
/// `Self` is expected to be a type that can resolve to a single
/// `Oid`.
///
/// An `Oid` is the trivial case and returns itself, and is
/// infallible.
///
/// However, some other revisions require parsing and/or looking at the
/// storage, which may result in an `Error`.
pub trait FromRevision {
type Error;

/// Resolve the revision to its `Oid`, if possible.
fn peel(&self, storage: &Storage) -> Result<Oid, Self::Error>;
}
```

#### Commit-ish

The snapshot mentioned above is a `Commit` in `git`, where the
commit points to a `Tree` object. Thus, the API should be able to take
any parameter that may resolve to a `Commit`. This idea can be
captured as a trait, similar to `FromRevision`, which allows something
to be peeled to a `Commit`.

```rust
/// `Self` is expected to be a type that can resolve to a single
/// `Commit`.
///
/// A `Commit` is the trivial case and returns itself, and is
/// infallible.
///
/// However, some other kinds of data require parsing and/or looking at the
/// storage, which may result in an `Error`.
///
/// Common cases are:
///
/// * Reference that points to a commit `Oid`.
/// * A `Tag` that has a `target` of `Commit`.
/// * An `Oid` that is the identifier for a particular `Commit`.
pub trait Commitish {
type Error;

/// Resolve the type to its `Commit`, if possible.
fn peel(&self, storage: &Storage) -> Result<Commit, Self::Error>;
}
```

### References

Many are familiar with `git` branches. They are the main point of
interaction when working within a `git` repository. However, the more
general concept of a branch is a
[reference][git-references]. References are stored within the `git`
repository under the `refs/` directory. Within this directory, `git`
designates a few specific [namespaces][git-references]:

* `refs/heads` -- local branches
* `refs/remotes` -- remote branches
* `refs/tags` -- tagged `git` objects
* `refs/notes` -- attached notes to `git` references

These namespaces are designated special within `git`'s tooling, such
as the command line, however, there is nothing stopping one from
defining their own namespace, e.g. `refs/rad`.

As well as this, there is another way of separating `git` references
by a namespace which is achieved via the [gitnamespaces] feature. When
`GIT_NAMESPACE` or `--git-namespace` is set, the references are scoped
by `refs/namespaces/<namespace>`, e.g. if `GIT_NAMESPACE=rad` set then the
`refs/heads/main` branch would mean
`refs/namespaces/rad/refs/heads/main`.

With the above in mind, the following API functions are suggested:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API listed are all free-standing functions that takes in &Storage. Probably worth explaining what &Storage is and why the API is designed using it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, good point! Maybe to reiterate -- and perhaps I didn't explain this well -- the functions are merely sketches and it's the functionality I wanted to focus on. I mention that here.

But I can also mention that Storage is deliberately opaque and merely represents any way of accessing a git repository. To clarify, this attempt doesn't care about the backing storage -- it just cares about the functions (in whatever form) exist to use.


```rust
/// Return a list of references based on the `pattern` string supplied, e.g. `refs/rad/*`
pub fn references(storage: &Storage, pattern: PatterStr) -> Result<References, Error>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This general function overlaps with more specific functions below (e.g. branches) in the sense that the user can use either of them to get the local branches (or other types of references).

Will Reference be an enum of different type of references? Or will it be basically a string? If it's an enum, we can get away without other functions for special reference types. On the other hand, special functions provide better type checking.

Do we have some user inputs on this topic, from our previous experience?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This general function overlaps with more specific functions below (e.g. branches) in the sense that the user can use either of them to get the local branches (or other types of references).

I deliberately added this as a general case! :) This is to leave the API open for upstream users to fetch references under namespaces that they define themselves. The cobs use case is a prime example -- the clients team may want to be able to fetch collaborative objects and they could use radicle-surf tooling for that. Would that be right, @cloudhead?

Will Reference be an enum of different type of references? Or will it be basically a string? If it's an enum, we can get away without other functions for special reference types. On the other hand, special functions provide better type checking.

References would be an iterator over the references that match the pattern. Each reference would be their name and their target -- Oid for direct references and another reference for symbolic references.

The same would go for Branches, but the reason it has another type is that the names are expected to be prefixed with refs/heads. Similar for Tags, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is pretty standard and we wouldn't want to lose this when moving from git2.


/// Return a list of branches based on the `pattern` string supplied, e.g. `refs/heads/features/*`
pub fn branches(storage: &Storage, pattern: BranchPattern) -> Result<Branches, Error>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the BranchPattern parameter? How common would our users like to provide an actual pattern, vs. providing no patterns?

If providing an actual pattern is not common, we could consider to support it via a separate filter function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point! I guess we could go either way. If a pattern was taken then we could define a constant:

lazy_static! {
  static ref ALL: BranchPattern = pattern_str!("refs/heads/*");
}

As long as Branches stays an iterator, I think the filter approach after works equally as well! :)


/// Return a list of remote branches based on the `pattern` string supplied, e.g. `refs/remotes/origin/features/*`
pub fn remotes(storage: &Storage, pattern: RemotePattern) -> Result<Remotes, Error>;

/// Return a list of tags based on the `pattern` string supplied, e.g. `refs/tags/releases/*`
pub fn tags(storage: &Storage, pattern: TagPattern) -> Result<Tags, Error>;

/// Return a list of notes based on the `pattern` string supplied, e.g. `refs/notes/blogs/*`
pub fn notes(storage: &Storage) -> Result<Notes, Error>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear to me what the result types of the above functions are. I.e. what Branches, Remotes, Tags, Notes would look like.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, I probably should have been more clear but as mentioned here, #38 (comment), they would be iterators over references but with new types to signal they start with their correct namespaces.

```

It may be considered to be able to set an optional `gitnamespace`
within the storage, or ammend the pattern types to allow for scoping
by the `gitnamespace`.

The returned list will not be the objects themselves. Instead they
will be the metadata for those objects, i.e. `Oid`, `name`, etc. For
the retrieval of those objects see the section on
[Objects][#Objects]. The reason for this choice is that an UI may want
to avoid retrieving the actual object to limit the amount of data
needed. The `Oid` is the minimal amount of information required to
fetch the object itself.

### Objects

Within the `git` model, [references][#References] point to
[objects][git-objects]. The types of objects in `git` are: commits, tags (lightweight
& annotated), notes, trees, and blobs.

All of these objects can retrieved via their `Oid`. The API will
supply functions to retrieve them all for completion's sake, however,
we expect that retrieving commits, tags, and blobs will be the most
common usage.

```rust
/// Get the commit found by `oid`.
pub fn commit<R: FromRevision>(storage: &Storage, rev: R) -> Result<Commit, Error>;

/// Get the tag found by `oid`.
pub fn tag<R: FromRevision>(storage: &Storage, rev: R) -> Result<Tag, Error>;

/// Get the blob found by `oid`.
pub fn blob<R: FromRevision>(storage: &Storage, rev: R) -> Result<Blob, Error>;

/// Get the tree found by `oid`.
pub fn tree<R: FromRevision>(storage: &Storage, rev: R) -> Result<Tree, Error>;
Comment on lines +231 to +234
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although many engineers know about blob and tree objects, they are Git internals. For a high-level lib like radicle-surf, I think we should not expose them in the API, at least not initially, unless requested by our users explicitly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair! I still think it would be good to have them available. They may be available through an objects module but not at the top level.

I think for libraries like this, it's better to have the functionality available. Just because we can't imagine the use case doesn't mean it doesn't exist yet 😁

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can start with using file, directory in our API, and only support blob, tree when users request them in the future. (Unless users have already requested ;-) )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file/directory only makes sense for working copies; if this library is going to be used on bare repositories, I don't see how we can get around blob and tree?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would they not be equivalently used, no? I've advised to have these functions available in a module and not expose them at the top-level module.

This is another point for standalone functions @keepsimple1. You can hide them because they're not attached to the struct as methods 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would they not be equivalently used, no? I've advised to have these functions available in a module and not expose them at the top-level module.

Yeah, I mean if it's just a question of naming, and File has all the capabilities of Blob, and Directory has all the capabilities of Tree, then it could work; although I'd say it's a little confusing because it's not an abstraction, just a rename.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, that's fair. I would opt for a non-specialised tree/blob and then a specialised file/directory because they won't just be renamed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file/directory only makes sense for working copies; if this library is going to be used on bare repositories, I don't see how we can get around blob and tree?

GitHub repos are bare repos but they still present "files" and "directories". (They might did some internal extra work for that, but it is an implementation detail).

I guess that "files" and "directories" are more identified by "paths", while "blob" and "tree" are identified by their Oid. We could support both. I just think if I were to use radicle-surf to write an UI I would just use file and directory as much as possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think it depends on whether we're ok with users of surf to still require direct access to the git2 repository. If that's by design, then we could get away with File/Directory. But if the plan is for users of this library to not need git2 access, then we might need access to these lower level objects.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not let users of surf to have direct access to git2. I am ok if it turns out that we need to provide users access to lower-level objects via our API.


/// Get the note found by `oid`.
pub fn note<R: FromRevision>(storage: &Storage, rev: R) -> Result<Note, Error>;
```

### Project Browsing

Project browsing boils down to taking a snapshot of a `git` repository
at a point in time and providing an object at that point in
time. Generally, this object would be a `Tree`, i.e. a directory of
files. However, it may be that a particular file, i.e. `Blob`, can be
viewed.

### Directories anf Files

This provides the building blocks for defining common cases of viewing
files and directories given a `Commitish` type.

```rust
/// Get the `Directory` found at `commit`.
pub fn directory<C: Commitish, P: AsRef<Path>>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to a previous comment, I am pondering in the refactor design that if we can define a CommitId and IntoCommitId trait. Then directory will just take a CommitId (not a Commit itself).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I touched on here, https://github.com/radicle-dev/radicle-git/pull/38/files#r1004154254, this can be an Oid for a Commit 😁 Or anything else that resolves to a Commit, e.g. a reference, a reference name, a tag object that points to a commit, a tag name that resolves to a tag object that points to a commit, etc. 😄

storage: &Storage,
commit: C,
) -> Result<Directory, Error>

/// Get the `File` found at `commit` under the given `path`.
pub fn file<C: Commitish, P: AsRef<Path>>(
storage: &Storage,
commit: C,
path: P,
) -> Result<Option<File>, Error>
```

The `Directory` and `File` types above are deliberately opaque as
how they are defined falls out of scope of this document and should be
defined in an implementation specific design document.

### History

Since `Commit`s in `git` form a history of changes via a linked-list,
i.e. commits may have parents, and those parents grand-parents etc.,
it is important that an API for iterating through the history is
provided.

The general mechanism for looking at a history of commits is called a
[`revwalk`][libgit-revwal] in most `git` libraries. This provides a
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[`revwalk`][libgit-revwal] in most `git` libraries. This provides a
[`revwalk`][libgit-revwalk] in most `git` libraries. This provides a

lazy iterator over the history, which will be useful for limiting the
memory consumption of any implementation.

```rust
// history.rs

/// Return an iterator of the `Directory`'s history, beginning with
/// the `start` provided.
pub fn directory<C: Commitish>(start: C) -> History<Directory>

/// Return an iterator of the `file`'s history, beginning with the
/// `start` provided.
pub fn file<C: Commitish>(start: C, file: File) -> History<File>

/// Return an iterator of the `Commit`'s history, beginning with the
/// `start` provided.
pub fn commit<C: Commitish>(start: C) -> History<Commit>
```

The `History` type above are deliberately opaque as how it is defined
falls out of scope of this document and should be defined in an
implementation specific design document.

### Diffs

The final component for a good project browsing experience is being
able to look at the difference between two snapshots in time. This is
colloquially shortened to the term "diff" (or diffs for plural).

Since diffs are between two snapshots, the expected API should take
two `Commit`s that resolve to `Tree`s.

```rust
/// New type to differentiate the old side of a [`diff`].
pub struct Old<C>(C);

/// New type to differentiate the new side of a [`diff`].
pub struct New<C>(C);

/// Get the difference between the `old` and the `new` directories.
pub fn diff<C: Commitish>(old: Old<C>, new: New<C>) -> Result<Diff, Error>
```

## Conclusion
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a missing point is a high-level overview of how this API would be used for some common tasks:

  • Show all files and directories in a repository
  • Show the repo history
  • Show the history of a file
  • Show a file's content for a specific commit (I assume this is part of the objectives of the API)

For example:

Show all files and directories at root of main branch of repository:

let rev = Revision::from_str("refs/heads/main");
let root_commit = Commit::from_rev(rev);
let root_dir = directory(&storage, root_commit);

// process root_dir iterator

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, are you saying an examples section would be useful? :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's my long-winded way of saying that :D

Like "examples of how we think this API would be used". I can write that tomorrow as a way of giving this a second review.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahaha, I just wanted to clarify 🙏


This document has provided the foundations for building the
`radicle-surf` API. It has provided a sketch of the functionality of
each of the subcomponents -- which should naturally feed into each
other -- and some recommended traits for making the API easier to
use. A futher document should be specified for a specific Rust
implementation.

[denotational]: https://github.com/radicle-dev/radicle-git/blob/main/radicle-surf/docs/denotational-design.md
[gitnamespaces]: https://git-scm.com/docs/gitnamespaces
[git-objects]: https://git-scm.com/book/en/v2/Git-Internals-Git-Objects
[git-references]: https://git-scm.com/book/en/v2/Git-Internals-Git-References
[git-revisions]: https://git-scm.com/docs/revision
[libgit-revwalk]: https://github.com/libgit2/libgit2/blob/main/include/git2/revwalk.h
[radicle-surf]: https://github.com/radicle-dev/radicle-surf