Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

radicle-surf: Repository.blob_at() to retrieve a blob using its oid. #124

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
64 changes: 64 additions & 0 deletions radicle-surf/src/repo.rs
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,7 @@ use crate::{

/// Enumeration of errors that can occur in repo operations.
pub mod error {
use crate::Oid;
use std::path::PathBuf;
use thiserror::Error;

Expand All @@ -56,6 +57,8 @@ pub mod error {
pub enum Repo {
#[error("path not found for: {0}")]
PathNotFound(PathBuf),
#[error("blob not found for: {0}")]
BlobNotFound(Oid),
}
}

Expand Down Expand Up @@ -301,6 +304,20 @@ impl Repository {
Ok(Blob::<BlobRef<'a>>::new(file.id(), git2_blob, last_commit))
}

/// Retrieves the blob with `oid` in `commit`.
pub fn blob_at<'a, C: ToCommit>(
&'a self,
commit: C,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The input commit is not required if we only wanted to retrieve the blob content. It is here as we also need to retrieve the Blob.commit that created the blob (oid).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't expecting to have the commit as a parameter, so what if we changed the result type to just BlobRef<'a>?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it, and I chose to have the commit parameter for a couple of reasons:

  1. To be consistent in the API (similar to blob()), including the return type Blob. The intent of Blob.commit was helping the caller to identify the last_commit easily without having to do another HTTP call. That is the same for both blob() and blob_at().

  2. From the zulip discussion linked in Description, the caller would (most likely) have the commit info available.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the zulip discussion linked in Description, the caller would (most likely) have the commit info available.

My reading of the discussion was that we should be able to get a blob without the commit[0]. So if something doesn't know about a commit but does have a blob id, then it can just retrieve the blob content.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to the original request message.

Yeah, like we discussed so far, there are two options that I can see:

a) Get the blob content only (input: Oid). If the caller wanted to find the commit that created the blob, they make another HTTP call. This is not consistent with existing blob() method.

b) Get the blob content and the commit that created the blob. (input: Oid and a Commit that has the blob. ) . Both info are retrieved in one call. This is consistent with existing blob() method.

For the blob content part, b) does more involved work than a), because it also retrieves the commit.

It's a trade-off. My preference is to go with option b) based on the reasons I mentioned earlier. And if we did want to go with a), I would suggest to re-visit Blob struct and see if it makes sense to separate Blob.commit out, and then change blob() to be consistent with blob_at().

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to the original request message.

Sure and within that thread, I mentioned how I imagined the URL being '.../blob/<oid>' :)

I don't mind going down this route, but I don't understand why we have to go fetch the commit that created the blob. Can you explain why that's necessary?

I'd expect that we just check the Blob's Oid exists with the Commit's tree, either via diff or using a tree walk.

Copy link
Contributor Author

@keepsimple1 keepsimple1 Mar 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind going down this route, but I don't understand why we have to go fetch the commit that created the blob. Can you explain why that's necessary?

It's because this existing Blob struct and its API:

/// Returns the commit that created this blob.
pub fn commit(&self) -> &Commit {
&self.commit

oid: Oid,
) -> Result<Blob<BlobRef<'a>>, Error> {
let commit = commit
.to_commit(self)
.map_err(|e| Error::ToCommit(e.into()))?;
let git2_blob = self.find_blob(oid)?;
let last_commit = self.find_commit_of_blob(oid, &commit)?;
Ok(Blob::<BlobRef<'a>>::new(oid, git2_blob, last_commit))
}

/// Returns the last commit, if exists, for a `path` in the history of
/// `rev`.
pub fn last_commit<P, C>(&self, path: &P, rev: C) -> Result<Option<Commit>, Error>
Expand Down Expand Up @@ -524,6 +541,53 @@ impl Repository {
Ok(diff)
}

/// Returns true if the diff between `from` and `to` creates `oid`.
fn diff_commits_has_oid(
&self,
from: Option<&git2::Commit>,
to: &git2::Commit,
oid: &git2::Oid,
) -> Result<bool, Error> {
let diff = self.diff_commits(None, from, to)?;
for delta in diff.deltas() {
if &delta.new_file().id() == oid {
return Ok(true);
}
}
Ok(false)
}

/// Returns whether `oid` was created in `commit` or not.
fn is_oid_in_commit(&self, oid: Oid, commit: &git2::Commit) -> Result<bool, Error> {
if commit.parent_count() == 0 {
return self.diff_commits_has_oid(None, commit, oid.as_ref());
}

for p in commit.parents() {
if self.diff_commits_has_oid(Some(&p), commit, oid.as_ref())? {
return Ok(true);
}
}

Ok(false)
}

/// Returns the commit that created the blob with `oid`.
///
/// It is assumed that `oid` exists in `head`.
fn find_commit_of_blob(&self, oid: Oid, head: &Commit) -> Result<Commit, Error> {
let mut revwalk = self.revwalk()?;
revwalk.push(head.id.into())?;
for commit_id in revwalk {
let commit_id = commit_id?;
let git2_commit = self.inner.find_commit(commit_id)?;
if self.is_oid_in_commit(oid, &git2_commit)? {
return Ok(Commit::try_from(git2_commit)?);
}
}
Err(Error::Repo(error::Repo::BlobNotFound(oid)))
}

/// Returns a full reference name with namespace(s) included.
pub(crate) fn namespaced_refname<'a>(
&'a self,
Expand Down
28 changes: 27 additions & 1 deletion radicle-surf/t/src/source.rs
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
use std::path::PathBuf;

use radicle_git_ext::ref_format::refname;
use radicle_surf::{Branch, Glob, Repository};
use radicle_surf::{Branch, Glob, Oid, Repository};
use serde_json::json;

const GIT_PLATINUM: &str = "../data/git-platinum";
Expand Down Expand Up @@ -175,6 +175,32 @@ fn repo_blob() {
assert_eq!(json_ref, json_owned);
}

#[test]
fn repo_blob_at() {
let repo = Repository::open(GIT_PLATINUM).unwrap();
let oid = Oid::try_from("b84992d24be67536837f5ab45a943f1b3f501878").unwrap();

// Retrieve the blob using its oid.
let blob = repo
.blob_at("27acd68c7504755aa11023300890bb85bbd69d45", oid)
.unwrap();

// Verify the blob oid.
let blob_oid = blob.object_id();
assert_eq!(blob_oid, oid);

// Verify the commit that created the blob.
let blob_commit = blob.commit();
assert_eq!(
blob_commit.id.to_string(),
"e24124b7538658220b5aaf3b6ef53758f0a106dc"
);

// Verify the blob content ("memory.rs").
assert!(!blob.is_binary());
assert_eq!(blob.size(), 6253);
}

#[test]
fn tree_ordering() {
let repo = Repository::open(GIT_PLATINUM).unwrap();
Expand Down