Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify Slice File Hashing Logic #701

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
58 changes: 15 additions & 43 deletions src/slice_file.rs
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,6 @@ use crate::grammar::*;
use crate::utils::ptr_util::WeakPtr;
use console::style;
use serde::Serialize;
use sha2::{Digest, Sha256};
use std::cmp::{max, min, Ordering};
use std::fmt::{Display, Write};

Expand Down Expand Up @@ -167,53 +166,26 @@ impl SliceFile {

formatted_snippet + &line_prefix
}

/// Hashes the SliceFile using a SHA-256 hash and returns the hash as a hex string.
pub fn compute_sha256_hash(files: &[&SliceFile]) -> String {
files
.compute_sha256_hash_as_bytes()
.iter()
.map(|byte| format!("{:02x}", byte))
.collect()
}
}

pub trait SliceFileHashable {
/// Hashes the SliceFile using a SHA-256 hash and returns the 32-byte array.
fn compute_sha256_hash_as_bytes(&self) -> [u8; 32];
}
pub fn compute_sha256_hash_of_source_files(files: &[SliceFile]) -> String {
use sha2::{Digest, Sha256};

impl SliceFileHashable for SliceFile {
/// Hash the combination of the filename and the raw text using a SHA-256 hash.
///
/// # Returns
/// The SHA-256 hash as a 32-byte array.
fn compute_sha256_hash_as_bytes(&self) -> [u8; 32] {
Sha256::new()
.chain_update(self.filename.as_bytes())
.chain_update(self.raw_text.as_bytes())
.finalize()
.into()
}
}
// Filter out any reference files, and sort the source files which remain.
let mut sorted_sources: Vec<&SliceFile> = files.iter().filter(|f| f.is_source).collect();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forget where were landed on the discussion of only including src files. Was this previously being done elsewhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right I should of added a comment mentioning this.
Currently, we do the filtering in slicec-cs:

let src_files: Vec<&SliceFile> = compilation_state.files.iter().filter(|f| f.is_source).collect();
let hash = SliceFile::compute_sha256_hash(&src_files);

https://github.com/icerpc/icerpc-csharp/blob/9edecc0ccaddee829828f2da64ce150b692d340f/tools/slicec-cs/src/main.rs#L65-L67
My thinking was to move it in slicec, to keep all the hashing logic and what/how we do it in a single place.


I'm also not completely sold on one vs the other, but this is the current behavior we have, so I kept it.
I'll blog about it, and see if we can reach a conclusion.

sorted_sources.sort_by(|a, b| a.filename.cmp(&b.filename));

impl SliceFileHashable for &[&SliceFile] {
fn compute_sha256_hash_as_bytes(&self) -> [u8; 32] {
// Sort the slice files by their filename before hashing them.
let mut sorted = self.iter().collect::<Vec<_>>();
sorted.sort_by(|a, b| a.filename.cmp(&b.filename));

// Hash the sorted slice files.
sorted
.iter()
.map(|slice_file| slice_file.compute_sha256_hash_as_bytes())
.fold(Sha256::new(), |mut hasher, file_hash| {
hasher.update(file_hash);
hasher
})
.finalize()
.into()
// Hash the sorted source files.
// Included in the hash are the files' names (no path, just filenames), and their raw (unparsed) content.
let mut hash_engine = Sha256::new();
for file in sorted_sources {
hash_engine.update(&file.filename);
hash_engine.update(&file.raw_text);
}

// Return the hash engine's final result, formatted as a lowercase-hexadecimal string.
// The hash engine guarantees that this string will have 64 chars (representing 32 bytes, or 256 bits).
format!("{:x}", hash_engine.finalize())
}

implement_Attributable_for!(SliceFile);
Expand Down
10 changes: 3 additions & 7 deletions tests/files/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
mod io;

use slicec::diagnostics::Diagnostics;
use slicec::slice_file::SliceFile;
use slicec::slice_file::compute_sha256_hash_of_source_files;
use slicec::slice_options::SliceOptions;
use slicec::utils::file_util::resolve_files_from;
use std::path::PathBuf;
Expand All @@ -27,13 +27,9 @@ fn fixed_slice_file_hash() {
let slice_files1 = resolve_files_from(&options1, &mut diagnostics);
let slice_files2 = resolve_files_from(&options2, &mut diagnostics);

// Convert slices to slices of references
let slice_files1_refs: Vec<&SliceFile> = slice_files1.iter().collect();
let slice_files2_refs: Vec<&SliceFile> = slice_files2.iter().collect();

// Act
let hash1 = SliceFile::compute_sha256_hash(slice_files1_refs.as_slice());
let hash2 = SliceFile::compute_sha256_hash(slice_files2_refs.as_slice());
let hash1 = compute_sha256_hash_of_source_files(&slice_files1);
let hash2 = compute_sha256_hash_of_source_files(&slice_files2);

// Assert
assert_eq!(hash1, hash2);
Expand Down
Loading