-
I'm encountering an issue where compressing a PDF file produces an unexpected hash. Code snippet:

```rust
let (bytes_read, hash_hex, compressed) =
    match (compression, rmaker.maybe_content_format()) {
        (Compression::Zlib(level), Ok(MaybeContentFormat::MaybeLargeText)) => {
            dbg!("zlib");
            let mut writer =
                ZlibEncoder::new(&mut cwp, flate2::Compression::new(*level));
            let mut hwriter = HashWriter::new(&mut writer, &digest::SHA256);
            let bytes_copied = copy_by_chunk(&mut stream, &mut hwriter, chunk_size)?;
            let hash = hwriter.finish();
            let hash_hex = hex::encode(hash);
            dbg!(bytes_copied);
            (bytes_copied, hash_hex, true)
        }
        (Compression::Zstd(lv), Ok(MaybeContentFormat::MaybeLargeText)) => {
            dbg!("zstd");
            let mut writer = ZstdEncoder::new(&mut cwp, *lv)?;
            let mut hwriter = HashWriter::new(&mut writer, &digest::SHA256);
            let bytes_copied = copy_by_chunk(&mut stream, &mut hwriter, chunk_size)?;
            let hash = hwriter.finish();
            let hash_hex = hex::encode(hash);
            (bytes_copied, hash_hex, true)
        }
        _ => {
            let mut hwriter = HashWriter::new(&mut cwp, &digest::SHA256);
            let bytes_copied = copy_by_chunk(&mut stream, &mut hwriter, chunk_size)?;
            let hash = hwriter.ctx.finish();
            let hash_hex = hex::encode(hash);
            (bytes_copied, hash_hex, false)
        }
    };
```

Could you help identify why `ZlibEncoder` is producing a different `hash_hex` compared to the other branches? Let me know if I need to provide more detailed information. I was guessing the issue might be because I didn't call […]. Environment: `flate2 = "1.0.31"`.

EDIT: I tried with […]
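For context, `copy_by_chunk` is not shown in the question. Below is a hypothetical sketch of what such a chunked copy helper might look like (name, signature, and behavior are assumptions, std-only). Note that if it uses `write_all`, each chunk may turn into several `write` calls on the wrapped writer:

```rust
use std::io::{self, Read, Write};

// Hypothetical sketch of a `copy_by_chunk` helper (the real one is not
// shown in the question); signature and behavior are assumptions.
fn copy_by_chunk<R: Read, W: Write>(
    reader: &mut R,
    writer: &mut W,
    chunk_size: usize,
) -> io::Result<usize> {
    let mut buf = vec![0u8; chunk_size];
    let mut total = 0;
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // EOF
        }
        // `write_all` may call `writer.write` several times, retrying the
        // unwritten tail, so the wrapped writer can see short writes.
        writer.write_all(&buf[..n])?;
        total += n;
    }
    Ok(total)
}

fn main() -> io::Result<()> {
    let data = b"some pdf bytes".to_vec();
    let mut src = io::Cursor::new(data.clone());
    let mut dst: Vec<u8> = Vec::new();
    let copied = copy_by_chunk(&mut src, &mut dst, 4)?;
    assert_eq!(copied, data.len());
    assert_eq!(dst, data);
    println!("ok");
    Ok(())
}
```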
Replies: 3 comments
-
Update: I am able to make a call to `finish()` on the inner writer through a `Finishable` trait. Here is my `HashWriter` implementation:

```rust
pub trait Finishable: Write {
    fn finish(self) -> io::Result<()>;
}

// Implement `Finishable` for any type that implements `Write` without a
// specific finish behavior.
impl<W: Write> Finishable for W {
    fn finish(mut self) -> io::Result<()> {
        self.flush()
    }
}

pub struct HashWriter<W>
where
    W: Finishable,
{
    pub writer: W,
    // XXX: make this a generic type?? Hasher? which has update/finish method
    pub ctx: Context,
}

impl<W> HashWriter<W>
where
    W: Finishable + Write,
{
    pub fn new(writer: W, algorithm: &'static Algorithm) -> Self {
        let ctx = Context::new(algorithm);
        Self { writer, ctx }
    }

    pub fn finish(mut self) -> Digest {
        let _ = self.writer.flush();
        let _ = self.writer.finish();
        self.ctx.clone().finish()
    }
}

impl<W> Write for HashWriter<W>
where
    W: Write,
{
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // hasher computes the hash from the original data passed in `buf`
        self.ctx.update(buf);
        let n = self.writer.write(buf)?;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.writer.flush()
    }
}
```
-
Since the hash-writer is the outer writer, it sees the uncompressed stream and would be expected to produce the same hash if the input is the same. However, the `write` implementation of the hash-writer unconditionally hashes all of `buf`, even though the call to `writer.write()` returns `n`, the number of bytes actually written. That number must be respected.
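The effect can be demonstrated with the standard library alone. Below, a toy rolling checksum stands in for SHA-256, and a hypothetical `ShortWriter` stands in for a compressor that accepts only part of each buffer (all names here are illustrative, not from the code above). When the caller retries the unwritten tail via `write_all`, the "hash everything" version digests some bytes twice:

```rust
use std::io::{self, Write};

// Toy rolling checksum standing in for SHA-256 (assumption: a real hasher
// would show the same divergence).
fn update(sum: &mut u64, bytes: &[u8]) {
    for &b in bytes {
        *sum = sum.wrapping_mul(31).wrapping_add(b as u64);
    }
}

// Hypothetical inner writer that accepts at most 3 bytes per call,
// simulating a compressor whose internal buffer fills up.
struct ShortWriter(Vec<u8>);

impl Write for ShortWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = buf.len().min(3);
        self.0.extend_from_slice(&buf[..n]);
        Ok(n)
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

struct HashWriter<W: Write> {
    writer: W,
    sum: u64,
    respect_n: bool, // false reproduces the bug
}

impl<W: Write> Write for HashWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        if self.respect_n {
            // Fixed: hash only the bytes the inner writer accepted.
            let n = self.writer.write(buf)?;
            update(&mut self.sum, &buf[..n]);
            Ok(n)
        } else {
            // Buggy: hash everything, even bytes that will be retried.
            update(&mut self.sum, buf);
            self.writer.write(buf)
        }
    }
    fn flush(&mut self) -> io::Result<()> {
        self.writer.flush()
    }
}

fn digest(input: &[u8], respect_n: bool) -> u64 {
    let mut hw = HashWriter { writer: ShortWriter(Vec::new()), sum: 0, respect_n };
    hw.write_all(input).unwrap(); // write_all retries the unwritten tail
    hw.sum
}

fn main() {
    let input = b"hello zlib world";
    let mut expected = 0u64;
    update(&mut expected, input);
    // Fixed version hashes each byte exactly once.
    assert_eq!(digest(input, true), expected);
    // Buggy version re-hashes the retried tail and diverges.
    assert_ne!(digest(input, false), expected);
    println!("ok");
}
```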
-
Thanks a lot! As you said, I now update the context after the write, using only the first `n` bytes, and it all works now! Here is my change:

```rust
impl<W> Write for HashWriter<W>
where
    W: Write,
{
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // hash only the bytes the inner writer actually accepted
        let n = self.writer.write(buf)?;
        self.ctx.update(&buf[..n]);
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.writer.flush()
    }
}
```

But I am curious why the zstd encoder takes care of this automatically and doesn't have this problem. If the hash-writer is the outer writer, I assume it has nothing to do with the inner compressor it wraps.
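A plausible explanation (an assumption on my part; I have not checked the zstd crate's internals): the divergence only shows up when the inner writer reports a short write. With an inner writer that always consumes the whole buffer, like `Vec<u8>` below, even the original "hash everything" logic hashes each byte exactly once, so the bug stays hidden:

```rust
use std::io::Write;

// Toy checksum standing in for SHA-256.
fn checksum(bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .fold(0u64, |s, &b| s.wrapping_mul(31).wrapping_add(b as u64))
}

fn main() {
    let input = b"hello zstd world";
    let mut sink: Vec<u8> = Vec::new();
    let mut hashed = 0u64;
    let mut rest: &[u8] = input;
    // Emulate the original "hash everything" logic around an inner writer
    // that never short-writes: Vec<u8> always consumes the whole buffer,
    // so the retry loop runs exactly once and nothing is hashed twice.
    while !rest.is_empty() {
        hashed = rest
            .iter()
            .fold(hashed, |s, &b| s.wrapping_mul(31).wrapping_add(b as u64));
        let n = sink.write(rest).unwrap(); // n == rest.len() for Vec<u8>
        rest = &rest[n..];
    }
    assert_eq!(hashed, checksum(input)); // bug masked: each byte hashed once
    assert_eq!(&sink[..], &input[..]);
    println!("ok");
}
```

Either way, `Write::write` is allowed to return fewer bytes than it was given, so hashing only the first `n` bytes is the correct behavior regardless of which compressor sits underneath.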