Discuss reasons why md5 sum differs
hagenw committed Jun 26, 2024
1 parent 11ad353 commit 6717a86
Showing 1 changed file with 8 additions and 3 deletions.
11 changes: 8 additions & 3 deletions audformat/core/table.py
@@ -592,9 +592,14 @@ def save(
is stored under the key ``b"hash"``
in the metadata of the schema of the parquet file.
This provides a deterministic hash for the file,
- as md5 sums of parquet files
- can be different
- for the same content.
+ as md5 sums of parquet files,
+ containing identical information,
+ often differ.
+ Reasons include factors like the library
+ that wrote the parquet file,
+ the chosen compression codec
+ and metadata written by the library.
The hash can be accessed with ``pyarrow`` by::
pyarrow.parquet.read_schema(f"{path}.parquet").metadata[b"hash"].decode()