I would like to propose the implementation of a hardlink feature in the caching mechanism to optimize memory usage, improve performance and save disk space.
Background
The current caching system stores files in memory, which can lead to high memory usage, especially with large datasets. Hardlinks let multiple references point to the same file on disk without duplicating its content, which reduces both memory consumption and storage redundancy.
Design
Key Components
HardlinkManager: Manages the creation, validation, and persistence of hardlinks.
  - CreateLink: Attempts to create a hardlink for a given cache key.
  - HasHardlink: Checks if a hardlink exists for a given key.
  - Persist and Restore: Manages the persistence of hardlink metadata to disk and restores it on startup.
DirectoryCache: Implements the cache logic, including hardlink support.
  - CreateHardlink: Invokes the HardlinkManager to create a hardlink.
  - HasHardlink: Checks for the existence of a hardlink using the HardlinkManager.
Configuration: The EnableHardlink flag in the configuration determines whether hardlinking is enabled.
Work Flow
[Start]
   |
   v
[Initialize Cache]
   |
   v
[Check if Hardlinking is Enabled]
   |
   v
[Access Cached File]
   |
   v
[Check if Hardlink Exists] -- No --> [Create Hardlink]
   |                                         |
  Yes                                        v
   |                                 [Verify Hardlink]
   v                                         |
[Use Hardlink]                               v
   |                            [Rename to Final Location]
   v                                         |
[Persist Hardlink State] <-------------------|
   |
   v
[Restore Hardlink State on Startup]
   |
   v
[End]
Cache Write:
When a file is added to the cache, the system checks if hardlinking is enabled.
If enabled, it attempts to create a hardlink for the cached file.
Cache Read:
When accessing a cached file, the system checks if a hardlink exists.
If a hardlink exists, it uses the hardlink path to access the file.
Persistence:
Hardlink metadata is periodically persisted to disk.
On startup, the system restores hardlink metadata from disk.
Benefits
Reduced Memory Usage: By leveraging hardlinks, we can decrease the memory footprint of the caching system.
Improved Performance: Hardlinks allow faster access to cached files because they avoid the overhead of copying file data.
Data Deduplication: Hardlinks inherently support data deduplication by allowing multiple cache entries to reference the same physical file, reducing storage redundancy.
Scalability: This feature will enable the caching system to handle larger datasets more efficiently.
ChengyuZhu6 added a commit to ChengyuZhu6/stargz-snapshotter that referenced this issue on Jan 24, 2025:
- Adds the EnableHardlink configuration option
- Adds the HardlinkCapability interface
- Updates the directoryCache struct to support hardlinks
- Adds logging for hardlink configuration
- Updates the layer package to pass through hardlink configuration
- Concurrent access testing
Fixes: containerd#1953
Signed-off-by: ChengyuZhu6 <[email protected]>