Allow matching of files to DATs via non-CRC32 checksums #818

Closed
TheBrainScrambler opened this issue Nov 19, 2023 · 10 comments · Fixed by #945

@TheBrainScrambler

Is your feature request related to a problem?

No response

Describe the solution you'd like

From what I understand, igir checks whether a rom matches a DAT by comparing CRC32 checksums: if the rom is zipped and the zip header contains a CRC32, it uses that value; otherwise it computes the CRC32 itself.
I would like an option that forces igir to verify roms using the checksum algorithm of my choice, in my case SHA-256.

I was thinking of extending igir report by having a flag that allows me to specify that, maybe --checksum-algo sha256.

If there are no SHA256 checksums in the DAT, igir could fall back to weaker checksums like SHA1, MD5 or CRC32, but report that it couldn't verify the rom using the requested checksum algorithm. Or perhaps it could just add a field saying which checksum algorithm was used for the verification. In any case, igir should first try the algorithm that was asked for.
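
A minimal sketch of the fallback described above, assuming hypothetical `Algo`, `Checksums` and `verify` names (none of these are igir's actual types or API). It tries the requested algorithm first, then only weaker ones, and records which algorithm the comparison really used:

```typescript
// Hypothetical types for illustration; these are not igir's actual classes.
type Algo = "sha256" | "sha1" | "md5" | "crc32";
type Checksums = Partial<Record<Algo, string>>;

// Strongest first; also the fallback order after the requested algorithm.
const STRENGTH_ORDER: Algo[] = ["sha256", "sha1", "md5", "crc32"];

interface VerifyResult {
  matched: boolean;
  usedAlgo?: Algo; // which algorithm the comparison actually used
  downgraded: boolean; // true when the requested algorithm was unavailable
}

function verify(requested: Algo, dat: Checksums, file: Checksums): VerifyResult {
  // Start at the requested algorithm and walk down to weaker ones only.
  const start = STRENGTH_ORDER.indexOf(requested);
  for (const algo of STRENGTH_ORDER.slice(start)) {
    const expected = dat[algo];
    const actual = file[algo];
    if (expected === undefined || actual === undefined) continue;
    return {
      matched: expected.toLowerCase() === actual.toLowerCase(),
      usedAlgo: algo,
      downgraded: algo !== requested,
    };
  }
  // No algorithm was available on both sides, so nothing could be compared.
  return { matched: false, downgraded: true };
}
```

A report could then print `usedAlgo` next to each rom and flag the rows where `downgraded` is true.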

I'm wondering though: https://igir.io/alternatives/ says that "ROMs: scan/checksum caching" is not supported by design. Does this affect my feature request?
I was also thinking about efficiency, since this would require unzipping archives of roms to verify them. This is where I think some kind of caching would help, but I know you don't want caching by design. Perhaps igir could write some kind of DAT file that also contains, for each rom, a checksum of the zipped rom? With a deterministic zipping algorithm like TorrentZip this would work.
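
To illustrate the caching idea, here is a rough sketch that keys a cache on the digest of the archive bytes, so an unchanged (deterministically zipped) archive never has to be extracted twice. `ArchiveChecksumCache`, `InnerChecksums` and the `extractAndHash` callback are hypothetical names, and a real cache would have to be persisted to disk rather than held in memory:

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape: checksums of the rom stored inside an archive.
type InnerChecksums = { crc32?: string; sha1?: string; sha256?: string };

class ArchiveChecksumCache {
  private readonly entries = new Map<string, InnerChecksums>();
  public misses = 0;

  // `extractAndHash` stands in for the expensive unzip-and-checksum step.
  lookup(
    archiveBytes: Uint8Array,
    extractAndHash: (bytes: Uint8Array) => InnerChecksums,
  ): InnerChecksums {
    // Hashing the archive still reads it once, but skips decompression on a hit.
    const key = createHash("sha256").update(archiveBytes).digest("hex");
    const cached = this.entries.get(key);
    if (cached) return cached;
    this.misses += 1;
    const computed = extractAndHash(archiveBytes);
    this.entries.set(key, computed);
    return computed;
  }
}
```

This only pays off when the same rom always produces byte-identical archives, which is the property a deterministic format like TorrentZip provides.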

Additional context

No response

@TheBrainScrambler TheBrainScrambler added the enhancement label Nov 19, 2023
@emmercm
Owner

emmercm commented Nov 20, 2023

igir uses filesize in addition to CRC32 because CRC32 produces more false positives than other hashing algorithms. Other ROM managers do this as well. It primarily helps when scanning a very large ROM collection, where the chance of a collision is not negligible.
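
For a sense of scale, the birthday-bound approximation gives a rough estimate of how likely at least one collision is among n files, assuming an idealized, uniformly distributed hash. The function name and parameters below are illustrative, and CRC32 values over real ROMs are not perfectly uniform, so treat the result as a ballpark figure only:

```typescript
// Birthday-bound approximation: P(collision) ≈ 1 - exp(-n(n-1) / (2 * 2^bits)).
// Assumes hash values are uniformly distributed, which is an idealization.
function collisionProbability(fileCount: number, hashBits: number): number {
  const space = 2 ** hashBits;
  // -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
  return -Math.expm1(-(fileCount * (fileCount - 1)) / (2 * space));
}

// Compare a 32-bit hash (CRC32) with a 160-bit hash (SHA1) for the same collection size.
const crc32Estimate = collisionProbability(100_000, 32);
const sha1Estimate = collisionProbability(100_000, 160);
console.log({ crc32Estimate, sha1Estimate });
```

Under these assumptions, 100,000 files with a 32-bit hash come out at roughly a two-in-three chance of at least one collision somewhere in the set, while the 160-bit estimate is vanishingly small.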

I think what you're really looking for here is the ability to use something other than CRC32+size when matching scanned files to a DAT, rather than during testing. For igir test, it would be extremely unlikely to receive a false positive CRC32+size.

For the file matching, it's going to be best to wait until I finally finish #740 which necessitates calculating the MD5 and SHA1 of scanned files.


RE: caching, you've touched on every important part that I've thought of. And if we consider CHDs to be a deterministic archive of sorts, they would apply here as well.

Thankfully igir can extract non-excessively large .zip files into memory without temp files (I just broke out #819 into its own issue), so calculating checksums on those isn't too bad.

Unfortunately, I don't see any active Node.js TorrentZip implementations, so the creation of them is unlikely to happen any time soon.

But in general, I think local caching can be handled separately from the issue of alternative checksum matching.

@TheBrainScrambler
Author

> I think what you're really looking for here is the ability to use something other than CRC32+size when matching scanned files to a DAT, rather than during testing. For igir test, it would be extremely unlikely to receive a false positive CRC32+size.

Yes, maybe I wasn't clear, but this would be for matching roms to DATs, so for igir report and not igir test.

@emmercm
Owner

emmercm commented Nov 27, 2023

File matching on something other than CRC32+size has been on my personal to-do list forever:

// TODO(cemmer): ability to index files by some other property such as name

My free time is going to be limited most likely through the end of the year, but I think this is a good suggestion. It should be easier to land after #819 has merged.

@emmercm emmercm changed the title from "Ensure integrity of roms with cryptographic checksums" to "Allow matching of files to DATs via non-CRC32 checksums" Nov 27, 2023
@emmercm
Owner

emmercm commented Nov 29, 2023

Note: this will be required for processing MAME "disk" files, because the MAME DATs provide only a SHA1 for those files and no filesize.

The code added in #835 will need to account for the zero filesize of these.
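
As a sketch of what handling size-less entries could look like, the example below indexes scanned files under both a SHA1 key and a CRC32+size key, and lets a DAT entry with only a SHA1 match without consulting the size. `DatEntry`, `ScannedFile`, `indexFiles` and `findMatch` are hypothetical names for illustration, not the code from #835:

```typescript
// Hypothetical shapes for illustration; not igir's actual types.
interface DatEntry { name: string; size?: number; crc32?: string; sha1?: string }
interface ScannedFile { path: string; size: number; crc32?: string; sha1?: string }

function indexFiles(files: ScannedFile[]): Map<string, ScannedFile> {
  const index = new Map<string, ScannedFile>();
  for (const file of files) {
    if (file.sha1) index.set(`sha1:${file.sha1.toLowerCase()}`, file);
    if (file.crc32) index.set(`crc32:${file.crc32.toLowerCase()}|${file.size}`, file);
  }
  return index;
}

function findMatch(entry: DatEntry, index: Map<string, ScannedFile>): ScannedFile | undefined {
  // A SHA1 key ignores the size, so a missing or zero size in the DAT does not matter.
  if (entry.sha1) {
    const hit = index.get(`sha1:${entry.sha1.toLowerCase()}`);
    if (hit) return hit;
  }
  // CRC32 is only used together with a known size.
  if (entry.crc32 && entry.size !== undefined) {
    return index.get(`crc32:${entry.crc32.toLowerCase()}|${entry.size}`);
  }
  return undefined;
}
```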

@emmercm
Owner

emmercm commented Mar 19, 2024

Question for you @TheBrainScrambler while I wrap this feature up - what DAT group includes SHA256 in their DATs? It's very non-standard, and I haven't come across it yet myself.

@TheBrainScrambler
Author

While the No-Intro P/C XML set doesn't have any SHA256, the Standard DAT set has some.

From grep -lr sha256 inside the Standard DAT directory:

Non-Redump/Non-Redump - IBM - PC Compatible (Discs) (Hentai) (20230818-230113).dat
Non-Redump/Non-Redump - Sega - Dreamcast (20231230-212004).dat
Non-Redump/Non-Redump - Sony - PlayStation 2 (20240206-091905).dat
Non-Redump/Non-Redump - Nintendo - Nintendo GameCube (20240207-082902).dat
Non-Redump/Non-Redump - Sega - Sega Saturn (20221104-092247).dat
Non-Redump/Non-Redump - NEC - PC-88 (20230330-093338).dat
Non-Redump/Non-Redump - Microsoft - Xbox 360 (20240209-114420).dat
Non-Redump/Non-Redump - Audio CD (Deprecated) (20240105-073751).dat
Non-Redump/Non-Redump - Sega - Sega Mega CD + Sega CD (20231229-110423).dat
Non-Redump/Non-Redump - Sony - PlayStation Portable (20240129-111557).dat
Non-Redump/Non-Redump - Nintendo - Wii (20240215-054602).dat
Non-Redump/Non-Redump - Sony - PlayStation (20231229-070830).dat
Non-Redump/Non-Redump - Nintendo - Wii U (20231229-065143).dat
Non-Redump/Non-Redump - Audio CD (20240105-073751).dat
Non-Redump/Non-Redump - NEC - PC Engine CD + TurboGrafx CD (20240318-203404).dat
Non-Redump/Non-Redump - IBM - PC Compatible (Discs) (20240228-135803).dat
Non-Redump/Non-Redump - Apple-Bandai - Pippin (20231104-154236).dat
Non-Redump/Non-Redump - Konami - Python 2 (20240318-125121).dat
Non-Redump/Non-Redump - Philips - CD-i (20240218-145829).dat
Unofficial/Unofficial - Sony - PlayStation Vita (NoNpDrm) (20240104-010459).dat
Unofficial/Unofficial - Sony - PlayStation Vita (BlackFinPSV) (20240104-010459).dat
Unofficial/Unofficial - Sony - PlayStation Portable (UMD Video) (20231230-075921).dat
Unofficial/Unofficial - Video Game Magazine Scans (RAW) (20220824-051032).dat
Unofficial/Unofficial - Nintendo - Wii (Digital) (Deprecated) (WAD) (20231216-065527).dat
Unofficial/Unofficial - Sony - PlayStation Vita (PSVgameSD) (20240104-010459).dat
Unofficial/Unofficial - Sony - PlayStation Portable (PSX2PSP) (20130318-035538).dat
Unofficial/Unofficial - Video Game Magazine Scans (PDF) (20220824-051032).dat
Unofficial/Unofficial - Nintendo - Wii U (Digital) (Deprecated) (20191222-002825).dat
Unofficial/Unofficial - Sony - PlayStation 4 (PSN) (20230925-001500).dat
Unofficial/Unofficial - Video Game Documents (PDF) (20221123-054502).dat
Unofficial/Unofficial - Video Game Magazine Scans (CBZ) (20220824-051032).dat
Unofficial/Unofficial - Nintendo - Nintendo 3DS (Digital) (Updates and DLC) (Encrypted) (20230502-011510).dat
Unofficial/Unofficial - Sony - PlayStation Portable (UMD Music) (20231230-080356).dat
Unofficial/Unofficial - Sony - PlayStation Portable (PSN) (Decrypted) (20230704-163013).dat
Unofficial/Unofficial - Video Game Scans (RAW) (20230712-063322).dat
Unofficial/Unofficial - Video Game OSTs (Digital) (RAW) (20230105-084656).dat
Source Code/Source Code - Apple - II (20230107-005706).dat
Source Code/Source Code - Nintendo - Game Boy Advance (20230107-011927).dat
Source Code/Source Code - Sega - DreamCast (20230107-012643).dat
Source Code/Source Code - Nintendo - Nintendo DS (20220204-041058).dat
Source Code/Source Code - Panasonic - 3DO Interactive Multiplayer (20230118-072841).dat
Source Code/Source Code - Nintendo - Nintendo GameCube (20220309-093302).dat
Source Code/Source Code - IBM - PC and Compatibles (20231228-060222).dat
Source Code/Source Code - Various (20230416-052851).dat
Source Code/Source Code - Nintendo - Nintendo Entertainment System (20230107-011201).dat
Source Code/Source Code - Atari - 8-bit Family (20230107-012326).dat
Source Code/Source Code - Nintendo - Super Nintendo Entertainment System (20230107-012046).dat
Source Code/Source Code - Nintendo - Nintendo - Game Boy Color (20230107-011902).dat
Source Code/Source Code - Arcade (20230201-090434).dat
Source Code/Source Code - Atari - 2600 (20220406-030534).dat
No-Intro/Nintendo - Wii (Development Kit Hard Drives) (20231202-142947).dat
No-Intro/Commodore - VIC-20 (20231226-072946).dat
No-Intro/Casio - Loopy (BigEndian) (20231004-134719).dat
No-Intro/NEC - PC-98 (Greaseweazle) (20231101-162607).dat
No-Intro/Digital Media Cartridge - Firecore (20240212-194543).dat
No-Intro/RCA - Studio II (20200201-121822).dat
No-Intro/SNK - NeoGeo Pocket Color (20240311-150026).dat
No-Intro/Bit Corporation - Gamate (20230627-112619).dat
No-Intro/Nichibutsu - My Vision (Mame) (20230724-090438).dat
No-Intro/Sony - PlayStation Vita (PSN) (Updates) (20231009-113000).dat
No-Intro/Acorn RISC OS - Flash Media (Misc) (20221123-054527).dat
No-Intro/Atari - Jaguar (COF) (20231013-072322).dat
No-Intro/Nintendo - Nintendo GameCube (NPDP Carts) (20240104-124921).dat
No-Intro/Nintendo - Game Boy Advance (Multiboot) (20240221-035028).dat
No-Intro/Atari - 5200 (20231023-200302).dat
No-Intro/Seta - Aleck64 (ByteSwapped) (20220513-040448).dat
No-Intro/Nintendo - Virtual Boy (20240118-143523).dat
No-Intro/Fujitsu - FM Towns (Flux) (20230501-233459).dat
No-Intro/Nintendo - Nintendo 3DS (Digital) (Dev ROMs) (20220409-104434).dat
No-Intro/Sony - PlayStation (PS one Classics) (PSN) (20220402-020621).dat
No-Intro/Nichibutsu - My Vision (20230724-090438).dat
No-Intro/Nintendo - Misc (20240318-130654).dat
No-Intro/Toshiba - Pasopia (BIN) (20220726-115432).dat
No-Intro/Commodore - Amiga (20240308-060724).dat
No-Intro/Apple - IIe (A2R) (20220718-130608).dat
No-Intro/Tiger - Gizmondo (20070531-125518).dat
No-Intro/Nintendo - Game Boy Advance (20240316-035432).dat
No-Intro/Sega - SG-1000 (20231205-110448).dat
No-Intro/Nintendo - Nintendo 3DS (Digital) (CDN) (20231011-123410).dat
No-Intro/Acorn - Archimedes (20231029-220453).dat
No-Intro/Apple - Macintosh (BETA) (FluxDumps) (20220831-024638).dat
No-Intro/Atari - Jaguar (JAG) (20231013-072322).dat
No-Intro/Apple - Macintosh (A2R) (20220727-190526).dat
No-Intro/Toshiba - Pasopia (WAV) (20220726-115432).dat
No-Intro/NEC - PC-98 (HardDisk) (20231101-162607).dat
No-Intro/IBM - PC and Compatibles (Digital) (Updates and DLC) (20221207-104837).dat
No-Intro/Bally - Astrocade (20220411-220423).dat
No-Intro/Fujitsu - FM-7 (Bitstream) (20231101-163040).dat
No-Intro/IBM - PC and Compatibles (Digital) (JAST USA) (20220607-112544).dat
No-Intro/Casio - Loopy (LittleEndian) (20231004-134719).dat
No-Intro/Arduboy Inc - Arduboy (20230528-053947).dat
No-Intro/ACT - Apricot PC Xi (20211125-165629).dat
No-Intro/Nintendo - Nintendo DSi (Digital) (20220506-190731).dat
No-Intro/Apple - II Plus (Flux) (20211227-061630).dat
No-Intro/Amstrad - CPC (Misc) (20230406-091045).dat
No-Intro/APF - MP-1000 (20211213-125803).dat
No-Intro/Apple - I (Tapes) (20230313-130448).dat
No-Intro/Sega - 32X (20231229-101939).dat
No-Intro/Mattel - Intellivision (20231027-021641).dat
No-Intro/Nintendo - Nintendo DSi (Decrypted) (20240131-121648).dat
No-Intro/Nintendo - Super Nintendo Entertainment System (20240317-134803).dat
No-Intro/Nintendo - Nintendo DS (Decrypted) (20240313-082215).dat
No-Intro/Nintendo - Game Boy Advance (e-Reader) (20240114-074237).dat
No-Intro/Apple - Macintosh (KryoFlux) (20220727-190526).dat
No-Intro/Sony - PlayStation Vita (PSN) (Content) (20240209-090229).dat
No-Intro/Amstrad - CPC (Flux) (20230406-091045).dat
No-Intro/Benesse - Pocket Challenge V2 (20230819-030515).dat
No-Intro/Microsoft - MSX (20231222-131915).dat
No-Intro/Sega - Master System - Mark III (20240308-010656).dat
No-Intro/Watara - Supervision (20230924-042856).dat
No-Intro/IBM - PC and Compatibles (Digital) (Misc) (20230824-110124).dat
No-Intro/Mobile - J2ME (20240108-151210).dat
No-Intro/Apple - Macintosh (BETA) (Bitstreams) (20220831-024638).dat
No-Intro/Sega - Beena (20240208-083255).dat
No-Intro/VTech - CreatiVision (20230426-080718).dat
No-Intro/Arcade - PC-based (20230329-073558).dat
No-Intro/Atari - 2600 (20240317-073010).dat
No-Intro/Microsoft - Xbox (Development Kit Hard Drives) (20230925-080914).dat
No-Intro/iQue - iQue (CDN) (20220514-122827).dat
No-Intro/Apple - II (WOZ) (20220728-095306).dat
No-Intro/IBM - PC and Compatibles (Digital) (Steam) (Hentai) (20230424-174742).dat
No-Intro/Apple - IIe (WOZ) (20220718-130608).dat
No-Intro/Nintendo - Game Boy Color (20240318-113356).dat
No-Intro/Bandai - WonderSwan Color (20240210-011315).dat
No-Intro/SNK - NeoGeo Pocket (20240301-154358).dat
No-Intro/Sony - PlayStation 3 (PSN) (Updates) (20240104-103028).dat
No-Intro/Project EGG (20230831-231500).dat
No-Intro/Apple - Macintosh (Uncategorized) (20220727-190526).dat
No-Intro/Hartung - Game Master (20211012-064712).dat
No-Intro/VTech - V.Smile (20231202-191234).dat
No-Intro/Nintendo - Sufami Turbo (20240311-144531).dat
No-Intro/Nintendo - Nintendo 3DS (Encrypted) (20240315-133709).dat
No-Intro/Atari - Lynx (BLL) (20240203-184931).dat
No-Intro/Apple - IIGS (WOZ) (20220727-120719).dat
No-Intro/Nintendo - Family Computer Disk System (FDS) (20240302-011835).dat
No-Intro/Sony - PlayStation Mobile (PSN) (20200524-163740).dat
No-Intro/Nintendo - Nintendo 3DS (Decrypted) (20240315-133709).dat
No-Intro/Nintendo - Family Computer Network System (20220516-232939).dat
No-Intro/Microsoft - Xbox 360 (Development Kit Hard Drives) (20230411-073408).dat
No-Intro/IBM - PC and Compatibles (Digital) (Misc) (Hentai) (20220717-123500).dat
No-Intro/Nintendo - Family Computer Disk System (QD) (20240302-011835).dat
No-Intro/NEC - PC-98 (20231101-162607).dat
No-Intro/Atari - Jaguar (ROM) (20231013-072322).dat
No-Intro/IBM - PC and Compatibles (LooseFilesArchive) (20230507-112016).dat
No-Intro/Seta - Aleck64 (BigEndian) (20220513-040448).dat
No-Intro/Sony - PlayStation Portable (PSN) (Encrypted) (20230704-161201).dat
No-Intro/Nintendo - Nintendo DSi (Encrypted) (20240131-121648).dat
No-Intro/Nintendo - Wallpapers (20230410-103428).dat
No-Intro/Sony - PlayStation 3 (PSN) (Content) (20230918-071659).dat
No-Intro/Nintendo - Nintendo 64 (Mario no Photopi SmartMedia) (20210514-090046).dat
No-Intro/Nintendo - Kiosk Video Compact Flash (CardImage) (20211208-080217).dat
No-Intro/Atari - 8-bit Family (20240207-115626).dat
No-Intro/IBM - PC and Compatibles (Digital) (Groupees) (20220803-071205).dat
No-Intro/Atari - 7800 (20240318-233505).dat
No-Intro/Nintendo - Nintendo 64DD (20230131-042611).dat
No-Intro/Microsoft - Xbox 360 (Digital) (20230314-081206).dat
No-Intro/Nintendo - Kiosk Video Compact Flash (Extracted) (20211208-080217).dat
No-Intro/Apple - II (A2R) (20220728-095306).dat
No-Intro/Konami - Picno (20201121-052249).dat
No-Intro/Nintendo - Nintendo DSi (Digital) (CDN) (Encrypted) (20230417-043358).dat
No-Intro/Welback - Mega Duck (20220531-111927).dat
No-Intro/Nintendo - Nintendo DS (Download Play) (20231110-004520).dat
No-Intro/Nintendo - Nintendo Entertainment System (Headered) (20240318-090355).dat
No-Intro/Fujitsu - FM-7 (Flux) (20231101-163040).dat
No-Intro/Acorn - Risc PC (Flux) (20230506-040449).dat
No-Intro/Nintendo - Game Boy Advance (Video) (20230804-201325).dat
No-Intro/Nintendo - New Nintendo 3DS (Digital) (Deprecated) (20211118-112910).dat
No-Intro/Sega - Mega Drive - Genesis (20240316-023439).dat
No-Intro/APF - Imagination Machine (20220416-042756).dat
No-Intro/Apple - Macintosh (DC42) (20220727-190526).dat
No-Intro/Bally - Astrocade (Tapes) (WAV) (20220914-145554).dat
No-Intro/Atari - Lynx (LNX) (20240203-184931).dat
No-Intro/Apple - II Plus (WOZ) (20211227-061630).dat
No-Intro/IBM - PC and Compatibles (Flash Media) (20240202-102604).dat
No-Intro/Nintendo - Nintendo 3DS (Digital) (Deprecated) (20231025-042741).dat
No-Intro/Sega - Dreamcast (Development Kit Hard Drives) (20230104-093851).dat
No-Intro/Nintendo - Nintendo 64 (BigEndian) (20240318-212425).dat
No-Intro/Nintendo - Game & Watch (20240116-133325).dat
No-Intro/Nintendo - Nintendo 3DS (Digital) (Pre-Install) (20220704-084112).dat
No-Intro/Nintendo - New Nintendo 3DS (Decrypted) (20230413-121153).dat
No-Intro/Microsoft - MSX2 (20231015-143336).dat
No-Intro/Nintendo - Wii U (Digital) (CDN) (20231116-082151).dat
No-Intro/Atari - Jaguar (J64) (20231013-072322).dat
No-Intro/Nintendo - Nintendo Entertainment System (Headerless) (20240318-090355).dat
No-Intro/Nintendo - Pokemon Mini (20231230-144004).dat
No-Intro/Bandai - WonderSwan (20240207-120747).dat
No-Intro/Epoch - Super Cassette Vision (20201123-013546).dat
No-Intro/NEC - PC Engine - TurboGrafx-16 (20240318-211220).dat
No-Intro/Mobile - Palm OS (Digital) (20230926-163739).dat
No-Intro/Nintendo - Nintendo DS (Encrypted) (20240313-082215).dat
No-Intro/Atari - Lynx (LYX) (20240203-184931).dat
No-Intro/Coleco - ColecoVision (20240317-073036).dat
No-Intro/Nintendo - Nintendo 64 (ByteSwapped) (20240318-212425).dat
No-Intro/GamePark - GP2X (20220107-115126).dat
No-Intro/Tiger - Game.com (20221031-184634).dat
No-Intro/Apple - Macintosh (WOZ) (20220727-190526).dat
No-Intro/Bandai - Gundam RX-78 (20211124-013520).dat
No-Intro/GCE - Vectrex (20240112-151308).dat
No-Intro/Apple - IIGS (A2R) (20220727-120719).dat
No-Intro/VM Labs - NUON (Digital) (20240109-115607).dat
No-Intro/Commodore - Commodore 64 (20240127-100749).dat
No-Intro/Nintendo - Satellaview (20240217-101914).dat
No-Intro/Nintendo - amiibo (20211113-040458).dat
No-Intro/Benesse - Pocket Challenge W (20231026-135210).dat
No-Intro/IBM - PC and Compatibles (Flux) (20230507-112016).dat
No-Intro/Sega - Game Gear (20240221-193102).dat
No-Intro/Yamaha - Copera (20211125-171549).dat
No-Intro/Sega - Dreamcast (Visual Memory Unit) (20230103-091559).dat
No-Intro/Nintendo - New Nintendo 3DS (Encrypted) (20230413-121153).dat
No-Intro/Sega - PICO (20231210-021932).dat
No-Intro/Nokia - N-Gage (WIP) (20220220-010530).dat
No-Intro/Fujitsu - FM-7 (Tapes) (Bitstream) (20230406-075508).dat
No-Intro/Nintendo - Game Boy Advance (Play-Yan) (20210113-092936).dat
No-Intro/Zeebo - Zeebo (20190815-004208).dat
No-Intro/Nintendo - Game Boy (20240310-045233).dat
No-Intro/Interton - VC 4000 (20211122-135810).dat
No-Intro/Nintendo - Nintendo DSi (Digital) (CDN) (Decrypted) (20230417-043358).dat
No-Intro/Toshiba - Visicom (20200202-120958).dat
No-Intro/Atari - Jaguar (ABS) (20231013-072322).dat
No-Intro/Mobile - Palm OS (20220725-121548).dat
No-Intro/Bandai - Design Master Denshi Mangajuku (20211124-132745).dat
No-Intro/Fujitsu - FM-7 (Sector) (20231101-163040).dat
No-Intro/Epoch - Game Pocket Computer (20211122-141248).dat
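
The same check can be approximated in code. The sketch below reports which checksum attributes appear in a DAT's text, assuming a Logiqx-style XML layout in which checksums are attributes named `crc`, `md5`, `sha1` and `sha256` on `<rom>` or `<disk>` elements. That layout and the function name are assumptions, and a regex scan is a shortcut rather than a real XML parser:

```typescript
// Assumes checksums are attributes on <rom>/<disk> tags, for example:
//   <rom name="x" size="1" crc="..." sha1="..." sha256="..."/>
function checksumAttributesIn(datXml: string): Set<string> {
  const found = new Set<string>();
  const tagPattern = /<(?:rom|disk)\b[^>]*>/g;
  for (const tag of datXml.match(tagPattern) ?? []) {
    for (const attr of ["crc", "md5", "sha1", "sha256"]) {
      // Require whitespace before the name so "sha1" cannot match inside another attribute.
      if (new RegExp(`\\s${attr}="[0-9a-fA-F]+"`).test(tag)) found.add(attr);
    }
  }
  return found;
}
```

Calling `checksumAttributesIn(text).has("sha256")` on each file's contents would reproduce the listing above under these assumptions.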

@emmercm
Owner

emmercm commented Mar 21, 2024

Interesting. I have absolutely no idea why their P/C DATs differ so greatly from their "standard" ones. I see no reason why P/C couldn't include SHA256 as well.

@TheBrainScrambler
Author

Thanks! Any chance of seeing SHA256 support in igir as well?

@emmercm
Owner

emmercm commented Mar 23, 2024

@TheBrainScrambler absolutely, this will add it soon: #1032


🔒 Inactive issue lock

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

Comment generated by the GitHub Lock Issues workflow.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 23, 2024