Add option to replace forbidden characters with unicode equivalent #616

Open
hadess opened this issue Apr 26, 2024 · 4 comments

Comments

@hadess

hadess commented Apr 26, 2024

I use whipper to rip CDs on a Linux system, which I then consume from my NAS over SMB, so directory and file names created shouldn't have any Unix or FAT32 forbidden characters.

It would be great if there was an option to replace forbidden characters with look-alike Unicode characters.

For example:
/ with ∕ or ⁄ (0x2f slash with division slash U+2215 or fraction slash U+2044)
: with ∶ (0x3a colon with ratio, U+2236)
etc.

I believe those 2 would cover most of my own rips.
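A minimal sketch of the requested behaviour, assuming a plain character-for-character mapping. The names `LOOKALIKES` and `replace_forbidden` are hypothetical and not part of whipper. The function is meant for a single file or directory name, not a full path.

```python
# Hypothetical mapping built from the two examples above.
LOOKALIKES = str.maketrans({
    "/": "\u2215",  # DIVISION SLASH
    ":": "\u2236",  # RATIO
})


def replace_forbidden(name):
    """Replace forbidden characters in one file or directory name."""
    return name.translate(LOOKALIKES)


print(replace_forbidden("AC/DC - Back in Black: Remastered"))
```

More pairs can be added to the table without touching the function, which is why a translation table is used here rather than chained `str.replace` calls.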

@hadess
Author

hadess commented Apr 26, 2024

A bit of a stretch:
? with ⹔
11. Offspring - What Happened to You?.flac
11. Offspring - What Happened to You⹔.flac

@hadess
Author

hadess commented May 2, 2024

Also a trailing `...`, which FAT32/SMB doesn't accept at the end of a file or directory name that has no extension.

@MerlijnWajer
Collaborator

I suppose we could add arguments to the getPath function in whipper/common/program.py so that it performs certain substitutions (passed by argument) as its last step. That might be enough.
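A rough sketch of that idea, assuming the substitutions arrive as a dict of single characters. `build_path` is a hypothetical stand-in, not whipper's actual getPath, whose real signature is not shown in this thread. It substitutes in each component before joining, because once the path is joined a `/` from a track title can no longer be told apart from a separator.

```python
import os


def build_path(components, substitutions=None):
    """Join path components after applying per-character substitutions.

    Hypothetical helper for illustration only; not whipper's getPath.
    `substitutions` maps single characters to replacement strings.
    """
    table = str.maketrans(substitutions or {})
    # Translate each component on its own so real separators survive.
    return os.path.join(*(part.translate(table) for part in components))


print(build_path(["AC/DC", "Live: 1991"], {"/": "\u2215", ":": "\u2236"}))
```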

@texneus

texneus commented Aug 30, 2024

This is part of a bash script I made to process rips that, among other things, was intended to deal with this plus a few things that just annoy me. I targeted CIFS illegal characters (not FAT32) since my files are stored on Linux and shared with Samba. I assume that, since whipper runs on Linux, illegal Linux file names are already dealt with. Any specifics for FAT would need to be added.

You are more than welcome to adapt it to your use case. Script assumes MUSIC/ARTIST/ALBUM directory structure and is run from the MUSIC level (i.e. the directory containing the ARTIST folders). Directories are renamed first, then files.

CAUTION: This script is potentially dangerous as it will modify your files! Use only if you understand what it is doing and you agree to what it does! If executed as-is, it only prints the mv commands it would run. Remove the 'echo' command from the final if block to make it actually rename anything.

#!/bin/bash

echo "Renaming files to avoid illegal CIFS names."
for I in d f; do
	while IFS= read -r FULLPATH; do
		FILEORG=$(basename "$FULLPATH")
		FILEDIR=$(dirname "$FULLPATH")

		#Remove without replacement "Control" characters 0x00-0x1F and 0x7f
		# NOTE: printf with a quoted variable is used instead of an unquoted echo,
		# which would collapse runs of spaces and expand glob characters such as *
		FILENEW=$(printf '%s\n' "$FILEORG" | sed -e 's/[\x00-\x1F\x7f]//g')

		#Remove without replacement specifically not allowed characters: */<>?\|
		# NOTE: All characters are referred to by hex codes since several interfere with SED or BASH scripts
		FILENEW=$(printf '%s\n' "$FILENEW" | sed -e 's/[\x2a\x2f\x3c\x3e\x3f\x5c\x7c]//g')

		#Remove without replacement all forms of double quotation marks
		# NOTE: Double quotes are not allowed so the smart equivalents cannot be converted, so just get rid of them all.
		FILENEW=$(printf '%s\n' "$FILENEW" | sed -e 's/["“”]//g')

		#Substitute odd forms of single quotes to actual single quotes
		# NOTE: The opening bracket must not be escaped, or the class is never matched
		FILENEW=$(printf '%s\n' "$FILENEW" | sed -e 's/[‘’`]/\x27/g')

		#Substitute : with ' -' (space dash)
		FILENEW=$(printf '%s\n' "$FILENEW" | sed -e 's/:/ -/g')

		#Eliminate leading/trailing spaces and periods
		FILENEW=$(printf '%s\n' "$FILENEW" | sed -e 's/^[. ]*//; s/[ .]*$//')

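		#Skip unchanged names and names that would end up empty
		if [[ "$FILEORG" != "$FILENEW" && -n "$FILENEW" ]]; then
			#Remove 'echo' ONLY AFTER you are certain this script does what you want!
			echo mv "$FULLPATH" "$FILEDIR/$FILENEW"
		fi
	#-depth lists children before their parents, so renaming a directory
	#does not invalidate paths that are still waiting to be processed
	done < <(find . -depth -type "$I" -not -path ".")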
done
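For anyone who would rather do this inside whipper than as a post-processing step, the same rules can be sketched in Python. This is only an illustration of the script's rule chain applied to a single name. `cifs_safe_name` is a hypothetical function, not something whipper provides.

```python
import re


def cifs_safe_name(name):
    """Apply the script's clean-up rules to one file or directory name."""
    name = re.sub(r"[\x00-\x1f\x7f]", "", name)   # drop control characters
    name = re.sub(r"[*/<>?\\|]", "", name)        # drop */<>?\|
    name = re.sub('["\u201c\u201d]', "", name)    # drop straight and smart double quotes
    name = re.sub("[\u2018\u2019`]", "'", name)   # smart single quotes and backtick -> '
    name = name.replace(":", " -")                # colon -> space dash
    return name.strip(" .")                       # trim leading/trailing spaces and periods


print(cifs_safe_name("11. Offspring - What Happened to You?.flac"))
# prints "11. Offspring - What Happened to You.flac"
```

Unlike the look-alike approach requested in the issue, this removes or substitutes characters with plain ASCII, so some information from the original title is lost.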
