--include option #581

Open
ashishdamania opened this issue Sep 1, 2023 · 12 comments

@ashishdamania

Hello,
This may be very trivial, but I am trying to figure out how to use the --include option so that tarsnap archives only certain files.

tarsnap --dry-run --no-default-config --print-stats --include="*.pdf*"  -c  /Users/xyz

If I run this, I get the warning "Archive contains no files", and nothing seems to be included:

tarsnap: Warning: Archive contains no files
                                       Total size  Compressed size
All archives                               1.5 kB           1.4 kB
  (unique data)                            1.5 kB           1.4 kB
This archive                               1.5 kB           1.4 kB
New data                                   1.5 kB           1.4 kB

However, this seems to work.

tarsnap --dry-run --no-default-config --print-stats --exclude="*.pdf*"  -c  /Users/xyz

tarsnap: Removing leading '/' from member names
                                       Total size  Compressed size
All archives                               8.4 MB           3.4 MB
  (unique data)                            8.4 MB           3.4 MB
This archive                               8.4 MB           3.4 MB
New data                                   8.4 MB           3.4 MB

Am I missing anything? This is my tar version:

bsdtar 3.5.3 - libarchive 3.5.3 zlib/1.2.11 liblzma/5.0.5 bz2lib/1.0.8

Thanks for making Tarsnap.

@gperciva
Member

gperciva commented Sep 1, 2023

Do you have pdf files directly inside /Users/xyz?

Unfortunately, the behaviour of --include is inherited from the ancient (1979!) tar(1) command. It's easier to think of --include as --include-only, meaning "only include files and directories whose names match this".

https://www.tarsnap.com/selecting-files.html

So if you have

/Users/xyz/my-docs/foo.pdf

then tarsnap notices that my-docs/ does not match *.pdf, so it doesn't look inside my-docs/
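
To make the pruning concrete, here is a minimal sketch that uses the shell's own glob matching as a stand-in. It is only a simplified model of the explanation above, not tarsnap's actual matching code, and matches_include is a hypothetical helper name.

```shell
#!/bin/sh
# Simplified model: shell globbing stands in for tarsnap's pattern matcher.
pattern='*.pdf*'

# Hypothetical helper: would this name pass an include-only filter?
matches_include() {
    case "$1" in
        $pattern) echo "match" ;;      # unquoted so it is treated as a glob
        *)        echo "no match" ;;
    esac
}

# The directory name is tested before anything inside it is visited.
dir_result=$(matches_include "my-docs")
file_result=$(matches_include "foo.pdf")

echo "my-docs: $dir_result"
echo "foo.pdf: $file_result"
```

Because the directory itself fails the test, a traversal that applies the filter to every entry never reaches foo.pdf inside it.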

@ashishdamania
Author

ashishdamania commented Sep 1, 2023 via email

@gperciva
Member

gperciva commented Sep 1, 2023

Yes, passing filenames via find is a great way to avoid this problem!

@ashishdamania
Author

Just for future reference, I ended up using the find command as follows:

find /Users/xyz -type f \( -name '*.docx' -o -name '*.pdf' \) -print0 | xargs -0 tarsnap --dry-run --no-default-config --print-stats --humanize-numbers -c

This includes only files with a .docx or .pdf extension. Maybe there is a better and easier way to achieve this goal.
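
One thing to watch in a find expression that mixes -o with an explicit action: -o binds more loosely than the implicit "and", so without parentheses the action and the -type f test each attach to only one side. The sketch below shows the difference on a throwaway directory; the file names are made up for illustration, and -print is used instead of -print0 only so the result can be captured in a variable.

```shell
#!/bin/sh
# Build a small scratch tree (all names here are illustrative).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
touch "$tmpdir/report.docx" "$tmpdir/paper.pdf" "$tmpdir/notes.txt"
mkdir "$tmpdir/scans.pdf"   # a directory whose name ends in .pdf

# Ungrouped: parsed as (-type f -name '*.docx') -o (-name '*.pdf' -print),
# so .docx files are never printed and -type f does not guard the .pdf test.
ungrouped=$(cd "$tmpdir" && find . -type f -name '*.docx' -o -name '*.pdf' -print | sort)

# Grouped: -type f and -print apply to both name tests.
grouped=$(cd "$tmpdir" && find . -type f \( -name '*.docx' -o -name '*.pdf' \) -print | sort)

echo "ungrouped:"; echo "$ungrouped"
echo "grouped:";   echo "$grouped"
```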

@gperciva
Member

gperciva commented Sep 1, 2023

That's one workable option!

I personally would do something like this (untested):

find /Users/xyz -type f \( -name '*.docx' -o -name '*.pdf' \) > ~/my-files-list.txt
tarsnap -c -T ~/my-files-list.txt
rm ~/my-files-list.txt

but I can't offhand think of any reason to prefer that method over yours.

The important thing is the tarsnap -T option.

(This should handle filenames with spaces, but not filenames which contain newlines. tarsnap -T is documented as being able to work with --null so it should be able to handle such filenames, but I'd want to double-check that before relying on it.)

EDIT: if you have a lot of filenames, you might run out of room for the command-line arguments, which would result in missing files from your archives! For that reason, I recommend this method, instead of find | xargs directly.
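
A slightly more defensive sketch of the list-file approach, assuming a temporary file from mktemp that is cleaned up by a trap. The scratch directory and file names are made up for illustration, and the tarsnap call is left as a comment so the snippet runs without a tarsnap installation.

```shell
#!/bin/sh
# Scratch source tree and list file (illustrative names only).
srcdir=$(mktemp -d)
listfile=$(mktemp)
trap 'rm -rf "$srcdir" "$listfile"' EXIT

mkdir "$srcdir/my docs"     # a space in the path is fine for a newline-delimited list
touch "$srcdir/my docs/foo.pdf" "$srcdir/bar.docx" "$srcdir/skip.txt"

# One path per line; breaks only if a filename itself contains a newline.
find "$srcdir" -type f \( -name '*.docx' -o -name '*.pdf' \) > "$listfile"

count=$(wc -l < "$listfile" | tr -d ' ')
echo "$count files listed"

# The list would then be handed to tarsnap, for example (not run here):
#   tarsnap --dry-run --no-default-config --print-stats -c -T "$listfile"
```

Checking the line count before archiving is a cheap sanity test that the find expression selected what was intended.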


@ashishdamania
Author

Actually, I am getting this output with my naive xargs one liner:

tarsnap: Argument vector exceeds 128 kB in length; vector stored in archive is being truncated.
tarsnap: Removing leading '/' from member names
                                       Total size  Compressed size
All archives                               823 MB           724 MB
  (unique data)                            793 MB           698 MB
This archive                               823 MB           724 MB
New data                                   793 MB           698 MB

I am concerned about that "Argument vector exceeds 128 kB in length" message.

@gperciva
Member

gperciva commented Sep 1, 2023

Right; that'll happen if you have a lot of files. Your archive does not contain all of your pdf or docx files!

Writing the list of filenames to a file would avoid that problem.

@cperciva
Member

cperciva commented Sep 1, 2023

I'm 99% confident that in the case above the archive does contain all of the files -- if there were too many to fit into a command line, xargs would run multiple tarsnap processes (of which all but the first would fail with an "archive already exists" error since all of the tarsnap processes would use the same archive name).
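
The xargs splitting behaviour can be seen in a small sketch. Here echo stands in for tarsnap so nothing is archived, and -n 2 is an artificial limit chosen only to force the split that a real system would apply at its much larger command-line size limit.

```shell
#!/bin/sh
# Five made-up file names, at most two per command line.
output=$(printf '%s\n' a.pdf b.pdf c.pdf d.pdf e.pdf | xargs -n 2 echo tarsnap -c)

# Each output line is one separate command that xargs would have run.
echo "$output"

invocations=$(printf '%s\n' "$output" | wc -l | tr -d ' ')
echo "$invocations separate invocations"
```

Every invocation receives the same leading arguments, which is why all of them would target the same archive name.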

But yes, putting the complete list of files into the command line is a bit of a weird way of doing this; much better to use the -T option.

@cperciva
Member

cperciva commented Sep 1, 2023

FWIW I would say that the safest "most unixy" way of doing this is `find ... -print0 | tarsnap -c --null -T- ...`.
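
To illustrate why NUL delimiters are the safer choice, this sketch counts the entries that find emits with each delimiter when one filename contains a newline. The file names are made up, and the tarsnap pipeline is shown only as a comment so the snippet runs without tarsnap.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Two real files, one of which has a newline embedded in its name.
touch "$tmpdir/plain.pdf"
touch "$tmpdir/$(printf 'odd\nname.pdf')"

# Newline-delimited: the odd name is split across two lines.
newline_entries=$(find "$tmpdir" -type f -name '*.pdf' -print | wc -l | tr -d ' ')

# NUL-delimited: keep only the NUL bytes and count them, one per file.
nul_entries=$(find "$tmpdir" -type f -name '*.pdf' -print0 | tr -cd '\000' | wc -c | tr -d ' ')

echo "newline-delimited entries: $newline_entries"
echo "NUL-delimited entries:     $nul_entries"

# The corresponding pipeline would look like this (not run here):
#   find "$tmpdir" -type f -name '*.pdf' -print0 | tarsnap --dry-run --no-default-config -c --null -T-
```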

@gperciva
Member

gperciva commented Sep 1, 2023

Oh right, that's a tarsnap warning, not a bash warning. So the result is that if you ran

tarsnap --list-archives -vv

(which prints out the command-line used to generate each archive), then that archive wouldn't print the right thing.

But that's another reason to write the filenames to a list file; --list-archives -vv is going to be unreadable if you have tons of filenames in there.

@gperciva
Member

gperciva commented Sep 1, 2023

I'll revisit this tomorrow morning and look at something to add to the webpage, along with reasons why other methods might not be ideal.
