
Backup-restore throughput is awful #333

Open
cemeyer opened this issue Nov 6, 2018 · 9 comments

Comments

@cemeyer

cemeyer commented Nov 6, 2018

I have a gigabit internet connection, a fast multi-core CPU, and a fast local NVMe disk.

Yet tarsnap -rf xxx | pv | tar -xf - shows tarsnap is only able to download around 50 kB/s. What gives? At this rate it'll take 6 days to download a 26 GB backup set.
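The estimate follows from the numbers: 26 GB at a steady 50 kB/s is about 520,000 seconds, or roughly six days. A quick shell sanity check (decimal units assumed):

```shell
# 26 GB at 50 kB/s, decimal units assumed
echo $(( 26000000000 / 50000 ))          # elapsed seconds: 520000
echo $(( 26000000000 / 50000 / 86400 ))  # whole days: 6
```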

@gperciva
Member

gperciva commented Nov 6, 2018

Yes, this is a known problem with Tarsnap. The extract performance is currently latency-bound by the network connection to S3. @cperciva has a plan for addressing it, but at the moment we are not announcing any estimated time for this improvement.

This can be mitigated somewhat by doing parallel extracts:
http://www.tarsnap.com/improve-speed.html#faster-restore
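The linked tip amounts to running several tarsnap processes at once, each extracting a disjoint subset of files. A minimal sketch of the fan-out pattern with `xargs -P`, where `echo` stands in for the real restore command (the archive and file names here are hypothetical):

```shell
#!/bin/sh
# Fan a file list out across 4 parallel workers with xargs.
# In a real restore, replace `echo "restore {}"` with something like:
#   tarsnap -x -f mybackup {}
printf '%s\n' docs/a.txt docs/b.txt docs/c.txt src/d.c \
  | xargs -n 1 -P 4 -I {} echo "restore {}"
```

In practice you would generate the file list with `tarsnap -tf ARCHIVE` first, then split it across workers as above.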

@cemeyer
Author

cemeyer commented Nov 7, 2018

Thanks, the context helps a little. It's unfortunate that throughput is so limited in the official client.

I've gone ahead and forked the redsnapper tool to remove the dependency on a rubygem that isn't in FreeBSD ("thread"). It seems to be parallelizing successfully, although it uses some awful O(N log N + N · M log M) algorithm to balance files across workers, which took quite a long time to run between the -tf and -xf portions of the operation. (Obviously it doesn't scale well to backups with large numbers of individual files.)

Edit: with about 500 workers it seems I can pull 80-140 Mbps out of EC2 even with relatively small files.

@cemeyer
Author

cemeyer commented Nov 7, 2018

Sent my redsnapper changes upstream in case they're helpful to others: https://github.com/directededge/redsnapper/pulls . I suggest --jobs 1000 if you've got the RAM and ulimit for it.

Edit: my fork is here: https://github.com/cemeyer/redsnapper

@cperciva
Member

cperciva commented Nov 8, 2018

I don't think --jobs 1000 is what you want, because the Tarsnap service won't (currently) allow you to have that many connections open at once. I don't recommend going past --jobs 100.

@cemeyer
Author

cemeyer commented Nov 8, 2018

Well, I'd appreciate it if you'd solve the restore performance problem in the tarsnap client itself, but failing that, please get out of the way and lift the arbitrary connections restriction.

EC2 has big pipes, S3 has big pipes, and I have a fast computer and fast internet connection, and even with 100+ parallel jobs, redsnapper+tarsnap can only use about 16% of my available internet bandwidth.

@jrnewton

jrnewton commented Jan 8, 2021

Any update about whether this issue will be addressed in tarsnap?

@gperciva
Member

gperciva commented Jan 8, 2021

At the moment we can't provide an estimated time of completion, sorry.

jrnewton added a commit to jrnewton/dotfiles that referenced this issue Jan 10, 2021
Some reasons for moving on:
- Tarsnap/tarsnap#333
- https://www.tarsnap.com/faq.html#out-of-money

DM me recommendations for good cmd line backup
@greghuc

greghuc commented Nov 8, 2022

@cperciva and @gperciva. Any progress on improving tarsnap backup-restore throughput, as per this ticket from Nov 2018?

I thought I'd ask, as I've been evaluating tarsnap for production use. The design philosophy is great, the cli tool is nice to use, but I keep seeing recent concerns about restore speed.

As per Tarsnap "improve speed" tips, I've been experimenting with restoring named files in parallel, using this gist as a basis. Am currently running tests from my UK computer, but will also try from US cloud to be closer to the data.

Many thanks

Greg

@greghuc

greghuc commented Nov 9, 2022

Brief update re Tarsnap performance test. Restore performance for 17 git repos totalling 2.7G, using 100 tarsnap clients in parallel with xargs (as per #333 (comment)):

  • UK computer = 57m 12s
  • Heroku US dyno = 8m 46s

That's a 6.5x speed-up.
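The 6.5x figure matches the timings: 57m 12s is 3432 s, 8m 46s is 526 s, and 3432 / 526 ≈ 6.5. A quick check in integer shell arithmetic (scaled by 10 to keep one decimal place):

```shell
uk=$(( 57 * 60 + 12 ))   # UK computer: 3432 s
us=$(( 8 * 60 + 46 ))    # Heroku US dyno: 526 s
echo $(( uk * 10 / us )) # speed-up x10: 65, i.e. ~6.5x
```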
