
Backup-restore throughput is awful #333

Open
cemeyer opened this issue Nov 6, 2018 · 9 comments

Comments

@cemeyer

cemeyer commented Nov 6, 2018

I have a gigabit internet connection, a fast multi-core CPU, and a fast local NVMe disk.

Yet tarsnap -rf xxx | pv | tar -xf - shows tarsnap is only able to download around 50 kB/s. What gives? At this rate it'll take 6 days to download a 26 GB backup set.
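The estimate follows from the numbers: 26 GB at a steady 50 kB/s is about 520,000 seconds, or roughly six days. A quick shell sanity check (decimal units assumed):

```shell
# 26 GB at 50 kB/s, decimal units assumed
echo $(( 26000000000 / 50000 ))          # elapsed seconds: 520000
echo $(( 26000000000 / 50000 / 86400 ))  # whole days: 6
```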

@gperciva
Member

gperciva commented Nov 6, 2018

Yes, this is a known problem with Tarsnap. The extract performance is currently latency-bound by the network connection to S3. @cperciva has a plan for addressing it, but at the moment we are not announcing any estimated time for this improvement.

This can be mitigated somewhat by doing parallel extracts:
http://www.tarsnap.com/improve-speed.html#faster-restore
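The linked tip amounts to running several tarsnap processes at once, each extracting a disjoint subset of files. A minimal sketch of the fan-out pattern with `xargs -P`, where `echo` stands in for the real restore command (the archive and file names here are hypothetical):

```shell
#!/bin/sh
# Fan a file list out across 4 parallel workers with xargs.
# In a real restore, replace `echo "restore {}"` with something like:
#   tarsnap -x -f mybackup {}
printf '%s\n' docs/a.txt docs/b.txt docs/c.txt src/d.c \
  | xargs -n 1 -P 4 -I {} echo "restore {}"
```

In practice you would generate the file list with `tarsnap -tf ARCHIVE` first, then split it across workers as above.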

@cemeyer
Author

cemeyer commented Nov 7, 2018

Thanks, the context helps a little. It's unfortunate that throughput is so limited in the official client.

I've gone ahead and forked the redsnapper tool to remove the dependency on a rubygem that isn't in FreeBSD ("thread"). It seems to be parallelizing successfully, although it uses some awful O(N log N + N · M log M) algorithm to balance files across workers, which took quite a long time to run between the -tf and -xf portions of the operation. (Obviously it doesn't scale well to backups with large numbers of individual files.)

Edit: with about 500 workers it seems I can pull 80-140 Mbps out of EC2 even with relatively small files.

@cemeyer
Author

cemeyer commented Nov 7, 2018

Sent my redsnapper changes upstream in case they're helpful to others: https://github.com/directededge/redsnapper/pulls . I suggest --jobs 1000 if you've got the RAM and ulimit for it.

Edit: my fork is here: https://github.com/cemeyer/redsnapper

@cperciva
Member

cperciva commented Nov 8, 2018

I don't think --jobs 1000 is what you want, because the Tarsnap service won't (currently) allow you to have that many connections open at once. I don't recommend going past --jobs 100.

@cemeyer
Author

cemeyer commented Nov 8, 2018

Well, I'd appreciate it if you'd solve the restore performance problem in the tarsnap client itself, but failing that, please get out of the way and lift the arbitrary connections restriction.

EC2 has big pipes, S3 has big pipes, and I have a fast computer and fast internet connection, and even with 100+ parallel jobs, redsnapper+tarsnap can only use about 16% of my available internet bandwidth.

@jrnewton

jrnewton commented Jan 8, 2021

Any update about whether this issue will be addressed in tarsnap?

@gperciva
Member

gperciva commented Jan 8, 2021

At the moment we can't provide an estimated time of completion, sorry.

jrnewton added a commit to jrnewton/dotfiles that referenced this issue Jan 10, 2021
Some reasons for moving on:
- Tarsnap/tarsnap#333
- https://www.tarsnap.com/faq.html#out-of-money

DM me recommendations for good cmd line backup
@greghuc

greghuc commented Nov 8, 2022

@cperciva and @gperciva. Any progress on improving tarsnap backup-restore throughput, as per this ticket from Nov 2018?

I thought I'd ask, as I've been evaluating tarsnap for production use. The design philosophy is great, the cli tool is nice to use, but I keep seeing recent concerns about restore speed.

As per Tarsnap "improve speed" tips, I've been experimenting with restoring named files in parallel, using this gist as a basis. Am currently running tests from my UK computer, but will also try from US cloud to be closer to the data.

Many thanks

Greg

@greghuc

greghuc commented Nov 9, 2022

Brief update re Tarsnap performance test. Restore performance for 17 git repos totalling 2.7G, using 100 tarsnap clients in parallel with xargs (as per #333 (comment)):

  • UK computer = 57m 12s
  • Heroku US dyno = 8m 46s

That's a 6.5x speed-up.
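The 6.5x figure matches the timings: 57m 12s is 3432 s, 8m 46s is 526 s, and 3432 / 526 ≈ 6.5. A quick check in integer shell arithmetic (scaled by 10 to keep one decimal place):

```shell
uk=$(( 57 * 60 + 12 ))   # UK computer: 3432 s
us=$(( 8 * 60 + 46 ))    # Heroku US dyno: 526 s
echo $(( uk * 10 / us )) # speed-up x10: 65, i.e. ~6.5x
```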
