
Issue with ZFS Replication Using Syncoid #937

Open
Mikesco3 opened this issue Jul 13, 2024 · 2 comments

Comments

@Mikesco3

Description

I am experiencing recurring issues with ZFS replication using syncoid.

I have scheduled a script to run every two hours to synchronize datasets between an SSD and a pool of hard drives and to another server.

The script often fails during the zfs send and zfs receive processes with errors like:

  • cannot restore to rpool/_VMs/vm-10111-disk-1@autosnap_2024-07-13_02:00:18_hourly: destination already exists
  • Use of uninitialized value $existing in string eq at /usr/sbin/syncoid line 750.
  • Broken pipe
  • critical errors

My Script:

Here is the script that I have scheduled:

#!/usr/bin/bash

## tfh-fs00 Server to pve2
/usr/sbin/syncoid --force-delete --identifier=pve2 fast200/_VMs/vm-111-disk-0 pve2:tank100/vm-10111-disk-0 && \
/usr/sbin/syncoid --force-delete --identifier=pve2 fast200/_VMs/vm-111-disk-1 pve2:tank100/vm-10111-disk-1 && \
/usr/sbin/syncoid --force-delete --identifier=pve2 fast200/_VMs/vm-111-disk-2 pve2:tank100/vm-10111-disk-2

## tfh-fs00 Server from SSD fast200 to HDs on rpool
/usr/sbin/syncoid --force-delete --identifier=rpool fast200/_VMs/vm-111-disk-0 rpool/_VMs/vm-10111-disk-0 && \
/usr/sbin/syncoid --force-delete --identifier=rpool fast200/_VMs/vm-111-disk-1 rpool/_VMs/vm-10111-disk-1 && \
/usr/sbin/syncoid --force-delete --identifier=rpool fast200/_VMs/vm-111-disk-2 rpool/_VMs/vm-10111-disk-2 
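
Since the three syncoid calls for each target are chained with &&, a failure on disk-0 silently skips the remaining disks. A variant that runs every disk independently and reports failures would look something like this (just a sketch of the pve2 half; the rpool half would follow the same pattern):

#!/usr/bin/bash
## Replicate each disk independently so one failure doesn't skip the rest
for disk in 0 1 2; do
    /usr/sbin/syncoid --force-delete --identifier=pve2 \
        "fast200/_VMs/vm-111-disk-${disk}" "pve2:tank100/vm-10111-disk-${disk}" \
        || echo "pve2 sync of vm-111-disk-${disk} failed" >&2
done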

Schedule

The script runs every two hours using cron:

17 */2 * * * (/root/mirrorVMs_to-PVE2-Shadows.sh) > /dev/null
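
Because the cron line discards stdout (and cron mail may or may not be set up), a simple way to keep a record between runs is to append both streams to a log file instead (the log path here is just an example):

17 */2 * * * /root/mirrorVMs_to-PVE2-Shadows.sh >> /var/log/mirrorVMs.log 2>&1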

Error Sample

mbuffer: error: outputThread: error writing to <stdout> at offset 0x40000: Broken pipe
mbuffer: warning: error during output to <stdout>: Broken pipe
CRITICAL ERROR:  zfs send  -I 'fast200/_VMs/vm-111-disk-2'@'syncoid_rpool_tfh-pve1_2024-07-11:04:23:02-GMT-05:00' 'fast200/_VMs/vm-111-disk-2'@'syncoid_rpool_tfh-pve1_2024-07-12:20:20:47-GMT-05:00' | mbuffer  -q -s 128k -m 16M | pv -p -t -e -r -b -s 950304232 |  zfs receive  -s -F 'rpool/_VMs/vm-10111-disk-2' 2>&1 failed: 256

mbuffer: error: outputThread: error writing to <stdout> at offset 0x20000: Broken pipe
mbuffer: warning: error during output to <stdout>: Broken pipe
CRITICAL ERROR:  zfs send  -I 'fast200/_VMs/vm-111-disk-1'@'syncoid_rpool_tfh-pve1_2024-07-12:20:17:45-GMT-05:00' 'fast200/_VMs/vm-111-disk-1'@'syncoid_rpool_tfh-pve1_2024-07-12:22:17:22-GMT-05:00' | mbuffer  -q -s 128k -m 16M | pv -p -t -e -r -b -s 117088184 |  zfs receive  -s -F 'rpool/_VMs/vm-10111-disk-1' 2>&1 failed: 256

CRITICAL ERROR:  zfs send  -I 'fast200/_VMs/vm-111-disk-0'@'syncoid_rpool_tfh-pve1_2024-07-11:06:17:17-GMT-05:00' 'fast200/_VMs/vm-111-disk-0'@'syncoid_rpool_tfh-pve1_2024-07-12:18:17:27-GMT-05:00' | mbuffer  -q -s 128k -m 16M | pv -p -t -e -r -b -s 34944 |  zfs receive  -s -F 'rpool/_VMs/vm-10111-disk-0' 2>&1 failed: 256

Reproduction

When I run the syncoid lines manually, some go through fine, and then one throws this error:

/usr/sbin/syncoid --force-delete --identifier=rpool fast200/_VMs/vm-111-disk-1 rpool/_VMs/vm-10111-disk-1
INFO: Sending incremental fast200/_VMs/vm-111-disk-1@syncoid_rpool_tfh-pve1_2024-07-12:20:17:45-GMT-05:00 ... syncoid_rpool_tfh-pve1_2024-07-12:23:17:37-GMT-05:00 to rpool/_VMs/vm-10111-disk-1 (~ 1.8 GB):
cannot restore to rpool/_VMs/vm-10111-disk-1@autosnap_2024-07-13_02:00:18_hourly: destination already exists
64.0KiB 0:00:00 [ 423KiB/s] [>                                                                            ]  0%            
mbuffer: error: outputThread: error writing to <stdout> at offset 0x30000: Broken pipe
mbuffer: warning: error during output to <stdout>: Broken pipe
CRITICAL ERROR:  zfs send  -I 'fast200/_VMs/vm-111-disk-1'@'syncoid_rpool_tfh-pve1_2024-07-12:20:17:45-GMT-05:00' 'fast200/_VMs/vm-111-disk-1'@'syncoid_rpool_tfh-pve1_2024-07-12:23:17:37-GMT-05:00' | mbuffer  -q -s 128k -m 16M | pv -p -t -e -r -b -s 1968025760 |  zfs receive  -s -F 'rpool/_VMs/vm-10111-disk-1' 2>&1 failed: 256
Use of uninitialized value $existing in string eq at /usr/sbin/syncoid line 750.
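
The "destination already exists" error means the destination dataset already has a snapshot with the same name as one contained in the incremental stream (here an hourly autosnap, presumably taken locally by sanoid on rpool). Comparing the snapshot lists on both sides shows the overlap (standard zfs commands):

zfs list -t snapshot -o name -s creation fast200/_VMs/vm-111-disk-1
zfs list -t snapshot -o name -s creation rpool/_VMs/vm-10111-disk-1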

Troubleshooting I've attempted:

  • Making sure both zpools have been upgraded
  • Deleting the destination datasets and then attempting again (see the commands below):
    (It goes fine for a bit, and then I get the errors all over again.)
  • I've downloaded the latest syncoid from GitHub.
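
For reference, the full re-seed of one of the local copies went roughly like this (destructive: it removes the destination dataset and all of its snapshots before syncoid recreates it):

zfs destroy -r rpool/_VMs/vm-10111-disk-1
/usr/sbin/syncoid --force-delete --identifier=rpool fast200/_VMs/vm-111-disk-1 rpool/_VMs/vm-10111-disk-1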

Additionally:

Here is the portion of my sanoid.conf that is relevant to this:


[fast200/_VMs]
	use_template = production
	recursive = yes

[rpool]
	use_template = production
	recursive = yes

[rpool/_VMs]
	use_template = production
	recursive = yes

[fast200/_VMs/vm-112-disk-0]
	use_template = ignore

[fast200/_VMs/vm-112-disk-1]
	use_template = ignore

[rpool/_VMs/vm-112-disk-0]
	use_template = ignore

## This is for the replica VM of tfh-fs00
[rpool/_Shadows]
	use_template = shadows

[rpool/_VMs/vm-10111-disk-0]
	use_template = shadows

[rpool/_VMs/vm-10111-disk-1]
	use_template = shadows

[rpool/_VMs/vm-10111-disk-2]
	use_template = shadows


#############################
# templates below this line #
#############################

[template_production]
	frequently = 0
	hourly = 36
	daily = 8
	monthly = 1
	yearly = 0
	autosnap = yes
	autoprune = yes

[template_backup]
	autoprune = yes
	frequently = 0
	hourly = 0
	daily = 31
	monthly = 6
	yearly = 0

	### don't take new snapshots - snapshots on backup
	### datasets are replicated in from source, not
	### generated locally
	autosnap = no

	### monitor hourlies and dailies, but don't warn or
	### crit until they're over 48h old, since replication
	### is typically daily only
	hourly_warn = 2880
	hourly_crit = 3600
	daily_warn = 48
	daily_crit = 60

[template_shadows]
	autoprune = yes
	frequently = 0
#	hourly = 0
	daily = 31
	monthly = 6
	yearly = 0

[template_hotspare]
	autoprune = yes
	frequently = 0
	hourly = 30
	daily = 9
	monthly = 0
	yearly = 0

	### don't take new snapshots - snapshots on backup
	### datasets are replicated in from source, not
	### generated locally
	autosnap = no

	### monitor hourlies and dailies, but don't warn or
	### crit until they're over 4h old, since replication
	### is typically hourly only
	hourly_warn = 4h
	hourly_crit = 6h
	daily_warn = 2d
	daily_crit = 4d

[template_scripts]
	### information about the snapshot will be supplied as environment variables,
	### see the README.md file for details about what is passed when.
	### run script before snapshot
	pre_snapshot_script = /path/to/script.sh
	### run script after snapshot
	post_snapshot_script = /path/to/script.sh
	### run script after pruning snapshot
	pruning_script = /path/to/script.sh
	### don't take an inconsistent snapshot (skip if pre script fails)
	#no_inconsistent_snapshot = yes
	### run post_snapshot_script when pre_snapshot_script is failing
	#force_post_snapshot_script = yes
	### limit allowed execution time of scripts before continuing (<= 0: infinite)
	script_timeout = 5

[template_ignore]
	autoprune = no
	autosnap = no
	monitor = no

I'm practically pulling my hair out; I don't have this issue on any of my other Proxmox servers...

@Mikesco3
Author

Update:

I copied over the executables from version 2.1.0 and my problem disappeared....

@Mikesco3
Author

Mikesco3 commented Aug 1, 2024

I think I may have found the issue...
I must have forgotten to turn on sanoid.timer.
I ran systemctl enable --now sanoid.timer.
Sanoid wasn't pruning the old snapshots... I see it pruning a bunch of old snapshots now, so I'm cautiously optimistic.
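
To confirm the timer stays active and pruning keeps happening, the checks I'm using are the standard ones (paths as on this system, where the tools live in /usr/sbin):

systemctl status sanoid.timer
systemctl list-timers 'sanoid*'
/usr/sbin/sanoid --prune-snapshots --verbose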
