
[Bug]: Successfully created the backup but failing in restore mid way ( v2.4.4 ) #476

Open
tarunpantqb opened this issue Dec 12, 2024 · 6 comments


@tarunpantqb

tarunpantqb commented Dec 12, 2024

Current Behavior

I took a backup of Milvus running in one AWS account (stored in that account's S3 bucket) and copied the backup data into an S3 bucket in a different AWS account. Now, while running the restore process for the second Milvus instance, I get the error below:

main [2024/12/12 10:37:56.456 +00:00] [INFO] [core/backup_impl_restore_backup.go:850] [getBackupPartitionPaths] [bucketName=lsai-dev-milvus-storage] [backupPath=backup/backup_2024_12_11_09_35_07_561872490] [p…
main [2024/12/12 10:37:56.467 +00:00] [ERROR] [core/backup_impl_restore_backup.go:357] ["executeRestoreCollectionTask failed"] [TargetDBName=default] [TargetCollectionName=publications_test] [error="bulk inse…
main [2024/12/12 10:37:56.467 +00:00] [ERROR] [core/backup_impl_restore_backup.go:321] ["execute restore collection fail"] [backupId=356c4cfd-b7a3-11ef-95c7-fe22564fb5a1] [error="workerpool: execute job bulk…
main workerpool: execute job bulk insert fail, info: misaligned binlog count, field1:87, field0:85: importing data failed
main duration:3328 s
main time="2024-12-12T10:37:56.572Z" level=info msg="sub-process exited" argo=true error="<nil>"

This happens midway through the restore job run; it fails after restoring some records.
I'm using v2.4.4 for both the source and target Milvus instances.

@tarunpantqb tarunpantqb changed the title [Bug]: Successfully created the backup but failing in restore mid way [Bug]: Successfully created the backup but failing in restore mid way ( v2.4.4 ) Dec 12, 2024
@huanghaoyuanhhy
Collaborator

Thank you for your report!
It seems that some files may be missing during the restore process, which could be related to how the data was copied.

Could you please clarify how you performed the copy operation? In the meantime, we recommend verifying that the number of files in the source and destination S3 buckets match. You could also retry the copy to make sure all files are transferred correctly.
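
For example, here is a minimal sketch of one way to do that comparison with boto3 (the bucket names, profile names, and backup prefix below are placeholders, not values from this issue):

```python
# Minimal sketch: compare object count and total size under the backup
# prefix in two buckets. All names below are placeholders.
import boto3

def summarize(session: boto3.Session, bucket: str, prefix: str):
    """Return (object_count, total_bytes) for every object under prefix."""
    s3 = session.client("s3")
    count = total = 0
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            count += 1
            total += obj["Size"]
    return count, total

src = summarize(boto3.Session(profile_name="source-account"),
                "source-milvus-bucket", "backup/my_backup/")
dst = summarize(boto3.Session(profile_name="dest-account"),
                "dest-milvus-bucket", "backup/my_backup/")
print("source:", src, "destination:", dst, "match:", src == dst)
```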

If the issue persists, please let us know, and we'll be happy to assist you further.

@tarunpantqb
Author

tarunpantqb commented Dec 18, 2024

Thanks @huanghaoyuanhhy for your response.
Both Milvus instances are running in two different AWS accounts.

  • I ran the Milvus backup in the source AWS account.
  • Copied the backup bucket data from the source account to the destination account.
  • Ran the Milvus restore in the destination AWS account using the bucket with the copied data.

I used `aws s3 sync` to copy the data and verified the number of objects and the total size of the backups; they are the same.
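
For context, the copy step amounts to something like the following (a rough boto3 sketch of what `aws s3 sync` does; bucket names and the prefix are placeholders):

```python
# Rough boto3 equivalent of the `aws s3 sync` step (placeholders throughout).
# copy() issues server-side copies, so object bytes never leave S3; the
# credentials used must be able to read the source and write the destination.
import boto3

s3 = boto3.client("s3")
SRC_BUCKET, DST_BUCKET = "source-milvus-bucket", "dest-milvus-bucket"
PREFIX = "backup/my_backup/"

for page in s3.get_paginator("list_objects_v2").paginate(Bucket=SRC_BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        # Unlike `aws s3 sync`, this copies unconditionally instead of skipping
        # objects that already exist with the same size and timestamp.
        s3.copy({"Bucket": SRC_BUCKET, "Key": obj["Key"]}, DST_BUCKET, obj["Key"])
```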

@tarunpantqb
Author

tarunpantqb commented Dec 20, 2024

I've tried running the backup again and did the S3 copy again, but I still get the same error while running the restore:

[2024/12/19 17:28:32.443 +00:00] [ERROR] [core/backup_impl_restore_backup.go:357] ["executeRestoreCollectionTask failed"] [TargetDBName=default] [TargetCollectionName=publications_chemx] [error="bulk insert fail, info: misaligned binlog count, field103:86, field0:85: importing data failed"] [errorVerbose="bulk insert fail, info: misaligned binlog count, field103:86, field0:85: importing data failed\n(1) attached stack trace\n -- stack trace:\n | github.com/zilliztech/milvus-backup/core.(*BackupContext).watchBulkInsertState\n | \t/app/core/backup_impl_restore_backup.go:804\n | github.com/zilliztech/milvus-backup/core.(*BackupContext).executeBulkInsert\n | \t/app/core/backup_impl_restore_backup.go:773\n | github.com/zilliztech/milvus-backup/core.(*BackupContext).executeRestoreCollectionTask.func3\n | \t/app/core/backup_impl_restore_backup.go:588\n | github.com/zilliztech/milvus-backup/core.(*BackupContext).executeRestoreCollectionTask.func7\n | \t/app/core/backup_impl_restore_backup.go:668\n | github.com/zilliztech/milvus-backup/internal/common.(*WorkerPool).work.func1\n | \t/app/internal/common/workerpool.go:70\n | golang.org/x/sync/errgroup.(*Group).Go.func1\n | \t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1571\nWraps: (2) bulk insert fail, info: misaligned binlog count, field103:86, field0:85: importing data failed\nError types: (1) *withstack.withStack (2) *errutil.leafError"] [stack="github.com/zilliztech/milvus-backup/core.(*BackupContext).executeRestoreBackupTask.func1\n\t/app/core/backup_impl_restore_backup.go:357\ngithub.com/zilliztech/milvus-backup/internal/common.(*WorkerPool).work.func1\n\t/app/internal/common/workerpool.go:70\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75"]
[2024/12/19 17:28:34.645 +00:00] [ERROR] [core/backup_impl_restore_backup.go:775] ["fail or timeout to bulk insert"] [error="rpc error: code = Canceled desc = context canceled"] [taskId=454680452797637579] [targetCollectionName=internet_preprint_chemx] [partitionName=partition_internet_biorxiv_2018_2018] [stack="github.com/zilliztech/milvus-backup/core.(*BackupContext).executeBulkInsert\n\t/app/core/backup_impl_restore_backup.go:775\ngithub.com/zilliztech/milvus-backup/core.(*BackupContext).executeRestoreCollectionTask.func3\n\t/app/core/backup_impl_restore_backup.go:588\ngithub.com/zilliztech/milvus-backup/core.(*BackupContext).executeRestoreCollectionTask.func7\n\t/app/core/backup_impl_restore_backup.go:668\ngithub.com/zilliztech/milvus-backup/internal/common.(*WorkerPool).work.func1\n\t/app/internal/common/workerpool.go:70\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75"]
[2024/12/19 17:28:34.645 +00:00] [ERROR] [core/backup_impl_restore_backup.go:668] ["fail to bulk insert to partition"] [backup_db_name=default] [backup_collection_name=default] [target_db_name=default] [target_collection_name=internet_preprint_chemx] [partition=partition_internet_biorxiv_2018_2018] [error="rpc error: code = Canceled desc = context canceled"] [stack="github.com/zilliztech/milvus-backup/core.(*BackupContext).executeRestoreCollectionTask.func7\n\t/app/core/backup_impl_restore_backup.go:668\ngithub.com/zilliztech/milvus-backup/internal/common.(*WorkerPool).work.func1\n\t/app/internal/common/workerpool.go:70\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75"]
[2024/12/19 17:28:34.645 +00:00] [ERROR] [core/backup_impl_restore_backup.go:357] ["executeRestoreCollectionTask failed"] [TargetDBName=default] [TargetCollectionName=internet_preprint_chemx] [error="rpc error: code = Canceled desc = context canceled"] [stack="github.com/zilliztech/milvus-backup/core.(*BackupContext).executeRestoreBackupTask.func1\n\t/app/core/backup_impl_restore_backup.go:357\ngithub.com/zilliztech/milvus-backup/internal/common.(*WorkerPool).work.func1\n\t/app/internal/common/workerpool.go:70\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75"]
[2024/12/19 17:28:34.645 +00:00] [ERROR] [core/backup_impl_restore_backup.go:321] ["execute restore collection fail"] [backupId=92298cdf-be03-11ef-85e8-120aa4397b45] [error="workerpool: execute job bulk insert fail, info: misaligned binlog count, field103:86, field0:85: importing data failed"] [stack="github.com/zilliztech/milvus-backup/core.(*BackupContext).RestoreBackup\n\t/app/core/backup_impl_restore_backup.go:321\ngithub.com/zilliztech/milvus-backup/cmd.glob..func7\n\t/app/cmd/restore.go:83\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:876\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:990\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:918\ngithub.com/zilliztech/milvus-backup/cmd.Execute\n\t/app/cmd/root.go:35\nmain.main\n\t/app/main.go:24\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"]
workerpool: execute job bulk insert fail, info: misaligned binlog count, field103:86, field0:85: importing data failed
duration:4449 s


@huanghaoyuanhhy
Collaborator

Since you've already verified the file count and size after `aws s3 sync`, it seems file loss during the copy process is unlikely.

However, the issue might be with the source Milvus bucket itself. Could you double-check the files in the {insert_log, delta_log}/collection_id/partition_id/segment_id/{0, 1, 101, 102, …} directories in the source Milvus bucket? The file counts in the blue and green boxes in the screenshot below should match.

[screenshot: example segment directory listing, with the matching per-field file counts highlighted in blue and green]
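
If it helps, here is a minimal sketch of this check for a single segment (the bucket name, root path, and collection/partition/segment IDs are placeholders; adjust the prefix to your configured `rootPath`):

```python
# Minimal sketch: count binlog files per field directory for one segment and
# flag a mismatch like the "misaligned binlog count" error in this issue.
# Bucket name, root path, and the IDs in the prefix are placeholders.
from collections import Counter

import boto3

s3 = boto3.client("s3")
BUCKET = "source-milvus-bucket"
SEGMENT_PREFIX = "files/insert_log/<collection_id>/<partition_id>/<segment_id>/"

counts: Counter = Counter()
for page in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix=SEGMENT_PREFIX):
    for obj in page.get("Contents", []):
        # Keys look like <SEGMENT_PREFIX><field_id>/<log_id>
        field_id = obj["Key"][len(SEGMENT_PREFIX):].split("/", 1)[0]
        counts[field_id] += 1

print(counts)
if len(set(counts.values())) > 1:
    print("misaligned binlog count across field directories:", dict(counts))
```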

@tarunpantqb
Author

Thanks @huanghaoyuanhhy! I checked a random collection_id/partition_id and found that the counts differ for a few folders in the source bucket. However, the source Milvus dashboard still shows the records for that particular partition.
[screenshot: segment directories in the source bucket showing differing file counts]

  • Should the file counts for all segment IDs match? If yes, why does it still work for the source Milvus?
  • What is the solution if the counts for a few segment IDs don't match?

@huanghaoyuanhhy
Collaborator

huanghaoyuanhhy commented Dec 24, 2024

@tarunpantqb Thanks for checking!
It seems this issue is related to Milvus itself.

Could you please open an issue in the Milvus repository? When creating the issue, provide detailed information about your environment, the version of Milvus you are using, and whether there were any upgrades during the process.

Additionally, please include a link to this issue in your report so that the Milvus team can see the context.
