Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for AWS EBS Snapshot Lock mode #1005

Closed

Conversation

ruimarinho
Copy link

@ruimarinho ruimarinho commented Aug 22, 2024

This pull request introduces support for AWS EBS Snapshot Lock to protect Amazon EBS snapshots against accidental or malicious deletions, or to store them in WORM (Write-Once-Read-Many) format for a specific duration.

While a snapshot is locked, as long as it is out of the cool-off period, it can't be deleted by any user, regardless of their IAM permissions.

Overview of changes

  • Added new command-line arguments for AWS snapshot lock mode, lock duration, cool-off period, and expiration date. Implemented mutual exclusivity checks for lock duration and expiration date, as per AWS requirements.
    • --aws-snapshot-lock-mode
    • --aws-snapshot-lock-duration
    • --aws-snapshot-lock-cool-off-period
    • --aws-snapshot-lock-expiration-date
  • Added new provider-specific snapshot_lock_mode to backup metadata json.
  • Implemented checks to prevent attempting to deleted a locked snapshot until it becomes expired.

If the approach looks good, I will start working on documentation next.

Closes #972.

@ruimarinho ruimarinho requested a review from a team as a code owner August 22, 2024 17:38
Copy link

@diogotorres97 diogotorres97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a great feature that we were waiting for! Already tested it in an AWS EC2 instance with barman installed and it is working as expected!

@diogotorres97
Copy link

diogotorres97 commented Aug 28, 2024

ping

Copy link
Contributor

@andremagui andremagui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thank you for your contribution! I’ve tested your code, and it’s functioning correctly. However, I did come across a few issues, particularly regarding the usage with a Barman server for managing backups (not the barman-cloud-backup script). Please review my comments and suggestions, make the necessary changes, and then we can move forward from there. @gustabowill will be helping as well!

Comment on lines +742 to +744
if int_value < 1 or int_value > 36500:
raise ArgumentTypeError("'%s' is outside supported range of 1-36500 days" % value)

Copy link
Contributor

@andremagui andremagui Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if int_value < 1 or int_value > 36500:
raise ArgumentTypeError("'%s' is outside supported range of 1-36500 days" % value)
if not 1 <= int_value <= 36500:
raise ArgumentTypeError(f"Value must be between 1 and 36,500 days.")

This way is more efficient because the value is compared only once in a single operation and It directly expresses the idea that the value should fall within a specific range.

Comment on lines +761 to +762
if int_value < 1 or int_value > 72:
raise ArgumentTypeError("'%s' is outside supported range of 1-72 hours" % value)
Copy link
Contributor

@andremagui andremagui Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if int_value < 1 or int_value > 72:
raise ArgumentTypeError("'%s' is outside supported range of 1-72 hours" % value)
if not 1 <= int_value <= 72:
raise ArgumentTypeError("Value must be between 1 and 72 hours.")

same here.

@@ -783,6 +821,16 @@ def check_size(value):
return int_value


def check_timestamp(value):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def check_timestamp(value):
def check_aws_expiration_date_format(value):

More specific name to the job.

@@ -783,6 +821,16 @@ def check_size(value):
return int_value


def check_timestamp(value):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, add docstrings like the other aws checks

@@ -1761,6 +1761,7 @@ def test_help_output(self, minimal_parser, capsys):
if sys.version_info < (3, 10):
options_label = "optional arguments"
expected_output = self._expected_help_output.format(options_label=options_label)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert or remove this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

Comment on lines +235 to +243
def parse_datetime(value):
"""
Parse a string, transforming it in a datetime object.
Accepted format: YYYY-MM-DDThh:mm:ss.sssZ

:param str value: the string to evaluate
"""

return datetime.datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little confusing in the sense that it is parsing a string to datetime but it is constraining it to a specific format, which is in fact just for AWS in this case.

Suggested change
def parse_datetime(value):
"""
Parse a string, transforming it in a datetime object.
Accepted format: YYYY-MM-DDThh:mm:ss.sssZ
:param str value: the string to evaluate
"""
return datetime.datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
def parse_aws_datetime(value):
"""
Parse a string, transforming it in a datetime object.
Accepted format: YYYY-MM-DDThh:mm:ss.sssZ
:param str value: the string to evaluate
"""
return check_aws_expiration_date_format

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any ideas here @gustabowill? Maybe in the future we could re-think this if we have other formats.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can keep it as it is for now as this date format is very common.

I think you mistyped the return statement of your suggestion @andremagui.

)

def _lock_snapshots(self, snapshots, lock_mode, lock_duration, lock_cool_off_period, lock_expiration_date):
Copy link
Contributor

@andremagui andremagui Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be revisited and refactored.

  • This should lock just one snapshot.
  • The values should not be checked but passed to the the client method call. The checks will be done higher in the code flow, so they will come to this part as they should be. Also, pay attention that if this fails somehow, the backup will also fail. The snapshot will be created though.
  • The response of the request to lock the snapshot has all the information you need to output information. Do not manipulate yourself the values in the response object, because if you use LockDuration OR LockExpiresOn, each one has a different response. If the field is not present in the response object, the lock is successful but the backup fails.
  • Use output.info() instead of logging.info(). This can be imported from barman import output.

To summarize, try to use this method to lock one snapshot, pass all the arguments, lock the snapshot and output information dynamically from the response object. In line 768, where we have for volume_identifier, volume_metadata in volumes.items():, you should lock snapshot as they are created.

Also, you should add a mutually exclusive test in the beginning of the backup process for fields that cannot be passed together, like duration and expiration_date OR governance mode with cool_off period. You did it for the client, but its not being checked when using this with a barman server


I can give you two examples of response object that i have from my tests:

LockDuration

{'SnapshotId': 'snap-00adc6c28f422e98c', 'LockState': 'governance', 'LockDuration': 1, 'LockCreatedOn': datetime.datetime(2024, 10, 22, 17, 0, 49, 253000, tzinfo=tzlocal()), 'LockExpiresOn': datetime.datetime(2024, 10, 23, 17, 0, 49, 253000, tzinfo=tzlocal()), 'LockDurationStartTime': datetime.datetime(2024, 10, 22, 17, 0, 49, 253000, tzinfo=tzlocal()), 'ResponseMetadata': {'RequestId': '3f8c3e22-d1b2b2da0', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '3f8c3e22-d1b2-b2da0', 'cache-control': 'no-cache, no-store', 'strict-transport-security': 'max-age=31536000; includeSubDomains', 'vary': 'accept-encoding', 'content-type': 'text/xml;charset=UTF-8', 'transfer-encoding': 'chunked', 'date': 'Tue, 22 Oct 2024 17:00:49 GMT', 'server': 'AmazonEC2'}, 'RetryAttempts': 0}}

LockExpiresOn

{'SnapshotId': 'snap-0d83456985415270c', 'LockState': 'compliance-cooloff', 'CoolOffPeriod': 3, 'CoolOffPeriodExpiresOn': datetime.datetime(2024, 10, 22, 19, 55, 3, 461000, tzinfo=tzlocal()), 'LockCreatedOn': datetime.datetime(2024, 10, 22, 16, 55, 3, 461000, tzinfo=tzlocal()), 'LockExpiresOn': datetime.datetime(2024, 10, 24, 15, 53, 35, 457000, tzinfo=tzlocal()), 'LockDurationStartTime': datetime.datetime(2024, 10, 22, 16, 55, 3, 461000, tzinfo=tzlocal()), 'ResponseMetadata': {'RequestId': 'bac96fa3-008405f8d79', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'bac969e2e-6808405f8d79', 'cache-control': 'no-cache, no-store', 'strict-transport-security': 'max-age=31536000; includeSubDomains', 'vary': 'accept-encoding', 'content-type': 'text/xml;charset=UTF-8', 'transfer-encoding': 'chunked', 'date': 'Tue, 22 Oct 2024 16:55:02 GMT', 'server': 'AmazonEC2'}, 'RetryAttempts': 0}}

@@ -761,10 +781,12 @@ def take_snapshot_backup(self, backup_info, instance_identifier, volumes):
snapshot_name, snapshot_resp = self._create_snapshot(
backup_info, volume_identifier, volume_metadata.id
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

parsed_args = parser.parse_args(args=args)

# Perform mutual exclusivity check
if parsed_args.aws_snapshot_lock_duration is not None and parsed_args.aws_snapshot_lock_expiration_date is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move and refactor this in the _validate_config function in this module.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is only about mutually exclusiveness, we can also leverage of argparse to handle that for us. Something like this:

    s3_lock_target_group = s3_arguments.add_mutually_exclusive_group()
    s3_lock_target_group.add_argument(
        "--aws-snapshot-lock-duration",
        help="The duration (in days) for which the snapshot should be locked. Must be between 1 and 36500. To lock a snapshopt, you must specify either this argument or --aws-snapshot-lock-expiration-date, but not both.",
        type=check_aws_snapshot_lock_duration_range,
    )
    s3_lock_target_group.add_argument(
        "--aws-snapshot-lock-expiration-date",
        help="The expiration date for a locked snapshot in the format YYYY-MM-DDThh:mm:ss.sssZ. To lock a snapshot, you must specify either this argument or --aws-snapshot-lock-duration, but not both.",
        type=check_timestamp
    )

If you attempt to specify both, argparse would throw an error:

barman-cloud-backup: error: argument --aws-snapshot-lock-expiration-date: not allowed with argument --aws-snapshot-lock-duration

Comment on lines +847 to +857
logging.info(
"Snapshot %s locked in state '%s' (lock duration: %s days, cool-off period: %s hours, "
"cool-off period expires on: %s, lock expires on: %s, lock duration time: %s)",
snapshot.identifier,
resp["LockState"],
resp["LockDuration"],
resp["CoolOffPeriod"],
resp["CoolOffPeriodExpiresOn"],
resp["LockExpiresOn"],
resp["LockDurationStartTime"]
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In accordance with what @andremagui said, I think this is the biggest problem here. It seems the response from AWS is not always the same depending on the request parameters you pass, so you should ideally not rely on their response for logging.

E.g. if i run barman-cloud-backup with --aws-snapshot-lock-mode governance --aws-snapshot-lock-duration 1 i get:

2024-10-23 14:14:14,636 [3122] ERROR: Backup failed uploading data ('CoolOffPeriod')

Because CoolOffPeriod is not present in the response:

{'SnapshotId': 'snap-0a9e1d413ef5a9254', 'LockState': 'governance', 'LockDuration': 1, 'LockCreatedOn': datetime.datetime(2024, 10, 23, 14, 14, 14, 594000, tzinfo=tzlocal()), 'LockExpiresOn': datetime.datetime(2024, 10, 24, 14, 14, 14, 594000, tzinfo=tzlocal()), 'LockDurationStartTime': datetime.datetime(2024, 10, 23, 14, 14, 14, 594000, tzinfo=tzlocal()), 'ResponseMetadata': {'RequestId': '91c97b91-9dde-4d38-a652-0607ff9574f8', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '91c97b91-9dde-4d38-a652-0607ff9574f8', 'cache-control': 'no-cache, no-store', 'strict-transport-security': 'max-age=31536000; includeSubDomains', 'vary': 'accept-encoding', 'content-type': 'text/xml;charset=UTF-8', 'transfer-encoding': 'chunked', 'date': 'Wed, 23 Oct 2024 14:14:14 GMT', 'server': 'AmazonEC2'}, 'RetryAttempts': 0}}

The snapshot itself was created and locked but the process errored out and the backup is not completed successfully in Barman.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to use a formatted message, we should at least use resp.get to return default values when the keys are not present.

Comment on lines +893 to +901
if snapshot.snapshot_lock_mode is not None:
resp = self.ec2_client.describe_locked_snapshots(
SnapshotIds=[snapshot.identifier],
)

if resp["Snapshots"] and resp["Snapshots"][0]["LockState"] != "expired":
logging.warning("Skipping deletion of snapshot %s as it not expired yet", snapshot.identifier)
continue

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has do be done earlier in the code. By doing it here you are only avoiding the deletion of the snapshot itself, but the backup metadata in the s3 bucket is still deleted. This would make Barman lose track of the backup while the snapshot itself will still exist.

E.g.

  1. I have a backup 20241023T135554 in my catalog:
$ barman-cloud-backup-list s3://barman-testing-snapshot-tags pg-server
Backup ID           End Time                 Begin Wal                     Archival Status  Name                                                  
20241023T135554     2024-10-23 13:56:10      000000010000000000000024
  1. I try to delete it and receive your message saying it was skipped.
$ barman-cloud-backup-delete s3://barman-testing-snapshot-tags pg-server -b 20241023T135554
2024-10-23 13:57:17,634 [2735] WARNING: Skipping deletion of snapshot snap-03c42da5c8a2f2ce7 as it not expired yet
  1. However, its metadata was deleted from s3, so Barman is no longer aware of it.
$ barman-cloud-backup-list s3://barman-testing-snapshot-tags pg-server
Backup ID           End Time                 Begin Wal                     Archival Status  Name 
$ barman-cloud-backup-delete s3://barman-testing-snapshot-tags pg-server -b 20241023T135554
2024-10-23 13:57:34,720 [2736] WARNING: Backup 20241023T135554 does not exist

I believe you would have to handle this in cloud_backup_delete.py in _delete_backup. Also in backup.py in delete_backup_data for when running it through barman standard server instead of cloud-barman-backup-delete script.

Comment on lines +235 to +243
def parse_datetime(value):
"""
Parse a string, transforming it in a datetime object.
Accepted format: YYYY-MM-DDThh:mm:ss.sssZ

:param str value: the string to evaluate
"""

return datetime.datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can keep it as it is for now as this date format is very common.

I think you mistyped the return statement of your suggestion @andremagui.

@gcalacoci
Copy link
Contributor

@ruimarinho we have provided some feedback, suggestions and patches to this PR, are you interested in bringing this forward yourself?

parsed_args = parser.parse_args(args=args)

# Perform mutual exclusivity check
if parsed_args.aws_snapshot_lock_duration is not None and parsed_args.aws_snapshot_lock_expiration_date is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is only about mutually exclusiveness, we can also leverage of argparse to handle that for us. Something like this:

    s3_lock_target_group = s3_arguments.add_mutually_exclusive_group()
    s3_lock_target_group.add_argument(
        "--aws-snapshot-lock-duration",
        help="The duration (in days) for which the snapshot should be locked. Must be between 1 and 36500. To lock a snapshopt, you must specify either this argument or --aws-snapshot-lock-expiration-date, but not both.",
        type=check_aws_snapshot_lock_duration_range,
    )
    s3_lock_target_group.add_argument(
        "--aws-snapshot-lock-expiration-date",
        help="The expiration date for a locked snapshot in the format YYYY-MM-DDThh:mm:ss.sssZ. To lock a snapshot, you must specify either this argument or --aws-snapshot-lock-duration, but not both.",
        type=check_timestamp
    )

If you attempt to specify both, argparse would throw an error:

barman-cloud-backup: error: argument --aws-snapshot-lock-expiration-date: not allowed with argument --aws-snapshot-lock-duration

@@ -761,10 +781,12 @@ def take_snapshot_backup(self, backup_info, instance_identifier, volumes):
snapshot_name, snapshot_resp = self._create_snapshot(
backup_info, volume_identifier, volume_metadata.id
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

should be locked.
:param int lock_cool_off_period: The cool-off period (in hours) for the snapshot.
:param str lock_expiration_date: The expiration date for the snapshot in the format
YYYY-MM-DDThh:mm:ss.sssZ.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
YYYY-MM-DDThh:mm:ss.sssZ.
``YYYY-MM-DDThh:mm:ss.sssZ``.

Sphinx recommendation for literals.

Comment on lines +847 to +857
logging.info(
"Snapshot %s locked in state '%s' (lock duration: %s days, cool-off period: %s hours, "
"cool-off period expires on: %s, lock expires on: %s, lock duration time: %s)",
snapshot.identifier,
resp["LockState"],
resp["LockDuration"],
resp["CoolOffPeriod"],
resp["CoolOffPeriodExpiresOn"],
resp["LockExpiresOn"],
resp["LockDurationStartTime"]
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to use a formatted message, we should at least use resp.get to return default values when the keys are not present.

@@ -1027,12 +1103,14 @@ def __init__(
:param str device_name: The device name used in the AWS API.
:param str snapshot_id: The snapshot ID used in the AWS API.
:param str snapshot_name: The snapshot name stored in the `Name` tag.
:param str snapshot_lock_mode: The mode with which the snapshot has been locked, if set.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
:param str snapshot_lock_mode: The mode with which the snapshot has been locked, if set.
:param str snapshot_lock_mode: The mode with which the snapshot has been locked (``governance`` or ``compliance``), if set.

Comment on lines +237 to +240
Parse a string, transforming it in a datetime object.
Accepted format: YYYY-MM-DDThh:mm:ss.sssZ

:param str value: the string to evaluate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Parse a string, transforming it in a datetime object.
Accepted format: YYYY-MM-DDThh:mm:ss.sssZ
:param str value: the string to evaluate
Parse a string, transforming it in a datetime object.
Accepted format: ``YYYY-MM-DDThh:mm:ss.sssZ``
:param str value: the string to evaluate
:return: *value* parsed as :class:`datetime.datetime`.

Comment on lines +732 to +734
"aws_snapshot_lock_duration": int,
"aws_snapshot_lock_cool_off_period": int,
"aws_snapshot_lock_expiration_date": parse_datetime,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could somehow leverage of the utils functions that you created for validating the CLI arguments of barman-cloud, to validate all the new configuration options (including the lock mode).

@@ -1761,6 +1761,7 @@ def test_help_output(self, minimal_parser, capsys):
if sys.version_info < (3, 10):
options_label = "optional arguments"
expected_output = self._expected_help_output.format(options_label=options_label)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

@martinmarques
Copy link
Contributor

Hello, we've had internal communication with Rui and decided the Barman team will finish the work here. Rui's commits will be used and have been a valuable source for finishing this feature which we see as very important.

This said, we will close this PR, work internally (we have integration tests we can run) and make this available in the next release if nothing unexpected comes up.

Thank you very much @ruimarinho

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add "AWS Snapshot Lock" feature
7 participants