With many countries creating laws to protect users' privacy and data, such as the European GDPR or the Brazilian LGPD, companies are rushing to find ways to comply with these regulations.
When data from multiple users is stored in the same S3 file and we want to wipe a single user, we have to open the S3 file, remove every line containing that user's data, and then re-upload the file with that data removed or anonymized.
The S3 File Line Rewriter library aims to ease the process of rewriting those files, offering a clean API that keeps the memory footprint small by processing everything with streams.
This library is published to Bintray jcenter, so you'll need to configure that in your repositories:
repositories {
    mavenCentral()
    jcenter()
}
Then add the library to your dependencies:
dependencies {
    implementation("br.com.guiabolso:s3-file-line-rewriter:{version}")
}
Using this library is very easy:
val rewriter = S3FileLineRewriter(myAmazonS3Client)

// Rewrite a single file, identified by its bucket and key
rewriter.rewriteFile(
    bucket = "bucket",
    key = "key"
) { lines: Sequence<String> ->
    lines.map { it.replace("StringIWantToRedact", "*****") }
}

// Rewrite all files under a given prefix
rewriter.rewriteAll(
    bucket = "bucket",
    prefix = "MyDirectory/SubDirectory"
) { lines: Sequence<String> ->
    lines.map { it.replace("StringIWantToRedact", "*****") }
}
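For the "wipe a single user" case described at the top, the transformation can drop lines instead of redacting them. The following is only a sketch: `removeUserLines` is a hypothetical helper, not part of the library, and it assumes that a user's lines can be recognized because they contain the user's identifier as a substring.

```kotlin
// Hypothetical helper (not part of the library): lazily drops every line
// that mentions the given user. Assumes the identifier appears verbatim
// in each of that user's lines.
fun removeUserLines(lines: Sequence<String>, userId: String): Sequence<String> =
    lines.filterNot { line -> line.contains(userId) }

fun main() {
    val input = sequenceOf("1,alice,10.00", "2,bob,5.50", "3,alice,7.25")
    // Only the lines that do not mention "alice" remain.
    removeUserLines(input, "alice").forEach(::println)
}
```

Because it maps a `Sequence<String>` to a `Sequence<String>`, a function like this should fit inside the lambda shown above, for example `{ lines -> removeUserLines(lines, "user-123") }`. Since sequences are evaluated lazily, the filter itself does not require loading the whole file into memory.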
By default, this library uses \n as the newline character when rewriting. This can be changed with the system property br.com.guiabolso.s3filelinerewriter.newline.
By default, if a line becomes empty after being transformed, this library removes it from the file. This can be changed with the system property br.com.guiabolso.s3filelinerewriter.removeblank.
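A minimal sketch of setting both properties from code follows. The property names are the ones given above; the values are assumptions (that the newline property takes the literal separator, and that the blank-line property takes "false" to keep empty lines), as is the idea that they should be set before the rewriter runs. `configureRewriterDefaults` is a hypothetical helper name.

```kotlin
// Hypothetical helper: sets the two system properties described above.
// The accepted values are assumptions, not confirmed by the library.
fun configureRewriterDefaults() {
    // Assumed: the property value is used verbatim as the line separator.
    System.setProperty("br.com.guiabolso.s3filelinerewriter.newline", "\r\n")
    // Assumed: "false" keeps lines that become empty after the transformation.
    System.setProperty("br.com.guiabolso.s3filelinerewriter.removeblank", "false")
}

fun main() {
    configureRewriterDefaults()
    println(System.getProperty("br.com.guiabolso.s3filelinerewriter.removeblank"))
}
```

System properties can also be passed on the JVM command line with the standard -D flag, for example -Dbr.com.guiabolso.s3filelinerewriter.removeblank=false.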
If you have any improvements, please feel free to file a PR!