Stream Reactor 8.1.23
DataLakes (S3, GCP) source fixes
Polling Backoff
When no data is available in the buckets, the connector incurs high costs because it polls the data lake continuously in a tight loop, at the rate driven by Kafka Connect.
From this version, a backoff queue is used by default, introducing a standard method for backing off calls to the underlying cloud platform.
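The release note does not describe the backoff algorithm itself, so the following is only a minimal sketch of the general idea, assuming a simple exponential backoff that grows the wait after each empty poll and resets once data is found. The class `EmptyPollBackoff`, its method `onPollResult`, and its parameters are hypothetical names chosen for illustration; they are not part of Stream Reactor and do not represent the connector's actual backoff queue.

```java
import java.time.Duration;

// Hypothetical helper for illustration only; not the connector's implementation.
// Tracks how long to wait before the next call to the object store.
final class EmptyPollBackoff {
    private final Duration initialDelay;
    private final Duration maxDelay;
    private final double multiplier;
    private Duration nextDelay;

    EmptyPollBackoff(Duration initialDelay, Duration maxDelay, double multiplier) {
        this.initialDelay = initialDelay;
        this.maxDelay = maxDelay;
        this.multiplier = multiplier;
        this.nextDelay = initialDelay;
    }

    // Call after every poll; returns how long to wait before polling again.
    Duration onPollResult(boolean foundData) {
        if (foundData) {
            // Data arrived: reset the backoff and poll again immediately.
            nextDelay = initialDelay;
            return Duration.ZERO;
        }
        // Empty poll: wait for the current delay, then grow it up to the cap.
        Duration wait = nextDelay;
        long grownMillis = (long) (nextDelay.toMillis() * multiplier);
        nextDelay = Duration.ofMillis(Math.min(grownMillis, maxDelay.toMillis()));
        return wait;
    }
}
```

With an initial delay of 1 second, a cap of 5 seconds, and a multiplier of 2, consecutive empty polls in this sketch wait 1, 2, 4, and then 5 seconds, so an empty bucket no longer triggers a list call on every Kafka Connect poll.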
Avoid filtering by lastSeenFile where a post-process action is configured
When ordering by LastModified and a post-process action is configured, avoid filtering to the latest result.
This change avoids bugs caused by inconsistent LastModified dates used for sorting.
If LastModified sorting is used, ensure objects do not arrive late, or use a post-processing step to handle them.
Add a flag to populate Kafka headers with the watermark partition/offset
- This adds a connector property for GCP Storage and S3 Sources:
connect.s3.source.write.watermark.header
connect.gcpstorage.source.write.watermark.header
If set to true, the headers of each source record produced will include details of the source and the line number of the file.
If set to false (the default), the headers won't be set.
Currently this does not apply when using the envelope mode.
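As an illustration, the flag could be enabled by adding the relevant line to the source connector's configuration. Only the two property names come from this release note; the rest of the connector configuration is omitted here, and each connector needs only the property matching its own prefix.

```properties
# S3 source connector: add watermark details to the headers of produced records
connect.s3.source.write.watermark.header=true

# GCP Storage source connector: same flag under the gcpstorage prefix
connect.gcpstorage.source.write.watermark.header=true
```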