Kafka Connect: Add delta writer support #12070

Draft: wants to merge 12 commits into main

ismailsimsek
Contributor

resolves #10842

bryanck and others added 11 commits January 23, 2025 14:26
(cherry picked from commit 2d4e680f20283efcd1064f8da33ea099133b171c)
(cherry picked from commit 12d44660fddc49fa0d60fa914ab9e8d1c9cb0867)
(cherry picked from commit c7651903a4a9daa1d5de6c5ecf4ddaa2573c0552)
(cherry picked from commit f40bac5b2179f1862df5924cec3ffaa11159f64f)
(cherry picked from commit 479a44bbb18eb554bfbab0be00a1e590decc51e2)
(cherry picked from commit 5331c8d8ffbb92551c95a6ea72abbe03ae4c01ae)
(cherry picked from commit 39982cbec35d7aeb767e07683ebc72eb1e5bec6c)
Apply spotless and fix method name
@ismailsimsek
Contributor Author

@bryanck I copied over the code as is.

I'm planning to refactor the upsert mode (delta writer) code and add a few improvements to it, potentially changing existing behavior.
Should we merge this first and add the changes in a separate PR, or combine them? What do you think?

Comment on lines +71 to +72
private static final String TABLES_CDC_FIELD_PROP = "iceberg.tables.cdcField";
private static final String TABLES_UPSERT_MODE_ENABLED_PROP = "iceberg.tables.upsertModeEnabled";
Contributor Author

I believe these property names should change to dash-case, for example cdc-field?
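
For illustration only, here is roughly what dash-case versions of the two constants could look like. The new key strings and the `toDashCase` helper are assumptions made for this example; they are not names taken from the PR or from the existing connector configuration.

```java
// Hypothetical sketch: dash-case variants of the two property keys discussed above.
final class DashCaseProps {
  // Assumed replacement keys, shown only to illustrate the naming style.
  static final String TABLES_CDC_FIELD_PROP = "iceberg.tables.cdc-field";
  static final String TABLES_UPSERT_MODE_ENABLED_PROP = "iceberg.tables.upsert-mode-enabled";

  private DashCaseProps() {}

  // Converts one camelCase segment (e.g. "cdcField") to dash-case ("cdc-field").
  static String toDashCase(String camelCase) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < camelCase.length(); i++) {
      char c = camelCase.charAt(i);
      if (Character.isUpperCase(c)) {
        if (i > 0) {
          sb.append('-');
        }
        sb.append(Character.toLowerCase(c));
      } else {
        sb.append(c);
      }
    }
    return sb.toString();
  }
}
```

Whether the old camelCase keys would need to stay as deprecated aliases is a separate question that this sketch does not address.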

Comment on lines +31 to +39
/**
* This is modified from {@link org.apache.iceberg.util.StructProjection} to support record types.
*/
public class RecordProjection implements Record {

/**
* Creates a projecting wrapper for {@link Record} rows.
*
* <p>This projection does not work with repeated types like lists and maps.
Contributor Author

Should we add the RecordProjection class to iceberg.data or iceberg.core so that downstream projects can use it?
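
For readers unfamiliar with the pattern, a projecting wrapper can be sketched as a position map over a reusable row reference. The code below is a simplified, self-contained stand-in: `RowProjection`, its list-of-names "schema", and its methods are hypothetical and do not use Iceberg's `Record` or `StructProjection` types. Like the class in the diff, it only handles flat fields.

```java
import java.util.List;

// Hypothetical, simplified projection wrapper; not the PR's RecordProjection.
final class RowProjection {
  // positions[i] is the index in the full row of the i-th projected field.
  private final int[] positions;
  private List<Object> row;

  private RowProjection(int[] positions) {
    this.positions = positions;
  }

  // Builds the position map once, by field name, so wrap() stays cheap per row.
  static RowProjection create(List<String> fullSchema, List<String> projectedSchema) {
    int[] positions = new int[projectedSchema.size()];
    for (int i = 0; i < positions.length; i++) {
      int pos = fullSchema.indexOf(projectedSchema.get(i));
      if (pos < 0) {
        throw new IllegalArgumentException("Cannot find field: " + projectedSchema.get(i));
      }
      positions[i] = pos;
    }
    return new RowProjection(positions);
  }

  // Points the wrapper at a new row without copying any values.
  RowProjection wrap(List<Object> newRow) {
    this.row = newRow;
    return this;
  }

  int size() {
    return positions.length;
  }

  Object get(int pos) {
    return row.get(positions[pos]);
  }
}
```

A caller would typically create one projection per writer (for example over the key columns) and call `wrap` for each incoming row.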

import org.apache.iceberg.relocated.com.google.common.collect.Sets;
import org.apache.iceberg.types.TypeUtil;

abstract class BaseDeltaTaskWriter extends BaseTaskWriter<Record> {
Contributor Author

@bryanck Should we add this and the other DeltaWriter* classes to core, next to the AppendWriter classes, so that downstream projects can reuse them?

With RecordWrapper + Operation it becomes generic enough to live in core. What do you think?

example refactored version:
BaseDeltaTaskWriter

Operation
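
To make the idea of an operation-driven delta writer concrete, here is a minimal sketch of how a writer could dispatch on a per-row operation. It assumes the common pattern where an update is expressed as a key delete followed by a write. `DeltaDispatcher`, its key and operation extractors, and the recorded action strings are hypothetical; this is not the refactored code referenced above, and it records actions in a list instead of writing Iceberg data or delete files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical names throughout; this is not the PR's BaseDeltaTaskWriter.
enum Operation { INSERT, UPDATE, DELETE }

final class DeltaDispatcher<T> {
  private final Function<T, Operation> operationOf; // extracts the CDC operation from a row
  private final Function<T, String> keyOf;          // extracts the identifier key from a row
  private final List<String> actions = new ArrayList<>();

  DeltaDispatcher(Function<T, Operation> operationOf, Function<T, String> keyOf) {
    this.operationOf = operationOf;
    this.keyOf = keyOf;
  }

  void write(T row) {
    Operation op = operationOf.apply(row);
    String key = keyOf.apply(row);
    // Updates and deletes first remove any existing row with the same key.
    if (op == Operation.UPDATE || op == Operation.DELETE) {
      actions.add("deleteKey " + key);
    }
    // Inserts and updates then write the new row.
    if (op == Operation.UPDATE || op == Operation.INSERT) {
      actions.add("write " + key);
    }
  }

  List<String> actions() {
    return actions;
  }
}
```

Because the dispatcher only depends on two extractor functions, the row type stays generic, which is the property that would let such a class sit in a shared module.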

@bryanck
Contributor

bryanck commented Jan 24, 2025

> @bryanck I copied over the code as is.
>
> I'm planning to refactor the upsert mode (delta writer) code and add a few improvements to it, potentially changing existing behavior. Should we merge this first and add the changes in a separate PR, or combine them? What do you think?

There are a couple of discussions on why we didn't originally add the delta writer functionality. I think we will need to resolve those discussions before we add this.
