Refactor kafka components to reduce duplication #2918

Open
wants to merge 6 commits into base: main

Conversation

@Jeffail Jeffail (Collaborator) commented Oct 7, 2024

I've done a broad refactor of the various kafka input/output components in order to:

  1. Reduce the number of duplicate field definitions and parsers (rough sketch below)
  2. Reduce the amount of duplicate client code around applying those field definitions
  3. Improve our mechanisms for sharing client and connection info across components
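
To illustrate 1 and 2, the rough shape is that the shared connection fields get declared once and every kafka-flavoured component reuses them. A minimal sketch (the helper name and field set here are illustrative, not the exact ones in this diff):

package kafkaexample

import "github.com/redpanda-data/benthos/v4/public/service"

// sharedKafkaFields declares the common connection fields once so that each
// kafka-flavoured input/output no longer redeclares (and re-parses) them.
func sharedKafkaFields() []*service.ConfigField {
	return []*service.ConfigField{
		service.NewStringListField("seed_brokers").
			Description("A list of broker addresses to connect to."),
		service.NewTLSToggledField("tls"),
	}
}

// Each component then builds its spec from the shared set plus its own fields:
//
//	spec := service.NewConfigSpec().Fields(sharedKafkaFields()...)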

There's quite a lot going on here and it touches lots of the existing connectors. The integration tests run fine but we'll need to do some manual testing of the migrator components before merging/releasing.

One facet of this work is the idea of pulling the implementation of a "consumer" and a "producer" out into reusable types. This is important because we will have multiple plugins essentially doing the same thing (reading or writing benthos messages from/to kafka records) where the implementation is non-trivial (ordering, checkpointing, batching).
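
As a very rough sketch of the producer side (type and field names here are illustrative, not the exact ones in the diff), the shared piece boils down to converting a benthos batch into kafka records and handing them to a single franz-go client:

package kafkaexample

import (
	"context"

	"github.com/redpanda-data/benthos/v4/public/service"
	"github.com/twmb/franz-go/pkg/kgo"
)

// sharedProducer owns the "benthos messages -> kafka records" conversion and
// produce path once, so individual output plugins don't each reimplement it.
type sharedProducer struct {
	client *kgo.Client
	topic  string
}

// WriteBatch converts a message batch into kafka records and produces them
// synchronously, returning the first error encountered.
func (p *sharedProducer) WriteBatch(ctx context.Context, batch service.MessageBatch) error {
	records := make([]*kgo.Record, 0, len(batch))
	for _, msg := range batch {
		b, err := msg.AsBytes()
		if err != nil {
			return err
		}
		records = append(records, &kgo.Record{Topic: p.topic, Value: b})
	}
	return p.client.ProduceSync(ctx, records...).FirstErr()
}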

Eventually I want to have a solid mechanism for instantiating inputs and outputs that is more user-friendly for guaranteeing things like strict ordering. However, this is a complicated body of work, so it's important to first ensure that it only needs to be implemented in one place, and that the codebase is tidy enough to reason about.

@Jeffail Jeffail marked this pull request as ready for review October 7, 2024 09:30
@rockwotj rockwotj self-requested a review October 8, 2024 08:22
@mihaitodor mihaitodor (Collaborator) left a comment

Stellar job on this refactor 🏆 Thank you for taking the time to clean up all of this! I left a few nits in the comments, but it looks great otherwise. Feel free to :shipit:

internal/impl/kafka/franz_client.go (comment resolved)
internal/impl/kafka/franz_shared_client.go (comment resolved)
Comment on lines +76 to +79
_, exists := r.clients[name]
if exists {
	return errSharedClientNameDuplicate
}

This looks redundant.

Comment on lines +32 to +37
root = if this.partitioner == "manual" {
if this.partition.or("") == "" {
"a partition must be specified when the partitioner is set to manual"
}
} else if this.partition.or("") != "" {
"a partition cannot be specified unless the partitioner is set to manual"
}`).
@mihaitodor mihaitodor (Collaborator) commented Oct 13, 2024

Nit: The indentation looks a bit off

Comment on lines +106 to +107
// Deprecated
service.NewStringField(rmoFieldRackID).Deprecated(),
@mihaitodor mihaitodor (Collaborator) commented Oct 13, 2024

Nice catch, thanks! ❤️


if w.rackID, err = conf.FieldString("rack_id"); err != nil {
var err error
if w.recordConverter, err = kafka.NewFranzWriterFromConfig(conf, nil, nil); err != nil {

The docs for NewFranzWriterFromConfig say:

A closure function must be provided that is responsible for granting access to a connected client.

Might be worth updating that to mention it's optional.
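
For instance, the relevant sentence could be relaxed to something like this (a wording suggestion only; the signature itself is unchanged):

// An optional closure function may be provided that is responsible for
// granting access to a connected client; passing nil (as above) is fine.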

Comment on lines 180 to 186
const (
	rwFieldTopic     = "topic"
	rwFieldKey       = "key"
	rwFieldPartition = "partition"
	rwFieldMetadata  = "metadata"
	rwFieldTimestamp = "timestamp"
)

Should these be prefixed with fw (or kfw like the ones above) instead of rw?

@@ -55,76 +56,20 @@ if this.partition.or("") == "" {
// FranzKafkaOutputConfigFields returns the full suite of config fields for a

Nit: The lint rule indentation above is a bit off.

@rockwotj rockwotj (Collaborator) left a comment


Legendary - thanks for cleaning this up! A couple nits for you.

Comment on lines 26 to 27
service.NewIntField(roFieldMaxInFlight).
	Description("The maximum number of batches to be sending in parallel at any given time.").
	Default(10),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could use service.NewOutputMaxInFlightField()
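
For reference, the substitution would look roughly like this (keeping .Default(10) is an assumption, only needed if the helper's own default shouldn't apply here):

service.NewOutputMaxInFlightField().
	Default(10),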

maxInFlight int,
err error,
) {
if maxInFlight, err = conf.FieldInt(roFieldMaxInFlight); err != nil {

nit: conf.FieldMaxInFlight()
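
i.e. roughly the following (the early return body is assumed from the named return values in the signature above):

if maxInFlight, err = conf.FieldMaxInFlight(); err != nil {
	return
}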

Default("").
Example("__redpanda.connect.status"),

// Deprecated

Should we mark it deprecated in the config too?
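
If so, presumably it's just a matter of chaining the same helper used for rack_id earlier; the field name below is a placeholder, not the actual one from this diff:

service.NewStringField("some_legacy_field").
	Deprecated(),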
