From 8c678f3530f8d3f37f310847352b9efc8513c500 Mon Sep 17 00:00:00 2001
From: Taylor Gray
Date: Sat, 21 Sep 2024 17:20:44 -0500
Subject: [PATCH 01/66] Create a new post for creating a new database integration with data prepper

Signed-off-by: Taylor Gray

---
 ...ase-integration-with-data-prepper.markdown | 393 ++++++++++++++++++
 1 file changed, 393 insertions(+)
 create mode 100644 _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown

diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
new file mode 100644
index 0000000000..66c8382348
--- /dev/null
+++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
@@ -0,0 +1,393 @@
---
layout: post
title: "How to Create a New Database Integration with Data Prepper"
authors:
- tylgry
- dinujoh
date: 2024-09-21 10:00:00 -0500
categories:
  - technical-post
twittercard:
  description: " Data Prepper contains support for migrating databases like MongoDB and DynamoDB to OpenSearch. This infrastructure is extendable to other databases, and involves creating a new Data Prepper source plugin."
redirect_from: "/blog/technical-post/2024/09/How to Create a New Database Integration with Data Prepper/"
---

Data Prepper is an open source data collector for trace and log data that can filter,
enrich, and aggregate data for downstream analysis and visualization in OpenSearch.

Data Prepper pipelines are made up of a source, an optional set of processors, and one or more sinks. See [ Data Prepper
Key Concepts and Fundamentals](https://opensearch.org/docs/latest/data-prepper/#key-concepts-and-fundamentals)
for more details. The following sections discuss the process required to create a new database source integration in Data Prepper.

### Push-based vs. Pull-based sources

Data Prepper source plugins can be divided into two different categories: push-based and pull-based. Push-based sources,
like HTTP and OpenTelemetry, are easy to scale between Data Prepper containers. Kubernetes, Nginx, or Docker Swarm will support
load balancing between Data Prepper containers for push-based sources.

Pull-based sources, on the other hand, need an alternative mechanism to scale and distribute work between a fleet of Data Prepper containers. For Data Prepper,
this alternative mechanism is [Source Coordination](https://opensearch.org/docs/latest/data-prepper/managing-data-prepper/source-coordination/).

Source coordination uses an external store that acts as a lease table, similar to the [Kinesis Client Library](https://docs.aws.amazon.com/streams/latest/dev/shared-throughput-kcl-consumers.html).


### Choosing a partition of work
Source coordination enables Data Prepper to divide "partitions" of work between Data Prepper containers.

For example, for Data Prepper's S3 source, a partition of work is a single S3 object. For OpenSearch as a source,
a partition of work is an OpenSearch index. For DynamoDB as a source, there are two types of partitions: an S3 data file for the one-time export,
and a shard when reading from DynamoDB streams.

When creating a new Data Prepper source that requires source coordination, one of the first questions to answer is "What are my partitions of work?"
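
As a rough sketch, a partition class for the hypothetical database source used later in this post might look like the following. The class name and the `DatabaseFileProgressState` type are illustrative assumptions rather than code from a real source; the base class, `EnhancedSourcePartition`, lives in `data-prepper-api` alongside the source coordination interfaces.

```java
import java.util.Optional;

import org.opensearch.dataprepper.model.source.coordinator.enhanced.EnhancedSourcePartition;

// Illustrative sketch: one partition of work per database file.
// DatabaseFileProgressState is a hypothetical POJO that could track, for example,
// how many records from the file have already been written to the buffer.
public class DatabaseFilePartition extends EnhancedSourcePartition<DatabaseFileProgressState> {

    public static final String PARTITION_TYPE = "DATABASE_FILE";

    private final String databaseFileName;
    private final DatabaseFileProgressState progressState;

    public DatabaseFilePartition(final String databaseFileName, final DatabaseFileProgressState progressState) {
        this.databaseFileName = databaseFileName;
        this.progressState = progressState;
    }

    @Override
    public String getPartitionType() {
        // Separates these items from other partition types (such as a leader partition)
        // in the source coordination store
        return PARTITION_TYPE;
    }

    @Override
    public String getPartitionKey() {
        // Must uniquely identify one unit of work
        return databaseFileName;
    }

    @Override
    public Optional<DatabaseFileProgressState> getProgressState() {
        return Optional.ofNullable(progressState);
    }
}
```

The partition key is the important piece of identity here: source coordination uses it to ensure that each database file is leased to at most one Data Prepper node at a time.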
+ +### Creating a Data Prepper plugin with Source Coordination Enabled + +To create a plugin that uses Source Coordination, the minimum required amount of code is to create two classes: one for the plugin itself, and one for the configuration of the plugin. +The configuration will contain the necessary parameters users must provide to run your plugin. For example, it may contain the database name or endpoint, as well as information on authorization and performance tuning. +Anything that a user would need to provide to run your plugin should be defined in this configuration. + +To get started, see this [sample source example](https://github.com/graytaylor0/data-prepper/blob/SourceCoordinationSampleSource/data-prepper-plugins/sample-source-coordination-source/src/main/java/SampleSource.java). +This example is for a hypothetical database, where the only configurations required are `database_name`, `username`, and `password` ([Configuration Class](https://github.com/graytaylor0/data-prepper/blob/SourceCoordinationSampleSource/data-prepper-plugins/sample-source-coordination-source/src/main/java/SampleSourceConfig.java)). Note in the example code that the plugin name and configuration class is defined in the `@DataPrepperPlugin` annotation on the class. +When running Data Prepper, the pipeline.yaml for this source would be as follows: + +```yaml +version: 2 +sample-source-pipeline: + source: + sample_source: + database_name: "my-database" + username: 'my-username' + password: 'my-password' + sink: + - stdout: +``` + +### Using the Source Coordination APIs + +The [Source Coordination Interface](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-api/src/main/java/org/opensearch/dataprepper/model/source/coordinator/enhanced/EnhancedSourceCoordinator.java) defines +the methods available for interacting with the [Source Coordination Store](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-api/src/main/java/org/opensearch/dataprepper/model/source/SourceCoordinationStore.java). + +These methods enable plugin creators to handle CRUD operations on their partitions, as well as acquiring the next partition to be worked on with the `acquireAvailablePartition(String partitionType)` method. A common pattern when using Source Coordination +is to assign a "leader" Data Prepper container that is responsible for partition discovery and creation of partitions. This is achieved by creating one partition on startup of Data Prepper that is the "Leader Partition", and utilizing the `acquireAvailablePartition(LeaderPartition.PARTITION_TYPE)` +method to determine if the job of partition discovery and creation of other partitions is assigned to a given Data Prepper container. + +The following code snippet demonstrates a simple flow while using source coordination. This example references a hypothetical database, where the "partition" is a single database file. +The full code for this example can be found [here](https://github.com/graytaylor0/data-prepper/blob/6e38dead8e9beca089381519654f329b82524b9d/data-prepper-plugins/sample-source-coordination-source/src/main/java/DatabaseWorker.java#L40). + +The overview of this code is as follows for the different numbered sections + +1. A leader partition was created on startup of Data Prepper ([Code reference](https://github.com/graytaylor0/data-prepper/blob/6e38dead8e9beca089381519654f329b82524b9d/data-prepper-plugins/sample-source-coordination-source/src/main/java/SampleSource.java#L41)). 
There is only one leader partition, and whichever Data Prepper node acquires it with the method call to `acquireAvailablePartition(LeaderPartition.PARTITION_TYPE)` will now be assigned the responsibility of discovering new database files.
2. If the leader partition is owned by this Data Prepper node, it will query the hypothetical database and create new partitions in the Source Coordination Store. When these partitions are created in the store, all Data Prepper nodes running this source will be able to acquire the database file partitions.
3. Acquire a database file partition to process. If no database files need to be processed at this time, an empty Optional will be returned.
4. Process the database file by writing the records from that file into the Data Prepper buffer as individual Events. After the database file has all records written to the buffer, mark the database file partition as COMPLETED in the source coordination store. This will make it so the database file is not processed again.

```java
public void run() {

    while (!Thread.currentThread().isInterrupted()) {
        try {

            // 1 - Check if this node is already the leader. If it is not, then try to acquire leadership in case the leader node has crashed
            if (leaderPartition == null) {
                final Optional<EnhancedSourcePartition> sourcePartition = sourceCoordinator.acquireAvailablePartition(LeaderPartition.PARTITION_TYPE);
                if (sourcePartition.isPresent()) {
                    LOG.info("Running as a LEADER that will discover new database files and create partitions");
                    leaderPartition = (LeaderPartition) sourcePartition.get();
                }
            }

            // 2 - If this node is the leader, run discovery of new database files and create partitions
            if (leaderPartition != null) {
                final List<EnhancedSourcePartition<DatabaseFileProgressState>> databaseFilePartitions = discoverDatabaseFilePartitions();
                LOG.info("Discovered {} new database file partitions", databaseFilePartitions.size());

                databaseFilePartitions.forEach(databaseFilePartition -> {
                    sourceCoordinator.createPartition(databaseFilePartition);
                });

                LOG.info("Created {} database file partitions in the source coordination store", databaseFilePartitions.size());
            }

            // 3 - Grab a database file partition, process it by writing to the buffer, and mark that database file partition as completed
            final Optional<EnhancedSourcePartition> databaseFilePartition = sourceCoordinator.acquireAvailablePartition(DatabaseFilePartition.PARTITION_TYPE);

            // 4 - If it's empty that means there are no more database files to process for now. If it's not empty, the database file is processed and then marked as COMPLETED in the source coordination store
            if (databaseFilePartition.isPresent()) {
                processDataFile(databaseFilePartition.get().getPartitionKey());
                sourceCoordinator.completePartition(databaseFilePartition.get());
            }

        } catch (final Exception e) {
            LOG.error("Received an exception in DatabaseWorker loop, retrying", e);
        }
    }
}
```

### Running and Testing Data Prepper with Source Coordination

To create a new plugin, it is helpful to first experience setting up and running Data Prepper locally. The following steps will outline how to get Data Prepper up and running with a pipeline
that is streaming documents from MongoDB, and writing them to OpenSearch using source coordination. The example will only use a single Data Prepper application instance,
but using source coordination allows for scaling if multiple instances of Data Prepper were run with the same pipeline configuration (i.e. 
pointing to the same MongoDB database), +and utilizing the same source coordination store as defined in the `data-prepper-config.yaml`. + +#### Step 1 - Set up Data Prepper locally for development + +The [Data Prepper Developer Guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/developer_guide.md) demonstrates all of the ways to run Data Prepper. +When developing a new source plugin, one must clone the Data Prepper repository and build from source with the following commands + + +Clone the Data Prepper repository +``` +git clone https://github.com/opensearch-project/data-prepper.git +``` + +Build data prepper from source + +``` +./gradlew assemble +``` + +#### Step 2 - Set up MongoDB locally + +Follow the [MongoDB installation guide](https://www.mongodb.com/docs/manual/installation/) to install and run MongoDB. Before you run MongoDB, +you will need to enable [MongoDB change streams](https://www.mongodb.com/docs/manual/changeStreams/) by following [Converting your Standalone Self-Manged MongoDB to a Replica Set](https://www.mongodb.com/docs/manual/tutorial/convert-standalone-to-replica-set/). + +Run `mongosh` to enter the shell. Once in the shell, create a new user and password. This username and password will be specified later in the Data Prepper `pipeline.yaml`. See [Create User Documentation](https://www.mongodb.com/docs/manual/reference/method/db.createUser/) for more information on creating users with MongoDB. + +``` +use admin +db.createUser({"user": "dbuser","pwd": "admin1234","roles": []}); +``` + +Create a new database named `demo`: + +``` +use demo +``` + +Now create a new MongoDB collection in the `demo` database named `demo_collection` with + +``` +db.createCollection("demo_collection") +``` + +Insert some records into the collection. These will be processed during the export phase of the MongoDB pipeline. + +``` +db.demo_collection.insertOne({"key-one": "value-one"}) +db.demo_collection.insertOne({"key-two": "value-two"}) +db.demo_collection.insertOne({"key-three": "value-three"}) +``` + +#### Step 3 - Set up OpenSearch locally + +Follow the [Installation Quickstart](https://opensearch.org/docs/latest/getting-started/quickstart/) guide to run OpenSearch locally. + +#### Step 4 - Create an AWS S3 bucket +Follow the steps here to [Create a new S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html). If you already have a bucket, this step can be skipped. Since only one Data Prepper node can read from MongoDB streams +at a time, this S3 bucket will be used by the pipeline to provide a way to parallelize the processing and writing of data to OpenSearch between multiple Data Prepper containers when running in a multi-node environment. + +#### Step 5 - Get credentials for accessing AWS DynamoDB and AWS S3 + +To create and interact with the DynamoDB Source Coordination Store that will be created on startup of Data Prepper, and to access the S3 bucket that was created in step 4, create a role with the following AWS policy permissions. +Be sure to replace the `MONGODB_BUCKET`, `REGION` and `AWS_ACCOUNT_ID`. 
+ +```json +{ + "Version": "2012-10-17", + "Statement": [ + { + "Sid": "s3Access", + "Effect": "Allow", + "Action": [ + "s3:PutObject" + ], + "Resource": [ "arn:aws:s3:::{{MONGODB_BUCKET}}/*" ] + }, + { + "Sid": "allowReadingFromS3Buckets", + "Effect": "Allow", + "Action": [ + "s3:GetObject", + "s3:DeleteObject", + "s3:GetBucketLocation", + "s3:ListBucket" + ], + "Resource": [ + "arn:aws:s3:::{{MONGODB_BUCKET}}", + "arn:aws:s3:::{{MONGODB_BUCKET}}/*" + ] + }, + { + "Sid": "allowListAllMyBuckets", + "Effect":"Allow", + "Action":"s3:ListAllMyBuckets", + "Resource":"arn:aws:s3:::*" + }, + { + "Sid": "ReadWriteSourceCoordinationDynamoStore", + "Effect": "Allow", + "Action": [ + "dynamodb:DescribeTimeToLive", + "dynamodb:UpdateTimeToLive", + "dynamodb:DescribeTable", + "dynamodb:CreateTable", + "dynamodb:GetItem", + "dynamodb:DeleteItem", + "dynamodb:PutItem", + "dynamodb:Query" + ], + "Resource": [ + "arn:aws:dynamodb:${REGION}:${AWS_ACCOUNT_ID}:table/DataPrepperSourceCoordinationStore", + "arn:aws:dynamodb:${REGION}:${AWS_ACCOUNT_ID}:table/DataPrepperSourceCoordinationStore/index/source-status" + ] + } + ] +} +``` + +Now run + +``` +aws configure +``` + +and insert the `Access Key Id` and `Secret Access Key` for credentials that allow the previously created role to be assumed. + +Set the following environment variables: + +``` +export AWS_REGION="{{REGION}}" +export SOURCE_COORDINATION_PIPELINE_IDENTIFIER="test-mongodb" +``` + +The `SOURCE_COORDINATION_PIPELINE_IDENTIFIER` should be the same as the `partition_prefix` in the `data-prepper-config.yaml` that +will be created in the next step. + +#### Step 6 - Create the data-prepper-config.yaml + +The `data-prepper-config.yaml` is used to configure the source coordination store for Data Prepper. At this time, +only DynamoDB is supported as a source coordination store. + +Create a file named `data-prepper-config.yaml` in the `data-prepper/release/archives/linux/build/install/opensearch-data-prepper-$VERSION-linux-x64/config/` directory +and place the following contents into it. Be sure to replace the `REGION` with the desired region the DynamoDB table will be created in, as well as the `ROLE_ARN_FROM_STEP_5`: + +```yaml +ssl: false +source_coordination: + partition_prefix: "test-mongodb" + store: + dynamodb: + sts_role_arn: "{{ROLE_ARN_FROM_STEP_5}}" + table_name: "DataPrepperSourceCoordinationStore" + region: "{{REGION}}" + skip_table_creation: false +``` + +Note how the `skip_table_creation` parameter is set to false. This will make it so Data Prepper will attempt to create the table on startup if it does not exist. +After the first time Data Prepper is run, this flag can be set to true to speed up the startup of Data Prepper. + +Also note the `partition_prefix`. This prefix makes it easy to do a soft reset of the pipeline in the coordination store. If you are testing a new source plugin, simply bumping the prefix between each +run (i.e. `test-mongodb-1`, `test-mongodb-2`) will make it so Data Prepper ignores DynamoDB items from the previous test run. + +#### Step 7 - Create the pipeline.yaml + +Create a file named `pipeline.yaml` in the `data-prepper/release/archives/linux/build/install/opensearch-data-prepper-$VERSION-linux-x64/pipelines/` directory, and place the following contents into it. 
+Be sure to replace `S3_BUCKET_NAME`, `S3_BUCKET_REGION`, `ROLE_ARN_FROM_STEP_5`, and your OpenSearch password: + +```yaml +pipeline: + workers: 2 + delay: 0 + buffer: + bounded_blocking: + batch_size: 125000 + buffer_size: 1000000 + source: + mongodb: + host: "localhost" + port: 27017 + acknowledgments: true + s3_bucket: "{{S3_BUCKET_NAME}}" + s3_region: "{{S3_BUCKET_REGION}}" + s3_prefix: "mongodb-opensearch" + insecure: "true" + ssl_insecure_disable_verification: "true" + authentication: + username: "dbuser" + password: "admin1234" + collections: + - collection: "demo.demo_collection" + export: true + stream: true + aws: + sts_role_arn: "{{ROLE_ARN_FROM_STEP_5}}" + sink: + - opensearch: + hosts: [ "http://localhost:9200" ] + index: "mongodb-index" + document_id: "${getMetadata(\"primary_key\")}" + action: "${getMetadata(\"opensearch_action\")}" + document_version: "${getMetadata(\"document_version\")}" + document_version_type: "external" + # Default username + username: "admin" + # Change to your OpenSearch password if needed. For running OpenSearch with Docker Compose, this is set by the environment variable OPENSEARCH_INITIAL_ADMIN_PASSWORD + password: "OpenSearchMongoDB1#" + flush_timeout: -1 +``` + +#### Step 8 - Run the pipeline + +Now that you have the AWS credentials, and MongoDB and OpenSearch are both running locally, you can now start the pipeline. + +Move to the directory containing the Data Prepper binaries + +``` +cd data-prepper/release/archives/linux/build/install/opensearch-data-prepper-$VERSION-linux-x64 +``` + +Start Data Prepper + +``` +bin/data-prepper +``` + +#### Step 9 - Observe the documents in OpenSearch + +It may take a minute for the export to go through. Once you see a log like `org.opensearch.dataprepper.plugins.source.s3.ScanObjectWorker - Received all acknowledgments for folder partition` from Data Prepper, +visit OpenSearch Dashboards in your browser at `http://localhost:5601`. + + +Go to `Dev Tools` and run `GET mongodb-index/_search`. You should see the documents you added to MongoDB in step 2. + +#### Step 10 - Insert some sample documents to MongoDB + +``` +db.demo_collection.insertOne({"key-four": "value-four"}) +db.demo_collection.insertOne({"key-five": "value-five"}) +db.demo_collection.insertOne({"key-six": "value-six"}) +``` + +These documents will now be pulled by Data Prepper's MongoDB source from the MongoDB streams. + +#### Step 11 - Observe the documents in OpenSearch + +Once you see another Data Prepper log like `org.opensearch.dataprepper.plugins.source.s3.ScanObjectWorker - Received all acknowledgments for folder partition`, +go back to OpenSearch Dashboards Dev Tools and run another query on the index with `GET mongodb-index/_search`. + +#### Step 12 - Cleanup resources + +You are all done! Remember to delete the DynamoDB Source Coordination Store and the S3 bucket, and to stop Data Prepper, MongoDB, and OpenSearch. + +### Summary + +After reading this and following along, you hopefully will have a better understanding of the internals of Data Prepper, and how to create a new plugin +that can scale via Source Coordination. If you have any ideas or proposals on new database plugins for Data Prepper, +questions about creating a new plugin for a database, or even just general questions about Data Prepper, +please don't hesitate to [Create a new discussion](https://github.com/opensearch-project/data-prepper/discussions), +and the community and Data Prepper maintainers would be happy to help! 


From da9b752b8e38c450eefb52a77649045ffd8eb5ca Mon Sep 17 00:00:00 2001
From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Date: Fri, 1 Nov 2024 12:33:41 -0400
Subject: [PATCH 02/66] Apply suggestions from code review

Co-authored-by: Melissa Vagi
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
---
 ...base-integration-with-data-prepper.markdown | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
index 66c8382348..8a51b7efcd 100644
--- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
+++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
@@ -1,6 +1,6 @@
 ---
 layout: post
-title: "How to Create a New Database Integration with Data Prepper"
+title: "Step-by-step: Creating a new database integration using Data Prepper"
 authors:
 - tylgry
 - dinujoh
@@ -8,20 +8,18 @@ date: 2024-09-21 10:00:00 -0500
 categories:
   - technical-post
 twittercard:
-  description: " Data Prepper contains support for migrating databases like MongoDB and DynamoDB to OpenSearch. This infrastructure is extendable to other databases, and involves creating a new Data Prepper source plugin."
-redirect_from: "/blog/technical-post/2024/09/How to Create a New Database Integration with Data Prepper/"
+  description: "Data Prepper offers a flexible framework for database migration, supporting sources like MongoDB and DynamoDB. You can extend this capability to new databases by implementing a Data Prepper source plugin."
---

-Data Prepper is an open source data collector for trace and log data that can filter,
-enrich, and aggregate data for downstream analysis and visualization in OpenSearch.
+Data Prepper, an open source data collector, enables you to collect, filter, enrich, and aggregate trace and log data. With Data Prepper, you can prepare your data for downstream analysis and visualization in OpenSearch.

-Data Prepper pipelines are made up of a source, an optional set of processors, and one or more sinks. See [ Data Prepper
-Key Concepts and Fundamentals](https://opensearch.org/docs/latest/data-prepper/#key-concepts-and-fundamentals)
-for more details. The following sections discuss the process required to create a new database source integration in Data Prepper.
+Data Prepper pipelines consist of three main components: a source, an optional set of processors, and one or more sinks. For more information, see [ Data Prepper key concepts and fundamentals](https://opensearch.org/docs/latest/data-prepper/#key-concepts-and-fundamentals). The following sections outline the steps necessary for implementing a new database source integration within Data Prepper.

-### Push-based vs. Pull-based sources
+### Understanding push-based and pull-based sources

-Data Prepper source plugins can be divided into two different categories: push-based and pull-based. Push-based sources,
+Data Prepper source plugins fall into two categories: push-based and pull-based.
+
+_Push-based sources_, such as HTTP and OpenTelemetry (OTel), scale easily across Data Prepper containers. They rely on load balancing solutions, such as Kubernetes, NGINX, or Docker Swarm, to distribute workload across Data Prepper containers.
like HTTP and OpenTelemetry, are easy to scale between Data Prepper containers. 
Kubernetes, Nginx, or Docker Swarm will support
load balancing between Data Prepper containers for push-based sources.

From e62c8532b024e0382b30657efa19fd0ae9535cb3 Mon Sep 17 00:00:00 2001
From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Date: Fri, 1 Nov 2024 12:46:24 -0400
Subject: [PATCH 03/66] Apply suggestions from code review

Co-authored-by: Melissa Vagi
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
---
 ...ase-integration-with-data-prepper.markdown | 38 ++++++++-----------
 1 file changed, 16 insertions(+), 22 deletions(-)

diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
index 8a51b7efcd..71a572e7cb 100644
--- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
+++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
@@ -20,33 +20,28 @@

Data Prepper source plugins fall into two categories: push-based and pull-based.

_Push-based sources_, such as HTTP and OpenTelemetry (OTel), scale easily across Data Prepper containers. They rely on load balancing solutions, such as Kubernetes, NGINX, or Docker Swarm, to distribute workload across Data Prepper containers.
-like HTTP and OpenTelemetry, are easy to scale between Data Prepper containers. Kubernetes, Nginx, or Docker Swarm will support
-load balancing between Data Prepper containers for push-based sources.

-Pull-based sources, on the other hand, need an alternative mechanism to scale and distribute work between a fleet of Data Prepper containers. For Data Prepper,
-this alternative mechanism is [Source Coordination](https://opensearch.org/docs/latest/data-prepper/managing-data-prepper/source-coordination/).
+Unlike push-based sources, pull-based sources in Data Prepper use [Source coordination](https://opensearch.org/docs/latest/data-prepper/managing-data-prepper/source-coordination/) to achieve scalability and work distribution across multiple containers. Source coordination uses an external store functioning as a lease table, similar to the approach used by the [Kinesis Client Library](https://docs.aws.amazon.com/streams/latest/dev/shared-throughput-kcl-consumers.html).

-Source coordination uses an external store that acts as a lease table, similar to the [Kinesis Client Library](https://docs.aws.amazon.com/streams/latest/dev/shared-throughput-kcl-consumers.html).


-### Choosing a partition of work
-Source coordination enables Data Prepper to divide "partitions" of work between Data Prepper containers.
+### Defining work partitions for source coordination

-For example, for Data Prepper's S3 source, a partition of work is a single S3 object. For OpenSearch as a source,
-a partition of work is an OpenSearch index. For DynamoDB as a source, there are two types of partitions: an S3 data file for the one-time export,
-and a shard when reading from DynamoDB streams.
+Data Prepper uses source coordination to distribute work partitions across Data Prepper containers.

-When creating a new Data Prepper source that requires source coordination, one of the first questions to answer is "What are my partitions of work?"
+For new Data Prepper sources using source coordination, identifying and delineating work partitions is a fundamental first step.

+Data Prepper defines work partitions differently for various sources. 
In the S3 source, each S3 object represents a partition. For OpenSearch, an index serves as a partition. DynamoDB sources have dual partition types: S3 data files for exports and shards for stream processing.


-### Creating a Data Prepper plugin with Source Coordination Enabled
+### Creating a source coordination-enabled Data Prepper plugin

-To create a plugin that uses Source Coordination, the minimum required amount of code is to create two classes: one for the plugin itself, and one for the configuration of the plugin.
-The configuration will contain the necessary parameters users must provide to run your plugin. For example, it may contain the database name or endpoint, as well as information on authorization and performance tuning.
-Anything that a user would need to provide to run your plugin should be defined in this configuration.
+A source coordination plugin consists of two classes: the main plugin class and a configuration class. The configuration class specifies all required user inputs, from the data endpoints to authorization details and performance tuning parameters. All user-required inputs for plugin operation should be specified within this configuration class.

-To get started, see this [sample source example](https://github.com/graytaylor0/data-prepper/blob/SourceCoordinationSampleSource/data-prepper-plugins/sample-source-coordination-source/src/main/java/SampleSource.java).
-This example is for a hypothetical database, where the only configurations required are `database_name`, `username`, and `password` ([Configuration Class](https://github.com/graytaylor0/data-prepper/blob/SourceCoordinationSampleSource/data-prepper-plugins/sample-source-coordination-source/src/main/java/SampleSourceConfig.java)). Note in the example code that the plugin name and configuration class is defined in the `@DataPrepperPlugin` annotation on the class.
-When running Data Prepper, the pipeline.yaml for this source would be as follows:
+For a practical starting point, refer to the [sample source code](https://github.com/graytaylor0/data-prepper/blob/SourceCoordinationSampleSource/data-prepper-plugins/sample-source-coordination-source/src/main/java/SampleSource.java) in the Data Prepper repository.
+
+This example demonstrates a basic configuration for a [hypothetical database source](https://github.com/graytaylor0/data-prepper/blob/SourceCoordinationSampleSource/data-prepper-plugins/sample-source-coordination-source/src/main/java/SampleSourceConfig.java), requiring only `database_name`, `username`, and `password`. The plugin name and configuration class are defined in the `@DataPrepperPlugin` annotation.
+
+The `pipeline.yaml` for running this source in Data Prepper would be structured as follows:

```yaml
version: 2
sample-source-pipeline:
  source:
    sample_source:
      database_name: "my-database"
      username: 'my-username'
      password: 'my-password'
  sink:
    - stdout:
```

-### Using the Source Coordination APIs
+### Using the source coordination APIs

-The [Source Coordination Interface](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-api/src/main/java/org/opensearch/dataprepper/model/source/coordinator/enhanced/EnhancedSourceCoordinator.java) defines
-the methods available for interacting with the [Source Coordination Store](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-api/src/main/java/org/opensearch/dataprepper/model/source/SourceCoordinationStore.java).
+The [source coordination interface](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-api/src/main/java/org/opensearch/dataprepper/model/source/coordinator/enhanced/EnhancedSourceCoordinator.java) defines the methods available for interacting with the [source coordination store](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-api/src/main/java/org/opensearch/dataprepper/model/source/SourceCoordinationStore.java). 
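
As a quick reference, the following abridged sketch paraphrases the three coordinator operations that this post relies on. The signatures are simplified assumptions; the linked `EnhancedSourceCoordinator` file is the authoritative definition and also includes additional operations, such as saving partition progress state.

```java
import java.util.Optional;

// Abridged, paraphrased view of the coordinator operations used in this post.
// Refer to EnhancedSourceCoordinator in data-prepper-api for the full interface.
public interface EnhancedSourceCoordinator {

    // Create a partition (a unit of work) in the source coordination store
    <T> boolean createPartition(EnhancedSourcePartition<T> partition);

    // Acquire ownership of the next available partition of the given type,
    // or return an empty Optional if no partition is currently available
    Optional<EnhancedSourcePartition> acquireAvailablePartition(String partitionType);

    // Mark an owned partition as COMPLETED so that no node processes it again
    <T> void completePartition(EnhancedSourcePartition<T> partition);
}
```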
+The [source coordination interface](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-api/src/main/java/org/opensearch/dataprepper/model/source/coordinator/enhanced/EnhancedSourceCoordinator.java) defines the methods available for interacting with the [source coordination store](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-api/src/main/java/org/opensearch/dataprepper/model/source/SourceCoordinationStore.java). -These methods enable plugin creators to handle CRUD operations on their partitions, as well as acquiring the next partition to be worked on with the `acquireAvailablePartition(String partitionType)` method. A common pattern when using Source Coordination +These methods provide for managing partition CRUD operations and getting the next available partition using `acquireAvailablePartition(String partitionType)`. A common source coordination pattern assigns a "leader" Data Prepper container for partition discovery and creation. This is done by initializing a "leader partition" at startup and using `acquireAvailablePartition(LeaderPartition.PARTITION_TYPE)` to assign partition management responsibilities. is to assign a "leader" Data Prepper container that is responsible for partition discovery and creation of partitions. This is achieved by creating one partition on startup of Data Prepper that is the "Leader Partition", and utilizing the `acquireAvailablePartition(LeaderPartition.PARTITION_TYPE)` method to determine if the job of partition discovery and creation of other partitions is assigned to a given Data Prepper container. From 0f368a9cc3b2e64955e275389385f25b41c4f7f5 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Fri, 1 Nov 2024 12:48:24 -0400 Subject: [PATCH 04/66] Apply suggestions from code review Co-authored-by: Melissa Vagi Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- ...ase-integration-with-data-prepper.markdown | 74 ++++++++----------- 1 file changed, 30 insertions(+), 44 deletions(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 71a572e7cb..aad704ffc7 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -60,18 +60,15 @@ sample-source-pipeline: The [source coordination interface](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-api/src/main/java/org/opensearch/dataprepper/model/source/coordinator/enhanced/EnhancedSourceCoordinator.java) defines the methods available for interacting with the [source coordination store](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-api/src/main/java/org/opensearch/dataprepper/model/source/SourceCoordinationStore.java). These methods provide for managing partition CRUD operations and getting the next available partition using `acquireAvailablePartition(String partitionType)`. A common source coordination pattern assigns a "leader" Data Prepper container for partition discovery and creation. This is done by initializing a "leader partition" at startup and using `acquireAvailablePartition(LeaderPartition.PARTITION_TYPE)` to assign partition management responsibilities. -is to assign a "leader" Data Prepper container that is responsible for partition discovery and creation of partitions. 
This is achieved by creating one partition on startup of Data Prepper that is the "Leader Partition", and utilizing the `acquireAvailablePartition(LeaderPartition.PARTITION_TYPE)`
method to determine if the job of partition discovery and creation of other partitions is assigned to a given Data Prepper container.

From 0f368a9cc3b2e64955e275389385f25b41c4f7f5 Mon Sep 17 00:00:00 2001
From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Date: Fri, 1 Nov 2024 12:48:24 -0400
Subject: [PATCH 04/66] Apply suggestions from code review

Co-authored-by: Melissa Vagi
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
---
 ...ase-integration-with-data-prepper.markdown | 74 ++++++++----------
 1 file changed, 30 insertions(+), 44 deletions(-)

diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
index 71a572e7cb..04c0b6fe4e 100644
--- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
+++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
@@ -60,18 +60,15 @@ sample-source-pipeline:

The following code snippet shows a basic source coordination workflow, using a hypothetical database where each partition represents an individual database file. See the [full code](https://github.com/graytaylor0/data-prepper/blob/6e38dead8e9beca089381519654f329b82524b9d/data-prepper-plugins/sample-source-coordination-source/src/main/java/DatabaseWorker.java#L40) on GitHub.

The key components and workflow in the code for implementing source coordination in Data Prepper are as follows:

1. Upon Data Prepper's startup, a leader partition is established (see the [code reference](https://github.com/graytaylor0/data-prepper/blob/6e38dead8e9beca089381519654f329b82524b9d/data-prepper-plugins/sample-source-coordination-source/src/main/java/SampleSource.java#L41)). The single leader partition is assigned to the Data Prepper node that successfully calls `acquireAvailablePartition(LeaderPartition.PARTITION_TYPE)`, assigning it the task of identifying new database files.
2. When a Data Prepper node owns the leader partition, it queries the hypothetical database and creates new partitions in the source coordination store, enabling all nodes running this source to access these database file partitions.
3. A database file partition is acquired for processing. In cases where no partitions need processing, an empty `Optional` is returned.
4. 
The database file undergoes processing, with its records written into the Data Prepper buffer as individual `Events`. Once all records have been written to the buffer, the source coordination store marks the database file partition as `COMPLETED`, ensuring it is not not processed again. ```java public void run() { @@ -116,25 +113,22 @@ public void run() { } ``` -### Running and Testing Data Prepper with Source Coordination +### Running and testing Data Prepper using source coordination -To create a new plugin, it is helpful to first experience setting up and running Data Prepper locally. The following steps will outline how to get Data Prepper up and running with a pipeline -that is streaming documents from MongoDB, and writing them to OpenSearch using source coordination. The example will only use a single Data Prepper application instance, -but using source coordination allows for scaling if multiple instances of Data Prepper were run with the same pipeline configuration (i.e. pointing to the same MongoDB database), -and utilizing the same source coordination store as defined in the `data-prepper-config.yaml`. +Before creating a new plugin, you must set up and run Data Prepper locally. The following steps guide you in configuring Data Prepper for streaming documents from MongoDB to OpenSearch using source coordination. While this example uses a single Data Prepper instance, the source coordination allows for scalability when running multiple instances with identical pipeline configurations and shared source coordination store settings defined in `data-prepper-config.yaml`. -#### Step 1 - Set up Data Prepper locally for development +#### Step 1 - Set up Data Prepper for local development -The [Data Prepper Developer Guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/developer_guide.md) demonstrates all of the ways to run Data Prepper. -When developing a new source plugin, one must clone the Data Prepper repository and build from source with the following commands +The [Data Prepper developer guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/developer_guide.md) provides a complete overview for running Data Prepper in various environments. +Creating a new source plugin requires cloning the Data Prepper repository and building it from source using the following commands: -Clone the Data Prepper repository +- Clone the Data Prepper repository ``` git clone https://github.com/opensearch-project/data-prepper.git ``` -Build data prepper from source +- Build Data Prepper from source ``` ./gradlew assemble @@ -142,29 +136,28 @@ Build data prepper from source #### Step 2 - Set up MongoDB locally -Follow the [MongoDB installation guide](https://www.mongodb.com/docs/manual/installation/) to install and run MongoDB. Before you run MongoDB, -you will need to enable [MongoDB change streams](https://www.mongodb.com/docs/manual/changeStreams/) by following [Converting your Standalone Self-Manged MongoDB to a Replica Set](https://www.mongodb.com/docs/manual/tutorial/convert-standalone-to-replica-set/). +First, install and configure MongoDB using the [MongoDB installation guide](https://www.mongodb.com/docs/manual/installation/). Before running MongoDB, enable [MongoDB change streams](https://www.mongodb.com/docs/manual/changeStreams/) by following the instructions in [Convert a Standalone Self-Managed mongod to a Replica Set](https://www.mongodb.com/docs/manual/tutorial/convert-standalone-to-replica-set/). -Run `mongosh` to enter the shell. 
Once in the shell, create a new user and password. This username and password will be specified later in the Data Prepper `pipeline.yaml`. See [Create User Documentation](https://www.mongodb.com/docs/manual/reference/method/db.createUser/) for more information on creating users with MongoDB. +Next, launch the MongoDB shell by running `mongosh`, and then create a new user and password within the shell using the following syntax. The username and password are required later in the Data Prepper `pipeline.yaml`. See [Create User Documentation](https://www.mongodb.com/docs/manual/reference/method/db.createUser/) for more information about MongoDB user creation. ``` use admin db.createUser({"user": "dbuser","pwd": "admin1234","roles": []}); ``` -Create a new database named `demo`: +Then, create a new database named `demo`: ``` use demo ``` -Now create a new MongoDB collection in the `demo` database named `demo_collection` with +Next, create a new MongoDB collection named `demo_collection` in your `demo` database with this syntax: ``` db.createCollection("demo_collection") ``` -Insert some records into the collection. These will be processed during the export phase of the MongoDB pipeline. +Finally, add sample records to the collection using the following syntax. These records are processed during the MongoDB pipeline's export phase: ``` db.demo_collection.insertOne({"key-one": "value-one"}) @@ -174,16 +167,14 @@ db.demo_collection.insertOne({"key-three": "value-three"}) #### Step 3 - Set up OpenSearch locally -Follow the [Installation Quickstart](https://opensearch.org/docs/latest/getting-started/quickstart/) guide to run OpenSearch locally. +To run OpenSearch locally, follow the steps in the [Installation quickstart](https://opensearch.org/docs/latest/getting-started/quickstart/). -#### Step 4 - Create an AWS S3 bucket -Follow the steps here to [Create a new S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html). If you already have a bucket, this step can be skipped. Since only one Data Prepper node can read from MongoDB streams -at a time, this S3 bucket will be used by the pipeline to provide a way to parallelize the processing and writing of data to OpenSearch between multiple Data Prepper containers when running in a multi-node environment. +#### Step 4 - Create an Amazon S3 bucket +Follow the steps in [Create a new S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html). You can skip this step if you have an existing bucket. This S3 buckets enables parallel processing and writing to OpenSearch across multiple Data Prepper containers in a multi-node setup, as only one node can read from MongoDB streams at a time. -#### Step 5 - Get credentials for accessing AWS DynamoDB and AWS S3 +#### Step 5 - Get AWS credentials for DynamoDB and S3 access -To create and interact with the DynamoDB Source Coordination Store that will be created on startup of Data Prepper, and to access the S3 bucket that was created in step 4, create a role with the following AWS policy permissions. -Be sure to replace the `MONGODB_BUCKET`, `REGION` and `AWS_ACCOUNT_ID`. +Set up an AWS role with the following policy permissions to enable Data Prepper to interact with the DynamoDB source coordination store and the S3 bucket from step 4. Make sure to replace `MONGODB_BUCKET`, `REGION` and `AWS_ACCOUNT_ID` with your unique values. ```json { @@ -239,31 +230,27 @@ Be sure to replace the `MONGODB_BUCKET`, `REGION` and `AWS_ACCOUNT_ID`. 
} ``` -Now run +Run the following command, then enter the `Access Key Id` and `Secret Access Key` associated with credentials that correspond to the previously defined role: ``` aws configure ``` -and insert the `Access Key Id` and `Secret Access Key` for credentials that allow the previously created role to be assumed. -Set the following environment variables: +Then, set the following environment variables: ``` export AWS_REGION="{{REGION}}" export SOURCE_COORDINATION_PIPELINE_IDENTIFIER="test-mongodb" ``` -The `SOURCE_COORDINATION_PIPELINE_IDENTIFIER` should be the same as the `partition_prefix` in the `data-prepper-config.yaml` that -will be created in the next step. +The `SOURCE_COORDINATION_PIPELINE_IDENTIFIER` must correspond to the `partition_prefix` that you will define in the `data-prepper-config.yaml` in step 6. #### Step 6 - Create the data-prepper-config.yaml -The `data-prepper-config.yaml` is used to configure the source coordination store for Data Prepper. At this time, -only DynamoDB is supported as a source coordination store. +Configure the source coordination store for Data Prepper using the `data-prepper-config.yaml` file. Currently, this store exclusively supports DynamoDB. -Create a file named `data-prepper-config.yaml` in the `data-prepper/release/archives/linux/build/install/opensearch-data-prepper-$VERSION-linux-x64/config/` directory -and place the following contents into it. Be sure to replace the `REGION` with the desired region the DynamoDB table will be created in, as well as the `ROLE_ARN_FROM_STEP_5`: +In the `data-prepper/release/archives/linux/build/install/opensearch-data-prepper-$VERSION-linux-x64/config/` directory, create a file named `data-prepper-config.yaml`. Insert the following content, replacing `REGION` with your desired DynamoDB table region and `ROLE_ARN_FROM_STEP_5` with the appropriate role ARN: ```yaml ssl: false @@ -277,10 +264,9 @@ source_coordination: skip_table_creation: false ``` -Note how the `skip_table_creation` parameter is set to false. This will make it so Data Prepper will attempt to create the table on startup if it does not exist. -After the first time Data Prepper is run, this flag can be set to true to speed up the startup of Data Prepper. +The `skip_table_creation` parameter is set to `false`, instructing Data Prepper create the table on startup if it is missing. For subsequent runs, you can set this flag to `true` to accelerate startup speed. -Also note the `partition_prefix`. This prefix makes it easy to do a soft reset of the pipeline in the coordination store. If you are testing a new source plugin, simply bumping the prefix between each +The `partition_prefix` enables soft resets of the pipeline in the source coordination store. When testing a new source plugin, incrementing this prefix (for example, `test-mongodb-1`, `test-mongodb-2`) ensures Data Prepper ignores DynamoDB items from the previous test runs. run (i.e. `test-mongodb-1`, `test-mongodb-2`) will make it so Data Prepper ignores DynamoDB items from the previous test run. 
#### Step 7 - Create the pipeline.yaml From acc61a09d165a40c8ce3887c6f765d1ce73c6532 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Fri, 1 Nov 2024 12:49:27 -0400 Subject: [PATCH 05/66] Apply suggestions from code review Co-authored-by: Melissa Vagi Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- ...ase-integration-with-data-prepper.markdown | 41 ++++++++----------- 1 file changed, 17 insertions(+), 24 deletions(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index aad704ffc7..04c0b6fe4e 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -267,12 +267,10 @@ source_coordination: The `skip_table_creation` parameter is set to `false`, instructing Data Prepper create the table on startup if it is missing. For subsequent runs, you can set this flag to `true` to accelerate startup speed. The `partition_prefix` enables soft resets of the pipeline in the source coordination store. When testing a new source plugin, incrementing this prefix (for example, `test-mongodb-1`, `test-mongodb-2`) ensures Data Prepper ignores DynamoDB items from the previous test runs. -run (i.e. `test-mongodb-1`, `test-mongodb-2`) will make it so Data Prepper ignores DynamoDB items from the previous test run. -#### Step 7 - Create the pipeline.yaml +#### Step 7 - Create the `pipeline.yaml` file -Create a file named `pipeline.yaml` in the `data-prepper/release/archives/linux/build/install/opensearch-data-prepper-$VERSION-linux-x64/pipelines/` directory, and place the following contents into it. -Be sure to replace `S3_BUCKET_NAME`, `S3_BUCKET_REGION`, `ROLE_ARN_FROM_STEP_5`, and your OpenSearch password: +In the `data-prepper/release/archives/linux/build/install/opensearch-data-prepper-$VERSION-linux-x64/pipelines/` directory, create a `pipeline.yaml` file with the following content. Make sure to update `S3_BUCKET_NAME`, `S3_BUCKET_REGION, ROLE_ARN_FROM_STEP_5`, and your OpenSearch password: ```yaml pipeline: @@ -318,54 +316,49 @@ pipeline: #### Step 8 - Run the pipeline -Now that you have the AWS credentials, and MongoDB and OpenSearch are both running locally, you can now start the pipeline. +With AWS credentials configured and both MongoDB and OpenSearch running on your local machine, you can launch the pipeline. -Move to the directory containing the Data Prepper binaries +First, navigate to the directory containing the Data Prepper binaries: ``` cd data-prepper/release/archives/linux/build/install/opensearch-data-prepper-$VERSION-linux-x64 ``` -Start Data Prepper +Then, start Data Prepper using the following command: ``` bin/data-prepper ``` -#### Step 9 - Observe the documents in OpenSearch +#### Step 9 - Review the documents in OpenSearch -It may take a minute for the export to go through. Once you see a log like `org.opensearch.dataprepper.plugins.source.s3.ScanObjectWorker - Received all acknowledgments for folder partition` from Data Prepper, -visit OpenSearch Dashboards in your browser at `http://localhost:5601`. +Wait for the export to complete, which may take a minute. 
Once Data Prepper displays a log, for example, `org.opensearch.dataprepper.plugins.source.s3.ScanObjectWorker - Received all acknowledgments for folder partition`, open `http://localhost:5601` to access OpenSearch Dashboards.

Go to the **Dev Tools** application and enter `GET mongodb-index/_search` into the console editor to retrieve the MongoDB documents you created in step 2.

#### Step 10 - Add sample documents to MongoDB
Add sample documents to MongoDB using the following commands:

```
db.demo_collection.insertOne({"key-four": "value-four"})
db.demo_collection.insertOne({"key-five": "value-five"})
db.demo_collection.insertOne({"key-six": "value-six"})
```

The MongoDB source in Data Prepper will now extract these documents from the MongoDB streams.

#### Step 11 - Review the documents in OpenSearch

As soon as Data Prepper generates another log, for example, `org.opensearch.dataprepper.plugins.source.s3.ScanObjectWorker - Received all acknowledgments for folder partition`, return to **Dev Tools** and run another search on the index using `GET mongodb-index/_search`.

#### Step 12 - Clean up resources

As you complete this process, make sure to perform the following cleanup tasks: delete the DynamoDB source coordination store and S3 bucket, and stop the Data Prepper, MongoDB, and OpenSearch instances.

### Summary

We hope this walkthrough deepens. your knowledge of the Data Prepper architecture and process of creating scalable plugins with source coordination. For any suggestions regarding new database plugins, assistance with plugin creation, or general Data Prepper questions, [create a new discussion](https://github.com/opensearch-project/data-prepper/discussions). The Data Prepper community and maintenance team are committed to supporting your efforts.
that can scale via Source Coordination. If you have any ideas or proposals on new database plugins for Data Prepper,
questions about creating a new plugin for a database, or even just general questions about Data Prepper,
please don't hesitate to [Create a new discussion](https://github.com/opensearch-project/data-prepper/discussions),
and the community and Data Prepper maintainers would be happy to help! 
From 4eb28759c3a6bab861246f817699b4211ede96db Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 09:50:17 -0600 Subject: [PATCH 06/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 04c0b6fe4e..aacd4f96e7 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -11,7 +11,7 @@ twittercard: description: "Data Prepper offers a flexible framework for database migration, supporting sources like MongoDB and DynamoDB. You can extend this capability to new databases by implementing a Data Prepper source plugin." --- -Data Prepper, an open source data collector, enables you to collect, filter, enrich, and aggregate trace and log data. With Data Prepper, you can prepare your data for downstream analysis and visualization in OpenSearch. +Data Prepper, an open-source data collector, enables you to collect, filter, enrich, and aggregate trace and log data. With Data Prepper, you can prepare your data for downstream analysis and visualization in OpenSearch. Data Prepper pipelines consist of three main components: a source, an optional set of processors, and one or more sinks. For more information, see [ Data Prepper key concepts and fundamentals](https://opensearch.org/docs/latest/data-prepper/#key-concepts-and-fundamentals). The following sections outline the steps necessary for implementing a new database source integration within Data Prepper. From 3cb65f1ce4b68d0fd286b78af17358d2671a8140 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 09:50:37 -0600 Subject: [PATCH 07/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index aacd4f96e7..de91751645 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -21,7 +21,7 @@ Data Prepper source plugins fall into two categories: push-based and pull-based. _Pull-based sources_ such as HTTP and OpenTelemetry (OTel), scale easily across Data Prepper containers. _Push-based sources_ rely load balancing solutions, such as Kubernetes, NGINX, or Docker Swarm, to distribute workload across Data Prepper containers. -Unlike push-based sources, pull-based sources in Data Prepper use [Source coordination](https://opensearch.org/docs/latest/data-prepper/managing-data-prepper/source-coordination/) to achieve scalability and work distribution across multiple containers. Source coordination uses an external store functioning as a lease table, similar to the approach used by the [Kinesis Client Library](https://docs.aws.amazon.com/streams/latest/dev/shared-throughput-kcl-consumers.html). 
+Unlike push-based sources, pull-based sources use [source coordination](https://opensearch.org/docs/latest/data-prepper/managing-data-prepper/source-coordination/) to achieve scalability and work distribution across multiple containers. Source coordination uses an external store functioning as a lease table, similar to the approach used by the [Kinesis Client Library](https://docs.aws.amazon.com/streams/latest/dev/shared-throughput-kcl-consumers.html). ### Defining work partitions for source coordination From efdefd2e5b0af5547984c0ee619fbf7af002c4d2 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 09:50:50 -0600 Subject: [PATCH 08/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index de91751645..abc9023031 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -13,7 +13,7 @@ twittercard: Data Prepper, an open-source data collector, enables you to collect, filter, enrich, and aggregate trace and log data. With Data Prepper, you can prepare your data for downstream analysis and visualization in OpenSearch. -Data Prepper pipelines consist of three main components: a source, an optional set of processors, and one or more sinks. For more information, see [ Data Prepper key concepts and fundamentals](https://opensearch.org/docs/latest/data-prepper/#key-concepts-and-fundamentals). The following sections outline the steps necessary for implementing a new database source integration within Data Prepper. +Data Prepper pipelines consist of three main components: a source, an optional set of processors, and one or more sinks. For more information, see [Data Prepper key concepts and fundamentals](https://opensearch.org/docs/latest/data-prepper/#key-concepts-and-fundamentals). The following sections outline the steps necessary for implementing a new database source integration within Data Prepper. ### Understanding push-based and pull-based sources From 918a59cfca11ca6838e57f7c3c90f0629ed9b238 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 09:51:25 -0600 Subject: [PATCH 09/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...reating-a-new-database-integration-with-data-prepper.markdown | 1 - 1 file changed, 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index abc9023031..101fb7b780 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -358,7 +358,6 @@ As you complete this process, make sure perform the following cleanup tasks: del ### Summary We hope this walkthrough deepens. your knowledge of the Data Prepper architecture and process of creating scalable plugins with source coordination. 
For any suggestions regarding new database plugins, assistance with plugin creation, or general Data Prepper questions, [create a new discussion](https://github.com/opensearch-project/data-prepper/discussions). The Data Prepper community and maintenance team are committed to supporting your efforts. -that can scale via Source Coordination. If you have any ideas or proposals on new database plugins for Data Prepper, From c9321c47b74414f9ff87145785256c9776488bed Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 09:52:33 -0600 Subject: [PATCH 10/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 101fb7b780..e35b9f1ac0 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -357,7 +357,7 @@ As you complete this process, make sure perform the following cleanup tasks: del ### Summary -We hope this walkthrough deepens. your knowledge of the Data Prepper architecture and process of creating scalable plugins with source coordination. For any suggestions regarding new database plugins, assistance with plugin creation, or general Data Prepper questions, [create a new discussion](https://github.com/opensearch-project/data-prepper/discussions). The Data Prepper community and maintenance team are committed to supporting your efforts. +We hope this guide deepens your knowledge of the Data Prepper architecture and the process of creating scalable plugins with source coordination. For any suggestions regarding new database plugins, assistance with plugin creation, or general Data Prepper questions, [create a new discussion](https://github.com/opensearch-project/data-prepper/discussions). The Data Prepper community and maintenance team are committed to supporting your efforts. From 76716f5e2e854068a20c464fc2d2a10edd0d9d16 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 09:52:49 -0600 Subject: [PATCH 11/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index e35b9f1ac0..a00092172b 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -19,7 +19,7 @@ Data Prepper pipelines consist of three main components: a source, an optional s Data Prepper source plugins fall into two categories: push-based and pull-based. -_Pull-based sources_ such as HTTP and OpenTelemetry (OTel), scale easily across Data Prepper containers. _Push-based sources_ rely load balancing solutions, such as Kubernetes, NGINX, or Docker Swarm, to distribute workload across Data Prepper containers. 
+_Push-based sources_, such as HTTP and OpenTelemetry (OTel), scale easily across Data Prepper containers. These sources rely on load balancing solutions, such as Kubernetes, NGINX, or Docker Swarm, to distribute a workload across Data Prepper containers.
 
 Unlike push-based sources, pull-based sources use [source coordination](https://opensearch.org/docs/latest/data-prepper/managing-data-prepper/source-coordination/) to achieve scalability and work distribution across multiple containers. Source coordination uses an external store functioning as a lease table, similar to the approach used by the [Kinesis Client Library](https://docs.aws.amazon.com/streams/latest/dev/shared-throughput-kcl-consumers.html).

From 25ef1ea1c931f2b0494497a4a7ea322660b91228 Mon Sep 17 00:00:00 2001
From: Taylor Gray 
Date: Mon, 4 Nov 2024 09:53:05 -0600
Subject: [PATCH 12/66] Update
 _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown

Co-authored-by: Nathan Bower 
Signed-off-by: Taylor Gray 
---
 ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
index a00092172b..2cc665291e 100644
--- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
+++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
@@ -17,7 +17,7 @@ Data Prepper pipelines consist of three main components: a source, an optional s
 
 ### Understanding push-based and pull-based sources
 
-Data Prepper source plugins fall into two categories: push-based and pull-based.
+Data Prepper source plugins fall into two categories: push based and pull based.
 
 _Push-based sources_, such as HTTP and OpenTelemetry (OTel), scale easily across Data Prepper containers. These sources rely on load balancing solutions, such as Kubernetes, NGINX, or Docker Swarm, to distribute a workload across Data Prepper containers.

From 23239b5295f10bd5eb40253196f41c4c6b4bb5f4 Mon Sep 17 00:00:00 2001
From: Taylor Gray 
Date: Mon, 4 Nov 2024 09:53:17 -0600
Subject: [PATCH 13/66] Update
 _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown

Co-authored-by: Nathan Bower 
Signed-off-by: Taylor Gray 
---
 ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
index 2cc665291e..68b5f09077 100644
--- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
+++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
@@ -351,7 +351,7 @@ The MongoDB source in Data Prepper will now extract these documents from the Mo
 
 As soon as Data Prepper generates another log, for example, `org.opensearch.dataprepper.plugins.source.s3.ScanObjectWorker - Received all acknowledgments for folder partition`, return to **Dev Tools** and run another search on the index using `GET mongodb-index/_search`.
 
-#### Step 12 - Clean up resources
+#### Step 12: Clean up resources
 
 As you complete this process, make sure perform the following cleanup tasks: del
From 24d7a11f232479c151bb66b14e6e775c2d4fe0f9 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 09:54:49 -0600 Subject: [PATCH 14/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 68b5f09077..d25ded2f6a 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -353,7 +353,7 @@ As soon as Data Prepper generates another log, for example, `org.opensearch.data #### Step 12: Clean up resources -As you complete this process, make sure perform the following cleanup tasks: delete the DynamoDB source coordination store and S3 bucket, and stop the Data Prepper, MongoDB, and OpenSearch instances. +As you complete this process, make sure that you delete the DynamoDB source coordination store and S3 bucket as well as stop the Data Prepper, MongoDB, and OpenSearch instances. ### Summary From 1e21412bddfc0855c5cb3f576be464e52d5d2a96 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 09:58:01 -0600 Subject: [PATCH 15/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index d25ded2f6a..2e2614ba6a 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -26,7 +26,7 @@ Unlike push-based sources, pull-based sources use [source coordination](https:// ### Defining work partitions for source coordination -Data Prepper uses source coordination to distribute work partitions" across Data Prepper containers. +Data Prepper uses source coordination to distribute work partitions across Data Prepper containers. For new Data Prepper sources using source coordination, identifying and delineating work partitions is a fundamental first step. 
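To make the idea of a work partition concrete, here is a minimal sketch of what a partition class for this post's hypothetical database might look like, modeled on the `EnhancedSourcePartition` base class in `data-prepper-api`. The `DatabaseFilePartition` name and its progress state are illustrative assumptions, not part of Data Prepper itself:

```java
import java.util.Optional;

import org.opensearch.dataprepper.model.source.coordinator.enhanced.EnhancedSourcePartition;

// Hypothetical partition type: one partition of work per database file.
public class DatabaseFilePartition extends EnhancedSourcePartition<DatabaseFilePartition.ProgressState> {
    public static final String PARTITION_TYPE = "DATABASE_FILE";

    private final String fileName;
    private final ProgressState progressState;

    public DatabaseFilePartition(final String fileName, final ProgressState progressState) {
        this.fileName = fileName;
        this.progressState = progressState;
    }

    @Override
    public String getPartitionType() {
        return PARTITION_TYPE;
    }

    @Override
    public String getPartitionKey() {
        // The partition key must uniquely identify this unit of work in the coordination store.
        return fileName;
    }

    @Override
    public Optional<ProgressState> getProgressState() {
        return Optional.ofNullable(progressState);
    }

    // Simple checkpoint object: how far into the file this node has read.
    public static class ProgressState {
        public long recordsRead;
    }
}
```

An S3 object, an OpenSearch index, or a DynamoDB shard can be modeled the same way; only the partition type and key change.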
From 9a6b67b50bc80d6af40d14b66cf7d2fc4936a659 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 10:00:11 -0600 Subject: [PATCH 16/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 2e2614ba6a..906c15a081 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -30,7 +30,7 @@ Data Prepper uses source coordination to distribute work partitions across Data For new Data Prepper sources using source coordination, identifying and delineating work partitions is a fundamental first step. -Data Prepper defines work partitions differently for various sources. In the S3 source, each S3 object represents a partition. For OpenSearch, an index serves as a partition. DynamoDB sources have dual partition types: S3 data files for exports and shards for stream processing. +Data Prepper defines work partitions differently for various sources. In the `s3` source, each Amazon Simple Storage Service (Amazon S3) object represents a partition. In OpenSearch, an index serves as a partition. Amazon DynamoDB sources have dual partition types: S3 data files for exports and shards for stream processing. ### Creating a source coordination-enabled Data Prepper plugin From f81c5eacaedfdc71ecadefc91679b778fe124103 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 10:00:25 -0600 Subject: [PATCH 17/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 906c15a081..9917c491ff 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -33,7 +33,7 @@ For new Data Prepper sources using source coordination, identifying and delineat Data Prepper defines work partitions differently for various sources. In the `s3` source, each Amazon Simple Storage Service (Amazon S3) object represents a partition. In OpenSearch, an index serves as a partition. Amazon DynamoDB sources have dual partition types: S3 data files for exports and shards for stream processing. -### Creating a source coordination-enabled Data Prepper plugin +### Creating a source-coordination-enabled Data Prepper plugin A source coordination plugin consists of to two classes: the main plugin class and a configuration class. The configuration class specifies all required users inputs, from the data endpoints to authorization details and performance tuning parameters. All user-required inputs for plugin operation should be specified within this configuration class. 
From 7d6e998ac77610789fa521a8566531efbbff9a51 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 10:11:59 -0600 Subject: [PATCH 18/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 9917c491ff..5b201b629d 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -35,7 +35,7 @@ Data Prepper defines work partitions differently for various sources. In the `s3 ### Creating a source-coordination-enabled Data Prepper plugin -A source coordination plugin consists of to two classes: the main plugin class and a configuration class. The configuration class specifies all required users inputs, from the data endpoints to authorization details and performance tuning parameters. All user-required inputs for plugin operation should be specified within this configuration class. +A source coordination plugin consists of to two classes: the main plugin class and a configuration class. The configuration class specifies all required user inputs, including data endpoints, authorization details, and performance tuning parameters. All user-required inputs for plugin operations should be specified within this configuration class. For a practical starting point, refer to the [sample source code](https://github.com/graytaylor0/data-prepper/blob/SourceCoordinationSampleSource/data-prepper-plugins/sample-source-coordination-source/src/main/java/SampleSource.java) in the Data Prepper repository. From 2a202d68f2342ef8464b525688f44cc41e2b0937 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 10:12:20 -0600 Subject: [PATCH 19/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 5b201b629d..0b43354071 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -41,7 +41,7 @@ For a practical starting point, refer to the [sample source code](https://github This example demonstrates a basic configuration for a [hypothetical database source](https://github.com/graytaylor0/data-prepper/blob/SourceCoordinationSampleSource/data-prepper-plugins/sample-source-coordination-source/src/main/java/SampleSourceConfig.java), requiring only `database_name`, `username`, and `password`. The plugin name and configuration class are defined in the `@DataPrepperPlugin` annotation. 
-The `pipeline.yaml` for running this source in Data Prepper would be structured as follows: +The `pipeline.yaml` file for running this source in Data Prepper would be structured as follows: ```yaml version: 2 From 345d0804f07bcc5a6d4b90a65424053ff69f409c Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 10:13:04 -0600 Subject: [PATCH 20/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 0b43354071..5949de2bb8 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -59,7 +59,7 @@ sample-source-pipeline: The [source coordination interface](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-api/src/main/java/org/opensearch/dataprepper/model/source/coordinator/enhanced/EnhancedSourceCoordinator.java) defines the methods available for interacting with the [source coordination store](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-api/src/main/java/org/opensearch/dataprepper/model/source/SourceCoordinationStore.java). -These methods provide for managing partition CRUD operations and getting the next available partition using `acquireAvailablePartition(String partitionType)`. A common source coordination pattern assigns a "leader" Data Prepper container for partition discovery and creation. This is done by initializing a "leader partition" at startup and using `acquireAvailablePartition(LeaderPartition.PARTITION_TYPE)` to assign partition management responsibilities. +These methods support managing partition CRUD operations and getting the next available partition using `acquireAvailablePartition(String partitionType)`. A common source coordination pattern assigns a "leader" Data Prepper container for partition discovery and creation. This is done by initializing a "leader partition" at startup and using `acquireAvailablePartition(LeaderPartition.PARTITION_TYPE)` to assign partition management responsibilities. The following code snippet shows a basic source coordination workflow, using a hypothetical database where each partition represent an individual database file. See the [full code](https://github.com/graytaylor0/data-prepper/blob/6e38dead8e9beca089381519654f329b82524b9d/data-prepper-plugins/sample-source-coordination-source/src/main/java/DatabaseWorker.java#L40) on GitHub. 
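To illustrate the leader pattern described above, the following condensed sketch shows one way a scheduler thread might compete for the leader partition and, once it owns it, register newly discovered database files as partitions. It assumes the `EnhancedSourceCoordinator` interface linked above and reuses the hypothetical `DatabaseFilePartition` and `LeaderPartition` types; the discovery step is stubbed out:

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;

import org.opensearch.dataprepper.model.source.coordinator.enhanced.EnhancedSourceCoordinator;
import org.opensearch.dataprepper.model.source.coordinator.enhanced.EnhancedSourcePartition;

public class LeaderScheduler implements Runnable {
    private final EnhancedSourceCoordinator coordinator;
    private boolean isLeader = false;

    public LeaderScheduler(final EnhancedSourceCoordinator coordinator) {
        this.coordinator = coordinator;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            // Only one node in the fleet can acquire the single leader partition.
            // LeaderPartition is a partition class like DatabaseFilePartition, with its own type.
            if (!isLeader) {
                final Optional<EnhancedSourcePartition> leaderPartition =
                        coordinator.acquireAvailablePartition(LeaderPartition.PARTITION_TYPE);
                isLeader = leaderPartition.isPresent();
            }

            if (isLeader) {
                // The leader registers newly discovered work so that any node can claim it.
                for (final String fileName : discoverNewDatabaseFiles()) {
                    coordinator.createPartition(new DatabaseFilePartition(fileName, null));
                }
            }

            try {
                Thread.sleep(30_000); // Wait before the next discovery pass.
            } catch (final InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    private List<String> discoverNewDatabaseFiles() {
        // Stub: in a real source this would query the database for unprocessed files.
        return Collections.emptyList();
    }
}
```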
From f7d7111cebdcff8a7a2ebceedb3eb07a7a0627f8 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 10:13:19 -0600 Subject: [PATCH 21/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 5949de2bb8..a8d3fa54fc 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -61,7 +61,7 @@ The [source coordination interface](https://github.com/opensearch-project/data-p These methods support managing partition CRUD operations and getting the next available partition using `acquireAvailablePartition(String partitionType)`. A common source coordination pattern assigns a "leader" Data Prepper container for partition discovery and creation. This is done by initializing a "leader partition" at startup and using `acquireAvailablePartition(LeaderPartition.PARTITION_TYPE)` to assign partition management responsibilities. -The following code snippet shows a basic source coordination workflow, using a hypothetical database where each partition represent an individual database file. See the [full code](https://github.com/graytaylor0/data-prepper/blob/6e38dead8e9beca089381519654f329b82524b9d/data-prepper-plugins/sample-source-coordination-source/src/main/java/DatabaseWorker.java#L40) on GitHub. +The following code snippet shows a basic source coordination workflow, using a hypothetical database in which each partition represents an individual database file. See the [full code](https://github.com/graytaylor0/data-prepper/blob/6e38dead8e9beca089381519654f329b82524b9d/data-prepper-plugins/sample-source-coordination-source/src/main/java/DatabaseWorker.java#L40) on GitHub. The key components and workflow in the code for implementing source coordination in Data Prepper are as follows: From 7c69248cd2f1de02d2964264e54315008de34370 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 10:15:09 -0600 Subject: [PATCH 22/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index a8d3fa54fc..02801e7566 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -65,7 +65,7 @@ The following code snippet shows a basic source coordination workflow, using a h The key components and workflow in the code for implementing source coordination in Data Prepper are as follows: -1. Upon Data Prepper's startup, a leader partition is established. See the [code reference](https://github.com/graytaylor0/data-prepper/blob/6e38dead8e9beca089381519654f329b82524b9d/data-prepper-plugins/sample-source-coordination-source/src/main/java/SampleSource.java#L41)). 
The single leader partition is assigned to the Data Prepper node that successfully calls `acquireAvailablePartition(LeaderPartition.PARTITION_TYPE)`, assigning it the task of identifying new database files. +1. Upon starting Data Prepper, a leader partition is established. See the [code reference](https://github.com/graytaylor0/data-prepper/blob/6e38dead8e9beca089381519654f329b82524b9d/data-prepper-plugins/sample-source-coordination-source/src/main/java/SampleSource.java#L41)). The single leader partition is assigned to the Data Prepper node that successfully calls `acquireAvailablePartition(LeaderPartition.PARTITION_TYPE)`, assigning it the task of identifying new database files. 2. When a Data Prepper node owns the leader partition, it queries the hypothetical database and creates new partitions in the source coordination store, enabling all nodes running this source to access these database file partitions. 3. A database file partition is acquired for processing. In cases where no partitions need processing, an empty `Optional` is returned. 4. The database file undergoes processing, with its records written into the Data Prepper buffer as individual `Events`. Once all records have been written to the buffer, the source coordination store marks the database file partition as `COMPLETED`, ensuring it is not not processed again. From 7b69ff534526d50958c91abc8056b5981ee057d5 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 10:15:30 -0600 Subject: [PATCH 23/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 02801e7566..3f7c18637c 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -68,7 +68,7 @@ The key components and workflow in the code for implementing source coordination 1. Upon starting Data Prepper, a leader partition is established. See the [code reference](https://github.com/graytaylor0/data-prepper/blob/6e38dead8e9beca089381519654f329b82524b9d/data-prepper-plugins/sample-source-coordination-source/src/main/java/SampleSource.java#L41)). The single leader partition is assigned to the Data Prepper node that successfully calls `acquireAvailablePartition(LeaderPartition.PARTITION_TYPE)`, assigning it the task of identifying new database files. 2. When a Data Prepper node owns the leader partition, it queries the hypothetical database and creates new partitions in the source coordination store, enabling all nodes running this source to access these database file partitions. 3. A database file partition is acquired for processing. In cases where no partitions need processing, an empty `Optional` is returned. -4. The database file undergoes processing, with its records written into the Data Prepper buffer as individual `Events`. Once all records have been written to the buffer, the source coordination store marks the database file partition as `COMPLETED`, ensuring it is not not processed again. +4. The database file undergoes processing, with its records written into the Data Prepper buffer as individual `Events`. 
Once all records have been written to the buffer, the source coordination store marks the database file partition as `COMPLETED`, ensuring that it is not processed again. ```java public void run() { From d5b7c28398168ab5d5d428dcc5e95702a9735ff5 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 10:16:30 -0600 Subject: [PATCH 24/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 3f7c18637c..eeb260c049 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -115,7 +115,7 @@ public void run() { ### Running and testing Data Prepper using source coordination -Before creating a new plugin, you must set up and run Data Prepper locally. The following steps guide you in configuring Data Prepper for streaming documents from MongoDB to OpenSearch using source coordination. While this example uses a single Data Prepper instance, the source coordination allows for scalability when running multiple instances with identical pipeline configurations and shared source coordination store settings defined in `data-prepper-config.yaml`. +Before creating a new plugin, you must set up and run Data Prepper locally. The following steps guide you through configuring Data Prepper for streaming documents from MongoDB to OpenSearch using source coordination. While this example uses a single Data Prepper instance, the source coordination allows for scalability when running multiple instances with identical pipeline configurations and shared source coordination store settings defined in `data-prepper-config.yaml`. #### Step 1 - Set up Data Prepper for local development From 120c629d6b55ccd34d1d22448f30296d0f4a88e4 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 10:16:47 -0600 Subject: [PATCH 25/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index eeb260c049..0afde026df 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -117,7 +117,7 @@ public void run() { Before creating a new plugin, you must set up and run Data Prepper locally. The following steps guide you through configuring Data Prepper for streaming documents from MongoDB to OpenSearch using source coordination. While this example uses a single Data Prepper instance, the source coordination allows for scalability when running multiple instances with identical pipeline configurations and shared source coordination store settings defined in `data-prepper-config.yaml`. 
-#### Step 1 - Set up Data Prepper for local development +#### Step 1: Set up Data Prepper for local development The [Data Prepper developer guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/developer_guide.md) provides a complete overview for running Data Prepper in various environments. Creating a new source plugin requires cloning the Data Prepper repository and building it from source using the following commands: From f387f59904137194a89a115491fc499de68c9990 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 10:17:06 -0600 Subject: [PATCH 26/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 0afde026df..d1d710749d 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -119,7 +119,7 @@ Before creating a new plugin, you must set up and run Data Prepper locally. The #### Step 1: Set up Data Prepper for local development -The [Data Prepper developer guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/developer_guide.md) provides a complete overview for running Data Prepper in various environments. +The [OpenSearch Data Prepper Developer Guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/developer_guide.md) provides a complete overview of running Data Prepper in various environments. Creating a new source plugin requires cloning the Data Prepper repository and building it from source using the following commands: From 48fc976c19a644adabc5e4a024f6ee863b11a983 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 10:17:31 -0600 Subject: [PATCH 27/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index d1d710749d..75d79a8193 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -120,7 +120,7 @@ Before creating a new plugin, you must set up and run Data Prepper locally. The #### Step 1: Set up Data Prepper for local development The [OpenSearch Data Prepper Developer Guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/developer_guide.md) provides a complete overview of running Data Prepper in various environments. 
-Creating a new source plugin requires cloning the Data Prepper repository and building it from source using the following commands: +Creating a new source plugin requires cloning the Data Prepper repository and building it from the source using the following commands: - Clone the Data Prepper repository From 708dda471d7ce87b8fb6b3a39ff39a13210dff46 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 10:18:05 -0600 Subject: [PATCH 28/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 75d79a8193..cbbc9092a1 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -123,7 +123,7 @@ The [OpenSearch Data Prepper Developer Guide](https://github.com/opensearch-proj Creating a new source plugin requires cloning the Data Prepper repository and building it from the source using the following commands: -- Clone the Data Prepper repository +- Clone the Data Prepper repository: ``` git clone https://github.com/opensearch-project/data-prepper.git ``` From 3d6f56364366fa2b00e394d816d1769a606f2a0a Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 10:19:19 -0600 Subject: [PATCH 29/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index cbbc9092a1..4969aef419 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -134,7 +134,7 @@ git clone https://github.com/opensearch-project/data-prepper.git ./gradlew assemble ``` -#### Step 2 - Set up MongoDB locally +#### Step 2: Set up MongoDB locally First, install and configure MongoDB using the [MongoDB installation guide](https://www.mongodb.com/docs/manual/installation/). Before running MongoDB, enable [MongoDB change streams](https://www.mongodb.com/docs/manual/changeStreams/) by following the instructions in [Convert a Standalone Self-Managed mongod to a Replica Set](https://www.mongodb.com/docs/manual/tutorial/convert-standalone-to-replica-set/). 
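Before wiring MongoDB into a pipeline, it can be worth confirming that change streams actually work on the converted replica set. The following Java sketch uses the standard `mongodb-driver-sync` client to open a change stream and wait for a single event; on a standalone `mongod`, creating the cursor fails instead, which signals that the replica set conversion did not take effect. The connection string, database, and collection names are assumptions based on the local setup described in this post:

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.changestream.ChangeStreamDocument;
import org.bson.Document;

public class ChangeStreamSmokeTest {
    public static void main(String[] args) {
        // Assumed connection string for a local instance converted to a single-node replica set.
        // Add the username and password you create below if authentication is enabled.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> collection =
                    client.getDatabase("demo").getCollection("demo_collection");

            // watch() requires a replica set; on a standalone mongod the cursor errors out,
            // so reaching the print statement confirms that change streams are enabled.
            try (var cursor = collection.watch().cursor()) {
                System.out.println("Change streams enabled; waiting for one change event...");
                ChangeStreamDocument<Document> change = cursor.next();
                System.out.println("Received change event: " + change.getOperationType());
            }
        }
    }
}
```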
From 74457cfc7acc9f45d4394a66c31a90fe4cbd39c2 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 10:19:37 -0600 Subject: [PATCH 30/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 4969aef419..10dcbde1db 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -138,7 +138,7 @@ git clone https://github.com/opensearch-project/data-prepper.git First, install and configure MongoDB using the [MongoDB installation guide](https://www.mongodb.com/docs/manual/installation/). Before running MongoDB, enable [MongoDB change streams](https://www.mongodb.com/docs/manual/changeStreams/) by following the instructions in [Convert a Standalone Self-Managed mongod to a Replica Set](https://www.mongodb.com/docs/manual/tutorial/convert-standalone-to-replica-set/). -Next, launch the MongoDB shell by running `mongosh`, and then create a new user and password within the shell using the following syntax. The username and password are required later in the Data Prepper `pipeline.yaml`. See [Create User Documentation](https://www.mongodb.com/docs/manual/reference/method/db.createUser/) for more information about MongoDB user creation. +Next, launch the MongoDB shell by running `mongosh`, and then create a new user and password within the shell using the following syntax. The username and password are required later in the Data Prepper `pipeline.yaml` file. See [Create User Documentation](https://www.mongodb.com/docs/manual/reference/method/db.createUser/) for more information about MongoDB user creation. ``` use admin From 349a8c2c3c192d857c401b0815f42786bfcf6f97 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 10:22:13 -0600 Subject: [PATCH 31/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 10dcbde1db..de04d5e22d 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -165,7 +165,7 @@ db.demo_collection.insertOne({"key-two": "value-two"}) db.demo_collection.insertOne({"key-three": "value-three"}) ``` -#### Step 3 - Set up OpenSearch locally +#### Step 3: Set up OpenSearch locally To run OpenSearch locally, follow the steps in the [Installation quickstart](https://opensearch.org/docs/latest/getting-started/quickstart/). 
From 0eae00d8bb2c8270e0edc04c8259458574da3264 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 10:22:40 -0600 Subject: [PATCH 32/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index de04d5e22d..617a8fd030 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -169,7 +169,7 @@ db.demo_collection.insertOne({"key-three": "value-three"}) To run OpenSearch locally, follow the steps in the [Installation quickstart](https://opensearch.org/docs/latest/getting-started/quickstart/). -#### Step 4 - Create an Amazon S3 bucket +#### Step 4: Create an Amazon S3 bucket Follow the steps in [Create a new S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html). You can skip this step if you have an existing bucket. This S3 buckets enables parallel processing and writing to OpenSearch across multiple Data Prepper containers in a multi-node setup, as only one node can read from MongoDB streams at a time. #### Step 5 - Get AWS credentials for DynamoDB and S3 access From 81fe3e3c633f58f85705d3d38c2099b3684d60b2 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 10:23:14 -0600 Subject: [PATCH 33/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 617a8fd030..c4e853d794 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -170,7 +170,7 @@ db.demo_collection.insertOne({"key-three": "value-three"}) To run OpenSearch locally, follow the steps in the [Installation quickstart](https://opensearch.org/docs/latest/getting-started/quickstart/). #### Step 4: Create an Amazon S3 bucket -Follow the steps in [Create a new S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html). You can skip this step if you have an existing bucket. This S3 buckets enables parallel processing and writing to OpenSearch across multiple Data Prepper containers in a multi-node setup, as only one node can read from MongoDB streams at a time. +Follow the steps in [Create a new S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html). You can skip this step if you have an existing bucket. This S3 bucket enables parallel processing and writing to OpenSearch across multiple Data Prepper containers in a multi-node setup, given that only one node can read from MongoDB streams at a time. 
#### Step 5 - Get AWS credentials for DynamoDB and S3 access From ccdf47fa46e3301c83dadfe4aabc5022987435bf Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 10:23:53 -0600 Subject: [PATCH 34/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index c4e853d794..5730850d07 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -172,7 +172,7 @@ To run OpenSearch locally, follow the steps in the [Installation quickstart](htt #### Step 4: Create an Amazon S3 bucket Follow the steps in [Create a new S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html). You can skip this step if you have an existing bucket. This S3 bucket enables parallel processing and writing to OpenSearch across multiple Data Prepper containers in a multi-node setup, given that only one node can read from MongoDB streams at a time. -#### Step 5 - Get AWS credentials for DynamoDB and S3 access +#### Step 5: Get AWS credentials for DynamoDB and S3 access Set up an AWS role with the following policy permissions to enable Data Prepper to interact with the DynamoDB source coordination store and the S3 bucket from step 4. Make sure to replace `MONGODB_BUCKET`, `REGION` and `AWS_ACCOUNT_ID` with your unique values. From 93ce7d55b705045e6b5fa3fa7f949d53ca6df7f4 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 10:24:24 -0600 Subject: [PATCH 35/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 5730850d07..8074e1790b 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -174,7 +174,7 @@ Follow the steps in [Create a new S3 bucket](https://docs.aws.amazon.com/AmazonS #### Step 5: Get AWS credentials for DynamoDB and S3 access -Set up an AWS role with the following policy permissions to enable Data Prepper to interact with the DynamoDB source coordination store and the S3 bucket from step 4. Make sure to replace `MONGODB_BUCKET`, `REGION` and `AWS_ACCOUNT_ID` with your unique values. +Set up an AWS role with the following policy permissions to enable Data Prepper to interact with the DynamoDB source coordination store and the S3 bucket from step 4. Make sure to replace `MONGODB_BUCKET`, `REGION`, and `AWS_ACCOUNT_ID` with your unique values. 
```json { From 30fa3b346730b6e50c652eb7cf85ce30e22d1ed8 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 10:25:02 -0600 Subject: [PATCH 36/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 8074e1790b..e5f9e36c3e 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -230,7 +230,7 @@ Set up an AWS role with the following policy permissions to enable Data Prepper } ``` -Run the following command, then enter the `Access Key Id` and `Secret Access Key` associated with credentials that correspond to the previously defined role: +Run the following command, and then enter the `Access Key Id` and `Secret Access Key` associated with the credentials that correspond to the previously defined role: ``` aws configure From 38d4d8b79f35a6ed6bf51fbc5df84c5a1ed0f7cf Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 10:25:29 -0600 Subject: [PATCH 37/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index e5f9e36c3e..350b32b1dd 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -244,7 +244,7 @@ export AWS_REGION="{{REGION}}" export SOURCE_COORDINATION_PIPELINE_IDENTIFIER="test-mongodb" ``` -The `SOURCE_COORDINATION_PIPELINE_IDENTIFIER` must correspond to the `partition_prefix` that you will define in the `data-prepper-config.yaml` in step 6. +The `SOURCE_COORDINATION_PIPELINE_IDENTIFIER` must correspond to the `partition_prefix` that you will define in the `data-prepper-config.yaml` file in step 6. #### Step 6 - Create the data-prepper-config.yaml From 33d183b1f9c102f6a5d2af189302328f712b5df1 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 10:27:04 -0600 Subject: [PATCH 38/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 350b32b1dd..7c8845fa6d 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -264,7 +264,7 @@ source_coordination: skip_table_creation: false ``` -The `skip_table_creation` parameter is set to `false`, instructing Data Prepper create the table on startup if it is missing. 
For subsequent runs, you can set this flag to `true` to accelerate startup speed. +The `skip_table_creation` parameter is set to `false`, instructing Data Prepper to create the table on startup if it is missing. For subsequent runs, you can set this flag to `true` to accelerate startup speed. The `partition_prefix` enables soft resets of the pipeline in the source coordination store. When testing a new source plugin, incrementing this prefix (for example, `test-mongodb-1`, `test-mongodb-2`) ensures Data Prepper ignores DynamoDB items from the previous test runs. From b435caf4bd1090f29888fdfb32289e6b014411f2 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 10:27:30 -0600 Subject: [PATCH 39/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 7c8845fa6d..070f6ded9d 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -246,7 +246,7 @@ export SOURCE_COORDINATION_PIPELINE_IDENTIFIER="test-mongodb" The `SOURCE_COORDINATION_PIPELINE_IDENTIFIER` must correspond to the `partition_prefix` that you will define in the `data-prepper-config.yaml` file in step 6. -#### Step 6 - Create the data-prepper-config.yaml +#### Step 6: Create the data-prepper-config.yaml file Configure the source coordination store for Data Prepper using the `data-prepper-config.yaml` file. Currently, this store exclusively supports DynamoDB. From 6ce60671c1d5aa7b81881113652f497bdc5af1af Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 10:27:59 -0600 Subject: [PATCH 40/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 070f6ded9d..a53ad03aca 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -266,7 +266,7 @@ source_coordination: The `skip_table_creation` parameter is set to `false`, instructing Data Prepper to create the table on startup if it is missing. For subsequent runs, you can set this flag to `true` to accelerate startup speed. -The `partition_prefix` enables soft resets of the pipeline in the source coordination store. When testing a new source plugin, incrementing this prefix (for example, `test-mongodb-1`, `test-mongodb-2`) ensures Data Prepper ignores DynamoDB items from the previous test runs. +The `partition_prefix` enables soft resets of the pipeline in the source coordination store. When testing a new source plugin, incrementing this prefix (for example, `test-mongodb-1`, `test-mongodb-2`) ensures that Data Prepper ignores DynamoDB items from the previous test runs. 
#### Step 7 - Create the `pipeline.yaml` file From 6ffe942114822353e90bda4056289fcc7a7774aa Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 10:28:47 -0600 Subject: [PATCH 41/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index a53ad03aca..71f973e2d4 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -268,7 +268,7 @@ The `skip_table_creation` parameter is set to `false`, instructing Data Prepper The `partition_prefix` enables soft resets of the pipeline in the source coordination store. When testing a new source plugin, incrementing this prefix (for example, `test-mongodb-1`, `test-mongodb-2`) ensures that Data Prepper ignores DynamoDB items from the previous test runs. -#### Step 7 - Create the `pipeline.yaml` file +#### Step 7: Create the `pipeline.yaml` file In the `data-prepper/release/archives/linux/build/install/opensearch-data-prepper-$VERSION-linux-x64/pipelines/` directory, create a `pipeline.yaml` file with the following content. Make sure to update `S3_BUCKET_NAME`, `S3_BUCKET_REGION, ROLE_ARN_FROM_STEP_5`, and your OpenSearch password: From 99175d0dece0de9045516530f675d49b19e2ee41 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 10:30:10 -0600 Subject: [PATCH 42/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 71f973e2d4..4a92e9b8f3 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -270,7 +270,7 @@ The `partition_prefix` enables soft resets of the pipeline in the source coordin #### Step 7: Create the `pipeline.yaml` file -In the `data-prepper/release/archives/linux/build/install/opensearch-data-prepper-$VERSION-linux-x64/pipelines/` directory, create a `pipeline.yaml` file with the following content. Make sure to update `S3_BUCKET_NAME`, `S3_BUCKET_REGION, ROLE_ARN_FROM_STEP_5`, and your OpenSearch password: +In the `data-prepper/release/archives/linux/build/install/opensearch-data-prepper-$VERSION-linux-x64/pipelines/` directory, create a `pipeline.yaml` file with the following content. 
Make sure to update `S3_BUCKET_NAME`, `S3_BUCKET_REGION, `ROLE_ARN_FROM_STEP_5`, and your OpenSearch password: ```yaml pipeline: From c0653f191f13b3a8cf03ad3c982c317f6da7c1e2 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 10:30:46 -0600 Subject: [PATCH 43/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 4a92e9b8f3..64a9702e4a 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -314,7 +314,7 @@ pipeline: flush_timeout: -1 ``` -#### Step 8 - Run the pipeline +#### Step 8: Run the pipeline With AWS credentials configured and both MongoDB and OpenSearch running on your local machine, you can launch the pipeline. From bf7deea5a6bbff5d4bd45415818a12b06e7945eb Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 10:31:18 -0600 Subject: [PATCH 44/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 64a9702e4a..5890b6ef2f 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -330,7 +330,7 @@ Then, start Data Prepper using the following command: bin/data-prepper ``` -#### Step 9 - Review the documents in OpenSearch +#### Step 9: Review the documents in OpenSearch Wait for the export to complete, which may take a minute. Once Data Prepper displays a log, for example, `org.opensearch.dataprepper.plugins.source.s3.ScanObjectWorker - Received all acknowledgments for folder partition`, open `http://localhost:5601` to access OpenSearch Dashboards. From abfed1e7497ae309a3aecfd76437f76c06cd90c4 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 10:35:03 -0600 Subject: [PATCH 45/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 5890b6ef2f..d998e34fcd 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -336,7 +336,7 @@ Wait for the export to complete, which may take a minute. Once Data Prepper disp Go to the **Dev Tools** application and enter `GET mongodb-index/_search` into the console editor to retrieve the MongoDB documents you created in step 2. 
-#### Step 10 - Add sample documents to MongoDB +#### Step 10: Add sample documents to MongoDB Add sample documents to MongoDB using the following command: ``` From 19eae96f8086305588390cd1e7281dfafa802c0f Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 10:35:35 -0600 Subject: [PATCH 46/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index d998e34fcd..fe6f914897 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -332,7 +332,7 @@ bin/data-prepper #### Step 9: Review the documents in OpenSearch -Wait for the export to complete, which may take a minute. Once Data Prepper displays a log, for example, `org.opensearch.dataprepper.plugins.source.s3.ScanObjectWorker - Received all acknowledgments for folder partition`, open `http://localhost:5601` to access OpenSearch Dashboards. +Wait for the export to complete, which may take a minute. Once Data Prepper displays a log, for example, `org.opensearch.dataprepper.plugins.source.s3.ScanObjectWorker - Received all acknowledgments for folder partition`, open `http://localhost:5601` to access OpenSearch Dashboards. Go to the **Dev Tools** application and enter `GET mongodb-index/_search` into the console editor to retrieve the MongoDB documents you created in step 2. From 0f84c07e43714ccebdd5cb7908a80fb1296d6531 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Mon, 4 Nov 2024 10:36:13 -0600 Subject: [PATCH 47/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index fe6f914897..8b87ef309b 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -347,7 +347,7 @@ db.demo_collection.insertOne({"key-six": "value-six"}) The MongoDB source in Data Prepper will now extract these documents from the MongoDB streams. -#### Step 11 - Review the documents in OpenSearch +#### Step 11: Review the documents in OpenSearch As soon as Data Prepper generates another log, for example, `org.opensearch.dataprepper.plugins.source.s3.ScanObjectWorker - Received all acknowledgments for folder partition`, return to **Dev Tools** and run another search on the index using `GET mongodb-index/_search`. 
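As a rough illustration of what to expect from `GET mongodb-index/_search` after a successful export, a response should contain hits along the following lines. The exact document IDs and metadata depend on your pipeline configuration and MongoDB data; this is a trimmed sketch, not verbatim output:

```json
{
  "hits": {
    "total": { "value": 3, "relation": "eq" },
    "hits": [
      {
        "_index": "mongodb-index",
        "_id": "66a1f0c2e4b0a1b2c3d4e5f6",
        "_source": { "key-one": "value-one" }
      }
    ]
  }
}
```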
From d3982038d10709bdda42c591491b3d869804c920 Mon Sep 17 00:00:00 2001
From: Taylor Gray <tylgry@amazon.com>
Date: Mon, 4 Nov 2024 10:36:50 -0600
Subject: [PATCH 48/66] Update
 _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Taylor Gray <tylgry@amazon.com>
---
 ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
index 8b87ef309b..570df82090 100644
--- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
+++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
@@ -334,7 +334,7 @@ bin/data-prepper
 
 Wait for the export to complete, which may take a minute. Once Data Prepper displays a log, for example, `org.opensearch.dataprepper.plugins.source.s3.ScanObjectWorker - Received all acknowledgments for folder partition`, open `http://localhost:5601` to access OpenSearch Dashboards.
 
-Go to the **Dev Tools** application and enter `GET mongodb-index/_search` into the console editor to retrieve the MongoDB documents you created in step 2.
+Go to the **Dev Tools** application and enter `GET mongodb-index/_search` in the console editor to retrieve the MongoDB documents you created in step 2.
 
 #### Step 10: Add sample documents to MongoDB
 Add sample documents to MongoDB using the following command:

From 6de1340d3faa88029760c56fc8ccd43adf51635c Mon Sep 17 00:00:00 2001
From: Taylor Gray <tylgry@amazon.com>
Date: Mon, 4 Nov 2024 15:34:45 -0600
Subject: [PATCH 49/66] Move code snippet to be above the overview of the code
 snippet

Signed-off-by: Taylor Gray <tylgry@amazon.com>
---
 ...database-integration-with-data-prepper.markdown | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
index 570df82090..878d6896e6 100644
--- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
+++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
@@ -63,13 +63,6 @@ These methods support managing partition CRUD operations and getting the next av
 
 The following code snippet shows a basic source coordination workflow, using a hypothetical database in which each partition represents an individual database file. See the [full code](https://github.com/graytaylor0/data-prepper/blob/6e38dead8e9beca089381519654f329b82524b9d/data-prepper-plugins/sample-source-coordination-source/src/main/java/DatabaseWorker.java#L40) on GitHub.
 
-The key components and workflow in the code for implementing source coordination in Data Prepper are as follows:
-
-1. Upon starting Data Prepper, a leader partition is established. See the [code reference](https://github.com/graytaylor0/data-prepper/blob/6e38dead8e9beca089381519654f329b82524b9d/data-prepper-plugins/sample-source-coordination-source/src/main/java/SampleSource.java#L41)). The single leader partition is assigned to the Data Prepper node that successfully calls `acquireAvailablePartition(LeaderPartition.PARTITION_TYPE)`, assigning it the task of identifying new database files.
-2. 
When a Data Prepper node owns the leader partition, it queries the hypothetical database and creates new partitions in the source coordination store, enabling all nodes running this source to access these database file partitions.
-3. A database file partition is acquired for processing. In cases where no partitions need processing, an empty `Optional` is returned.
-4. The database file undergoes processing, with its records written into the Data Prepper buffer as individual `Events`. Once all records have been written to the buffer, the source coordination store marks the database file partition as `COMPLETED`, ensuring that it is not processed again.
-
 ```java
 
 public void run() {
@@ -113,6 +106,13 @@ public void run() {
 }
 ```
 
+The key components and workflow in the code for implementing source coordination in Data Prepper are as follows:
+
+1. Upon starting Data Prepper, a leader partition is established. See the [code reference](https://github.com/graytaylor0/data-prepper/blob/6e38dead8e9beca089381519654f329b82524b9d/data-prepper-plugins/sample-source-coordination-source/src/main/java/SampleSource.java#L41). The single leader partition is assigned to the Data Prepper node that successfully calls `acquireAvailablePartition(LeaderPartition.PARTITION_TYPE)`, assigning it the task of identifying new database files.
+2. When a Data Prepper node owns the leader partition, it queries the hypothetical database and creates new partitions in the source coordination store, enabling all nodes running this source to access these database file partitions.
+3. A database file partition is acquired for processing. In cases where no partitions need processing, an empty `Optional` is returned.
+4. The database file undergoes processing, with its records written into the Data Prepper buffer as individual `Events`. Once all records have been written to the buffer, the source coordination store marks the database file partition as `COMPLETED`, ensuring that it is not processed again.
+
 ### Running and testing Data Prepper using source coordination
 
 Before creating a new plugin, you must set up and run Data Prepper locally. The following steps guide you through configuring Data Prepper for streaming documents from MongoDB to OpenSearch using source coordination. While this example uses a single Data Prepper instance, the source coordination allows for scalability when running multiple instances with identical pipeline configurations and shared source coordination store settings defined in `data-prepper-config.yaml`.

From a968bc62645930cc1a90b61652253ec842a230fc Mon Sep 17 00:00:00 2001
From: Nathan Bower <nbower@amazon.com>
Date: Mon, 4 Nov 2024 16:58:27 -0500
Subject: [PATCH 50/66] Update
 _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown

Signed-off-by: Nathan Bower <nbower@amazon.com>
---
 ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
index 878d6896e6..0a5012f4ae 100644
--- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
+++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
@@ -345,7 +345,7 @@ db.demo_collection.insertOne({"key-five": "value-five"})
 db.demo_collection.insertOne({"key-six": "value-six"})
 ```
 
-The MongoDB source in Data Prepper will now extract these documents from the MongoDB streams. 
+The MongoDB source in Data Prepper will now extract these documents from the MongoDB streams. #### Step 11: Review the documents in OpenSearch From df1858f78c0970819ff1432eefe47dad4b215673 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Tue, 5 Nov 2024 10:07:06 -0600 Subject: [PATCH 51/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 0a5012f4ae..56545bbe14 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -357,7 +357,7 @@ As you complete this process, make sure that you delete the DynamoDB source coor ### Summary -We hope this guide deepens your knowledge of the Data Prepper architecture and the process of creating scalable plugins with source coordination. For any suggestions regarding new database plugins, assistance with plugin creation, or general Data Prepper questions, [create a new discussion](https://github.com/opensearch-project/data-prepper/discussions). The Data Prepper community and maintenance team are committed to supporting your efforts. +We hope this guide deepens your knowledge of the Data Prepper architecture and the process of creating scalable plugins with source coordination. For any suggestions regarding new database plugins, assistance with plugin creation, or general Data Prepper questions, [create a new discussion](https://github.com/opensearch-project/data-prepper/discussions) in the Data Prepper repository. The Data Prepper community and maintenance team are committed to supporting your efforts. From 4fc360de8ffd0962e15f59ddc6da6983a24aaf5d Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Tue, 5 Nov 2024 10:07:23 -0600 Subject: [PATCH 52/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 56545bbe14..456210ddba 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -347,7 +347,7 @@ db.demo_collection.insertOne({"key-six": "value-six"}) The MongoDB source in Data Prepper will now extract these documents from the MongoDB streams. -#### Step 11: Review the documents in OpenSearch +#### Step 11: Observe the stream documents in OpenSearch As soon as Data Prepper generates another log, for example, `org.opensearch.dataprepper.plugins.source.s3.ScanObjectWorker - Received all acknowledgments for folder partition`, return to **Dev Tools** and run another search on the index using `GET mongodb-index/_search`. 
From e029d6aaedd0b170fa0ec0540e740d02a2c9f0bc Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Tue, 5 Nov 2024 10:07:35 -0600 Subject: [PATCH 53/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 456210ddba..97a6164683 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -115,7 +115,7 @@ The key components and workflow in the code for implementing source coordination ### Running and testing Data Prepper using source coordination -Before creating a new plugin, you must set up and run Data Prepper locally. The following steps guide you through configuring Data Prepper for streaming documents from MongoDB to OpenSearch using source coordination. While this example uses a single Data Prepper instance, the source coordination allows for scalability when running multiple instances with identical pipeline configurations and shared source coordination store settings defined in `data-prepper-config.yaml`. +Before creating a new plugin, you must set up and run Data Prepper locally. The following steps guide you through configuring Data Prepper to stream documents from MongoDB to OpenSearch using source coordination. While this example uses a single Data Prepper instance, the source coordination allows for scalability when running multiple instances with identical pipeline configurations and shared source coordination store settings defined in `data-prepper-config.yaml`. #### Step 1: Set up Data Prepper for local development From f3ed634b724fcb68976148c59fcab991d118433d Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Tue, 5 Nov 2024 10:07:45 -0600 Subject: [PATCH 54/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 97a6164683..f5cf0594e9 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -8,7 +8,7 @@ date: 2022-09-21 10:00:00 -0500 categories: - technical-post twittercard: - description: "Data Prepper offers a flexible framework for database migration, supporting sources like MongoDB and DynamoDB. You can extend this capability to new databases by implementing a Data Prepper source plugin." + description: "Data Prepper offers a flexible framework for database migration, supporting sources like MongoDB and Amazon DynamoDB. You can extend this capability to new databases by implementing a Data Prepper source plugin." --- Data Prepper, an open-source data collector, enables you to collect, filter, enrich, and aggregate trace and log data. With Data Prepper, you can prepare your data for downstream analysis and visualization in OpenSearch. 
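The preceding patches note that source coordination scales to multiple Data Prepper instances running identical pipeline configurations against a shared store. As a minimal sketch of what that scale-out looks like, assuming the same Linux build directory used throughout this guide, a second node only needs the same `pipelines/pipeline.yaml` and `config/data-prepper-config.yaml` files and the same pipeline identifier; the instances then coordinate through the shared DynamoDB store rather than with each other directly:

```
# On a second machine, reuse the same pipelines/pipeline.yaml and
# config/data-prepper-config.yaml as the first instance.
export SOURCE_COORDINATION_PIPELINE_IDENTIFIER="test-mongodb"

cd opensearch-data-prepper-$VERSION-linux-x64
bin/data-prepper
```

Each instance acquires its own partitions from the store, so no load balancer or manual leader election is required.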
From ee68a61b023535f9e56503cf68ee33ee15444d9b Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Tue, 5 Nov 2024 10:08:00 -0600 Subject: [PATCH 55/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index f5cf0594e9..79e1935c87 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -106,7 +106,7 @@ public void run() { } ``` -The key components and workflow in the code for implementing source coordination in Data Prepper are as follows: +The key components and code workflow for implementing source coordination in Data Prepper are as follows: 1. Upon starting Data Prepper, a leader partition is established. See the [code reference](https://github.com/graytaylor0/data-prepper/blob/6e38dead8e9beca089381519654f329b82524b9d/data-prepper-plugins/sample-source-coordination-source/src/main/java/SampleSource.java#L41)). The single leader partition is assigned to the Data Prepper node that successfully calls `acquireAvailablePartition(LeaderPartition.PARTITION_TYPE)`, assigning it the task of identifying new database files. 2. When a Data Prepper node owns the leader partition, it queries the hypothetical database and creates new partitions in the source coordination store, enabling all nodes running this source to access these database file partitions. From 3498d1bd1016c2d65b70c3168f35b4d9e41cd736 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Tue, 5 Nov 2024 10:08:25 -0600 Subject: [PATCH 56/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 79e1935c87..b9cb95ec8e 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -120,7 +120,7 @@ Before creating a new plugin, you must set up and run Data Prepper locally. The #### Step 1: Set up Data Prepper for local development The [OpenSearch Data Prepper Developer Guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/developer_guide.md) provides a complete overview of running Data Prepper in various environments. 
-Creating a new source plugin requires cloning the Data Prepper repository and building it from the source using the following commands: +Creating a new source plugin requires cloning the Data Prepper repository and building it from source using the following commands: - Clone the Data Prepper repository: From eab1f31d4054b4811de867232501314a4ecf492a Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Tue, 5 Nov 2024 10:08:33 -0600 Subject: [PATCH 57/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index b9cb95ec8e..e26965b8df 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -330,7 +330,7 @@ Then, start Data Prepper using the following command: bin/data-prepper ``` -#### Step 9: Review the documents in OpenSearch +#### Step 9: Observe the export documents in OpenSearch Wait for the export to complete, which may take a minute. Once Data Prepper displays a log, for example, `org.opensearch.dataprepper.plugins.source.s3.ScanObjectWorker - Received all acknowledgments for folder partition`, open `http://localhost:5601` to access OpenSearch Dashboards. From 7f2ff6434a9c25eb6482002e4cc5243af5918513 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Tue, 5 Nov 2024 10:08:46 -0600 Subject: [PATCH 58/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index e26965b8df..bd6fe24b39 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -35,7 +35,7 @@ Data Prepper defines work partitions differently for various sources. In the `s3 ### Creating a source-coordination-enabled Data Prepper plugin -A source coordination plugin consists of to two classes: the main plugin class and a configuration class. The configuration class specifies all required user inputs, including data endpoints, authorization details, and performance tuning parameters. All user-required inputs for plugin operations should be specified within this configuration class. +A source coordination plugin consists of two classes: the main plugin class and a configuration class. The configuration class specifies all required user inputs, including data endpoints, authorization details, and performance tuning parameters. All user-required inputs for plugin operations should be specified within this configuration class. For a practical starting point, refer to the [sample source code](https://github.com/graytaylor0/data-prepper/blob/SourceCoordinationSampleSource/data-prepper-plugins/sample-source-coordination-source/src/main/java/SampleSource.java) in the Data Prepper repository. 
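To make the two-class structure concrete, the following is a minimal sketch of what a source plugin and its configuration class might look like. It is loosely adapted from the sample source linked above rather than copied from it, so treat the package names, annotation attributes, and interface methods as approximate; defer to the sample repository for the authoritative version.

```java
// Approximate package locations; verify against the sample source.
import org.opensearch.dataprepper.model.annotations.DataPrepperPlugin;
import org.opensearch.dataprepper.model.annotations.DataPrepperPluginConstructor;
import org.opensearch.dataprepper.model.buffer.Buffer;
import org.opensearch.dataprepper.model.event.Event;
import org.opensearch.dataprepper.model.record.Record;
import org.opensearch.dataprepper.model.source.Source;
import com.fasterxml.jackson.annotation.JsonProperty;

// The plugin class: the annotation's name is what users reference in pipeline.yaml.
@DataPrepperPlugin(name = "sample_source", pluginType = Source.class,
        pluginConfigurationType = SampleSourceConfig.class)
public class SampleSource implements Source<Record<Event>> {

    private final SampleSourceConfig sourceConfig;

    @DataPrepperPluginConstructor
    public SampleSource(final SampleSourceConfig sourceConfig) {
        this.sourceConfig = sourceConfig;
    }

    @Override
    public void start(final Buffer<Record<Event>> buffer) {
        // Connect to the database using sourceConfig, start worker threads,
        // and begin writing records into the Data Prepper buffer.
    }

    @Override
    public void stop() {
        // Shut down worker threads and release database connections.
    }
}

// The configuration class: each field maps to a setting under the source in pipeline.yaml.
class SampleSourceConfig {

    @JsonProperty("database_name")
    private String databaseName;

    @JsonProperty("username")
    private String username;

    @JsonProperty("password")
    private String password;

    public String getDatabaseName() { return databaseName; }
    public String getUsername() { return username; }
    public String getPassword() { return password; }
}
```

In a real plugin, the two classes live in separate files, and the configuration class typically also carries validation annotations; both are omitted here for brevity.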
From ceed3954e99442bde2d882e18ff46dff561031c6 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Tue, 5 Nov 2024 10:09:52 -0600 Subject: [PATCH 59/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index bd6fe24b39..26d49da50b 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -128,7 +128,7 @@ Creating a new source plugin requires cloning the Data Prepper repository and bu git clone https://github.com/opensearch-project/data-prepper.git ``` -- Build Data Prepper from source +- Build Data Prepper from source: ``` ./gradlew assemble From 4f1d91d6f17eb70ba72fdb2da13ccdd4815ea8cf Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Tue, 5 Nov 2024 10:10:20 -0600 Subject: [PATCH 60/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 26d49da50b..499700ef9f 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -138,7 +138,7 @@ git clone https://github.com/opensearch-project/data-prepper.git First, install and configure MongoDB using the [MongoDB installation guide](https://www.mongodb.com/docs/manual/installation/). Before running MongoDB, enable [MongoDB change streams](https://www.mongodb.com/docs/manual/changeStreams/) by following the instructions in [Convert a Standalone Self-Managed mongod to a Replica Set](https://www.mongodb.com/docs/manual/tutorial/convert-standalone-to-replica-set/). -Next, launch the MongoDB shell by running `mongosh`, and then create a new user and password within the shell using the following syntax. The username and password are required later in the Data Prepper `pipeline.yaml` file. See [Create User Documentation](https://www.mongodb.com/docs/manual/reference/method/db.createUser/) for more information about MongoDB user creation. +Next, launch the MongoDB shell by running `mongosh`, and then create a new user and password within the shell using the following syntax. The username and password will later be required by the Data Prepper `pipeline.yaml` file. See [Create User Documentation](https://www.mongodb.com/docs/manual/reference/method/db.createUser/) for more information about MongoDB user creation. 
```
use admin

From 34a09d568eb314f7c518d55c45ac256096277e74 Mon Sep 17 00:00:00 2001
From: Taylor Gray <tylgry@amazon.com>
Date: Tue, 5 Nov 2024 10:47:24 -0600
Subject: [PATCH 61/66] Update
 _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Taylor Gray <tylgry@amazon.com>
---
 ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
index 499700ef9f..9d0be034e5 100644
--- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
+++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
@@ -250,7 +250,7 @@ The `SOURCE_COORDINATION_PIPELINE_IDENTIFIER` must correspond to the `partition_
 
 Configure the source coordination store for Data Prepper using the `data-prepper-config.yaml` file. Currently, this store exclusively supports DynamoDB.
 
-In the `data-prepper/release/archives/linux/build/install/opensearch-data-prepper-$VERSION-linux-x64/config/` directory, create a file named `data-prepper-config.yaml`. Insert the following content, replacing `REGION` with your desired DynamoDB table region and `ROLE_ARN_FROM_STEP_5` with the appropriate role ARN:
+In the `data-prepper/release/archives/linux/build/install/opensearch-data-prepper-$VERSION-linux-x64/config/` directory, create a file named `data-prepper-config.yaml`. Insert the following content, replacing `REGION` with your desired DynamoDB table region and `ROLE_ARN_FROM_STEP_5` with the appropriate role Amazon Resource Name (ARN):
 
 ```yaml
 ssl: false

From e73b08e1d9605bfb4ab8e464b061b4ee31987 Mon Sep 17 00:00:00 2001
From: Taylor Gray <tylgry@amazon.com>
Date: Tue, 5 Nov 2024 10:47:37 -0600
Subject: [PATCH 62/66] Update
 _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Taylor Gray <tylgry@amazon.com>
---
 ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
index 040ac031a9..670c9d3f41 100644
--- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
+++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
@@ -270,7 +270,7 @@ The `partition_prefix` enables soft resets of the pipeline in the source coordin
 
 #### Step 7: Create the `pipeline.yaml` file
 
-In the `data-prepper/release/archives/linux/build/install/opensearch-data-prepper-$VERSION-linux-x64/pipelines/` directory, create a `pipeline.yaml` file with the following content. Make sure to update `S3_BUCKET_NAME`, `S3_BUCKET_REGION, `ROLE_ARN_FROM_STEP_5`, and your OpenSearch password:
+In the `data-prepper/release/archives/linux/build/install/opensearch-data-prepper-$VERSION-linux-x64/pipelines/` directory, create a `pipeline.yaml` file containing the following content. Make sure to update `S3_BUCKET_NAME`, `S3_BUCKET_REGION`, `ROLE_ARN_FROM_STEP_5`, and your OpenSearch password. 
```yaml pipeline: From e861af1bfdc04ec5430dab3ec714d94029230aa0 Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Tue, 5 Nov 2024 10:47:48 -0600 Subject: [PATCH 63/66] Update _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown Co-authored-by: Nathan Bower Signed-off-by: Taylor Gray --- ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 040ac031a9..670c9d3f41 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -332,7 +332,7 @@ bin/data-prepper #### Step 9: Observe the export documents in OpenSearch -Wait for the export to complete, which may take a minute. Once Data Prepper displays a log, for example, `org.opensearch.dataprepper.plugins.source.s3.ScanObjectWorker - Received all acknowledgments for folder partition`, open `http://localhost:5601` to access OpenSearch Dashboards. +Wait for the export to complete, which may take a minute or so. Once Data Prepper displays a log, for example, `org.opensearch.dataprepper.plugins.source.s3.ScanObjectWorker - Received all acknowledgments for folder partition`, open `http://localhost:5601` to access OpenSearch Dashboards. Go to the **Dev Tools** application and enter `GET mongodb-index/_search` in the console editor to retrieve the MongoDB documents you created in step 2. From 267ff08eae6c087f921484c8c57ad37dfb16fa6d Mon Sep 17 00:00:00 2001 From: Taylor Gray Date: Tue, 5 Nov 2024 11:18:53 -0600 Subject: [PATCH 64/66] Address PR comments Signed-off-by: Taylor Gray --- ...ating-a-new-database-integration-with-data-prepper.markdown | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown index 670c9d3f41..a405153e45 100644 --- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown +++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown @@ -120,6 +120,7 @@ Before creating a new plugin, you must set up and run Data Prepper locally. The #### Step 1: Set up Data Prepper for local development The [OpenSearch Data Prepper Developer Guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/developer_guide.md) provides a complete overview of running Data Prepper in various environments. + Creating a new source plugin requires cloning the Data Prepper repository and building it from source using the following commands: @@ -345,7 +346,7 @@ db.demo_collection.insertOne({"key-five": "value-five"}) db.demo_collection.insertOne({"key-six": "value-six"}) ``` -The MongoDB source in Data Prepper will now extract these documents from the MongoDB streams. +The MongoDB source in Data Prepper will now extract these documents from the MongoDB change streams. 
#### Step 11: Observe the stream documents in OpenSearch

From 49350104ee24c4cbfa677ded486a161e5579e0b4 Mon Sep 17 00:00:00 2001
From: Nathan Bower <nbower@amazon.com>
Date: Tue, 5 Nov 2024 12:28:47 -0500
Subject: [PATCH 65/66] Update
 _posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown

Signed-off-by: Nathan Bower <nbower@amazon.com>
---
 ...eating-a-new-database-integration-with-data-prepper.markdown | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
index a405153e45..05cd4496e4 100644
--- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
+++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
@@ -171,7 +171,7 @@ db.demo_collection.insertOne({"key-three": "value-three"})
 To run OpenSearch locally, follow the steps in the [Installation quickstart](https://opensearch.org/docs/latest/getting-started/quickstart/).
 
 #### Step 4: Create an Amazon S3 bucket
-Follow the steps in [Create a new S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html). You can skip this step if you have an existing bucket. This S3 bucket enables parallel processing and writing to OpenSearch across multiple Data Prepper containers in a multi-node setup, given that only one node can read from MongoDB streams at a time.
+Follow the steps in [Create a new S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html). You can skip this step if you have an existing bucket. This S3 bucket enables parallel processing and writing to OpenSearch across multiple Data Prepper containers in a multi-node setup, given that only one node can read from MongoDB change streams at a time.
 
 #### Step 5: Get AWS credentials for DynamoDB and S3 access
 
From f1b91344826ba3a567a19a474a3309a27a4b5919 Mon Sep 17 00:00:00 2001
From: Taylor Gray <tylgry@amazon.com>
Date: Tue, 5 Nov 2024 16:11:36 -0600
Subject: [PATCH 66/66] Update the date, meta_keywords, and meta_description

Signed-off-by: Taylor Gray <tylgry@amazon.com>
---
 ...ting-a-new-database-integration-with-data-prepper.markdown | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
index 05cd4496e4..4f96b35dfa 100644
--- a/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
+++ b/_posts/2024-09-24-creating-a-new-database-integration-with-data-prepper.markdown
@@ -4,9 +4,11 @@ title: "Step-by-step: Creating a new database integration using Data Prepper"
 authors:
 - tylgry
 - dinujoh
-date: 2022-09-21 10:00:00 -0500
+date: 2024-11-05 10:00:00 -0500
 categories:
   - technical-post
+meta_keywords: Data Prepper, database integration, OpenSearch data collector, data pipeline, mongodb
+meta_description: Use this step-by-step guide to set up and test a Data Prepper pipeline. Create a new database integration that streams data from MongoDB to OpenSearch using source coordination.
 twittercard:
   description: "Data Prepper offers a flexible framework for database migration, supporting sources like MongoDB and Amazon DynamoDB. You can extend this capability to new databases by implementing a Data Prepper source plugin."
 ---