Artifact Publishing, Default Property Value Change, Unit Tests & Documentation Updates #64

Merged
merged 26 commits into from
Sep 23, 2024
Changes from 24 commits
Commits (26)
7dde93f
test artifact publishing
Michael7371 Jun 10, 2024
35b7f18
fix artifact endpoint
Michael7371 Jun 10, 2024
5460c51
update trigger to only run on release
Michael7371 Jun 10, 2024
3462850
testing -DremoveSnapshot task for artifact publishing github action
Michael7371 Jun 12, 2024
ebcc32d
updating artifact-publish github action to be triggered on release cr…
Michael7371 Jun 12, 2024
465545c
testing snapshot publishing
Michael7371 Jun 12, 2024
18ab77c
Removing GitHub token secret reference in workflow
Michael7371 Jun 12, 2024
50c31e3
Removing dev trigger as tags are immutable in GitHub's Maven Registry.
Michael7371 Jun 12, 2024
cfcef3c
updating workflow to only trigger on release tags
Michael7371 Jun 13, 2024
f0da6da
Changed default value for enable.auto.commit property to 'true'
dmccoystephenson Jun 19, 2024
da54ae1
Add unit tests
mwodahl Jul 2, 2024
d2bf634
comment update
mwodahl Jul 3, 2024
edc418e
Merge pull request #25 from CDOT-CV/config/default-auto-commit-to-true
payneBrandon Jul 3, 2024
089318f
Merge pull request #23 from CDOT-CV/cicd-artifact-publishing
payneBrandon Jul 3, 2024
2e06875
Merge branch 'dev' into Feature/Add-Unit-Tests
mwodahl Jul 9, 2024
13e8d74
Update mockito verifications to work with Java Test Runner
mwodahl Jul 16, 2024
1c24623
Add argLine, jacoco.version to pom.xml
mwodahl Jul 16, 2024
44979a5
revert java.version to maven.compiler.source and maven.compiler.target
mwodahl Jul 16, 2024
05cc92d
Merge pull request #26 from CDOT-CV/Feature/Add-Unit-Tests
drewjj Jul 16, 2024
df4901a
Revised README
dmccoystephenson Jul 30, 2024
7c9d4a5
Merge pull request #27 from CDOT-CV/docs/review-docs-for-accuracy
payneBrandon Aug 14, 2024
afbd97a
Changed version to 1.6.0-SNAPSHOT
dmccoystephenson Sep 6, 2024
3353541
Added release notes for version 1.6.0
dmccoystephenson Sep 6, 2024
737e2fa
Merge pull request #28 from CDOT-CV/q3-release/version-change-and-rel…
drewjj Sep 6, 2024
7abc88f
Removed unnecessary declaration from `getEnvironmentVariable()` metho…
dmccoystephenson Sep 19, 2024
ccc8ed4
Merge pull request #29 from CDOT-CV/pr/address-usdot-comments-9-19-2024
dmccoystephenson Sep 19, 2024
31 changes: 31 additions & 0 deletions .github/workflows/artifact-publish.yml
@@ -0,0 +1,31 @@
name: Publish Java Package
Contributor:

What will be the approximate artifact size, and where will this be used? Since our ODE GitHub org is on the free plan, we have an artifact storage limit of 500 MB.

Contributor (Author):

The project's JAR file is about 14 KB in size. It is not currently being used, but it is published for potential future use.

Contributor:

Same comment as on jpo-security-svcs: it looks like the storage limit does not apply to Maven packages and other package types (npm, Docker, etc.) for public repos, so I think we should be good to use artifacts. The limit does apply to private repos, however. We have a backlog item for supporting GitHub artifact setup in a future release. Thanks!

on:
  push:
    tags:
      - 'jpo-s3-deposit-*'

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up JDK 21
        uses: actions/setup-java@v4
        with:
          java-version: '21'
          distribution: 'adopt'

      - name: Remove snapshot from version
        run: mvn versions:set -DremoveSnapshot

      - name: Build with Maven
        run: mvn -B package --file pom.xml

      - name: Publish to GitHub Packages
        run: mvn --batch-mode -Dgithub_organization=${{ github.repository_owner }} deploy
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
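
Once this workflow publishes an artifact, a downstream project could consume it from GitHub Packages. Below is a minimal, hedged sketch of what a consumer's `pom.xml` might add; the repository URL and coordinates follow the `distributionManagement`, `groupId`, and `artifactId` shown in the pom.xml diff later in this PR, while the released version is an assumption based on the 1.6.0 release notes. Note that GitHub Packages generally requires authenticated Maven access even for public packages; a matching `settings.xml` sketch appears after the pom.xml diff.

```xml
<!-- Hedged sketch: consume the published artifact from GitHub Packages. -->
<repositories>
  <repository>
    <id>github</id>
    <url>https://maven.pkg.github.com/usdot-jpo-ode/jpo-s3-deposit</url>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>usdot.jpo.ode</groupId>
    <artifactId>jpo-aws-depositor</artifactId>
    <!-- Assumed version; confirm against the actual release tag. -->
    <version>1.6.0</version>
  </dependency>
</dependencies>
```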
15 changes: 15 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,15 @@
{
    "java.test.config": {
        "env": {
            "TEST_VARIABLE": "testValue",
            "TEST_VARIABLE_EMPTY": "",
            "AWS_ACCESS_KEY_ID": "testAccessKey",
            "AWS_SECRET_ACCESS_KEY": "testSecretKey",
            "AWS_EXPIRATION": "2020-01-01 00:00:00",
            "AWS_SESSION_TOKEN": "testSessionToken",
            "API_ENDPOINT": "testApiEndpoint",
            "CONFLUENT_KEY": "testConfluentKey",
            "CONFLUENT_SECRET": "testConfluentSecret",
        }
    },
}
151 changes: 52 additions & 99 deletions README.md
@@ -1,9 +1,29 @@
# Message Deposit Service

# jpo-s3-deposit (Message Deposit Service)
This project is intended to serve as a consumer application that subscribes to a Kafka topic of streaming JSON, packages the results as a JSON file, and deposits the resulting file into a predetermined Firehose/Kinesis stream, S3 bucket, or Google Cloud Storage (GCS) bucket. It runs alongside the ODE and, when deployed using Docker Compose, runs in a Docker container.

## Quick Run
The use of AWS credentials is being read from the machine's environmental variables. You may also set them in your bash profile. Note that when using Docker Compose from the main `jpo-ode` repository, these variables are set in the `.env` present in that repo.
## Table of Contents
- [Release Notes](#release-notes)
- [Usage](#usage)
- [Installation](#installation)
- [Configuration](#configuration)
- [Debugging](#debugging)
- [Testing](#testing)

## Release Notes
The current version and release history of the Jpo-s3-deposit: [Jpo-s3-deposit Release Notes](<docs/Release_notes.md>)

## Usage
### Run with Docker
1. Create a copy of `sample.env` and rename it to `.env`.
2. Update the variable `DOCKER_HOST_IP` to the local IP address of the system running docker and set an admin user password with the `MONGO_DB_PASS` variable.
1. If connecting to a separately deployed mongo cluster make sure to specify the `MONGO_IP` and `MONGO_PORT`.
3. Navigate back to the root directory and run the following command: `docker compose -f docker-compose-mongo.yml up -d`
4. Use either a local Kafka install or [kcat](https://github.com/edenhill/kcat) to produce a sample message to one of the sink topics (a hedged kcat sketch follows this list). Optionally, you can separately run the [ODE](https://github.com/usdot-jpo-ode/jpo-ode) and process messages directly from its output.
5. Using [MongoDB Compass](https://www.mongodb.com/products/compass) or another DB visualizer, connect to the MongoDB database using this connection string: `mongodb://[admin_user]:[admin_password]@localhost:27017/`
6. Now we are done! If everything is working properly, you should see an ODE database with a collection for each Kafka sink topic that contains messages.
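
A minimal sketch of step 4 using kcat; the broker address and topic name are assumptions, and the message body is only a stand-in for real ODE output:

```bash
# Hedged sketch: produce one sample JSON message to a sink topic with kcat.
# Broker address, topic name, and payload are placeholders; adjust to your setup.
echo '{"metadata":{"sample":true},"payload":{}}' | \
  kcat -b localhost:9092 -t topic.OdeBsmJson -P
```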

### Run manually
AWS credentials are read from the machine's environment variables. You may also set them in your bash profile. Note that when using Docker Compose from the main `jpo-ode` repository, these variables are set in the `.env` present in that repo.

If depositing to GCS, credentials are read from a JSON service account key file. A sample service account file can be found at `./resources/google/sample_gcp_service_account.json`.
Please note that if depositing to GCS, the service account will need the `storage.buckets.get` and `storage.objects.create` permissions.
@@ -26,119 +46,53 @@ mvn clean compile assembly:single install

To run the jar, be sure to include the topic and group ID at the end of the command. If this is not a distributed system, the group can be any string.

```
java -jar target/jpo-aws-depositor-0.0.1-SNAPSHOT-jar-with-dependencies.jar

usage: Consumer Example
-s,--bootstrap-server <arg> Endpoint ('ip:port')
-d,--destination <arg> Destination (Optional, defaults to Kinesis/Firehose, put "s3" to override)
-g,--group <arg> Consumer Group
-k,--key_name <arg> Key Name
-b,--bucket-name <arg> Bucket Name
-t,--topic <arg> Topic Name
-type,--type <arg> string|byte message type
-i, --k-aws-key <arg> AWS key name (Optional, defaults to AccessKeyId)
-a, --k-aws-secret-key <arg> AWS secret access key name (Optional, defaults to SecretAccessKey)
-n, --k-aws-session-token <arg> AWS session token name (Optional, defaults to SessionToken)
-e, --k-aws-expiration <arg> AWS expiration name (Optional, defaults Expiration)
-u, --token-endpoint <arg> API token endpoint
-h, --header-accept <arg> Header Accept (Optional, defaults to application/json)
-x, --header-x-api-key <arg> Header X API key
```
Example usage as of 3/2/18:

```
java -jar target/jpo-aws-depositor-0.0.1-SNAPSHOT-jar-with-dependencies.jar --bootstrap-server 192.168.1.1:9092 -g group1 -t topic.OdeTimJson -b test-bucket-name -k "bsm/ingest/bsm-"
```

It should return the following confirmation:

```
DEBUG - Bucket name: test-usdot-its-cvpilot-wydot-bsm
DEBUG - Key name: bsm/ingest/wydot-bsm-
DEBUG - Kafka topic: topic.OdeBsmJson
DEBUG - Type: string
DEBUG - Destination: null

Subscribed to topic OdeTimJson
```
After triggering an upload into the ODE, the output should appear in the console decoded as JSON.

![CLI-output](images/cli-output.png)

## Additional Resources

With the Kafka installed locally on a machine, here are a few additional commands that may be helpful while debugging Kafka topics.

[Kafka Install Instructions](https://www.cloudera.com/documentation/kafka/latest/topics/kafka_installing.html#concept_ngx_4l4_4r)
## Installation
### Run Script
The run.sh script can be utilized to run the project. This script will export the necessary environment variables, compile the project, and run the jar file.

The IP used is the location of the Kafka endpoints.

#### Create, alter, list, and describe topics.
Note that this script must be run from the project root folder, or it will not work.

```
kafka-topics --zookeeper 192.168.1.151:2181 --list
sink1
t1
t2
```
### Launch Configurations
A launch.json file with some launch configurations has been included to allow developers to debug the project in VSCode.

#### Read data from a Kafka topic and write it to standard output.
The values between braces < > are stand-ins and need to be replaced by the developer.

```
kafka-console-consumer --zookeeper 192.168.1.151:2181 --topic topic.J2735Bsm
```
To run the project through the launch configuration and start debugging, the developer can navigate to the Run panel (View->Run or Ctrl+Shift+D), select the configuration at the top, and click the green arrow or press F5 to begin.
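
As a hedged illustration only (not the repository's actual launch.json), one such configuration might look like the following, using the same `< >` placeholder convention described above:

```jsonc
// Hedged sketch of a VS Code Java launch configuration; <main_class> and
// <bootstrap_server> are placeholders, not this project's actual values.
{
  "version": "0.2.0",
  "configurations": [
    {
      "type": "java",
      "name": "Debug depositor",
      "request": "launch",
      "mainClass": "<main_class>",
      "env": {
        "DOCKER_HOST_IP": "<bootstrap_server>"
      }
    }
  ]
}
```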

#### Push data from standard output and write it into a Kafka topic.
### Docker Compose Files
The docker-compose.yml file can be used to spin up the depositor as a container, along with instances of kafka and zookeeper.

```
kafka-console-producer --broker-list 192.168.1.151:9092 --topic topic.J2735Bsm
```
The docker-compose-confluent-cloud.yml file can be used to spin up the depositor as a container by itself. This depends on an instance of kafka hosted by Confluent Cloud.

# Confluent Cloud Integration
## Configuration
### Confluent Cloud Integration
Rather than using a local kafka instance, this project can utilize an instance of kafka hosted by Confluent Cloud via SASL.

## Environment variables
### Purpose & Usage
#### Environment variables
##### Purpose & Usage
- The DOCKER_HOST_IP environment variable is used to communicate with the bootstrap server that the instance of Kafka is running on.
- The KAFKA_TYPE environment variable specifies what type of kafka connection will be attempted and is used to check if Confluent should be utilized.
- The CONFLUENT_KEY and CONFLUENT_SECRET environment variables are used to authenticate with the bootstrap server.

### Values
##### Values
- DOCKER_HOST_IP must be set to the bootstrap server address (excluding the port)
- KAFKA_TYPE must be set to "CONFLUENT"
- CONFLUENT_KEY must be set to the API key being utilized for CC
- CONFLUENT_SECRET must be set to the API secret being utilized for CC
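
Putting those values together, a hedged sketch of the relevant `.env` entries follows; the endpoint, key, and secret are placeholders rather than working credentials:

```bash
# Hedged sketch of Confluent Cloud settings; all values are placeholders.
DOCKER_HOST_IP=pkc-xxxxx.us-west-2.aws.confluent.cloud
KAFKA_TYPE=CONFLUENT
CONFLUENT_KEY=<API_KEY>
CONFLUENT_SECRET=<API_SECRET>
```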

## CC Docker Compose File
#### CC Docker Compose File
There is a provided docker-compose file (docker-compose-confluent-cloud.yml) that passes the above environment variables into the container that gets created. Further, this file doesn't spin up a local kafka instance since it is not required.
## Release Notes
The current version and release history of the Jpo-s3-deposit: [Jpo-s3-deposit Release Notes](<docs/Release_notes.md>)

## Note
##### Note
This has only been tested with Confluent Cloud, but technically any SASL-authenticated Kafka broker can be reached using this method.

# Run Script
The run.sh script can be utilized to run the PPM manually.

It should be noted that this script must be run from the project root folder, or it will not work.

# Docker Compose Files
The docker-compose.yml file can be used to spin up the PPM as a container, along with instances of kafka and zookeeper.

The docker-compose-confluent-cloud.yml file can be used to spin up the PPM as a container by itself. This depends on an instance of kafka hosted by Confluent Cloud.

# Launch Configurations
A launch.json file with some launch configurations have been included to allow developers to debug the project in VSCode.

The values between braces < > are stand-in and need to be replaced by the developer.

To run the project through the launch configuration and start debugging, the developer can navigate to the Run panel (View->Run or Ctrl+Shift+D), select the configuration at the top, and click the green arrow or press F5 to begin.

# MongoDB Deposit Service
### MongoDB Deposit Service
The mongo-connector service connects to specified Kafka topics (as defined in the mongo-connector/connect_start.sh script) and deposits these messages to separate collections in the MongoDB Database. The codebase that provides this functionality comes from Confluent using their community licensed [cp-kafka-connect image](https://hub.docker.com/r/confluentinc/cp-kafka-connect). Documentation for this image can be found [here](https://docs.confluent.io/platform/current/connect/index.html#what-is-kafka-connect).

## Configuration
Provided in the mongo-connector directory is a sample configuration shell script ([connect_start.sh](./mongo-connector/connect_start.sh)) that can be used to create Kafka connectors to MongoDB. The connectors in Kafka Connect are defined in the following format:
``` shell
declare -A config_name=([name]="topic_name" [collection]="mongo_collection_name"
@@ -152,18 +106,17 @@ createSink config_name
```
This needs to be put after the createSink function definition.
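
For example, a hedged sketch of defining and creating one sink follows; the topic and collection names are hypothetical, and only the two keys visible in this diff are shown (the actual connect_start.sh defines additional keys in the collapsed region):

```bash
# Hedged sketch: define and create one MongoDB sink connector.
# Topic and collection names are hypothetical; the real script sets more keys.
declare -A odeBsmSink=([name]="topic.OdeBsmJson" [collection]="OdeBsmJson")
createSink odeBsmSink
```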

## Quick Run
1. Create a copy of `sample.env` and rename it to `.env`.
2. Update the variable `DOCKER_HOST_IP` to the local IP address of the system running docker and set an admin user password with the `MONGO_DB_PASS` variable.
1. If connecting to a separately deployed mongo cluster make sure to specify the `MONGO_IP` and `MONGO_PORT`.
3. Navigate back to the root directory and run the following command: `docker compose -f docker-compose-mongo.yml up -d`
4. Using either a local kafka install or [kcat](https://github.com/edenhill/kcat) to produce a sample message to one of the sink topics. Optionally, you can separately run the [ODE](https://github.com/usdot-jpo-ode/jpo-ode) and process messages directly from it's output.
5. Using [MongoDB Compass](https://www.mongodb.com/products/compass) or another DB visualizer connect to the MongoDB database using this connection string: `mongodb://[admin_user]:[admin_password]@localhost:27017/`
6. Now we are done! If everything is working properly you should see an ODE database with a collection for each kafka sink topic that contains messages.

## Debugging
If the Kafka connect image crashes with the following error:
``` bash
bash: /scripts/connect_wait.sh: /bin/bash^M: bad interpreter: No such file or directory
```
Please verify that the line endings in the ([connect_start.sh](./mongo-connector/connect_start.sh)) and ([connect_wait.sh](./mongo-connector/connect_wait.sh)) are set to LF instead of CRLF.
Please verify that the line endings in the ([connect_start.sh](./mongo-connector/connect_start.sh)) and ([connect_wait.sh](./mongo-connector/connect_wait.sh)) are set to LF instead of CRLF.
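
One hedged way to convert the line endings from a Unix-like shell (assuming GNU `sed`; `dos2unix` would also work if installed):

```bash
# Hedged sketch: convert CRLF to LF in the connector scripts (run from the repo root).
sed -i 's/\r$//' mongo-connector/connect_start.sh mongo-connector/connect_wait.sh
```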

## Testing
### Unit Tests
To run the unit tests, reopen the project in the provided dev container and run the following command:
``` bash
mvn test
```
This will run the unit tests and provide a report of the results.
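
To run a single test class rather than the whole suite, Surefire's standard `-Dtest` flag should work; the class name below is a placeholder, not an actual test in this repository:

```bash
# Hedged sketch: run one test class via Maven Surefire (class name is a placeholder).
mvn -Dtest=SomeDepositorTest test
```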
12 changes: 12 additions & 0 deletions docs/Release_notes.md
@@ -1,6 +1,18 @@
Jpo-s3-deposit Release Notes
----------------------------

Version 1.6.0, released September 2024
----------------------------------------
### **Summary**
The changes for the jpo-s3-deposit 1.6.0 release include a GitHub Action to publish a Java artifact to the GitHub repository whenever a release is created, a change of the default value for the enable.auto.commit property to 'true', new unit tests, and documentation revised for accuracy & clarity.

Enhancements in this release:
- CDOT PR 23: Added GitHub Action to publish a Java artifact to the GitHub repository whenever a release is created
- CDOT PR 25: Changed default value for enable.auto.commit property to 'true'
- CDOT PR 26: Added unit tests
- CDOT PR 27: Revised documentation for accuracy & clarity


Version 1.5.0, released June 2024
----------------------------------------
### **Summary**
68 changes: 66 additions & 2 deletions pom.xml
@@ -4,20 +4,47 @@

  <groupId>usdot.jpo.ode</groupId>
  <artifactId>jpo-aws-depositor</artifactId>
  <version>1.5.0-SNAPSHOT</version>
  <version>1.6.0-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>JPO AWS Depositor</name>
  <properties>
    <maven.compiler.source>21</maven.compiler.source>
    <maven.compiler.target>21</maven.compiler.target>
    <jmockit.version>1.49</jmockit.version>
    <!-- Allow override of github organization when publishing artifacts to github -->
    <github_organization>usdot-jpo-ode</github_organization>
    <argLine>
      -javaagent:${user.home}/.m2/repository/org/jmockit/jmockit/${jmockit.version}/jmockit-${jmockit.version}.jar
    </argLine>
    <jacoco.version>0.8.11</jacoco.version>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.jmockit</groupId>
      <artifactId>jmockit</artifactId>
      <version>${jmockit.version}</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.junit.jupiter</groupId>
      <artifactId>junit-jupiter-api</artifactId>
      <version>5.9.3</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.13.1</version>
      <version>4.13.2</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.mockito</groupId>
      <artifactId>mockito-core</artifactId>
      <version>3.3.3</version>
      <scope>test</scope>
    </dependency>


    <!-- https://mvnrepository.com/artifact/org.apache.kafka/kafka-clients -->
    <dependency>
      <groupId>org.apache.kafka</groupId>
@@ -99,6 +126,43 @@
          </descriptorRefs>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>3.2.5</version>
        <configuration>
          <argLine>-javaagent:${user.home}/.m2/repository/org/jmockit/jmockit/${jmockit.version}/jmockit-${jmockit.version}.jar -Xshare:off</argLine>
          <systemPropertyVariables>
            <loader.path>${loader.path}</loader.path>
            <buildDirectory>${project.build.directory}</buildDirectory>
          </systemPropertyVariables>
          <environmentVariables>
            <TEST_VARIABLE>testValue</TEST_VARIABLE>
            <TEST_VARIABLE_EMPTY></TEST_VARIABLE_EMPTY>
            <AWS_ACCESS_KEY_ID>testAccessKey</AWS_ACCESS_KEY_ID>
            <AWS_SECRET_ACCESS_KEY>testSecretKey</AWS_SECRET_ACCESS_KEY>
            <AWS_SESSION_TOKEN>testSessionToken</AWS_SESSION_TOKEN>
            <AWS_EXPIRATION>2020-01-01 00:00:00</AWS_EXPIRATION>
            <API_ENDPOINT>testApiEndpoint</API_ENDPOINT>
            <CONFLUENT_KEY>testConfluentKey</CONFLUENT_KEY>
            <CONFLUENT_SECRET>testConfluentSecret</CONFLUENT_SECRET>
          </environmentVariables>
        </configuration>
        <dependencies>
          <dependency>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-surefire-plugin</artifactId>
            <version>3.2.5</version>
          </dependency>
        </dependencies>
      </plugin>
    </plugins>
  </build>
  <distributionManagement>
    <repository>
      <id>github</id>
      <name>GitHub Packages</name>
      <url>https://maven.pkg.github.com/${github_organization}/jpo-s3-deposit</url>
    </repository>
  </distributionManagement>
</project>
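
Deploying to, or even reading from, GitHub Packages requires Maven credentials whose server `id` matches the `github` repository id above. A hedged sketch of the corresponding `~/.m2/settings.xml` fragment follows; USERNAME and TOKEN are placeholders (in the GitHub Action earlier in this PR, setup-java and the `GITHUB_TOKEN` secret are assumed to supply these):

```xml
<!-- Hedged sketch of a ~/.m2/settings.xml fragment for GitHub Packages.
     USERNAME and TOKEN are placeholders; a token with the packages scope is assumed. -->
<settings>
  <servers>
    <server>
      <id>github</id>
      <username>USERNAME</username>
      <password>TOKEN</password>
    </server>
  </servers>
</settings>
```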
2 changes: 1 addition & 1 deletion sample.env
@@ -15,7 +15,7 @@ HEADER_X_API_KEY=
KAFKA_TYPE=
CONFLUENT_KEY=
CONFLUENT_SECRET=
# Defaults to false
# Defaults to true
KAFKA_ENABLE_AUTO_COMMIT=
# Defaults to 1000
KAFKA_AUTO_COMMIT_INTERVAL_MS=
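
For context, these two variables correspond to standard Kafka consumer settings. A hedged sketch of how a consumer might map them follows; the property keys are standard Kafka client names, but the surrounding code is illustrative rather than this project's actual implementation:

```java
import java.util.Properties;

// Hedged sketch: map the env vars onto standard Kafka consumer properties.
// Defaults mirror the comments in sample.env (auto commit: true, interval: 1000 ms).
public class ConsumerConfigSketch {
    public static Properties fromEnv() {
        Properties props = new Properties();
        props.put("enable.auto.commit",
                System.getenv().getOrDefault("KAFKA_ENABLE_AUTO_COMMIT", "true"));
        props.put("auto.commit.interval.ms",
                System.getenv().getOrDefault("KAFKA_AUTO_COMMIT_INTERVAL_MS", "1000"));
        return props;
    }
}
```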