Welcome to the Delta Producer/Consumer Tutorial! In this comprehensive guide, we will walk you through the setup and configuration of the powerful Delta Producer and Consumer stacks. These essential components facilitate efficient data synchronization, allowing smooth data consumption between applications.
Here's a brief outline of the tutorial:
- Step 0: Setup of the producer and consumer stack. Introduces two application stacks: one used on the producer side and one used on the consumer side.
- Step 1: Setting up the Delta Producer Stack. Introduces the Delta Producer stack and guides you through the setup and configuration of its components, including the mu-authorization layer, delta-notifier, job-controller, and background jobs.
- Step 2: Activating the Delta Producer Stack. Focuses on activating the configured Delta Producer stack, ensuring that all components are ready to generate and publish delta messages.
- Step 3: Configuring the Delta Consumer Stack. Introduces the Delta Consumer stack, explains its purpose, and guides you through the configuration of the mu-authorization layer and the delta-consumer service.
- Step 4: Linking the Consumer and Producer Stacks. Demonstrates how to link the Delta Consumer with the Delta Producer stack, covering the configuration needed to enable data consumption and synchronization between the two.
- Step 5: Implementing Producer-Side Authentication. Focuses on enhancing security by implementing producer-side authentication. You will learn how to generate and add keys to ensure secure data publication.
- Step 6: Showcasing File Sharing Abilities. Showcases the file sharing abilities of the Delta Producer stack, allowing efficient sharing and utilization of data files.
As part of this tutorial, we use a books frontend, the code for which can be found at https://github.com/Jan-PieterBaert/delta-tutorial-books-frontend.
By the end of this tutorial, you will have a comprehensive understanding of the Delta Producer and Consumer stacks, empowering you to build dynamic and synchronized applications that efficiently handle data updates. So, let's dive in and embark on the journey of mastering delta synchronization!
The producer and consumer stacks are simple book applications that showcase how the sync flow works between the producer and the consumer.
To begin, we'll pull the Producer stack, which includes the application for editing the list of books. Follow these steps:
- Open your terminal or command prompt.
- Navigate to the desired directory where you want to store the Producer stack.
- Execute the following command to pull the Producer stack:
git clone https://github.com/Jan-PieterBaert/delta-tutorial-producer
- Wait for the pull process to complete. You now have the latest version of the application used to manage the list of books.
Next, let's pull the Consumer stack, which comprises the application for viewing the books. Follow these steps:
- Keep the terminal or command prompt open.
- Navigate to the directory where you'd like to store the Consumer stack.
- Use the following command to pull the Consumer stack:
git clone https://github.com/Jan-PieterBaert/delta-tutorial-consumer
- Allow the process to finish. You now have the most up-to-date version of the application used to view the list of books.
In this step, we will walk you through the process of adding various containers to set up the publication flow.
Each container is added in the docker-compose.yml file and plays a crucial role in managing different aspects of the publication process. Let's get started!
The mu-authorization layer ensures secure access control to the database. Follow these steps to add it:
- Add the mu-authorization container with appropriate configurations to control access to the database.
The resource and cache containers optimize the handling of job/task objects. Follow these steps to add them:
- Configure and add the resource container to efficiently manage job/task objects.
- Add the cache container to store frequently accessed job/task data for faster retrieval.
The delta-notifier informs other services about changes in the database. Follow these steps to add it:
- Add the delta-notifier container with the necessary configuration to enable it to send notifications when changes occur.
The file service container is responsible for managing files. Follow these steps to add it:
- Add the file service container with appropriate configurations to handle file-related operations.
The job-controller and scheduled-job-controller containers handle the scheduling of tasks and jobs. Follow these steps to add them:
- Configure and add the job-controller container to schedule tasks/jobs based on events or user actions.
- Add the scheduled-job-controller container to manage recurring jobs or tasks on a predetermined schedule.
The delta-producer-dump-file-publisher container publishes dump-files to the triplestore. Follow these steps to add it:
- Add the delta-producer-dump-file-publisher container with the necessary configurations to publish the dump-files to the triplestore.
The delta-producer-publication-graph-maintainer ensures the publication graph remains consistent with the database content. Follow these steps to add it:
- Configure and add the delta-producer-publication-graph-maintainer container, which optionally publishes differences as delta messages based on the specified configuration.
The delta-producer-background-jobs-initiator creates background jobs, including the initial sync, periodic healing, and dumping jobs. Follow these steps to add it:
- Add the delta-producer-background-jobs-initiator container, setting up the necessary configurations for creating and managing background jobs.
In this step, we will walk you through the essential configuration steps for various publisher components. Each component plays a crucial role in managing the publication flow effectively. Let's get started!
mu-authorization requires two crucial configurations:
- config.ex: Define permissions for editing specific kinds of triples based on user roles or groups.
- delta.ex: Configure delta notifications to ensure mu-authorization provides the delta-notifier with the necessary delta messages.

With these configurations in place, the jobs and tasks can be created and the delta-notifier will work as usual.
The resource configuration files, located at config/resources/, define the data model for tasks and jobs.
This allows the job services to create and consume the jobs.
In this step, we will focus on configuring the delta-notifier, whose configuration lists the services that expect delta messages. There are a few important aspects to consider during its configuration:
- The delta-producer-publication-graph-maintainer service is responsible for receiving delta messages from specific graphs. To set up this configuration, follow these steps:
  - Specify the necessary graphs from which the maintainer should receive delta messages.
    - Ensure you have one configuration entry for each graph that needs to be monitored.
  - Pay close attention when modifying the graph filters or including multiple graphs.
    - Removing the graph filter or including an excessive number of graphs may overwhelm the delta-producer-publication-graph-maintainer service.
Keep in mind that other entries may need to be added based on your application's specific requirements. Customization allows you to tailor the configuration to suit your project's functionality effectively.
In the job-controller configuration, specify tasks for a job in a specific order. Note that job URIs may change, so customize them accordingly and keep them synchronized across all configurations for consistency. Proper configuration ensures effective task sequencing and maintains a coherent publication flow.
Configure the delta-producer-dump-file-publisher with the following details:
- fileBaseName: Specify the location to find the dumps for the jobs.
- Keep the job URIs in sync across all configurations to ensure consistency.
Configure the delta-producer-publication-graph-maintainer with the following details:
- Make sure the job URIs are synced with other config files to maintain consistency.
- Verify that the graphs for errors, files and jobs match other configured services.
- Note that delta files are generated at the relativeFilePath when serveDeltaFiles is enabled.
- When the healing process takes too much time because the dataset is too large, use the useFileDiff option for memory-efficient healing on big datasets.
- Enable logOutgoingDelta for debugging created deltas.
- Set up the export config, in the file specified by exportConfigPath, to synchronize patterns to the publication graph.
  - For each type, define a list of graphs and properties that should be synchronized.
  - Ensure the graph is the same as the graph your application creates triples in.
Configure the delta-producer-background-jobs-initiator with the following details:
- Ensure jobsGraph is the same across all configurations.
- Make sure the job URIs are synced with other config files to maintain consistency.
- Enable startInitialSync if the initial sync has never been executed. This ensures it gets executed, as healing jobs can only start after the initial sync has run.
- Use cronjob patterns to determine when new jobs should be created for dump and healing tasks.
  - For more information about cronjobs, refer to https://crontab.guru/ (note: the cronjob pattern here includes a seconds field, second, before all others).
With these configurations, you can effectively initiate background jobs and control their scheduling for smooth operation of your publication flow.
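Because the pattern carries a seconds field in front of the usual cron fields, it can help to label the fields before copying a pattern from a five-field tool such as crontab.guru. The sketch below is only an illustration: describe_cron is a hypothetical helper, not part of the delta-producer services, and it assumes that the fields after the seconds field follow the standard cron order (minute, hour, day of month, month, day of week).

```shell
# Label the six fields of a cron pattern that starts with a seconds field.
# Usage: describe_cron 'PATTERN'   (hypothetical helper, for illustration only)
describe_cron() {
  # Disable globbing so that "*" fields are not expanded to file names.
  set -f
  set -- $1
  set +f
  if [ "$#" -ne 6 ]; then
    echo "expected 6 fields, got $#" >&2
    return 1
  fi
  printf 'second=%s minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
    "$1" "$2" "$3" "$4" "$5" "$6"
}
```

Under that assumption, describe_cron '0 */5 * * * *' labels 0 as the seconds field and */5 as the minute field, while a five-field pattern is rejected.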
To ensure proper synchronization and consistent data management, you need to check that specific job URIs and graphs are in sync across the relevant configuration files. Here's a breakdown of the elements you need to verify:
- dcatDataSetSubject
  - In config/delta-producer/dump-file-publisher/config.json, verify this URI for each key.
- deltaDumpFileCreation
  - In config/delta-producer/dump-file-publisher/config.json, ensure it is used as the key for the dump configuration.
  - In config/delta-producer/background-job-initiator/config.json, confirm it is used as dumpFileCreationJobOperation.
  - In config/job-controller/config.json, validate it as a key for its task configuration.
- initialPublicationGraphSyncing
  - In config/delta-producer/dump-file-publisher/config.json, ensure it is used as the key for the dump configuration.
  - In config/delta-producer/publication-graph-maintainer/config.json, verify it is used as initialPublicationGraphSyncJobOperation.
  - In config/job-controller/config.json, validate it as a key for its task configuration.
- healingJobOperation
  - In config/job-controller/config.json, confirm it is used as a key for its task configuration.
  - In config/delta-producer/publication-graph-maintainer/config.json, verify it as healingJobOperation.
  - In config/delta-producer/background-job-initiator/config.json, ensure it is used as healingJobOperation.
- jobsGraph
  - In config/delta-producer/background-job-initiator/config.json, verify its presence.
  - In config/delta-producer/publication-graph-maintainer/config.json, ensure it is defined correctly.
  - In config/authorization/config.ex, check the part for jobs to find the %GraphSpec where it should be the graph.
  - In config/delta/rules.js, confirm it is used as a graph filter for messages sent to the delta-producer-publication-graph-maintainer service.
- publicationGraph
  - In config/delta-producer/publication-graph-maintainer/config.json, verify its definition.
  - In config/delta-producer/dump-file-publisher/config.json, confirm its use as targetGraph.
- filesGraph
  - In the file service section of docker-compose.yml, ensure its presence.
  - In config/delta-producer/publication-graph-maintainer/config.json, verify its usage.
By thoroughly verifying these job URIs and graphs across different configuration files, you can maintain synchronization and facilitate a seamless publication flow.
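A quick way to spot a missing value is a plain text search across the files listed above. The sketch below assumes nothing about the structure of the configuration files: check_in_sync is a hypothetical helper that only reports files in which a given string does not occur at all, so it cannot tell whether the value sits under the right key. The graph URI in the usage comment is a placeholder, not a value taken from the example application.

```shell
# Report which of the given files do not mention VALUE.
# Usage: check_in_sync VALUE FILE...   (hypothetical helper)
check_in_sync() {
  value="$1"
  shift
  missing=0
  for file in "$@"; do
    # -F treats the value as a fixed string, so URI characters are not
    # interpreted as a regular expression; -q suppresses the matched lines.
    if ! grep -qF -- "$value" "$file" 2>/dev/null; then
      echo "not found in: $file"
      missing=1
    fi
  done
  return "$missing"
}

# Example for jobsGraph, using a placeholder URI:
# check_in_sync "http://example.org/graphs/jobs" \
#   config/delta-producer/background-job-initiator/config.json \
#   config/delta-producer/publication-graph-maintainer/config.json \
#   config/authorization/config.ex \
#   config/delta/rules.js
```

The function returns a non-zero status when at least one file lacks the value, which makes it usable in a pre-deployment check.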
After configuring the delta producer and related components, it's essential to verify the output:
- Ensure the initial dump exists at the configured location (example application: data/files/delta-producer-dumps/initial-dump-books/).
- Make a change in the application and wait for the healing process to execute.
- Check that the delta exists at the configured location (example application: data/files/deltas/books/$(date +%Y-%m-%d)/).
By following these steps, you can ensure the delta producer is functioning correctly, and deltas are generated and stored as expected.
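The two file checks can be scripted. The sketch below uses a hypothetical helper, dir_has_files, and assumes it is run from the root of the producer stack with the example application's paths. It only confirms that some file exists in the directory, not that its content is correct.

```shell
# Succeed when DIR exists and contains at least one regular file.
# Usage: dir_has_files DIR   (hypothetical helper)
dir_has_files() {
  [ -d "$1" ] && [ -n "$(find "$1" -type f | head -n 1)" ]
}

# Paths of the example application, relative to the producer stack:
# dir_has_files data/files/delta-producer-dumps/initial-dump-books/ && echo "dump found"
# dir_has_files "data/files/deltas/books/$(date +%Y-%m-%d)/" && echo "delta found"
```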
The mu-authorization layer ensures secure access control to the database. Follow these steps to add it:
- Add the mu-authorization container with appropriate configurations to control access to the database.
The delta-consumer service is responsible for consuming both the initial dump and delta messages from the delta-producer.
By incorporating this service, your application can efficiently process and utilize the data updates generated by the producer side.
For this part we refer back to step 2.1, since the mu-authorization configuration is similar.
When configuring the consumer, pay attention to the following variables:
- DCR_SYNC_BASE_URL: Set to http://machine-ip:8888 (avoid using localhost as it resolves to the container itself).
- DCR_SYNC_FILES_PATH: Define the path of the sync files, configured in the dispatcher of the producer.
- DCR_SYNC_DATASET_SUBJECT: Specify the dcatDataSetSubject job URI configured on the producer side.
- DCR_DISABLE_DELTA_INGEST and DCR_DISABLE_INITIAL_SYNC: In the docker-compose.yml file, set these variables to true and use an overriding docker-compose configuration file to switch them to false when needed.
- INGEST_GRAPH: The default is http://mu.semte.ch/graphs/public, but some backend services might require different graphs. Adjust accordingly based on your backend requirements.
Note: To start syncing, the consumer needs to perform an initial sync and a delta ingest. Run the stack once with the override in which DCR_DISABLE_DELTA_INGEST and DCR_DISABLE_INITIAL_SYNC are set to false. After this you can run the normal docker compose setup with the values set to true again.
By correctly configuring these variables, you can ensure smooth operation and data processing within the consumer component of your application.
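The override described in the note could look like the sketch below. The file name docker-compose.sync.yml and the service name delta-consumer are assumptions made for illustration; use the name your consumer service has in docker-compose.yml. The helper write_sync_override is hypothetical and only writes the override file.

```shell
# Write a compose override that enables the initial sync and delta ingest.
# Usage: write_sync_override FILE SERVICE   (hypothetical helper)
write_sync_override() {
  cat > "$1" <<EOF
services:
  $2:
    environment:
      DCR_DISABLE_DELTA_INGEST: "false"
      DCR_DISABLE_INITIAL_SYNC: "false"
EOF
}

# Assumed file and service names; adjust them to your stack:
# write_sync_override docker-compose.sync.yml delta-consumer
# docker compose -f docker-compose.yml -f docker-compose.sync.yml up -d
```

Passing both files with -f makes Docker Compose merge them, with values from the later file taking precedence. Older docker-compose versions may additionally expect a version key in the override that matches the main file.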
After setting up the consumer and ingesting the initial dump and delta files, it's crucial to verify that the data is correctly synchronized between the producer and consumer. You can do this in two ways:
- Using the Books Application: Edit the books list on the producer side at http://localhost:8888/books-frontend/book, and wait until the sync, including healing that creates deltas and ingestion on the consumer side, is completed. Then, check that the list is the same on the consumer side at http://localhost:8877/books-frontend/book.
- Comparing SPARQL Query Outputs: Compare the SPARQL query outputs on localhost:8879/sparql (consumer) and localhost:8889/sparql (producer). Ensure that the data retrieved from both endpoints is consistent and reflects the changes made on the producer side.
By performing these checks, you can ensure that the data synchronization is working correctly, and the producer and consumer are in harmony.
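The SPARQL comparison can be scripted with curl. This is a sketch under several assumptions: both endpoints accept the standard query parameter and can return text/csv, and the data lives in the graph named in the example query (the default INGEST_GRAPH; the producer may store its data elsewhere). The helpers sparql_csv and same_rows are hypothetical names.

```shell
# Run a SPARQL SELECT query and print the result as CSV.
# Usage: sparql_csv ENDPOINT QUERY   (hypothetical helper)
sparql_csv() {
  curl -s --get -H 'Accept: text/csv' --data-urlencode "query=$2" "$1"
}

# Succeed when two result files hold the same rows, ignoring row order.
# Usage: same_rows FILE_A FILE_B   (hypothetical helper)
same_rows() {
  [ "$(sort "$1")" = "$(sort "$2")" ]
}

# Example, with an assumed graph:
# query='SELECT ?s ?p ?o WHERE { GRAPH <http://mu.semte.ch/graphs/public> { ?s ?p ?o } }'
# sparql_csv http://localhost:8889/sparql "$query" > producer.csv
# sparql_csv http://localhost:8879/sparql "$query" > consumer.csv
# same_rows producer.csv consumer.csv && echo "in sync"
```

Sorting before comparing matters because the two triplestores are not guaranteed to return rows in the same order.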
In this step, we will add authentication to the producer side. Follow these steps to secure the producer:
- Generate a Key: Create a unique key, which can be any string, to use for authentication.
- Add the Key to publication-graph-maintainer Configuration: In the config/delta-producer/publication-graph-maintainer/config.json file, include the generated key for the correct service.
- Update Dispatcher Configuration: On the producer side, update the dispatcher configuration to include a login route for the correct service. This ensures secure authentication.
- Add the Key to the Consumer Configuration: On the consumer side, add the generated key in the docker-compose.yml file for the correct consumer service.
- Include the Login URL for the Consumer Service: Also on the consumer side, add the full login URL in the docker-compose.yml file for the correct consumer service. This connects the consumer with the authentication process.
By implementing these authentication measures on the producer side, you enhance the security and integrity of your data publication and consumption processes.
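Since the key can be any string, one option is to generate a random one rather than choosing it by hand. The sketch below is one possible approach, not a requirement of the services: generate_key is a hypothetical helper that reads 32 bytes from /dev/urandom and prints them as hexadecimal.

```shell
# Print a random 64-character hexadecimal string read from /dev/urandom.
# Usage: generate_key   (hypothetical helper)
generate_key() {
  # od prints the bytes as hex pairs; tr strips the spaces and newlines.
  head -c 32 /dev/urandom | od -An -tx1 | tr -d '[:space:]'
}

# key="$(generate_key)"
# echo "$key"
```

The same value then has to be used on both sides: in the publication-graph-maintainer configuration of the producer and in the consumer service configuration.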
In this final step we show where the delta files can be found.
The delta-producer-publication-graph-maintainer service creates these delta files, so they can be accessed at the configured endpoint (forwarded by the dispatcher).