Implementation of Dataplane that supports Non-Finite Provider Push Transfers #1150

Open · 16 of 18 tasks
gerbigf opened this issue Jan 21, 2025 · 8 comments

@gerbigf

gerbigf commented Jan 21, 2025

Overview

Implements #938

Explain the topic in 2 sentences

A Dataspace data consumer should be able to subscribe to a data offering that provides a "stream" of non-finite data transfers for as long as the agreed contract terms are valid. Messages should be exchanged via HTTP (not Kafka).

What's the benefit?

We reduce the complexity of setting up use cases that rely on frequent, provider-triggered transfers of data.

Furthermore, we correct the current anti-pattern used to PUSH data to a partner via HTTP: to receive data from Company B, Company A has to set up an asset that points to a REST API which will receive the data. Company A has to act as the "data provider" in the EDC sense because it sets up the contract that includes that asset. Company B acts as the "data consumer" in the EDC sense because it needs to query A's catalogue, negotiate a contract, and can then use the EDR to HTTP POST or PUT information against A's (the EDC "data provider") data plane.
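
For illustration, a minimal Python sketch of what this anti-pattern looks like from Company B's side today. The EDR field names, the data-plane URL and the payload are assumptions for illustration, not taken from a concrete EDC release:

```python
import requests

# Hypothetical EDR that Company B obtained after negotiating A's "receive" asset.
# Field names and token handling are illustrative; the real EDR layout depends
# on the EDC / Tractus-X version in use.
edr = {
    "endpoint": "https://provider-a.example.com/api/public",   # A's data plane (assumed)
    "authorization": "<token-from-edr>",
}

payload = {"measurement": 42.0, "timestamp": "2025-01-21T10:00:00Z"}

# B (the EDC "consumer") pushes data INTO A (the EDC "provider") --
# the role inversion described above.
response = requests.post(
    edr["endpoint"],
    json=payload,
    headers={"Authorization": edr["authorization"]},
    timeout=30,
)
response.raise_for_status()
```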

What are the Risks/Dependencies?

Note: this is a potentially breaking change, but it should allow use cases to be redesigned in a different, more efficient way. We're working on a solution where both transfer types can exist in parallel.

Detailed explanation

Current implementation

Scenario 1.

Company A generates GBs of data on a daily basis. Company B wants to consume that data as soon as it is generated.

Currently, Company B has to create a data offering that allows Company A to notify B when new data has been generated and where it can be found. This requires a protocol to be agreed between both companies so that the details of the notification and the subsequent steps are understood.

Once Company B receives the notification, it triggers a search, negotiation and transfer for that particular piece of data.

This is designed this way because the EDC closes the forward channel in any provider-push transfer scenario as soon as the transfer has either succeeded or failed.
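
To make the overhead of this workaround explicit, here is a hedged outline of what Company B effectively has to implement today. The helper functions are placeholders standing in for calls to B's connector, not real EDC management-API methods:

```python
# Hypothetical outline of today's workaround (scenario 1): every notification
# from the provider triggers a full catalog lookup, contract negotiation and a
# one-shot (finite) transfer.

MY_SINK_URL = "https://consumer-b.example.com/data-sink"   # assumed consumer endpoint


def query_catalog(asset_id: str) -> dict:
    """Placeholder: look up the provider's offer for the notified asset."""
    return {"assetId": asset_id, "offerId": "offer-123"}


def negotiate_contract(offer: dict) -> dict:
    """Placeholder: run a contract negotiation and return the agreement."""
    return {"agreementId": "agreement-456", "offer": offer}


def initiate_transfer(agreement: dict, destination: str) -> None:
    """Placeholder: start a finite provider-push transfer to `destination`."""


def on_new_data_notification(notification: dict) -> None:
    offer = query_catalog(notification["assetId"])
    agreement = negotiate_contract(offer)
    initiate_transfer(agreement, destination=MY_SINK_URL)
    # The EDC closes the forward channel once this transfer completes,
    # so the whole cycle has to be repeated for the next batch of data.
```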

Scenario 2.
See the anti-pattern described under "What's the benefit?".

Proposed improvements

This issue shall introduce a concept for a dataplane that is capable of keeping the forward channel open, and of handling and recovering from data-transfer errors, for as long as the agreed contract terms are valid.
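
A conceptual sketch of what such a dataplane worker could do on the provider side; this is not the actual EDC data-plane SPI, and the names, retry policy and polling interval are assumptions:

```python
import time
import requests

CONSUMER_SINK_URL = "https://consumer-b.example.com/data-sink"   # from the transfer's dataDestination (assumed)


def contract_is_valid() -> bool:
    """Placeholder for a policy/contract validity check."""
    return True


def new_data_chunks():
    """Placeholder generator yielding data produced since the last push."""
    yield {"chunk": 1}


def run_non_finite_push() -> None:
    # Keep the forward channel open for as long as the contract terms are valid,
    # instead of terminating the flow after a single successful transfer.
    while contract_is_valid():
        for chunk in new_data_chunks():
            for attempt in range(3):                     # simple recovery from transfer errors
                try:
                    requests.post(CONSUMER_SINK_URL, json=chunk, timeout=30).raise_for_status()
                    break
                except requests.RequestException:
                    time.sleep(2 ** attempt)             # back off and retry
        time.sleep(5)                                    # wait for more data rather than completing
```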

Feature Team

Contributor

Committer

User Stories

  • Issue 1, linked to specific repository
  • Issue 2, linked to another specific repository

Acceptance Criteria

  • A Data Consumer can subscribe to a data offer
  • A Data Consumer can receive data updates from its subscription when the data provider adds new data to the source

Test Cases

Test Case 1

A Data Consumer can subscribe to a data offer

Steps

Given a consumer and a provider exist.
Given the provider has a dataset to which new data can be added.
Given the provider has an offer for this dataset.

  1. The consumer finds and negotiates access to the provider's dataset.
  2. Using the obtained agreement, the consumer initiates a provider-push data transfer for the respective dataset (a possible request is sketched after the expected result below).

Expected Result

The data transfer stays active as long as the terms defined in the agreed contract are valid.
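
A hedged sketch of what step 2 could look like against the consumer connector's management API. The endpoint path, the transferType value and the header name are assumptions and differ between EDC / Tractus-X versions:

```python
import requests

CONSUMER_MGMT_API = "https://consumer-b.example.com/management"   # assumed

transfer_request = {
    "@context": {"@vocab": "https://w3id.org/edc/v0.0.1/ns/"},
    "counterPartyAddress": "https://provider-a.example.com/api/v1/dsp",   # assumed provider DSP endpoint
    "protocol": "dataspace-protocol-http",
    "contractId": "<agreement-id-from-negotiation>",
    "assetId": "<asset-id>",
    "transferType": "HttpData-PUSH",                     # assumed identifier for a push flow
    "dataDestination": {
        "type": "HttpData",
        "baseUrl": "https://consumer-b.example.com/data-sink",   # where the consumer wants the data delivered
    },
}

resp = requests.post(
    f"{CONSUMER_MGMT_API}/v3/transferprocesses",         # path is an assumption
    json=transfer_request,
    headers={"x-api-key": "<management-api-key>"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```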

Test Case 2

Steps

Given a consumer is subscribed to a data offer from a provider.

The data provider adds new data to its data source.

Expected Result

  1. The consumer eventually receives the data in its data destination

Architectural Relevance

The following items are ensured (answer: yes) after this issue is implemented.

In the context of the standards 126 and 127, typically only one is applicable, depending on the specific use case. Please cross out one of the two standards that does not apply.

Justification: (Fill this out, if at least one of the checkboxes above cannot be ticked. Contact the Architecture Management Committee to get an approval for the justification)

Additional information

  • I am aware that my request may not be developed if no developer can be found for it. I'll try to contribute a developer (bring your own developer)
@gerbigf
Author

gerbigf commented Jan 21, 2025

@stephanbcbauer can you add the R25.06 and R25.09 labels to this? We were missing the implementation of #938

@rafaelmag110

We can probably have this feature as a non-breaking change, where both finite and non-finite transfers are possible. Finite transfers would be the default.

@gerbigf
Author

gerbigf commented Jan 21, 2025

Thanks @rafaelmag110 - added that to the description.

@stephanbcbauer
Member

@stephanbcbauer can you add the R25.06 and R25.09 labels to this? We were missing the implementation of #938

@gerbigf Not sure if we mean the same label :) I added Prep-R25.06, which means this feature needs to be presented during Planning Days

@gerbigf
Author

gerbigf commented Jan 21, 2025

That's perfect. I want to have visibility of this feature because it's important ;)

@stephanbcbauer
Member

@gerbigf Could we also define a roadmap item for this? What is the “customer need”? Which “pain is relieved” by this? If there is nothing ... also fine for me :)

@gerbigf
Author

gerbigf commented Jan 21, 2025

Uhm, sure.

In simple terms: "We are trying to remove the current misuse of the EDC that we call "Consumer Push", where a Data Provider actually only provides an Asset in order to be able to receive data via HTTP POST." This implementation will make things right again.

A data provider offers the actual data asset, and the consumer can specify an API endpoint where it wants that data to be delivered.

Does that make sense? It's specific EDC terminology but it will have a huge impact on compliance in the data space.
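
To make the corrected model concrete: the consumer then only needs to operate a plain HTTP endpoint as its data sink. A minimal, hypothetical example (the route, port and payload handling are assumptions):

```python
from flask import Flask, request

# Minimal, hypothetical data sink operated by the consumer. In the corrected
# model, the provider's data plane delivers new data to this endpoint for as
# long as the contract is valid.
app = Flask(__name__)


@app.route("/data-sink", methods=["POST", "PUT"])
def receive_data():
    payload = request.get_json(silent=True) or request.data
    print("received update:", payload)   # persist or forward in a real setup
    return "", 204


if __name__ == "__main__":
    app.run(port=8080)
```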

@lgblaumeiser
Contributor

Hi @gerbigf @rafaelmag110, I was just thinking: for HTTP we currently only have a PULL transfer, not a PUSH transfer, because there are only very limited use cases for a one-time push contract, right? The only thing that came to my mind is a kind of order for data that is then pushed as soon as it becomes available. But if we provide an infinite PUSH, it would be straightforward to offer an HTTP PUSH that is automatically "infinite", as it allows transfer processes to stay open for as long as the contract has not expired, right?

Thinking this further, it could be the dataspace vehicle for general publish/subscribe: you could also provide an HTTP PUSH setup whose push target is a Kafka (or similar) backend operated on the consumer side, so a separate publish/subscribe setup would become redundant, right? What do you think @giterrific?

Do you consider that in your approach, or is it basically based on a bucket PUSH transfer?
