Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter end device event types that are forwarded to applications #4834

Closed
htdvisser opened this issue Nov 8, 2021 · 14 comments · Fixed by #5094
Closed

Filter end device event types that are forwarded to applications #4834

htdvisser opened this issue Nov 8, 2021 · 14 comments · Fixed by #5094
Assignees
Labels
c/console This is related to the Console in progress We're working on it ui/web This is related to a web interface
Milestone

Comments

@htdvisser
Copy link
Contributor

Summary

We need to filter which end device events are forwarded to the application event stream.

Why do we need this?

Because currently event streams can't keep up with large applications.

What is already there? What do you see now?

All end device events are forwarded to application event streams. This means that for applications with thousands of devices the event stream won't be able to keep up, and drop events. This makes the event stream pretty useless.

What is missing? What do you want to see?

We can think about a (possibly configurable) set of end device events that get forwarded to the application. My proposal would be to only forward application-layer events (starting with as.).

Environment

All deployments, but specifically The Things Stack Cloud.

How do you propose to implement this?

Define a list of end device event names that get forwarded to the application.

Can you do this yourself and submit a Pull Request?

Sure, but this first needs some discussion.

@htdvisser htdvisser added the needs/discussion We need to discuss this label Nov 8, 2021
@github-actions github-actions bot added the needs/triage We still need to triage this label Nov 8, 2021
@NicolasMrad NicolasMrad removed the needs/triage We still need to triage this label Nov 9, 2021
@kschiffer
Copy link
Contributor

Related to #2887

@htdvisser htdvisser added this to the v3.17.0 milestone Nov 10, 2021
@htdvisser
Copy link
Contributor Author

htdvisser commented Nov 10, 2021

The most chatty end device events are:

  • ns.up.data.receive (which is published for each duplicate)
  • ns.up.data.drop
  • ns.up.data.process
  • ns.up.data.forward
  • as.up.data.receive
  • as.up.data.forward
  • ns.down.data.schedule.attempt
  • ns.down.data.schedule.success
  • as.packages.storage.up.store
  • ns.mac.link_adr.request
  • ns.mac.dev_status.request
  • ns.mac.new_channel.request
  • ns.mac.link_adr.answer.accept
  • ns.mac.rx_timing_setup.request

Most of these are in my opinion not relevant to the application, and therefore don't need to be forwarded to application event streams.

@kschiffer
Copy link
Contributor

I'm happy with any irrelevant event type that we can omit by default. 

Also compare with what events we show by default in the Console:

  'as.*.drop',
  'as.down.data.forward',
  'as.up.location.forward',
  'as.up.data.forward',
  'as.up.service.forward',
  'gs.down.send',
  'gs.gateway.connect',
  'gs.gateway.disconnect',
  'gs.status.receive',
  'gs.up.receive',
  'js.join.accept',
  'js.join.reject',
  'ns.mac.*.answer.reject',

@adriansmares
Copy link
Contributor

I'd like to see how things perform after #4831 . If this is still an issue, we can indeed start slashing a bit the NS events.

@htdvisser
Copy link
Contributor Author

This is only partly related to #4831. While #4831 improves the performance on the event subscriber side, this issue is trying to improve the performance on the event publisher side, and of the pub/sub system (usually Redis).

@adriansmares
Copy link
Contributor

adriansmares commented Nov 10, 2021

My only concern is that this behavior is inconsistent - It's slightly weird that normal I would go to the application Live Data and see some events only, given that before I saw everything. But I fully agree that an incomplete and inconsistent stream won't help much (maybe only for programmatic consumers), and that this becomes worse with scale.

This means that for applications with thousands of devices the event stream won't be able to keep up, and drop events.

I can't fully find where this occurs in the publishing code - could you please give some pointers here ?

@htdvisser
Copy link
Contributor Author

htdvisser commented Nov 10, 2021

We currently don't actually drop them, which means that the event system will probably just grind to a halt, which would impact everyone instead of just the concerning application.

if evtEntityType == "end device" && subEntityType == "application" &&
unique.ID(evt.Context(), evtIDs.GetDeviceIds().ApplicationIdentifiers) == unique.ID(s.ctx, subIDs) {
return true
}

if appID := id.GetApplicationIds(); appID != nil {
pattern := ps.eventChannel(ctx, name, (&ttnpb.EndDeviceIdentifiers{
ApplicationIdentifiers: *appID,
DeviceId: "*",
}).GetEntityIdentifiers())
patterns = append(patterns, pattern)
}

if devID := id.GetDeviceIds(); devID != nil {
eventStream := ps.eventStream(evt.Context(), devID.ApplicationIdentifiers.GetEntityIdentifiers())
tx.XAdd(ps.ctx, &redis.XAddArgs{
Stream: eventStream,
MaxLenApprox: int64(ps.entityHistoryCount),
Values: streamValues,
})
tx.Expire(ps.ctx, eventStream, ps.entityHistoryTTL)
}

@KrishnaIyer
Copy link
Member

I also think only the following are relevant to application users, i.e, something that says "success" and something that helps debug "failure". Most of the time, I don't even look at the other events since they don't convey any information to me.

  'as.*.drop',
  'as.*.forward',

@htdvisser
Copy link
Contributor Author

Okay so what about this:

EndDevice CRUD:

end_device.*
ns.end_device.*
as.end_device.*
js.end_device.*

AS Forwarded Uplinks/Downlinks:

as.*.forward

JS Joins:

js.join.accept

Errors and warnings:

*.fail
*.warning
*.drop
*.reject

NS MAC layer:

ns.class.switch.*

@htdvisser htdvisser removed the needs/discussion We need to discuss this label Jan 5, 2022
@htdvisser htdvisser modified the milestones: v3.17.0, v3.17.1 Jan 5, 2022
@htdvisser
Copy link
Contributor Author

I'm going to implement this for the next milestone.

@kschiffer
Copy link
Contributor

kschiffer commented Jan 11, 2022

Looking into this now and remembered that the Console relies on certain event types to synchronize "hot data" such as

  • as.up.*.forward for Last activity (Currently not planned to be filtered out)
  • ns.up.data.process for uplink/downlink count

Filtering those out would result in the Console not being able to keep these data points updated. So we either need to find a replacement event or keep these events unfiltered. We already ran into this problem when moving to backend-enabled filters (#4831) in the backend and had to adjust the filtering accordingly. This caused a problem where we now have to show some events in the Console simply because we rely on it internally and we removed hiding of events on the frontend.

Also, keep in mind that changing what data is being displayed can cause confusion/frustration among users if not communicated properly. So we should make sure that users will understand why the event output changed, via the changelog and other communication channels.

@htdvisser htdvisser added the in progress We're working on it label Jan 12, 2022
@htdvisser
Copy link
Contributor Author

  • The as.up.*.forward events will still be included in the stream of application events.
  • Do you really need ns.up.data.process in the stream of application events? It's still included in the stream of device events.
  • Obviously there will be a CHANGELOG item for this.

@kschiffer
Copy link
Contributor

  • Do you really need ns.up.data.process in the stream of application events? It's still included in the stream of device events.

Yes, we could use the end device event stream to get this info, but it's currently still sourced from the app stream.

@htdvisser htdvisser assigned kschiffer and unassigned htdvisser Jan 20, 2022
@htdvisser htdvisser added c/console This is related to the Console in progress We're working on it ui/web This is related to a web interface and removed in progress We're working on it labels Jan 20, 2022
@htdvisser
Copy link
Contributor Author

Re-assigning since my part is done. What's still TODO is to update the Console accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c/console This is related to the Console in progress We're working on it ui/web This is related to a web interface
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants