-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Backend usage metrics (offline entities) for 2024.2 #1198
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good so far, just leaving some initial thoughts!
// submission create loggedAt timestamp (when sub was created) to when | ||
// the event was processed, which will be after the entity version was | ||
// created. | ||
const measureEntityProcessingTime = () => ({ one }) => one(sql` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this exclude offline updates that are force-processed from the backlog? I think those would skew the metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this is another place where getodk/central#720 would be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the discussion below, I think this isn't an issue after all. Even if an offline update ends up in the backlog, that's different from how quickly the submission.create
event is processed.
// submission create loggedAt timestamp (when sub was created) to when | ||
// the event was processed, which will be after the entity version was | ||
// created. | ||
const measureEntityProcessingTime = () => ({ one }) => one(sql` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the same reason, it'd be great if we could exclude submission.create
events that were processed when the approvalRequired
flag was toggled, as we expect a delay in that case. I'm not really sure how to identify that though. 🤔 If it helped, I think it'd be OK to exclude entity creations (entity defs whose version number is 1), since those are the only entity defs that the approvalRequired
flag will affect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This query is kinda indirect in a way that I think is convenient for the metric we are getting at.
It's looking at the timestamps on the submission.create
audit event which only gets updated by the jobs worker, the one that runs the first time the event comes in. The other things that process submission events to make entities (like the batch processing when toggling the approvalRequired flag and batch-processing the backlog) use the event to look up information, but they don't update anything about the underlying event.
So I think it's safe to to look at this time difference like this. It's not every submission create event, it's only the ones that ultimately turn into entity versions. It doesn't capture how long it takes overall for a submission to become an entity (potentially factoring in these other sources of delay), but it does capture how long a submission would be sitting around waiting for any initial entity processing (even if the result of that was it getting skipped to be reexamined later).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe what you were expecting was a more explicit (entity def creation date - source submission def creation date). I could add a query for that, too, that could include these other ways in which entity processing can get delayed. Then we could see if there's a noticeable difference... which would mean these delay-related mechanism are activated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's looking at the timestamps on the
submission.create
audit event which only gets updated by the jobs worker, the one that runs the first time the event comes in.
Ah that's interesting, I hadn't put that together. This sounds like a good way to approach the metric then. 👍 I don't feel like we need anything more explicit: it makes sense to me to start with this.
I think metric 5 is also one that would benefit from getodk/central#720. |
WHERE action = 'submission.create' | ||
`); | ||
|
||
const measureElapsedEntityTime = () => ({ one }) => one(sql` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this query for fun but decided not to include it in the actual report. Partly because it feels slower than some of the other queries and the overall set of queries is already sooo slow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, and I think as we discussed above, we would expect these metrics to be larger than those from measureEntityProcessingTime()
, as these metrics will include the 5-day entity backlog wait and also the time between submission creation and submission approval (if submission approval is required).
|
||
// Create form, set dataset people to approvalRequired = true | ||
await createTestForm(service, container, testData.forms.simpleEntity, 1); | ||
await approvalRequired(service, 1, 'people'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry, I just removed this function in #1196, as it wasn't used anymore and the linter was complaining. I think merging now would break the master
branch, but you could merge master
into this PR, then add approvalRequired()
back. Or if it's not used elsewhere, you could just send the request to toggle the flag without using approvalRequired()
. Sorry about that!
ebeb0ba
to
16e33b5
Compare
Closes getodk/central#653
Example of new metrics:
This covers these metrics:
submission.reprocess events
have there been?It doesn't include
TODO:
tackle this one: (planning to leave til after Log when a create/update entity submission is force applied central#720)5. How long does it take for an entire branch to be processed? Specifically: what is the max amount of time between when the first submission for a branch is received and when the last submission is processed?
What has been done to verify that this works as intended?
Why is this the best possible solution? Were any other approaches considered?
How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?
Does this change require updates to the API documentation? If so, please update docs/api.yaml as part of this PR.
Before submitting this PR, please make sure you have:
make test
and confirmed all checks still pass OR confirm CircleCI build passes