Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add log links to alarm chat messages #2487

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from
Draft

Conversation

johnduffell
Copy link
Member

@johnduffell johnduffell commented Oct 9, 2024

edit: see discussion on chat thread for improvements: https://chat.google.com/room/AAAAA9BipFs/a9ENymXhPsw/a9ENymXhPsw?cls=10

This PR adds log links to alarm chat messages.
It assumes that logs are /aws/lambda/app-stage by default, but it can be overridden if that is not the case.

Possible future improvement would be to link to other things like step function execution logs.

tested in CODE
image
(link as text: https://eu-west-1.console.aws.amazon.com/cloudwatch/home?region=eu-west-1#logsV2:log-groups/log-group/$252Faws$252Flambda$252Fdiscount-api-PROD/log-events$3Fstart$3D1728499576235$26filterPattern$3D$26end$3D1728499876235 )
image

@@ -17,7 +18,8 @@ const sharedMobilePurchasesApps = [
'mobile-purchases-google-update-subscriptions',
];

const teamToAppMappings: Record<Team, string[]> = {
type AppInfo = string | { app: string; logGroups: string[] };
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now an app can have either just a name, or it can have a name and a list of associated log groups.
In future, rather than hard coding, this could maybe be generated by a call to AWS to find all log groups associated with lambdas/ec2 instances.


mappings[app] = teams;
}
export class AlarmMappings {
Copy link
Member Author

@johnduffell johnduffell Oct 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this into a class so I could use simplified mappings in the tests more easily should handle captured CloudWatch alarm message

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With Typescript you can use both OOP or functional style but at the Guardian we usually use a functional style with typescript.

So in this scenario I am not sure about the benefits or changing this part of the code to use a class since the PR does not change the behaviour of this part of the code (how mappings work).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok yes thanks for that, I get super confused between all the different ways to write a function/method. The idea behind making it parameterised rather than a singleton was so I could write a test that used mocked data rather than having to use the real data (which might get edited).
In scala I usually use classes because then you get better help from the compiler and IDE, but I will see how to boil it down to a function.

return mappings;
};

private buildAppToLogGroupOverrides = (
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if an app is in multiple teams with different log groups, it will just pick one arbitrarily.
I could have got around it by having separate app->group mapping but I didn't think that would be as maintainable. It does mean the type system doesn't help us out 100% though.

): Record<string, Team[]> => {
const entries: Array<[Team, AppInfo[]]> = Object.entries(
theMappings,
) as Array<[Team, AppInfo[]]>; // `as` - hmm?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for some reason calling Object.entries on a Record<Team, AppInfo[]> gives Array<[string, AppInfo[]]> so I had to give the type system the info back again. Not sure if this is the safest way of doing it though?

app: 'workers',
logGroups: [
'/aws/lambda/CreatePaymentMethod',
'/aws/lambda/CreateZuoraSubscription', //etc
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in future it might be useful to have a link to the state machine executions logs also, rather than just to the lambdas.

Comment on lines 144 to 149
const mappings: Record<string, Team[]> = Object.fromEntries(
Object.entries(groups).map(([app, info]) => [
app,
info.map(({ team }) => team),
]),
);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tempted to make an object mapValues function - if there isn't already such a thing?

Suggested change
const mappings: Record<string, Team[]> = Object.fromEntries(
Object.entries(groups).map(([app, info]) => [
app,
info.map(({ team }) => team),
]),
);
const mappings: Record<string, Team[]> =
mapValues(
groups,
(info) => info.map(({ team }) => team),
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above - I am not sure why this needs changing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was because I changed it from a simple team->app mapping to a team->(app|{app, loglink[]}) mapping so I could extract the log links.
When I wrote the log link extractor I couldn't work out the equivalent to groupMap in typescript so I ended up with a mess! I think the groupMap implementation is there in a later commit. It made sense to update the existing mappings extractor to be consistent but now that it's not needed I will have a look to see if it's still adding value.

@AndreaDiotallevi
Copy link
Contributor

AndreaDiotallevi commented Oct 10, 2024

Interesting addition!

I have a comment regarding where to get the log group from.

I appreciate this is a prototype, but I am not sure placing the log group info in the alarm mapping (making it more complex) is the right place.

The alarm mapping is used for ownership and it should stay simple I believe.

If we talk about "change". If for instance that app log group changes, what is the reason of the change? It is not ownership, it is likely a change in application code. So it should be changed in the producer application code. Probably it should be stored in an alarm tag (via IAC) (e.g. logGroupName) (one to one mapping alarm <-> log group name) and retrieved at run time by the alarm handler.

@johnduffell
Copy link
Member Author

Thanks Andrea - after a bit of chat the solution is going to be to look for a LogGroupNames tag on the alarm and build the links from there. Then it's completely opt-in and also keeps the info about the alarm with the alarm, rather than making this lambda know more than it should.

const entries = tags.flatMap((tag: Tag) =>
tag.Key && tag.Value ? [[tag.Key, tag.Value]] : [],
);
return Object.fromEntries(entries) as Tags;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice improvement!

for (const record of event.Records) {
console.log(record);
Copy link
Contributor

@AndreaDiotallevi AndreaDiotallevi Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the benefit of changing the way we handle CloudWatch alarms v SNS publish messages?

What I expected was just to just update the handleCloudWatchAlarmMessage function, by using the new tag and modifying the text message.

If this is refactoring it should probably done in a separate PR? In any case, in this case the SNS publish case is only the 1% of the cases and it's very different from the CloudWatch case (input, text message, how you get the App info), so I feel it is more clearer to separate them as early as possible up front.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the review, so far I had a first pass at making the change we agreed (to use tags rather than hard coded), but I should have first made the PR into draft so it's clear it's back to in progress. I will go through it again myself and review it before it's ready.

Regarding your question, this change was to make it more testable as I could just check the return value rather than having to mock http calls, however I will have to confirm whether that test is still needed once I've made the updated.

| undefined,
StateChangeTime: Date,
) {
const assumedTimeForCompositeAlarms = 300;
Copy link
Contributor

@AndreaDiotallevi AndreaDiotallevi Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice - Interesting to see how this will help us practically.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it would be a help - just compostite alarms do not have a specific period reported from SNS. So it would just choose a default period. It might be possible to fetch the alarm details from AWS CLI and work something out, if it appears to cause confusion.

@AndreaDiotallevi
Copy link
Contributor

AndreaDiotallevi commented Oct 11, 2024

Thanks John for taking in the discussion and experimenting with the new tag.

I have just left a few more comments on coding paradigms between OOP and functional with Typescript and whether the refactoring should be done in a separate PR or even if we needed it at all.

See them as suggestions.

@johnduffell johnduffell marked this pull request as draft October 15, 2024 10:05
@johnduffell
Copy link
Member Author

Thanks John for taking in the discussion and experimenting with the new tag.

I have just left a few more comments on coding paradigms between OOP and functional with Typescript and whether the refactoring should be done in a separate PR or even if we needed it at all.

See them as suggestions.

thanks for the comments, despite that it's still WIP I think this is helpful when I get it ready for another review. I think there's probably some scala thinking coming through, for better or worse, but I will look through when I get more time and let you know when it's ready

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants