feat (spike): Package opentelemetry forwarding to xray #115

cfraz89 · 2022-12-14T23:58:00Z

Packages the AWS OpenTelemetry distro for Lambda into the functions, which captures OpenTelemetry instrumentation and currently forwards the data to X-Ray. This is an experiment to measure the impact on cold start times, which currently clock in at 1.5s.

Have attempted to use my own build of the collector, which would enable region-agnostic access and let us trim unnecessary modules from it. However, so far I haven't been able to get my build to forward to X-Ray.

Next step will add some light instrumentation around activities and events to see how it turns out.

sam-goodwin · 2022-12-15T19:54:56Z

packages/@eventual/aws-runtime/src/cloudwatch-span-exporter.ts

+  export(
+    spans: ReadableSpan[],
+    resultCallback: (result: ExportResult) => void
+  ): void {
+    console.log("Exporting!!!", spans);
+    const sym = Symbol();
+    const promise = client
+      .send(
+        new PutLogEventsCommand({
+          logGroupName: this.logGroupName,
+          logStreamName: this.logStreamName,
+          logEvents: spans.map((s) => ({
+            message: this.serializeSpan(s),
+            timestamp: new Date().getTime(),
+          })),
+        })
+      )
+      .then(() => {


Does this run in a side car or within the lambda execution?

Within the lambda, but asynchronously.

I don't think we should do this. It couples the critical path to telemetry. Should we instead run it as a side car?

We could, though it'd add some complexity and a bit of overhead. What's the concern with it being on the critical path? We could just wrap the whole thing in a try/catch if failure is a worry.

The request won't complete until the (slow) put logs request succeeds.

Agree on not polluting logs.

Yep, but won't the lambda execution be incomplete until the side car has finished sending to CloudWatch and terminated anyway?

Or are their lifetimes independent?

Ok, I'm having issues with sequencing the logs to CloudWatch with this model anyway. Did some reading up, and I see Lambda extensions get lifetimes independent of the function, i.e. the function can return to the caller while the extension gets to clean up. So yeah, I'll try making a cut-down build of the collector that just exports to CloudWatch.

There's https://github.com/open-telemetry/opentelemetry-collector/tree/main/cmd/builder for making custom builds of the collector
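To make the critical-path concern in this thread concrete, here is a minimal sketch, assuming a hypothetical `BufferingExporter` that is not part of this PR or of the OpenTelemetry SDK. Its `export` only enqueues and acknowledges immediately; something outside the request path (for example a Lambda extension) would later call `drain` and do the slow CloudWatch write. `SpanRecord` and `ExportOutcome` are stand-ins for the SDK's `ReadableSpan` and `ExportResult`.

```typescript
// Hypothetical stand-in for the SDK's ReadableSpan; only what the sketch needs.
interface SpanRecord {
  name: string;
  startTimeMs: number;
}

// Stand-in for the SDK's ExportResult (0 = success, 1 = failed).
type ExportOutcome = { code: 0 | 1 };

class BufferingExporter {
  private buffer: SpanRecord[] = [];
  private readonly maxBuffered: number;

  constructor(maxBuffered: number = 1000) {
    this.maxBuffered = maxBuffered;
  }

  // Called on the request path: no network I/O, so it cannot slow the handler.
  export(spans: SpanRecord[], resultCallback: (result: ExportOutcome) => void): void {
    for (const span of spans) {
      if (this.buffer.length >= this.maxBuffered) {
        // Buffer full: drop the remainder rather than block, and report failure.
        resultCallback({ code: 1 });
        return;
      }
      this.buffer.push(span);
    }
    resultCallback({ code: 0 });
  }

  // Called off the request path: hands over everything buffered so far.
  drain(): SpanRecord[] {
    const drained = this.buffer;
    this.buffer = [];
    return drained;
  }
}
```

The trade-off in this sketch is that spans can be dropped when the buffer fills, in exchange for the handler never waiting on a telemetry request.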

thantos

Have any sample of what value this change brings?

thantos · 2022-12-23T15:27:08Z

packages/@eventual/aws-cdk/src/service.ts

+      this.workflows.orchestrator,
+      "orchestrator"
+    );
+    this.telemetry.attachToFunction(this.scheduler.forwarder, "forwarder");


Not sure the scheduler forwarder needs this? What is it logging? If it does need it, then the scheduler.handler also needs it.

Sure, can take it off.

thantos · 2022-12-23T15:31:47Z

packages/@eventual/aws-cdk/src/telemetry.ts

+    const logStream = new LogStream(this, `LogStream${componentName}`, {
+      logGroup: this.logGroup,
+      logStreamName: componentName,
+    });


Is this what we want? A log stream per function for all time? Or is this just an experiment?

There is a limit to the number of writes to a log stream.

5 requests per second per log stream. Additional requests are throttled. This quota can't be changed.

The orchestrator, for all workflow executions, would be limited to 5TPS.

Hrm ok, didn't realise there was a throttle. I originally had it creating a new log stream every execution, but realised that without static streams it would be difficult to attach event listeners to the streams to forward logs to the real collector. With static streams we can just set that up in CDK.

Another option I can think of is to skip the logging to CloudWatch entirely. Instead, the extension just sends the data over HTTP to the otel collector running in a different lambda.
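To put the per-stream quota quoted in this thread into numbers, here is a rough sketch, assuming spans are grouped so that one `PutLogEvents` request carries many events. `chunkEvents` and `maxEventsPerSecond` are hypothetical helpers, not code from this PR. The cap of 10,000 events per request is the documented `PutLogEvents` limit; the per-request size limit is not handled here.

```typescript
// Documented PutLogEvents cap on the number of events in one request.
const MAX_EVENTS_PER_REQUEST = 10000;
// Per-stream request quota as quoted in the review comment above.
const REQUESTS_PER_SECOND_PER_STREAM = 5;

// Split events into request-sized batches, preserving order.
function chunkEvents<T>(events: T[], batchSize: number = MAX_EVENTS_PER_REQUEST): T[][] {
  if (Math.floor(batchSize) !== batchSize || batchSize < 1 || batchSize > MAX_EVENTS_PER_REQUEST) {
    throw new RangeError("batchSize must be an integer between 1 and " + MAX_EVENTS_PER_REQUEST);
  }
  const batches: T[][] = [];
  for (let i = 0; i < events.length; i += batchSize) {
    batches.push(events.slice(i, i + batchSize));
  }
  return batches;
}

// Upper bound on events per second for one stream at a given batch size.
function maxEventsPerSecond(batchSize: number): number {
  return REQUESTS_PER_SECOND_PER_STREAM * batchSize;
}
```

With one event per request, a stream tops out at 5 events per second, which is the 5 TPS ceiling mentioned above; with batches of 100 the bound is 500 events per second. Batching raises the ceiling but does not remove it.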

thantos · 2022-12-23T15:36:00Z

packages/@eventual/aws-runtime/src/clients/workflow-client.ts

+    const tracer = trace.getTracer(executionId, "0.0.0");
+    await tracer.startActiveSpan(
+      "startWorkflow",
+      {
+        attributes: { workflowName, input },
+        kind: SpanKind.PRODUCER,


Does it make sense to trace in the client, or should we trace in the orchestrator (i.e. in the callers of the client)? Not all callers of the client will have tracing on.

Yeah orchestrator probably makes more sense.

thantos · 2022-12-23T15:37:41Z

packages/@eventual/aws-runtime/src/telemetry.ts

+  provider.addSpanProcessor(
+    new BatchSpanProcessor(new OTLPTraceExporter({ hostname: "127.0.0.1" }))
+  );
+  provider.addSpanProcessor(new SimpleSpanProcessor(new ConsoleSpanExporter()));


thantos · 2022-12-23T15:39:45Z

packages/@eventual/aws-runtime/src/handlers/orchestrator.ts


 /**
 * Creates an entrypoint function for orchestrating a workflow
 * from within an AWS Lambda Function attached to a SQS FIFO queue.
 */
+const traceProvider = registerTelemetryApi();


These functions compile to MJS. Would it help to make the AWS clients and telemetry start in parallel (top-level await)?

I initially tried that somewhere else, but the functions also compile to CJS, and it breaks in that form. It's our lowest common denominator. That being said, registerTelemetryApi isn't an async function, so I don't think top-level await is going to help us here.
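One pattern that works in both module formats is to start the work at module scope and await the stored promise inside the handler. A minimal sketch, assuming two hypothetical async initialisers (`initClients`, `initTelemetry`) that are not in this PR; since `registerTelemetryApi` is synchronous, this only helps if the setup steps actually return promises.

```typescript
// Hypothetical async setup steps standing in for real client/telemetry init.
async function initClients(): Promise<string> {
  return "clients";
}

async function initTelemetry(): Promise<string> {
  return "telemetry";
}

// Kicked off once at module load; both steps run concurrently.
// There is no top-level await here, so this also compiles to CJS.
const ready: Promise<[string, string]> = Promise.all([initClients(), initTelemetry()]);

// The handler awaits the shared promise. After the first invocation the
// promise is already settled, so warm invocations do not repeat the setup.
async function handler(event: unknown): Promise<{ event: unknown; deps: [string, string] }> {
  const deps = await ready;
  return { event, deps };
}
```

A rejected `ready` promise would fail every invocation in that execution environment, so real code would want error handling around the initialisers.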

thantos · 2022-12-23T15:41:02Z

packages/@eventual/core/src/runtime/handlers/activity-worker.ts

+      await tracer.startActiveSpan(
+        "createActivityWorker",
+        { attributes: { command: request.command.name } },


Why have a span for createActivityWorker, but not createOrchestrator? Does this one time operation need a span?

I've only created a couple of spans for testing purposes. I'll leave it up to you guys to create more once everything's in place.

thantos · 2022-12-23T15:42:35Z

packages/@eventual/core/src/runtime/handlers/orchestrator.ts

+        const orchestrateSpan = tracer.startSpan("orchestrate");
+        const ret = await orchestrateExecution(workflow, executionId, records);
+        orchestrateSpan.end();
+        return ret;


Would the scope function be better here?

It would be, but for some reason that one isn't working right now. Probably missing something in the setup of the SDK.
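For context on why the scoped form is preferable: in the diff above, `orchestrateSpan.end()` is skipped if `orchestrateExecution` throws. Below is a minimal sketch of a wrapper that always ends the span, assuming a hypothetical `withSpan` helper and a structural `EndableSpan` type standing in for the OpenTelemetry `Span` (neither is in this PR). With the real API, the same try/finally would go inside the callback passed to `tracer.startActiveSpan`.

```typescript
// Structural stand-in for the OpenTelemetry Span; only the methods used here.
interface EndableSpan {
  recordException(error: unknown): void;
  end(): void;
}

// Runs fn with a freshly started span and guarantees end() is called,
// whether fn resolves or rejects.
async function withSpan<T>(
  startSpan: () => EndableSpan,
  fn: (span: EndableSpan) => Promise<T>
): Promise<T> {
  const span = startSpan();
  try {
    return await fn(span);
  } catch (error) {
    // Attach the failure to the span, then let the caller see it too.
    span.recordException(error);
    throw error;
  } finally {
    span.end();
  }
}
```

In the orchestrator this would be used roughly as `withSpan(() => tracer.startSpan("orchestrate"), () => orchestrateExecution(workflow, executionId, records))`.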

cfraz89 · 2022-12-24T14:12:48Z

Have any sample of what value this change brings?

Before a sample, the obvious answer is threefold:

It enables us to push traces/metrics/logs to the entire otel ecosystem, which is adopted by many popular services like Datadog and Prometheus.
The extension can be used for any language we build support for, not just TypeScript.
It provides an agnostic API for handling traces/metrics/logs across the codebase, including the shared parts. That should be useful when deploying to Cloudflare, etc.

Aside from that, I suspect that traces sent to e.g. Grafana could do a good job of visualising the workflow runs, with a lot of room for customisation, and better than our own visualiser will be able to for some time.

As for a sample, I'll put one together once everything works.

netlify · 2023-01-01T12:48:25Z

❌ Deploy Preview for preeminent-heliotrope-380b2a failed.

Name	Link
🔨 Latest commit	`9b62c5b`
🔍 Latest deploy log	https://app.netlify.com/sites/preeminent-heliotrope-380b2a/deploys/63b18311fcde380008d27c25

Chris Fraser added 3 commits December 15, 2022 10:56

Package opentelemetry forwarding to xray

b1d0252

Add some tracing

5bc9fc3

Work on cloudwatch exporter

20b87b5

sam-goodwin reviewed Dec 15, 2022

Chris Fraser added 2 commits December 23, 2022 01:04

Successfully send traces to cloudwatch

2678da5

create a log stream per task type

b4c6b40

thantos suggested changes Dec 23, 2022

Chris Fraser added 5 commits December 26, 2022 18:01

Remove extension

2656a91

send otlp to secondary lambda over http without extension

ee6a471

Configure otlp lambda proxy

455484e

shuffle files around

004f7b6

Wire up metrics

9e9c9d2

Chris Fraser added 2 commits January 1, 2023 23:56

Add emf exporter

9b62c5b

Successfully export metrics

e8706d3
