Agent for Apache Spark on EMR #805

Open
wants to merge 2 commits into
base: main
from


@dupetr dupetr commented May 15, 2023

Why this was created

We needed a way to collect JMX metrics from Apache Spark jobs running on AWS EMR.

To make Spark expose these metrics, pass the agent to each executor JVM:

spark-submit ...
--conf "spark.executor.extraJavaOptions=-javaagent:./jmx_prometheus_javaagent-0.17.2.jar=<1 specific port>:<config file>"

This works as long as each machine runs a single JVM with a single agent, because the agent opens its web server on one specific port.
When two executors run on one machine, the second agent cannot start (two web servers cannot bind the same port) and the job crashes.

The enhancement

  1. You can define a range of ports; in our case we use 39100-39115. Some of the ports are left unused as headroom. The range must be one contiguous block of ports.
  2. The agent no longer fails when it cannot start a web server on a port. It tries the next port in the range instead. It still fails once the range is exhausted, for example if the range has 5 ports and you try to start 10 exporters on that machine.

The exporters act independently and compete for the same resource (a port), so startup uses backoff times and retries.

On driver/executor stdout it looks like this:

Backing off at server start for 1218ms
Looking up free port. Checking: 39100, remaining ports in range: 16
Port 39100 is used. Trying next one.
Looking up free port. Checking: 39101, remaining ports in range: 15
Found free port 39101
Trying to start JMX agent on 39101
Started JMX agent on 39101. (retries left: 16)
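
For readers who want a feel for the mechanism, the scan-with-backoff behaviour described above could be sketched roughly as follows. This is an illustration only, not the code in this PR: the class name `PortRangeBinder`, both method names, and the retry and backoff parameters are hypothetical. It also binds each candidate port directly instead of checking first and starting later, which is a deliberate simplification that avoids a race between two agents probing the same port.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch: pick the first bindable port from a contiguous range.
public class PortRangeBinder {

    /** Tries each port in [first, last] in order; returns the first socket that binds. */
    public static ServerSocket bindFirstFree(int first, int last) throws IOException {
        for (int port = first; port <= last; port++) {
            ServerSocket socket = new ServerSocket(); // created unbound
            try {
                socket.bind(new InetSocketAddress(port));
                return socket; // this port is now ours
            } catch (IOException portInUse) {
                socket.close(); // port taken (or not bindable), try the next one
            }
        }
        throw new IOException("No free port in range " + first + "-" + last);
    }

    /** Sleeps a random backoff before each scan and repeats the scan up to `retries` extra times. */
    public static ServerSocket bindWithRetries(int first, int last, int retries, long maxBackoffMs)
            throws IOException, InterruptedException {
        IOException lastFailure = null;
        for (int attempt = 0; attempt <= retries; attempt++) {
            // Random delay so agents starting at the same moment do not scan in lockstep.
            Thread.sleep(ThreadLocalRandom.current().nextLong(maxBackoffMs + 1));
            try {
                return bindFirstFree(first, last);
            } catch (IOException rangeExhausted) {
                lastFailure = rangeExhausted;
            }
        }
        throw lastFailure;
    }

    public static void main(String[] args) throws Exception {
        // Two "agents" in one JVM, standing in for two executors on one machine.
        try (ServerSocket a = bindWithRetries(39100, 39115, 3, 50);
             ServerSocket b = bindWithRetries(39100, 39115, 3, 50)) {
            System.out.println("first agent on " + a.getLocalPort());
            System.out.println("second agent on " + b.getLocalPort());
        }
    }
}
```

In this sketch the second caller ends up on a different port than the first, and a caller only fails once every port in the range is taken and the retries are used up, which mirrors the 5-ports-for-10-exporters case in the list above.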

fixes #627

@dhoard
Collaborator

dhoard commented Jun 24, 2023

I feel this is the wrong approach.


Successfully merging this pull request may close these issues.

Multiple JVMs per machine with mutliple Ports