
ClusterId and name are missing #219

Open
akbarali789 opened this issue Jul 18, 2023 · 3 comments

Comments

@akbarali789

Hi Team,

I took the code from the spark-monitoring branch l4jv2 to support our custom logging after our ADB version upgrade to 12.2 LTS.

Issue: Apart from the metrics starting with "app" in the SparkMetric_CL table, I could not find the cluster ID and name details.


Issue: Logs are not loading into the SparkListenerEvent_CL table with the latest solution.

Please help me resolve these issues, as our reports are no longer working. Thanks in advance.

@infosuresh2k

The spark-monitoring branch l4jv2, which supports custom logging after the DBR upgrade to 12.2, is missing a lot of columns (cluster ID, name, application ID, etc.), and the mappings are wrong as well. I am not sure why the new columns below were introduced in the new version when they were supposed to map to the existing columns. A workaround sketch follows the list.

10.4 vs 11.0-and-above column mappings:
- Level vs log_level_s
- thread_name_s vs process_thread_name_s
- logger_name_s vs log_logger_s
- applicationName_s vs sparkAppName_s
- nodeType_s vs sparkNode_s
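In the meantime, here is a minimal KQL sketch that normalizes both schemas, assuming SparkLoggingEvent_CL as the target table and the column pairs listed above (untested; column_ifexists returns the default when a column is absent, and coalesce picks the first non-empty string):

// Normalize 10.4 and 11.0+ column names so existing reports can query one schema.
SparkLoggingEvent_CL
| extend
    level      = coalesce(column_ifexists("Level", ""), column_ifexists("log_level_s", "")),
    threadName = coalesce(column_ifexists("thread_name_s", ""), column_ifexists("process_thread_name_s", "")),
    loggerName = coalesce(column_ifexists("logger_name_s", ""), column_ifexists("log_logger_s", "")),
    appName    = coalesce(column_ifexists("applicationName_s", ""), column_ifexists("sparkAppName_s", "")),
    nodeType   = coalesce(column_ifexists("nodeType_s", ""), column_ifexists("sparkNode_s", ""))
| project TimeGenerated, level, threadName, loggerName, appName, nodeType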

@vandanakravi-tfm

vandanakravi-tfm commented Aug 23, 2023

@infosuresh2k @akbarali789 I'm also facing the same issue with SparkLoggingEvent_CL. Have you found a solution for this, or any workarounds?

@gustavomcarmo

I've read somewhere that we can define the columns/values sent to the Log Analytics workspace by changing the content of sparkLayout.json. The columns can be based on the Spark properties listed in the Environment tab of the Spark UI in the Databricks cluster UI, as shown below:

[screenshot: Spark properties in the Environment tab of the Spark UI]

I've then tried to merge in as many as possible of the columns already used by the implementation covering Spark versions older than 3.3.x, ending up with this:

cat << 'EOF' > "$STAGE_DIR/sparkLayout.json"
{
  "@timestamp": {
    "$resolver": "timestamp",
    "pattern": {
      "format": "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'",
      "timeZone": "UTC"
    }
  },
  "level": {
    "$resolver": "level",
    "field": "name"
  },
  "message": {
    "$resolver": "message",
    "stringified": true
  },
  "thread.name": {
    "$resolver": "thread",
    "field": "name"
  },
  "logger.name": {
    "$resolver": "logger",
    "field": "name"
  },
  "labels": {
    "$resolver": "mdc",
    "flatten": true,
    "stringified": true
  },
  "tags": {
    "$resolver": "ndc"
  },
  "error.type": {
    "$resolver": "exception",
    "field": "className"
  },
  "error.message": {
    "$resolver": "exception",
    "field": "message"
  },
  "error.stack_trace": {
    "$resolver": "exception",
    "field": "stackTrace",
    "stackTrace": {
      "stringified": true
    }
  },
  "applicationId": "${spark:spark.app.id:-}",
  "applicationName": "${spark:spark.app.name:-}",
  "nodeType": "${spark:nodeType}",
  "clusterId": "${spark:spark.databricks.clusterUsageTags.clusterId:-}",
  "clusterName": "${spark:spark.databricks.clusterUsageTags.clusterName:-}"
}
EOF

The columns clusterId and clusterName are there.
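
To verify they are arriving, a quick Log Analytics check can help (a sketch only: it assumes the ingestion pipeline appends the usual _s suffix for custom string fields, giving clusterId_s and clusterName_s):

// Count recent log events per cluster; clusterId_s and clusterName_s are assumed
// names derived from the layout above plus the _s suffix for custom string fields.
SparkLoggingEvent_CL
| where TimeGenerated > ago(1h)
| summarize events = count()
    by clusterId = tostring(column_ifexists("clusterId_s", "")),
       clusterName = tostring(column_ifexists("clusterName_s", ""))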
