
StorageStatus is not serializable #532

Open
thejohncostanzo opened this issue Oct 19, 2023 · 1 comment

Comments

@thejohncostanzo

In truth this is more an issue with Spark itself which is recorded here: https://issues.apache.org/jira/browse/SPARK-43108

When using the Spark Snowflake connector (2.11 and above, I believe), you will always see the following (harmless) stack trace on the driver:

Caused by: java.io.NotSerializableException: org.apache.spark.storage.StorageStatus
Serialization stack:
    - object not serializable (class: org.apache.spark.storage.StorageStatus, value: org.apache.spark.storage.StorageStatus@715b4e82)
    - element of array (index: 0)
    - array (class [Lorg.apache.spark.storage.StorageStatus;, size 2)
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:41)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
    at org.apache.spark.rpc.netty.NettyRpcEnv.serialize(NettyRpcEnv.scala:286)
    at org.apache.spark.rpc.netty.RemoteNettyRpcCallContext.send(NettyRpcCallContext.scala:64)
    at org.apache.spark.rpc.netty.NettyRpcCallContext.reply(NettyRpcCallContext.scala:32)
    at org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$receiveAndReply$1.applyOrElse(BlockManagerMasterEndpoint.scala:156)
    at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103)
    at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
    at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
    at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
    at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)

and a corresponding warning on the executor that triggered it (note there is no additional detail after this line):

WARN SnowflakeTelemetry$: Fail to get cluster statistic. reason: Exception thrown in awaitResult: 

This is caused by commit 2a3f090, which attempts to gather telemetry about the running Spark session. Specifically, the offending line is `SparkEnv.get.blockManager.master.getStorageStatus.length`, which attempts to count the number of nodes in the cluster. Given that StorageStatus is not serializable, I don't believe this call has ever succeeded (nor will it ever) until Spark actually fixes the issue upstream. I'm wondering if it makes sense to remove this line (or somehow get the data a different way) to clean up our logs and avoid having "ignorable" errors.
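For context, the failure is inherent to Java serialization: `ObjectOutputStream` throws `NotSerializableException` as soon as any object in the graph does not implement `Serializable`, which matches the `- element of array (index: 0)` line in the trace above. A minimal standalone illustration (the `StorageStatusLike` class is a hypothetical stand-in for `org.apache.spark.storage.StorageStatus`, which likewise does not implement `Serializable`):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;

// Stand-in for StorageStatus: a plain class with no Serializable marker.
class StorageStatusLike { }

public class Demo {
    public static void main(String[] args) throws IOException {
        // Mirrors the "array (class [Lorg.apache.spark.storage.StorageStatus;, size 2)"
        // entry in the serialization stack.
        Object[] statuses = { new StorageStatusLike(), new StorageStatusLike() };
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(statuses); // fails on element 0
        } catch (NotSerializableException e) {
            System.out.println("NotSerializableException: " + e.getMessage());
        }
    }
}
```

This is why the RPC reply carrying the `getStorageStatus` result can never be serialized with the `JavaSerializer`, regardless of what the connector does on its side.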

@hanamurayuki

Same here.
