DSE Spark: Setting SPARK_LOCAL_IP may fix something annoying by breaking something necessary
There is a bug in the Spark project, SPARK-12963, that I've hit on a couple of trouble tickets: setting SPARK_LOCAL_IP in spark-env.sh results in messages like this in the executor log when using --deploy-mode cluster:
Exception in thread "main" java.net.BindException: Failed to bind to: /10.1.1.7:0: Service 'Driver' failed after 16 retries!
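For reference, the trigger is simply exporting the variable in spark-env.sh. A minimal sketch of the kind of setting that causes this (using the same 10.1.1.7 address from the error above as a stand-in for your own private IP):

```bash
# spark-env.sh -- hypothetical example of the setting that triggers SPARK-12963.
# In cluster deploy mode the driver is launched on an arbitrary worker node,
# where this pinned address may not exist, so the 'Driver' service cannot bind.
export SPARK_LOCAL_IP=10.1.1.7
```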
The workaround for this is to comment out any setting of SPARK_LOCAL_IP. However, on cloud deployments that use public IPs, once you do that, running dse spark-shell or submitting jobs in client mode will fail with errors like this (typically because the public IP is NATed and isn't actually bound to any local interface, so the driver can't bind to it):
Exception in thread "main" java.net.BindException: Failed to bind to: /104.99.99.99:0: Service 'sparkDriver' failed after 16 retries!
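If you want to confirm that this is what's happening on your node, comparing what the hostname resolves to against what's actually bound locally usually makes it obvious. These are standard Linux commands, nothing DSE-specific:

```bash
# What does this node's hostname resolve to?
hostname -f        # fully qualified hostname
hostname -i        # the IP it resolves to -- on public-IP clouds this is often the NATed public address

# What addresses are actually bound to local interfaces?
ip addr show       # the public IP typically won't appear here, hence the BindException
```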
So the workaround to all this is pretty simple:
- Never ever set SPARK_LOCAL_IP until there is a fix for SPARK-12963
- For all other commands, pass the appropriate IP address explicitly; this is more effort, but it just works, as shown in the sketch after this list. (Note: I believe this applies to OSS Spark as well; just drop the dse prefix.)
- dse spark-submit --deploy-mode client --conf spark.driver.host=<routable ip>
- dse spark --master spark://<master ip you want>:7077
- dse spark-sql --master spark://<master ip you want>:7077
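Putting that together, a typical session on a public-IP cloud node might look like the following sketch. The IPs, class, and jar names are hypothetical placeholders; substitute the routable private addresses of your own driver host and Spark master:

```bash
# spark-env.sh: leave SPARK_LOCAL_IP commented out
# export SPARK_LOCAL_IP=...

# Client-mode submit: tell the driver to advertise a routable private IP
dse spark-submit --deploy-mode client \
  --conf spark.driver.host=10.1.1.7 \
  --class com.example.MyJob my-job.jar

# Interactive shells: point at the master's routable address explicitly
dse spark --master spark://10.1.1.5:7077
dse spark-sql --master spark://10.1.1.5:7077
```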