Friday, May 12, 2017

NameError: name 'spark' is not defined

When you start the "pyspark" shell script, a "spark" object is already defined:

SparkSession available as 'spark'.
>>> spark
<pyspark.sql.session.SparkSession object at 0x10d6dd898>
>>>

But when you start plain "python" or a Jupyter Notebook (IPython Notebook), there's no "spark" object:

>>> spark
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'spark' is not defined
>>>

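One workaround is to add the following line to your ".bash_profile", so that Python runs PySpark's own shell startup script when an interactive session starts: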

export PYTHONSTARTUP="${SPARK_HOME}/python/pyspark/shell.py"
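Python executes the file named by PYTHONSTARTUP before showing the first prompt of an interactive session, so any names that file defines (here, "spark" from shell.py) end up in your session. The sketch below imitates that step with the standard library's runpy, which can help you check what a startup file defines. load_startup_namespace is a hypothetical helper written for this post, not part of Python or Spark, and it only approximates what the interpreter does.

```python
import os
import runpy


def load_startup_namespace(path=None):
    """Run a startup file and return the global names it defines.

    Hypothetical helper: roughly mimics how the interactive interpreter
    runs the PYTHONSTARTUP file before the first prompt. Falls back to
    the PYTHONSTARTUP environment variable when no path is given.
    """
    path = path or os.environ.get("PYTHONSTARTUP")
    # Nothing to run: no path configured, or the file does not exist.
    if not path or not os.path.isfile(path):
        return {}
    # run_path executes the file and returns its resulting globals dict.
    return runpy.run_path(path)
```

For example, with the export above in effect, `"spark" in load_startup_namespace()` should be true, provided shell.py itself runs without errors in that interpreter.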

It works, but I'm not sure this is the right approach.
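If you would rather not touch your login profile, another option is to create the session yourself in the first notebook cell. This is a minimal sketch, assuming the pyspark package is already importable from your Python (for example, with ${SPARK_HOME}/python and its bundled py4j on sys.path). The function name get_spark and the default app name are hypothetical.

```python
def get_spark(app_name="notebook"):
    """Return a SparkSession, creating one only if none is active.

    Assumes pyspark is importable; the import is deferred so this
    function can be defined even where pyspark is not installed.
    """
    from pyspark.sql import SparkSession

    # getOrCreate() reuses an existing session or builds a new one.
    return SparkSession.builder.appName(app_name).getOrCreate()
```

In a notebook you would then run `spark = get_spark()`. Because getOrCreate() returns the already active session when there is one, re-running the cell does not start a second session.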
