Hortonworks Hadoop conf path:
/etc/hadoop/conf/
Hortonworks (HDP) does not set a $HADOOP_HOME environment variable, but it does set $SPARK_HOME (/usr/hdp/current/spark2-client);
/usr/hdp/current/spark2-client/conf is its configuration directory.
Spark logs:
The directory where they are located can be found by looking at your YARN configs (yarn.nodemanager.remote-app-log-dir and yarn.nodemanager.remote-app-log-dir-suffix)
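Those two settings live in yarn-site.xml under the conf path above. A minimal sketch of looking them up programmatically, assuming the standard /etc/hadoop/conf/yarn-site.xml location (the helper name `remote_app_log_dir` is illustrative, not a Hadoop API):

```python
# Sketch: read the two YARN remote-log-dir settings from yarn-site.xml.
# Assumes the Hadoop *-site.xml layout: <configuration> containing
# <property><name>...</name><value>...</value></property> entries.
import xml.etree.ElementTree as ET

def remote_app_log_dir(yarn_site_path):
    """Return (dir, suffix) for yarn.nodemanager.remote-app-log-dir[-suffix]."""
    wanted = {
        "yarn.nodemanager.remote-app-log-dir": None,
        "yarn.nodemanager.remote-app-log-dir-suffix": None,
    }
    root = ET.parse(yarn_site_path).getroot()
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name in wanted:
            wanted[name] = prop.findtext("value")
    return (wanted["yarn.nodemanager.remote-app-log-dir"],
            wanted["yarn.nodemanager.remote-app-log-dir-suffix"])
```

The aggregated logs for a finished application end up under roughly `<dir>/<user>/<suffix>/<application ID>` on HDFS, which is what `yarn logs` reads.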
If YARN is used as Spark's master, the logs can be viewed with:
usage: yarn logs -applicationId <application ID> [OPTIONS]
If spark-submit prints too much INFO-level output and you want to silence it, edit log4j.properties under $SPARK_HOME/conf, then add to the spark-submit command:
--driver-java-options "-Dlog4j.configuration=file:/path/to/log4j.properties" \
to filter unwanted log levels such as INFO from the screen. This only restricts the Spark driver's logging.
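A minimal sketch of such a log4j.properties, modeled on the conf/log4j.properties.template shipped with Spark 2.x (raising the root level from INFO to WARN is the part that silences the noise):

```properties
# Root logger at WARN: suppresses Spark's per-task INFO chatter.
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```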
To restrict the executors' logging as well (the logs that `yarn logs` shows), add another spark-submit option:
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties"
Note that executors resolve this path on their own machines, so the file usually has to be shipped to them too, e.g. with --files log4j.properties.
Final example (if log4j.properties already exists in the directory where the command is run, the full path is not needed):
spark-submit --master yarn --deploy-mode client \
--driver-java-options "-Dlog4j.configuration=log4j.properties" \
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
somescript.py