Spark 3.0 Python Tutorial: Table of Contents

Author: 马育民 • 2022-05-01 10:17 • Views: 10122

# Basics

1. [Spark 3.0 Tutorial: Introduction](https://www.malaoshi.top/show_1IXnL4WH3bt.html "Spark 3.0 Tutorial: Introduction")

### Installing Python

1. [Spark 3.0 Python Tutorial: Installing Python 3.8](https://www.malaoshi.top/show_1IX3ENLxOVSW.html "Spark 3.0 Python Tutorial: Installing Python 3.8")
2. [Configuring pip to Use a Mirror in China](https://www.malaoshi.top/show_1IXfXmuORyN.html "Configuring pip to Use a Mirror in China")
3. [Configuring a China pip Mirror for Python on CentOS](https://www.malaoshi.top/show_1IX3Ea4OxRXY.html "Configuring a China pip Mirror for Python on CentOS")

### Deploying Spark

1. [Spark 3.0 Tutorial: Deployment Modes](https://www.malaoshi.top/show_1IX2N2OfGEhI.html "Spark 3.0 Tutorial: Deployment Modes")
2. [Spark 3.0 Tutorial: Download Locations](https://www.malaoshi.top/show_1IX2N2NG9svB.html "Spark 3.0 Tutorial: Download Locations")
3. [Spark 3.0 Python Tutorial: Local Mode (Single Machine) Installation](https://www.malaoshi.top/show_1IX3EUPfK3iR.html "Spark 3.0 Python Tutorial: Local Mode (Single Machine) Installation")
4. [Spark 3.0 Python Tutorial: YARN Mode Installation (spark-3.0.0-bin-without-hadoop.tgz)](https://www.malaoshi.top/show_1IX3EZaQiGSn.html "Spark 3.0 Python Tutorial: YARN Mode Installation (spark-3.0.0-bin-without-hadoop.tgz)")
5. [Spark 3.0 Python Tutorial: YARN Mode Installation (spark-3.0.0-bin-hadoop3.2.tgz)](https://www.malaoshi.top/show_1IX3FUyvxD9G.html "Spark 3.0 Python Tutorial: YARN Mode Installation (spark-3.0.0-bin-hadoop3.2.tgz)")
6. [Spark 3.0 Python Tutorial: The pyspark Command](https://www.malaoshi.top/show_1IX3EWnrOnr0.html "Spark 3.0 Python Tutorial: The pyspark Command")
7. [Spark 3.0 Python Tutorial: spark-submit Overview and Parameters](https://www.malaoshi.top/show_1IX3HBDuKP44.html "Spark 3.0 Python Tutorial: spark-submit Overview and Parameters")

# PySpark Library

### Preparing the Development Environment

1. [Spark 3.0 Python Tutorial: PySpark Library Overview and Installation](https://www.malaoshi.top/show_1IX3EaEeqcXt.html "Spark 3.0 Python Tutorial: PySpark Library Overview and Installation")
2. [Spark 3.0 Python Tutorial: Preparing the Development Environment: Installing VS Code Extensions](https://www.malaoshi.top/show_1IX3Eaz4jjPX.html "Spark 3.0 Python Tutorial: Preparing the Development Environment: Installing VS Code Extensions")
3. [Spark 3.0 Python Tutorial: Preparing the Development Environment: Installing the findspark Library](https://www.malaoshi.top/show_1IX3EiKK7u68.html "Spark 3.0 Python Tutorial: Preparing the Development Environment: Installing the findspark Library")
4. [Spark 3.0 Python Tutorial: Purpose and Creation of SparkContext](https://www.malaoshi.top/show_1IX3Er1IsqHh.html "Spark 3.0 Python Tutorial: Purpose and Creation of SparkContext")
5. [Spark 3.0 Python Tutorial: Reading HDFS Files and Local Files with SparkContext](https://www.malaoshi.top/show_1IX3EtlTNWtv.html "Spark 3.0 Python Tutorial: Reading HDFS Files and Local Files with SparkContext")
6. [Spark 3.0 Python Tutorial: First Program: Reading an HDFS File and Counting Word Frequencies](https://www.malaoshi.top/show_1IX3ErDqq3kH.html "Spark 3.0 Python Tutorial: First Program: Reading an HDFS File and Counting Word Frequencies")

# SparkSQL

1. [Spark 3.0 Tutorial: Introduction to SparkSQL](https://www.malaoshi.top/show_1IX3EvrpzOjg.html "Spark 3.0 Tutorial: Introduction to SparkSQL")
2. [From Shark to SparkSQL, from Hive on Spark to Spark on Hive](https://www.malaoshi.top/show_1IX3F8Jxi6w9.html "From Shark to SparkSQL, from Hive on Spark to Spark on Hive")
3. [Spark 3.0 Python Tutorial: Purpose and Creation of SparkSession](https://www.malaoshi.top/show_1IX3FDPaVvTP.html "Spark 3.0 Python Tutorial: Purpose and Creation of SparkSession")
4. [Spark 3.0 Tutorial, SparkSQL: Introduction to DataFrame and How It Differs from RDD](https://www.malaoshi.top/show_1IX3FH9vZvkJ.html "Spark 3.0 Tutorial, SparkSQL: Introduction to DataFrame and How It Differs from RDD")
    - [Spark 3.0 Python Tutorial, SparkSQL: DataFrame Examples](https://www.malaoshi.top/show_1IX3FHlOgCca.html "Spark 3.0 Python Tutorial, SparkSQL: DataFrame Examples")
5. [Spark 3.0 Python Tutorial, SparkSQL: First Example Using SQL](https://www.malaoshi.top/show_1IX3FIFEcfVy.html "Spark 3.0 Python Tutorial, SparkSQL: First Example Using SQL")
6. [Spark 3.0 Python Tutorial, SparkSQL: The DSL Approach](https://www.malaoshi.top/show_1IX3FULL5XBn.html "Spark 3.0 Python Tutorial, SparkSQL: The DSL Approach")
7. [Spark 3.0 Tutorial, SparkSQL: Data Types](https://www.malaoshi.top/show_1IX3MFcE7USA.html "Spark 3.0 Tutorial, SparkSQL: Data Types")
8. [Spark 3.0 Python Tutorial, SparkSQL: Date Conversion with to_date()](https://www.malaoshi.top/show_1IX3MGFiNd8g.html "Spark 3.0 Python Tutorial, SparkSQL: Date Conversion with to_date()")
9. [Spark 3.0 Python Tutorial, SparkSQL: Date Conversion with DataFrame.cast()](https://www.malaoshi.top/show_1IX3MGMi1Pc3.html "Spark 3.0 Python Tutorial, SparkSQL: Date Conversion with DataFrame.cast()")

### Querying Hive Data (Spark on Hive)

1. [Spark 3.0 Tutorial: Why Spark Reads Hive Data, and Ways to Read Hive](https://www.malaoshi.top/show_1IX3DQddYLMK.html "Spark 3.0 Tutorial: Why Spark Reads Hive Data, and Ways to Read Hive")
2. [Spark 3.0 Tutorial, SparkSQL: Spark on Hive Configuration](https://www.malaoshi.top/show_1IX3FKi8qTfh.html "Spark 3.0 Tutorial, SparkSQL: Spark on Hive Configuration")
3. [Spark 3.0 Tutorial, SparkSQL: Working with Hive Using the spark-sql Command](https://www.malaoshi.top/show_1IX3FSHQdeUk.html "Spark 3.0 Tutorial, SparkSQL: Working with Hive Using the spark-sql Command")
4. [Spark 3.0 Python Tutorial, SparkSQL: Working with Hive Using the pyspark Command](https://www.malaoshi.top/show_1IX3FVAn3iNy.html "Spark 3.0 Python Tutorial, SparkSQL: Working with Hive Using the pyspark Command")
5. [Spark 3.0 Python Tutorial, SparkSQL: Connecting to Hive from Code Using the Metastore Service from the Configuration File](https://www.malaoshi.top/show_1IX3FY6Ws0Mg.html "Spark 3.0 Python Tutorial, SparkSQL: Connecting to Hive from Code Using the Metastore Service from the Configuration File")
6. [Spark 3.0 Python Tutorial, SparkSQL: Connecting to Hive from Code with an Explicitly Specified Metastore Service](https://www.malaoshi.top/show_1IX3FXqcvITE.html "Spark 3.0 Python Tutorial, SparkSQL: Connecting to Hive from Code with an Explicitly Specified Metastore Service")
7. [Spark 3.0 Python Tutorial, SparkSQL: Operating on Hive Tables from Code](https://www.malaoshi.top/show_1IX3FUoNCqJp.html "Spark 3.0 Python Tutorial, SparkSQL: Operating on Hive Tables from Code")
8. [Spark 3.0 Python Tutorial, SparkSQL: Shuffle Defaults to 200 Partitions; Setting the Partition Count (spark.sql.shuffle.partitions)](https://www.malaoshi.top/show_1IX3FeMc8ozc.html "Spark 3.0 Python Tutorial, SparkSQL: Shuffle Defaults to 200 Partitions; Setting the Partition Count (spark.sql.shuffle.partitions)")

### MySQL

1. [Spark 3.0 Python Tutorial, SparkSQL: Querying Hive and Saving the Results to MySQL](https://www.malaoshi.top/show_1IX3H4cMRafL.html "Spark 3.0 Python Tutorial, SparkSQL: Querying Hive and Saving the Results to MySQL")

### Common Errors

1. [pyspark.sql.utils.AnalysisException: Column not found in schema Some(StructType(StructField](https://www.malaoshi.top/show_1IX3H3xnv0b7.html "pyspark.sql.utils.AnalysisException: Column not found in schema Some(StructType(StructField")

# RDD Operators

1. [Spark 3.0 Tutorial: Spark Operators That Cause a Shuffle](https://www.malaoshi.top/show_1IX3Ff6TNhOz.html "Spark 3.0 Tutorial: Spark Operators That Cause a Shuffle")

# Other

1. [Spark 3.0 Python Tutorial: Importing Third-Party Libraries](https://www.malaoshi.top/show_1IX3HAQMcI4P.html "Spark 3.0 Python Tutorial: Importing Third-Party Libraries")

# Common Problems

1. [spark WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform](https://www.malaoshi.top/show_1IX3EStt6ZrZ.html "spark WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform")
2. [pyspark error: py4j.protocol.Py4JError (findspark)](https://www.malaoshi.top/show_1IX3EatKMmD9.html "pyspark error: py4j.protocol.Py4JError (findspark)")

Original source: http://malaoshi.top/show_1IX3EZnQEPa3.html