Spark 3.0 Tutorial: Table of Contents

Author: Ma Yumin • 2021-03-20 11:26

# Basics

1. [Spark 3.0 tutorial: Introduction](https://www.malaoshi.top/show_1IXnL4WH3bt.html "Spark 3.0 tutorial: Introduction")

### Deployment

1. [Spark 3.0 tutorial: Deployment modes](https://www.malaoshi.top/show_1IX2N2OfGEhI.html "Spark 3.0 tutorial: Deployment modes")
2. [Spark 3.0 tutorial: Download links](https://www.malaoshi.top/show_1IX2N2NG9svB.html "Spark 3.0 tutorial: Download links")
3. [Spark 3.0 tutorial: local mode (single-node) installation](https://www.malaoshi.top/show_1IXnO1VeRl7.html "Spark 3.0 tutorial: local mode (single-node) installation")
4. [Spark 3.0 tutorial: YARN deployment (spark-3.0.0-bin-without-hadoop)](https://www.malaoshi.top/show_1IXnbu9MnD6.html "Spark 3.0 tutorial: YARN deployment (spark-3.0.0-bin-without-hadoop)")
    - [After YARN deployment: test submitting an application (computing Pi)](https://www.malaoshi.top/show_1IX4QLWDDM8m.html "After YARN deployment: test submitting an application (computing Pi)")
5. [Spark 3.0 tutorial: YARN deployment (spark-3.0.0-bin-hadoop)](https://www.malaoshi.top/show_1IX4QF6kckQ0.html "Spark 3.0 tutorial: YARN deployment (spark-3.0.0-bin-hadoop)")
    - [After YARN deployment: test submitting an application (computing Pi)](https://www.malaoshi.top/show_1IX4QLWDDM8m.html "After YARN deployment: test submitting an application (computing Pi)")
6. [Spark 3.0 tutorial: the two ways to submit jobs in YARN mode: yarn client and yarn cluster](https://www.malaoshi.top/show_1IXncHuCyOm.html "Spark 3.0 tutorial: the two ways to submit jobs in YARN mode: yarn client and yarn cluster")
7. [Spark 3.0 tutorial: spark-submit introduction and parameters](https://www.malaoshi.top/show_1IXnhwPEDg0.html "Spark 3.0 tutorial: spark-submit introduction and parameters")
8. Common errors when submitting in client mode and accessing the web UI (port 4040)
    - [The Spark 3.0 web UI (port 4040) can only be accessed after startup succeeds; otherwise the request fails](https://www.malaoshi.top/show_1IX30lrv8mSn.html "The Spark 3.0 web UI (port 4040) can only be accessed after startup succeeds; otherwise the request fails")
    - [Spark error: Failed to get the application information. If you are starting up Spark, please wait a while until it's ready.](https://www.malaoshi.top/show_1IX30lvIrcoC.html "Spark error: Failed to get the application information. If you are starting up Spark, please wait a while until it's ready.")
9. [Spark 3.0 tutorial: configuring the history server](https://www.malaoshi.top/show_1IX2JIsrhe5k.html "Spark 3.0 tutorial: configuring the history server")
10. [Spark 3.0 tutorial: submitting jobs with spark-shell in yarn client mode](https://www.malaoshi.top/show_1IX2N3Ye8s1K.html "Spark 3.0 tutorial: submitting jobs with spark-shell in yarn client mode")
11. [Spark 3.0 tutorial: SparkContext](https://www.malaoshi.top/show_1IX1IoD545lJ.html "Spark 3.0 tutorial: SparkContext")
    - [Reading HDFS files and local files with a SparkContext](https://www.malaoshi.top/show_1IX3EtomRY83.html "Reading HDFS files and local files with a SparkContext")
12. [Spark 3.0 tutorial: architecture (Driver, Cluster Manager, Worker, Executor, Task, SparkContext)](https://www.malaoshi.top/show_1IX2N4JrcMrl.html "Spark 3.0 tutorial: architecture (Driver, Cluster Manager, Worker, Executor, Task, SparkContext)")
13. [Spark 3.0 tutorial: the three core data structures (RDD operators, accumulators, broadcast variables)](https://www.malaoshi.top/show_1IX1HUFYtkz2.html "Spark 3.0 tutorial: the three core data structures (RDD operators, accumulators, broadcast variables)")

### Common problems

1. [spark WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform](https://www.malaoshi.top/show_1IX3EStt6ZrZ.html "spark WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform")

# RDD operators

1. [Spark 3.0 tutorial: introduction to RDD operators and their five key properties](https://www.malaoshi.top/show_1IX1HcVRLUIg.html "Spark 3.0 tutorial: introduction to RDD operators and their five key properties")
2. [Spark 3.0 tutorial: RDD operator categories: transformations and actions](https://www.malaoshi.top/show_1IX1JWYZB4RI.html "Spark 3.0 tutorial: RDD operator categories: transformations and actions")
3. [Spark 3.0 tutorial: first program — implementing WordCount](https://www.malaoshi.top/show_1IX1I8nV4jl2.html "Spark 3.0 tutorial: first program — implementing WordCount")
4. [Spark 3.0 tutorial: sc.textFile source-code analysis and default parallelism (local development environment)](https://www.malaoshi.top/show_1IX4QQYpwoYT.html "Spark 3.0 tutorial: sc.textFile source-code analysis and default parallelism (local development environment)")
5. [Spark 3.0 tutorial: RDD parallelism and partitioning (RDDs created via textFile)](https://www.malaoshi.top/show_1IX1J0J944oV.html "Spark 3.0 tutorial: RDD parallelism and partitioning (RDDs created via textFile)")
6. [Spark 3.0 tutorial: sc.makeRDD() parallelism and partitioning](https://www.malaoshi.top/show_1IX1Inv4d9vX.html "Spark 3.0 tutorial: sc.makeRDD() parallelism and partitioning")

### Actions (commonly used)

1. [Spark 3.0 tutorial: RDD action collect()](https://www.malaoshi.top/show_1IX1Nwp41MSJ.html "Spark 3.0 tutorial: RDD action collect()")
2. [Spark 3.0 tutorial: RDD action foreach()](https://www.malaoshi.top/show_1IX1O4JB4Jjq.html "Spark 3.0 tutorial: RDD action foreach()")

### Transformations

Single-value type (requires one RDD)

1. [Spark 3.0 tutorial: RDD transformation map() (reading a file)](https://www.malaoshi.top/show_1IX1Jb4TZGjc.html "Spark 3.0 tutorial: RDD transformation map() (reading a file)")
2. [Spark 3.0 tutorial: execution order across multiple RDDs and multiple partitions](https://www.malaoshi.top/show_1IX1Jb2aRmZa.html "Spark 3.0 tutorial: execution order across multiple RDDs and multiple partitions")
3. [Spark 3.0 tutorial: most RDD transformations keep the number of partitions unchanged](https://www.malaoshi.top/show_1IX1KLdJt2JS.html "Spark 3.0 tutorial: most RDD transformations keep the number of partitions unchanged")
4. [Spark 3.0 tutorial: RDD transformation mapPartitions()](https://www.malaoshi.top/show_1IX1Jbo8yod6.html "Spark 3.0 tutorial: RDD transformation mapPartitions()")
5. [Spark 3.0 tutorial: RDD transformation mapPartitionsWithIndex()](https://www.malaoshi.top/show_1IX1KHGlTemS.html "Spark 3.0 tutorial: RDD transformation mapPartitionsWithIndex()")
6. [Spark 3.0 tutorial: RDD transformation flatMap()](https://www.malaoshi.top/show_1IX1KIHQguw8.html "Spark 3.0 tutorial: RDD transformation flatMap()")
7. [Spark 3.0 tutorial: RDD transformation glom()](https://www.malaoshi.top/show_1IX1KKTpVM9r.html "Spark 3.0 tutorial: RDD transformation glom()")
8. [Spark 3.0 tutorial: RDD transformation groupBy()](https://www.malaoshi.top/show_1IX1KMZjwqnz.html "Spark 3.0 tutorial: RDD transformation groupBy()")
9. [Spark 3.0 tutorial: RDD transformation filter()](https://www.malaoshi.top/show_1IX1KNPpKcLt.html "Spark 3.0 tutorial: RDD transformation filter()")
10. [Spark 3.0 tutorial: RDD transformation sample()](https://www.malaoshi.top/show_1IX1KSGY9Vgh.html "Spark 3.0 tutorial: RDD transformation sample()")
11. [Spark 3.0 tutorial: RDD transformation distinct()](https://www.malaoshi.top/show_1IX1KSRgXGX2.html "Spark 3.0 tutorial: RDD transformation distinct()")
12. [Spark 3.0 tutorial: RDD transformation coalesce() — merging partitions](https://www.malaoshi.top/show_1IX1KSrBN9X3.html "Spark 3.0 tutorial: RDD transformation coalesce() — merging partitions")
13. [Spark 3.0 tutorial: RDD transformation repartition() — repartitioning](https://www.malaoshi.top/show_1IX1KczFGS5G.html "Spark 3.0 tutorial: RDD transformation repartition() — repartitioning")
14. [Spark 3.0 tutorial: RDD transformation sortBy() — sorting](https://www.malaoshi.top/show_1IX1KeJFCvZ4.html "Spark 3.0 tutorial: RDD transformation sortBy() — sorting")

Double-value type (requires two RDDs)

15. [Spark 3.0 tutorial: RDD transformations intersection, union, and subtract](https://www.malaoshi.top/show_1IX1Kf63OTIt.html "Spark 3.0 tutorial: RDD transformations intersection, union, and subtract")
16. [Spark 3.0 tutorial: RDD transformation zip](https://www.malaoshi.top/show_1IX1KfSDHKvu.html "Spark 3.0 tutorial: RDD transformation zip")

Key-value type

17. [Spark 3.0 tutorial: RDD transformation partitionBy — repartitioning](https://www.malaoshi.top/show_1IX1KiqsqDMN.html "Spark 3.0 tutorial: RDD transformation partitionBy — repartitioning")
    - [Differences between the partitionBy, coalesce, and repartition operators](https://www.malaoshi.top/show_1IX318ZhHHPC.html "Differences between the partitionBy, coalesce, and repartition operators")
18. [Spark 3.0 tutorial: RDD transformation reduceByKey](https://www.malaoshi.top/show_1IX1LuTZM6Qg.html "Spark 3.0 tutorial: RDD transformation reduceByKey")
19. [Spark 3.0 tutorial: RDD transformation groupByKey()](https://www.malaoshi.top/show_1IX1MHHQaYET.html "Spark 3.0 tutorial: RDD transformation groupByKey()")
20. [Spark 3.0 tutorial: RDD transformation join()](https://www.malaoshi.top/show_1IX1NmrShE1D.html "Spark 3.0 tutorial: RDD transformation join()")
21. [Spark 3.0 tutorial: differences between join and zip](https://www.malaoshi.top/show_1IX1NntmgEeK.html "Spark 3.0 tutorial: differences between join and zip")
22. [Spark 3.0 tutorial: RDD transformation leftOuterJoin()](https://www.malaoshi.top/show_1IX1No1kbphZ.html "Spark 3.0 tutorial: RDD transformation leftOuterJoin()")
23. [Spark 3.0 tutorial: RDD transformation rightOuterJoin()](https://www.malaoshi.top/show_1IX1No5lTtEg.html "Spark 3.0 tutorial: RDD transformation rightOuterJoin()")
24. [Spark 3.0 tutorial: RDD transformation cogroup()](https://www.malaoshi.top/show_1IX1NoYkgsrj.html "Spark 3.0 tutorial: RDD transformation cogroup()")
25. [Spark 3.0 tutorial: differences between join() and cogroup()](https://www.malaoshi.top/show_1IX1NoeeF2z3.html "Spark 3.0 tutorial: differences between join() and cogroup()")
26. [Spark 3.0 tutorial: RDD transformation sortByKey() — sorting](https://www.malaoshi.top/show_1IX1NwK0602J.html "Spark 3.0 tutorial: RDD transformation sortByKey() — sorting")

### Actions

1. [Spark 3.0 tutorial: RDD action collect()](https://www.malaoshi.top/show_1IX1Nwp41MSJ.html "Spark 3.0 tutorial: RDD action collect()")
2. [Spark 3.0 tutorial: RDD action foreach()](https://www.malaoshi.top/show_1IX1O4JB4Jjq.html "Spark 3.0 tutorial: RDD action foreach()")
3. [Spark 3.0 tutorial: RDD action reduce()](https://www.malaoshi.top/show_1IX1Nz2Jznqz.html "Spark 3.0 tutorial: RDD action reduce()")
4. [Spark 3.0 tutorial: RDD action take()](https://www.malaoshi.top/show_1IX1NzEvoomH.html "Spark 3.0 tutorial: RDD action take()")
5. [Spark 3.0 tutorial: RDD action takeOrdered()](https://www.malaoshi.top/show_1IX1O0GCy1eb.html "Spark 3.0 tutorial: RDD action takeOrdered()")
6. [Spark 3.0 tutorial: RDD action aggregate()](https://www.malaoshi.top/show_1IX1O4vo6Rdy.html "Spark 3.0 tutorial: RDD action aggregate()")
7. [Spark 3.0 tutorial: RDD action fold()](https://www.malaoshi.top/show_1IX1O53v8sqG.html "Spark 3.0 tutorial: RDD action fold()")
8. [Spark 3.0 tutorial: RDD action countByKey()](https://www.malaoshi.top/show_1IX1O0aRjTGA.html "Spark 3.0 tutorial: RDD action countByKey()")
9. [Spark 3.0 tutorial: RDD action countByValue()](https://www.malaoshi.top/show_1IX1O0TKr1G0.html "Spark 3.0 tutorial: RDD action countByValue()")
10. [Spark 3.0 tutorial: RDD action count()](https://www.malaoshi.top/show_1IX1Nz5rvPq3.html "Spark 3.0 tutorial: RDD action count()")
11. [Spark 3.0 tutorial: RDD action first()](https://www.malaoshi.top/show_1IX1Nz9ZpSz7.html "Spark 3.0 tutorial: RDD action first()")

Saving

1. [Spark 3.0 tutorial: RDD action saveAsTextFile](https://www.malaoshi.top/show_1IX1O3DumCBR.html "Spark 3.0 tutorial: RDD action saveAsTextFile")
2. [Spark 3.0 tutorial: RDD action saveAsObjectFile](https://www.malaoshi.top/show_1IX1O3ECVafS.html "Spark 3.0 tutorial: RDD action saveAsObjectFile")
3. [Spark 3.0 tutorial: RDD action saveAsSequenceFile](https://www.malaoshi.top/show_1IX1O3Eg7xuT.html "Spark 3.0 tutorial: RDD action saveAsSequenceFile")

### Summary

1. [Spark 3.0 tutorial: operators that trigger a shuffle](https://www.malaoshi.top/show_1IX3Ff6TNhOz.html "Spark 3.0 tutorial: operators that trigger a shuffle")
2. [Spark 3.0: operators that can change the number of partitions](https://www.malaoshi.top/show_1IX4Qi9rd5y4.html "Spark 3.0: operators that can change the number of partitions")
3. [Spark 3.0 tutorial: partitioning — HashPartitioner, RangePartitioner, and custom partitioners](https://www.malaoshi.top/show_1IX3GB3C8ffR.html "Spark 3.0 tutorial: partitioning — HashPartitioner, RangePartitioner, and custom partitioners")
4. [Spark case studies](https://www.malaoshi.top/show_1IX4Ql4U8n9q.html "Spark case studies")

# Accumulators

1. [Spark 3.0 tutorial: accumulators](https://www.malaoshi.top/show_1IX2LyGDbh0K.html "Spark 3.0 tutorial: accumulators")
2. [Spark 3.0 tutorial: transformations do not execute accumulators; actions do](https://www.malaoshi.top/show_1IX2LyU5VX3s.html "Spark 3.0 tutorial: transformations do not execute accumulators; actions do")

# Broadcast variables

1. [Spark 3.0 tutorial: broadcast variables](https://www.malaoshi.top/show_1IX2LzETGvpi.html "Spark 3.0 tutorial: broadcast variables")

# SparkSQL

### Reading data from files

1. [Spark 3.0 tutorial: SparkSQL introduction](https://www.malaoshi.top/show_1IX3EvrpzOjg.html "Spark 3.0 tutorial: SparkSQL introduction")
    - [From Shark to SparkSQL, from Hive on Spark to Spark on Hive](https://www.malaoshi.top/show_1IX3F8Jxi6w9.html "From Shark to SparkSQL, from Hive on Spark to Spark on Hive")
2. [Spark 3.0 tutorial: SparkSession — purpose and creation](https://www.malaoshi.top/show_1IX3FDWSNamc.html "Spark 3.0 tutorial: SparkSession — purpose and creation")
3. [Spark 3.0 tutorial, SparkSQL: DataFrame introduction and differences from RDD](https://www.malaoshi.top/show_1IX3FH9vZvkJ.html "Spark 3.0 tutorial, SparkSQL: DataFrame introduction and differences from RDD")
4. [Spark 3.0 SparkSQL: first IDEA project](https://www.malaoshi.top/show_1IX2JNhLzqFZ.html "Spark 3.0 SparkSQL: first IDEA project")
5. [Spark 3.0 tutorial, SparkSQL: shuffle uses 200 partitions by default; setting the partition count (spark.sql.shuffle.partitions)](https://www.malaoshi.top/show_1IX3FehffWrH.html "Spark 3.0 tutorial, SparkSQL: shuffle uses 200 partitions by default; setting the partition count (spark.sql.shuffle.partitions)")
6. [Spark 3.0 SparkSQL: global and local temporary views](https://www.malaoshi.top/show_1IX2JOJYpTNj.html "Spark 3.0 SparkSQL: global and local temporary views")
7. [Spark 3.0 tutorial, SparkSQL: data types](https://www.malaoshi.top/show_1IX3MFcE7USA.html "Spark 3.0 tutorial, SparkSQL: data types")
8. [Spark 3.0 tutorial, SparkSQL: reading a CSV file with explicit types](https://www.malaoshi.top/show_1IX4K7YzJJOQ.html "Spark 3.0 tutorial, SparkSQL: reading a CSV file with explicit types")

### RDD, DataFrame, and Dataset

1. [Spark 3.0 tutorial, SparkSQL: converting an RDD to a DataFrame](https://www.malaoshi.top/show_1IX4K2m4KQv8.html "Spark 3.0 tutorial, SparkSQL: converting an RDD to a DataFrame")
2. [Spark 3.0 tutorial, SparkSQL: converting a DataFrame to an RDD](https://www.malaoshi.top/show_1IX4K3ZtyVI3.html "Spark 3.0 tutorial, SparkSQL: converting a DataFrame to an RDD")
3. [Spark 3.0 tutorial, SparkSQL: Dataset introduction](https://www.malaoshi.top/show_1IX4K3qFGfZ7.html "Spark 3.0 tutorial, SparkSQL: Dataset introduction")
4. [Spark 3.0 tutorial, SparkSQL: converting an RDD to a Dataset](https://www.malaoshi.top/show_1IX4K4ahJgeS.html "Spark 3.0 tutorial, SparkSQL: converting an RDD to a Dataset")
5. [Spark 3.0 tutorial, SparkSQL: converting a Dataset to an RDD](https://www.malaoshi.top/show_1IX4K88IAX89.html "Spark 3.0 tutorial, SparkSQL: converting a Dataset to an RDD")
6. [Spark 3.0 tutorial, SparkSQL: converting between DataFrame and Dataset](https://www.malaoshi.top/show_1IX4K7xbNaLU.html "Spark 3.0 tutorial, SparkSQL: converting between DataFrame and Dataset")

### Reading files

1. [Spark 3.0 tutorial, SparkSQL: load() — generic file reading and Parquet files](https://www.malaoshi.top/show_1IX4KJjnfmYJ.html "Spark 3.0 tutorial, SparkSQL: load() — generic file reading and Parquet files")
2. [Spark 3.0 tutorial, SparkSQL: read.json() — reading JSON files](https://www.malaoshi.top/show_1IX4KKKMDqRX.html "Spark 3.0 tutorial, SparkSQL: read.json() — reading JSON files")
3. [Spark 3.0 tutorial, SparkSQL: querying files directly without creating a table (suited to JSON)](https://www.malaoshi.top/show_1IX4KKegSzeQ.html "Spark 3.0 tutorial, SparkSQL: querying files directly without creating a table (suited to JSON)")

### Saving files

1. [Spark 3.0 tutorial, SparkSQL: save() — generic file saving and Parquet files](https://www.malaoshi.top/show_1IX4KJwpIBRr.html "Spark 3.0 tutorial, SparkSQL: save() — generic file saving and Parquet files")
2. [Spark 3.0 tutorial, SparkSQL: write.csv() — saving CSV files](https://www.malaoshi.top/show_1IX4KK4C0d56.html "Spark 3.0 tutorial, SparkSQL: write.csv() — saving CSV files")
3. [Spark 3.0 tutorial, SparkSQL: write.json() — saving JSON files](https://www.malaoshi.top/show_1IX4KKUuqyxx.html "Spark 3.0 tutorial, SparkSQL: write.json() — saving JSON files")
4. [Spark 3.0 tutorial, SparkSQL: mode() and SaveMode save modes](https://www.malaoshi.top/show_1IX4KKvtHCSw.html "Spark 3.0 tutorial, SparkSQL: mode() and SaveMode save modes")

### SQL

1. [Spark 3.0 tutorial, SparkSQL: pagination](https://www.malaoshi.top/show_1IX4RMIHbMsx.html "Spark 3.0 tutorial, SparkSQL: pagination")

### User-defined functions

1. [Spark 3.0 tutorial, SparkSQL: UDF — user-defined functions](https://www.malaoshi.top/show_1IX4K8Vpxnd0.html "Spark 3.0 tutorial, SparkSQL: UDF — user-defined functions")
2. [Spark 3.0 tutorial, SparkSQL: UDAF — user-defined aggregate functions](https://www.malaoshi.top/show_1IX4MdtdAhhx.html "Spark 3.0 tutorial, SparkSQL: UDAF — user-defined aggregate functions")

### Reading and writing MySQL

1. [Spark 3.0 tutorial, SparkSQL: connecting to MySQL and reading data](https://www.malaoshi.top/show_1IX4KLRThFTX.html "Spark 3.0 tutorial, SparkSQL: connecting to MySQL and reading data")
2. [Spark 3.0 tutorial, SparkSQL: connecting to MySQL and writing data](https://www.malaoshi.top/show_1IX4KLWwYS4o.html "Spark 3.0 tutorial, SparkSQL: connecting to MySQL and writing data")

### Reading Hive data

1. [Spark 3.0 tutorial: why Spark reads Hive data, and the ways to do it](https://www.malaoshi.top/show_1IX3DQddYLMK.html "Spark 3.0 tutorial: why Spark reads Hive data, and the ways to do it")
2. [Spark 3.0 tutorial, SparkSQL: Spark on Hive configuration](https://www.malaoshi.top/show_1IX3FKi8qTfh.html "Spark 3.0 tutorial, SparkSQL: Spark on Hive configuration")
3. [Spark 3.0 tutorial, SparkSQL: Spark on Hive startup](https://www.malaoshi.top/show_1IX4KOT84H9U.html "Spark 3.0 tutorial, SparkSQL: Spark on Hive startup")
4. [Spark 3.0 tutorial, SparkSQL: the spark-sql command — operating on Hive](https://www.malaoshi.top/show_1IX3FSHQdeUk.html "Spark 3.0 tutorial, SparkSQL: the spark-sql command — operating on Hive")
5. [Spark 3.0 tutorial, Spark on Hive: querying Hive tables from Spark in Scala code](https://www.malaoshi.top/show_1IX4KPSAbmdM.html "Spark 3.0 tutorial, Spark on Hive: querying Hive tables from Spark in Scala code")
    - [Spark reading Hive error: tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-](https://www.malaoshi.top/show_1IX4RSSPSpOs.html "Spark reading Hive error: tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-")

### DSL

1. [Spark 3.0 SparkSQL: DSL syntax](https://www.malaoshi.top/show_1IX4JkomRh21.html "Spark 3.0 SparkSQL: DSL syntax")
2. [Spark 3.0 SparkSQL DSL: first IDEA project](https://www.malaoshi.top/show_1IX4Jks1Ih1E.html "Spark 3.0 SparkSQL DSL: first IDEA project")
3. [Spark 3.0 SparkSQL DSL: grouped queries](https://www.malaoshi.top/show_1IX4RXGHpebt.html "Spark 3.0 SparkSQL DSL: grouped queries")

### Thrift JDBC Server

1. [Spark 3.0 tutorial: configuring and starting the Thrift JDBC Server](https://www.malaoshi.top/show_1IX4LcrvfTiW.html "Spark 3.0 tutorial: configuring and starting the Thrift JDBC Server")
2. [Spark 3.0 tutorial: creating startup scripts — starting the metastore (metadata) service and the thriftserver service](https://www.malaoshi.top/show_1IX4RqzhqTPn.html "Spark 3.0 tutorial: creating startup scripts — starting the metastore (metadata) service and the thriftserver service")

# Other

1. [Spark 3.0 tutorial: packaging, uploading to the server, and running](https://www.malaoshi.top/show_1IX2O4uz1VyD.html "Spark 3.0 tutorial: packaging, uploading to the server, and running")

Original source: http://malaoshi.top/show_1IXnoqdXeRM.html