hive3.1.x 窗口函数:row_number()、rank()、dense_rank()(查询统计 TopN)(student表案例) 作者:马育民 • 2021-04-23 23:56 • 阅读:10192 # 介绍 下面3个函数 搭配 `over (PARTITION by ... order by ...)` 使用 - `row_number()`:为每行增加一个序号,同一分区中,序号从1开始,下一行增加1 - `rank()`:与 `row_number()` 相同,但是,当 `order by `值相同时,**序号相同**,但会占用下一行的序号 - `dense_rank()`:与 `row_number()` 相同,但是,当 `order by `值相同时,**序号相同**,**不会** 占用下一行的序号 ### 应用场景 查询统计 TopN,如:微博热搜、排行榜等 # 例子 ### 建表 ``` create external table student( class string, name string, score int ) row format delimited fields terminated by ',' ``` ### 数据 ``` 李雷,90,java 韩梅梅,85,java lucy,92,android lili,92,android 马云,59,java 马化腾,82,android 扎克伯格,59,java 马斯克,75,android ``` ### 上传 略 ### SQL 根据 `class` 分区,每个区内根据 `score` 进行排序 ``` select class,name,score, row_number() over (PARTITION by class order by score desc) as row_num, rank() over (PARTITION by class order by score desc) as rnk, dense_rank() over (PARTITION by class order by score desc) as dns_rnk from student ``` 执行结果如下: [![](https://www.malaoshi.top/upload/pic/hive/Snipaste_2021-11-24_06-37-35.png)](https://www.malaoshi.top/upload/pic/hive/Snipaste_2021-11-24_06-37-35.png) **解释:** 根据 `class` 分区,每个区内根据 `score` 进行排序 - `row_number()`:不论 `score` 是多少,都逐行递增序号 - `rank()`:当 `score` 相同时, 序号相同,**但会占用下一行序号** - `dense_rank()`:当 `score` 相同时, 序号相同,**不会** 占用下一行序号 ### 要搭配 order by 使用 如果不加 `order by ...`,如下: ``` select class,name,score, row_number() over (PARTITION by class ) as row_num111, rank() over (PARTITION by class ) as rnk, dense_rank() over (PARTITION by class ) as dns_rnk from student ``` 执行如下图: [![](https://www.malaoshi.top/upload/pic/hive/Snipaste_2021-11-24_06-50-39.png)](https://www.malaoshi.top/upload/pic/hive/Snipaste_2021-11-24_06-50-39.png) - row_num:分区内序号没变,与上面相同,正确 - rnk、dns_rnk:**序号都是1,没有意义** # 案例 查询统计每个班级前2名的学生(每个班级只要2名) ### 分析 根据 `class` 分区,根据 `score` 倒序排序,由于 `每个班级只要2名`,所以取 `row_number()` ### SQL ``` with tbl as ( select class,name,score, row_number() over (PARTITION by class order by score desc) as row_num from student ) select class,name,score from tbl where row_num<=2 ``` 执行结果: |class |name|score| |-------|----|-----| |android|lili| 92| |android|lucy| 92| |java |李雷 | 90| |java |韩梅梅 | 85| # 案例2 查询统计每个班级前2名的学生(成绩相同的,不多占用名额) ### 分析 根据 `class` 分区,根据 `score` 倒序排序,由于 `成绩相同的,不多占用名额`,所以取 `row_number()` ### SQL ``` with tbl as ( select class,name,score, dense_rank() over (PARTITION by class order by score desc) as dns_rnk from student ) select * from tbl where dns_rnk<=2 ``` 执行结果: |class |name|score|dns_rnk| |-------|----|-----|-------| |android|lucy| 92| 1| |android|lili| 92| 1| |android|马化腾 | 82| 2| |java |李雷 | 90| 1| |java |韩梅梅 | 85| 2| 原文出处:http://malaoshi.top/show_1IX2Heaf4EtH.html