An Analysis of PostgreSQL Window Functions

2019-09-05

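Today I looked at how PostgreSQL implements row_number. I had long been curious about what window functions are and how they work, and this cleared up some of it. The walkthrough below uses row_number as the example.

Window functions: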

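A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. However, a window function does not cause rows to become grouped into a single output row the way a non-window aggregate call would. Instead, the rows retain their separate identities. Behind the scenes, the window function is able to access more than just the current row of the query result.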
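The difference between the two kinds of call can be sketched in plain C. This is only a toy model, not PostgreSQL code: toy_sum and toy_sum_over are hypothetical names, and toy_sum_over imitates what sum(x) OVER () would return for a single partition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Aggregate-style call: many input rows collapse into one output value. */
int64_t toy_sum(const int64_t *vals, size_t n)
{
    int64_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += vals[i];
    return total;
}

/*
 * Window-style call, imitating sum(x) OVER (): the same total is computed,
 * but every input row keeps its own output row.
 */
void toy_sum_over(const int64_t *vals, size_t n, int64_t *out)
{
    assert(n == 0 || (vals != NULL && out != NULL));

    int64_t total = toy_sum(vals, n);

    for (size_t i = 0; i < n; i++)
        out[i] = total;
}
```

With three input rows, toy_sum yields one value while toy_sum_over fills three output slots, one per row.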

Example of using row_number:

[postgres@shawnpc bin]$ ./psql 
psql (13devel)
Type "help" for help.

postgres=# select row_number() over() as rownum, id from aa;
 rownum | id 
--------+----
      1 |  1
      2 |  2
      3 |  3
      4 |  4
      5 |  5
      6 |  6
      7 |  7
      8 |  8
      9 |  9
     10 | 10
(10 rows)

postgres=#

The row_number code:

/*
 * row_number
 * just increment up from 1 until current partition finishes.
 */
Datum
window_row_number(PG_FUNCTION_ARGS)
{
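        WindowObject winobj = PG_WINDOW_OBJECT(); // fetch the WindowObject passed in through fcinfo->context
        int64           curpos = WinGetCurrentPosition(winobj); // 0-based position of the current row within its partition

        WinSetMarkPosition(winobj, curpos); // mark curpos: rows before it are no longer needed and may be discarded
        PG_RETURN_INT64(curpos + 1); // the row number is the 0-based position plus 1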
}
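To see why returning curpos + 1 is enough, here is a minimal standalone sketch. It assumes that the position restarts at 0 whenever a new partition begins, which is what the "until current partition finishes" comment implies. ToyWindowObject, toy_row_number and toy_number_rows are hypothetical names, and the partition keys are assumed to arrive already sorted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for WindowObject: it only tracks two positions. */
typedef struct ToyWindowObject
{
    int64_t current_pos;    /* 0-based position of the current row in its partition */
    int64_t mark_pos;       /* rows before this position are no longer needed */
} ToyWindowObject;

/* Mirrors the shape of window_row_number: read position, set mark, return pos + 1. */
int64_t toy_row_number(ToyWindowObject *winobj)
{
    int64_t curpos = winobj->current_pos;

    winobj->mark_pos = curpos;
    return curpos + 1;
}

/* Number the rows, restarting whenever the (already sorted) partition key changes. */
void toy_number_rows(const int *partition_keys, size_t nrows, int64_t *out)
{
    ToyWindowObject winobj = {0, 0};

    assert(nrows == 0 || (partition_keys != NULL && out != NULL));

    for (size_t i = 0; i < nrows; i++)
    {
        if (i > 0 && partition_keys[i] != partition_keys[i - 1])
        {
            /* a new partition begins: positions start over */
            winobj.current_pos = 0;
            winobj.mark_pos = 0;
        }
        out[i] = toy_row_number(&winobj);
        winobj.current_pos++;
    }
}
```

The function itself keeps no counter. All state lives in the window object, which the caller advances row by row.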

It looks very simple, but debugging shows that it is tightly coupled to plan execution. Set a breakpoint on the function:

Breakpoint 1, window_row_number (fcinfo=0x7ffc158cce90) at windowfuncs.c:83
83	{
(gdb) bt
#0  window_row_number (fcinfo=0x7ffc158cce90) at windowfuncs.c:83
#1  0x0000000000632956 in eval_windowfunction (perfuncstate=0x1ca3768, result=0x1ca3738, isnull=0x1ca3750, winstate=0x1ca23e8, 
    winstate=0x1ca23e8) at nodeWindowAgg.c:1056
#2  0x0000000000635174 in ExecWindowAgg (pstate=0x1ca23e8) at nodeWindowAgg.c:2198
#3  0x0000000000605b82 in ExecProcNode (node=0x1ca23e8) at ../../../src/include/executor/executor.h:240
#4  ExecutePlan (execute_once=<optimized out>, dest=0x1c125e8, direction=<optimized out>, numberTuples=0, sendTuples=true, 
    operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x1ca23e8, estate=0x1ca21c0) at execMain.c:1648
#5  standard_ExecutorRun (queryDesc=0x1c0eb70, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:365
#6  0x000000000074c81b in PortalRunSelect (portal=portal@entry=0x1c52e90, forward=forward@entry=true, count=0, count@entry=9223372036854775807, 
    dest=dest@entry=0x1c125e8) at pquery.c:929
#7  0x000000000074db60 in PortalRun (portal=portal@entry=0x1c52e90, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, 
    run_once=run_once@entry=true, dest=dest@entry=0x1c125e8, altdest=altdest@entry=0x1c125e8, 
    completionTag=completionTag@entry=0x7ffc158cd7e0 "") at pquery.c:770
#8  0x0000000000749bc6 in exec_simple_query (query_string=0x1becfa0 "select row_number() over() as rownum, id from aa;") at postgres.c:1231
#9  0x000000000074aea2 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x1c16f70, dbname=0x1c16e98 "postgres", username=<optimized out>)
    at postgres.c:4256
#10 0x000000000047e579 in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4446
#11 BackendStartup (port=0x1c0ee70) at postmaster.c:4137
#12 ServerLoop () at postmaster.c:1704
#13 0x00000000006ddb9d in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1be7bb0) at postmaster.c:1377
#14 0x000000000047f243 in main (argc=3, argv=0x1be7bb0) at main.c:210

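As the backtrace shows, row_number is called while the plan is being executed, from inside the WindowAgg plan node, not as a separate step before or after it. Execution first enters ExecutePlan: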

static void
ExecutePlan(EState *estate,
                        PlanState *planstate,
                        bool use_parallel_mode,
                        CmdType operation,
                        bool sendTuples,
                        uint64 numberTuples,
                        ScanDirection direction,
                        DestReceiver *dest,
                        bool execute_once)
{
        TupleTableSlot *slot;
        uint64          current_tuple_count;
        /* ... omitted ... */

        for (;;)
        {
                /* Reset the per-output-tuple exprcontext */
                ResetPerTupleExprContext(estate);

                /*
                 * Execute the plan and obtain a tuple
                 */
                slot = ExecProcNode(planstate);

        /* ... omitted ... */
}

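ExecutePlan calls ExecProcNode, an inline function in executor.h (not a macro) that dispatches through the plan node's function pointer, which here leads to ExecWindowAgg. ExecWindowAgg calls eval_windowfunction, and eval_windowfunction is what actually invokes row_number and sets up the data the call needs. Debugging shows that row_number is called once for every row.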
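The call chain above follows a pull model: the top loop asks the top node for one row at a time, and each node asks its child in turn. The sketch below is a heavily simplified illustration under my own assumptions, not the executor's real data structures. All Toy* names are hypothetical, there is a single partition, and the buffering a real WindowAgg node performs is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct ToyRow
{
    int64_t rownum;
    int64_t id;
} ToyRow;

typedef struct ToyNode ToyNode;
typedef bool (*ToyExecFn) (ToyNode *node, ToyRow *out);

struct ToyNode
{
    ToyExecFn      exec;    /* per-node callback that produces the next row */
    ToyNode       *child;   /* input node, NULL for a scan */
    const int64_t *ids;     /* scan only: the table contents */
    size_t         nrows;   /* scan only: number of rows */
    size_t         next;    /* scan only: next row to return */
    int64_t        calls;   /* window only: times the window function ran */
};

/* Dispatch through the node's callback; returns false when no rows remain. */
bool toy_exec_proc_node(ToyNode *node, ToyRow *out)
{
    assert(node != NULL && node->exec != NULL);
    return node->exec(node, out);
}

bool toy_scan_exec(ToyNode *node, ToyRow *out)
{
    if (node->next >= node->nrows)
        return false;
    out->id = node->ids[node->next++];
    out->rownum = 0;
    return true;
}

/* Pull one row from the child, then evaluate the "window function" for it. */
bool toy_window_exec(ToyNode *node, ToyRow *out)
{
    if (!toy_exec_proc_node(node->child, out))
        return false;
    node->calls++;
    out->rownum = node->calls;  /* single partition: row number equals call count */
    return true;
}

/* The top-level loop: keep pulling rows until the plan is exhausted. */
size_t toy_execute_plan(ToyNode *top, ToyRow *dest, size_t cap)
{
    size_t n = 0;

    for (;;)
    {
        ToyRow row;

        if (n >= cap || !toy_exec_proc_node(top, &row))
            break;
        dest[n++] = row;
    }
    return n;
}
```

Running a window node over a three-row scan leaves calls at 3, matching the once-per-row behavior seen in the debugger.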

eval_windowfunction:

/*
 * eval_windowfunction
 *
 * Arguments of window functions are not evaluated here, because a window
 * function can need random access to arbitrary rows in the partition.
 * The window function uses the special WinGetFuncArgInPartition and
 * WinGetFuncArgInFrame functions to evaluate the arguments for the rows
 * it wants.
 */
static void
eval_windowfunction(WindowAggState *winstate, WindowStatePerFunc perfuncstate,
                                        Datum *result, bool *isnull)
{
        LOCAL_FCINFO(fcinfo, FUNC_MAX_ARGS);
        MemoryContext oldContext;

        oldContext = MemoryContextSwitchTo(winstate->ss.ps.ps_ExprContext->ecxt_per_tuple_memory); // switch to the per-tuple memory context

        /*
         * We don't pass any normal arguments to a window function, but we do pass
         * it the number of arguments, in order to permit window function
         * implementations to support varying numbers of arguments.  The real info
         * goes through the WindowObject, which is passed via fcinfo->context.
         */
        InitFunctionCallInfoData(*fcinfo, &(perfuncstate->flinfo),
                                                         perfuncstate->numArguments,
                                                         perfuncstate->winCollation,
                                                         (void *) perfuncstate->winobj, NULL); // initialize fcinfo for the call below; the WindowObject travels in fcinfo->context
        /* Just in case, make all the regular argument slots be null */
        for (int argno = 0; argno < perfuncstate->numArguments; argno++)
                fcinfo->args[argno].isnull = true;
        /* Window functions don't have a current aggregate context, either */
        winstate->curaggcontext = NULL;

        *result = FunctionCallInvoke(fcinfo); // call the window function
        *isnull = fcinfo->isnull;

        /*
         * Make sure pass-by-ref data is allocated in the appropriate context. (We
         * need this in case the function returns a pointer into some short-lived
         * tuple, as is entirely possible.)
         */
        if (!perfuncstate->resulttypeByVal && !fcinfo->isnull &&
                !MemoryContextContains(CurrentMemoryContext,
                                                           DatumGetPointer(*result)))
                *result = datumCopy(*result,
                                                        perfuncstate->resulttypeByVal,
                                                        perfuncstate->resulttypeLen);
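        // as the comment above explains, a pass-by-reference result is copied into the current (per-tuple) context

        MemoryContextSwitchTo(oldContext); // switch back to the original context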
}
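The key design point in eval_windowfunction is that the window function receives no ordinary arguments. Its state arrives through a context pointer. The following sketch imitates that calling convention with hypothetical types (ToyCallInfo, ToyWinState). It is not the fmgr interface, and it leaves out memory contexts and the copying of pass-by-reference results.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ARGS 4

/* Hypothetical, much reduced analogue of the function call info structure. */
typedef struct ToyCallInfo
{
    void *context;                      /* side channel carrying the window state */
    int   nargs;                        /* argument count only, no argument values */
    bool  arg_isnull[TOY_MAX_ARGS];     /* every regular slot is marked null */
    bool  isnull;                       /* set by the callee if the result is null */
} ToyCallInfo;

typedef struct ToyWinState
{
    int64_t current_pos;                /* 0-based position of the current row */
} ToyWinState;

typedef int64_t (*ToyFunc) (ToyCallInfo *fcinfo);

/* The callee reads everything it needs from fcinfo->context. */
int64_t toy_window_row_number(ToyCallInfo *fcinfo)
{
    ToyWinState *win = (ToyWinState *) fcinfo->context;

    fcinfo->isnull = false;
    return win->current_pos + 1;
}

/* The caller: build the call info, null out the argument slots, invoke. */
void toy_eval_windowfunction(ToyFunc fn, ToyWinState *win, int nargs,
                             int64_t *result, bool *isnull)
{
    ToyCallInfo fcinfo;

    assert(fn != NULL && win != NULL && result != NULL && isnull != NULL);
    assert(nargs >= 0 && nargs <= TOY_MAX_ARGS);

    fcinfo.context = win;
    fcinfo.nargs = nargs;
    fcinfo.isnull = false;
    for (int i = 0; i < TOY_MAX_ARGS; i++)
        fcinfo.arg_isnull[i] = true;

    *result = fn(&fcinfo);
    *isnull = fcinfo.isnull;
}
```

Because the state travels out of band, the same caller can drive any function with the ToyFunc signature without knowing what it reads from the window state.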

This concludes the analysis.

Original article: https://my.oschina.net/Suregogo/blog/3102449
