An Analysis of Greenplum's Aggregate Structures

Executing an aggregate requires state information, which is managed by the AggState struct. The struct is defined as follows:

typedef struct AggState
{
    ScanState   ss;                 /* its first field is NodeTag */
    List       *aggs;               /* all Aggref nodes in targetlist & quals */
    int         numaggs;            /* length of list (could be zero!) */
    int         numtrans;           /* number of pertrans items */
    AggStrategy aggstrategy;        /* strategy mode */
    AggSplit    aggsplit;           /* agg-splitting mode, see nodes.h */
    AggStatePerPhase phase;         /* pointer to current phase data */
    int         numphases;          /* number of phases (including phase 0) */
    int         current_phase;      /* current phase number */
    AggStatePerAgg peragg;          /* per-Aggref information */
    AggStatePerTrans pertrans;      /* per-Trans state information */
    ExprContext *hashcontext;       /* econtexts for long-lived data (hashtable) */
    ExprContext **aggcontexts;      /* econtexts for long-lived data (per GS) */
    ExprContext *tmpcontext;        /* econtext for input expressions */
#define FIELDNO_AGGSTATE_CURAGGCONTEXT 14
    ExprContext *curaggcontext;     /* currently active aggcontext */
    AggStatePerAgg curperagg;       /* currently active aggregate, if any */
#define FIELDNO_AGGSTATE_CURPERTRANS 16
    AggStatePerTrans curpertrans;   /* currently active trans state, if any */
    bool        input_done;         /* indicates end of input */
    bool        agg_done;           /* indicates completion of Agg scan */
    int         projected_set;      /* The last projected grouping set */
#define FIELDNO_AGGSTATE_CURRENT_SET 20
    int         current_set;        /* The current grouping set being evaluated */
    Bitmapset  *grouped_cols;       /* grouped cols in current projection */
    List       *all_grouped_cols;   /* list of all grouped cols in DESC order */
    /* These fields are for grouping set phase data */
    int         maxsets;            /* The max number of sets in any phase */
    AggStatePerPhase phases;        /* array of all phases */
    Tuplesortstate *sort_in;        /* sorted input to phases > 1 */
    Tuplesortstate *sort_out;       /* input is copied here for next phase */
    TupleTableSlot *sort_slot;      /* slot for sort results */
    /* these fields are used in AGG_PLAIN and AGG_SORTED modes: */
    AggStatePerGroup *pergroups;    /* grouping set indexed array of per-group
                                     * pointers */
    HeapTuple   grp_firstTuple;     /* copy of first tuple of current group */
    /* these fields are used in AGG_HASHED and AGG_MIXED modes: */
    bool        table_filled;       /* hash table filled yet? */
    int         num_hashes;
    MemoryContext hash_metacxt;     /* memory for hash table itself */
    struct HashTapeInfo *hash_tapeinfo; /* metadata for spill tapes */
    struct HashAggSpill *hash_spills;   /* HashAggSpill for each grouping set,
                                           exists only during first pass */
    TupleTableSlot *hash_spill_slot;    /* slot for reading from spill files */
    List       *hash_batches;       /* hash batches remaining to be processed */
    bool        hash_ever_spilled;  /* ever spilled during this execution? */
    bool        hash_spill_mode;    /* we hit a limit during the current batch
                                       and we must not create new groups */
    Size        hash_mem_limit;     /* limit before spilling hash table */
    uint64      hash_ngroups_limit; /* limit before spilling hash table */
    int         hash_planned_partitions;    /* number of partitions planned
                                               for first pass */
    double      hashentrysize;      /* estimate revised during execution */
    Size        hash_mem_peak;      /* peak hash table memory usage */
    uint64      hash_ngroups_current;   /* number of groups currently in
                                           memory in all hash tables */
    uint64      hash_disk_used;     /* kB of disk space used */
    int         hash_batches_used;  /* batches used during entire execution */

    AggStatePerHash perhash;        /* array of per-hashtable data */
    AggStatePerGroup *hash_pergroup;    /* grouping set indexed array of
                                         * per-group pointers */

    /* support for evaluation of agg input expressions: */
#define FIELDNO_AGGSTATE_ALL_PERGROUPS 49
    AggStatePerGroup *all_pergroups;    /* array of first ->pergroups, than
                                         * ->hash_pergroup */
    ProjectionInfo *combinedproj;   /* projection machinery */

    int         group_id;           /* GROUP_ID in current projection. This is passed
                                     * to GroupingSetId expressions, similar to the
                                     * 'grouped_cols' value. */
    int         gset_id;

    /* if input tuple has an AggExprId, save the Attribute Number */
    Index       AggExprId_AttrNum;
} AggState;

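The figure below shows how these structures relate to one another, using a projection that contains an aggregate as the example: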

[Figure: Greenplum aggregate structure diagram]

The members of AggState are described in turn below.

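ScanState (the ss member) contains a PlanState, which describes the plan node of the aggregate operator. PlanState holds the projection information and a pointer to the plan-tree node. The targetlist of the plan node (Plan) carries the aggregate-related information, such as Aggref nodes; the aggref.args list, for example, records which column the aggregate is applied to.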

The aggs list in AggState holds the descriptions of all aggregate function calls. Each entry is ultimately an Aggref that points into the Plan's targetlist.

aggstrategy selects the aggregation strategy. There are four:

typedef enum AggStrategy
{
    AGG_PLAIN,      /* simple agg across all input rows */
    AGG_SORTED,     /* grouped agg, input must be sorted */
    AGG_HASHED,     /* grouped agg, use internal hashtable */
    AGG_MIXED       /* grouped agg, hash and sort both used */
} AggStrategy;
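
To make the difference between the first two strategies concrete, here is a toy sketch in plain C. It is not Greenplum executor code: ToyRow, ToyGroup, toy_plain_sum and toy_sorted_sum are hypothetical names invented for this illustration. The plain variant folds every row into a single result, while the sorted variant relies on the input being ordered by key so that a change of key marks the end of a group.

```c
#include <stddef.h>

/* Toy input row (key, value) and toy output row; neither is a Greenplum type. */
typedef struct ToyRow { int key; long value; } ToyRow;
typedef struct ToyGroup { int key; long sum; } ToyGroup;

/* AGG_PLAIN-style: no grouping, one result across all input rows. */
static long
toy_plain_sum(const ToyRow *rows, size_t nrows)
{
    long sum = 0;

    for (size_t i = 0; i < nrows; i++)
        sum += rows[i].value;
    return sum;
}

/*
 * AGG_SORTED-style: the input must already be ordered by key, so a new
 * group starts exactly when the key differs from the previous row's key.
 * Only the most recently opened group is ever updated.
 * Returns the number of groups written to out (at most nrows).
 */
static size_t
toy_sorted_sum(const ToyRow *rows, size_t nrows, ToyGroup *out)
{
    size_t ngroups = 0;

    for (size_t i = 0; i < nrows; i++)
    {
        if (ngroups == 0 || out[ngroups - 1].key != rows[i].key)
        {
            /* key changed: open a new group with an empty running sum */
            out[ngroups].key = rows[i].key;
            out[ngroups].sum = 0;
            ngroups++;
        }
        out[ngroups - 1].sum += rows[i].value;
    }
    return ngroups;
}
```

A hashed strategy would drop the ordering requirement by looking each key up in a hash table instead of comparing it with the previous row, at the cost of keeping every group's state in memory at once.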

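phase: the evaluation steps for the aggregates' transition functions (for example, the summing function behind avg). No expression steps are generated for the final function; it is called directly in finalize_aggregate instead.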
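
The split between a transition function and a final function can be sketched with avg. The code below is a simplified model, not the Greenplum implementation: ToyAvgState, toy_avg_trans and toy_avg_final are hypothetical names, and the state layout (a count plus a sum) is an assumption chosen for clarity. The transition step runs once per input row; the final step runs once per group, which is consistent with it being invoked directly rather than compiled into per-row expression steps.

```c
#include <stdbool.h>

/* Hypothetical running state for avg: rows seen so far and their sum. */
typedef struct ToyAvgState
{
    long   count;
    double sum;
} ToyAvgState;

/* Transition step: fold one input value into the running state. */
static void
toy_avg_trans(ToyAvgState *state, double input)
{
    state->count += 1;
    state->sum += input;
}

/*
 * Final step: turn the running state into the result, once per group.
 * Returns false for an empty group (the toy stand-in for SQL NULL),
 * in which case *result is left untouched.
 */
static bool
toy_avg_final(const ToyAvgState *state, double *result)
{
    if (state->count == 0)
        return false;
    *result = state->sum / (double) state->count;
    return true;
}
```

For the inputs 1, 2 and 6, three transition calls leave the state at count 3 and sum 9, and the single final call divides them.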

peragg: metadata for the aggregates' final functions. This is an array that describes the final function of every aggregate.

pertrans: metadata for the aggregates' transition functions. This is also an array.

pergroups: the return value of each transition function, that is, the running transition value kept for each group.
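
How the three arrays cooperate can be sketched as follows. This is a toy model under stated assumptions, not the real definitions: the Toy* types and functions are hypothetical, and the transno field reflects my understanding that each per-aggregate entry records which transition state it reads. Because AggState keeps numaggs and numtrans as separate counts, the sketch lets two aggregates (a sum and an average) share one transition state.

```c
#include <stddef.h>

/* Toy per-group value: the running result of one transition function. */
typedef struct ToyGroupState { long count; double sum; } ToyGroupState;

/* Toy pertrans entry: metadata for one transition function. */
typedef struct ToyPerTrans
{
    void (*transfn)(ToyGroupState *state, double input);
} ToyPerTrans;

/* Toy peragg entry: metadata for one final function. */
typedef struct ToyPerAgg
{
    int    transno;     /* assumed: index of the transition state to read */
    double (*finalfn)(const ToyGroupState *state);
} ToyPerAgg;

static void toy_accum(ToyGroupState *s, double x) { s->count++; s->sum += x; }
static double toy_final_sum(const ToyGroupState *s) { return s->sum; }

/* Simplification: an empty group yields 0.0 here instead of SQL NULL. */
static double
toy_final_avg(const ToyGroupState *s)
{
    return s->count ? s->sum / (double) s->count : 0.0;
}

/* Per input row: advance every transition state of the current group. */
static void
toy_advance(const ToyPerTrans *pertrans, ToyGroupState *pergroup,
            int numtrans, double input)
{
    for (int t = 0; t < numtrans; t++)
        pertrans[t].transfn(&pergroup[t], input);
}

/* Per group: run each aggregate's final function on its transition state. */
static void
toy_finalize(const ToyPerAgg *peragg, const ToyGroupState *pergroup,
             int numaggs, double *results)
{
    for (int a = 0; a < numaggs; a++)
        results[a] = peragg[a].finalfn(&pergroup[peragg[a].transno]);
}
```

With one pertrans entry (toy_accum) and two peragg entries that both use transno 0, feeding the values 2 and 4 and then finalizing gives 6 for the sum and 3 for the average, while the transition function ran only once per row.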