#21 Journey of a Query inside Postgresql
Good day and welcome to yet another blog on query execution and the various stages involved in processing a query inside postgresql. I will try and cover the basics in this article and get to some of the internals involved, if the blog extends a bit then we shall continue it in another post.
So what is the idea behind running a query in postgresql? how does a query travel inside the PG engine? How does postgresql breaks down a complex query or even the simpler ones and gives you the results in a jiffy? Here's how ....
Let's take a look at a generic server\client architecture first.
So an application will have a UI and backend and then it will be asking (authenticating) to postgresql via usual protocols and then we have a server process spawned that will execute the queries for us.
Now moving inside Postgresql, there is a lot happening inside of the engine which processes the query and give the results back to the user, i will not be covering the engine in this blog as that is for a separate blog post(keep an eye on the upcoming blogs). So let's see what our database server communicates from inside:
So the server gets the requests and then processes it with various compute allocations that take place in storage and memory area. Now the intent of this blog to go through the lifecycle of a query when and once the requests comes at the server, now that we have established a basic idea on how we are going to interact with the application and within the database server lets go ahead and take a look at the stages, that query goes through inside PG to process a client request.
The query once sent from the client to db server ends up with postmaster spawning a backend and completing the query in the below mentioned stages:
- Parser
generation of a parse tree from a SQL statement in plain text.
2. Query Analyzer
The analyzer does an analysis of a parse tree and generates a query tree, it does this by checking the semantics of the tree generated by the parser.
3. Rewriter
The rewrite system takes the query tree created by the parser & Analyzer stage and looks for any rules (stored in the system catalogs) to apply to the query tree. It performs the transformations given in the rule bodies.
4. Planner
The planner/optimizer takes the (rewritten) query tree and creates a query plan that will be the input to the executor. It does so by first creating all possible paths leading to the same result. For example if there is an index on a relation to be scanned, there are two paths for the scan. One possibility is a simple sequential scan and the other possibility is to use the index. Next the cost for the execution of each path is estimated and the cheapest path is chosen. The cheapest path is expanded into a complete plan that the executor can use.
5. Executor
The executor recursively steps through the plan tree and retrieves rows in the way represented by the plan. The executor makes use of the storage system while scanning relations, performs sorts and joins, evaluates qualifications and finally hands back the rows derived.
- How does the Parser work then ?
Well lets take an example of the below mentioned query:
SELECT id, name FROM tbl1 WHERE id >2 ORDER BY name;
Before jumping into the query itself you might want to take a look at the parse tree which has a root node called "SelectStmt" , the definition can be found here: parsenodes.h.
The struct itself looks like the one mentioned below:
typedef struct SelectStmt
{
NodeTag type;
/*
* These fields are used only in "leaf" SelectStmts.
*/
List *distinctClause; /* NULL, list of DISTINCT ON exprs, or
* lcons(NIL,NIL) for all (SELECT DISTINCT) */
IntoClause *intoClause; /* target for SELECT INTO */
List *targetList; /* the target list (of ResTarget) */
List *fromClause; /* the FROM clause */
Node *whereClause; /* WHERE qualification */
List *groupClause; /* GROUP BY clauses */
Node *havingClause; /* HAVING conditional-expression */
List *windowClause; /* WINDOW window_name AS (...), ... */
/*
* In a "leaf" node representing a VALUES list, the above fields are all
* null, and instead this field is set. Note that the elements of the
* sublists are just expressions, without ResTarget decoration. Also note
* that a list element can be DEFAULT (represented as a SetToDefault
* node), regardless of the context of the VALUES list. It's up to parse
* analysis to reject that where not valid.
*/
List *valuesLists; /* untransformed list of expression lists */
/*
* These fields are used in both "leaf" SelectStmts and upper-level
* SelectStmts.
*/
List *sortClause; /* sort clause (a list of SortBy's) */
Node *limitOffset; /* # of result tuples to skip */
Node *limitCount; /* # of result tuples to return */
LimitOption limitOption; /* limit type */
List *lockingClause; /* FOR UPDATE (list of LockingClause's) */
WithClause *withClause; /* WITH clause */
/*
* These fields are used only in upper-level SelectStmts.
*/
SetOperation op; /* type of set op */
bool all; /* ALL specified? */
struct SelectStmt *larg; /* left child */
struct SelectStmt *rarg; /* right child */
/* Eventually add fields for CORRESPONDING spec here */
}
SelectStmt;
So what does this mean for a query coming in? Let's looks at the query form parser's perspective, it will see the query in the form of lists and nodes,
select id, name (list1) >>>> these being the target(targetlist) values to be obtained from the query,
from tabl1 (list2) >>>>> (see the source code above("from clause"))
"where caluse" >>>>>> will come under a node, it will also check for expression, ">"
and finally the sort operation will comes under a list as well.
So at the end of parsing the parser will have 3 list and one node element in the query.
Now, the most important point to note is that parser "never" checks for semantics of a query, meaning if a table doesn't exists it will not worry about it but it will surely check the syntax is right or not. Analyzer\Analyser is our guy who checks for the semantics of a query.
2. Analyzer
As mentioned previosuly analyzer will go through the parse tree and check for the semantics basically an analysis of objects existing or not, refer to the source code here:
/*
* Query -
* Parse analysis turns all statements into a Query tree
* for further processing by the rewriter and planner.
*
* Utility statements (i.e. non-optimizable statements) have the
* utilityStmt field set, and the Query itself is mostly dummy.
* DECLARE CURSOR is a special case: it is represented like a SELECT,
* but the original DeclareCursorStmt is stored in utilityStmt.
*
* Planning converts a Query tree into a Plan tree headed by a PlannedStmt
* node --- the Query structure is not used by the executor.
*/
typedef struct Query
{
NodeTag type;
CmdType commandType; /* select|insert|update|delete|utility */
QuerySource querySource; /* where did I come from? */
uint32 queryId; /* query identifier (can be set by plugins) */
bool canSetTag; /* do I set the command result tag? */
Node *utilityStmt; /* non-null if this is DECLARE CURSOR or a non-optimizable statement */
int resultRelation; /* rtable index of target relation for INSERT/UPDATE/DELETE; 0 for SELECT */
bool hasAggs; /* has aggregates in tlist or havingQual */
bool hasWindowFuncs; /* has window functions in tlist */
bool hasSubLinks; /* has subquery SubLink */
bool hasDistinctOn; /* distinctClause is from DISTINCT ON */
bool hasRecursive; /* WITH RECURSIVE was specified */
bool hasModifyingCTE; /* has INSERT/UPDATE/DELETE in WITH */
bool hasForUpdate; /* FOR [KEY] UPDATE/SHARE was specified */
bool hasRowSecurity; /* row security applied? */
List *cteList; /* WITH list (of CommonTableExpr's) */
List *rtable; /* list of range table entries */
FromExpr *jointree; /* table join tree (FROM and WHERE clauses) */
List *targetList; /* target list (of TargetEntry) */
List *withCheckOptions; /* a list of WithCheckOption's */
OnConflictExpr *onConflict; /* ON CONFLICT DO [NOTHING | UPDATE] */
List *returningList; /* return-values list (of TargetEntry) */
List *groupClause; /* a list of SortGroupClause's */
List *groupingSets; /* a list of GroupingSet's if present */
Node *havingQual; /* qualifications applied to groups */
List *windowClause; /* a list of WindowClause's */
List *distinctClause; /* a list of SortGroupClause's */
List *sortClause; /* a list of SortGroupClause's */
Node *limitOffset; /* # of result tuples to skip (int8 expr) */
Node *limitCount; /* # of result tuples to return (int8 expr) */
List *rowMarks; /* a list of RowMarkClause's */
Node *setOperations; /* set-operation tree if this is top level of a UNION/INTERSECT/EXCEPT query */
List *constraintDeps; /* a list of pg_constraint OIDs that the query
depends on to be semantically valid */
}
Query;
I would strongly recommend going through the code above to have a better understanding so you can relate to things going on.
So now a parse tree has come to query analyzer and it will create a query tree. We saw our parse tree had 3 lists and one node. Now the query tree will user the same to check so it will look something like this:
Query:
SELECT id, name FROM tbl1 WHERE id >2 ORDER BY name;
Parse tree:
id, name (List, TargetList)
tbl1(List, "from clause")
where id>2(Node, "whereclause")
order by data(List, sortClause)
Query tree will be similar except for the fact now its checking of the data supplied exists or not and it can throw an error if it doesn't and instead of a NODE element there will be a "FromEXpr" with three list elements(targetList, rtable, sortclause and the jointree)
Query tree:
id, name (List, TargetList , this will have resno, resname, resorigtbl)
tbl1(List, rtable, checks for the relid of the table here(rtekind, relid, subquery(NULL) )
id>2 (FromExpr(see source code), ">", variable check "id", CONST: 2)
order by data(List, sortClause)
If the above doesn't make any sense, you should go through the code again.
3. Rewriter
So the "Rule system" is what transforms the query according to the rules stored in "pg_rules" system catalog(if required). I will not be covering the rule system, if interested please go to the link here .
4, 5. Planner & Executor
So now when our query has travelled all the way to the Planner and executor without any syntax errors, semantic checks we plan it and then execute it.
The query plan when an explain is fired against it, looks like this:
QUERY PLAN --------------------------------------------------------------------- Sort (cost=44.33..45.38 rows=423 width=36) Output: id, name Sort Key: tbl1.name DESC -> Seq Scan on public.tbl1 (cost=0.00..25.88 rows=423 width=36) Output: id, name Filter: (tbl1.id > 2) (6 rows)
The EXPLAIN plan and the plan tree have a direct correlation(duh!). so this plan has a sort node and a seq scan node, which are nothing but the plan nodes on a plan tree(NOT TO BE CONFUSE WITH PARSE TREE). Also, i'm not covering plan tree in this blog, to not a make a movie script out of it.
Therefore the executor scans the table using a seq scan and then sorts to give the results(see the query plan above).
NOTE: When accessing rows\tuples from a table there is the well known mechanism which is called MVCC, to ensure consistency and isolation of transactions.
Executor, uses the buffer manager when it processes the query to read and write to tables & indexes. It also touches temp_buffers, work_mem(this is pre-allocated) and creates temp files if there is a need for it.
So that is a little bit about the life of a query inside postgresql, I have tried not to over extend the blog with details as everything cannot be covered in a single blog for something that has been developed and written over a period of decades. I shall cover cost estimations in my upcoming blogs.
As I always believe learning is a continuous process & there is sheer excitement in anything new that pops up everyday. Feel free to leave comments, feedback that I can learn from and help you guys with what I know.
"To getting to know the unknown is knowledge", Happy Holidays! & keep learning.
Your writing skills are improved, 10 fold! 👏