Lateral Derived Optimization
Documents Lateral Derived Optimization, also referred to as Split Grouping Optimization or Split Materialized Optimization.
MariaDB supports the Lateral Derived optimization, also referred to as Split Grouping Optimization or Split Materialized Optimization.
Description
The optimization use cases are:
The query uses a derived table (or a
VIEW, or a non-recursive CTE).The derived table, view, or CTE has a
GROUP BYoperation as its top-level operation.The query only needs data from a few
GROUP BYgroups.
Example: A VIEW that computes totals for each customer in October:
CREATE VIEW OCT_TOTALS AS
SELECT
customer_id,
SUM(amount) AS TOTAL_AMT
FROM orders
WHERE
order_date BETWEEN '2017-10-01' AND '2017-10-31'
GROUP BY
customer_id;Example: A query that does a join with the customer table to get October totals for Customer#1 and Customer#2:
SELECT *
FROM
customer, OCT_TOTALS
WHERE
customer.customer_id=OCT_TOTALS.customer_id AND
customer.customer_name IN ('Customer#1', 'Customer#2')Before Lateral Derived optimization, MariaDB executed the query as follows:
Materialize the view
OCT_TOTALS. This essentially computesOCT_TOTALSfor all customers.Join it with table customer.
The EXPLAIN looked like this:
+------+-------------+------------+-------+---------------+-----------+---------+---------------------------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+------------+-------+---------------+-----------+---------+---------------------------+-------+--------------------------+
| 1 | PRIMARY | customer | range | PRIMARY,name | name | 103 | NULL | 2 | Using where; Using index |
| 1 | PRIMARY | <derived2> | ref | key0 | key0 | 4 | test.customer.customer_id | 36 | |
| 2 | DERIVED | orders | index | NULL | o_cust_id | 4 | NULL | 36738 | Using where |
+------+-------------+------------+-------+---------------+-----------+---------+---------------------------+-------+--------------------------+It is obvious that step #1 is inefficient: We compute totals for all customers in the database, while we will only need them for two customers. (If there are 1000 customers, we are doing 500x more work than needed here.)
Lateral Derived Optimization addresses this case. It turns the computation of OCT_TOTALS into what SQL Standard refers to as "LATERAL subquery": a subquery that may have dependencies on the outside tables. This allows pushing the equality customer.customer_id=OCT_TOTALS.customer_id down into the derived table/view, where it can be used to limit the computation to compute totals only for the customer of interest.
The query plan looks as follows:
Scan table
customerand findcustomer_idforCustomer#1andCustomer#2.For each
customer_id, compute the October totals, for this specific customer.
The EXPLAIN output looks like this:
+------+-----------------+------------+-------+---------------+-----------+---------+---------------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-----------------+------------+-------+---------------+-----------+---------+---------------------------+------+--------------------------+
| 1 | PRIMARY | customer | range | PRIMARY,name | name | 103 | NULL | 2 | Using where; Using index |
| 1 | PRIMARY | <derived2> | ref | key0 | key0 | 4 | test.customer.customer_id | 2 | |
| 2 | LATERAL DERIVED | orders | ref | o_cust_id | o_cust_id | 4 | test.customer.customer_id | 1 | Using where |
+------+-----------------+------------+-------+---------------+-----------+---------+---------------------------+------+--------------------------+Note the line with id=2: select_type is LATERAL DERIVED. And table customer uses ref access referring to customer.customer_id, which is normally not allowed for derived tables.
In EXPLAIN FORMAT=JSON output, the optimization is shown like this:
...
"table": {
"table_name": "<derived2>",
"access_type": "ref",
...
"materialized": {
"lateral": 1,Note the "lateral": 1 member.
Controlling the Optimization
Lateral Derived is enabled by default. The optimizer makes a cost-based decision whether the optimization should be used.
If you need to disable the optimization, it has an optimizer_switch flag. It can be disabled like this:
SET optimizer_switch='split_materialized=off'It is possible to enable or disable the optimization with an optimizer hint, SPLIT_MATERLIZED or NO_SPLIT_MATERIALIZED.
For example, by default, this table and query makes use of the optimization:
CREATE TABLE t1 ( n1 INT(10) NOT NULL, n2 INT(10) NOT NULL, c1 CHAR(1) NOT NULL, KEY c1 (c1), KEY n1_c1_n2 (n1,c1,n2) ) ENGINE=innodb CHARSET=latin1;
INSERT INTO t1 VALUES (0, 2, 'a'), (1, 3, 'a');
INSERT INTO t1 SELECT seq+1,seq+2,'c' FROM seq_1_to_1000;
ANALYZE TABLE t1;
EXPLAIN SELECT t1.n1 FROM t1, (SELECT n1, n2 FROM t1 WHERE c1 = 'a' GROUP BY n1) AS t WHERE t.n1 = t1.n1 AND t.n2 = t1.n2 AND c1 = 'a' GROUP BY n1\G*************************** 1\. row ***************************
id: 1
select_type: PRIMARY
table: t1
type: ref
possible_keys: c1,n1_c1_n2
key: c1
key_len: 1
ref: const
rows: 2
Extra: Using index condition; Using where; Using temporary; Using filesort
*************************** 2\. row ***************************
id: 1
select_type: PRIMARY
table: <derived2>
type: ref
possible_keys: key0
key: key0
key_len: 8
ref: test.t1.n1,test.t1.n2
rows: 1
Extra:
*************************** 3\. row ***************************
id: 2
select_type: LATERAL DERIVED
table: t1
type: ref
possible_keys: c1,n1_c1_n2
key: n1_c1_n2
key_len: 4
ref: test.t1.n1
rows: 1
Extra: Using where; Using indexThe optimization can be disabled like this:
EXPLAIN SELECT /*+ NO_SPLIT_MATERIALIZED(t) */ t1.n1 FROM t1, (SELECT n1, n2 FROM t1 WHERE c1 = 'a' GROUP BY n1) AS t WHERE t.n1 = t1.n1 AND t.n2 = t1.n2 AND c1 = 'a' GROUP BY n1 \G
**************_*_************ 1\. row ************_*_************** id: 1
select_type: PRIMARY
table: t1
type: ref
possible_keys: c1,n1_c1_n2
key: c1
key_len: 1
ref: const
rows: 2
Extra: Using index condition; Using where; Using temporary; Using filesort
************_*_************ 2\. row ************_*_************
id: 1
select_type: PRIMARY
table: <derived2>
type: ref
possible_keys: key0
key: key0
key_len: 8
ref: test.t1.n1,test.t1.n2
rows: 1
Extra:
************_*_************ 3\. row ************_*_************
id: 2
select_type: DERIVED
table: t1
type: ref
possible_keys: c1
key: c1
key_len: 1
ref: const
rows: 2
Extra: Using index condition; Using where; Using temporary; Using filesortReferences
Jira task: MDEV-13369
Commit: b14e2b044b
This page is licensed: CC BY-SA / Gnu FDL
Last updated
Was this helpful?

