Monday, July 21, 2014

DB2 Performance Improvement tip using COALESCE function

One of the developers at our shop fed his query to an optimizer tool, and the rewritten query it suggested looked kind of weird.

This is the original query:
DELETE FROM TABLE_A
 WHERE COL1 IN (SELECT R.COL1
                  FROM TABLE_A R
                       INNER JOIN TABLE_B O
                          ON O.COL2 = R.COL1
                 WHERE O.COL3 != R.COL4)


Query suggested by the optimizer:
DELETE
  FROM TABLE_A
 WHERE EXISTS (SELECT 'X'
                 FROM TABLE_A R
                      INNER JOIN TABLE_B O
                         ON R.COL4 <> COALESCE(O.COL3, O.COL3)
                            AND R.COL1 = COALESCE(O.COL2, O.COL2)
                WHERE R.COL1 = TABLE_A.COL1
                ORDER BY R.COL4)

The puzzling part of the revised query is COALESCE(O.COL3, O.COL3). Why would anyone pass the same column twice to the COALESCE function? It always returns exactly the same value as the column itself.
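A quick way to convince yourself it really is a no-op (reusing the tables from the example above): COALESCE returns its first non-NULL argument, so with the same column passed twice the result matches the column on every row, NULLs included.

-- COALESCE returns its first non-NULL argument, so feeding the same
-- column twice is a no-op: NOOP_COL3 equals O.COL3 on every row.
SELECT O.COL3,
       COALESCE(O.COL3, O.COL3) AS NOOP_COL3
  FROM TABLE_B O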

IBM does have an explanation for it, though: http://www-01.ibm.com/support/docview.wss?uid=swg21259831

In their own words:

Question
Why might adding a special form of the COALESCE predicate produce cheaper data access plans and result in better performance?
Cause
A "no-op" coalesce() predicate of the form "COALESCE(X, X) = X" introduces an estimation error into the planning of any query using it. Currently the DB2 query compiler doesn't have the capability of dissecting that predicate and determining that all rows actually satisfy it. As a result, the predicate artificially reduces the estimated number of rows coming from some part of a query plan. This smaller row estimate usually reduces the row and cost estimates for the rest of query planning, and sometimes results in a different plan being chosen because relative estimates between different candidate plans have changed.
Why can this do-nothing predicate sometimes improve query performance? The addition of the "no-op" coalesce() predicate introduces an error that masks something else that is preventing optimal performance.

What some performance enhancement tools do is a brute-force test: the tool repeatedly introduces the predicate into different places in a query, operating on different columns, to try to find a case where, by introducing an error, it stumbles onto a better-performing plan. This is also true of a query developer hand-coding the "no-op" predicate into a query. Typically, the developer will have some insight on the data to guide the placement of the predicate.
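(To make that concrete, a hand-coded version of the trick might look something like the sketch below. It reuses the tables from the query above, and the placement on the join column is purely hypothetical; where such a predicate actually helps, if at all, depends on the data and the plan.)

-- Hypothetical hand-placed "no-op" predicate of the form COALESCE(X, X) = X.
-- It filters nothing here (the equality join already excludes NULL O.COL2),
-- but it perturbs the optimizer's row estimates and may change the plan.
SELECT R.COL1, O.COL3
  FROM TABLE_A R
       INNER JOIN TABLE_B O
          ON O.COL2 = R.COL1
 WHERE COALESCE(O.COL2, O.COL2) = O.COL2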

Using this method to improve query performance is a short-term solution which does not address root cause. It hides potential areas for performance improvements and could have the following implications:
This workaround does not guarantee permanent performance improvements. The DB2 query compiler might eventually handle the predicate better, or it might be affected by other random factors.
Other queries might be affected by the same root cause. The performance of your system in general might be suffering as a result.
Answer
Try to identify and address root cause by determining why the original plan chosen by the query compiler did not perform optimally.

Although the actual estimates in a better-performing plan using "COALESCE(X, X) = X" cannot be trusted, such a plan can still be useful because its "shape" (for example, its join order and access methods) can be used as clues to determine where a problem might lie with the original plan.
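Comparing the two shapes means explaining both statements. Here is a minimal sketch for DB2 for LUW, assuming the explain tables already exist (they can be created from the EXPLAIN.DDL script shipped in sqllib/misc):

-- Explain the original statement (repeat for the COALESCE version), then
-- compare join order and access methods in the formatted output.
EXPLAIN PLAN FOR
DELETE FROM TABLE_A
 WHERE COL1 IN (SELECT R.COL1
                  FROM TABLE_A R
                       INNER JOIN TABLE_B O
                          ON O.COL2 = R.COL1
                 WHERE O.COL3 != R.COL4)

-- Then format the most recent plan from the command line, e.g.:
--   db2exfmt -d MYDB -1 -o original_plan.txt   (MYDB is a placeholder)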

Here are some examples of questions that can help to better identify the source of performance issues:
Are the statistics out-of-date?
Should more statistics be collected?
Are there statistical correlations that have not been detected?
Are there other characteristics of the data that are not yet captured or modeled?
Are there hardware problems? (e.g. a disk is misbehaving)
Are there configuration problems?
Are any queries stuck behind a lock?
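The first two questions are usually the cheapest to act on. A minimal sketch for DB2 for LUW, run from the command line processor (MYSCHEMA is a placeholder schema name):

-- Refresh table, index, and distribution statistics on the tables involved;
-- stale or missing statistics are a very common root cause of a bad plan.
RUNSTATS ON TABLE MYSCHEMA.TABLE_A WITH DISTRIBUTION AND DETAILED INDEXES ALL
RUNSTATS ON TABLE MYSCHEMA.TABLE_B WITH DISTRIBUTION AND DETAILED INDEXES ALL

After rebinding (static SQL) or re-preparing (dynamic SQL) the statement, check whether the original query now gets the better plan without the no-op predicate.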