Monday, September 15, 2014

Set up mainframe SLIP trap

Often, when you are diagnosing a problem on the mainframe and you contact the product's tech support, you will be asked to set up a SLIP trap.

When you set up a SLIP trap, you indicate what kinds of events you want trapped and the system conditions for the trap, then specify what action the system should take when the event occurs under those conditions.

How do you set up a SLIP trap?

Your SLIP trap may be as simple as this:

SLIP SET,A=SVCD,COMP=0C4,ID=KC01,JOBLIST=(jobname),END
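For reference, here is the same trap with annotations (everything after <- is my note, not part of the command; KC01 and jobname are placeholders):

SLIP SET,               <- define a new trap
     A=SVCD,            <- action: take an SVC dump when the trap matches
     COMP=0C4,          <- match abends with system completion code 0C4
     ID=KC01,           <- user-chosen trap identifier (1-4 characters)
     JOBLIST=(jobname), <- only match events in the listed jobs
     END                <- marks the end of the trap definition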

or something as complicated as this:

SLIP SET,IF,ID=slipid,A=SYNCSVCD,ML=1,ASID=(xxxxx),
ASIDLST=(xxxxx,yyyy,zzzz),PVTMOD=(module,00B0),
DATA=((12R?+60E,EQ,54475249,OR,12R?+60E,EQ,E3C7D9C9),
AND,12R?+756,EQ,E4D5D2D5,
AND,12R?+4A8,EQ,C4C9E2E3,AND,12R?+4AC,EQ,E2C5D9E5),END
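If the DATA= syntax looks cryptic, here is a hedged decode based on SLIP's general indirection rules: 12R means general register 12, the ? indirects through the address held in it, +60E adds a hexadecimal offset, and EQ,54475249 compares the storage at that location against the hex value. So a clause like

12R?+60E,EQ,54475249

reads as: the data at X'60E' past the address in register 12 equals X'54475249'.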


If the command spans two lines or fewer, you can enter it in the SDSF System Command Extension panel.


However, if it spans more than two lines, you cannot enter it in this panel. Instead, create a member in SYS1.PARMLIB following the naming convention IEASLPii and save the command in that member. After saving the member, issue this command from SDSF:

/T SLIP=ii

This will enable the SLIP trap.
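For example (a sketch -- the suffix 01, the trap ID SL01, and the module name MYMOD are placeholders), you could create SYS1.PARMLIB(IEASLP01) containing a multi-line trap:

SLIP SET,IF,ID=SL01,A=SYNCSVCD,ML=1,
     PVTMOD=(MYMOD,00B0),
     DATA=(12R?+60E,EQ,54475249),
     END

and activate it with T SLIP=01.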

After you set up the trap, run this MVS system command to display its status:
/RO lpar,D SLIP=slipid
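Once you are done with the trap, you can turn it off or remove it altogether (the notes in parentheses are mine, not part of the commands):

/RO lpar,SLIP MOD,DISABLE,ID=slipid   (disable the trap but keep it defined)
/RO lpar,SLIP DEL,ID=slipid           (delete the trap)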

HTH.

Wednesday, August 13, 2014

DFSMS Message ARC1237I - No Space for Migration Copy

This morning I was trying to SMP/E RECEIVE a bunch of PTFs as part of DB2 tool maintenance. After receiving 20 PTFs, the job failed with an out-of-space abend, IEC032I E37-04, on the SMPPTS dataset.

I noticed that the SMPPTS dataset already had 16 extents. I tried to add some cylinders, but couldn't. So I tried to move it to a different SMS volume by HMIGRATEing and HRECALLing the dataset. But the HMIG command failed with this message:

ARC1001I hlq.SMPPTS MIGRATE FAILED, RC=0037, REAS=0000
ARC1237I NO SPACE FOR MIGRATION COPY    

I tried migrating the dataset directly to Level 2 by issuing this command:

HMIG / ML2
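(A quick aside on the syntax: the / stands in for the dataset name when you type the command next to it in an ISPF data set list. Spelled out as a full TSO command, it would look something like this, with hlq.SMPPTS as the dataset:

HMIGRATE 'hlq.SMPPTS' MIGRATIONLEVEL2)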

When I issued this command, it failed with the message:

ARC1001I hlq.SMPPTS MIGRATE FAILED, RC=0080, REAS=0000
ARC1280I MIGRATION FAILED - DATA SET IS IN NEED OF BACKUP

I issued this command to take a DFSMShsm backup:

HBACKDS

After the backup completed, I tried the migrate to ML2 again, and it worked!

After the migration completed, I recalled the dataset. It landed on a different volume with more space and occupied only one extent.
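To recap, here is the sequence that eventually worked, as a sketch assuming the dataset is hlq.SMPPTS (the notes in parentheses are mine):

HBACKDS 'hlq.SMPPTS'                    (take a current DFSMShsm backup)
HMIGRATE 'hlq.SMPPTS' MIGRATIONLEVEL2   (migrate straight to ML2)
HRECALL 'hlq.SMPPTS'                    (recall; SMS picks a volume with space)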

Monday, July 21, 2014

DB2 Performance Improvement tip using COALESCE function

One of the developers at our shop fed his query to an optimizer tool, and the optimized query looked kind of weird.

This is the original query:
DELETE FROM TABLE_A
 WHERE COL1 IN (SELECT COL1
                  FROM TABLE_A R
                       INNER JOIN TABLE_B O
                          ON O.COL2 = R.COL1
                 WHERE O.COL3 != R.COL4)


Query suggested by the optimizer:
DELETE
  FROM TABLE_A
 WHERE EXISTS (SELECT 'X'
                 FROM TABLE_A R
                      INNER JOIN TABLE_B O
                         ON R.COL4 <> COALESCE (O.COL3, O.COL3)
                            AND R.COL1 = COALESCE (O.COL2, O.COL2)
                WHERE COL1 = TABLE_A.COL1
                ORDER BY R.COL4)

The puzzling part of the revised query is COALESCE(O.COL3, O.COL3). Why would anyone feed the same column twice to the COALESCE function? COALESCE returns its first non-null argument, so COALESCE(X, X) always gives the same result as X itself.
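You can convince yourself with a quick test (a sketch against the same TABLE_B from the query above):

-- COALESCE returns its first non-null argument, so
-- COALESCE(COL3, COL3) is COL3 when COL3 is non-null
-- and NULL when COL3 is NULL: always the same as COL3.
SELECT COL3, COALESCE(COL3, COL3) AS C
  FROM TABLE_B;
-- C matches COL3 on every row, nulls included.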

Here is the explanation by IBM though: http://www-01.ibm.com/support/docview.wss?uid=swg21259831

In their own words:

Question
Why might adding a special form of the COALESCE predicate produce cheaper data access plans and result in better performance?
Cause
A "no-op" coalesce() predicate of the form "COALESCE(X, X) = X" introduces an estimation error into the planning of any query using it. Currently the DB2 query compiler doesn't have the capability of dissecting that predicate and determining that all rows actually satisfy it. As a result, the predicate artificially reduces the estimated number of rows coming from some part of a query plan. This smaller row estimate usually reduces the row and cost estimates for the rest of query planning, and sometimes results in a different plan being chosen because relative estimates between different candidate plans have changed.
Why can this do-nothing predicate sometimes improve query performance? The addition of the "no-op" coalesce() predicate introduces an error that masks something else that is preventing optimal performance.

What some performance enhancement tools do is a brute-force test: the tool repeatedly introduces the predicate into different places in a query, operating on different columns, to try to find a case where, by introducing an error, it stumbles onto a better-performing plan. This is also true of a query developer hand-coding the "no-op" predicate into a query. Typically, the developer will have some insight on the data to guide the placement of the predicate.

Using this method to improve query performance is a short-term solution which does not address root cause. It hides potential areas for performance improvements and could have the following implications:
This workaround does not guarantee permanent performance improvements. The DB2 query compiler might eventually handle the predicate better, or it might be affected by other random factors.
Other queries might be affected by the same root cause. The performance of your system in general might be suffering as a result.
Answer
Try to identify and address root cause by determining why the original plan chosen by the query compiler did not perform optimally.

Although the actual estimates in a better-performing plan using "COALESCE(X, X) = X" cannot be trusted, such a plan can still be useful because its "shape" (for example, its join order and access methods) can be used as clues to determine where a problem might lie with the original plan.

Here are some examples of questions that can help to better identify the source of performance issues:
Are the statistics out-of-date?
Should more statistics be collected?
Are there statistical correlations that have not been detected?
Are there other characteristics of the data that are not yet captured or modeled?
Are there hardware problems? (e.g. a disk is misbehaving)
Are there configuration problems?
Are any queries stuck behind a lock?
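On DB2 for z/OS, the first two questions usually come down to running the RUNSTATS utility. A minimal sketch (MYDB and MYTS are placeholder database and tablespace names):

RUNSTATS TABLESPACE MYDB.MYTS
         TABLE(ALL) INDEX(ALL)
         SHRLEVEL REFERENCE

After refreshing the statistics, re-EXPLAIN the original query and see whether the access plan moves toward the "shape" of the COALESCE version.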