I'm working on upgrading DB2 for z/OS from V8 to V9. About two weeks ago I completed the upgrade to V9 CM (Compatibility/Conversion Mode) on one of our PROD subsystems, and the next business day was horrible. Performance was bad across the entire subsystem, and LPAR CPU peaked at 98% of total MIPS capacity.
The activity rate against the SPT01 table space jumped several-fold, and the CICS regions started consuming 20% more CPU. We spotted a few bad queries through Omegamon and fixed them either by tweaking the statistics or by adding new indexes, but that didn't bring CPU consumption down much. The interesting part was that our zIIP engine, which usually runs at about 40%, started peaking at 100%.
We found one query that had done a bunch of updates and then got stuck: it was using a lot of CPU but doing no productive work. We cancelled the thread, but it wouldn't go away, and the zIIP stayed at 100%. Many more transactions and threads started to back up for lack of available CPU while the cancelled thread kept burning CPU. We ended up recycling DB2 with MODE(FORCE). Even then, when DB2 came back up it started recovering the cancelled thread, so we had to force-kill it.
When that thread was finally gone, we checked CPU utilization again: it was right back at 98%, the only difference being that the zIIP was now at 75%.
We opened a Sev 1 PMR with IBM. As usual, they asked for dumps, SMF and RMF records, etc. [Rant On] For a problem like this in such a mature product, I would have expected IBM to have a known fix for us to try. Instead, they waited on us to provide docs. [Rant Off]
Anyway, we did some more research and found that in DB2 V9 the skeleton copies of packages and plans (SKPTs and SKCTs) have moved from below the 2 GB bar to above it, into the EDM skeleton pool, whose size is set by a new zparm called EDM_SKELETON_POOL. The default value for this zparm was far too low for our workload, so we increased it twentyfold. The system is in much better shape now.
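Before making a change like this, it helps to do the back-of-the-envelope arithmetic on what a 20x increase means in storage terms. The sketch below is only an illustration: it assumes the EDM_SKELETON_POOL value is expressed in kilobytes, and the current pool size and free real storage figures are hypothetical placeholders, not numbers from our system. The function names are made up for this example.

```python
# Hypothetical sizing check for an EDM_SKELETON_POOL increase.
# Assumption: the zparm value is in KB. All sample numbers are illustrative only.


def scaled_pool_kb(current_kb: int, factor: int) -> int:
    """Return the new pool size after scaling the current value by `factor`."""
    return current_kb * factor


def extra_real_storage_kb(current_kb: int, factor: int) -> int:
    """Additional storage the larger pool can consume once it fills up."""
    return scaled_pool_kb(current_kb, factor) - current_kb


def fits_in_real_storage(current_kb: int, factor: int, free_real_kb: int) -> bool:
    """True if the free real storage on the LPAR can back the extra pool size."""
    return extra_real_storage_kb(current_kb, factor) <= free_real_kb


if __name__ == "__main__":
    current = 10_240        # hypothetical current value: 10 MB
    factor = 20             # the twentyfold increase
    free_real = 512 * 1024  # hypothetical free real storage: 512 MB
    print(scaled_pool_kb(current, factor))                   # 204800
    print(extra_real_storage_kb(current, factor))            # 194560
    print(fits_in_real_storage(current, factor, free_real))  # True
```

Plug in your own zparm value and real storage numbers from RMF; the point is simply that a large multiplier on the pool size should be checked against available real storage before it goes in.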
This is what Omegamon for DB2 shows after the change:
Package Table (PT) Reqs 1005483K
PT Loads 80073250
% of PT Loads from DASD 7.96%
As a rule of thumb, "% of PT Loads from DASD" should be below 10% on a healthy system. Ours was about 55% before we increased EDM_SKELETON_POOL. As with any other increase in virtual storage, make sure there is enough real (physical) storage to back it.
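Omegamon computes this percentage for you, but the arithmetic behind it is simple. The sketch below assumes the figure is PT Loads divided by PT Reqs, with the "K" in 1005483K meaning 1,000; that reading reproduces the 7.96% shown above. The function names are hypothetical, and the 10% threshold is the rule of thumb from the paragraph above.

```python
# Recompute "% of PT Loads from DASD" from the two Omegamon counters.
# Assumption: the ratio is PT Loads / PT Reqs, and "K" means 1,000.


def pct_pt_loads_from_dasd(pt_requests: int, pt_loads: int) -> float:
    """Percentage of package table requests that needed a load from DASD."""
    if pt_requests <= 0:
        raise ValueError("pt_requests must be positive")
    return 100.0 * pt_loads / pt_requests


def is_healthy(pct: float, threshold: float = 10.0) -> bool:
    """Apply the 'below 10%' rule of thumb."""
    return pct < threshold


if __name__ == "__main__":
    reqs = 1_005_483 * 1_000   # PT Reqs 1005483K
    loads = 80_073_250         # PT Loads
    pct = pct_pt_loads_from_dasd(reqs, loads)
    print(f"{pct:.2f}%")       # 7.96%
    print(is_healthy(pct))     # True
    print(is_healthy(55.0))    # False: roughly where we were before the fix
```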
After this fix, CPU looks much better, but we are still seeing some runaway transactions and changed access paths, many of them for the worse. In most cases we're fixing them by rewriting the queries.