Usually, if you are working on a mainframe issue (DB2 or CICS or MQ or any mainframe stuff), and if you are dealing with a 3rd party vendor, they would ask the maintenance level or patch level or PUT level or RSU level of your system or product. How do you find the RSU/PUT level?
If you install your products through SMP/E, this is how you find this information.
Go to SMP/E panel, and enter SMCSI dataset (global zone data set) and select option 3 and hit enter.
Select option 3 from this screen and hit enter.
Just hit enter again (no input for ZONE NAME).
Now select the region you are interested in. GLOBAL zone will show all the RSU that has been RECEIVED (it may or may not be APPLYed already). TARGET zone will show all the RSU that has been APPLYed.
e.g., this screen shows the last maintenance applied (RSU1103), along with the previous maintenance:
Technical Tips including DB2 z/OS, DB2 LUW, COBOL, JCL, VSAM, CICS, 3rd party vendor tools (BMC, CA, IBM, Macro4, etc.)
Wednesday, October 19, 2011
Saturday, August 27, 2011
DB2 V9 for z/OS - New ZPARM EDM_SKELETON_POOL
I'm working on upgrading DB2 for zOS from V8 to V9. I completed upgrading it to V9 CM (Compatibility/Conversion Mode) in one of our PROD subsystems about 2 weeks ago and the next business day was horrible. The performance was bad for the entire subsystem. The LPAR CPU peaked to 98% of the total MIPS capacity.
The activity rate against the SPT01 tablespace jumped several times and the CICS regions started sucking 20% more CPU. We noticed a few bad queries through Omegamon and we fixed them by either tweaking the stats or putting some new indexes. But that didn't bring down the CPU consumption much. The interesting part was, our zIIP engine started to peak at 100%, which usually stays at 40%. We found one query that had done a bunch of updates and got stuck - it was using a lot of CPU but was not doing any productive work. We cancelled this thread, but it wouldn't go away and the zIIP engine was still at 100%. Many more transactions/threads started to back log because of lack of CPU availability. The cancelled thread didn't die and still was burning CPU. We ended up cycling DB2 with a MODE(FORCE) command. Still, when DB2 came back, it started recovery of the cancelled thread, so we had to force kill it.
When that thread was at last gone, we looked at the CPU rate, and it was right back at the 98% with the only exception that zIIP processor was at 75% level.
We opened a Sev 1 PMR with IBM. As usual, they asked for dumps, SMF, RMF records etc. [Rant On] I would have thought IBM provided a solution to try for a problem with such a mature product. But instead they were waiting on us to provide docs. [Rant Off]
Anyway, we did some more research and found that, in DB2 V9, copies of packages and plans have been moved from below the 2GB bar to above the bar. This is done through a new zparm called EDM_SKELETON_POOL. The default value for this zparm is too low. So we increased it by 20 times. The system is in a much better shape now.
This is from Omegamon for DB2:
Package Table (PT) Reqs 1005483K
PT Loads 80073250
% of PT Loads from DASD 7.96%
The "% of PT Loads from DASD" should be less than 10% for a healthy system. It was about 55% in our system before increasing the value of EDM_SKELETON_POOL. With any other virtual memory, make sure that there is enough physical memory to support this increase.
After this fix, the CPU looks much better but we are still seeing some runaway transactions or changed accesspath (which are not good in many cases). We're fixing them by rewriting the queries in most cases.
The activity rate against the SPT01 tablespace jumped several times and the CICS regions started sucking 20% more CPU. We noticed a few bad queries through Omegamon and we fixed them by either tweaking the stats or putting some new indexes. But that didn't bring down the CPU consumption much. The interesting part was, our zIIP engine started to peak at 100%, which usually stays at 40%. We found one query that had done a bunch of updates and got stuck - it was using a lot of CPU but was not doing any productive work. We cancelled this thread, but it wouldn't go away and the zIIP engine was still at 100%. Many more transactions/threads started to back log because of lack of CPU availability. The cancelled thread didn't die and still was burning CPU. We ended up cycling DB2 with a MODE(FORCE) command. Still, when DB2 came back, it started recovery of the cancelled thread, so we had to force kill it.
When that thread was at last gone, we looked at the CPU rate, and it was right back at the 98% with the only exception that zIIP processor was at 75% level.
We opened a Sev 1 PMR with IBM. As usual, they asked for dumps, SMF, RMF records etc. [Rant On] I would have thought IBM provided a solution to try for a problem with such a mature product. But instead they were waiting on us to provide docs. [Rant Off]
Anyway, we did some more research and found that, in DB2 V9, copies of packages and plans have been moved from below the 2GB bar to above the bar. This is done through a new zparm called EDM_SKELETON_POOL. The default value for this zparm is too low. So we increased it by 20 times. The system is in a much better shape now.
This is from Omegamon for DB2:
Package Table (PT) Reqs 1005483K
PT Loads 80073250
% of PT Loads from DASD 7.96%
The "% of PT Loads from DASD" should be less than 10% for a healthy system. It was about 55% in our system before increasing the value of EDM_SKELETON_POOL. With any other virtual memory, make sure that there is enough physical memory to support this increase.
After this fix, the CPU looks much better but we are still seeing some runaway transactions or changed accesspath (which are not good in many cases). We're fixing them by rewriting the queries in most cases.
Labels:
CPU,
DB2,
DB2 V9,
DB2 z/OS,
EDM_SKELETON_POOL,
LPAR,
Omegamon,
Package Table,
upgrade,
zparm
Saturday, April 16, 2011
Accessing DB2 LUW table through a COBOL program on mainframe
If you are planning to write a COBOL program to access data from a table on DB2 LUW, you need to do the right setup first. I've described them here as much as I remember. If you get into any trouble, let me know, we can try to resolve it together.
Things to do on Mainframe DB2
Insert row in SYSIBM.LOCATIONS with these values
Then the DDF address space must be stopped and restarted to pick up the changes made to the SYSIBM.IPNAMES, SYSIBM.LUNAMES.
Things to do on DB2 LUW
You need to update the DataBaseManager configuration which is at the DB2 Instance level. You can display the current settings with this command from a command line on NT or AIX
DB2 GET DBM CFG | more
Part of the output will look like this:
Database manager authentication (AUTHENTICATION) = SERVER
Cataloging allowed without authority (CATALOG_NOAUTH) = NO
Trust all clients (TRUST_ALLCLNTS) = YES
Trusted client authentication (TRUST_CLNTAUTH) = CLIENT
There are 2 parms that must be changed from the default to allow connecting from the host--AUTHENTICATION and TRUST_ALLCLNTS. Use the following commands to change these. The changes don't take effect until the db2 instance is stopped and started, so you may need to coordinate cycling the instance with applications and the server owner(s).
UPDATE DBM CFG USING AUTHENTICATION CLIENT
UPDATE DBM CFG USING TRUST_ALLCLNTS DRDAONLY
To cycle the DB2 instance use DB2STOP After it stops do a DB2START. Its a good idea to do a DB2 LIST APPLICATIONS beforehand to see if there are any threads. You can use a FORCE command if you need to to kill any threads.
On top of the above things, if you are writing a COBOL program on host and trying to connect to DB2 LUW, these steps must also be followed.
1. BIND PACKAGE (luwdbname.collectionid) OWNER(CHGTMANP) ISOLATION(CS) QUALIFIER(ALIUPDP) MEMBER(programname) DYNAMICRULES(RUN) VALIDATE(BIND) SQLERROR(CONTINUE) ACTION(REPLACE)
2. Make sure the above BIND statement doesn't fail on anything other than -204 (especially look for -551s, because this could lead to -525 at run time)
3. Grant DBADM authority to the userid that is specified in the OWNER parameter of your BIND card
4. Add the new "dabase.collection.*" to PKLIST on the plan that you will use to run your cobol program
e.g.,
BIND PLAN(plan name) OWNER(owner) QUALIFIER(qualifier)+
NODEFER(PREPARE) VALIDATE(BIND) ISOLATION(CS)+
CACHESIZE(1024) CURRENTDATA(NO) DEGREE(1)+
SQLRULES(DB2) ACQUIRE(USE) RELEASE(COMMIT)+
EXPLAIN(NO) REOPT(NONE) KEEPDYNAMIC(NO)+
IMMEDWRITE(NO) DBPROTOCOL(DRDA) ENCODING(37)+
DISCONNECT(EXPLICIT)+
PKLIST(collction1.*,+
collection2.*,+
database.collection3.*)+
ENABLE(*)+
ACTION(REPLACE) RETAIN
5. In the COBOL program, make sure the programmers are coding these in the right order:
SET CURRENT PACKAGESET = 'collectionname'
CONNECT TO :database-name
6. Give EXECUTE access to PUBLIC on the package on LUW for the first time it is bound. This should be done for each new package you create
Things to do on Mainframe DB2
Insert row in SYSIBM.LOCATIONS with these values
- LOCATION - dbname <-- This is the DB2 LUW database name
- LINKNAME - linkname <-- This is parent of a row in SYSIBM.IPNAMES or SYSIBM.LUNAMES (For this to work it needs to be the name of the DB on the UDB box, because this is how it determines which DB to connect to if that DB2 instance has more than one DB under it.)
- IBMREQD - N
- PORT - portnum <-- This is the connection port on the UDB box. To find this on AIX do: "cat /etc/services". The end of the file will have the connection port for the DB2 instance. On NT it will be 50000.
- TPN - Leave this empty
- Insert row in SYSIBM.IPNAMES
- LINKNAME - linkname This ties to a row in SYSIBM.LOCATIONS with column LINKNAME set to same
- SECURITY_OUT - A A means Already verified
- USERNAMES -
- IBMREQD - N
- IPADDR - nn.nn.nn.nn This is the IP address. You can find this by doing a ping command from a TSO command line or from a NT command prompt. (e.g. PING EXPLORER)
Then the DDF address space must be stopped and restarted to pick up the changes made to the SYSIBM.IPNAMES, SYSIBM.LUNAMES.
Things to do on DB2 LUW
You need to update the DataBaseManager configuration which is at the DB2 Instance level. You can display the current settings with this command from a command line on NT or AIX
DB2 GET DBM CFG | more
Part of the output will look like this:
Database manager authentication (AUTHENTICATION) = SERVER
Cataloging allowed without authority (CATALOG_NOAUTH) = NO
Trust all clients (TRUST_ALLCLNTS) = YES
Trusted client authentication (TRUST_CLNTAUTH) = CLIENT
There are 2 parms that must be changed from the default to allow connecting from the host--AUTHENTICATION and TRUST_ALLCLNTS. Use the following commands to change these. The changes don't take effect until the db2 instance is stopped and started, so you may need to coordinate cycling the instance with applications and the server owner(s).
UPDATE DBM CFG USING AUTHENTICATION CLIENT
UPDATE DBM CFG USING TRUST_ALLCLNTS DRDAONLY
To cycle the DB2 instance use DB2STOP After it stops do a DB2START. Its a good idea to do a DB2 LIST APPLICATIONS beforehand to see if there are any threads. You can use a FORCE command if you need to to kill any threads.
On top of the above things, if you are writing a COBOL program on host and trying to connect to DB2 LUW, these steps must also be followed.
1. BIND PACKAGE (luwdbname.collectionid) OWNER(CHGTMANP) ISOLATION(CS) QUALIFIER(ALIUPDP) MEMBER(programname) DYNAMICRULES(RUN) VALIDATE(BIND) SQLERROR(CONTINUE) ACTION(REPLACE)
2. Make sure the above BIND statement doesn't fail on anything other than -204 (especially look for -551s, because this could lead to -525 at run time)
3. Grant DBADM authority to the userid that is specified in the OWNER parameter of your BIND card
4. Add the new "dabase.collection.*" to PKLIST on the plan that you will use to run your cobol program
e.g.,
BIND PLAN(plan name) OWNER(owner) QUALIFIER(qualifier)+
NODEFER(PREPARE) VALIDATE(BIND) ISOLATION(CS)+
CACHESIZE(1024) CURRENTDATA(NO) DEGREE(1)+
SQLRULES(DB2) ACQUIRE(USE) RELEASE(COMMIT)+
EXPLAIN(NO) REOPT(NONE) KEEPDYNAMIC(NO)+
IMMEDWRITE(NO) DBPROTOCOL(DRDA) ENCODING(37)+
DISCONNECT(EXPLICIT)+
PKLIST(collction1.*,+
collection2.*,+
database.collection3.*)+
ENABLE(*)+
ACTION(REPLACE) RETAIN
5. In the COBOL program, make sure the programmers are coding these in the right order:
SET CURRENT PACKAGESET = 'collectionname'
CONNECT TO :database-name
6. Give EXECUTE access to PUBLIC on the package on LUW for the first time it is bound. This should be done for each new package you create
Saturday, February 26, 2011
DB2 UDB Partitioned Database - Unique Index
In the absence of the DBA that usually works on DB2 UDB, I tried to create an unique index on a table that is defined on a partitioned database. I realized that it's not very straight forward to do so.
From DB2 manual: "There are performance advantages to creating a table across several database partitions in a partitioned database environment. Creating a table that will be a part of several database partitions is specified when you are creating the table. There is an additional option when creating a table in a partitioned database environment: the distribution key. A distribution key is a key that is part of the definition of a table. It determines the database partition on which each row of data is stored.
You must be careful to select an appropriate distribution key because it cannot be changed later. Furthermore, any unique indexes (and therefore unique or primary keys) must be defined as a superset of the distribution key. That is, if a distribution key is defined, unique keys and primary keys must include all of the same columns as the distribution key (they might have more columns)."
This is where I got totally lost "any unique indexes (and therefore unique or primary keys) must be defined as a superset of the distribution key. That is, if a distribution key is defined, unique keys and primary keys must include all of the same columns as the distribution key (they might have more columns)". I was trying create the unique index on a column which was not the hash key and I was getting an SQL0270N Reason Code 1. After including the hash key along with the other key, it worked.
From DB2 manual: "There are performance advantages to creating a table across several database partitions in a partitioned database environment. Creating a table that will be a part of several database partitions is specified when you are creating the table. There is an additional option when creating a table in a partitioned database environment: the distribution key. A distribution key is a key that is part of the definition of a table. It determines the database partition on which each row of data is stored.
You must be careful to select an appropriate distribution key because it cannot be changed later. Furthermore, any unique indexes (and therefore unique or primary keys) must be defined as a superset of the distribution key. That is, if a distribution key is defined, unique keys and primary keys must include all of the same columns as the distribution key (they might have more columns)."
This is where I got totally lost "any unique indexes (and therefore unique or primary keys) must be defined as a superset of the distribution key. That is, if a distribution key is defined, unique keys and primary keys must include all of the same columns as the distribution key (they might have more columns)". I was trying create the unique index on a column which was not the hash key and I was getting an SQL0270N Reason Code 1. After including the hash key along with the other key, it worked.
Labels:
DB2,
DB2 LUW,
DB2 UDB,
Hash Key,
Partitioned Database,
SQL0270N,
Unique Index
Wednesday, January 12, 2011
DB2 4K space filled up
It was a Friday night and I got a call around 2 AM from Operations people saying, a lot of applications are spewing some error message in DB2 Master address space. I saw these messages in DB**MSTR, which started about 30 minutes ago and still continuing.
I looked up both the reason codes 00C900A5 and 00C90084. The DB2 messages and codes guide said "Explanation: The temporary file with the page size shown in NAME is not available. An error was detected either during the opening of the page set or during the formatting of the header and the space map pages of the pageset.
If the reason code appears in a DSNT500I message issued during the -START DATABASE command, the request fails. In either case, the temporary file function does not allocate any temporary file on the page set."
I checked the 4K datasets DB2CAT.DSNDBC.DSNDB07.DSN4Kxx.I0001.A001. We've 6 of them. I checked the extents on these, and they had spanned for only one extent. So the problem must be something else. I decided to stop and start the 4K tablespaces and issued a STOP against these 6 tablespaces. However, they were in STOPP (Stop Pending) status for a long time. That's when I looked in Omegamon for DB2 and found that there were 4 threads that we running for a long time and had consumed millions of getpages, they were sorting large amounts of data, which means they were using up most of the 4K space. I killed these 4 threads and voila, all the 6 tablespaces stopped. I then issued the START command and started them. They came up without any hiccup and all the error messages stopped.
I looked up both the reason codes 00C900A5 and 00C90084. The DB2 messages and codes guide said "Explanation: The temporary file with the page size shown in NAME is not available. An error was detected either during the opening of the page set or during the formatting of the header and the space map pages of the pageset.
If the reason code appears in a DSNT500I message issued during the -START DATABASE command, the request fails. In either case, the temporary file function does not allocate any temporary file on the page set."
I checked the 4K datasets DB2CAT.DSNDBC.DSNDB07.DSN4Kxx.I0001.A001. We've 6 of them. I checked the extents on these, and they had spanned for only one extent. So the problem must be something else. I decided to stop and start the 4K tablespaces and issued a STOP against these 6 tablespaces. However, they were in STOPP (Stop Pending) status for a long time. That's when I looked in Omegamon for DB2 and found that there were 4 threads that we running for a long time and had consumed millions of getpages, they were sorting large amounts of data, which means they were using up most of the 4K space. I killed these 4 threads and voila, all the 6 tablespaces stopped. I then issued the START command and started them. They came up without any hiccup and all the error messages stopped.
Subscribe to:
Posts (Atom)