Tuesday, March 17, 2009

DB2 crash with S04F (DSNJ113E)

This happened a couple of days ago. One of our CICS transactions inserted more than 18 million rows (probably because of a loop) into a DB2 table before someone noticed and cancelled it. DB2 started rolling back the inserts, but the rollback was slow, so the DBAs tried issuing the CANCEL THREAD command with the NOBACKOUT option (this was a TEST region). That didn't work, so they just let it roll back. When fewer than 2 million rows remained to be rolled back, DB2 crashed with abend S04F and the following message:

DSNJ113E +DB2T DSNJR003 RBA 'C35EF8BC2000' NOT IN
ANY ACTIVE OR ARCHIVE LOG DATA SET. CONNECTION-ID=DB2T,
CORRELATION-ID=003.RCRSC 02, MEMBER-ID=0
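
In other words, DB2 looked for the RBA in every log range it knew about and found none that contained it. Here is a rough sketch of that kind of check, just to illustrate the idea. The data set names and RBA ranges are made up for the example (they are not from our system), and this is not how DB2 implements the lookup internally.

```python
def parse_rba(text):
    """Convert a hex RBA string such as 'C35EF8BC2000' to an integer."""
    return int(text, 16)


def find_log_for_rba(rba, log_ranges):
    """Return the name of the first log data set whose range covers rba.

    log_ranges is a list of (name, start_rba, end_rba) tuples with hex
    strings for the RBAs. Returns None when no range covers the RBA,
    which is the situation DSNJ113E complains about.
    """
    target = parse_rba(rba)
    for name, start, end in log_ranges:
        if parse_rba(start) <= target <= parse_rba(end):
            return name
    return None


# Hypothetical log inventory with a gap: the data set that covers
# C35B00000000 through C35FFFFFFFFF is not registered.
bsds_ranges = [
    ("DB2T.ARCHLOG1.A0000101", "C35000000000", "C35AFFFFFFFF"),
    ("DB2T.ARCHLOG1.A0000103", "C36000000000", "C36FFFFFFFFF"),
]

# Hypothetical entry for the missing archive log data set.
missing_entry = ("DB2T.ARCHLOG1.A0000102", "C35B00000000", "C35FFFFFFFFF")

if __name__ == "__main__":
    rba = "C35EF8BC2000"
    print(find_log_for_rba(rba, bsds_ranges))
    print(find_log_for_rba(rba, bsds_ranges + [missing_entry]))
```

With the gap in place the lookup finds nothing. Once the missing range is added to the list, the same RBA resolves to a data set.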

The operators restarted DB2, but it went down again after some time. The DB2 manual suggested that the log record might be missing from the active/archive logs. The DBAs ran the DSNJU004 utility and found that this RBA was in an archive log data set that was not in the BSDS. The DB2 manual suggested adding the data set to the BSDS using the DSNJU003 utility. But before doing that, the DBAs stopped DB2 and ran DSNJU003 using the parameters

CRESTART CREATE,FORWARD=YES,BACKOUT=NO

to reset the checkpoint. Then they tried stopping DB2 with MODE(FORCE), since it hadn't stopped after a regular STOP DB2 command, but that failed too. So they issued the following command to stop IRLM:

F DB2TIRLM,ABEND,NODUMP

After stopping DB2, they ran the DSNJU003 utility again and then started DB2.

Most of this was possible only because it was a TEST system. I'm not sure what we would have done if this had happened on a PROD system.
