Publication Date: 16 February 1996
System Version: GCCS 2.1/Update 4
Web Page Created: 26 March 1996
Setup. None.
That is your agenda for the next 20 minutes or so. You will begin with the purpose and techniques for saving data.
OBJECTIVE. Without references, explain the JOPES database save/archive process.
Data Save/Archive Techniques. Utilities exist within our applications and operating systems that provide for data saves/archives. The user can select which storage media to use: tapes, floppy disks, and/or hard disks. The selection is usually based on what is available and the purpose of the action. For our purposes, we will use "save" when we store data within the system and "archive" when we want to separate the data from the system. TS3 mainframe data protection can be accomplished file-by-file or on the entire database at once. It can use any or all of the available media, such as tapes (reels or cartridges), hard disks, and, in some cases, floppies.
JOPS F60F Module - Load or Save a TPFDD/SRF (Forms Mode Screen). This module provides a forms mode capability for saving TPFDD/SRF files to tape or PERMFILE (hard disk).
JDS Capabilities.
JDS H3 S/F - Offload/Reload OPLAN. This function provides a method for archiving OPLAN data in transaction format. You performed these functions earlier in this class.
JDS H4 S/F - Save and Recover local OPLANs. This function provides a method to save/archive one or more local OPLANs to file or tape (TD 18-14-1, VOL 4, pg 5-42). This function is restricted to the TDBMs, who must be authorized to access any limited access or close hold OPLANs. It only works for local plans that are not in delete status.
OBJECTIVE. Without references, explain why it is essential to accomplish periodic site database backups.
In another example (when more than two sites were involved), the network functional manager was requested to delete a network OPLAN when the plan OPR really wanted it deleted at all sites except their own. Because they had properly backed up their plan locally, all that had to be done was to wait for the delete to complete and reload it as a local plan. However, the OPLAN got hung in delete status and the transaction could not be completed. The OPR had a suspense and could not wait for the technical personnel to find the hang-up and push the delete job to completion. The FM used the local archive tape copy and loaded the plan locally with a new PID. This got the OPR back into operation quickly while others fixed the network problems. Once the delete transaction completed, the old plan ID was re-initialized to maintain name recognition. The local plan was then merged into the old plan ID.
Potential Site Failures. When a site's IDS database exceeds a fill level of 65 percent in any subfile, the potential for site failure increases. Transaction processing begins to slow until eventually the site refuses to process any further transactions because there is no place to put them. Failures also occur when the site is invaded by some foreign element: water in the computer room from cooling pipes, broken air conditioners that let excessive heat build up, dust that may cause a disk pack failure, or lightning/static electricity that might cause disk damage. Any of these events could cause a site to fail. Because of this, functional managers should ensure that users do periodic saves, especially of local OPLANs. Remember, both sites will have copies of the networked plans, but only one site concerns itself with the preservation of local plans.
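The 65 percent guideline above lends itself to a simple periodic check. The following is a minimal sketch only, not a JOPES utility: the function name, the subfile names, and the sample readings are all invented for illustration, and it assumes the fill percentages have already been obtained from the site by some other means.

```python
# Illustrative sketch; nothing here is part of JOPES or GCCS.
FILL_THRESHOLD_PERCENT = 65.0  # guideline stated in the lesson text


def subfiles_over_threshold(fill_levels, threshold=FILL_THRESHOLD_PERCENT):
    """Return the sorted names of subfiles whose fill level exceeds the threshold."""
    return sorted(name for name, pct in fill_levels.items() if pct > threshold)


# Hypothetical readings; the subfile names are made up for this example.
sample = {"SUBFILE-A": 42.0, "SUBFILE-B": 65.0, "SUBFILE-C": 71.5}
at_risk = subfiles_over_threshold(sample)
print(at_risk)
```

A subfile sitting exactly at 65 percent is not flagged here, because the text speaks of exceeding that level; a site could reasonably choose a stricter comparison.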
Now that you know what software can be used to create backup copies of the database and why it should be done, you will take a look at the procedures to accomplish a database recovery.
OBJECTIVE. Without references, explain the process used to recover the JOPES database at a failed site.
Rebuilding a Site's Database.
Obtain Backup Database. The next step is to coordinate with the site TDBM to obtain a replacement database. The JNOCC TDBM will need to take the remaining site to a "0" transaction state, so be prepared to lose access to that site as well. The backup site will dump its entire database to tape and prepare it to be hand-carried by courier to the failed site. Electronic transmission of the database is generally not used because data may be lost during transmission. During this process, the site functional managers need to coordinate the blocking/removal of local site unique data, like close hold plans, to ensure that only essential data is made available at the recovering site.
Network OPLAN Upload. Upon receipt of the tapes, the site TDBM will coordinate with the FM to bring the site to zero transactions so the JDSIP can be turned off, if it is not already off due to the failure. They then begin the upload process. This is a TDBM responsibility with the FM assisting as necessary. The FM will keep the users informed as to the progress, and work with them on checking the accuracy of the upload and synchronization status. The TDBM will verify that all backed up transactions held for your site have completed before granting the users access to JOPES.
Local OPLAN Upload. At the same time, the TDBM and FM will plan the local OPLAN database recovery. Hopefully, you will work with the plans that you were astute enough to archive after consultation with the users. Also, the local database from the backup site may come through with data that you do not want. The TDBM will run S/F H6 to delete the sending site's local OPLANs.
Update Complete. The TDBM will monitor the JDSUP/JDSIP to ensure all the backlog transactions that built up during the outage process are completed. The functional managers will stay informed so they can work out synchronization and answer users' questions, e.g., when will the system be operational again?
System On-line. The FM should immediately notify all users that the site is operational again and the database is in full synchronization. The FM would then release an FM teleconference message informing the community at large that the site has rejoined the network.
Note: Do not confuse the JDSIP/JDSUP updates with the batch/TSS jobs that are waiting in the SYSOUT queue. They are both files with jobs to be completed, but the JDSIP file is not arbitrarily emptied/overwritten without a very important reason. Normally, a SYSOUT queue gets emptied by a certain time each day (site unique as to the details) to keep it from getting overloaded. The JDSIP almost never gets its entire file written over, but in some cases, individual actions are tracked down and deleted.
Note: The keystrokes required to recover a site are the responsibility of the TDBM. The FM is in the loop to represent operational requirements. The specific steps/commands the TDBM takes are found in the TDBM handbook. For the failed site, generic recovery steps are included in Table 12-1.
RECEIVING SITE DATABASE RECOVERY STEPS

Step | Action |
---|---|
1 | Offload local OPLANs - H4 save local OPLANs. |
2 | Terminate JDSUP at the receiving site, if not already down - "NET INFORM JDSUP QUIT." |
3 | Disable JOPES - rename JOPES so no one activates it. |
4 | Process backlogged transactions - ensure all transactions generated at the receiving site have been received at the providing site. |
5 | FTS Network Status Files to receiving site - compare NSF counts for this step to those provided by the providing site. |
6 | Transport database to receiving site. |
7 | Initialize file areas for the new database - clear out the old invalid transactions before installing the new database. |
8 | Unlock database subfiles and initialize the new database - sets up the real world and exercise sections (can be done concurrently). |
9 | Check page ranges - compare data field for range 1 thru 8 page ranges. |
10 | Tape restore of the IDS database - enter new tape numbers of the providing site's tapes. |
11 | Check plan list - use "list" command to check for possible aborts. |
12 | Run analyzer - only if providing site did not run. |
13 | Fix broken chains - only if you ran analyzer. |
14 | Quick copy - copies to backup database. |
15 | Reactivate JOPES - change name back and reformat JDSUP to BCD format. |
16 | Restart JOPES - restart the JDSUP and the JDSIP format. |
17 | Clean up the database recovery (TDBM or FM) - delete the local OPLANs on the receiving site's database that were not initialized at the recovered site; generate the PIN records for the local OPLANs that were originated at this site. |
18 | Reload local OPLANs - H4. |
19 | Screen copy receiving site's NSF - obtain record counts for comparison to number of records provided. |
20 | Verify NSF record counts - should be equal to or greater than the number of records provided. |
21 | Derail (DRL) queue processing - required if site uses DRL queue processing; this ensures JDSUP is out of the system so upload can take place. |
22 | Start JOPES at the receiving site - after database cleanup and local OPLAN reload are complete, the receiving site is ready to process JOPES network transactions. |
23 | Notify ALCON via TLCF(s), as appropriate, that the site has been restored. |
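
The count comparison in steps 19 and 20 can be sketched in a few lines. This is an illustration only, assuming the comparison is between the counts supplied by the providing site and the counts read at the receiving site, and assuming both have already been transcribed into simple per-plan tallies. The function name and the plan IDs are hypothetical.

```python
# Illustrative sketch; not a JOPES function. Counts are assumed to have been
# transcribed by hand from the NSF data of each site.
def find_count_shortfalls(provided, received):
    """Return {plan_id: (provided_count, received_count)} for every plan whose
    received count is lower than the provided count, or is missing entirely."""
    shortfalls = {}
    for plan_id, expected in provided.items():
        actual = received.get(plan_id, 0)  # a missing plan counts as zero records
        if actual < expected:
            shortfalls[plan_id] = (expected, actual)
    return shortfalls


# Hypothetical plan IDs and record counts.
provided_counts = {"PLAN1": 1200, "PLAN2": 350}
received_counts = {"PLAN1": 1204, "PLAN2": 348}
print(find_count_shortfalls(provided_counts, received_counts))
```

A received count that is equal to or greater than the provided count passes, matching step 20; only shortfalls are reported for follow-up.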
Note: After a site has been restored and the updates are completed, the record counts should match at all sites. If not, there may be dropped data that needs to be reentered. Before going to the trouble of trying to find the missing record(s), S/F HW, (set record counts) can start a job to recheck and update the record count. This is much easier and can save considerable time versus doing a manual record count. Figure 12-1 shows the Set Record Counts Screen.
Note: Sometime after the TDBMs complete this process, the FM may want to check the site's synchronization by using S/F HV. If the record counts are off, S/F HW can be run to recheck what you have. If discrepancies remain, further research and repair may be required. HW only requires the entry of the OPLAN you are concerned with or All (left justified) to set all OPLAN record counts. Figure 12-2 shows the Selective Site Data Recovery screen.
'ALL' Chain Recovery, Option B. This option allows you to recover an entire set of requirements (ULNs, CINs, or PINs) at one time.
Carrier/Manifest Recovery, Option C. This option allows you to recover up to 12 missing carriers at a time. The plan must be in available status, the 200, 020, and 601 (SM-OPLAN-RECORD) records must exist, and sites must be on current plan distribution. This option no longer has any validity as there are no carriers and manifests in the database.
OPLAN INIT Recovery, Option D. This option allows you to restart the OPLAN init TSM at a site(s) where it did not complete. If the 200, 020, or 055 (SYNC-SITE-REC) records are not at the recovering site, this function will not work. The assumption is made that the site was originally on distribution.
OPLAN Delete Recovery, Option E. This is the same as init recovery, except you restart the delete TISM.
Earlier in this lesson, you reviewed the options available to save and/or archive OPLANs on the database. It seems reasonable that you might use the same or similar options to restore the saved and/or archived data. The commands will be the same with the twist of putting the data back, instead of taking it out.
JOPS F60F Module - Load or Save a TPFDD/SRF (Forms Mode Screen). This module provides a forms mode capability for loading or attaching TPFDD/SRF files to a working file.
JDS Capabilities.
JDS H3 S/F - Offload/Reload OPLAN. This function provides a method to reload an OPLAN from tape (JDS/transaction format) to the JDS database. The plan must have been offloaded by H3 to be reloaded by H3. Local functional managers can reload only local plans, while Network functional managers must reload network plans. To reload, the PID cannot currently exist anywhere in the database. The H3 upload can also be loaded as another PID number.
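The H3 reload conditions described above can be written out as a pre-flight checklist. The sketch below is an assumption-laden illustration, not how H3 itself is implemented: the function, its parameters, the role and scope labels, and the sample PIDs are all hypothetical.

```python
# Illustrative checklist only; H3 performs its own checks on the real system.
def h3_reload_problems(target_pid, plan_scope, manager_role, existing_pids,
                       offloaded_by_h3=True):
    """Return a list of reasons an H3 reload should not be attempted.

    plan_scope and manager_role are each "local" or "network" (labels assumed
    for this sketch). An empty list means no problem was found.
    """
    problems = []
    if not offloaded_by_h3:
        problems.append("plan was not offloaded by H3")
    if target_pid in existing_pids:
        problems.append("PID already exists in the database")
    if plan_scope == "network" and manager_role != "network":
        problems.append(
            "network plans must be reloaded by a Network functional manager")
    return problems


# Hypothetical PIDs for illustration.
print(h3_reload_problems("X1", "network", "local", {"X1"}))
print(h3_reload_problems("X2", "local", "local", {"X1"}))
```

Because an H3 offload can be reloaded under another PID, a clash on the original PID can be resolved by choosing a new target PID rather than abandoning the reload.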
JDS HK S/F - Load OPLAN. HK builds a local or networked OPLAN from a TPFDD tape or PERMFILE. You must first initialize the plan by S/F H1 before accomplishing S/F HK.
Note: B8 can be used to offload a TPFDD (in JOPS format) for many purposes including saving and archiving. TPFDDs that were downloaded this way must be reloaded by HK, B3, or F60(F). An HK reload requires only a PID and tape number or cat/file string. This load process automatically produces the TPFDD audit report. There is also an option to produce the report without uploading the data. Simply enter an "x" in the Enter 'X' to Produce Audit Report Only brackets. The audit report runs routine system edit checks, such as the plan series matching ULN first characters, I's or O's in ULN, etc. For HK to execute, the OPLAN must exist on the system, be in load status, and have no requirement detail records.
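One of the edit checks named above, I's or O's in a ULN, is easy to picture as a pre-screen of the data before a tape is cut. The sketch below illustrates that single check only. It is not the audit report's actual logic, and the function name and sample ULN values are invented.

```python
# Illustrative pre-screen for one edit check; the real audit report is
# produced by the HK load process, not by this code.
def ulns_with_i_or_o(ulns):
    """Return the ULNs that contain the letter I or O, in their original order."""
    return [uln for uln in ulns if "I" in uln.upper() or "O" in uln.upper()]


# Hypothetical ULN values; the digits 1 and 0 are acceptable, the letters are not.
sample_ulns = ["T1A", "TIO2", "T0B", "ABO"]
print(ulns_with_i_or_o(sample_ulns))
```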
JDS HU S/F - Selective Site Data Recovery. This function provides the Network FM with the ability to recover lost data (partial or total) by reloading from an alternate database site. It can be used if the OPLAN recovery process was not perfect. The process steps are:
The Network FM enters the "plan id" or "all" for recovery.
Select the recovery option: ULN/CIN/PIN recovery, all chain recovery, carrier/manifest recovery, OPLAN init recovery, OPLAN delete recovery, or OPLAN network status recovery.
Enter the site identifier for recovery.
If the ULN/CIN/PIN or Carrier/Manifest option is selected, enter the appropriate data.
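The four entry steps above can be summarized as a small request record with a completeness check. This is a sketch under stated assumptions, not the HU screen logic: the class, the option labels, the field names, and the sample site identifier are hypothetical stand-ins for whatever the actual screen accepts.

```python
from dataclasses import dataclass

# Option labels are paraphrased from the lesson text; the exact screen values
# are not given in the source and are assumed here.
OPTIONS = {"ULN/CIN/PIN", "ALL CHAIN", "CARRIER/MANIFEST",
           "OPLAN INIT", "OPLAN DELETE", "OPLAN NETWORK STATUS"}
NEEDS_DETAIL = {"ULN/CIN/PIN", "CARRIER/MANIFEST"}


@dataclass
class RecoveryRequest:
    plan_id: str        # a plan id, or "ALL"
    option: str         # one of OPTIONS
    site_id: str        # site identifier for recovery
    detail: tuple = ()  # identifiers needed by the NEEDS_DETAIL options

    def problems(self):
        """Return a list of missing or invalid entries; empty means complete."""
        found = []
        if not self.plan_id:
            found.append("plan id or 'ALL' is required")
        if self.option not in OPTIONS:
            found.append("unknown recovery option")
        if not self.site_id:
            found.append("site identifier is required")
        if self.option in NEEDS_DETAIL and not self.detail:
            found.append("option " + self.option + " needs identifying data")
        return found


# Hypothetical site identifier for illustration.
print(RecoveryRequest("ALL", "ULN/CIN/PIN", "SITE1").problems())
```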
Now that you are familiar with database backup and recovery procedures, it is time to do a quick review.
Summary. During the past half hour or so, you have become more familiar with the reasons for doing database saves and archives as well as some of the software available to assist in the process. You were also exposed to the procedures used to accomplish a database recovery if a catastrophic failure were to occur.