Principles of Programming: Reliability

Dorothy Young

Published Jan 24, 2019

One area of proper programming practice that I see being neglected is reliability.

For decades, COBOL programmers (and Assembler programmers as well) have known that one ought to check for success or failure after opening a file. (Some levels of COBOL even have a feature which, depending upon installation options, will allocate, write to, and then delete a file for a missing DD statement. Without a check of the OPEN result, such a program can produce incorrect output, or its errors can go unnoticed.) Or one may simply get an S0C4 ABEND as punishment for not checking for OPEN success.

Beyond that, one ought not to use IKJEFT01 to invoke REXX execs or CLISTs in production batch, as that entry point does not NECESSARILY return an indication of a sub-ABEND error. So your REXX or CLIST might be missing or might fail and, depending upon the nature of the failure, your job might never notify the operating system (and the scheduler) of the failure.

At a former job, when this was presented to management (with the recommendation to use IKJEFT1A or IKJEFT1B, both of which do return an indication of a sub-ABEND error), my boss's approach to fixing it was to announce that people would be assigned to read the job messages of every job using IKJEFT01. But that is fundamentally against the principles of reliability. If error detection depends upon human beings checking for error messages, one can well expect downstream jobs or processes to be triggered inappropriately; and, generally speaking, the further processing gets past a problem before the problem is discovered, the worse recovery will be.
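To make the point concrete outside of TSO, here is a small Java analogy. It is only a sketch and does not model the real IKJEFT01 or IKJEFT1B code. Two hypothetical wrappers run the same failing step: `runHidingFailure` swallows the failure and reports 0, which is the kind of behavior described above, while `runReportingFailure` hands back the step's own return code and turns an abnormal end (modeled here as an exception) into a nonzero code. The class name, method names, and the code 16 are illustrative assumptions.

```java
import java.util.function.IntSupplier;

public class StepWrapper {
    // Hypothetical wrapper that hides any failure of the step it runs.
    // Loosely analogous to the problem described for IKJEFT01: the caller
    // (and the scheduler above it) cannot tell that anything went wrong.
    static int runHidingFailure(IntSupplier step) {
        try {
            step.getAsInt(); // the step's own return code is discarded too
        } catch (RuntimeException e) {
            // The failure is swallowed; nothing downstream will know.
        }
        return 0;
    }

    // Hypothetical wrapper that passes the step's return code back and
    // converts an abnormal end (an exception here) into a nonzero code.
    static int runReportingFailure(IntSupplier step) {
        try {
            return step.getAsInt();
        } catch (RuntimeException e) {
            System.err.println("Step ended abnormally: " + e.getMessage());
            return 16; // assumed "severe" code; the value is illustrative
        }
    }

    public static void main(String[] args) {
        // A step that fails outright, e.g. because the exec it needs is missing.
        IntSupplier crashingStep = () -> { throw new IllegalStateException("exec not found"); };
        System.out.println("hiding:    RC=" + runHidingFailure(crashingStep));
        System.out.println("reporting: RC=" + runReportingFailure(crashingStep));
    }
}
```

Only the second wrapper lets a failure travel upward, which is what any automated feedback (condition codes, scheduler dependencies) relies on.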

The modern concept of a "feedback loop" is key here. Wherever feasible, one ought to KNOW that everything has succeeded so far before proceeding with further processing.

Another example of improper architecture: at a former job, the shop used a data transfer program product that could report whether each transmission had succeeded, but they didn't make use of that. I was told that the only way anyone in that shop knew whether a transmission had worked was to go and manually check the statistics and messages the product generated.

This kind of (pardon me) irresponsibility tends to prolong problem situations and hurt availability.

I understand that in Java programming, too, one must take extra steps, such as "try/catch" constructs, to ensure that errors are handled. Maybe people don't do that as often as they should, and maybe that is why my cell phone misbehaves so very often... What good is a "smart phone" if the programmers responsible for it do not do their "due diligence"?
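As a small illustration of the Java side, here is a sketch (assuming Java 11 or later) of the Java counterpart of checking for success after an OPEN. `Files.newBufferedReader` throws an `IOException` if the file is missing or unreadable; the `try`/`catch` reports the failure and returns an empty result instead of carrying on, and `main` turns that into a nonzero exit status so whatever launched the program can see it. The class name, method name, default file name, and exit code 8 are illustrative assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.OptionalLong;

public class CheckedOpen {
    // Returns the number of lines in the file, or an empty result if the
    // file could not be opened or read. The caller must handle that case.
    static OptionalLong countLines(Path path) {
        try (BufferedReader reader = Files.newBufferedReader(path)) {
            long count = 0;
            while (reader.readLine() != null) {
                count++;
            }
            return OptionalLong.of(count);
        } catch (IOException e) {
            // Report the problem instead of continuing with missing data.
            System.err.println("Could not read " + path + ": " + e);
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        // "input.txt" is a hypothetical default used only for illustration.
        Path path = Path.of(args.length > 0 ? args[0] : "input.txt");
        OptionalLong lines = countLines(path);
        if (lines.isEmpty()) {
            // Make the failure visible to a calling script or scheduler.
            System.exit(8);
        }
        System.out.println("Lines: " + lines.getAsLong());
    }
}
```

Returning an `OptionalLong` rather than, say, 0 keeps "the file was empty" distinct from "the file could not be read", so the failure cannot be mistaken for a successful run.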

Likewise neglected are the tools available in a given batch scheduler for NOT triggering follow-on jobs when serious errors are encountered. Pretty much every mainframe z/OS scheduler has these tools, and using them can save the programmer sleepless nights.
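Each scheduler product has its own syntax for this, so rather than guess at any one of them, here is a product-neutral Java sketch of the idea. It assumes the common mainframe convention that RC 0 means success, RC 4 a warning, and RC 8 or higher a serious error (installations set their own thresholds), and the job names are hypothetical. Once any job ends above the threshold, the follow-on jobs are marked skipped rather than triggered.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntSupplier;

public class JobChain {
    // Assumption: RC 0 = success, 4 = warning, 8 or more = serious error.
    // A real scheduler lets each installation choose its own thresholds.
    static final int MAX_ACCEPTABLE_RC = 4;

    // Runs jobs in insertion order. Once a job ends with an RC above the
    // threshold, every later job is skipped instead of being triggered.
    // Returns job name -> outcome, in run order.
    static Map<String, String> run(Map<String, IntSupplier> jobs) {
        Map<String, String> outcome = new LinkedHashMap<>();
        boolean failed = false;
        for (Map.Entry<String, IntSupplier> job : jobs.entrySet()) {
            if (failed) {
                outcome.put(job.getKey(), "SKIPPED");
                continue;
            }
            int rc = job.getValue().getAsInt();
            outcome.put(job.getKey(), "RC=" + rc);
            if (rc > MAX_ACCEPTABLE_RC) {
                failed = true; // stop releasing follow-on jobs
            }
        }
        return outcome;
    }

    public static void main(String[] args) {
        Map<String, IntSupplier> jobs = new LinkedHashMap<>();
        jobs.put("EXTRACT", () -> 0);
        jobs.put("TRANSMIT", () -> 8); // hypothetical transfer that failed
        jobs.put("LOAD", () -> 0);     // must not run after a failed transfer
        System.out.println(run(jobs)); // prints {EXTRACT=RC=0, TRANSMIT=RC=8, LOAD=SKIPPED}
    }
}
```

A real scheduler tracks dependencies as a graph rather than a simple list, but the principle is the same: a downstream job is released only when its predecessors are known to have succeeded.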

Early on in my career, I was taught "Defensive Programming," namely, to anticipate potential errors and prepare the code to handle them. THAT, it seems to me, is the right approach.
