Over the last few years there has been a lot of work in the computer industry on making software more reliable. One of the main areas of focus has been improving how programs handle software or hardware errors, and particularly automatic recovery if a problem occurs.
When I got my first programming job after leaving university (more years ago than I like to admit), it was all too easy to build a program which crashed.
One line of code which tried to access an invalid memory address was enough to cause the program to terminate. As a programmer, there was very little you could do to prevent that apart from checking and rechecking every line of your program.
As a user, you were used to programs crashing, and you made sure that you saved files frequently to protect against losing your work. You got used to seeing error boxes from Windows if a program failed, often with opaque error messages like General Protection Fault, and restarting the program or your computer to recover.
Over the intervening years, user expectations have (rightly) risen, and the technology and techniques used to ensure programs keep running have steadily got better. One of the biggest changes has been to improve how programs recover if a problem occurs, so that programs restart automatically on failure without need for user intervention. Software developers still work hard to avoid issues occurring. However, if something does go wrong then automatic recovery options help to minimize the impact, and will often restore behavior without the user being inconvenienced or even noticing.
As an unattended communications service, it is vital that the Zetafax Server keeps running reliably. When we first built Zetafax, we included several safeguards to help ensure that happened, and Zetafax has an enviable reputation for reliability as a result.
To help with reliability and scalability, the Zetafax Server comprises multiple independent Windows applications, including Queue Manager and Device Controllers. These are started and controlled by the System Manager program. While the Zetafax Server is running, the System Manager monitors the state of each of these applications.
Previously, if one of the server applications failed, the System Manager would stop the Zetafax Server. This ensures that data is preserved, and avoids the risk of faxes being received which cannot be delivered correctly to the user.
In our latest release of Zetafax, we have extended this behavior, adding an option to restart the Zetafax Server if one of the server applications fails. This is designed for use on very high availability systems, or on systems which are experiencing intermittent critical issues causing program failure.
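To make the difference between the two behaviors concrete, here is a minimal sketch of a supervisor that watches a set of child processes. It is an illustration only, not the System Manager's actual code: the function names, the `restart_on_failure` and `max_restarts` parameters, and the simplification that any exit counts as a failure are all assumptions made for the example.

```python
import subprocess
import time


def start_all(commands):
    """Launch every server application as its own process."""
    return [subprocess.Popen(cmd) for cmd in commands]


def stop_all(procs):
    """Ask any still-running process to stop, then wait for all of them."""
    for proc in procs:
        if proc.poll() is None:
            proc.terminate()
    for proc in procs:
        proc.wait()


def supervise(commands, restart_on_failure=False, max_restarts=1,
              poll_interval=0.05):
    """Run the applications until one exits; return the number of restarts.

    Hypothetical simplification: any exit is treated as a failure.
    """
    restarts = 0
    procs = start_all(commands)
    while True:
        time.sleep(poll_interval)
        if all(proc.poll() is None for proc in procs):
            continue  # every application is still running
        # One application has gone, so shut the whole set down cleanly.
        stop_all(procs)
        if restart_on_failure and restarts < max_restarts:
            restarts += 1
            procs = start_all(commands)  # new behavior: bring the server back
        else:
            return restarts  # earlier behavior: stay stopped
```

With `restart_on_failure=False` the function returns as soon as one application exits, which mirrors the earlier stop-only behavior. With it set to `True`, the whole set is restarted until the assumed `max_restarts` limit is reached.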
The option is enabled on the General Configuration Options page in the Zetafax Configuration Program. You can adjust further configuration options to tune the behavior if needed, for example to change the maximum number of times the server will restart in a given period. These can be configured in the SETUP.INI server settings file, and are detailed in file SETUP.RTF, but the default settings should be correct for most sites.
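The idea of limiting restarts in a given period can be sketched as a simple sliding-window counter. This is only an illustration of the general technique, assuming a limit expressed as "at most N restarts in any window of S seconds". The class and parameter names are hypothetical and do not correspond to the actual SETUP.INI settings, which are documented in SETUP.RTF.

```python
from collections import deque


class RestartBudget:
    """Allow at most max_restarts within any window_seconds interval."""

    def __init__(self, max_restarts, window_seconds):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self._history = deque()  # times of the restarts still inside the window

    def allow_restart(self, now):
        """Return True and record the restart if the budget permits it."""
        # Forget restarts that have aged out of the window.
        while self._history and now - self._history[0] >= self.window_seconds:
            self._history.popleft()
        if len(self._history) >= self.max_restarts:
            return False  # too many recent restarts: leave the server stopped
        self._history.append(now)
        return True


# Example with assumed values: at most 3 restarts per hour.
budget = RestartBudget(max_restarts=3, window_seconds=3600)
decisions = [budget.allow_restart(t) for t in (0, 10, 20, 30, 3600)]
```

A limit like this stops a server with a persistent fault from restarting in an endless loop, while still letting it recover from an occasional failure.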
We originally added this feature for a UK Platinum Support customer. Their Zetafax Server was stopping every 2 to 3 weeks. As faxes were critical for their business, they needed to find a way to minimize the impact of that while we worked with them to identify the cause.
Eventually we found the cause was a memory problem on the server computer, and moving the Zetafax Server to a new server computer corrected the issue. In the meantime, the auto-restart helped to avoid this affecting their business.
Like most insurance policies, I hope you won’t need it, but I think you’ll be grateful for it if you do!
The auto-restart feature is included in version 17.1 of Zetafax, which is available without charge for customers with Software Assurance. For more information about the update, see our What’s New technote, or contact Equisys or your local Zetafax distributor.
To learn more, contact us at email@example.com.