skip to main content
article
Free Access

Queueing analysis of fault-tolerant computer systems (extended abstract)

Authors Info & Claims
Published:01 May 1986Publication History
Skip Abstract Section

Abstract

Queueing models provide a useful tool for predicting the performance of many service systems including computer systems, telecommunication systems, computer/communication networks and flexible manufacturing systems. Traditional queueing models predict system performance under the assumption that all service facilities provide failure-free service. It must, however, be acknowledged that service facilities do experience failures and that they get repaired. In recent years, it has been increasingly recognized that this separation of performance and reliability/availability models is no longer adequate.

An exact steady-state queueing analysis of such systems is considered by several authors and is carried out by means of generating functions, supplementary variables, imbedded Markov process and renewal theory, or probabilistic techniques [1,2,7,8]. Another approach is approximate, in which it is assumed that the time to reach the steady-state is much smaller than the times to failures/repairs. Therefore, it is reasonable to associate a performance measure (reward) with each state of the underlying Markov (or semi-Markov) model describing the failure/repair behavior of the system. Each of these performance measures is obtained from the steady-state queueing analysis of the system in the corresponding state [3,5].

Earlier we have developed models to derive the distribution of job completion time in a failure-prone environment [3,4]. In these models, we need to consider a possible loss of work due to the occurrence of a failure, i.e., the interrupted job may be resumed or restarted upon service resumption. Note that the job completion time analysis includes the delays due to failures and repairs. The purpose of this paper [9] is to extend our earlier analysis so as to account for the queueing delays. In effect, we consider an exact queueing analysis of fault-tolerant systems in order to obtain the steady-state distribution and the mean of the number of jobs in the system. In particular, we study a system in which jobs arrive in a Poisson fashion and are serviced according to FCFS discipline. The service requirements of the incoming jobs form a sequence of independent and identically distributed random variables. The failure/repair behaviour of the system is modelled by an irreducible continuous-time Markov chain, which is independent of the number of jobs in the system. Let the state-space be {1,2, …,n}. When the computer system is in state i it delivers service at rate ri ≥ 0. Furthermore, depending on the type of the state, the work done on the job is preserved or lost upon entering that state. The actual time required to complete a job depends in a complex way upon the service requirement of the job and the evolution of the state of the system. Note that even though the service requirements of jobs are independent and identically distributed, the actual times required to complete these jobs are neither independent nor identically distributed, and hence the model cannot be reduced to a standard M/G/1 queue [8]. As loss of work due to failures and interruptions is quite a common phenomenon in fault-tolerant computer systems, the model proposed here is of obvious interest.

Using our earlier results on the distribution of job completion time we set up a queueing model and show that it has the block M/G/1 structure. Queueing models with such a structure have been studied by Neuts, Lucantoni and others [6]. We demonstrate the usefulness of our approach by performing the numerical analysis for a system with two processors subject to failures and repairs.

References

  1. 1 Baceelli, F. and Trivedi, K. S., "Analysis of M/G/2-Standby Redundant System," Performance '88, A. K. Agrawala and S. k. Tripathi (eds.), North Holland, 1983, pp. 457-476. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. 2 Gaver D. P., "A Waiting Line with Interrupted Service, including Priorities," .7. Royal Statistical Society, Series B-24, 1982, pp. 73-90.Google ScholarGoogle Scholar
  3. 3 Kulkarni, V. G., Nicola, V. F., Trivedi, K. S. and Smith, R. M., "A Unified Model for the Analysis of Job Completion Time and Performability Measures in Fault-Tolerant Systems," Tech. Report CS-1985-13, Dept. of Computer Science, Duke University, N. C., 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. 4 Kulkarni, V. G., Nicola, V. F. and Trivedi, K. S., "The Completion Time of a Job on Multi-Mode Systems," in review, jr. of Applied Probability.Google ScholarGoogle Scholar
  5. 5 Meyer, J., "Closed-Form Solution of Performability," {FEE Trans. on Computer#, 3/ol. C-31, No. 7, 1982, pp. 648-657.Google ScholarGoogle Scholar
  6. 6 Neuts, M. F., 'CMatrix Analytic Methods in Queueing Theory," Eeropean Y. of Operationa Re, earth, Vol. 15, 1984, pp. 2-12.Google ScholarGoogle Scholar
  7. 7 Neut,, M.F. and Lucantoni, D.M., "A Markovian Queue with N Servers Subject to Breakdowns and Repairs," Manafement Science, Vol. 25, No. 9, 1979, pp. 849-861.Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. 8 Nicola, V. F., "A Sing}e Server Queue with Mixed Types of Interruptions," to appear in Aeta Informatiea, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. 9 Nico}a, V. F., Kulkarni, V. G. and Trivedi, K. S., "Queueing Analysis of Fault-Tolerant Systems," accepted in Performance '86, to appear in IEEE Trans. on Software Engineering, 1986.Google ScholarGoogle Scholar

Index Terms

  1. Queueing analysis of fault-tolerant computer systems (extended abstract)

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image ACM SIGMETRICS Performance Evaluation Review
          ACM SIGMETRICS Performance Evaluation Review  Volume 14, Issue 1
          May 1986
          277 pages
          ISSN:0163-5999
          DOI:10.1145/317531
          Issue’s Table of Contents
          • cover image ACM Conferences
            SIGMETRICS '86/PERFORMANCE '86: Proceedings of the 1986 ACM SIGMETRICS joint international conference on Computer performance modelling, measurement and evaluation
            May 1986
            262 pages
            ISBN:0897911849
            DOI:10.1145/317499

          Copyright © 1986 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 1 May 1986

          Check for updates

          Qualifiers

          • article

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader
        About Cookies On This Site

        We use cookies to ensure that we give you the best experience on our website.

        Learn more

        Got it!