Autopilot: automatic data center management

Online: 1 April 2007

Abstract

Microsoft is rapidly increasing the number of large-scale web services that it operates. Services such as Windows Live Search and Windows Live Mail operate from data centers that contain tens or hundreds of thousands of computers, and it is essential that these data centers function reliably with minimal human intervention. This paper describes the first version of Autopilot, the automatic data center management infrastructure developed within Microsoft over the last few years. Autopilot automates software provisioning and deployment, system monitoring, and the repair actions that deal with faulty software and hardware. A key assumption underlying Autopilot is that the services built on it must be designed to be manageable. We therefore also outline the best practices adopted by applications that run on Autopilot.

      • Published in

        ACM SIGOPS Operating Systems Review  Volume 41, Issue 2
        Systems work at Microsoft Research
        April 2007
        93 pages
        ISSN:0163-5980
        DOI:10.1145/1243418

        Copyright © 2007 Author

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • article
