Abstract
Native Command Queueing (NCQ) is an optimization technology that maximizes throughput by reordering requests inside a disk drive. It has been so successful that it has become standard in the SATA 2 protocol specification, and the great majority of disk vendors have adopted it in their recent disks. However, the technology may create an information gap between the OS and the disk drive. An NCQ-enabled disk tries to optimize throughput without knowing the intention of the OS, whereas the OS, lacking specific knowledge of the disk mechanism, does its best under the assumption that the disk will do as it is told. We call this mismatch expectation discord; it may cause serious problems such as request starvation or performance anomalies. In this article, we (1) confirm that expectation discord actually occurs in real systems; (2) propose software-level approaches to solve it; and (3) evaluate our mechanism. Experimental results show that our solution is simple, cheap (no special hardware required), portable, and effective.
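To make the starvation risk concrete, the following is a minimal toy sketch, not the mechanism evaluated in the article. It assumes the drive always serves the queued request nearest to the head (a shortest-seek-first policy chosen purely for illustration; real firmware policies are not described here), and the names `serve_order` and `demo`, the queue depth of 32, and the LBA values are all hypothetical.

```python
from collections import deque


def serve_order(requests, queue_depth, reorder):
    """Return the order in which a toy drive completes `requests`.

    requests:    list of (request_id, lba) in the order the OS issues them.
    queue_depth: number of outstanding commands the drive can hold.
    reorder:     True  -> serve the queued request nearest to the head
                          (assumed shortest-seek-first policy);
                 False -> serve strictly in arrival order.
    """
    pending = deque(requests)
    queue = []
    head = 0  # current head position, expressed as an LBA
    completed = []
    while pending or queue:
        # The host refills the drive queue whenever a slot is free.
        while pending and len(queue) < queue_depth:
            queue.append(pending.popleft())
        if reorder:
            idx = min(range(len(queue)), key=lambda i: abs(queue[i][1] - head))
        else:
            idx = 0
        req_id, lba = queue.pop(idx)
        head = lba
        completed.append(req_id)
    return completed


def demo():
    # One request the OS considers urgent but that lies far from the head,
    # followed by a steady stream of requests close to the head.
    requests = [("urgent", 900_000)] + [(f"near{i}", i % 100) for i in range(200)]
    in_order = serve_order(requests, queue_depth=32, reorder=False)
    reordered = serve_order(requests, queue_depth=32, reorder=True)
    # Position at which the urgent request completes under each policy.
    return in_order.index("urgent"), reordered.index("urgent")
```

In this toy model the urgent request completes first when the drive honors arrival order, but completes last when the drive reorders, because a nearer request is always available in the queue. The OS issued it first and has no way to observe why it is delayed, which is the kind of gap the abstract describes.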