Abstract
SSDs play an important role in caching systems because of their high performance-to-cost ratio. Since the cache is typically smaller than the backend storage by an order of magnitude or more, the write density (defined as writes per unit time and space) of an SSD cache is much higher than that of HDD storage, which poses a serious challenge to the SSD’s lifetime. Meanwhile, under social network workloads, many writes to the SSD cache are unnecessary. For example, our study of Tencent’s photo caching shows that about 61% of all photos are accessed only once, yet they are still swapped in and out of the cache. If we can predict such photos proactively and prevent them from entering the cache, we can eliminate unnecessary SSD cache writes and improve cache space utilization.
To cope with this challenge, we put forward a “one-time-access criteria” that is applied to the cache space and further propose a “one-time-access-exclusion” policy. Based on these two techniques, we design a prediction-based classifier to support the policy. Unlike state-of-the-art history-based predictions, our prediction is non-history oriented, which makes good prediction accuracy hard to achieve. To address this issue, we integrate a decision tree into the classifier, extract social-related information as classification features, and apply cost-sensitive learning to improve classification precision. With these techniques, we attain a prediction accuracy greater than 80%. Experimental results show that the one-time-access-exclusion approach improves cache performance in most aspects. Taking LRU as an example, applying our approach improves the hit rate by 4.4%, decreases cache writes by 56.8%, and cuts the average access latency by 5.5%.
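The effect of excluding one-time-access objects can be illustrated with a small simulation. The sketch below is not the paper's implementation: the trace is synthetic, the names `simulate_lru`, `make_trace`, and `oracle` are hypothetical, and the admission test is an oracle with full knowledge of the trace that stands in for the decision-tree classifier (which, per the abstract, reaches a little over 80% accuracy rather than 100%).

```python
from collections import Counter, OrderedDict


def simulate_lru(trace, capacity, admit=lambda key: True):
    """Replay `trace` through an LRU cache and return (hits, writes).

    `admit` decides on a miss whether the object may enter the cache.
    The default admits everything, which is plain LRU.
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    writes = 0
    for key in trace:
        if key in cache:
            cache.move_to_end(key)  # refresh recency on a hit
            hits += 1
            continue
        if not admit(key):
            # Predicted one-time access: serve from the backend
            # and skip the cache write entirely.
            continue
        if len(cache) >= capacity:
            cache.popitem(last=False)  # evict the least recently used key
        cache[key] = True
        writes += 1
    return hits, writes


def make_trace(rounds=6):
    """Synthetic trace (an assumption): two popular photos requested
    repeatedly, interleaved with photos that are requested exactly once."""
    trace = []
    for i in range(rounds):
        trace += ["hot_a", f"once_{2 * i}", "hot_b", f"once_{2 * i + 1}"]
    return trace


trace = make_trace()
counts = Counter(trace)


def oracle(key):
    """Stand-in for the classifier: admit only keys requested more than once.
    A real predictor cannot see future requests and will make mistakes."""
    return counts[key] > 1


baseline = simulate_lru(trace, capacity=2)
filtered = simulate_lru(trace, capacity=2, admit=oracle)
print("plain LRU        (hits, writes):", baseline)
print("with exclusion   (hits, writes):", filtered)
```

On this deliberately adversarial trace, plain LRU keeps evicting the popular photos to make room for one-time photos, so every request becomes a cache write. With the exclusion filter, only the two popular photos are ever written and later requests for them hit. The gap is far larger than the figures reported in the abstract because the trace is artificial and the oracle never mispredicts.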
Index Terms
Cache What You Need to Cache: Reducing Write Traffic in Cloud Cache via “One-Time-Access-Exclusion” Policy