research-article

A Multicore Path to Connectomics-on-Demand

Published: 26 January 2017

Abstract

The current design trend in large-scale machine learning is to use distributed clusters of CPUs and GPUs with MapReduce-style programming. Some have been led to believe that this type of horizontal scaling can reduce or even eliminate the need for traditional algorithm development, careful parallelization, and performance engineering. This paper is a case study showing the contrary: the benefits of algorithms, parallelization, and performance engineering can sometimes be so vast that it is possible to solve "cluster-scale" problems on a single commodity multicore machine.

Connectomics is an emerging area of neurobiology that uses cutting-edge machine learning and image processing to extract brain connectivity graphs from electron microscopy images. It has long been assumed that processing connectomics data will require mass storage and farms of CPUs and GPUs, and will take months (if not years) of processing time. We present a high-throughput connectomics-on-demand system that runs on a multicore machine with fewer than 100 cores and extracts connectomes at the terabyte-per-hour pace of modern electron microscopes.

Published in

ACM SIGPLAN Notices, Volume 52, Issue 8 (PPoPP '17), August 2017, 442 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/3155284

PPoPP '17: Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, January 2017, 476 pages
ISBN: 9781450344937
DOI: 10.1145/3018743

Copyright © 2017 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
