Abstract
High-performance computing scientists are producing unprecedented volumes of data that take a long time to load for analysis. However, many analyses require loading only the data that contain particular features of interest, and scientists have many approaches for identifying these features. Therefore, if scientists store information (descriptive metadata) about the identified features, subsequent analyses can use this information to read only the data containing them. This can greatly reduce the amount of data that scientists have to read, thereby accelerating analysis. Despite the potential benefits of descriptive metadata management, no prior work has created a descriptive metadata system that helps scientists working with a wide range of applications and analyses restrict their reads to data containing features of interest. In this article, we present EMPRESS, the first such solution. EMPRESS offers all of the features needed to help accelerate discovery: it can accelerate analysis by up to 300×, supports a wide range of applications and analyses, is high-performing, is highly scalable, and requires minimal storage space. In addition, EMPRESS offers features required for a production-oriented system: scalable metadata consistency techniques, flexible system configurations, fault tolerance as a service, and portability.
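The core idea, recording where features were found and then reading only those parts of the data, can be illustrated with a minimal sketch. This is not the EMPRESS API or its storage design: the `feature` table, the `(timestep, chunk_id, kind)` schema, the function names, and the sample rows are all hypothetical, and SQLite is used here only as a convenient stand-in for a metadata store.

```python
import sqlite3


def build_catalog(features):
    """Store hypothetical (timestep, chunk_id, kind) feature records in SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE feature (timestep INTEGER, chunk_id INTEGER, kind TEXT)"
    )
    conn.executemany("INSERT INTO feature VALUES (?, ?, ?)", features)
    conn.commit()
    return conn


def chunks_with_feature(conn, kind):
    """Return the distinct (timestep, chunk_id) pairs tagged with the given kind."""
    rows = conn.execute(
        "SELECT DISTINCT timestep, chunk_id FROM feature "
        "WHERE kind = ? ORDER BY timestep, chunk_id",
        (kind,),
    )
    return rows.fetchall()


def fraction_read(selected, total_chunks):
    """Fraction of all chunks that a metadata-guided read would touch."""
    return len(selected) / total_chunks


# Illustrative records only: features tagged while the data was being produced.
FEATURES = [
    (0, 3, "vortex"),
    (0, 7, "shock"),
    (1, 3, "vortex"),
    (1, 4, "vortex"),
    (1, 4, "vortex"),  # two vortices found in the same chunk
]

catalog = build_catalog(FEATURES)
to_read = chunks_with_feature(catalog, "vortex")
```

In this toy setting, a later analysis interested only in vortices would open the chunks listed in `to_read` rather than scanning every chunk of every timestep; `fraction_read(to_read, total_chunks)` gives the share of the dataset that still has to be loaded.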
EMPRESS: Accelerating Scientific Discovery through Descriptive Metadata Management