ABSTRACT
High-throughput genetic sequencing produces the ultimate "big data": a human genome sequence contains more than 3 billion base pairs, and a growing number of characteristics, or annotations, are being recorded at the base-pair level. Locating areas of interest within the genome is a challenge for researchers and limits their investigations. We describe our vision of adapting "big data" ranked search to the problem of searching the genome. Our goal is to make searching for data as easy for scientists as searching the Internet.
Data Like This: Ranked Search of Genomic Data Vision Paper