Abstract
The exploration and analysis of Web graphs has flourished in the recent past, producing a large number of relevant and interesting research results. However, the unique characteristics of the Tor network limit the applicability of standard techniques and demand specific algorithms to explore and analyze it. The attention of the research community has focused on assessing the security of the Tor infrastructure (i.e., its ability to actually provide the intended level of anonymity) and on discussing what Tor is currently being used for. Since there are no foolproof techniques for automatically discovering Tor hidden services, little or no information is available about the topology of the Tor Web graph. Even less is known about the relationship between content similarity and topological structure. The present article aims to address this lack of information. Its contributions include: a study of approaches to automatic Tor Web exploration and data collection; the adoption of novel representative metrics for evaluating Tor data; a novel in-depth analysis of the hidden services graph; and a rich correlation analysis of hidden services' semantics and topology. Finally, a broad set of novel insights into the organization and content of the Tor Web is provided.
Index Terms
Exploring and Analyzing the Tor Hidden Services Graph