Abstract
IP geolocation databases are widely used in online services to map end-user IP addresses to their geographical locations. However, they rely on proprietary geolocation methods, and in some cases their accuracy is poor. We propose a systematic approach to using reverse DNS hostnames to geolocate IP addresses, focusing on end-user IP addresses rather than router IPs. Our method is designed to be combined with other geolocation data sources. We cast the task as a machine learning problem: for a given hostname, we first generate a list of potential location candidates, and then we apply a binary classifier to each hostname and candidate pair to determine which candidates are plausible. Finally, we rank the remaining candidates by confidence (class probability) and break ties by population count. We evaluate our approach against three state-of-the-art academic baselines and two state-of-the-art commercial IP geolocation databases. We show that our approach significantly outperforms the academic baselines and is complementary to and competitive with the commercial databases. To aid reproducibility, we open-source our entire approach and make it available to the academic community.
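The three-stage pipeline described above (candidate generation, binary classification of hostname and candidate pairs, ranking by class probability with population as the tie-breaker) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The toy gazetteer `GAZETTEER`, the `tokenize` rule, the `toy_classifier` heuristic, and the 0.5 acceptance threshold are all hypothetical assumptions; the real approach uses a trained classifier over much richer hostname features and a full gazetteer.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Location:
    name: str
    country: str
    population: int


# Hypothetical toy gazetteer mapping hostname tokens (city names, airport
# codes) to possible locations. A real system would build this from a
# full gazetteer such as GeoNames.
GAZETTEER: Dict[str, List[Location]] = {
    "paris": [Location("Paris", "FR", 2_100_000), Location("Paris", "US", 25_000)],
    "lax": [Location("Los Angeles", "US", 3_900_000)],
    "sea": [Location("Seattle", "US", 750_000)],
}


def tokenize(hostname: str) -> List[str]:
    # Hypothetical tokenizer: lowercase, then split on dots, dashes, and digits.
    return [t for t in re.split(r"[.\-0-9]+", hostname.lower()) if t]


def generate_candidates(hostname: str) -> List[Tuple[str, Location]]:
    # Stage 1: every (token, location) pair whose token is in the gazetteer.
    return [(tok, loc) for tok in tokenize(hostname) for loc in GAZETTEER.get(tok, [])]


# A classifier maps (hostname, token, candidate) to P(candidate is plausible).
Classifier = Callable[[str, str, Location], float]


def toy_classifier(hostname: str, token: str, loc: Location) -> float:
    # Hypothetical stand-in for a trained binary classifier: uses only the
    # top-level domain as evidence about the candidate's country.
    tld = hostname.lower().rsplit(".", 1)[-1]
    if len(tld) == 2:  # country-code TLD
        return 0.9 if tld == loc.country.lower() else 0.2
    return 0.6  # generic TLD such as .net carries no country evidence


def geolocate(hostname: str, classify: Classifier, threshold: float = 0.5) -> List[Location]:
    scored = []
    for tok, loc in generate_candidates(hostname):
        # Stage 2: keep only candidates the classifier deems plausible.
        p = classify(hostname, tok, loc)
        if p >= threshold:
            scored.append((p, loc))
    # Stage 3: rank by probability, breaking ties by population (both descending).
    scored.sort(key=lambda pl: (pl[0], pl[1].population), reverse=True)
    return [loc for _, loc in scored]


if __name__ == "__main__":
    for host in ["cpe-1-2-3-4.paris.example.net", "dsl-5.paris.example.us"]:
        print(host, [(l.name, l.country) for l in geolocate(host, toy_classifier)])
```

For `cpe-1-2-3-4.paris.example.net`, both Paris candidates receive the same toy score, so population decides the order (Paris, FR before Paris, US). For `dsl-5.paris.example.us`, the classifier rejects Paris, FR outright, which shows why filtering happens before ranking: population only breaks ties among candidates that are equally plausible.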