Abstract
Search engines have greatly influenced the way people access information on the Internet, as they provide the preferred entry point to billions of pages on the Web. Highly ranked Web pages therefore have higher visibility, and pushing a page's ranking higher has become the top priority for Web masters. Indeed, Search Engine Optimization (SEO) has become a sizeable business in which firms attempt to improve their clients’ rankings. Still, the lack of ways to validate SEO methods has created numerous myths and fallacies associated with ranking algorithms.
In this article, we focus on two ranking algorithms, Google’s and Bing’s. We design, implement, and evaluate a ranking system to systematically validate assumptions others have made about these popular ranking algorithms. We demonstrate that linear learning models, coupled with a recursive partitioning ranking scheme, are capable of predicting ranking results with high accuracy. As an example, we manage to correctly predict 7 out of the top 10 pages for 78% of evaluated keywords. Moreover, for content-only ranking, our system can correctly predict 9 or more of the top 10 pages for 77% of search terms. We show how our ranking system can be used to reveal the relative importance of ranking features in a search engine’s ranking function, provide guidelines for SEOs and Web masters to optimize their Web pages, validate or disprove new ranking features, and evaluate search engine ranking results for possible ranking bias.
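To make the accuracy figures above concrete, the following is a minimal sketch of how a linear scoring model could rank candidate pages and how a "k out of the top 10" measure could be computed. It is not the authors' implementation: the function names, the feature vectors, and the weights are hypothetical, the recursive partitioning step is omitted, and the sketch assumes that "correctly predicted" means a page appears in both the predicted and the actual top 10 regardless of position, with the candidate pool larger than 10 pages.

```python
from typing import Dict, List, Sequence


def linear_score(features: Sequence[float], weights: Sequence[float]) -> float:
    """Score a page as a weighted sum of its feature values (hypothetical features)."""
    return sum(w * x for w, x in zip(weights, features))


def predict_ranking(pages: Dict[str, Sequence[float]],
                    weights: Sequence[float]) -> List[str]:
    """Order page URLs from highest to lowest linear score."""
    return sorted(pages, key=lambda url: linear_score(pages[url], weights),
                  reverse=True)


def top_k_overlap(predicted: Sequence[str], actual: Sequence[str], k: int = 10) -> int:
    """Count pages present in both top-k lists, ignoring order (an assumption)."""
    return len(set(predicted[:k]) & set(actual[:k]))


def fraction_meeting_threshold(overlaps: Sequence[int], threshold: int) -> float:
    """Fraction of keywords whose top-k overlap is at least `threshold`."""
    if not overlaps:
        return 0.0
    return sum(1 for o in overlaps if o >= threshold) / len(overlaps)


if __name__ == "__main__":
    # Toy data: three pages with two made-up features each.
    pages = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5]}
    weights = [2.0, 1.0]
    predicted = predict_ranking(pages, weights)
    print(predicted)
    print(top_k_overlap(predicted, ["a", "b", "c"], k=2))
    print(fraction_meeting_threshold([7, 8, 5, 10], threshold=7))
```

Under this reading, a statement such as "7 out of the top 10 pages for 78% of evaluated keywords" would correspond to `fraction_meeting_threshold(overlaps, 7)` returning 0.78 over the per-keyword overlaps.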
Index Terms
How to Improve Your Search Engine Ranking: Myths and Reality