research-article

A Reinforcement Learning Approach to Optimize Discount and Reputation Tradeoffs in E-commerce Systems

Published: 27 October 2020

Abstract

Feedback-based reputation systems are widely deployed in E-commerce systems. Evidence shows that sellers in such systems may need a substantial amount of time to earn a reputable label, which implies a reduction in profit. We propose to enhance sellers’ reputation via price discounts. Two challenges arise: (1) buyer demand depends on both the discount and the reputation, and (2) this demand is unknown to the seller. To address these challenges, we first formulate a profit maximization problem as a semi-Markov decision process to explore the optimal tradeoffs in selecting price discounts. We prove the monotonicity of the optimal profit and the optimal discount. Based on this monotonicity, we design a Q-learning with forward projection (QLFP) algorithm, which infers the optimal discount from historical transaction data. We prove that the QLFP algorithm converges to the optimal policy. We conduct trace-driven simulations using a dataset from eBay to evaluate the QLFP algorithm. Evaluation results show that QLFP improves profit by up to 50% over both Q-learning and Speedy Q-learning. The QLFP algorithm also improves both reputation and profit by up to a factor of two over the scheme that offers no price discount.
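To make the idea of exploiting monotonicity inside Q-learning more concrete, here is a minimal sketch on a toy problem. It is not the QLFP algorithm from the article, whose details the abstract does not give. It uses a discrete-time model rather than a semi-Markov one, and every constant, the demand model in `step`, and the helper `project_monotone` are hypothetical choices made for illustration. The sketch assumes that the value of each discount is nondecreasing in reputation and restores that ordering after every Q-learning update.

```python
import random

# All numbers below are illustrative assumptions, not values from the article.
DISCOUNTS = [0.0, 0.1, 0.2, 0.3]   # hypothetical discount levels (actions)
N_REP = 5                          # hypothetical number of reputation levels (states)
PRICE, COST, GAMMA = 10.0, 6.0, 0.9


def step(rep, action, rng):
    """Toy environment: demand rises with both reputation and discount."""
    discount = DISCOUNTS[action]
    p_sale = min(1.0, 0.1 + 0.15 * rep + 1.0 * discount)
    if rng.random() >= p_sale:
        return 0.0, rep                      # no sale, reputation unchanged
    reward = PRICE * (1.0 - discount) - COST
    # Assumption: each sale earns positive feedback with probability 0.5.
    next_rep = min(N_REP - 1, rep + 1) if rng.random() < 0.5 else rep
    return reward, next_rep


def project_monotone(q, rep, action):
    """Keep q[.][action] nondecreasing in reputation around the updated entry."""
    z = q[rep][action]
    for r in range(rep + 1, N_REP):          # higher reputation: at least z
        q[r][action] = max(q[r][action], z)
    for r in range(rep):                     # lower reputation: at most z
        q[r][action] = min(q[r][action], z)


def train(episodes=2000, horizon=50, alpha=0.1, eps=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(DISCOUNTS) for _ in range(N_REP)]
    for _ in range(episodes):
        rep = 0                              # every episode starts with no reputation
        for _ in range(horizon):
            if rng.random() < eps:           # epsilon-greedy exploration
                action = rng.randrange(len(DISCOUNTS))
            else:
                action = max(range(len(DISCOUNTS)), key=lambda a: q[rep][a])
            reward, next_rep = step(rep, action, rng)
            # Standard one-step Q-learning update.
            target = reward + GAMMA * max(q[next_rep])
            q[rep][action] += alpha * (target - q[rep][action])
            # Projection step: restore monotonicity in reputation.
            project_monotone(q, rep, action)
            rep = next_rep
    return q


q_table = train()
# Greedy discount for each reputation level under the learned table.
policy = [DISCOUNTS[max(range(len(DISCOUNTS)), key=lambda a: q_table[r][a])]
          for r in range(N_REP)]
print(policy)
```

The projection lets a single observed transaction inform the estimates at other reputation levels, which is one plausible reason a monotonicity-aware method could learn faster than plain Q-learning. Whether this matches the mechanism behind the reported gains is not something the abstract confirms.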

      • Published in

        ACM Transactions on Internet Technology  Volume 20, Issue 4
        November 2020
        391 pages
        ISSN:1533-5399
        EISSN:1557-6051
        DOI:10.1145/3427795
        • Editor: Ling Liu

        Copyright © 2020 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 27 October 2020
        • Accepted: 1 May 2020
        • Revised: 1 April 2020
        • Received: 1 November 2019

        Qualifiers

        • research-article
        • Research
        • Refereed
