Abstract
Feedback-based reputation systems are widely deployed in E-commerce systems. Evidence shows that sellers in such systems may need a substantial amount of time to earn a reputable label, which reduces their profit. We propose to enhance sellers’ reputation via price discounts. This raises two challenges: (1) the demand from buyers depends on both the discount and the reputation, and (2) the demand is unknown to the seller. To address these challenges, we first formulate a profit maximization problem as a semi-Markov decision process to explore the optimal tradeoffs in selecting price discounts. We prove the monotonicity of the optimal profit and the optimal discount. Based on this monotonicity, we design a Q-learning with forward projection (QLFP) algorithm, which infers the optimal discount from historical transaction data. We prove that the QLFP algorithm converges to the optimal policy. We conduct trace-driven simulations using a dataset from eBay to evaluate the QLFP algorithm. Evaluation results show that QLFP improves the profit by as much as 50% over both Q-learning and Speedy Q-learning. The QLFP algorithm also improves both the reputation and the profit by as much as a factor of two over the scheme that provides no price discount.
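The abstract does not spell out the QLFP update, so the following is only an illustrative sketch of the general idea: a tabular Q-learning step for a semi-Markov decision process, followed by a projection that keeps the estimates monotone. Everything specific here is an assumption made for the example and is not taken from the paper: the discount levels, the number of reputation levels, the toy demand model in `toy_transaction`, the choice to enforce monotonicity of each action's Q-values in the reputation level, and the function names. The projection is a generic monotonicity-preserving step and may differ from the paper's forward projection.

```python
import math
import random

DISCOUNTS = [0.0, 0.1, 0.2]   # hypothetical discount levels (actions)
N_LEVELS = 5                  # hypothetical reputation levels (states)


def smdp_q_update(q, s, a, reward, sojourn, s_next, alpha=0.1, beta=0.05):
    """One SMDP Q-learning step; the reward is assumed to arrive when the sojourn ends."""
    target = math.exp(-beta * sojourn) * (reward + max(q[s_next]))
    q[s][a] += alpha * (target - q[s][a])


def project_monotone(q, s, a):
    """Keep q[.][a] non-decreasing in the state index around the entry just updated."""
    pivot = q[s][a]
    for j in range(len(q)):
        if (j > s and q[j][a] < pivot) or (j < s and q[j][a] > pivot):
            q[j][a] = pivot


def toy_transaction(s, a, rng):
    """Toy dynamics (assumed): the sale rate rises with reputation and discount."""
    rate = 0.2 + 0.2 * s + 2.0 * DISCOUNTS[a]
    sojourn = rng.expovariate(rate)      # waiting time until the next sale
    reward = 1.0 - DISCOUNTS[a]          # unit price minus the discount
    s_next = min(s + 1, N_LEVELS - 1)    # each sale raises reputation by one level
    return reward, sojourn, s_next


def train(episodes=200, steps=20, epsilon=0.2, seed=0):
    """Epsilon-greedy training loop on the toy model; returns the Q-table."""
    rng = random.Random(seed)
    q = [[0.0] * len(DISCOUNTS) for _ in range(N_LEVELS)]
    for _ in range(episodes):
        s = 0
        for _ in range(steps):
            if rng.random() < epsilon:
                a = rng.randrange(len(DISCOUNTS))
            else:
                a = max(range(len(DISCOUNTS)), key=lambda i: q[s][i])
            reward, sojourn, s_next = toy_transaction(s, a, rng)
            smdp_q_update(q, s, a, reward, sojourn, s_next)
            project_monotone(q, s, a)
            s = s_next
    return q
```

Calling `train()` returns a table whose columns are non-decreasing in the reputation level by construction, since each projection restores the ordering after a single-entry update. The intuition is that one observed transaction also corrects estimates at neighbouring reputation levels, which is why exploiting monotonicity can reduce the amount of data needed compared with plain Q-learning.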
Index Terms
A Reinforcement Learning Approach to Optimize Discount and Reputation Tradeoffs in E-commerce Systems





