Abstract
As machine learning systems are increasingly used to make real-world legal and financial decisions, it is of paramount importance that we develop algorithms to verify that these systems do not discriminate against minorities. We design a scalable algorithm for verifying fairness specifications. Our algorithm obtains strong correctness guarantees based on adaptive concentration inequalities; these inequalities allow the algorithm to keep drawing samples until it has enough data to make a decision. We implement our algorithm in a tool called VeriFair and show that it scales to large machine learning models, including a deep recurrent neural network that is more than five orders of magnitude larger than the largest previously verified neural network. Because our technique relies on random samples, it gives only probabilistic guarantees, but we show that the probability of error can be chosen to be extremely small.
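The idea of sampling adaptively until a decision can be made may be easier to see in a small sketch. The code below is not VeriFair's algorithm and does not use the adaptive concentration inequality the abstract refers to; as a stand-in, it applies Hoeffding's inequality at every step with a union bound over all sample counts, which is looser but also remains valid no matter when the loop stops. The fairness specification (a bound `c` on the gap in acceptance rates between two groups), the function names, and the toy sampler are all assumptions made for illustration.

```python
import math
import random


def anytime_radius(n, delta):
    """Confidence radius valid simultaneously for all n >= 1.

    Assumption: Hoeffding's inequality for samples in [-1, 1], with the
    error budget split as delta_n = 6 * delta / (pi^2 * n^2), so the
    per-step budgets sum to delta over all n (union bound).
    """
    delta_n = 6.0 * delta / (math.pi ** 2 * n ** 2)
    return math.sqrt(2.0 * math.log(2.0 / delta_n) / n)


def verify_parity_gap(sample_gap, c, delta=1e-10, max_samples=1_000_000):
    """Decide whether E[sample_gap()] <= c, sampling only as long as needed.

    sample_gap: hypothetical callable returning one value in [-1, 1], e.g.
                (decision for a majority individual) minus
                (decision for a minority individual).
    Returns (True, n) if the gap is judged at most c, (False, n) if it is
    judged larger, and (None, max_samples) if no decision was reached.
    Either verdict is wrong with probability at most delta.
    """
    total = 0.0
    for n in range(1, max_samples + 1):
        total += sample_gap()
        mean = total / n
        eps = anytime_radius(n, delta)
        if mean + eps <= c:   # whole confidence interval is at or below c
            return True, n
        if mean - eps > c:    # whole confidence interval is above c
            return False, n
    return None, max_samples


def make_gap_sampler(rate_major, rate_minor, seed):
    """Toy stand-in for a model: Bernoulli acceptance rates per group."""
    rng = random.Random(seed)

    def sample_gap():
        major = 1.0 if rng.random() < rate_major else 0.0
        minor = 1.0 if rng.random() < rate_minor else 0.0
        return major - minor

    return sample_gap


def demo():
    """Run one nearly fair and one clearly unfair toy model against c = 0.2."""
    fair = verify_parity_gap(make_gap_sampler(0.50, 0.45, seed=0), c=0.2)
    unfair = verify_parity_gap(make_gap_sampler(0.80, 0.30, seed=1), c=0.2)
    return fair, unfair
```

Calling `demo()` returns a verdict and a sample count for each toy model. The number of samples drawn depends on how far the true gap lies from `c`: a model close to the threshold needs many more samples than one far from it, which is the practical benefit of stopping adaptively rather than fixing the sample size in advance. Note also that `delta` enters the radius only through a logarithm, so a very small error probability costs comparatively few extra samples.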
Index Terms
Probabilistic verification of fairness properties via concentration