Abstract

This paper considers an online control problem involving two controllers. A central controller chooses an action from a feasible set determined by time-varying and coupling constraints that depend on all past actions and states. The central controller's goal is to minimize the cumulative cost; however, it has direct access to neither the feasible set nor the dynamics, both of which are determined by a remote local controller. Instead, the central controller receives only an aggregate summary of the feasibility information from the local controller, which in turn does not know the system costs. We show that an online algorithm using only this aggregated feasibility information can nearly match the dynamic regret of an online algorithm with perfect information, provided that the feasible sets satisfy a causal invariance criterion and the prediction window is sufficiently large. To do so, we combine a form of feasibility aggregation based on entropic maximization with a novel online algorithm, named Penalized Predictive Control (PPC), and demonstrate that the aggregated information can be learned efficiently using reinforcement learning algorithms. The effectiveness of our approach for closed-loop coordination between central and local controllers is validated on an electric vehicle charging application in power systems.
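The division of information described above can be illustrated with a toy sketch. It is not the paper's PPC algorithm or its aggregation scheme. It assumes a small discrete charging problem in which the feasibility summary is the conditional of the uniform distribution over feasible trajectories (the uniform distribution maximizes entropy on a finite set), and in which the central controller trades cost against a log penalty on that summary. The constants `ACTIONS`, `HORIZON`, `DEMAND`, the helper names, and the weight `beta` are all hypothetical.

```python
from functools import lru_cache
import math

ACTIONS = (0, 1, 2)  # hypothetical per-step charging rates
HORIZON = 4          # hypothetical number of time steps
DEMAND = 5           # hypothetical total energy that must be delivered


@lru_cache(maxsize=None)
def count_completions(t, delivered):
    """Count action sequences for steps t..HORIZON-1 that end with exactly DEMAND delivered."""
    if delivered > DEMAND:
        return 0
    if t == HORIZON:
        return 1 if delivered == DEMAND else 0
    return sum(count_completions(t + 1, delivered + u) for u in ACTIONS)


def feasibility_summary(t, delivered):
    """Local controller: share of feasible completions that follow each action at step t."""
    counts = [count_completions(t + 1, delivered + u) for u in ACTIONS]
    total = sum(counts)
    if total == 0:
        raise ValueError("no feasible completion from this state")
    return [c / total for c in counts]


def penalized_action(costs, summary, beta):
    """Central controller: minimize cost minus beta * log(summary), skipping infeasible actions."""
    best, best_val = None, math.inf
    for u, (c, p) in enumerate(zip(costs, summary)):
        if p == 0:
            continue  # zero probability marks an action with no feasible continuation
        val = c - beta * math.log(p)
        if val < best_val:
            best, best_val = u, val
    return best
```

In this sketch, `feasibility_summary(0, 0)` evaluates to `[3/16, 6/16, 7/16]`: of the 16 feasible trajectories, 3 start with rate 0, 6 with rate 1, and 7 with rate 2. The central controller never sees `DEMAND` or `HORIZON`, only the summary. A larger `beta` pushes the choice toward actions that keep more future options open.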
Information Aggregation for Constrained Online Control
SIGMETRICS '21: Abstract Proceedings of the 2021 ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems