FrugalLight: Symmetry-Aware Cyclic Heterogeneous Intersection Control using Deep Reinforcement Learning with Model Compression, Distillation and Domain Knowledge

Developing countries need to better manage fast-increasing traffic flows, owing to rapid urbanization. Otherwise, growing traffic congestion would increase fatalities due to reckless driving, and keep vehicular emissions and air pollution critically high in cities like New Delhi. State-of-the-art traffic signal control methods in developed countries, however, use expensive sensing, computation and communication resources. This paper explores how far control algorithms can go under resource constraints, through the design and evaluation of FrugalLight (FL). We also captured and processed a real traffic dataset at a busy intersection in New Delhi, India, using efficient techniques on low cost embedded devices. This dataset (https://delhi-trafficdensity-dataset.github.io) contains traffic density information at a fine time granularity of one measurement every second, from all approaches of the intersection for 40 days. FrugalLight (https://github.com/sachin-iitd/FrugalLight) is evaluated on the collected traffic dataset from New Delhi and another open source traffic dataset from New York. FrugalLight matches the performance of state-of-the-art Convolutional Neural Network (CNN) based sensing and Deep Reinforcement Learning (DRL) based control algorithms, while using an order of magnitude fewer resources. We further explore improvements using a careful combination of knowledge distillation and domain knowledge based DRL model compression, and employ Model-Agnostic Meta-Learning to quickly adapt to traffic at new intersections. The collected real dataset and FrugalLight therefore open up opportunities for resource-efficient RL based intersection control design for the ML research community, where the controller should have a limited carbon footprint.
Such intelligent, green, intersection controllers can help reduce traffic congestion and associated vehicular emissions, even if compute and communication infrastructure is limited in low resource regions. This is a critical step towards achieving two of the United Nations Sustainable Development Goals (SDG), namely sustainable cities and communities and climate action.


INTRODUCTION
Traffic signal control, a historical application of automated planning and scheduling, is seeing a dramatic shift with recent advances in Internet of Things (IoT), Computer Vision, and Reinforcement Learning (RL). As shown in [47,52], researchers, in collaboration with city authorities, are collecting traffic videos at scale using IoT devices, processing them using the latest Convolutional Neural Network (CNN) based computer vision methods, and using the processed data for traffic light control using Deep Reinforcement Learning (DRL).
The problem of traffic congestion is acute in developing countries like India, further contributing to the high air pollution in cities like New Delhi [10]. Therefore the state-of-the-art intersection control algorithms should also be used to benefit these low resource communities. There is, however, a key challenge involved in transferring the technology as it is: the current intersection control algorithms [45,46] use exact vehicle counts, and work well in developed countries with orderly laned traffic. Traffic in developing countries is non-lane based (sample images in Figure 4). Exact vehicle counts computed using state-of-the-art CNNs like YOLO, even if they work fine at daytime [8], drop in accuracy at night. But intersection control is needed at all times, independent of the lighting condition, so intersection control algorithms should use information that can be computed at all times. There is, therefore, a clear gap in data and algorithms for automated intersection control in developing regions. This paper seeks to bridge this gap, with the following contributions:
❶ We release the first traffic density dataset (https://delhi-trafficdensity-dataset.github.io) from a developing country intersection (characterized by non-lane behaviour and connectivity/compute limitations), under a Creative Commons Attribution 4.0 International License [7]. We deployed 6 cameras at a busy 3-approach intersection in New Delhi. The location and camera placements are indicated in Figure 1. Due to the lack of broadband connectivity from the road to the cloud, the video feeds could not be transmitted. This lack of cloud connectivity will hold in any actual intersection control deployment for a developing country. So we performed in-situ processing of the traffic videos, using computer vision algorithms like background subtraction [6], which also work well at night, unlike CNNs. The released dataset contains traffic density information at a fine time granularity of one measurement every second, from all approaches of the intersection for 40 days between Sep-Dec 2020 (§ 3).
❷ We carefully design a reinforcement learning based intersection control algorithm, FrugalLight (FL), that needs traffic density information as input and not exact vehicle counts. FL (https://github.com/sachin-iitd/FrugalLight) is shown to improve transportation metrics like throughput and travel time over traditional and state-of-the-art policies, while showing better transferability and fast adaptability. More importantly, FL and all computer vision inputs to FL are so computationally efficient that we can run them on low cost embedded platforms at the intersection in real time, without cloud connectivity (§ 5).
❸ We compare FL with state-of-the-art intersection control algorithms [45,46] on this new Delhi dataset, as well as an open source traffic dataset from New York, the latter containing exact vehicle count information. Surprisingly, FL matches the performance of the baseline RL algorithms, while using an order of magnitude less computation and communication resources. This shows that algorithms designed for constrained datasets (only traffic density), as released in this paper, can work well even in resourceful countries with unconstrained data (exact vehicle counts). Thus if ML researchers start considering efficiency metrics like model size and inference latency, in addition to accuracy metrics, ML algorithms for critical applications like intersection control can have a smaller carbon footprint than the state-of-the-art [45,46] (§ 6).
❺ The challenges of directly importing the FrugalLight model are threefold: (i) extreme budget constraints, which allow only very low cost, compute and RAM constrained embedded platforms to be deployed, (ii) poor network connectivity between the road and the servers, forcing all analysis to happen in-situ on the road, and (iii) chaotic non-laned driving behavior in developing regions, which makes accurate video analysis for exact counting and classification of vehicles harder. We hence extend FrugalLight for learning efficient Look-Up Table (LUT) based or threshold based intersection control. This solution, termed EcoLight [9], performs on par with the compute intensive methods, at a mere fraction of the runtime overhead. Optimizing computational overhead without losing accuracy has been challenging for EcoLight. We reduce DRL states from over a thousand dimensions in state-of-the-art papers [45,46] to one or two dimensions. We remove the DNN based RL computation at runtime using static LUTs. We quantize the original continuous values of DRL states to obtain finite sized LUTs. All these optimizations needed to be carefully tuned for accuracy. We experiment with both an open-source developed country dataset and a custom developing region dataset, created by us from our deployed cameras. As a result of careful tuning, EcoLight gives comparable benefits and sometimes even improves upon the compute-intensive methods, on both performance metrics (throughput, average travel time etc.) and fairness metrics (worst case travel time, vehicles stuck etc.) (§ 7).
❻ Finally, an end-to-end system has also been demonstrated in this paper. This incorporates video feeds from cameras at a real intersection and computer vision based traffic density estimation as input to the control algorithms. Our results show great promise towards practical adaptive intersection control under extreme budget and network constraints, a vital necessity for sustainability (§ 8).

RELATED WORK
Researchers have identified and modelled traffic congestion in developing countries using different methods [5,23,24]. To mitigate the problem of traffic congestion, most developing countries today still use traditional traffic signal control methods. These are static systems that work by changing the phase periodically with a fixed cycle length. Research has suggested that such traffic lights even become the cause of traffic jams in some cases, as shown in [4]. To deal with the issue of dynamic updating, researchers have modeled the problem of traffic control as an optimization problem based on certain assumptions, and come up with rules for setting the phase based on the traffic densities in the network, as shown in [28,29,38,42]. Even with this approach, the rules obtained are pre-defined and cannot be dynamically adjusted for real-time traffic. A variety of Reinforcement Learning works, like [2,11,14,15,32,35,41], have proposed different state and reward formulations for different action choices.
Recent Reinforcement Learning based methods [45,46] outperform the traditional traffic signal control methods in simulation environments. These methods, however, require significant computational and communication resources, as each agent uses a Deep Neural Network to compute the current phase and communicates with a central server in real time, making them infeasible for deployment in developing countries. A recent work, EcoLight [9], uses very low dimensional states, with fairness optimizations, to generate lookup tables to be deployed in lieu of the DRL models. The lookup tables, being static, cannot fully adapt to dynamic traffic conditions. Our method FrugalLight is an extension of this work; it can adapt to changing traffic conditions and can be cost-effectively deployed in developing countries.
High quality real traffic data is necessary to develop ML/RL based control algorithms for efficient intersection control. Researchers have been generating real traffic data through various means. [20] used loop sensors for dataset creation and [39] used loop sensors to calculate traffic queue length. As per [16,34], however, loop sensors, though cheap and widely used, have several operational constraints: they degrade with the condition of the road, and water penetration affects their performance. They are prone to damage due to the poor state of road surfaces, and through weaknesses caused by the installation of loops or other damage over time, such as potholes. Loop tails are also often cut in the course of other road works, such as utility companies accessing their infrastructure. These findings were used by different city authorities for improvements in transport infrastructure. As per [13], Swansea City Council opted for an alternate solution considering such factors. We found a similar case for London [25,48].
Besides, [45,46] used taxi data to approximate real traffic data in New York. [37] recorded the traffic in Doha, Qatar in the peak hour of a weekday and reproduced it in the SUMO [26] simulator. [44] used open source camera data from Hefei (China) to analyze and use a peak hour flow for the experiments. [2] used the 24 hour primary vehicle Origin-Destination data collected from the municipality of Tehran, adjusted it by one-hour interval traffic count data obtained from traffic sensors, and gathered (impatient) pedestrian data via fieldwork. [3] used the six hourly origin-destination matrices calculated by the municipality of Tehran for the traffic demands between 6am and 12pm on a workday. Continuing this effort of creating datasets for traffic intersection control, we provide multiple days of traffic density data that can be used to train ML models directly, or to generate smaller datasets for use in various simulators like CityFlow [52] and SUMO [26].

REAL DATA DESCRIPTION

Data Collection Challenges
Installing traffic cameras at a busy intersection in New Delhi involved a series of challenges. Proper infrastructure to mount the cameras and draw connections for power and communication was needed. All work had to be done while minimally affecting the flow of traffic. Permissions were needed from the traffic authorities. We collaborated with an industry partner for the deployment. It was also not feasible to send raw video from the cameras to a cloud server, hence edge computing was required. Camera video was processed at the intersection itself to generate traffic density numbers, and the density values were stored locally for periodic retrieval. During deployment, we observed multiple issues hampering the sound operation of the deployed system: ❶ one camera power adapter failure (1 camera down), ❷ power failure for one approach (2 cameras down), ❸ communication line failure from one approach (2 cameras down), and ❹ multiple instances of the Local Processing Unit hanging or powering off (all cameras down). For issues related to the camera device, a crane was required for the repair. For issues at ground level, efforts had to be made to trace the point of failure and then replace the faulty component. All of these required days to weeks to get done, due to the many dependencies involved in the maintenance process. We finally had 40 days of complete data in a 4-month duration (Sep-Dec 2020), as shown in Figure 2. Due to winter, heavy fog and late sunrise, data collection during Dec 15-19 started later than on previous days, hence the plot shows slightly shorter lines for these 5 days.

Dataset Processing
Before discussing the processing of the traffic density data, we first describe our density estimation method based on background subtraction. As presented in Figure 3, a continuously updated Background Filter (governed by a learning rate) is subtracted from each frame to get the foreground, and the ratio foreground/background denotes Queue Density. To find Stop Density, we need to discard the density caused by the moving/dynamic traffic. Using the optical flow algorithm to detect moving pixels, we computed the standing traffic (Stop Density). The inbuilt adaptation in Background Subtraction makes the processing robust to changing ambient light conditions. The density estimation code deployed on the road to generate queue and stop densities contains the methods to receive the video frames from the camera and process them using background subtraction to generate Queue Density and Stop Density values for each camera (with a CUDA-optimized version). There are background images for each camera, used to initialize the background subtraction algorithm in the morning, and projection coordinates to transform the 3D view of the camera to a 2D vertical view.
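The density estimation described above can be illustrated with a minimal NumPy sketch. It is not the deployed code: the function names, the threshold value and the running-average background model are assumptions made for illustration, the density is taken as the fraction of foreground pixels in the frame, and simple frame differencing stands in for the optical flow step that the deployed system uses to find moving pixels.

```python
import numpy as np

def update_background(background, frame, learning_rate=0.05):
    """Running-average background model: blend the new frame into the background."""
    return (1.0 - learning_rate) * background + learning_rate * frame

def foreground_mask(background, frame, threshold=25.0):
    """Pixels that differ from the background by more than `threshold` (assumed value)."""
    return np.abs(frame - background) > threshold

def densities(background, prev_frame, frame, threshold=25.0):
    """Return (queue_density, stop_density), each in [0, 1]."""
    fg = foreground_mask(background, frame, threshold)
    # Stand-in for optical flow: a pixel counts as moving if it changed
    # since the previous frame.
    moving = np.abs(frame - prev_frame) > threshold
    queue_density = float(fg.mean())               # all vehicles, halted or moving
    stop_density = float((fg & ~moving).mean())    # only vehicles that did not move
    return queue_density, stop_density

# Toy 4x4 grayscale "road": empty background, vehicles drawn as bright rows.
background = np.zeros((4, 4))
prev_frame = background.copy()
prev_frame[0:2, :] = 200.0   # vehicles on rows 0 and 1
frame = background.copy()
frame[0, :] = 200.0          # the vehicle on row 0 stayed in place
frame[2, :] = 200.0          # the vehicle on row 1 moved to row 2
queue_density, stop_density = densities(background, prev_frame, frame)
print(queue_density, stop_density)
```

In a real pipeline the background would be refreshed with `update_background` on every frame, which is what lets the estimate follow slow changes in ambient light.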

Dataset Quality
To work with intersection control algorithms, camera frames are processed to get a representative traffic summary. CNN based Computer Vision methods like YOLO [36] are used to identify the vehicles present on the road, and the identified vehicles are enumerated to get the vehicle count. Figure 4 shows two good examples of YOLO traffic identification on the Left, and two poor detections due to occlusion in heavy traffic in the Middle. Poor lighting conditions also drastically affect YOLO accuracy.
CNN accuracy vs. latency: We use the labeled dataset from [8] to explicitly check the accuracy vs. latency trade-off of different CNN models. We split the dataset into train and test data as shown in Table 1, and train YOLO V2 and Tiny-YOLO models. On test data, the average accuracies are reported in terms of mAP, precision and recall in Table 2. As seen from the values, Tiny-YOLO reports low mAP and low recall compared to YOLO V2. In terms of latency, as measured on an NVIDIA Jetson TX2, CPU frame rates are very low for both models, whereas GPU rates are low for YOLO V2 and moderate for Tiny-YOLO. We observe that the Traffic Density calculation using background subtraction [6] works well, in both natural and street lighting. Figure 4 (Right) shows two example density computations, in light to moderate traffic (Top) and in heavy traffic (Bottom). The Top graph corresponds to the Left YOLO detections, and the Bottom graph corresponds to the Middle YOLO detections. Queue density grows correctly between the red and green signal vertical lines, signifying the red phase for the approach. In contrast, queue density drops between the green and the next red signal, signifying the green phase for the approach. Dynamic/moving density remains close to 0 in heavy traffic (as seen in the bottom plot) during the red cycle and only rises in the green cycle. As per our observations of other dataset collection efforts using loop-based or camera data, intermittent validation is a feasible way forward. We have observed high density (∼1) when the frame was full of vehicles, and very low density (almost 0) in case of no traffic or when the vehicles pass completely across the intersection. In between as well, we have observed density values in proportion to the traffic present on the road. So, over many manually verified parts of the dataset, we repeatedly observed accurate density calculation vs. high YOLO errors. This is understandable, as density estimation is an easier task than detecting vehicle bounding boxes. The traffic density processing code is available at https://github.com/sachin-iitd/TrafficDensity.
In addition to being accurate, our density estimation code runs at 6 FPS on the low cost embedded platform (1.8 GHz Intel(R) Atom(TM) CPU D525 with 4 logical cores and 8GB RAM) budgeted by our deployment partners, and gives us Queue Density and Stop Density values per second for the 6 cameras. The dataset thus also has a fine granularity of recorded traffic density measurements.
Histogram of Queue and Stop densities: We analyzed the histograms of the queue and stop densities of our dataset. Both densities are on a scale of 0 to 1, where 0 means an empty road with no traffic and 1 means a road full of traffic. Figure 5 shows the histograms of Queue and Stop densities for the 6 cameras. By manual observation of the traffic images, we have seen that approach 3 usually had limited traffic stopped for the red light. The same is evident in the StopDensity5 histogram, which shows a high occurrence of almost no waiting traffic beyond the scope of camera 6.

Dataset Uniqueness
Our dataset is a longitudinal dataset, collected over many days from all approaches at an intersection, which is needed specifically for traffic intersection control algorithms. Using the background subtraction and optical flow techniques, our dataset contains traffic density and stop density for each approach per second. We further convert the density values into a traffic dataset suitable for simulator evaluations, with methods similar to other works. In the comparison of datasets, Inter'n Layout denotes how many roads cross to create how many intersections, describing the road network architecture. E.g. 16x1=16 indicates that there is one road perpendicular to 16 roads, crossing each of them and creating 16 intersections. Num App denotes the (fixed) number of approaches at the intersections. The next three columns indicate the traffic volume (in vehicles arriving per 5 minutes), the duration of the datasets (from 1 to 6 hours), and the geographical location (USA/India/China) from which the datasets are collected. The last column gives the source of the data (Taxi trip information, Camera) and the processing method (YOLO: [36], Background Subtraction: [6]). The single intersection data from New Delhi, corresponding to the Cam+BackSub processing method, is generated from 1-6 hour portions of the shared 40 days traffic density dataset.
The uniqueness of this dataset lies in ❶ its limited features (density, not vehicle count), which we show is practical information to obtain in real time in a developing country, and ❷ its longitudinal nature. The analysis of multiple days of data from our New Delhi dataset is shown in Figure 6, where the horizontal axis denotes the hour of the day and the vertical axis denotes the Queue Density. The box-plot shows the peak traffic during morning and evening hours for Approaches 1 and 2. Approach 3, which joins the other approaches to form a T junction, has an independent pattern where the traffic increases as the day progresses. The variation in density at each hour over the 40 days shows how dynamic traffic is at this Delhi intersection, and how ML researchers can use this to benchmark their RL based intersection control algorithms.

NEED FOR INTELLIGENT TRAFFIC LIGHT CONTROL
Traffic in both developed and developing countries is very dynamic, and is very hard to approximate, predict or calculate with simple equations and formulas. It is also heavily dependent on location, time, and many other local constraints. AI and DRL are needed to better approximate the traffic behaviour. Such time varying and complex traffic information needs suitable methods to be processed and transformed into an effective policy that helps control the traffic. Reinforcement Learning (RL) methods have shown great promise in learning effective control policies in such situations. Traditional RL methods used in the research either could not capture the varying traffic (limiting experiments to hourly traffic), or fail to process the data in a structured way (by simply trying to fit to the raw data). We try to overcome this problem by presenting a structured and formalized way to learn effectively from the real data.
Existing methods work with vehicle counts (and their lanes and distance from the intersection), which require expensive in-situ processing to convert camera videos to vehicle counts, with models such as YOLO [36]. As we could efficiently process the real-time traffic data in terms of density, the new RL method should be able to work effectively using limited information, such as only traffic density, as its input.
The state-of-the-art models are large, which makes them unsuitable for deployment on low cost edge devices. Our new model should be small, making it fast in processing and hence easily deployable on an edge platform. As no existing RL algorithm [45,46] meets these requirements, we formulate the traffic control problem as a Markov Decision Process (MDP) and design FrugalLight.

Problem Definition
To start with, we define the problem of traffic signal control as a Markov Decision Process. Each intersection in the system is controlled by an agent running independently, without any communication with the others. In this setting, each agent observes part of the total system, and decides for its own intersection whether to keep the same phase or switch to the next, so as to minimize the average traffic density on the approaches around the intersection. Specifically, the problem can be characterized by the following major components $\langle S, O, A, P, r, \pi, \gamma \rangle$, as described in detail below.
❶ System state space $S$ and observation space $O$: We assume that there are $N$ intersections in the system and each agent can observe part of the system state $s \in S$ as its observation $o \in O$. We define $o_i^t$ for agent $i$ at time $t$, which consists of traffic density in one or two dimensions as described later.
❷ Set of actions $A$: At time $t$, an agent $i$ chooses an action $a_i^t$ from its candidate action set $A_i$ as a decision for the next $\Delta t$ period of time. Here, each agent chooses either 0 or 1 as its action $a_i^t$, indicating that from time $t$ to $t + \Delta t$, this intersection stays in the same phase or transitions to the next phase.
❸ Transition probability $P$: Given the system state $s^t$ and the actions $a^t$ of the agents at time $t$, the system arrives at the next state $s^{t+1}$ according to the state transition probability $P(s^{t+1} \mid s^t, a^t)$.
❹ Reward $r$: Each agent $i$ obtains an immediate reward $r_i^t$ from the environment at time $t$. In this paper, we want to minimize the travel time for all vehicles in the system, which is hard to optimize directly. Therefore, we define the reward for intersection $i$ as $r_i^t = -\sum_{l} u_{i,l}^t$, where $u_{i,l}^t$ is the stop density on approach $l$ of intersection $i$ at time $t$.
❺ Policy $\pi$ and discount factor $\gamma$: As the independent actions have long-term effects on the system, we want to minimize the expected stop density of each intersection in each episode. Specifically, at time $t$, each agent chooses an action following a certain policy $O \times A \to \pi$, aiming to maximize its total reward $G_i^t = \sum_{\tau=t}^{T} \gamma^{\tau - t} r_i^{\tau}$, where $T$ is the total number of time steps of an episode and $\gamma \in [0, 1]$ differentiates the rewards in terms of temporal proximity.
In this paper, we use the action-value function $Q_i(\theta_n)$ for each agent $i$ at the $n$-th iteration (parameterized by $\theta$) to approximate the total reward $G_i^t$ with neural networks by minimizing the loss:
$$L(\theta_n) = \mathbb{E}\left[\left(r_i^t + \gamma \max_{a'} Q\left(o_i^{t'}, a'; \theta_{n-1}\right) - Q\left(o_i^t, a_i^t; \theta_n\right)\right)^2\right]$$
where $o_i^{t'}$ denotes the next observation for $o_i^t$. The earlier snapshots of parameters $\theta_{n-1}$ are periodically updated with the most recent network weights and help increase the learning stability by de-correlating predicted and target q-values.
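The quantities in this formulation can be made concrete with a small pure-Python sketch: the reward as the negative sum of stop densities, the discounted total reward over an episode, and the squared temporal-difference error of a standard DQN with a target network. The function names are hypothetical, and the sketch assumes the usual DQN target (reward plus discounted maximum target-network Q-value); it illustrates the arithmetic only, not the training code.

```python
def reward(stop_densities):
    """Immediate reward: negative sum of stop densities over the approaches."""
    return -sum(stop_densities)

def discounted_return(rewards, gamma):
    """Total reward from the first step on: sum over k of gamma**k * rewards[k]."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total

def td_target(r, next_q_values, gamma):
    """DQN target: r + gamma * max over actions of the target-network Q-values."""
    return r + gamma * max(next_q_values)

def squared_td_error(q_value, r, next_q_values, gamma):
    """One sample of the loss: (target - predicted Q-value) squared."""
    return (td_target(r, next_q_values, gamma) - q_value) ** 2

# Example: two approaches with stop densities 0.25 and 0.5.
r = reward([0.25, 0.5])
episode_return = discounted_return([-1.0, -1.0, -1.0], gamma=0.5)
loss_sample = squared_td_error(q_value=0.0, r=-0.5, next_q_values=[1.0, 2.0], gamma=0.5)
print(r, episode_return, loss_sample)
```

Averaging `squared_td_error` over a batch of stored transitions gives an empirical estimate of the loss that the Q-network is trained to minimize.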

FRUGALLIGHT
Based on a recent and comprehensive survey of intelligent traffic light control methods [47], we choose the two most promising state-of-the-art DRL based traffic light control algorithms [45,46] as our baselines. FrugalLight uses domain knowledge and careful optimizations to match the performance of these complex baselines, at a tiny fraction of the computational resources, utilizing practical sensor inputs for developing region traffic. Our work, in principle, follows the current research trend on efficient machine learning, which mostly optimizes CNN models for computer vision tasks [19,21,30,49,50]. We extend the efficiency question to a new application domain, traffic light control, and optimize a different machine learning model, namely DRL. Our innovations come from practical constraints (not addressed in prior work) that developing regions pose on the intended application. We also utilize Knowledge Distillation based guidance approaches [18,27] to improve our methods, and use MAML based Meta-Learning approaches [17,51] to scale them, paving the way for efficient and effective learning of traffic situations.

Design Prerequisites
There are several environment level design considerations before discussing the DRL based methodology.
Control agents, coordinated vs. decentralized: While the absence of continuous network connectivity to the cloud necessitates in-situ computation for the computer vision and traffic light control algorithms, the same connectivity issue also necessitates the design of independent traffic light control agents. Real time communication across agents of different intersections cannot be taken for granted. So we design individual agents for each intersection in this paper, without assuming mutual communication.
Phase characteristics at intersections: As traffic in developing regions is non-laned and chaotic, giving green simultaneously to different approaches increases the chance of collisions at an intersection. The Single-approach-green (Y) pattern is, therefore, typically used in intersection design in developing countries. This scheme comprises phase sequences as shown in Figure 7, where in each phase, the straight flow and the turning flow are given green simultaneously. The number of such phases depends on the number of approaches at a particular intersection. We analyze our methods primarily for this Y-scheme, and later show that our method performs better on the other phase schemes as well, such as the Double-approach-green (X) pattern and a mix of both (XY).
Expected output of control algorithm: Our agents can take one of two kinds of decisions at every decision making time point, which comes at a fixed periodicity for the agent. ❶ Switch to the next phase: In this setting, the scheduler delivers a binary decision, either to continue the current green signal or to switch to the next phase in a cyclic order. ❷ Set any phase: In this setting, the scheduler switches to the best phase, which can be any of the allowed phases. Set any phase is more flexible and can potentially give better values for the traffic metrics being optimized (travel time or throughput). But the fixed phase cycle in switch to the next phase is better at setting commuter expectations as to who will get the next green. In developing countries, where traffic is already extremely chaotic and drivers are unruly, the phase cycle is kept constant to set predictable expectations for drivers. Thus our control agents should follow the switch to the next phase decision scheme. We evaluate both and show that the additional flexibility of set any phase gives only minor improvements in metric values over the more practical and safer switch to the next phase scheme.
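The two decision schemes differ only in how an agent's action is mapped to the next signal phase. The sketch below illustrates that mapping, assuming phases are indexed 0 to num_phases - 1 in their cyclic order; the function names are hypothetical.

```python
def next_phase_cyclic(current_phase, action, num_phases):
    """Switch to the next phase: action 0 keeps the phase, action 1 advances cyclically."""
    if action not in (0, 1):
        raise ValueError("action must be 0 (keep) or 1 (switch)")
    return (current_phase + action) % num_phases

def next_phase_any(action, num_phases):
    """Set any phase: the action is itself the index of the phase to show next."""
    if not 0 <= action < num_phases:
        raise ValueError("action must be a valid phase index")
    return action

# A 3-approach intersection under the single-approach-green scheme has 3 phases.
print(next_phase_cyclic(current_phase=2, action=1, num_phases=3))  # wraps around to phase 0
print(next_phase_any(action=1, num_phases=3))                      # jumps directly to phase 1
```

With the cyclic scheme the agent always has exactly two actions regardless of the number of phases, which keeps the order of greens predictable for drivers.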
Optimization metrics: The primary metric usually used to quantify the performance of a traffic light control system is the average travel time of vehicles passing through the intersection. We use this metric in our evaluations. The second metric evaluated is throughput, which is the percentage of vehicles cleared by the intersection. A third metric is total time, which combines the time spent by the vehicles that clear the intersection and by those stuck at the intersection. Throughput needs to be maximized, while travel time and total time need to be minimized.
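As an illustration of these three metrics, the sketch below computes them from a list of per-vehicle records. The record format (arrival time, departure time or None if the vehicle is still stuck) and the exact way total time accumulates the waiting time of stuck vehicles up to the end of the evaluation are assumptions made for this example, not definitions taken from the evaluation code.

```python
def traffic_metrics(trips, end_time):
    """Return (average travel time, throughput in percent, total time).

    `trips` is a list of (arrival, departure) pairs; departure is None for
    vehicles that had not cleared the intersection by `end_time`.
    """
    travel_times = [dep - arr for arr, dep in trips if dep is not None]
    stuck_arrivals = [arr for arr, dep in trips if dep is None]

    avg_travel_time = sum(travel_times) / len(travel_times) if travel_times else 0.0
    throughput = 100.0 * len(travel_times) / len(trips) if trips else 0.0
    # Assumed definition: time of cleared vehicles plus time accumulated so far
    # by the vehicles that are still stuck.
    total_time = sum(travel_times) + sum(end_time - arr for arr in stuck_arrivals)
    return avg_travel_time, throughput, total_time

# Four vehicles, two of which are still waiting when the evaluation ends at t = 40 s.
trips = [(0, 10), (5, 25), (20, None), (30, None)]
avg_tt, thr, total = traffic_metrics(trips, end_time=40)
print(avg_tt, thr, total)
```

Reporting total time alongside average travel time prevents a policy from looking good merely because its slowest vehicles never finish.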

FrugalLight DRL Architecture
We use a DQN based DRL architecture with fully connected layers, comprising two hidden layers of size $h$ each ($5 \le h \le 20$). Suppose we use $s$ states to represent an intersection. Further suppose there are $p$ phases, so our DRL can choose to stay in the current phase, or choose among the remaining $p - 1$ phases, giving $p$ possible actions. Then our DRL has an $s \times h \times h \times p$ architecture, as shown in Figure 8. Our method is independent of the underlying loss function and optimizer choice, and we use MeanSquareError [43] and RMSprop [22] respectively in our experiments. The simple architecture also allows us to explore and emphasize the benefits of other enhancements in State and Reward design. Such enhancements would give further improvements when a complex DNN architecture is utilized.
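To give a sense of how small such a network is, the following NumPy sketch builds a fully connected Q-network with two equally sized hidden layers and counts its parameters. The ReLU activation, the weight initialization and the function names are assumptions for illustration; only the layer layout (state inputs, two hidden layers, one output per action) comes from the description above.

```python
import numpy as np

def init_qnet(num_states, hidden, num_actions, seed=0):
    """Create (weight, bias) pairs for a states -> hidden -> hidden -> actions network."""
    rng = np.random.default_rng(seed)
    sizes = [num_states, hidden, hidden, num_actions]
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def q_values(params, state):
    """Forward pass: one Q-value per action for the given state vector."""
    x = np.asarray(state, dtype=float)
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)  # ReLU on hidden layers (assumed activation)
    return x

def num_parameters(params):
    """Total number of trainable weights and biases."""
    return sum(w.size + b.size for w, b in params)

# Largest configuration in the stated range: 3 state inputs, hidden size 20, 3 actions.
params = init_qnet(num_states=3, hidden=20, num_actions=3)
q = q_values(params, [0.1, 0.5, 0.2])
best_action = int(np.argmax(q))
print(num_parameters(params), q.shape, best_action)
```

Even at the upper end of the hidden size range, this configuration has only a few hundred parameters, which is what makes inference feasible on a low cost embedded CPU.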

FrugalLight Rewards
Rewards need to be carefully crafted, so that the DRL algorithm can train to convergence. Rewards need to be fed back to the DRL algorithm from the environment in which the DRL is applying control. So essentially, the rewards need to be computed from the approaches at the traffic intersection, by the computer vision algorithms. Listed next are some rewards, in increasing order of complexity of the vision algorithms.
• Queue density: This is the set of traffic densities of all incoming approaches. It can be calculated using background subtraction [6], which we find works even in poor lighting conditions, as the vehicles' head and tail lights create enough features.
• Stop density: This is the density of vehicles stopped (halted) at the intersection on all the approaches, waiting for their opportunity to be in motion. It can again be calculated using background subtraction [6], with a different learning rate than for queue density, as queue density considers both halted and moving vehicles. The optical flow method [33] also facilitates motion detection, thus identifying stalled vehicles.
• Max pressure: Max pressure can be approximated by the difference of queue densities at incoming and outgoing approaches. It requires data capture and computation at both incoming and outgoing approaches, thus involving camera and embedded computer deployment at subsequent intersections.
• Cross count: The exact count of vehicles crossing the intersection is called cross count. This requires tracking vehicles entering and exiting any approach. Thus YOLO (You Only Look Once) [36] or similar CNN based vehicle detection, and subsequent tracking of each individual vehicle, needs to be done. These have not been shown to work in poor lighting conditions [8], for example during evening peak hours. Also, there are intermittent poor vehicle detections, as discussed in § 3.3 (Figure 4).
We use the Stop density reward, which can be easily computed by background subtraction [6] at the edge device, for our evaluations in § 6.5. We also evaluate our DRL algorithms with different rewards in § 6.6.
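The density based rewards in this list reduce to simple arithmetic on per-approach densities. The sketch below shows one plausible way to turn them into scalar rewards; the sign conventions (negating the quantity so that less congestion gives a higher reward) and the use of the absolute difference for the pressure approximation are assumptions for illustration, not the exact formulas used in the evaluations.

```python
def queue_density_reward(queue_densities):
    """Negative total queue density over all incoming approaches."""
    return -sum(queue_densities)

def stop_density_reward(stop_densities):
    """Negative total density of halted vehicles over all incoming approaches."""
    return -sum(stop_densities)

def approx_pressure(incoming_densities, outgoing_densities):
    """Pressure approximated as incoming minus outgoing queue density."""
    return sum(incoming_densities) - sum(outgoing_densities)

def pressure_reward(incoming_densities, outgoing_densities):
    """Negative absolute pressure (assumed convention): balanced flow is rewarded."""
    return -abs(approx_pressure(incoming_densities, outgoing_densities))

# Three incoming approaches; outgoing densities would need extra cameras downstream.
incoming = [0.5, 0.25, 0.25]
outgoing = [0.25, 0.0, 0.25]
print(queue_density_reward(incoming), pressure_reward(incoming, outgoing))
```

Note that only the first two rewards can be computed from cameras at a single intersection, which is one practical reason to prefer them under a tight deployment budget.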

DRL compression using domain knowledge
We explored carefully crafted DRL states utilizing domain knowledge of traffic light control. The goal is to reduce the computational and memory overheads of the DRL algorithm, along with reducing dependencies on computer vision methods that might be infeasible for developing region traffic. We describe below our four domain-specific optimizations (illustrated in Figure 9 and Figure 10).
❶ Reduced spatial range of information: Developing region cameras placed at an intersection can see traffic up to a limited distance, depending on their angle of elevation. It is not frugally feasible to repeat cameras throughout the approach to measure traffic for the whole approach. Therefore our DRL state will be constrained to limited information within a certain distance from the intersection, and only for incoming traffic. The first column of Figure 10, termed Actual Traffic Layout, shows in color the limited camera vision, though there are vehicles in the regions that the camera cannot see. The next column in Figure 10, Transformed Layout, removes vehicles beyond the camera's visual range, which will not be part of the DRL state.
❷ Reduced type granularity of information: In our DRL states, instead of exact vehicle count requiring computation intensive CNN based YOLO [36], we use traffic density, based on less computation intensive background subtraction [6]. Density is also better than queue length (which indicates the distance at which the last vehicle is standing), as chaotic driving in developing regions sometimes creates long queues with haphazard gaps in between. Smaller vehicles like auto-rickshaws and motorbikes can trickle into those gaps, keeping the queue length the same but increasing traffic density. Density thus better captures the traffic state on the road, independent of heterogeneous vehicle sizes and their chaotic placement.
Fig. 10. DRL states from four signal phases
❸ Reduced spatial granularity of information: As drivers in developing regions do not follow lane markings, vehicles straddle across lanes. So to reduce DRL state complexity, we reduce the information granularity from per lane to per approach, averaging the traffic over the lanes. We further explore a DRL state of just two numbers for each phase, one representing the density for the approach with green signal and another for the total density of all other approaches getting red. We finally reduce the DRL state for a particular phase to only one value, where we take the ratio of the green approach density to the overall density of all approaches. These successive reductions in DRL states per phase are shown as Lane, Approach, Group and Relative Density in Figure 10.
❹ Exploiting symmetry: We finally remove the current phase information from the DRL state. Utilizing the symmetry of traffic intersections, we opt to rotate the DRL state to put the data of the current green phase in the first (or any constant) position. In Figure 10, instead of two columns representing two phases, there is thus a single phase with one approach green and others red. When the approaches differ in properties (like number of lanes, width, etc. as shown in Figure 11), we cannot perform a simple rotation to get a phase-free state. There are two options to make a heterogeneous intersection homogeneous:
i. Padding: We can pad empty lanes in the smaller approaches to make the intersection homogeneous. This would leave some lanes with zero traffic.
ii. Normalizing to unit lane: We can normalize the density of heterogeneous approaches, scaling down each approach's density to unit lane by a scale factor $s_i$, ∀ i in approaches, where $s_i$ is based on width, number of lanes or other appropriate parameters for approach i.
Fig. 11. Heterogeneous intersection, with both two-laned and three-laned approaches
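The successive state reductions (Lane, Approach, Group, Relative Density) and the symmetry-based rotation can be sketched as follows. This is an illustrative sketch, not the released FrugalLight code: the function names and the input layout (one list of lane densities per approach) are assumptions. Averaging over the lanes of each approach also brings approaches with different lane counts to a comparable per-lane scale.

```python
def lane_to_approach(lane_densities):
    """Average the per-lane densities of each approach (Lane -> Approach)."""
    return [sum(lanes) / len(lanes) for lanes in lane_densities]


def rotate_to_green(approach_densities, green_index):
    """Rotate the state so that the green approach is always at position 0.

    This removes the need to encode the current phase in the state.
    """
    return approach_densities[green_index:] + approach_densities[:green_index]


def group_state(rotated):
    """Two numbers: green approach density, total density of red approaches."""
    return [rotated[0], sum(rotated[1:])]


def relative_state(group):
    """One number: green density as a fraction of the overall density."""
    green, red = group
    total = green + red
    return green / total if total > 0 else 0.0


# Heterogeneous intersection: two-laned and three-laned approaches.
lanes = [[0.6, 0.4], [0.2, 0.2, 0.5], [0.0, 0.2], [0.3, 0.3, 0.3]]
approach = lane_to_approach(lanes)
rotated = rotate_to_green(approach, green_index=2)
group = group_state(rotated)
print(group, relative_state(group))
```

Each step discards information, so the final one-number state only retains how the green approach compares with the rest of the intersection.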

FRUGALLIGHT EVALUATION
Given several design choices for efficient DRL algorithms with constrained inputs, we next evaluate whether any of them can match the performance of computationally intensive state-of-the-art methods.

Baselines:
We compare our FrugalLight algorithms against state-of-the-art RL models: ❶ Presslight (PL) [45], for decentralized multi-intersection processing, and ❷ Colight (CL) [46], for centralized multi-intersection processing. We also utilize two NonRL algorithms, popular in recent research as standard baselines: ❶ Max Pressure (MP) [42], where the phase shift occurs based on the difference of vehicles on the incoming and outgoing lanes, and ❷ Self-Organizing Traffic Light (SOTL) [12], where after a minimum phase duration, the signal is switched based on the traffic level in green and red approaches.

Benchmarks:
We use multiple real road datasets in our experiments, as described in Table 3 in § 3.4. The New York datasets are publicly available 6, already processed, and used for experiments in prior work [45,46]. The state-of-the-art works need richer traffic information which cannot be gathered with computer vision methods, so to be fair to them we too use their advertised datasets [45,46] in our experiments, and validate and present the performance of our input-constrained efficient control for global scenarios as well. We also use self-curated developing region datasets 7, which are 1 to 6 hours long and extracted from the longer traffic flows available in the original density data 8 for weekdays (8AM - 2PM).

Simulator:
We use the CityFlow traffic simulator [52] in our experiments. It takes the road network structure, traffic phase information and incoming traffic details through files. We create these files based on the real road datasets described. CityFlow allows us to set the desired phase using API calls. For every phase switch, a 5-second combined yellow and all-red interval exists to clear the intersection. CityFlow also provides the traffic information, i.e. what happened on applying the phase switch/hold advised by a particular traffic light control algorithm. This output list of vehicles, along with their locations, is processed to compute our throughput, travel time and total time metrics, to compare across the traffic light control algorithms.
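As an illustration of how the three metrics could be derived from per-vehicle records, the sketch below assumes that each vehicle is summarized by its entry time and its exit time (None if it never left the network), and that uncleared vehicles are charged time up to the end of the simulation. The record layout, the averaging and the function name are assumptions made for this sketch, not the simulator's output format.

```python
def summarize(vehicles, sim_end):
    """Return (throughput, average travel time, average total time).

    vehicles: list of (enter_time, exit_time) tuples; exit_time is None for
    vehicles still in the network when the simulation ends.
    """
    cleared = [(t_in, t_out) for t_in, t_out in vehicles if t_out is not None]
    throughput = len(cleared)

    # Travel time: only vehicles that cleared the network.
    travel = (sum(t_out - t_in for t_in, t_out in cleared) / throughput
              if cleared else 0.0)

    # Total time: all vehicles, with stuck vehicles counted up to sim_end.
    total = (sum((t_out if t_out is not None else sim_end) - t_in
                 for t_in, t_out in vehicles) / len(vehicles)
             if vehicles else 0.0)
    return throughput, travel, total


print(summarize([(0, 50), (10, 70), (20, None)], sim_end=100))
```

Reporting total time alongside travel time matters because a controller could otherwise look good by clearing only the vehicles with short trips.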

DRL Efficiency:
Given that our primary goal is to have more efficient DRL models for resource constrained settings in developing countries, we first quantify how efficient our optimizations are, compared to the baselines. In Table 5, our solution FrugalLight is denoted as FL, with Lane (L), Approach (A), Group (G) and Relative (R) indicating increasing optimizations. The DNN parameters are a range of values, as they depend on the control choice of Switch to Next with binary output vs. Switch to Any with multinomial outputs. The DRL state size and DNN parameters indicate the significantly lower FrugalLight overhead. Such a small DRL model can be deployed on a moderate cost embedded system for roadside deployment, as currently being piloted with our industry partner. Other properties of the underlying DNN are given in Table 4, and properties of NonRL baselines are given in Table 6.

FL performance on existing open-source datasets
The critical question to evaluate is whether FrugalLight's efficiency comes at a trade-off in throughput or travel time metric values. We present these results next, evaluated on an Nvidia DGX Workstation (with 4X Tesla V100 GPUs). As shown in Figure 12, the FrugalLight models converge faster both in terms of number of episodes and time taken per episode. The FL models converge fairly well by 30 episodes, whereas it takes 60+ episodes for the PL model to converge. Also, FL models take 71% to 74% of the time per episode taken by PL (i.e. saving 26% to 29% compute resources per episode).
As the DRL models converge well within 200 episodes, we perform training for 250 simulation episodes, and the performance is measured and averaged over the next 50 (i.e. 251-300) unseen episodes, to vouch for stable performance. Among the two DRL baselines, PL with de-centralized control for multiple intersections works poorly for the larger 16x3 road network (seemingly due to the MaxPressure reward being incompatible with 2D grids), while CL with centralized control does well. Both CL and PL perform well for the smaller 16x1 network. The NonRL baselines (MP and SOTL) show lower performance than the DRL baselines. The two control choices do not show significant performance differences. Switch to Any, with more freedom to choose outputs and therefore with potential to perform better, is less usable in a developing country, where Switch to Next is mandated at intersections to have fixed precedence among waiting drivers and to avoid adding to the unruliness already on the road. Thus the lack of performance difference between the two control choices is encouraging.
More encouraging, however, is FrugalLight's performance. Our algorithms FL-L, FL-A and FL-G do as well as CL for all road settings. The most optimized version FL-R degrades for the 16x3 network (and slightly for 16x1 too), possibly due to its inability to capture absolute traffic density. But even the second most optimized version FL-G (showing a slight under-performance for Any 16x3), with only 2-sized DRL states and 184 DNN parameters, can match the performance of CL with up to 12480 sized DRL states and 6084 DNN parameters! FL-G is also de-centralized, not requiring network communication across multiple intersections to have centralized control as in CL. The 2-sized DRL states and the stop density reward can all be computed with simple background subtraction [6] based computer vision methods. This is a tremendous result for developing regions: a de-centralized DRL algorithm with constrained computer vision inputs and very efficient model parameters can match the performance of a centralized, more computation intensive, much larger state-of-the-art DRL model.

Different Phase Schemes (Y, X, XY):
We next analyze the FrugalLight performance over different phase schemes, such as the Double-approach-green (X) pattern and a mix of both (XY), as depicted in Figure 14 (Left). Colight (CL) is centralized and, alongside local information, it seeks neighbouring intersections' information to decide the policy for any given intersection. This creates network dependency and data latency, and complicates the model by making the state space significantly larger than in other models (refer Table 5). While gathering the real data for a single intersection, we also observed issues related to power line failures, camera faults, and broken communication. Any fault requires a manual repair, which is a costly and time consuming effort. Hence, a network-dependent solution (Colight) is less feasible for deployment in developing countries and single intersections. As shown in Figure 14 (Right), we see that FL performs better than baseline PL for all phase schemes.

Scaled Road Lengths:
Real roads have varying lengths, causing different capacities of waiting traffic at the approaches. The longer the road, the more traffic it can hold, which may require one lengthy green phase or multiple green phases to pass through the intersection. To see the scaling of FrugalLight on various road lengths, we experiment with different road lengths in the simulator (for the 16x1 network) in Figure 15. Compared to state-of-the-art Presslight (PL), FL shows improvements in all metrics, which grow further as we scale up the road lengths. Thus, FrugalLight can potentially scale to any road dimensions with similar benefits in average metric values.

FL performance on our New Delhi dataset
We finally analyzed FrugalLight's performance on our collected data for different datasets of increasing duration. As the CityFlow simulator has a proprietary format for accepting traffic information via input json files, we utilize the density-to-simulator 9 conversion script to convert the real datasets for use with the simulator. In case of real deployments, the density from our Background Subtraction algorithm can be directly fed to FrugalLight models. Along with StopDensity, we did experiments over other rewards (MaxPressure, CrossCount and QueueDensity) to gauge their suitability with our method. The weight for CrossCount is 1, and -0.25 for the others. The experiments are shown in Table 7.
Table 7. Performance on different duration 1x1 developing region data (total time @ throughput)
We next explore the inclusion of state-of-the-art techniques and focused optimizations to further enhance FrugalLight for training and deployment scenarios.

Student-teacher knowledge distillation, with FrugalLight's domain knowledge
Knowledge distillation is the standard method of compressing large machine learning models (teacher) into smaller, more efficient models (student) [18,27]. We analyze two methods of student-teacher learning in this paper: Blind and Explored. In Blind learning (Figure 16 without dotted line), the student model learns the environment by blindly following the teacher's steps for every state, hence the teacher controls the learning. In Explored learning (Figure 16 with dotted line), the student model learns the environment with self exploitation, asking the teacher only during the exploration phase, hence the student controls the learning. For both scenarios, the teacher provides the Q-values for every experience tuple and local rewards are ignored, so the student tries to fit to the teacher's understanding of the environment. We use the following loss function:

$$L \leftarrow \frac{1}{N} \sum_{i=1}^{N} \left(Q^{S}_{i} - Q^{T}_{i}\right)^{2}$$

where $Q^{S}_{i}$ denotes the Q-values predicted by the student network and $Q^{T}_{i}$ denotes the target Q-values given by the teacher/peer network, for N training samples.
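The distillation loss is a mean squared error between student and teacher Q-values. A minimal sketch follows, assuming each training sample carries one Q-value per action and that the squared differences are summed over actions before averaging over the N samples; the per-action summation and the function name are assumptions of this sketch.

```python
def distillation_loss(student_q, teacher_q):
    """Mean over samples of the squared student-teacher Q-value difference.

    student_q, teacher_q: lists of N samples, each a list of per-action
    Q-values of equal length.
    """
    n = len(student_q)
    total = 0.0
    for q_s, q_t in zip(student_q, teacher_q):
        total += sum((s - t) ** 2 for s, t in zip(q_s, q_t))
    return total / n


# Two samples with two actions each (keep vs. switch).
student = [[1.0, 2.0], [0.0, 1.0]]
teacher = [[1.0, 1.0], [1.0, 1.0]]
print(distillation_loss(student, teacher))
```

Because the targets come from the teacher rather than from environment rewards, the student is fitted to the teacher's value estimates directly.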
We hence consider a combination of domain knowledge based DRL state compression along with knowledge distillation, where we use distillation between models of similar dimensions (peers). These experiments are performed on a 300x300 length 16x1 NY road network, for the switch to the next phase signal policy. We train for 250 epochs, and the average metrics (total time @ throughput) for the next 50 epochs are reported in Table 8 and Figure 17.
The values in the "Teacher" row at the top depict the performance of the single Teacher model, selected based on the best metric values from the epochs towards the end. The "Self Learn" column on the left gives the average metric values without knowledge distillation. Self-learning converges slower than distillation, as expected (Figure 17). Explored learning gives better results than blind learning (the right side of Table 8 has better values than the left), and also gives faster convergence and better stability (Figure 17) than blind learning.
As seen in Table 8, Explored Learning gives better average metric values than Self Learn when models of the same size and architecture (called peers) are used for teaching. This improvement in average metric through peer learning is evident if we compare the bold values in the same rows.
Table 8. Knowledge distillation (total time @ throughput). Peers (same row in bold) teach better than PL (in italics).

FrugalLight's Transferability and Adaptability
Figure 18 shows the transferability (how well a DRL model trained on one dataset performs on a new unseen dataset) and adaptability (how quickly a DRL model fits to the new dataset) of FL vs. baseline PL. The first leg in Figure 18 shows the regular training on the 16x1 NY dataset, depicting improved training convergence for our FL compared to PL. For the next two legs, we utilize two different 1 hour datasets from the 16x1 NY network. We train each model for 300 epochs using the default dataset, then switch the traffic pattern to the second dataset and allow training for the next 300 epochs, and finally do the same for the third dataset. FL is significantly more transferable and adaptable than PL over different unseen traffic patterns. This can be explained by the large size of the PL model, which tends to overfit to a given training dataset and generalizes poorly to new data.
We further take an ensemble of 50 models from epochs 251-300 for each training experiment, and use them to train 50 peer models using the Explored Learn method. We then evaluate these peer trained models on an unseen dataset and present the average performance of these models in the table in Figure 19. FL-G is the most generalizable model, both self-learnt and peer-teacher guided, for transferring an ensemble of models to an unseen dataset.

Enhanced Adaptability using Gradient based Meta Learning (MAML)
Meta Learning enables a Machine Learning system to learn fast. Model-Agnostic Meta-Learning (MAML) is a general optimization algorithm suitable for models employing gradient descent. Given multiple tasks, the parameters of a model are trained such that a few iterations of gradient descent with little training data from a new task produce good performance on that task.
Fig. 18. Transferability/Adaptability of the methods.
Full MAML differentiates through the inner gradient steps, which requires second-order derivatives, so we consider the first-order gradient based MAML (FOMAML) in our experiments. The FOMAML algorithm provides a good improvement without much computational hindrance, making the overall training/optimization process computationally efficient. For the NY dataset of 16x1 intersections, we randomly select a group of 5 nodes and train the networks for these 5 nodes as usual. We train another network using the first-order MAML approach with data samples from these 5 nodes. This pre-trained model is then utilized to train the remaining 11 nodes. For the purpose of metrics calculation, we train all 16 nodes with the pre-trained MAML model. The results are depicted in Figure 20 (for 100 rounds, averaged over 5 runs).
We take a subset of the total intersection nodes, the data for which acts as meta-data to train the meta-network. This meta-network acts as a pre-trained model for other nodes, enabling faster convergence. We also combine MAML with our Explored Learning technique and train the student model from the pre-trained MAML model and a non-MAML teacher model. We observe a more stable training with the added benefits of the two.
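The first-order MAML update can be illustrated on a toy problem. The sketch below is not the FrugalLight training code: a scalar linear model y = w * x stands in for the DRL network, and each "task" (standing in for one intersection node) is a pair of small support and query datasets. All names, learning rates and data are hypothetical. The first-order approximation shows up in the outer step, which uses the gradient evaluated at the adapted weight and ignores how the adaptation itself depends on w.

```python
def grad_mse(w, data):
    """Gradient of the mean squared error of y = w * x with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


def fomaml(tasks, w=0.0, inner_lr=0.1, outer_lr=0.05, rounds=200):
    """First-order MAML over tasks given as (support, query) datasets."""
    for _ in range(rounds):
        meta_grad = 0.0
        for support, query in tasks:
            # Inner step: adapt the shared weight to this task.
            w_adapted = w - inner_lr * grad_mse(w, support)
            # Outer gradient: evaluated at the adapted weight, without
            # differentiating through the inner step (first-order).
            meta_grad += grad_mse(w_adapted, query)
        w -= outer_lr * meta_grad / len(tasks)
    return w


def make_task(slope):
    """Toy task: noiseless samples of y = slope * x."""
    support = [(x, slope * x) for x in (0.5, 1.0)]
    query = [(x, slope * x) for x in (0.75, 1.25)]
    return support, query


tasks = [make_task(s) for s in (1.0, 2.0, 3.0)]
w_meta = fomaml(tasks)
print(w_meta)  # settles near the middle of the task slopes
```

The meta-trained weight is a starting point from which each task can be reached in a few gradient steps, which mirrors the role of the pre-trained model for the remaining nodes.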

Doing Away with Runtime DRL: Lookup Table based Intersection Control (Goodness EcoLight)
We also seek to do away with running the DRL at runtime at the deployment site. The first reason is efficiency: on low cost embedded systems, compute power is limited. The inputs for the control algorithms need to be computed on the embedded devices anyway, using computer vision algorithms on the real time video data from all approaches. Using these inputs, if the control algorithm can be made more efficient than running a neural network for DRL, it becomes more practical to meet the low computational budget. The second reason to do away with runtime DRL is the lack of confidence in the DRL black box. Based on anecdotal evidence through discussions with our deployment partners, adaptive intersection control that can be visualized and verified by human experts before deployment is much preferred over algorithms which are free to choose actions at runtime without any human supervision or comprehension, as a runtime DRL would do.
We therefore seek to use static Lookup Tables (LUT) at deployment, where each cell in the table represents a state in our DRL. The value contained in that cell represents a boolean action: stay in the current phase vs. switch to the next phase, referred to as keep-change actions henceforth. The actions are learnt using offline DRL training. This training can be compute heavy and high latency, as it is run on powerful GPU servers before deployment for real time intersection control. During training, computer vision based processed video datasets are collected from the road and fed into the traffic simulator to create all possible DRL states (cells in the LUT). Actions corresponding to each state are then learnt by training the DRL algorithm. The lighter the color, the more a DRL state is seen. These three images also describe the LUT structure, where the two axes represent quantized values of $d_1$ and $d_2$ for the 2-dimensional state DRL (FL-G). Instead of "how many times a DRL state is seen" presented in these images, the LUT contains a boolean action value in each cell, learnt by DRL training. After being verified by developing country traffic control experts for sanity and safety checks, the LUT is eventually deployed on road. At runtime, the current state is computed using computer vision methods on incoming video, and the action corresponding to that state in the stored LUT is taken by the traffic signal controller.
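The runtime side of such a lookup table can be sketched in a few lines. This sketch assumes that both densities are normalized to the range [0, 1] and quantized uniformly; the bin count, the toy table contents and the function names are hypothetical and are not taken from the deployed system.

```python
def quantize(density, levels):
    """Map a density in [0, 1] to a bin index in 0..levels-1."""
    index = int(density * levels)
    return min(max(index, 0), levels - 1)


def lut_action(lut, green_density, red_density):
    """Return the stored boolean action: True = switch, False = keep."""
    levels = len(lut)
    return lut[quantize(green_density, levels)][quantize(red_density, levels)]


# Toy 4x4 table standing in for one learnt offline: switch only when the
# red bin is strictly above the green bin.
LEVELS = 4
toy_lut = [[red > green for red in range(LEVELS)] for green in range(LEVELS)]

print(lut_action(toy_lut, green_density=0.9, red_density=0.1))  # keep
print(lut_action(toy_lut, green_density=0.1, red_density=0.9))  # switch
```

At runtime the controller performs only two multiplications and one table access per decision, and the whole table can be printed and inspected by a traffic expert before deployment.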
While storing DRL decisions for different states in a LUT is efficient and verifiable, we need to ensure that the learnt decisions are good for subsequent use at runtime. It is important to choose good DRL models to populate the static LUT, as unlike running DRL at runtime, the LUT will not be able to dynamically update these decisions.
As measures of DRL model goodness, we define two metrics:
(a) FairShare: We hypothesize that a good RL model tries to achieve a FairShare of traffic densities among approaches, i.e. to fit the traffic among the approaches at the intersection such that each approach maintains equal or similar density of traffic. To quantify this FairShare property of a given DRL model, we project all instances of observed states (factored by the distance) onto the equal density segments of the LUT (corresponding to the diagonal starting at 0,0) in Figure 21. We sum this vector of projections to get a single scalar, which will be high for models with most states at equal density (like Epoch 90-99 in Figure 21), and low otherwise. This scalar quantifies how balanced traffic is among the approaches for a particular DRL model.
(b) DecisionConsistency: If a model predicts to hold/keep the signal for a state, we hypothesize that a good or stable model should continue to predict the same for all states having higher traffic in the green approach (or lower traffic in the red approaches). We name this model property of sticking to the same decision under similar traffic scenarios DecisionConsistency. To quantify DecisionConsistency, for each green density level ($d_1$) we take the ratio of two numbers: the large range of red density ($d_2$) over which the keep decision is maintained vs. the range that follows with the opposite decision. The sum of all such ratios gives a scalar which will be larger for models with better DecisionConsistency.
In addition to hypothesizing what properties good DRL models might have, and defining scalar metrics to quantify those goodness properties, we also need mechanisms to use these goodness metrics. We do this in the following two ways:
❶ DRL training using model goodness metrics: We use the FairShare and DecisionConsistency scalars during the DRL training process to identify and favour better RL models. We maintain a threshold for these scalars as training progresses. As presented in Figure 22, at each epoch we hold a model if its goodness metric is below the threshold, lower the threshold by a factor, and start the training for a fresh model in that epoch. We approve the best model so far (new or on hold) if its goodness metric exceeds the threshold, or after a fixed number (=5) of retries in that epoch, and move on with the metric value of this model as the new threshold.
❷ DRL selection using model goodness metrics: Figure 23 shows the correlation between the Total Time performance metric and the DRL model's goodness metric values. We discard models with goodness metric values lower than the average of all the models, to remove outliers (see Perspective 1 of Figure 23). In order to select the good models among the remaining ones, we pick the best model (again based on the goodness metric values) among a set of 20 models, and restart the process from the model next to the selected one (see Perspective 2 of Figure 23). This final set of high performing models can be effectively used to generate the LUT to be deployed at the intersection.
We need to evaluate this LUT based signal control against the FL-G that we designed in § 5, and also the state-of-the-art DRL methods Presslight [45] and CoLight [46]. Static LUTs lose performance due to quantization of the traffic density values, while runtime DRL can use continuous values of traffic density. But the quantization is unavoidable, as the table needs to be of finite dimensions. Whether our training and training+selection with goodness metrics can overcome the quantization related performance loss needs to be quantified.
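The goodness-based selection step described above (drop below-average models, then repeatedly pick the best model in a window and restart just after it) can be sketched as follows. The function name and the representation of models as a list of goodness scores in training order are assumptions of this sketch; the default window of 20 follows the text.

```python
def select_models(goodness, window=20):
    """Return indices of selected models, given goodness scores in order."""
    # Step 1: discard models whose goodness is below the overall average.
    mean = sum(goodness) / len(goodness)
    kept = [i for i, g in enumerate(goodness) if g >= mean]

    # Step 2: pick the best model within each window of surviving models,
    # then restart from the model right after the selected one.
    selected = []
    start = 0
    while start < len(kept):
        chunk = kept[start:start + window]
        best_pos = max(range(len(chunk)), key=lambda p: goodness[chunk[p]])
        selected.append(chunk[best_pos])
        start += best_pos + 1
    return selected


scores = [1, 5, 2, 9, 3, 8, 7, 1]
print(select_models(scores, window=3))  # indices of the selected models
```

Restarting right after the selected model, rather than after the whole window, lets strong models that sit close together all be retained.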
Table 9 shows the average case performance metric values ❶ nOut (number of vehicles cleared by the intersection), ❷ Travel (time spent by cleared vehicles) and ❸ Total (time spent by all vehicles). The T in model names denotes Goodness based Training only experiments, whereas TS includes Goodness based Selection as well. We continue the training for 200 epochs, allowing all methods to converge, and then average the next 50 epochs for performance metrics calculation for T, and the selected few out of these for TS. As can be seen from the table, the performance loss compared to FL-G due to quantization is gracefully recovered by both our goodness metrics. DecisionConsistency performs significantly better than FairShare for all datasets. We further show the values of worst case or fairness metrics for the 16x3 benchmark dataset in Table 10. Our fairness metrics are: (a) WrstTime (maximum time spent in the network by any stuck vehicle), (b) WrstWait (maximum wait time at any intersection by any vehicle), (c) MaxWait (maximum of average wait times at any intersection) and (d) StuckX (vehicles stuck in the network at X% time from simulation end). Fairness loss due to quantization is not only gracefully recovered by our goodness metrics, but we significantly outperform all baselines as well.
Using a finite sized LUT with (a) quantized traffic density values as rows and columns, and (b) cells containing binary decisions learnt using DRL model training and model selection based on goodness metrics, gives us performance and fairness comparable to the state-of-the-art DRL algorithms. This is extremely encouraging in terms of practical deployment in developing countries.

Doing Away With Look-up Tables: Threshold based Intersection Control (Threshold EcoLight)
Based on anecdotal discussions with intersection control companies, while most intersections in developing regions will be able to support LUTs, some intersections might be budget constrained to such an extent that the controller's RAM will not be enough to even store LUTs. In this section, we therefore consider how to design such a stateless controller, with better performance and fairness metrics compared to other widely deployed stateless controllers. We start by examining the FL-R tried in § 5, and gradually build performant and fair stateless control.
1-dimensional state RL (FL-R) did poorly on the Throughput and TotalTime metrics in Figure 13, especially for the 16x3 road network. To understand what the RL learns in the case of the 1-dimensional state (in FL-R), we checked the model behaviour for the whole range of this state variable $d_3 = d_1/(d_1 + d_2)$ from 0.0 to 1.0. We calculate the expected value of signal change for all 16 intersections (of the 16x1 NY road network) for 50 consecutive rounds after training for 500 rounds. Figure 24 plots the expected signal change along the y-axis, with relative density along the x-axis. The signal change expectation is high when relative density is low (top left) and vice-versa (the red line is given for reference, showing exact negative correlation between signal change expectation and relative density). The blue curve shows a near-linear response following the red line, but is still non-linear. Thus the 1-dimensional state FL-R with ratio $d_1/(d_1 + d_2)$ is not enough to capture the necessary non-linearity and overall traffic concentration: empty vs. moderate vs. saturation. It only captures relative density among approaches, while the absolute values retained in the 2-dimensional state of FL-G are clearly important.
We explore the options of both 1-dimensional relative density (FL-R) $d_1/(d_1 + d_2)$ and 2-dimensional absolute densities (FL-G) $(d_1, d_2)$ in the simple algorithm next. The algorithm does not use any LUT to store the signal switching decisions learnt by RL for all possible states. It only uses a few empirically learned thresholds. This is to support embedded hardware that cannot use LUTs due to RAM constraints and would need the control algorithm to be completely stateless, possibly using only a few thresholding parameters.
The intuition behind the algorithm is ❶ to take the CycleTime (i.e. the cumulative duration of all phases) and divide it among phases in proportion to their relative densities, and ❷ to increase CycleTime with increasing absolute densities. At each decision making point, the agent allows the green signal to continue as long as the relative density for that approach has not fallen below a threshold. Below the threshold, the signal can be switched. When CycleTime is defined (we call this variant Timed), the agent uses it in proportion to the relative density (Timed (1dim)), optionally increasing the given CycleTime in response to absolute densities (Timed (2dim)). When CycleTime is undefined (we call this variant Random), it switches randomly, but still in proportion to the relative density. The various hyperparameters are listed in Table 11.
We compare the performance of our stateless algorithms against the baselines below. These baselines also do not use any state, but work with a few parameters as listed in Table 12. State-of-the-art research based RL methods like Presslight and CoLight are still in the literature and not adopted in the real world, so these simpler baselines are the widely deployed intersection control algorithms across the world. Developing countries typically still use Fixed Timing signals.
❶ Fixed Timing: The signal switches in cyclic order to the next approach after fixed time intervals.
❷ Max Pressure: Pressure is calculated as the difference of vehicles on the incoming and outgoing lanes for the possible movements in each phase [42]. The signal is switched to the phase with maximum pressure. If the current phase pressure is not the maximum, we switch to the next phase.
❸ Self-Organizing Traffic Light (SOTL): This is a vehicle actuated mechanism [12]. There is a minimum phase duration. Once the minimum phase duration is over, the switch signal is generated if the traffic in the green approach is less than a threshold and the traffic in any other approach is more than another threshold.
Table 13 shows the average case metric values (a) nOut (number of vehicles cleared by the intersection), (b) Travel (time spent by cleared vehicles) and (c) Total (time spent by all vehicles). Our algorithms Random, Timed (1dim) and Timed (2dim) clear many more vehicles at lower Travel and Total times than the baselines, for all benchmark datasets. The Travel times for the 16x1 network are higher (italicized in Table 13) for our algorithms, though other metrics improved. This is because it is a linear network of 16 intersections and the traffic pattern is such that a good part of the traffic enters around one end and exits around the other (and vice-versa), making the vehicles cross many intersections in a sequence. As the increased nOut shows, our algorithms make more vehicles exit the network. The extra vehicles which exit are mostly the ones with larger travel times, thus pushing the average travel time for all cleared vehicles higher. Similar behaviour is observed for the baselines as well, where the SOTL Travel time (with more nOut) is higher than that of other baselines (with less nOut).
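One possible reading of the Timed variants of the stateless controller is sketched below. It is an interpretation under stated assumptions, not the authors' algorithm: the green phase is given a time budget equal to its relative density times the CycleTime, the signal may also switch as soon as the relative density falls below the threshold, and the 2dim variant stretches the CycleTime linearly with total absolute density through a hypothetical `gain` parameter. The actual hyperparameters are those of Table 11 and are not reproduced here.

```python
def relative_density(green, others):
    """Green approach density as a fraction of the overall density."""
    total = green + sum(others)
    return green / total if total > 0 else 0.0


def scaled_cycle_time(base_cycle, green, others, gain=0.0):
    """Stretch the cycle with absolute density; gain=0 gives Timed (1dim)."""
    return base_cycle * (1.0 + gain * (green + sum(others)))


def should_switch(green, others, elapsed, base_cycle, threshold, gain=0.0):
    """Decide whether to leave the current green phase.

    Uses only the current densities and the elapsed green time, so no
    table of learnt decisions is needed.
    """
    rel = relative_density(green, others)
    if rel < threshold:
        return True  # green approach has drained relative to the others
    budget = rel * scaled_cycle_time(base_cycle, green, others, gain)
    return elapsed >= budget  # green share of the cycle is used up


print(should_switch(0.6, [0.2, 0.2], elapsed=30, base_cycle=120, threshold=0.2))
```

With a positive `gain`, heavier overall traffic lengthens every phase, which reduces the share of time lost to the yellow and all-red intervals between phases.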
Based on these results, in situations where running RL based control or maintaining LUTs is not feasible due to RAM constraints, our stateless algorithms can be deployed, vastly improving both performance and fairness metrics compared to the currently deployed intersection control baselines.
Given the hardware constraints, we need to make sure that the traffic density input is available to our control algorithms at an acceptable latency, with limited computation and no communication to a back-end server. As efficient computer vision candidates, we use background subtraction and optical flow techniques, as discussed in § 3.2. Background subtraction based density estimates comprise both standing and moving traffic, whereas the control algorithms need to discard the density contributed by the moving vehicles. So we additionally use an optical flow algorithm to detect moving pixels between frames, and compute standing traffic density from the stationary parts of the frames.
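The idea of removing moving pixels from a background-subtraction mask can be shown on tiny grayscale frames. This sketch deliberately swaps in plain frame differencing as a stand-in for a real optical flow algorithm, and takes the foreground mask as given rather than computing it; the threshold value and the function name are hypothetical.

```python
def standing_density(foreground, prev_frame, frame, motion_thresh=10):
    """Fraction of pixels that are foreground (vehicle) but not moving.

    foreground: 2D list of 0/1 from background subtraction (assumed given).
    prev_frame, frame: 2D lists of grayscale intensities.
    Frame differencing is used here in place of optical flow.
    """
    rows, cols = len(frame), len(frame[0])
    stopped = 0
    for r in range(rows):
        for c in range(cols):
            moving = abs(frame[r][c] - prev_frame[r][c]) > motion_thresh
            if foreground[r][c] and not moving:
                stopped += 1
    return stopped / (rows * cols)


fg = [[1, 1], [0, 1]]
prev = [[100, 100], [50, 200]]
cur = [[100, 150], [50, 205]]
print(standing_density(fg, prev, cur))
```

In the example, one of the three foreground pixels changes strongly between frames and is excluded, so only the two stationary vehicle pixels count towards the density.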

CONCLUSION AND FUTURE WORK
This paper shows the feasibility of deployable intelligent traffic light control methods for developing regions, using efficient and optimized computations on low-cost edge devices. Our shared dataset is peculiar in terms of its traffic representation properties, despite the various functional challenges. Our proposed traffic control method FrugalLight, which supports using the simplified traffic data, is evaluated on many hours of real world data, both the existing open-source data from New York, USA 11 and the now open-source data from New Delhi, India 12. Though our problem statement comes from a developing country, our data and models are useful everywhere, empowered by their efficiency and simplicity. We do equally well in both orderly and chaotic situations. FrugalLight also demonstrates that control can be made computationally efficient, resulting in a smaller carbon footprint, without losing utility in terms of metric optimization. We will continue to explore how such deployable systems will actually benefit sustainability goals such as air pollution reduction.

Fig. 1. Google Maps location of the intersection and installed cameras in New Delhi. Approach 1 has cameras 1 and 2, approach 2 has cameras 3 and 4, approach 3 has cameras 5 and 6.

Fig. 2. Traffic data availability for the six cameras from Sep-Dec 2020, each colour denoting the data coming from a separate camera.

Fig. 4. Good YOLO labeling (Left), bad YOLO labeling (Middle) and good traffic density (queue density and dynamic density) (Right). Green/red vertical lines in the traffic density plot indicate the event of a green/red signal.

Fig. 5. Histogram of Queue and Stop densities for the 6 cameras for the 40 days, for a total of 2,160,040 samples.

Fig. 12. Training convergence over RL models. For time taken per episode, PL takes N seconds and the FL-L variant takes 0.71N seconds.

Fig. 13.

Fig. 14. (Left) Allowed phases at intersections. (Right) Performance over different phase schemes; numbers at the bottom of bars denote throughput, darker bars denote travel time and lighter bars (at the top) denote total time.

Fig. 15. Performance over scaled road lengths; numbers at the bottom of bars denote throughput, darker bars denote travel time and lighter bars (at the top) denote total time.

Fig. 16. Blind and Explored learning methods.

Fig. 17. Convergence of teaching strategies. Dotted lines denote throughput, and solid lines denote total time.

Fig. 21. DRL training and LUT structure.

The first graph in Figure 21 shows how the metrics Total time and Travel time improve over many epochs of offline DRL training. The other three images show how many times different DRL states are seen by the DRL training algorithm as training progresses. The lighter the color, the more a DRL state is seen. These three images also

Figure 25 on the left shows a high traffic density frame, from one approach of a developing region intersection we are working at. The graph on the right shows for this location: (a) background subtraction based density (Queue Density, blue curve) and (b) optical flow based density (Dynamic Density, orange curve), over a span of over 15 minutes. Queue density starts to rise when the signal turns red (indicated by vertical red lines), and starts to fall when the signal turns green (indicated by vertical green lines). Dynamic density is zero when the red signal is on (between red and green vertical lines) and rises when the signal turns green and vehicles start moving. The difference between these two curves gives the density of standing vehicles, the input required by our control algorithms.

Table 1. Train and test data for YOLO V2 and Tiny-YOLO

Table 3. Open source real traffic datasets.

Table 5. Model properties for a 4-approach x 3-lane intersection

Table 6. Properties for Baseline NonRL Algorithms

Table 9. Performance of Goodness EcoLight for average case metrics

Table 10. Performance of Goodness EcoLight for worst case (fairness) metrics for 16x3

Table 11. Algorithm Hyper Parameters

Table 13. Performance of EcoLight Thresholding Algorithms for average case metrics