TODOS: Thermal sensOr Data-driven Occupancy Estimation System for Smart Buildings

Occupancy sensing and estimation in large commercial buildings has become a significant problem, with applications ranging from occupancy-based HVAC control to space planning and security. Thermal sensing is a promising technology for this problem: it is easy to deploy in practice and provides an actual occupancy count for a particular room without raising data and privacy concerns. While initial strides have been made with thermal arrays, many problems remain unsolved, including limited accuracy, overlapping sensing areas that lead to under/over-counting, and the need for training data from each different zone. In this paper, we introduce TODOS, a novel system for estimating occupancy in intelligent buildings. TODOS uses a low-cost, low-power thermal sensor array along with a passive infrared sensor. We introduce a novel data processing pipeline that automatically extracts features from the thermal images using an artificial neural network. Through an extensive experimental evaluation, we show that TODOS provides occupancy detection accuracy of 98% to 100% under different scenarios. In addition, it solves the issue of occupancy over/under-counting caused by overlapping sensing areas when using multiple thermal sensors in large rooms. This is done by treating the entire area as a single input thermal image instead of partitioning the area into multiple thermal images that are processed individually. Furthermore, TODOS introduces a data augmentation technique that allows the generation of training data for rooms of different sizes and shapes, without requiring specific training data from each room. Using these data, TODOS can train specifically designed neural networks optimized for any room size and shape, and achieve almost the same level of occupancy detection accuracy as in rooms where experimental labeled training data is available, making it a viable solution that generalizes to the different rooms in large buildings.


INTRODUCTION
We spend 87% of our time inside buildings [13,14,22], making buildings an important part of our lives. In large commercial buildings in particular, it is important to know at any time the current and predicted occupancy matrix of all the zones/rooms in the building. Having detailed information about the number of people in each zone, both in real time and in the near future, allows the HVAC systems to be controlled much more efficiently [3,5,15], conditioning zones for temperature only when occupied and adjusting ventilation rates based on the actual number of occupants. Occupancy data can also aid many building control and management decisions. The work in [1] identified some of the important factors leading to poor maintenance strategies in buildings, such as a lack of understanding of occupant satisfaction, inadequate staffing levels, and inefficient maintenance delivery. To enhance maintenance management operations, Moretti et al. [26] used occupancy data to alert the maintenance staff for cleaning operations when a threshold (defined based on the number of occupants using a facility) was reached. Facility managers can leverage occupancy sensing technologies to determine space usage and occupant mobility patterns for security in buildings [2,9], improve health and safety by adjusting the ventilation rate to limit disease transmission according to the number of occupants [36], and help first responders locate individuals during an emergency evacuation [37].
Among the many sensing modalities available, thermal occupancy techniques [6,8,35,38] seem the most promising for the task at hand. They are non-intrusive, low-cost, low-power, easy to deploy, and, perhaps more importantly, they can provide accurate occupancy counts without raising data and privacy concerns for the building's occupants [16]. While thermal occupancy technology seems very promising, several issues still prevent full adoption. First, the features extracted from thermal images have been manually derived, and they may not be powerful enough to provide the higher level of accuracy required for many applications [13]. Second, most studies have concentrated on processing the input of a single sensor. When deploying multiple thermal sensors in a single zone/room, a small amount of sensing overlap is unavoidable to provide full coverage. This leads to occupancy over/under-counting: multiple sensors deployed in the same zone may each count an occupant in an overlapping area (over-counting), or the portion of the occupant's thermal signature seen by each sensor may be insufficient to trigger a count in any of them (under-counting). Finally, data-driven processing pipelines require a significant amount of labeled training data to build the models. The input of these models for any specific zone/room depends on the distribution and location of the deployed sensors, which roughly follow the size and shape of the room. Consequently, since many rooms in a building differ in size and shape, a significant effort is needed to obtain adequate training data for each type of room (size and shape). This last point is a significant roadblock to the wide adoption of this promising technology in practice.
In this paper, we introduce TODOS, a novel system for estimating occupancy in intelligent buildings. TODOS uses a low-cost, low-power thermal sensor array along with a passive infrared sensor. We introduce a novel data processing pipeline that automatically extracts features from the thermal images using a convolutional neural network. Through an extensive experimental evaluation carried out in multiple rooms in multiple university buildings, we show that TODOS provides occupancy detection accuracy of 98% to 100% under different scenarios. In addition, it solves the issue of occupancy over/under-counting caused by overlapping sensing areas when using multiple thermal sensors in large rooms. This is done by treating the entire area as a single input image instead of partitioning the area into multiple thermal images that are processed individually. Moreover, TODOS introduces a data augmentation technique that allows the generation of training data for rooms of different sizes and shapes, without requiring specific training data from each room. Using these data, TODOS can train specifically designed neural networks optimized for any room size and shape, and achieve almost the same level of occupancy detection accuracy as in rooms where experimental labeled training data is available, making it a viable solution that generalizes to the different rooms in large buildings.
The main contributions of this work are as follows:
• We developed TODOS, a thermal-based occupancy sensing system that uses low-power, low-cost, easy-to-deploy sensors and a novel processing pipeline that automatically extracts thermal image features, treats all the sensor data as a global thermal image per zone/room, and achieves excellent accuracy in occupancy detection.
• We provided more robust training data sets using data augmentation techniques for different combinations of sensors. This allows TODOS to have better training, even for cases that were unseen in the original experimental data gathering. We then replicated this augmented data set and concatenated the copies according to the room size and its geometrical shape, so that we can train our model for rooms of any shape and size.
• We ran an extensive experimental campaign to see how TODOS generalizes to different buildings, air diffusers, and rooms with different sizes and shapes, and compared TODOS's performance using both local labeled training data and augmented data from a different room, showing that TODOS can achieve excellent performance without the need to collect local data.
• We performed Energy+ simulations to show the energy and quality of comfort impact that a more accurate occupancy estimation may have when using occupancy-based HVAC control.

RELATED WORK
Occupancy estimation schemes can be broadly categorized into user-based and user-free schemes. In user-based occupancy sensing systems such as [10,11,18], the building users carry a device or tag; these systems deliver acceptable performance, ranging from 83% to 94% accuracy. However, their performance degrades when users fail to carry the device. In addition, many of them face regulatory barriers to entry due to data and privacy regulations [16]. User-free schemes [4,6,27,31,40] do not suffer from these problems. They rely on many different sensing technologies, including video-based sensing systems [7,21,34], CO2 sensors [20,28,40], vibration systems [25,29,30], WiFi-based systems [24,39], and thermal-based occupancy sensing modalities [6,27,38].
Thermal-based sensing works because of the temperature differential between occupants and their indoor environment, which is usually much cooler than the occupants when it is thermally conditioned by an HVAC system. These systems [6,8,35,38] are capable of detecting occupants in large groups in both static (e.g., sitting) and dynamic (e.g., moving) positions. They do require line of sight (LOS) for detection but, unlike camera-based systems, they are significantly less intrusive, since they only measure temperature instead of recording full video [16]. This makes the technology very suitable for deployments in indoor environments while complying with data and privacy requirements.
The authors in [8] proposed an accurate, privacy-preserving thermal occupancy sensing system. However, their design lacks an energy-saving mode in the sensor hardware when it is idle. In addition, they use processing techniques very similar to those of ThermoSense [6], including very similar features that are fed to different types of classifiers to perform occupancy estimation. In TODOS, similarly to ThermoSense [6], we include an additional PIR sensor that allows us to duty-cycle the thermal array when the sensing area is empty. Moreover, we propose a deeper neural network optimized for occupancy sensing, together with a technique to generalize to rooms/zones of different sizes and shapes that significantly reduces the requirements for training and testing data in different buildings.
The closest related work to TODOS is ThermoSense [6]. The node hardware design is very similar. The ThermoSense processing pipeline includes active pixel detection, background subtraction, and connected components as a preprocessing stage. It then extracts the total number of active pixels passing a threshold, the number of blobs (from connected components), and the size of the largest blob. These features are then used as inputs for different types of classifiers (including linear regression, KNN, and ANN) to estimate the number of occupants in the sensing area covered by the sensor. In their evaluation, they use the linear classifier, since the performance differences among the classifiers evaluated were not significant and the linear classifier was easier to implement on the computationally limited TelosB mote hardware.
Tyndall et al. [35] presented a thermal-based system that builds upon the work in ThermoSense (i.e., a combination of PIR and thermal sensors).
In our work with TODOS, we use a preprocessing stage similar to that of ThermoSense, including active pixel detection and background subtraction. However, this information is fed into a deep convolutional neural network (CNN), which is significantly deeper than the ANN used in ThermoSense, and we let the CNN learn and extract the features of the thermal images instead of manually setting the input features. Moreover, we solve the problem of over/under-counting occupants when different sensors detect an occupant at the edge of their sensing range by aggregating all the sensor data into one large zone image. Both ThermoSense and TODOS apply an EWMA low-pass filter at the end of their respective processing pipelines to remove spurious occupancy readings over time. Both [6,35] are used in our performance evaluation for comparison with the state-of-the-art schemes available in the literature. For a thorough comparison, we also tested and analyzed TODOS's performance under two different standard CNN architectures.

TODOS OVERVIEW
Fig. 1a shows TODOS's processing pipeline. The sensor nodes are deployed on the ceiling of a room, with a sensor layout that depends on the size and shape of the room. Each node senses a specific part of the room, samples both the PIR and thermal array sensors every 20 seconds, and transmits the thermal image over an 802.15.4 Zigbee radio to a central gateway (a TelosB mote attached to the USB port of a computer). The gateway then performs a radio-to-serial conversion and transfers the thermal images to the processing server. Within the processing server, the images are concatenated in a preconfigured order that depends on the node ID. This new, larger thermal image is fed to a convolutional neural network that classifies the number of occupants. Finally, this output is passed through a low-pass EWMA filter to remove occasional high-frequency noise and improve the overall estimate.

Hardware Sensor Components
TODOS sensing hardware consists of a PIR sensor and a thermal sensor array connected together to a TelosB sensor mote (see Fig. 1b).
The PIR delivers a binary indication of occupancy, and it allows for saving energy when no activity is detected. The PIR has a detection viewing angle of 102° × 92° (horizontal × vertical) and is able to detect motion up to 12 m away. The other sensor is a thermal array sensor used to determine the number of occupants underneath (see Fig. 1c). The sensor is a Panasonic Grid-Eye thermal array, an 8 × 8 thermal sensor that covers an area of approximately 2.5 m × 2.5 m at a ceiling height of 3 m. It measures 64 temperature values, ranging from −20°C to 80°C, with an accuracy of ±2.5°C.

Thermal Background
To infer occupancy, we perform a background subtraction between a new thermal image (with one or more occupants) and the thermal background captured when there were no occupants. This operation works because the temperature of background objects (e.g., chairs, desks, and the floor) tends to be around the conditioning temperature of the room, which usually oscillates between 22°C and 26°C for HVAC-conditioned spaces in commercial buildings. Since the temperature of a human head and limbs is around 37°C, the difference is large enough to detect occupants.
A thermal background map is maintained to distinguish occupants from warm objects such as computers or refrigerators. If the PIR has detected no movement for a certain period of time (e.g., 10-15 minutes), the background and the standard deviation of each grid component are updated. Note that if an occupant remains in a space for a long period of time, the background may change during this period. To adjust the background in this case, the few grid points with the lowest temperatures are chosen as scaling points. These points are most likely unoccupied and can be used to update the old background. The scaling points are divided by the corresponding values of the old background, and the ratios are averaged to obtain a multiplier that updates the previous background.
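The background rescaling step can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `rescale_background` and the parameter `k` (how many of the coldest pixels count as "a few grid points") are assumptions introduced here.

```python
import numpy as np


def rescale_background(background, frame, k=4):
    """Rescale the old background using the k coldest pixels of the current frame.

    k is a hypothetical parameter; the text only says "a few grid points".
    """
    background = np.asarray(background, dtype=float)
    frame = np.asarray(frame, dtype=float)
    # Flat indices of the k lowest-temperature pixels in the current frame,
    # i.e. the pixels most likely to be unoccupied.
    coldest = np.argsort(frame, axis=None)[:k]
    # Ratio of the current temperature to the old background at those pixels.
    ratios = frame.ravel()[coldest] / background.ravel()[coldest]
    # The averaged ratio is the multiplier applied to the whole background.
    multiplier = ratios.mean()
    return background * multiplier
```

For example, if the room warms uniformly by 5% while one pixel is occupied, the coldest pixels all give a ratio of 1.05, so the whole background is scaled by 1.05 and the occupied pixel does not influence the update.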
We perform background subtraction by taking the difference between the current thermal map and the background. By applying a standard-deviation-based threshold to this difference, we create an 8 × 8 binary matrix representing the significantly warm points of the thermal map.
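A minimal sketch of the thresholding step is given below. The number of standard deviations used as the threshold is not stated in the text, so `n_sigma=3.0` is an assumed value, and the function name `active_pixel_mask` is hypothetical.

```python
import numpy as np


def active_pixel_mask(frame, bg_mean, bg_std, n_sigma=3.0):
    """Return an 8x8 binary matrix of significantly warm pixels.

    n_sigma is an assumed threshold multiplier, not a value from the paper.
    """
    # Difference between the current thermal map and the background.
    diff = np.asarray(frame, dtype=float) - np.asarray(bg_mean, dtype=float)
    # Per-pixel threshold derived from the background standard deviation.
    threshold = n_sigma * np.asarray(bg_std, dtype=float)
    return (diff > threshold).astype(np.uint8)
```

Skipping the final comparison and keeping `diff` (rescaled to 8 bits) would give the non-thresholded greyscale variant of the input.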

Pre-processing Data
After all the 8 × 8 thermal images are received, they are concatenated in a preconfigured order based on the placement of the sensors in the room (using the node ID to find the correct placement). For example, for a square room with 4 sensors, the concatenated input image is 16 × 16 pixels. In other words, the final concatenated thermal image is a thermal representation of the entire room/zone. We considered two different sets of images as input to the classifier. Each image could be either in (1) a binary format, which is the result of the background subtraction and thresholding process explained in § 3.2, or (2) an 8-bit greyscale format (i.e., 256 values), which is obtained through the background subtraction without thresholding. Fig. 2a shows an example of a concatenated greyscale thermal image (40 × 16 pixels) representing the entire room, assembled from nine (9) individual thermal sensor images (8 × 8 pixels). Both types of input data (i.e., binary and greyscale images) are processed through the deep learning module to evaluate the performance of each.
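The concatenation step can be illustrated with the sketch below. The `layout` grid of node IDs and the zero-filled tile for grid cells without a sensor (`None`) are assumptions made for illustration; the text only states that the order is preconfigured from the node ID.

```python
import numpy as np

TILE = 8  # each sensor delivers an 8x8 thermal image


def concatenate_tiles(tiles, layout):
    """Assemble per-sensor images into one room image.

    tiles:  dict mapping node ID -> 8x8 array.
    layout: 2D list of node IDs giving each sensor's position in the room;
            None marks a grid cell without a sensor (filled with zeros).
    """
    rows = []
    for layout_row in layout:
        row_tiles = [
            np.zeros((TILE, TILE)) if node_id is None
            else np.asarray(tiles[node_id], dtype=float)
            for node_id in layout_row
        ]
        rows.append(np.hstack(row_tiles))  # place tiles side by side
    return np.vstack(rows)                 # stack the rows of tiles
```

With four sensors arranged as `[[1, 2], [3, 4]]`, the result is the 16 × 16 image mentioned in the example of a square room.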

Neural Network Classifier
Fig. 2c shows the convolutional neural network (CNN) architecture used in our paper, Residual Network-18 (ResNet-18) [19], which improves the efficiency of CNNs while minimizing errors. In general, ResNet family architectures start with a convolutional layer of 64 kernels of size 7 × 7, i.e., (7 × 7, 64), followed by a 3 × 3 max-pooling layer. In ResNet-18, this is followed by 16 convolutional layers: a (3 × 3, 64) layer, a (3 × 3, 128) layer, a (3 × 3, 256) layer, and a (3 × 3, 512) layer, each repeated 4 times. At the end of the ResNet architecture, there is an average pooling layer followed by a fully connected layer with 1000 nodes using the Softmax activation function. ResNet is made up of a series of residual blocks with skip connections that address the vanishing gradient problem. Fig. 2d shows a typical residual function in ResNet-18.
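As a quick check of the layer arithmetic above, and to make the skip connection concrete, consider the following toy sketch. It is not a trainable network: `STAGES`, `count_weighted_layers`, and `residual` are illustrative names, and the residual function operates on plain Python lists rather than feature maps.

```python
# (kernel size, number of filters, number of conv layers) for the four stages.
STAGES = [((3, 3), 64, 4), ((3, 3), 128, 4), ((3, 3), 256, 4), ((3, 3), 512, 4)]


def count_weighted_layers(stages):
    """Count the layers that give ResNet-18 its name."""
    stem_conv = 1        # the initial (7 x 7, 64) convolution
    fully_connected = 1  # the final classification layer
    stage_convs = sum(n_layers for _, _, n_layers in stages)
    return stem_conv + stage_convs + fully_connected


def residual(x, f):
    """Skip connection: the block outputs F(x) + x, element by element."""
    return [fx + xi for fx, xi in zip(f(x), x)]
```

Here `count_weighted_layers(STAGES)` gives 1 + 16 + 1 = 18. If the learned function F outputs zeros, `residual` simply returns its input, which is why gradients can flow through many stacked blocks.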

EWMA Filter
While the classifier provides a good estimate of the occupancy count on a single thermal image, it is a memoryless system, in the sense that its output is not affected by previous inputs. In our application, we sample occupancy every 20 seconds. However, human occupancy has specific temporal patterns that can be leveraged to improve accuracy. For example, it is rare for the occupancy of a room to be 8 people, then 1, and then 8 again. Spurious classifier outputs can therefore be corrected by applying a filter. To obtain a smooth final occupancy estimate and remove spurious values over time, we apply an Exponentially Weighted Moving Average (EWMA) low-pass filter. The filter is defined as $y(t) = \alpha\,\hat{y}(t) + (1 - \alpha)\,y(t - 1)$, where $y(t)$ is the current occupancy estimate, $\hat{y}(t)$ is the current occupancy sample from our classifier at time $t$, and $y(t - 1)$ is the previous occupancy estimate at time $t - 1$. $\alpha$ is a real value between 0 and 1.
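The filter can be sketched in a few lines. This sketch assumes the standard EWMA recurrence, in which the new estimate is `alpha` times the current classifier sample plus `(1 - alpha)` times the previous estimate. Initializing the filter with the first sample is also an assumption, and the default `alpha=0.15` is the value reported later in the paper.

```python
def ewma_filter(samples, alpha=0.15, initial=None):
    """Smooth a sequence of classifier outputs with an EWMA low-pass filter."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    estimates = []
    previous = initial
    for sample in samples:
        if previous is None:
            # Assumed initialization: start from the first classifier sample.
            current = float(sample)
        else:
            current = alpha * sample + (1.0 - alpha) * previous
        estimates.append(current)
        previous = current
    return estimates
```

For the spurious sequence 8, 8, 1, 8 from the example above, the filtered values round to 8, 8, 7, 7 with `alpha=0.15`, so the isolated reading of 1 is largely suppressed.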

Data Augmentation and Model Training
The dataset used to train a neural network model needs to be rich and large enough for the model to perform well. However, collecting and labeling such data can be a costly process. Transforming datasets via data augmentation techniques allows us to reduce those operational costs [23]. In this work, we used width shift, height shift, shearing, zooming, rotation, horizontal flip, and vertical flip geometric augmentations. These types of geometric transformations are safe in our case, since they represent occupants in different positions with respect to the sensor position, and different ceiling heights that can be found in the rooms of a building. Fig. 2b shows some examples of the changes we applied to our existing dataset. We used an augmented dataset containing 7000 to 10000 thermal images for each occupancy case between 0 and 11. We partition our data into training (80%) and testing (20%) sets. The training data is of the form $\{(\vec{x}_i, y_i)\}_{i=1}^{N}$, with $\vec{x}_i \in \mathbb{R}^{d}$ representing the thermal input images and $y_i \in \mathbb{R}$ representing the occupancy values. For hyperparameter optimization, we use 5-fold cross-validation with grid search to estimate the generalization performance. Table 1 shows the validated hyperparameters, the values tested by grid search, and the values that minimize the validation error. Once the optimal hyperparameter values for the neural network are obtained, we find the optimal $\alpha$ value for the EWMA filter through another grid search. We found that a value of $\alpha$ = 0.15 works best. Note that the best way to perform this would be to jointly optimize the cross-validation error for both the neural network and the EWMA filter. However, the optimization libraries available do not easily allow us to optimize both together. We have left this issue for future work.
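Some of the geometric augmentations listed above can be sketched with NumPy alone. The sketch covers only flips and integer pixel shifts; shearing, zooming, and rotation are omitted for brevity. The function names, the one-pixel shift range, and the 50% flip probability are all assumptions made for illustration.

```python
import numpy as np


def shift(image, dy, dx):
    """Shift an image by (dy, dx) pixels, filling vacated pixels with zeros."""
    h, w = image.shape
    out = np.zeros_like(image)
    # Region of the source image that remains visible after the shift.
    src = image[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0],
        max(0, dx):max(0, dx) + src.shape[1]] = src
    return out


def augment(image, rng, max_shift=1):
    """Apply random flips and a random height/width shift to one thermal image."""
    out = image
    if rng.random() < 0.5:
        out = np.fliplr(out)  # horizontal flip
    if rng.random() < 0.5:
        out = np.flipud(out)  # vertical flip
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return shift(out, int(dy), int(dx))
```

Calling `augment(image, np.random.default_rng(0))` repeatedly on the same labeled image yields new training images with the same occupancy label, provided the shift does not push an occupant out of the frame.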

PERFORMANCE EVALUATION
In this section, we experimentally evaluate TODOS's performance. We considered two cases of TODOS estimates, trained with binary and with greyscale images, as well as concatenated thermal images for different room shapes and sizes. We also compare TODOS's performance with two closely related state-of-the-art thermal occupancy sensing schemes [6,35], using the different classifiers employed in those schemes. For a thorough comparison, we also tested TODOS's performance under another standard architecture, VGG-16 [33].

Experimental Setup
The majority of the experiments were performed in a rectangular university research laboratory of approximately 51 m², with 4 cubicles on each long side of the rectangle and a meeting table with 6 chairs in the middle. There are also file cabinets near the entrance of the lab. Nine (9) thermal occupancy sensor motes were deployed on the ceiling to cover the entire lab. We considered two scenarios, low occupancy (LO, up to 3 occupants) and high occupancy (HO, up to 8 occupants), using either a single sensor covering the area underneath it (i.e., 2.5 m × 2.5 m) or all the sensors in the laboratory (i.e., 9) covering the entire area studied. Table 2 shows the number of experimental days with ground truth collected via direct observation of real occupancy for each scenario. During the experimental days, the room started being occupied at around 10:00 a.m., and the last occupants left at around 5:00 p.m. To avoid missed detections due to blind spots, we deployed the thermal nodes with their sensing coverage areas very close to each other. Therefore, some sensing overlap is inevitable when an occupant is located at the edge of two adjacent sensing areas. In this case, the occupant may be counted twice, once by each thermal node, or not counted at all if the number of active pixels in each thermal image is not enough to trigger a count. The sensor nodes perform the background update and background subtraction operations described in § 3.2 and transmit the final thermal image to the gateway node using the low-power 802.15.4 radio. We collected occupancy sensing data every 20 seconds and obtained around 34,000 data points (i.e., thermal images) in total for our main test bed. Note that this sampling rate was chosen to obtain statistically significant results. For HVAC building control, the occupancy estimate should be produced every ∼5 to 15 minutes, to match the actuation interval commonly used in buildings.

Exploratory Analysis
Fig. 3 shows two examples of time-series occupancy over two days for our main test bed, in the high occupancy (left) and low occupancy (right) scenarios. In each sub-figure, the top plot shows the occupancy estimate of the ThermoSense system using linear regression, the middle plot shows TODOS estimates trained with binary images, and the bottom plot shows TODOS estimates trained with greyscale images. The TODOS input data is concatenated into a single thermal image covering the entire lab, as explained in § 3.3. As Fig. 3 shows, TODOS corrects most of the occupancy estimation errors made by ThermoSense. Examples can be seen at [∼1800 sec, ∼2000 sec] and [∼11100 sec, ∼12000 sec] in Fig. 3a, and at [∼2600 sec, ∼2700 sec] and [∼7500 sec, ∼7700 sec] in Fig. 3b. This is because the middle and bottom plots are obtained through data augmentation and thermal image concatenation, which address the issue of sensor overlap when an occupant is located at the edge of two adjacent sensors' coverage areas. In both scenarios, TODOS trained with greyscale images performs slightly better than TODOS trained with binary images, as the greyscale image dataset is more robust against unseen cases thanks to its wider range of values for each thermal pixel.

Accuracy
Table 3 presents the classification accuracy, our main performance metric, for each scheme tested in both low and high occupancy scenarios, for the single-sensor and all-sensor cases. As state-of-the-art techniques for comparison, we evaluated two classifiers used in ThermoSense (i.e., linear regression and KNN), as well as the Support Vector Regression, Naive Bayes, and Multi-Layer Perceptron (MLP) artificial neural network classifiers examined in [35]. In the case of support vector machines, we used an extended version, support vector regression (SVR) with a non-linear kernel, which is more flexible and robust because it outputs floating-point values instead of integers [32]. The output is then rounded to the nearest integer. The MLP classifier trains the model via back-propagation with a cross-entropy loss function, in order to be comparable with the classifier proposed in this paper. We used the same input features as the ThermoSense work (i.e., total number of active pixels, number of connected components, and size of the largest component) for all the state-of-the-art classifiers. Note that all results for these classifiers are obtained without image concatenation. We also tested TODOS with the VGG-16 architecture presented in [33], in addition to ResNet-18. We evaluate TODOS using these two architectures with both binary and greyscale images.
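The three hand-crafted features used by the baseline classifiers can be computed from a binary active-pixel matrix as sketched below. The use of 4-connectivity for connected components is an assumption (the text does not specify the connectivity), and the function name `thermosense_features` is hypothetical.

```python
from collections import deque


def thermosense_features(mask):
    """Return (active pixels, number of components, largest component size).

    mask is a 2D list (or array) of 0/1 values; components use 4-connectivity,
    which is an assumption made for this sketch.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search over one connected blob of active pixels.
            queue = deque([(r, c)])
            seen[r][c] = True
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sum(sizes), len(sizes), max(sizes, default=0)
```

These three numbers form the feature vector passed to the linear regression, KNN, SVR, Naive Bayes, and MLP baselines, whereas TODOS feeds the whole image to the CNN.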
All TODOS results are obtained through thermal image concatenation and data augmentation, except when using a single sensor (no concatenation). Based on Table 3, the accuracy values for the low occupancy scenario (in both the single-sensor and all-sensor cases) are generally higher than those for the high occupancy scenario. This is because the limited number of occupants in low occupancy scenarios leads to far fewer cases that are unseen in the training set. Considering the single-sensor case alone, TODOS is the only approach that delivers an accuracy of 100% in both low and high occupancy scenarios. The likely reason is that the data augmentation used by TODOS makes the final classifier more robust against unexpected shapes of active pixels and their combinations in the thermal image data. In addition, we process the input images after concatenating them based on the sensor placement in the room (done automatically from the sensor positions recorded at deployment time). Furthermore, the improvement is larger when TODOS is trained with greyscale images than with binary images. In this way, we can correct most of the errors, especially in high occupancy scenarios for the all-sensor case (i.e., 3% and 4% improvements with VGG-16 and ResNet-18, respectively).

Generalization to other Rooms
In this section, we discuss and evaluate how well our TODOS model generalizes to other rooms that have no labeled training data. Different rooms have different sizes and shapes. This means that the concatenated thermal images used as input for our models are different for each room/zone, which implies that we need different training data to train the models for each room. The fundamental question that we would like to answer is whether we can take labeled data obtained in one room/zone, possibly even in a different building, and modify it for training in a different room/zone. For this, we launched an experimental campaign, deploying occupancy sensors in 10 rooms in 3 different buildings, all of different shapes and sizes. Table 4 shows the parameters of the rooms tested. Perhaps more importantly, the rooms in the different buildings have different zone air diffusers, which could change the thermal signature detected by the sensor from building to building. Fig. 4 shows the shapes and sizes of the different rooms tested. Each of the inner squares in the rooms represents the area covered by a single sensor. For each of these rooms, we collected ground truth occupancy data so that we could train the models with data from the specific room. We call this the local data case. In addition, we use data from the SE2-314 room (our main test bed). This room has ground truth data not only for the total occupancy but also for the occupancy sensed by each sensor in the room. Using the data from this room, we "manufacture" a training set by taking a subset of the data when the target room is smaller (i.e., we only take the subset of the occupancy data representing a smaller room), or by duplicating data when the target room is larger (i.e., we increase the data size to fit the larger room). We call this second method the main room data case. All the data used (i.e., local data and main room data) is augmented with the techniques discussed in § 3.6.
A special case arises when the target room has an irregular shape, such as an L shape. In this case, the training set takes a form that allows us to zero out occupancy inputs that do not exist in the room due to its irregular shape.
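One way to picture how a training sample for a target room could be "manufactured" from main room data is sketched below. It makes several assumptions that the text does not confirm: the label is taken as the sum of the per-sensor occupancy counts of the tiles used, a duplicated tile contributes its count each time it appears, and grid cells outside an irregular room are marked with `None` and filled with zeros. All names are hypothetical.

```python
import numpy as np

TILE = 8  # each sensor delivers an 8x8 thermal image


def manufacture_sample(tiles, counts, layout):
    """Build one (image, label) pair for a target room from main-room tiles.

    tiles:  dict sensor ID -> 8x8 image recorded in the main room.
    counts: dict sensor ID -> ground-truth occupancy under that sensor.
    layout: 2D list of sensor IDs describing the target room; None marks a
            cell outside the room (e.g. the missing corner of an L shape).
    """
    rows, label = [], 0
    for layout_row in layout:
        row = []
        for sensor_id in layout_row:
            if sensor_id is None:
                row.append(np.zeros((TILE, TILE)))  # zeroed-out input
            else:
                row.append(np.asarray(tiles[sensor_id], dtype=float))
                label += counts[sensor_id]          # assumed labeling rule
        rows.append(np.hstack(row))
    return np.vstack(rows), label
```

A smaller room uses a subset of the main room's sensors in `layout`, a larger room repeats sensor IDs, and an L-shaped room is a rectangular layout with `None` in the missing corner.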
Fig. 5 shows the accuracy results for all rooms, for binary and greyscale images and for the VGG-16 and ResNet-18 models, under two different conditions: (1) trained with local data, and (2) trained with the main room data. We also include the results of ThermoSense trained with local data as a baseline. In general, we see that greyscale images produce better results than binary images. In addition, using the local data produces more accurate results when using the ResNet-18 models. However, these results are only marginally more accurate than when using main room data (1% more error in the worst case). This finding is very significant. It means that, with a single curated training set, we are able to produce good enough training sets to train models for rooms of different shapes and sizes, obtaining performance very similar to that obtained with ground truth data collected in situ. This increases the applicability of our technique, since we could potentially pre-train any model before deployment knowing only the size and shape of the room in question.

Training Data Size
Due to the non-linear nature of the TODOS architecture, it is expected to have high variance in its occupancy predictions, while at the same time being able to capture relevant relations between features and target outputs. In general, this comes at the expense of requiring more training data. Thus, a good model is one that requires as little training data as possible to achieve acceptable accuracy.
Fig. 6 shows the performance of TODOS as a function of the total amount of training data, when trained with the local data for a specific room (i.e., SE2-230V) and with the data from our main room (i.e., SE2-314). As Fig. 6 shows, TODOS with greyscale images generally delivers higher accuracy, at the expense of requiring more training data to converge to a stable accuracy. Also, the CNN architecture proposed in TODOS (i.e., ResNet-18) provides higher accuracy with less training data than the VGG-16 architecture. This is due to the skip connections through the residual blocks in ResNet, which address the vanishing gradient issue and make ResNet faster and more accurate than VGG.

ENERGY AND QUALITY OF COMFORT
We analyze the impact on energy and quality of comfort of HVAC building control using an occupancy-based controller. The analysis is based on an accurate occupancy estimate. The energy analysis was simulated based on (1) the time-series occupancy information for two data schemes, TODOS trained with local data and TODOS trained with the main room data, and (2) the amount of over/under-heating/ventilation as an important input factor. We divided the analysis into over-counting and under-counting scenarios to study the effect of false positive/negative occupancy counts on energy consumption and temperature effectiveness (as a quality of comfort metric). We provided the input occupancy to the Energy+ simulator using the Blended Markov Chain (BMC) occupancy model presented in [15]. The input to Energy+ includes 3786 and 3773 over-counting samples and 2252 and 2637 under-counting samples for TODOS trained with local data and with main room data, respectively, out of a total of 13088 samples. Based on our Energy+ simulation results, we discuss the energy usage and quality of comfort in the following sections.
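The split of samples into over-counting and under-counting cases can be sketched as a direct comparison of each estimate with the ground truth. This is an illustrative assumption about how the samples are categorized; the helper names `count_errors` and `share` are hypothetical.

```python
def count_errors(estimates, ground_truth):
    """Count samples where the estimate is above or below the ground truth."""
    if len(estimates) != len(ground_truth):
        raise ValueError("sequences must have the same length")
    over = sum(1 for est, true in zip(estimates, ground_truth) if est > true)
    under = sum(1 for est, true in zip(estimates, ground_truth) if est < true)
    return {"over": over, "under": under, "total": len(ground_truth)}


def share(count, total):
    """Express a sample count as a percentage of all samples."""
    return 100.0 * count / total
```

Applied to the counts reported above, `share(3786, 13088)` shows that over-counting samples make up about 28.9% of the simulation input for TODOS trained with local data.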

Energy Consumption
The impact of both data schemes (i.e., TODOS trained with local data and with main room data) on energy usage is investigated in this section. The building's HVAC system is a single-duct terminal reheat system composed of Variable Air Volume (VAV) boxes and an Air Handler Unit (AHU). The AHU has heating and cooling coils together with a fan, which can change the air's temperature. The VAV boxes take this pre-conditioned air from the main duct and control the airflow for each zone. The power consumption sources include the supply fan, heating coils, and cooling coils. The HVAC control method is an Energy+ built-in rule-based control method based on occupied/unoccupied zone information. In the working time (i.e., 07:00 am to 06:00 pm), the heating and cooling setpoints are 21.1°C and 23.9°C, and in the non-working time (i.e., 06:00 pm to 07:00 am), they are 12.8°C and 40°C, respectively.
The monthly energy consumption for the two data schemes in the over- and under-counting scenarios is shown in Fig. 7. The Energy+ controller operates with the occupancy information provided by each data scheme, including each scheme's over-/under-counting relative to the ground truth. In the over-counting scenario, the energy consumption resulting from both data schemes is almost the same (see Fig. 7a), since the schemes have very close over-counting values. In this scenario, the HVAC controller tends to consume more energy by trying to condition zones that may be empty but that the TODOS occupancy estimate reports as occupied. In terms of under-counting, both schemes result in very similar energy use over the entire year. In the under-counting scenario, the HVAC system tends to float the temperature in zones that it believes to be empty, even though they are occupied. In this case, the HVAC controller consumes less energy; however, this comes at the expense of degrading the quality of comfort.
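The setpoint rule described above can be sketched in a few lines. This is not the Energy+ built-in controller: the function name `setpoints` is hypothetical, and the sketch assumes that a zone reported as unoccupied during working time falls back to the non-working setpoints, and that non-working time always uses those setback values.

```python
from datetime import time

WORK_START = time(7, 0)            # 07:00 am
WORK_END = time(18, 0)             # 06:00 pm
WORKING_SETPOINTS = (21.1, 23.9)   # (heating, cooling) in °C
SETBACK_SETPOINTS = (12.8, 40.0)   # (heating, cooling) in °C

def setpoints(now, occupied):
    """Return (heating, cooling) setpoints for a zone at clock time `now`."""
    in_working_time = WORK_START <= now < WORK_END
    if in_working_time and occupied:
        return WORKING_SETPOINTS
    # Assumption: empty zones and non-working hours let the temperature float.
    return SETBACK_SETPOINTS
```

Under this rule, an over-counted zone is conditioned to the working setpoints although it is empty, and an under-counted zone is left to float although it is occupied.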

Quality of Comfort
In this section, we analyze the impact of both data schemes on the building's temperature effectiveness. Temperature effectiveness is defined as the ideal temperature that should be provided to the occupants for quality of comfort. To be ASHRAE [12] compliant, the setpoint temperatures must fulfill the Predicted Mean Vote (PMV) condition: −0.5 ≤ PMV ≤ 0.5, where PMV is calculated by Fanger's equation [17]. PMV predicts the mean thermal sensation vote on a standard scale for a large group of people. ASHRAE developed the thermal comfort index using the coding -3 for cold, -2 for cool, -1 for slightly cool, 0 for neutral, +1 for slightly warm, +2 for warm, and +3 for hot. PMV has been adopted by the ISO 7730 standard, which recommends maintaining PMV at level 0 with a tolerance of 0.5 for the best thermal comfort. Fanger's PMV depends on temperature, humidity, air velocity, occupants' clothing, and activity. We obtain the ideal temperature when the PMV is 0 (see the PMV equation above). Then, we compare the ideal temperature with the temperature under the two different data schemes. For this analysis, we examine the root mean square error (RMSE) of the zone temperature difference per person between these two values.
The product of the room temperature RMSE and the number of occupants for the two data schemes under the over- and under-counting scenarios is shown in Fig. 8. In the over-counting scenario, both data schemes result in a better quality of comfort (i.e., a smaller RMSE deviation) than in the under-counting case. However, this higher quality of comfort comes at a high price, since it is obtained by over-conditioning the spaces and using a lot of energy, as seen in § 5.1. In the under-counting scenario, TODOS produces a similar quality of comfort for both data schemes (with comparable energy consumption, as seen before). The energy savings in the under-counting scenario shown in the previous section come at a significant cost in quality of comfort, as the HVAC controller saves energy by not conditioning certain zones that are in reality occupied. This trades lower energy consumption for a lower quality of service for the occupants.
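As a rough illustration of the two checks used in this analysis, the following sketch tests the PMV comfort band and computes the RMSE between a simulated zone temperature series and the ideal temperature. Fanger's equation itself is not implemented here. The function names and sample temperatures are hypothetical, and the per-person normalization mentioned in the text is omitted because its exact form is not specified.

```python
import math

def is_pmv_compliant(pmv, tolerance=0.5):
    """Check the comfort condition -0.5 <= PMV <= 0.5."""
    return -tolerance <= pmv <= tolerance

def temperature_rmse(zone_temps, ideal_temps):
    """Root mean square error between zone and ideal temperatures (°C)."""
    if len(zone_temps) != len(ideal_temps) or not zone_temps:
        raise ValueError("series must be non-empty and of equal length")
    squared = [(z - i) ** 2 for z, i in zip(zone_temps, ideal_temps)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical hourly readings against a constant ideal temperature.
rmse = temperature_rmse([22.0, 24.0], [22.0, 22.0])
```

A smaller RMSE means the zone temperature stays closer to the temperature at which the PMV is 0.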

DISCUSSION
The most important discovery was how well the models generalized when using the main room data, even in completely different rooms, in different buildings, and with different air diffusers, with only a very small degradation in performance compared to using local data. We believe this works mostly for two reasons. First, the thermal images are simpler than the complex, high-resolution images available elsewhere (e.g., web images), which makes it easier for a well-trained neural network to recognize occupants and obtain very accurate counts. Second, the geometric transformations used on the augmented dataset compensate for different occupant body sizes, different ceiling heights, and different sensing deployment strategies, among other factors, even if these do not appear in the original labeled data. This is an important point, because the transformations applied are mostly "safe", i.e., they do not change the occupancy count, which allows us to pay the price of data collection and labeling once and replicate it elsewhere.
However, there may be limits to the above. While we have tested in different buildings with different zone air diffusers and supply vents, there are many types that we did not test. A specific air diffuser type could change the thermal signature detected by the sensor from building to building. We believe we could compensate for these cases using a color transformation, which we did not try in our data augmentation work. We have left this more detailed analysis for future work.
Another point to discuss is that, when applying data augmentation techniques for different room shapes and sizes, larger errors may be introduced due to over-/under-counting between adjacent training patches. For example, if we have augmented training data that consists of 16 × 16-pixel images concatenated together to form a new image of 32 × 16 pixels (a new rectangular room made from two square patches), some of the image relations will be broken. While an occupant between two sensors within each square may be correctly counted, this will not be the case for an occupant between the two different squares. So a slight degradation in performance should be expected. This can be seen in the results of the Castle Building, which showed the largest degradation in performance between the local data and the main room data (close to 1%). However, this degradation should be acceptable since it does not significantly impact the energy use and quality of comfort, as shown in § 5.
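The idea of a "safe" transformation can be sketched with NumPy. The exact set of geometric transformations used in TODOS is not listed in this section, so the flips and 90-degree rotations below, the function name `safe_augmentations`, and the toy frame are assumptions. What matters is that each transformed image keeps the original occupancy label.

```python
import numpy as np

def safe_augmentations(image):
    """Yield the image plus flipped and rotated copies of it."""
    yield image
    yield np.fliplr(image)
    yield np.flipud(image)
    for quarter_turns in (1, 2, 3):
        yield np.rot90(image, quarter_turns)

# Toy 8 x 8 frame with two active pixels standing in for two occupants.
frame = np.zeros((8, 8))
frame[2, 3] = 1.0
frame[6, 5] = 1.0
label = 2

# Every augmented image is paired with the unchanged occupancy count.
augmented = [(img, label) for img in safe_augmentations(frame)]
```

Because these operations only move pixels around, one labeled frame yields several training samples without any new labeling effort.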
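The patch concatenation in the example above can be sketched as follows. The function name `concatenate_patches` is hypothetical, and the sketch assumes the 32 × 16 size is given as width × height, so the NumPy array has shape (16, 32). The label of the synthetic room is taken to be the sum of the two patch counts.

```python
import numpy as np

def concatenate_patches(left, right, left_count, right_count):
    """Join two square patches side by side and add their occupancy counts."""
    image = np.concatenate([left, right], axis=1)
    return image, left_count + right_count

left = np.zeros((16, 16))
right = np.zeros((16, 16))
left[4, 4] = left[10, 12] = 1.0   # two occupants in the left patch
right[8, 3] = 1.0                 # one occupant in the right patch

room, count = concatenate_patches(left, right, 2, 1)
```

Since each patch was recorded on its own, no synthetic sample contains a heat signature that straddles the seam between columns 15 and 16, which is the broken image relation discussed above.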

CONCLUSION
In this paper, we present TODOS, a novel system for estimating occupancy in intelligent buildings. TODOS uses a low-cost, low-power thermal sensor array along with a passive infrared sensor. We introduce a novel data processing pipeline that allows us to automatically extract features from the thermal images using an artificial neural network. Through an extensive experimental evaluation, we show that TODOS provides occupancy detection accuracy of 98% to 100% under different scenarios. In addition, it solves the issue of occupancy over-/under-counting produced by overlapping sensing areas when using multiple thermal sensors in large rooms. This is done by treating the entire area as a single input thermal image instead of partitioning the area into multiple thermal images that are processed individually. Finally, TODOS introduces a data augmentation technique that allows the generation of training data for rooms of different sizes and shapes, without requiring specific training data from each room. Using these data, TODOS can train specifically designed neural networks optimized for any room size and shape, and achieve almost the same level of occupancy detection accuracy as in rooms where experimental labeled training data is available, making it a viable solution that generalizes to the different rooms in large buildings.
(a) TODOS processing pipeline. The data used for model training depends on the shape and size of the room in which the sensor(s) are deployed. (b) Thermal sensor node deployed in the ceiling. (c) The area covered by a sensor mote with an occupant.

Figure 1: TODOS Deployment and System Overview.

array sensor), but with a different heat map array size and different classifiers such as MLP, SVM, Naive-Bayes, KNN, and linear regression processing pipeline. The reported performance is similar to the one reported by ThermoSense. In our work with TODOS, we do a similar preprocessing stage to ThermoSense, including active pixel detection and background subtraction. However, this information is fed into a deep convolutional neural network (CNN) which is significantly deeper than the ANN used in ThermoSense, and we let the CNN learn and extract the features of the thermal images instead of manually setting the input features. Moreover, we solved the problem of over-/under-counting occupants when different sensors detect an occupant at the edge of their sensing range, by aggregating all the sensor data in one big zone image. Both ThermoSense and TODOS use an EWMA low-pass filter at the end of their respective processing pipelines to get rid of spurious occupancy readings over time. Both [6, 35] are used in our performance evaluation for comparison with the state-of-the-art schemes available in the literature. As a thorough comparison, we also tested and analyzed TODOS's performance under two different standard CNN architectures.
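The EWMA low-pass filter mentioned above can be sketched as a simple recurrence, s_t = α·x_t + (1 − α)·s_{t−1}. The smoothing factor `alpha`, the function name `ewma_filter`, and the sample counts are assumptions. The filter parameters used by TODOS or ThermoSense are not given in this section.

```python
def ewma_filter(counts, alpha=0.3):
    """Smooth a sequence of occupancy estimates with an EWMA."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    state = None
    for count in counts:
        # The first sample initializes the filter state.
        state = count if state is None else alpha * count + (1 - alpha) * state
        smoothed.append(state)
    return smoothed

# A single spurious reading of 5 in an otherwise steady count of 2.
smoothed = ewma_filter([2, 2, 5, 2, 2], alpha=0.5)
```

The spike is damped and decays over the following samples instead of appearing as a sudden jump in occupancy. A smaller `alpha` damps spikes more strongly but reacts more slowly to real changes.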

Figure 2: (a) An example of a concatenated thermal image of an entire room (40 × 16 pixels), (b) some transformations of the 8 × 8 array heat map at occupancy level 3 produced by data augmentation, (c) ResNet-18 architecture used in TODOS for a room with four 8 × 8 sensors forming input and output images of 16 × 16, and (d) a typical residual function used in ResNet-18.

Figure 3: Time-series data from all sensors in (a) the high-occupancy and (b) the low-occupancy scenario in the main test bed.

Figure 4: Room shapes and sensor arrangements for all the rooms tested.

Figure 5: Accuracy results for all rooms for binary and greyscale images for the VGG-16 and ResNet-18 models under two conditions: trained with local data, and trained with the main room's data.

Table 4: Additional rooms to test the generalization of TODOS, including the area, # of data points, and # of sensors.

representing the thermal input images, and   → R representing the occupancy values. − →  ′  = ( − →   ) =   × − →   with  being an indicator vector and − →   being the training data from the main room. This vector simply

Figure 6: Accuracy as a function of the training data for room SE2-230V trained with (a) local data, and (b) main room data.

Figure 8: Monthly temperature RMSE for over-counted occupants for (a) over-counting, and (b) under-counting scenarios.

Table 1: Cross-validation parameters used in TODOS.

Table 2: Total experimental days in the main test bed, and associated data points collected for each occupancy scenario.

Table 3: Performance evaluation in terms of occupancy estimation accuracy for different occupancy schemes under the scenarios of low (LO) and high (HO) occupancy and for two cases of using a single sensor and all sensors.