Gradual Semi-Automatic Annotation and Hybrid Model for Effective Detection of Garbage Bags

As the amount of unauthorized garbage dumping keeps rising, the labor-intensive approach currently used to monitor dumping has motivated research on automatic detection systems. However, because previous studies focused on explicit dumping behavior, they are difficult to apply to real-world situations. To find implicit dumping behavior, we propose combining object detection and color classification models to detect unauthorized garbage bags. Our approach employs a gradual semi-automatic annotation method inspired by semi-supervised learning: after extracting training data with gradual semi-automatic annotation, we train a model suited to the CCTV datasets. To reduce false detections, we detect only garbage bags attached to people, using the IoU (intersection-over-union) distance. To evaluate the effectiveness of the proposed method, we measured the model's accuracy and compared it with an existing model. The experimental evaluation demonstrates that the proposed annotation method achieves 92.8% model accuracy and 90.9% annotation accuracy. In addition, training the classification model with data augmentation and reduction increases the F1-score by 6.3 percentage points.


INTRODUCTION
Rapid economic and social development has greatly improved the quality of life but has also produced a significant amount of garbage. Local governments in Korea have made efforts to build authorized places to collect garbage more smartly and economically. To monitor and detect factors related to the cleanliness of streets and neighborhoods, a large number of surveillance cameras are being deployed at these authorized places, called "Clean Houses" (CH). People are supposed to put their trash in standardized bags and throw it into a CH, but much of it is thrown into CHs in unauthorized bags. Staff are assigned to observe these cameras 24 hours a day or to find garbage dumping events by exploring the images streamed from them. Video streams generally contain a large volume of consecutive images, but only a very small fraction of them are useful. Therefore, as the volume of data from surveillance cameras keeps growing, this continuous monitoring becomes economically costly and extremely laborious, and a system that detects unauthorized garbage bags at CHs is highly needed.
The main role of a CH monitoring system is to help the human operator detect person-held garbage and unauthorized bags easily and effectively. Because basic object detection models are trained on existing well-refined datasets, they are not well suited to real-world CH footage. Therefore, we first collect a real-world CCTV video streaming dataset from CCTV stations installed at four CH spots in South Korea and analyze it. Figure 1 shows real-world garbage-related actions of people at a CH spot (Figure 1a) and on the neighboring street of a CH spot (Figure 1b). There are several challenges to developing a system that exploits these datasets effectively.
One of the key challenges is to detect person-held objects, especially unauthorized garbage bags, in datasets collected from CCTV images. Object detection algorithms have advanced rapidly, especially since deep learning and CNN-based models proved efficient at feature extraction [9]. However, these detection algorithms need a large amount of annotated training data. Our detection and classification algorithm also requires a significant amount of data to accurately detect people and garbage bags across various environments and viewing angles. The CCTV datasets used in this paper contain meaningful data, but only in very small quantities. Thus, the content of all the CCTV image datasets must be inspected to obtain a small amount of meaningful data. To solve this problem, we employ a semi-automatic annotation technique that combines human annotators and object detection models, which allows large amounts of data to be annotated rapidly.
Secondly, to develop effective garbage monitoring systems, various studies have used object detection, tracking, and pose estimation. Most of these studies assume that dumping behavior is visible [16]. However, in actual CCTV footage, dumping behavior is only partially visible, as in Figure 1a, or invisible, as shown in Figure 1b. In this situation, existing studies are not suitable for CH monitoring systems. To address this issue, we propose a method to identify unauthorized garbage bags through color classification of garbage bags, inspired by the South Korean government's standardized garbage bag policy. In South Korea, each local government designates differently colored garbage bags for each type of waste to facilitate waste disposal. We exploit this system to identify unauthorized garbage events.
We summarize the main contributions of this paper as follows:
• We propose a gradual semi-automatic annotation method that uses a pre-trained model on CCTV image datasets from four spots, reducing the time spent on data annotation while updating the model.
• We increase computational speed and extract meaningful data by reducing unnecessary object detections with post-processing techniques such as the IoU distance.
• We use a hybrid deep learning model that performs color classification and object detection to improve accuracy and reduce detection errors.
This study aims to identify unauthorized garbage in real-world images in which garbage dumping behavior is not visible. To verify the detection of unauthorized garbage bags using object detection and color classification models, we used CCTV image datasets recorded near CHs and created training datasets with a semi-automatic annotation technique suited to the characteristics of CCTV images. Using the Korean government's standardized garbage bag policy, this study infers whether garbage is unauthorized by recognizing the color of the bag rather than the dumping behavior. By storing events according to garbage bag color and thereby helping users monitor the footage, the method is expected to help local governments address the unauthorized garbage problem more effectively.

PRELIMINARIES
2.1 Video surveillance for unauthorized events
Video surveillance has been actively studied in computer vision. Research in various fields, such as image segmentation, foreground detection, object detection, tracking, motion analysis, and abnormal behavior detection, has progressed to support human monitoring [4,7,14]. These studies have been integrated into intelligent systems that use CCTV data to detect abnormal events, such as vehicle license plate recognition, theft identification, violence detection, explosion risk, and unauthorized dumping [14,17]. Among these, research on unauthorized garbage has recently been active: detecting garbage dumping using the distance between body joints and garbage bags [16], tracking people who dump garbage by combining object tracking and pose estimation [8], tracking vehicles that dump garbage [11], recognizing garbage on beaches using UAVs [10], and detecting landfills in satellite images [15]. However, these prior studies cannot be used when the dumping itself cannot be observed. Prior studies are also slow because they chain several components, such as object detectors, pose estimators, and object trackers. Therefore, previous approaches may not work properly in real-world scenarios.

Semi-automatic annotation
Object detection and tracking technologies have developed rapidly, but because they rely on supervised learning, a large amount of data must be annotated to build a custom model. These annotation tasks consume considerable human effort and time. To address this problem, various semi-automatic annotation techniques have been proposed, such as using GANs (Generative Adversarial Networks) to improve annotation accuracy for small amounts of image data [13], a range-MBR algorithm to extract accurate BBOXes (bounding boxes) [18], effective clustering-based annotation that uses a small amount of annotated data with semi-supervised learning [3], and image segmentation using active learning [12]. However, there is little motivation to use GANs here because the current system already has plenty of data, and object rotation is not an issue because the images capture people walking on the road. In addition, semi-supervised learning is unsuitable because garbage bags come in various colors whose appearance can change with lighting and weather conditions. A new approach, rather than conventional methods, is therefore required to annotate data with these characteristics.

PROPOSED METHOD
3.1 Gradual semi-automatic annotation
As shown in Figure 2, the CCTV data used in this study covers various situations, including different weather conditions and light reflections. This requires a more careful, and therefore more time-consuming, annotation process, and analyzing all images is difficult regardless of the annotators' expertise. Therefore, we propose a gradual semi-automatic annotation method that combines the domain expert's knowledge with an object detection model. As shown in Figure 3, the method consists of four main steps: initial learning, image extraction and annotation, inspection, and transfer learning [1].
• Initial learning: After images containing the human class are extracted for a user-specified number of days, the user manually annotates the garbage bags in them.
• Image extraction and annotation: Using the most recently trained model, images that pass the following rule matching are extracted, and their annotations are stored.
  rule 1: Both the Person and Trash classes appear in the frame.
  rule 2: Each class has a confidence score greater than 0.75.
  rule 3: The Trash object is attached to a Person object.
• Inspection: Domain experts review the extracted images and annotations, verifying and correcting them to catch missed or false detections.
• Transfer learning: To increase the accuracy and diversity of the model, transfer learning is performed on the last trained model with the inspected data included.
A characteristic of the proposed annotation method is that it uses a pre-trained object detection model to create the human class annotations. We also use an anchor-free YOLOv8 model to annotate objects of various sizes, and we then increase annotation accuracy through human cross-validation. However, an early model trained on little data is less accurate than a well-trained model, which puts more burden on the users. Therefore, rather than reviewing all the remaining data at once after initial learning, the method gradually increases the amount of data to be detected, as shown in Equation 1, so that the model improves between rounds, accuracy increases, and the burden on users decreases.
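The three matching rules above can be sketched as a per-frame filter. This is a minimal illustration under stated assumptions, not the authors' implementation: the class names `"person"` and `"trash"`, the `Detection` type, and the `(x1, y1, x2, y2)` pixel box format are hypothetical, rule 2 is read as "only detections above 0.75 count", and rule 3 ("attached") is interpreted as a positive IoU between a trash box and a person box, following the IoU distance criterion of Section 3.2.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Box format assumed here: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class Detection:
    cls: str     # hypothetical class names: "person" or "trash"
    conf: float  # detector confidence score
    box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def passes_rules(dets: List[Detection], conf_thresh: float = 0.75) -> bool:
    """Return True if a frame satisfies rules 1-3 of the extraction step."""
    # Rule 2: only detections above the confidence threshold count.
    confident = [d for d in dets if d.conf > conf_thresh]
    persons = [d for d in confident if d.cls == "person"]
    trash = [d for d in confident if d.cls == "trash"]
    # Rule 1: both classes must be present in the frame.
    if not persons or not trash:
        return False
    # Rule 3: at least one trash box overlaps (is attached to) a person box.
    return any(iou(t.box, p.box) > 0 for t in trash for p in persons)
```

Frames that pass this filter would then go to expert inspection; frames that fail are simply skipped, which is what keeps the review workload small.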

Data extraction using IoU distance
We used an object detection model trained with gradual semi-automatic annotation to detect people and garbage bags. However, the purpose of this study is not to detect every garbage bag but to detect garbage bags held by people. Existing object detection models provide class filtering functions that restrict detection to specific classes, but these functions cannot capture the relationship between people and trash bags. In addition, because the CCTV cameras are installed near the CH, there are many garbage bags left on the road. In this case, class filtering saves a frame every time a filtered class appears, which accumulates unnecessary data. Therefore, in this paper, we propose a data extraction method based on the IoU distance, as shown in Equation 2. This technique saves the BBOX positions for each class and then computes the IoU distance between the BBOXes of different classes. As shown in Equation 3, if the IoU distance is greater than 0, the garbage bag is classified as attached to a person and its trash BBOX information is saved; if it equals 0, the bag is classified as dumped on the road and its trash BBOX is ignored. As shown in Figure 4, we applied this to the object detection model, saved the detected objects' locations for each class, used Equation 2 to calculate the IoU between trash and human objects, and confirmed that different BBOXes result depending on whether a person and a bag are attached or separated.
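A minimal sketch of the attachment test follows. It assumes boxes in `(x1, y1, x2, y2)` pixel format and reads the paper's "IoU distance" as the plain IoU value (intersection area divided by union area), so that a value above 0 means the trash box overlaps at least one person box; the function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """IoU = |A ∩ B| / |A ∪ B| for two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def attached_trash(person_boxes: List[Box], trash_boxes: List[Box]) -> List[Box]:
    """Keep only trash boxes whose IoU with some person box is greater than 0."""
    kept = []
    for t in trash_boxes:
        best = max((iou(t, p) for p in person_boxes), default=0.0)
        if best > 0:
            kept.append(t)  # attached to a person: save the trash BBOX
        # best == 0: treated as a bag left on the road and ignored
    return kept
```

For example, with one person at `(0, 0, 100, 200)`, a bag at `(90, 150, 130, 190)` is kept while a bag at `(300, 300, 340, 340)` is dropped.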

Color detection
Finally, we want to identify unauthorized garbage by storing events for each color, so we need to classify the color of a garbage bag whenever a person holding one appears. Although object detection models are effective at detecting various objects in images, they can produce false color detections caused by parts of a person's clothing, as shown in Figure 5. Conversely, an image classification model is more accurate at color classification but cannot detect multiple objects. Therefore, this paper proposes a hybrid deep learning model that combines object detection and image classification models. The hybrid model operates as follows. First, the rule matching of Section 3.1 and the IoU distance calculation of Section 3.2 are performed to extract images of garbage bags. The extracted images must then satisfy two additional rules for color recognition:
• The trash image is larger than 64×64 pixels.
• The aspect ratio of the trash object is between 0.5 and 2.0.
Second, each extracted image is classified by the color classification model. There is no guarantee that every extracted garbage bag's color is classified correctly, and the color classification model cannot classify a bag of a new, unseen color correctly. Therefore, if the classification confidence does not exceed a user-specified threshold, the image is assigned to an uncertain class, which prevents false detections and stores the data for further training.
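The two color recognition rules and the uncertain-class fallback can be sketched as follows. The names are hypothetical: `classify` stands in for the color classification model and is assumed to return a mapping from color label to confidence. "Larger than 64×64" is read as both width and height exceeding 64 pixels, the aspect ratio as width divided by height, and the 0.8 default threshold is only a placeholder for the user-specified value.

```python
from typing import Callable, Dict, Optional, Tuple

Box = Tuple[int, int, int, int]  # assumed (x1, y1, x2, y2) in pixels
Classifier = Callable[[Box], Dict[str, float]]


def crop_is_usable(box: Box, min_side: int = 64,
                   min_ratio: float = 0.5, max_ratio: float = 2.0) -> bool:
    """Color recognition rules: crop larger than 64x64, aspect ratio in [0.5, 2.0]."""
    w, h = box[2] - box[0], box[3] - box[1]
    if w <= min_side or h <= min_side:
        return False
    return min_ratio <= w / h <= max_ratio


def classify_bag(box: Box, classify: Classifier,
                 threshold: float = 0.8) -> Optional[str]:
    """Return a color label, "uncertain", or None if the crop is rejected."""
    if not crop_is_usable(box):
        return None
    # In practice `classify` would receive the cropped pixels, not the box.
    scores = classify(box)
    label, score = max(scores.items(), key=lambda kv: kv[1])
    # Not above the user-specified threshold: store as "uncertain" for later
    # training instead of risking a false color detection.
    return label if score > threshold else "uncertain"
```

Separating the geometric filter from the classifier keeps the cheap checks in front, so the classification model only runs on crops large and regular enough to show the bag's color.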

EXPERIMENT
4.1 Datasets
In this study, CCTV data collected at a specific CH for one month, from May to June 2023, was used. Each image has FHD (1920×1080) resolution at 30 FPS, and the dataset consists of 77,760,000 images. The installed CCTV has a night-vision function that turns on and off with the ambient light; when the camera is installed in a dark place, night vision turns on and the footage becomes unsuitable for color recognition. Therefore, we installed the CCTV near streetlights to collect color images during both day and night. Only a small portion of this data, about 0.2%, is useful; most images show either no one or only passersby. In addition, four types of garbage bags are authorized at this location (pink, white, green, and blue), and black and other bags are considered unauthorized. Therefore, to collect useful data for training the models, we established data collection rules for situations in which both people and garbage bags are detected.
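The reported image count is consistent with one camera recording continuously for 30 days at 30 FPS, and the short calculation below makes this explicit. The single-camera, 30-day, continuous-recording assumption is ours; the 0.2% useful fraction is taken from the text.

```python
FPS = 30                        # frames per second (from the text)
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s
DAYS = 30                       # assumed: one camera, 30 days, continuous recording

frames = FPS * SECONDS_PER_DAY * DAYS
useful = frames * 2 // 1000     # about 0.2% useful frames, in integer arithmetic

print(f"{frames:,} frames in total, about {useful:,} useful")
```

This prints `77,760,000 frames in total, about 155,520 useful`, which illustrates why inspecting every frame by hand is impractical.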

Data study using gradual semi-automatic annotation
In this study, one month of CCTV image data was used, and to learn all of it, we specify the amount of data used for initial learning. More initial training data yields a more accurate model with fewer iterations, but annotation takes much longer. Conversely, less initial training data takes less annotation time but requires more learning iterations. Therefore, in this study, the initial learning data is set to one day, and, following Equation 1, all data is learned through five iterations.
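Equation 1 is not reproduced in this section, so the sketch below uses a hypothetical schedule in which each round annotates twice as many days as the previous one, capped at the remaining data. With a one-day initial round over 30 days it happens to yield five rounds (1, 2, 4, 8, and 15 days), but it should not be read as the paper's Equation 1. The callables passed to `gradual_annotation` (manual annotation, automatic annotation, expert inspection, fine-tuning) are hypothetical placeholders for the steps of Section 3.1.

```python
from typing import Callable, List, Sequence


def gradual_schedule(total_days: int, initial_days: int = 1,
                     growth: int = 2) -> List[int]:
    """Days newly annotated per round (hypothetical geometric schedule)."""
    if initial_days < 1 or growth < 1:
        raise ValueError("initial_days and growth must be at least 1")
    rounds: List[int] = []
    done, batch = 0, initial_days
    while done < total_days:
        step = min(batch, total_days - done)  # never exceed the remaining data
        rounds.append(step)
        done += step
        batch *= growth
    return rounds


def gradual_annotation(days: Sequence, manual_annotate: Callable,
                       auto_annotate: Callable, expert_inspect: Callable,
                       fine_tune: Callable, initial_days: int = 1,
                       growth: int = 2):
    """Initial learning, then rounds of extraction, inspection, and transfer learning."""
    sizes = gradual_schedule(len(days), initial_days, growth)
    # Initial learning: the user annotates the first batch by hand.
    labeled = list(manual_annotate(days[:sizes[0]]))
    model = fine_tune(None, labeled)
    start = sizes[0]
    for size in sizes[1:]:
        batch = days[start:start + size]
        proposals = auto_annotate(model, batch)    # extraction and annotation
        labeled.extend(expert_inspect(proposals))  # expert review
        model = fine_tune(model, labeled)          # transfer learning on all data so far
        start += size
    return model, labeled
```

A larger initial batch trades more manual work up front for fewer rounds: `gradual_schedule(30, initial_days=4)` returns `[4, 8, 16, 2]`, four rounds instead of five.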
To measure the accuracy of the trained model, we use mAP (mean Average Precision), a standard metric for evaluating the accuracy of object detection models. We evaluate performance using two days of data that were not used for training. As shown in Table 1, as the amount of data increases, both the model accuracy and the automatic annotation accuracy increase. With this method, annotating and learning the entire month of CCTV image data took only about 5 days. This result demonstrates that the method drastically reduces annotation time compared with the basic annotation method while producing an accurate object detection model.
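For reference, the sketch below computes average precision (AP) for one class with all-point interpolation over a ranked list of detections, and mAP as the mean over classes. It assumes each detection has already been marked as a true or false positive by matching it to ground truth at some IoU threshold. The section does not state the threshold or interpolation scheme used, and detector toolkits such as YOLOv8 ship their own mAP computation, so this is an illustration rather than the evaluation code behind Table 1.

```python
from typing import Dict, List, Tuple

Scored = List[Tuple[float, bool]]  # (confidence, is_true_positive) per detection


def average_precision(scored: Scored, num_gt: int) -> float:
    """All-point interpolated AP for one class."""
    if num_gt == 0:
        return 0.0
    ranked = sorted(scored, key=lambda s: s[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the envelope, summed over recall increments.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class: Dict[str, Tuple[Scored, int]]) -> float:
    """mAP: the mean of per-class AP values."""
    aps = [average_precision(s, n) for s, n in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0
```

With two ground-truth objects and ranked detections (hit, miss, hit), AP is 0.5 × 1 + 0.5 × 2/3 = 5/6.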

Data augmentation and color classification
To create a color classification model, data was extracted and stored using the five matching rules (the three rules of Section 3.1 and the two color recognition rules of Section 3.3) and the IoU distance. The five colors identified in this study are black (unauthorized), blue, green, pink, and white, and the images were sorted into a folder per class for training. However, one class makes up the majority of this dataset, which is likely to cause a class imbalance problem [2]. To solve this problem, this paper uses data augmentation and reduction together. First, the pink class, which makes up the majority of the dataset, was reduced to 15,000 images, and the data of the other classes was augmented. As shown in Table 2, brightness, blur, flip, and rotation were used for augmentation; because the goal is color classification, color augmentation was not used or was applied only slightly. After augmentation, the dataset contained 9,322 black, 9,292 blue, 11,341 green, 14,000 pink, and 12,058 white images.
In this section, we examine the accuracy of garbage bag color classification and the class imbalance problem. We use Equation 4 to measure accuracy, precision, and recall. However, class imbalance is difficult to assess with these three measures alone, so we measure it using the F1-score, which combines precision and recall, as shown in Equation 5 [5,6]. To measure model accuracy and class imbalance, models were trained on datasets with and without data augmentation and reduction. Each dataset was divided into Train/Val/Test (7:2:1) and trained for 100 epochs. Table 3 shows the results for each model; both models achieve high accuracy. However, the normal model suffers from class imbalance: its F1-score is lower than its accuracy for every class except pink. In contrast, the model trained with augmented data shows higher accuracy and F1-score than the normal model. This result shows that data augmentation and reduction mitigated the class imbalance problem and improved the model's generalization performance.
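The sketch below illustrates the two ideas in this section: reducing the majority class by random subsampling and computing per-class F1-scores, which expose imbalance that overall accuracy hides. These are hypothetical helpers, not the authors' pipeline; samples are assumed to be `(item, label)` pairs, where an item could be an image path.

```python
import random
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[object, str]  # (item, class label); item could be an image path


def cap_class(samples: List[Sample], label: str, cap: int,
              seed: int = 0) -> List[Sample]:
    """Majority-class reduction: randomly keep at most `cap` samples of `label`."""
    rng = random.Random(seed)
    majority = [s for s in samples if s[1] == label]
    others = [s for s in samples if s[1] != label]
    kept = rng.sample(majority, cap) if len(majority) > cap else majority
    return others + kept


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """F1-score for each class, treating that class as the positive one."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[c] = 2 * precision * recall / denom if denom else 0.0
    return scores
```

For instance, a model that predicts pink for all of `["pink", "pink", "pink", "black"]` is 75% accurate, yet its F1-score is 6/7 (about 0.857) for pink and 0 for black.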

Hybrid deep learning model
The previous experiments produced a color classification model and an object detection model. However, the proposed model must be compared with an existing detection model to verify its effectiveness. The existing model was trained only on the trash and human classes, so it had no color classes. Therefore, semi-automatic annotation was performed using the classification and detection models. Unlike before, we now had two well-trained models, so we performed color class annotation and inspection on the entire dataset at once. Training was conducted with a Train/Val (7:3) split of the inspected dataset. We measure detection accuracy with mAP and classification accuracy with the F1-score. Because the proposed model uses both detection and classification, it requires a two-stage evaluation: its precision, recall, and F1-score are obtained by multiplying the corresponding values of the object detection model and the classification model. Table 4 compares the models; the proposed model achieves a higher mAP as well as higher precision, recall, and F1-score.
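The two-stage evaluation described above multiplies each detection metric by the corresponding classification metric. A minimal sketch follows, with hypothetical dictionary keys. One way to read the product, assuming the two stages' errors are independent, is as the rate at which a sample is handled correctly by both stages.

```python
from typing import Dict

METRICS = ("precision", "recall", "f1")


def combined_metrics(det: Dict[str, float], cls: Dict[str, float]) -> Dict[str, float]:
    """Multiply each detection metric by the matching classification metric."""
    return {m: det[m] * cls[m] for m in METRICS}
```

For example, a detection precision of 0.5 combined with a classification precision of 0.5 gives a combined precision of 0.25.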

CONCLUSIONS
This paper proposes an effective method for detecting unauthorized garbage, inspired by the Korean government's standardized garbage bag policy. We used gradual semi-automatic annotation to annotate the CCTV dataset, the IoU distance to extract valid data, data augmentation and reduction to prevent class imbalance, and a hybrid deep learning model for more accurate color recognition. Unlike other studies, the proposed method can detect unauthorized garbage bags even when dumping behavior is not visible. Therefore, by storing events with this method, it is possible to identify unauthorized garbage more easily and to analyze garbage trends in the region. To verify the proposed methods, we conducted experiments on gradual semi-automatic annotation, data augmentation, and the hybrid deep learning model. The results indicate that gradual semi-automatic annotation achieved a model accuracy of 92.8% and an annotation accuracy of 90.9% on the test dataset, and it reduced time consumption, since about 78 million images were annotated in five days. We also confirmed that data augmentation and reduction increased the F1-score, a representative measure of class balance, by 6.3 percentage points, from 92.7% to 99%. Finally, the proposed model achieves a 1.5% higher mAP and a 5.6% higher F1-score. In future work, we will combine object tracking and re-identification models with this method to track people carrying unauthorized garbage bags.

Figure 1: The images from recorded video datasets.


Figure 2: The situations from recorded video datasets.

Figure 5: False color detection using the object detection model.

Table 1: Results of gradual semi-automatic annotation.

Table 3: Results of color classification.

Table 4: Results for the hybrid model.