skip to main content
10.1145/3240508.3241916acmconferencesArticle/Chapter ViewAbstractPublication PagesmmConference Proceedingsconference-collections
research-article

Fluid Annotation: A Human-Machine Collaboration Interface for Full Image Annotation

Published: 15 October 2018 Publication History

Abstract

We introduce Fluid Annotation, an intuitive human-machine collaboration interface for annotating the class label and outline of every object and background region in an image. Fluid annotation is based on three principles:(I) Strong Machine-Learning aid. We start from the output of a strong neural network model, which the annotator can edit by correcting the labels of existing regions, adding new regions to cover missing objects, and removing incorrect regions.The edit operations are also assisted by the model.(II) Full image annotation in a single pass. As opposed to performing a series of small annotation tasks in isolation [51,68], we propose a unified interface for full image annotation in a single pass.(III) Empower the annotator.We empower the annotator to choose what to annotate and in which order. This enables concentrating on what the ma-chine does not already know, i.e. putting human effort only on the errors it made. This helps using the annotation budget effectively.
Through extensive experiments on the COCO+Stuff dataset [11,51], we demonstrate that Fluid Annotation leads to accurate an-notations very efficiently, taking 3x less annotation time than the popular LabelMe interface [70].

References

[1]
D. Acuna, H. Ling, A. Kar, and S. Fidler. 2018. Efficient Interactive Annotation of Segmentation Datasets with Polygon-RNN
[2]
[3]
P. Arbeláez, J. Pont-Tuset, J. T. Barron, F. Marques, and J. Malik. 2014. Multiscale Combinatorial Grouping. In CVPR .
[4]
A. Bearman, O. Russakovsky, V. Ferrari, and L. Fei-Fei. 2016. What's the point: Semantic segmentation with point supervision. ECCV .
[5]
S. Bell, P. Upchurch, N. Snavely, and K. Bala. 2015. Material Recognition in the Wild with the Materials in Context Database. In CVPR .
[6]
T. Berg and D.A. Forsyth. 2006. Animals on the web. In CVPR .
[7]
H. Bilen and A. Vedaldi. 2016. Weakly Supervised Deep Detection Networks. In CVPR .
[8]
Arijit Biswas and Devi Parikh. 2013. Simultaneous active learning of classifiers & attributes via relative feedback. In CVPR .
[9]
Y. Boykov and M. P. Jolly. 2001. Interactive Graph Cuts for Optimal Boundary and Region Segmentation of Objects in N-D Images. In ICCV .
[10]
Steve Branson, Catherine Wah, Florian Schroff, Boris Babenko, Peter Welinder, Pietro Perona, and Serge Belongie. 2010. Visual recognition with humans in the loop. In ECCV .
[11]
H. Caesar, J.R.R. Uijlings, and V. Ferrari. 2018. COCO-Stuff: Thing and Stuff Classes in Context. In CVPR .
[12]
J. Carreira and C. Sminchisescu. 2010. Constrained Parametric Min-Cuts for Automatic Object Segmentation. In CVPR .
[13]
L. Castrejón, K. Kundu, R. Urtasun, and S. Fidler. 2017. Annotating object instances with a polygon-rnn. In CVPR .
[14]
L.-C. Chen, A. Hermans, F. Schroff G. Papandreou, P. Wang, and H. Adam. 2017. MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features. ArXiv (2017).
[15]
L-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A.L. Yuille. 2018. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. on PAMI (2018).
[16]
R.G. Cinbis, J. Verbeek, and C. Schmid. 2014. Multi-fold MIL Training for Weakly Supervised Object Localization. In CVPR .
[17]
M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele. 2016. The Cityscapes Dataset for Semantic Urban Scene Understanding. CVPR (2016).
[18]
J. Dai, K. He, Y. Li, S. Ren, and J. Sun. 2016. Instance-sensitive Fully Convolutional Networks. In ECCV .
[19]
Jia Deng, Olga Russakovsky, Jonathan Krause, Michael S. Bernstein, Alex Berg, and Li Fei-Fei. 2014. Scalable Multi-label Annotation. In Proceedings of the 32Nd Annual ACM Conference on Human Factors in Computing Systems (CHI '14). ACM, 3099--3102.
[20]
T. Deselaers, B. Alexe, and V. Ferrari. 2010. Localizing Objects while Learning Their Appearance. In ECCV .
[21]
I. Endres and D. Hoiem. 2014. Category-Independent Object Proposals with Diverse Ranking. IEEE Trans. on PAMI, Vol. 36, 2 (2014), 222--234.
[22]
M. Everingham, S. Eslami, L. van Gool, C. Williams, J. Winn, and A. Zisserman. 2015. The PASCAL Visual Object Classes Challenge: A Retrospective. IJCV (2015).
[23]
P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. 2010. Object Detection with Discriminatively Trained Part Based Models. IEEE Trans. on PAMI, Vol. 32, 9 (2010).
[24]
R. Fergus, L. Fei-Fei, P. Perona, and A. Zisserman. 2010. Learning Object Categories From Internet Image Searches. In Proceedings of the IEEE.
[25]
Y. Freund and R.E. Schapire. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of computer and system sciences (1997).
[26]
R. Girshick, J. Donahue, T. Darrell, and J. Malik. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR .
[27]
Greg Griffin, Alex Holub, and Pietro Perona. 2007. The Caltech-256. Technical Report. Caltech.
[28]
B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik. 2014. Simultaneous Detection and Segmentation. In ECCV .
[29]
M. Haußmann, F.A. Hamprecht, and M. Kandemir. 2017. Variational Bayesian Multiple Instance Learning with Gaussian Processes. In CVPR .
[30]
K. He, G. Gkioxari, P. Dollár, and R. Girshick. 2017. Mask R-CNN. In ICCV .
[31]
K. He, X. Zhang, S. Ren, and J. Sun. 2016. Deep residual learning for image recognition. CVPR .
[32]
J. Huang, V. Rathod, C. Sun, M. Zhu, A. Korattikara, A. Fathi, I. Fischer, Z. Wojna, Y. Song, S. Guadarrama, and K. Murphy. 2017. Speed/accuracy trade-offs for modern convolutional object detectors. In CVPR .
[33]
M. Huh, P. Agrawal, and A.A. Efros. 2016. What makes ImageNet good for transfer learning? NIPS LSCVS workshop .
[34]
S. Jain and K. Grauman. 2016. Click Carving: Segmenting Objects in Video with Point Clicks. In Proceedings of the Fourth AAAI Conference on Human Computation and Crowdsourcing .
[35]
Suyog Dutt Jain and Kristen Grauman. 2013. Predicting sufficient annotation strength for interactive foreground segmentation. In ICCV .
[36]
B. Jin, M.V. Ortiz-Segovia, and S. Süsstrunk. 2017. Webly supervised semantic segmentation. In CVPR .
[37]
Ajay J Joshi, Fatih Porikli, and Nikolaos Papanikolopoulos. 2009. Multi-class active learning for image classification. In CVPR .
[38]
V. Kantorov, M. Oquab, M. Cho, and I. Laptev. 2010. ContextLocNet: Context-aware Deep Network Models for Weakly Supervised Localization. In ECCV .
[39]
Ashish Kapoor, Kristen Grauman, Raquel Urtasun, and Trevor Darrell. 2007. Active learning with gaussian processes for object categorization. In ICCV .
[40]
A. Khoreva, R. Benenson, J. Hosang, M. Hein, and B. Schiele. 2017. Simple does it: Weakly supervised instance and semantic segmentation. In CVPR .
[41]
A. Kirillov, K. He, R. Girshick, C. Rother, and P. Dollár. 2018. Panoptic Segmentation. In ArXiv.
[42]
A. Kolesnikov and C.H. Lampert. 2016. Seed, expand and constrain: Three principles for weakly-supervised image segmentation. In ECCV .
[43]
K. Konyushkova, J.R.R. Uijlings, C. Lampert, and V. Ferrari. 2018. Learning Intelligent Dialogs for Bounding Box Annotation. In CVPR .
[44]
Adriana Kovashka, Sudheendra Vijayanarasimhan, and Kristen Grauman. 2011. Actively selecting annotations among objects and attributes. In ICCV .
[45]
I. Krasin, T. Duerig, N. Alldrin, V. Ferrari, S. Abu-El-Haija, A. Kuznetsova, H. Rom, J. Uijlings, S. Popov, S. Kamali, M. Malloci, J. Pont-Tuset, A. Veit, S. Belongie, V. Gomes, A. Gupta, C. Sun, G. Chechik, D. Cai, Z. Feng, D. Narayanan, and K. Murphy. 2017. OpenImages: A public dataset for large-scale multi-label and multi-class image classification. Dataset available from https://storage.googleapis.com/openimages/web/index.html (2017).
[46]
A. Krizhevsky, I. Sutskever, and G. E. Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In NIPS .
[47]
A. Li, A. Jabri, A. Joulin, and L. van der Maaten. 2017. Learning Visual N-Grams from Web Data. ICCV .
[48]
X. Li, L. Chen, L. Zhang, F. Lin, and W-Y. Ma. 2006. Image Annotation by Large-scale Content-based Image Retrieval. In ACM Multimedia .
[49]
J.H. Liew, Y. Wei, W. Xiong, S-H. Ong, and J. Feng. 2017. Regional interactive image segmentation networks. In ICCV .
[50]
D. Lin, J. Dai, J. Jia, K. He, and J. Sun. 2016. ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation. In CVPR .
[51]
T-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C.L. Zitnick. 2014. Microsoft COCO: Common Objects in Context. In ECCV .
[52]
W. Liu, A. Rabinovich, and A.C. Berg. 2016. ParseNet: Looking Wider to See Better. In ICLR workshop .
[53]
J. Long, E. Shelhamer, and T. Darrell. 2015. Fully Convolutional Networks for Semantic Segmentation. In CVPR .
[54]
D. Mahajan, R. Girshick, V. Ramanathan, K. He, M. Paluri, Y. Li, A. Bharambe, and L. van der Maaten. 2018. Exploring the limits of weakly supervised pretraining. In ArXiv .
[55]
K.-K. Maninis, S. Caelles, J. Pont-Tuset, and L. Van Gool. 2018. Deep Extreme Cut: From Extreme Points to Object Segmentation. In CVPR .
[56]
Pascal Mettes, Jan C van Gemert, and Cees GM Snoek. 2016. Spot On: Action Localization from Pointly-Supervised Proposals. In ECCV .
[57]
R. Mottaghi, X. Chen, X. Liu, N.-G. Cho, S.-W. Lee, S. Fidler, R. Urtasun, and A. Yuille. 2014. The role of context for object detection and semantic segmentation in the wild. In CVPR .
[58]
R. Mottaghi, S. Fidler, J. Yao, R. Urtasun, and D. Parikh. 2013. Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs. In CVPR . 3143--3150.
[59]
N. S. Nagaraja, F. R. Schmidt, and T. Brox. 2015. Video Segmentation with Just a Few Strokes. In ICCV .
[60]
Dim P Papadopoulos, Jasper RR Uijlings, Frank Keller, and Vittorio Ferrari. 2017a. Extreme clicking for efficient object annotation. In ICCV .
[61]
Dim P Papadopoulos, Jasper RR Uijlings, Frank Keller, and Vittorio Ferrari. 2017b. Training object class detectors with click supervision. In CVPR .
[62]
D. P. Papadopoulos, Jasper R. R. Uijlings, F. Keller, and V. Ferrari. 2016. We don't need no bounding-boxes: Training object class detectors using only human verification. In CVPR .
[63]
Amar Parkash and Devi Parikh. 2012. Attributes for classifier feedback. In ECCV .
[64]
D. Pathak, P. Kr"ahenbuhl, and T. Darrell. 2015. Constrained convolutional neural networks for weakly supervised segmentation. In ICCV .
[65]
Guo-Jun Qi, Xian-Sheng Hua, Yong Rui, Jinhui Tang, and Hong-Jiang Zhang. 2008. Two-dimensional active learning for image classification. In CVPR .
[66]
S. Ren, K. He, R. Girshick, and J. Sun. 2015. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In NIPS .
[67]
C. Rother, V. Kolmogorov, and A. Blake. 2004. GrabCut: interactive foreground extraction using iterated graph cuts. SIGGRAPH, Vol. 23, 3 (2004), 309--314.
[68]
O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. Berg, and L. Fei-Fei. 2015a. ImageNet Large Scale Visual Recognition Challenge. IJCV (2015).
[69]
O. Russakovsky, L-J. Li, and L. Fei-Fei. 2015b. Best of both worlds: human-machine collaboration for object annotation. In CVPR .
[70]
B. Russel and A. Torralba. 2008. LabelMe: a database and web-based tool for image annotation. IJCV, Vol. 77, 1--3 (2008), 157--173.
[71]
B. C. Russell, K. P. Murphy, and W. T. Freeman. 2008. LabelMe: a database and web-based tool for image annotation. IJCV (2008).
[72]
F. Schroff, D. Kalenichenko, and J. Philbin. 2015. Facenet: A unified embedding for face recognition and clustering. In CVPR .
[73]
J. Shotton, J. Winn, C. Rother, and A. Criminisi. 2009. TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Appearance, Shape and Context. IJCV, Vol. 81, 1 (2009), 2--23.
[74]
A. Shrivastava, A. Gupta, and R. Girshick. 2016. Training region-based object detectors with online hard example mining. In CVPR .
[75]
Behjat Siddiquie and Abhinav Gupta. 2010. Beyond active noun tagging: Modeling contextual interactions for multi-class active learning. In CVPR .
[76]
K. Simonyan and A. Zisserman. 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. In ICLR .
[77]
H. Su, J. Deng, and L. Fei-Fei. 2012. Crowdsourcing annotations for visual object detection. In AAAI Human Computation Workshop .
[78]
C. Sun, A. Shrivastava, S. Singh, and A. Gupta. 2017. Revisiting unreasonable effectiveness of data in deep learning era. In ICCV .
[79]
C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi. 2017. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. AAAI .
[80]
Joseph Tighe and Svetlana Lazebnik. 2013. Superparsing - Scalable Nonparametric Image Parsing with Superpixels. IJCV, Vol. 101, 2 (2013), 329--349.
[81]
J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, and A. W. M. Smeulders. 2013. Selective search for object recognition. IJCV (2013).
[82]
Sudheendra Vijayanarasimhan and Kristen Grauman. 2008. Multi-Level Active Prediction of Useful Image Annotations for Recognition. In NIPS .
[83]
Sudheendra Vijayanarasimhan and Kristen Grauman. 2009. What's it going to cost you?: Predicting effort vs. informativeness for multi-label image annotations. In CVPR .
[84]
Sudheendra Vijayanarasimhan and Kristen Grauman. 2014. Large-scale live active learning: Training object detectors with crawled data and crowds. IJCV, Vol. 108, 1--2 (2014), 97--114.
[85]
Catherine Wah, Grant Van Horn, Steve Branson, Subhrajyoti Maji, Pietro Perona, and Serge Belongie. 2014. Similarity comparisons for interactive fine-grained categorization. In CVPR .
[86]
T. Wang, B. Han, and J. Collomosse. 2014. TouchCut: Fast image and video segmentation using single-touch interaction. CVIU (2014).
[87]
Z. Wu, C. Shen, and A. van den Hengel. 2016. Bridging Category-level and Instance-level Semantic Image Segmentation. ArXiv (2016).
[88]
J. Xiao, K. Ehinger, J. Hays, A. Torralba, and A. Oliva. 2014. SUN Database: Exploring a Large Collection of Scene Categories. IJCV (2014), 1--20.
[89]
J. Xu, A. G. Schwing, and R. Urtasun. 2015. Learning to Segment Under Various Forms of Weak Supervision. In CVPR .
[90]
N. Xu, B. Price, S. Cohen, J. Yang, and T.S. Huang. 2016. Deep interactive object selection. In CVPR .
[91]
Angela Yao, Juergen Gall, Christian Leistner, and Luc Van Gool. 2012. Interactive object detection. In CVPR .
[92]
B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso, and A. Torralba. 2017. Scene Parsing through ADE20K Dataset. In CVPR .
[93]
Y. Zhu, Y. Zhou, Q. Ye, Q. Qiu, and J. Jiao. 2017. Soft Proposal Networks for Weakly Supervised Object Localization. In ICCV .

Cited By

View all
  • (2024)A Semi-Automatic Magnetic Resonance Imaging Annotation Algorithm Based on Semi-Weakly Supervised LearningSensors10.3390/s2412389324:12(3893)Online publication date: 16-Jun-2024
  • (2024)Integrating Crowd and Machine Learning in an Intelligent Interface: A Case Study of Oil Spill Detection in Satellite ImagesProceedings of the 2024 International Conference on Advanced Visual Interfaces10.1145/3656650.3656680(1-9)Online publication date: 3-Jun-2024
  • (2024)Understanding Novice's Annotation Process For 3D Semantic Segmentation Task With Human-In-The-LoopProceedings of the 29th International Conference on Intelligent User Interfaces10.1145/3640543.3645150(444-454)Online publication date: 18-Mar-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
MM '18: Proceedings of the 26th ACM international conference on Multimedia
October 2018
2167 pages
ISBN:9781450356657
DOI:10.1145/3240508
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 October 2018

Check for updates

Author Tags

  1. computer vision
  2. human-machine collaboration
  3. image annotation

Qualifiers

  • Research-article

Conference

MM '18
Sponsor:
MM '18: ACM Multimedia Conference
October 22 - 26, 2018
Seoul, Republic of Korea

Acceptance Rates

MM '18 Paper Acceptance Rate 209 of 757 submissions, 28%;
Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)42
  • Downloads (Last 6 weeks)5
Reflects downloads up to 04 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)A Semi-Automatic Magnetic Resonance Imaging Annotation Algorithm Based on Semi-Weakly Supervised LearningSensors10.3390/s2412389324:12(3893)Online publication date: 16-Jun-2024
  • (2024)Integrating Crowd and Machine Learning in an Intelligent Interface: A Case Study of Oil Spill Detection in Satellite ImagesProceedings of the 2024 International Conference on Advanced Visual Interfaces10.1145/3656650.3656680(1-9)Online publication date: 3-Jun-2024
  • (2024)Understanding Novice's Annotation Process For 3D Semantic Segmentation Task With Human-In-The-LoopProceedings of the 29th International Conference on Intelligent User Interfaces10.1145/3640543.3645150(444-454)Online publication date: 18-Mar-2024
  • (2024)Human-LLM Collaborative Annotation Through Effective Verification of LLM LabelsProceedings of the 2024 CHI Conference on Human Factors in Computing Systems10.1145/3613904.3641960(1-21)Online publication date: 11-May-2024
  • (2024)CataAnno: An Ancient Catalog Annotator for Annotation Cleaning by RecommendationIEEE Transactions on Visualization and Computer Graphics10.1109/TVCG.2024.345637931:1(404-414)Online publication date: 16-Sep-2024
  • (2024)Dynamic Interaction Dilation for Interactive Human ParsingIEEE Transactions on Multimedia10.1109/TMM.2023.326297326(178-189)Online publication date: 2024
  • (2023)Simulation-Based Optimization of User Interfaces for Quality-Assuring Machine Learning Model PredictionsACM Transactions on Interactive Intelligent Systems10.1145/3594552Online publication date: 17-May-2023
  • (2023)KITTI-360: A Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3DIEEE Transactions on Pattern Analysis and Machine Intelligence10.1109/TPAMI.2022.317950745:3(3292-3310)Online publication date: 1-Mar-2023
  • (2023)Clicking Matters: Towards Interactive Human ParsingIEEE Transactions on Multimedia10.1109/TMM.2022.315681225(3190-3203)Online publication date: 2023
  • (2023)Compensation Learning in Semantic Segmentation2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)10.1109/CVPRW59228.2023.00329(3267-3278)Online publication date: Jun-2023
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media