A Taxonomy of Methods, Tools, and Approaches for Enabling Collaborative Annotation

Collaborative annotation reduces the time and cost of the annotation task and increases data quality. For knowledge-intensive contexts, collaborative annotation is an interesting approach to combine expert users’ knowledge with domain data. This work was motivated by a use case where expert users’ tacit knowledge associated with externalized data can produce new domain knowledge. That new knowledge can inform AI-based systems and make them more accurate for that specific domain. To identify and understand the dimensions of the collaborative annotation process, especially when the annotation is used to support AI models, we defined the following research question: “What are the methods, tools, and solutions to allow collaborative annotation?”. We conducted a systematic literature review, using five specific questions to help us answer the main research question. Our findings allowed us to identify different domains that adopt this approach, tools for collaborative annotation, players present in the process, and strategies for collaborative annotation. We summarize the findings in a taxonomy that represents aspects related to the characterization of annotation, the collaboration itself, and user experience strategies to support this process. We conclude the paper by suggesting open opportunities for future research and implications for the design of an AI-based system to support experts in knowledge-intensive processes.


INTRODUCTION
In recent years, with the growth in available data, annotating them to extract information has become a significant area of research involving the most varied application domains [16]. Annotating data is a time-consuming, costly, and error-prone task when performed by humans [65, 85]. Annotating collaboratively is a way to reduce the time and cost of annotating, in addition to reducing misclassification [97]. Thus, collaborative annotation becomes an interesting approach to annotating data. Collaborative annotation (CA) appears in different contexts, such as education, to produce knowledge [62]. It has also gained importance with the introduction of Artificial Intelligence (AI) algorithms, such as deep learning, that offer promising results in information extraction but need a large volume of annotated data [17].
This survey on CA was motivated by the design process of an AI-based system. This AI-based system aims to support use cases of knowledge-intensive processes [29], which consider experts' knowledge [48] central to the process execution and success. This kind of process can be only loosely structured, since it must accommodate variation due to uncertainty and unpredicted human actions and situations. For industries whose strategic decision-making processes are based on experts' knowledge, the collaboration of people and AI is an interesting approach to build new domain knowledge. CA is a scenario that we considered for our AI-based system's design. We see the CA approach as a way to build new knowledge in the context of knowledge-intensive processes, not just a way to "feed AI models".
In the literature, we identified papers that presented reviews about a specific type of data to be annotated, such as image annotation [16, 40] and image and video annotation [25]. We also found reviews about annotation tools based on the application domain [24, 36, 62], but we did not identify works that addressed CA across distinct data types and domains. We aimed to identify works related to CA, not just annotation. In this work, we did not seek papers whose sole objective was CA for labeling data to feed machine learning (ML) models. For this reason, this research did not consider papers focused on Mechanical Turk or crowdsourcing to do data labeling without collaboration between users or between users and the system.
The main goal of this research is to categorize and summarize the knowledge currently available in the literature around the field of "collaborative annotation" to: a) identify gaps in current research, b) suggest areas of investigation by providing knowledge to those interested in this research field, and c) provide us with references and inspiration to design a CA approach for our AI-based system. The methodology adopted is inspired by the work reported by Kitchenham et al. [47] and is based on a mixture of manual selection and automatic search. As a way to structure, organize, and present this survey, we built a conceptual taxonomy regarding dimensions of CA.

RELATED WORKS
Annotation tools were developed long before they were used for massive data. The educational context is one of the domains where people started to annotate and discuss annotations. For example, in medieval times, scholars annotated in the margins and between lines as a forum of knowledge [94]. As technology advanced, solutions were developed to support the annotation process. The work in [94] presents a tool and a literature review about annotation, pointing to 25 tools for annotation. In this context, annotations are for users themselves or to share with others, but they do not yet present a CA scenario.
In [86], the author reviewed annotation for knowledge management. They present requirements to support this task, besides a survey pointing out tools that enable CA, mainly involving the semantic web. We searched for distinct purposes for annotation, not just knowledge management. Considering a more general panorama, Fort's book [34], focused on CA for reliable natural language processing, presents a technical and sociological overview of CA, describing the steps of the annotation process, annotation complexity analysis, annotation tools, and how to evaluate the quality of annotations. Fort presents a guide to annotating data for use in Natural Language Processing (NLP) models. Our research focused on more than just this purpose for annotation.
We found papers focused on specific data types, such as [25], which surveyed tools for semantic annotation of videos and images. They pointed out the importance of enriching this data with meta-information. They defined the input and output categories, the annotation level (meta-data, granularity, locality), and miscellaneous aspects. In addition, they analyzed image and video annotation tools. The work in [17] reviews visual content-based and users' tag-based image annotation. The authors point out the parameters for image annotation, such as the information that should be analyzed to annotate an image. They also point to methods for image annotation, like taxonomies and ontologies, and to content annotation techniques such as image segmentation and feature extraction (which may involve AI algorithms). Both papers focus on specific data types and do not directly address tools and approaches for CA.
The work in [64] presents a review of 78 tools that allow manual annotation of different data types. The authors defined a set of filters that selected 15 tools for detailed evaluation considering 26 criteria. The authors briefly described these tools, pointing out characteristics such as being web-based, where to access them, and the type of data they can annotate. In this work, the authors present some tools that allow CA, but it is not their focus, as it is in ours.
We searched for publications that presented some literature mapping related to CA. We did not identify works directly addressing this subject considering different application domains and data types, so we conducted our study. We emphasize that our interest was in CA, and it directly influenced the development of our search string. In the next section, we describe how we conducted this study.

METHODOLOGY
This study followed the guidelines for systematic mapping studies proposed by [47]. Systematic mapping provides a procedure for identifying and classifying papers published in a given research field. We first defined a research question and the form of the search, then defined the exclusion criteria to collect data from the articles of interest. Later, we analyzed the papers selected and discussed the classification and findings about trends and gaps.

Research question
The motivation for this survey on CA is the design of an AI-based system that aims to support knowledge-intensive processes [29]. This kind of process considers the experts' knowledge as key to handling uncertainties successfully [48]. We believe that, with a CA approach in an AI-based system, we can provide experts with support to manage their knowledge to handle those uncertainties.
To design our CA approach, we looked at the literature considering the Research Question (RQ): "What are the methods, tools, and solutions to allow collaborative annotation?". To guide and help answer this RQ, we considered the following Specific Questions (SQ):
• SQ1: What approaches, models, and tools are adopted?
• SQ2: What are the domain applications?
• SQ3: What annotation strategy was adopted?
• SQ4: Who are the players in the collaborative annotation?
• SQ5: How do annotators collaborate with each other?
These questions helped answer the RQ by identifying the methods, tools, and annotation strategies adopted, as well as information about the players involved in the CA, the roles they assume, the domains where CA is adopted, and more.

Search process
The search and selection process was conducted in four steps. In the first and second steps, we screened important related venues seeking (i) related mapping studies and (ii) works related to CA. Then we performed (iii) an automated search in three important digital libraries using a search string, and also manually collected the first 50 papers returned by our search string on Google Scholar. Based on the keywords extracted from relevant articles, we defined our search string as: ("collaborative annotation") AND (artificial intelligence OR machine learning) AND (Human-in-the-loop OR human centered AI). We used this search string in the following digital libraries: ACM Digital Library, IEEE Xplore Digital Library, Elsevier's ScienceDirect, and Google Scholar. We restricted the search to the period from January 2010 to January 2023. The search provided 548 results.
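To make the boolean logic of the search string concrete, the sketch below applies it to an exported list of bibliographic records in Python; the record fields are illustrative assumptions, and each digital library of course uses its own query syntax rather than this filter.

```python
# A minimal sketch of the boolean logic behind the search string, applied to an
# exported bibliography. The record fields ("title", "abstract") are illustrative
# assumptions; the actual digital libraries were queried through their own interfaces.
def matches_search_string(text: str) -> bool:
    t = text.lower()
    return ("collaborative annotation" in t
            and ("artificial intelligence" in t or "machine learning" in t)
            and ("human-in-the-loop" in t or "human centered ai" in t))

records = [
    {"title": "Collaborative annotation with machine learning",
     "abstract": "A human-in-the-loop approach ..."},
]
hits = [r for r in records if matches_search_string(r["title"] + " " + r["abstract"])]
print(len(hits))  # number of records matching the query logic
```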
In the third and fourth steps, we analyzed the articles to select those that could best help to answer our RQ and the SQ. We excluded articles following these exclusion criteria: (i) articles focused on Mechanical Turk studies, (ii) articles focused on annotation just to train AI algorithms without collaboration, (iii) articles that do not explain the annotation process, (iv) articles without collaboration strategies to annotate data, (v) articles presenting the same approaches at different levels of development (we selected the newest or most complete version). After applying these exclusion criteria, we used the SQ as a guide to extract information from the selected articles. In the end, we selected 91 papers.

Data extraction
In step 4 of selecting articles, we collected data from the papers considering the SQ to classify the articles by approaches, models, and tools adopted, the domain context, annotation strategies, and so on. We divided each SQ into parts, e.g., the kind of tool (web-based or desktop), whether the solution was a model, and other specializations of the SQ.
Each article was analyzed by one researcher; in case of doubt, it was discussed with other researchers and moved to the next step to be examined in more detail. We added all the papers to an Excel sheet and, at each step, we classified them, recording why papers were excluded and at which step. To extract data from the 91 selected articles, we filled in the Excel table, collecting more detailed information for each SQ; e.g., SQ1 was separated into three attributes: model description (if applicable), approach adopted, and kind of tool (if applicable). Thus, we collected 13 types of information from each article (target audience, annotation goal, annotation strategy, type of tool, etc.). The following section presents the results of analyzing all 91 articles, from which we developed a taxonomy to summarize and classify the works related to CA.

RESULTS
This section presents in detail the analysis of the 91 articles selected. We note that more papers were found in the intervals between 2012 and 2013, between 2015 and 2017, and between 2020 and 2022. However, the papers in these intervals perform and use annotation in different ways. In the first interval, annotation appears mostly in systems for knowledge management that allow people to share their annotations and annotate documents collaboratively [32, 35, 67, 71]. In the second period, most papers focus on the semantic web context, so that annotations can be machine-readable [37, 45, 46, 76, 84]. The articles between 2020 and 2022 are more related to approaches allowing CA between humans with AI support [9, 60, 88, 100]. The following sections describe how we answered the SQ.

SQ1: Approaches, models, and tools adopted
Analyzing the papers, we identified that seventy-five (75) present a tool as a result. Most of them are web tools, which is coherent with CA because they allow several users to work simultaneously and to access annotations from their peers. Figure 1 depicts the tools and the application domain for which they were developed. Most articles do not focus on a specific domain, but on the annotation process itself [69, 102, 103] or on a general context, such as people annotating videos to promote accessibility [100]. We discuss this in detail in Section 4.2. These systems usually involve a login to a specific server, but we also found browser plugins, so the user can annotate web pages at any time and access the web system where data are managed [11, 21, 68, 89]. Despite this, we identified works focused on desktop tools [18, 27]. Other works presented a model without a specific tool to support the user, or it is not clear whether they are web-based, desktop, plugins, etc. [12, 14, 37, 57, 80, 82, 85, 88, 100, 101, 103, 104]. Part of the papers presented models as the result or part of it. Considering the works where the result is a model without a specific tool, they usually involve active learning algorithms or other AI algorithms as support for the user who annotates the data [45, 85, 99-101], or the model is at the center of the annotation process, with the human helping to train the algorithms [12-14, 33, 43, 51, 66, 82, 88]. Papers such as [99], [85], and [101] propose frameworks that allow collaboration between humans and AI to annotate per-pixel images, a laborious and time-consuming task. The process occurs in turns, so as the end user annotates an image, the model containing AI algorithms also starts to annotate it.
Other papers presented a tool that implements the model developed by the authors. Most of them are web-based. Doumat et al. [32] proposed a model that became the system ARMARIUS, allowing the annotation and curation of annotated data in the cultural heritage domain. It facilitates non-experts' annotation of large corpora and allows archivists to reduce and improve their work. In [52], the authors developed the tool ALVA, which uses active learning to facilitate the annotation process and an information organization model which offers visualizations to help the user annotate and classify texts. Other works in the medical domain present a tool based on a data model or pipeline to improve the annotation and data sharing process that helps in diagnosis decisions, such as [58, 65].
The approach adopted depends on the application domain as well as the goal of the CA. In the medical context, papers mainly focus on image analysis for diagnosis. They use ontologies and Graphical User Interfaces (GUI) that allow drawing and textual annotations on images [55, 58, 74]. Moreover, there is a tendency to add AI as support in the pre-annotation of these data [57, 60, 77, 97]. Considering domains such as biomedicine and research, papers focused mainly on approaches that allow annotation of text and specifically aim at retrieving and managing knowledge. The focus is on managing knowledge via structured data models to build knowledge bases that provide access to a shared space with articles, reports, datasets, and expert annotations on specific topics, where they can use ontologies and text to annotate [3, 10, 83].
When the purpose of the CA was training ML algorithms, the papers focused on adopting techniques like active learning, deep learning, and Support Vector Machines (SVM) to annotate documents [12, 82, 87, 88]. These models sometimes presented a tool as a result. They usually leveraged user annotations or self-training to train their models; later, users are asked to annotate more data or curate the results to improve the model. More details on how the annotation is performed are given in Section 4.3.

SQ2: Domain applications
Most papers are in the medical domain, especially in image annotation. These works focused on creating a shared space where experts can help each other with diagnoses based on images [58, 74, 77, 93]. Several solutions provided a web-based system where experts can annotate images, especially in the WSI format (whole slide image, a large image) [93, 95]. They can share content with their colleagues to create a base of annotated (enriched) images that can serve either to train AI models, which will pre-annotate other images, or to help fellows make decisions or build reports. Furthermore, it can be used in an educational way as a training base for users with a lower level of expertise [56, 87].
Considering the linguistic context, most articles focused on ways to construct and manage data collections, making them more accessible considering web semantics. Experts in linguistics compose the target public, but there are solutions where the general public starts the annotation and then an expert curates it [8, 22, 52]. In the cultural heritage domain, most papers focus on experts or the general public annotating images such as ancient maps, Greek pottery, and manuscripts to construct metadata and transcriptions of this material, making it machine-readable and accessible [31, 42, 98]. The biomedical domain focuses on creating shared spaces where experts contribute by enriching documents, annotating datasets, and making knowledge more accessible. The papers suggest annotation considering the semantic web, and dataset annotation in the biomedical context, making them machine-readable [10, 76]. Ontologies are adopted to perform semantic annotation [28, 76]. This allows experts to find relations that would be difficult to find without the metadata generated from the annotation [27].
In the educational context, most articles focus on annotation as an educational process, allowing students to annotate lectures, texts, and web content. In [21, 81], teachers can use student annotations to assess their understanding of the content, considering the amount of annotation performed, its quality, and the students' comments on other annotations. In [30], the TrACE tool aims to increase learning through annotating and assessing annotations from other students. In the research context, the role of annotation is mainly to support the use of AI algorithms to reduce the burden of human annotation or to enrich papers, making them machine-readable [45, 59, 83].
Most articles were classified as general context because they did not clearly define a specific domain. Several are focused on the semantic web, allowing annotation of web content [61, 68], and corpus creation [16, 38]. Some articles focused on video or multimedia annotation to enrich the content [50, 51, 100, 103], making it machine-readable and providing accessibility and context. Some works allowed crowdsourcing as part of the annotation process [2, 23]. Most papers had the general public as the target annotators. Some papers included AI techniques to simplify or reduce the annotation burden and provide a shared space that allowed users to interact among themselves, as in the tool NOVA [9]. Most papers presented a tool that allowed annotation of multimedia, web content, text, and videos.

SQ3: Annotation strategies
Analyzing the tools, we identified some common annotation strategies that can serve as a guide to elements to consider when creating this kind of tool. The articles that presented only a model did not clarify how the process would be accomplished considering the user interface. However, we considered strategies for how the CA workflow was developed.
We considered the annotation strategies to support user experience in two parts. The first is to support users considering the resources available for them to annotate. The second refers to the elements that support the collaboration between the players (participants in CA). Strategies for collaboration support between players (or actors) will be discussed in Section 4.5. Considering the resources that support annotation found in the papers, we subdivided them into annotation input technique, input constraints, and interaction style. Considering the aspects related to how to annotate the data, we identified nine forms (a minimal data-record sketch covering them is shown after this list):
• Highlight text: users can highlight part of a text. This strategy is used especially in the educational context and applied to text data [11, 21, 89].
• Free text: introduction of free text and comments on data. It appears in several papers in different domains. It is used mainly to help users understand and share information in natural language with other users to explain annotations. It is used to annotate text, images, and video [2, 5, 9, 11, 30, 31, 35, 39, 42, 46, 58, 71, 77, 83, 100].
• Tags (or categories): adoption of a set of predefined tags or categories, which users can also extend with new ones; they serve to summarize and categorize data. Adopted in several domains and applied to different data types [4-8, 11, 16, 35, 54, 55, 65, 82, 85, 89, 90, 93, 96, 98].
• Speech to text: transforms speech into text annotation. It was identified in the information retrieval and medical domains, applied to text and videos [15, 56].
• Video segmentation: the solution performs video segmentation as an annotation; the segments or segmentation points carry some information, or the segmentation is performed by AI to show the user a point of interest [56, 100].
• Draw on images or videos: users can draw directly on images and videos, especially applied in the medical domain and in some general applications, such as when a human and AI work in turns [2, 21, 49, 56-58, 77, 78, 85, 101].
• Queries: the system identifies important information for annotation and generates explicit requests for information from users in the form of queries [41].
• Annotation using audio: it appeared in one paper to annotate videos of endoscopy surgeries to provide information about the medical procedure [56].
• Pre-annotation using AI: solutions offer pre-annotated data using Natural Language Processing (NLP) or computer vision algorithms to reduce human annotation. This strategy was adopted to annotate images and text in different domains [7, 8, 15, 16, 42, 53, 59, 67, 96].
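To make these input techniques concrete, the sketch below shows one possible data record that could represent such annotations; the field names are our own illustrative assumptions, not the schema of any surveyed tool.

```python
# A minimal sketch of an annotation record covering the forms listed above
# (highlight, free text, tags, drawn regions, AI pre-annotation). All field names
# are illustrative assumptions, not taken from any surveyed tool.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Annotation:
    annotator: str                                       # who produced the annotation
    target_id: str                                       # document, image, or video annotated
    kind: str                                            # "highlight", "free_text", "tag", "region", ...
    span: Optional[Tuple[int, int]] = None               # character offsets for text highlights
    region: Optional[List[Tuple[float, float]]] = None   # polygon points for drawings
    tags: List[str] = field(default_factory=list)        # predefined or user-created categories
    comment: str = ""                                    # free-text note explaining the annotation
    source: str = "human"                                # "human" or "ai" for pre-annotated data

# Example: a student highlight on a text and an AI pre-annotation on an image.
a1 = Annotation("student_1", "doc_42", "highlight", span=(120, 178),
                tags=["key-concept"], comment="Definition of the main term.")
a2 = Annotation("model_v1", "slide_7", "region",
                region=[(10.0, 12.5), (40.0, 12.5), (40.0, 30.0)],
                tags=["tumor"], source="ai")
```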
Considering the aspects related to the input constraints, we identified the following strategies:
• Hierarchical annotation: involves using tags and categories, but the authors do not describe this classification as a formal ontology, giving users more freedom. The papers are from different domains and data types [6, 35, 55].
• Using ontologies: formal ontologies from a specific domain, or created by the authors, are used to annotate the data. Ontologies are used in almost all articles in biomedicine and in domains such as research, enterprise, and cultural heritage, mainly to classify texts and multimedia data [4, 10, 15, 19, 20, 22, 27, 28, 37, 41, 46, 50, 59, 67, 73, 76, 78, 83, 98, 104].
In addition, when inserting text and drawings as annotations into documents, users could input free text and freehand drawings or polygons, as described above.
Considering the interaction style, we identified:
• Comparison by similarity: solutions provide a way to compare similar images, multimedia, and sounds to help humans annotate data [31, 41, 54, 63, 77, 104].
• Active learning: approach used to request data annotation from the human according to the queue policy defined by the authors (e.g., data with a low or high degree of certainty). This strategy was applied in different domains and data types [12-14, 46, 51, 70, 79, 82, 88]. A minimal sketch of an uncertainty-based queue policy is shown below.
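As a concrete illustration of such a queue policy, the sketch below selects the items a model is least certain about, assuming a scikit-learn-style classifier with a predict_proba method; the batch size and the least-certainty criterion are illustrative choices, not values prescribed by the surveyed papers.

```python
# A minimal sketch of an uncertainty-based active learning queue policy, assuming
# a scikit-learn-style classifier exposing predict_proba. The "least certain first"
# criterion and batch size are illustrative assumptions.
import numpy as np

def select_for_annotation(model, unlabeled_X, batch_size=10):
    """Return the indices of the unlabeled items the model is least certain about."""
    proba = model.predict_proba(unlabeled_X)    # shape: (n_items, n_classes)
    confidence = proba.max(axis=1)              # probability of the top prediction
    return np.argsort(confidence)[:batch_size]  # least confident items first

# Typical loop: train on the labeled pool, ask annotators to label the selected
# items, move them to the labeled pool, retrain, and repeat until the budget ends.
```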
Considering the elements that support the collaboration between the players, we divided them into two groups: those related to access and those related to the collaboration task. Below we point out the ones related to access:
• Shared view of documents: users can see annotations from fellows in the same shared space to help them in the annotation task. In some cases, they can only visualize annotations from another user; in others, they can edit or annotate the same document. It appears in several domains and is related to different data types, especially text [1, 11, 20, 23, 74-76, 78, 79, 102].
• Roles and privacy control: users can assume different roles that provide access at distinct levels; this can be combined with privacy control, so an admin or super user can edit annotations of others, create new elements in an ontology, curate annotations, and so on. This role is usually assigned to experts in works involving experts and non-experts, or to the person with the most expertise in a subject. Privacy control allows users to use private spaces, even in shared projects/data, and to make personal annotations they can share afterward with colleagues [5, 16, 22, 31, 44, 58, 76, 81, 84, 93, 97]. A minimal sketch of such a role and permission scheme is shown after this list.
• Dashboard: provides analytic data about a shared endeavor, such as a project, with different visualizations and summaries [52, 69, 80, 90].
• Timeline implementation, user tracing, and life cycle: it helps users by providing a history of how they got to a given point and what decisions and data they accessed. This can help them retrace their steps to understand why certain decisions were made; the system can give them tips or recommendations based on it, and it can also help users with administrative roles to track the work of other users [31, 32, 38, 67-69, 76, 91, 93, 104].
• Exposing users to previous annotations and pre-annotated data: these annotations can be created by humans or by the system. It helps users to see how data should be annotated and facilitates annotation [2, 3, 7, 15, 16, 38, 42, 50, 53, 59, 67, 93, 95, 98].
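As a concrete illustration of roles combined with privacy control, the sketch below maps roles to permission sets; the role names and permissions are illustrative assumptions rather than the scheme of any specific tool.

```python
# A minimal sketch of role-based access in a shared annotation workspace. Role
# names and permission sets are illustrative assumptions, not a surveyed design.
PERMISSIONS = {
    "viewer":    {"read"},
    "annotator": {"read", "annotate"},
    "curator":   {"read", "annotate", "edit_others", "consolidate"},
    "admin":     {"read", "annotate", "edit_others", "consolidate", "manage_users"},
}

def can(role: str, action: str) -> bool:
    """Check whether a role is allowed to perform an action."""
    return action in PERMISSIONS.get(role, set())

assert can("curator", "edit_others")          # experts can curate others' annotations
assert not can("annotator", "edit_others")    # non-experts can only add their own
```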
Related to the collaboration task, we identified:
• Discussion spaces (chat, forum, notes): some solutions provide spaces where users can discuss why data were annotated in a certain way, discuss doubts, and so on. Users can interact synchronously or asynchronously [4, 5, 21, 22, 54, 80, 81].
• Consolidation mode (system or humans): this space is used to adjudicate or consolidate the annotation. In some articles, this is part of the annotation workflow as the final step to finish the annotation or as a continuous process [16, 22, 37, 38, 60, 65, 80]. People can perform it, or the system consolidates the results and an expert user approves them [26].
• Retrieval options combined with annotation: the system provides annotation suggestions based on previously annotated data or retrieves previous annotations [27, 31, 37, 50, 55, 63, 70, 77].
• Calculate the inter- and intra-annotator agreement: the first relates to the agreement level of annotations from a group of annotators considering a data set. The intra-annotator agreement refers to the level of agreement of a specific annotator considering the same data annotated at distinct moments. It provides quality metrics for the annotation. These metrics allow the system managers to decide whether to take more annotation turns and to quantify the quality of annotations [1, 38, 46, 69]. A minimal sketch of computing such agreement is shown after this list.
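As a concrete illustration of such agreement metrics, the sketch below computes Cohen's kappa between two annotators with scikit-learn; the labels are illustrative only, and intra-annotator agreement can be computed the same way over two passes by the same person.

```python
# A minimal sketch of inter-annotator agreement using Cohen's kappa via
# scikit-learn; the labels below are illustrative only. Intra-annotator agreement
# can be computed the same way on two annotation passes by the same person.
from sklearn.metrics import cohen_kappa_score

annotator_a = ["tumor", "normal", "tumor", "tumor", "normal"]
annotator_b = ["tumor", "normal", "normal", "tumor", "normal"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Inter-annotator agreement (Cohen's kappa): {kappa:.2f}")
```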
In this SQ, we considered these elements based on the tools presented in the papers. Aspects of other strategies concerning precisely how the solutions propose collaboration are presented in Section 4.5.

SQ4: Players in the collaborative annotation
We classified the collaboration between actors or players into three types: between users, between the system and the user, and between multiple users and the system. Considering collaboration between users, the systems support the process without promoting the annotation with pre-annotations, suggestions, etc., offering just data summaries and administrative resources to manage the annotation process. In this case, users collaborate to annotate the data. Works in the educational context are mainly in this group because annotation is part of an educational process [5, 30, 35, 71, 81, 89]. Several papers in the medical context also let users collaborate because annotations are compared with those from colleagues to help in diagnosis, especially considering images. The annotation of medical images is laborious and needs experts to perform it, so creating shared spaces with expert annotations can help even in an educational context [55, 56, 58, 60, 65, 74, 78, 93, 97]. Several works were focused on enabling CA between users in various contexts beyond education and medicine, such as [1-6, 10, 18, 19, 21, 23, 26-28, 30, 32, 39, 45, 46, 49, 59, 63, 67-69, 73, 75, 80, 81, 84, 91, 92, 98, 103].
We considered collaboration between multiple users and the system when users could collaborate both with their counterparts and with a system that could suggest annotations, pre-annotate data, and support the users. In [44], the authors proposed a hybrid platform where pathologists and algorithms cooperate in building a richly labeled dataset that can be used to train or improve image analysis; algorithms can help the pathologists in the review process. Several works are in this group [8, 9, 11, 15, 16, 31, 44, 46, 50, 51, 54, 57, 61, 66, 72, 83, 87, 90, 95].
In addition, we considered when the players participate in the collaboration process. Collaboration between users occurs mainly at three points: during the annotation task, when querying data, and when curating annotations. Collaboration between users and the system occurs when the user actively starts collaborating with the system, mainly in the parameter tuning phase, during training, and in daily annotation tasks. We classify collaboration between the system and users as when the system proactively provides the user with suggestions and pre-annotates data, as in [54], where the system identifies segments of time-series charts that may indicate failure points, suggests that users annotate them, and then updates its model.

SQ5: Collaboration process between annotators
Considering other collaboration strategies less focused on user experience, we divided them into four categories: access, user collaboration, AI collaboration, and decision-making. Related to access, we identified:
• Users see annotations from others: this strategy was adopted in most articles, especially considering the educational and biomedical domains. Users could see annotations from others and could collaborate in their annotation.
• Users contribute in the same group/workspace: creating these spaces promotes collaboration, since users are focused on a specific set of documents and can communicate and access content in it.
Considering strategies for user collaboration, we identified:
• Users contribute in the same document: some works allowed users to annotate documents in a project as separate annotations. It can accelerate annotation and integrate users as a team, as in [1].
• Open communication between users: it allows users to communicate freely to understand others' annotations, improve the overall annotations, clarify doubts, discuss issues, etc.
• Pre-annotate data: it considers using pre-annotated data from users or AI. Users can consult material annotated by others to help them in their annotation, or, as in [98], non-experts annotate a large amount of data using crowdsourcing, and the experts annotate less data but mainly curate it. Algorithms can also provide pre-annotated data according to people's use of the system.
Considering strategies for AI collaboration, we identified the following two strategies; the "pre-annotate data" strategy described above also involves AI collaboration.
• Users train AI: it is performed directly or indirectly, depending on when the users and the AI interact with each other.
• User receives suggestions from the system: besides receiving pre-annotated data, the user can receive tips from the system to annotate other data, or the system suggests tags and categories based on annotations from other users or relationships identified by the system itself. A minimal sketch of filtering such suggestions by confidence is shown below.
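As a concrete illustration of system suggestions, the sketch below keeps only the model's predictions above a confidence threshold before showing them as pre-annotations; the threshold value and the scikit-learn-style classifier are illustrative assumptions, not recommendations from the surveyed works.

```python
# A minimal sketch of turning model output into suggestions shown to the annotator.
# Assumes a scikit-learn-style classifier; the 0.8 threshold is an illustrative
# value, not one recommended by the surveyed works.
def suggest_tags(model, item, threshold=0.8):
    """Return (tag, confidence) pairs confident enough to show as suggestions."""
    scores = model.predict_proba([item])[0]
    return [(tag, float(p))
            for tag, p in zip(model.classes_, scores)
            if p >= threshold]

# Low-confidence predictions are withheld, so the user only reviews suggestions
# the model is reasonably sure about.
```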
Considering strategies for decision-making, we identified:
• User activities are controlled by administrators: several solutions manage the collaboration using administrative roles, so administrators assign tasks to other users, can check notes, and can check inter- and intra-annotator agreement.
• Adjudication between users: allows users to use voting systems, free text, tags, and other devices to resolve points of disagreement between annotations. This process can be done in groups or by the administrator, who can decide by evaluating the annotations himself or by consulting the adjudication made by the system. This task can occur in annotation turns or at the end of the process, giving the project a "finished" status. A minimal sketch of majority-vote adjudication is shown below.
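The sketch below illustrates one simple form of adjudication, a majority vote with ties escalated to an administrator; the function name, role, and escalation label are illustrative assumptions.

```python
# A minimal sketch of adjudication by majority vote, with ties escalated to an
# administrator role; names and the escalation label are illustrative assumptions.
from collections import Counter

def adjudicate(labels_per_user):
    """Return the winning label, or flag the item for administrator review on a tie."""
    counts = Counter(labels_per_user.values()).most_common()
    (top_label, top_count), rest = counts[0], counts[1:]
    if rest and rest[0][1] == top_count:
        return "NEEDS_ADMIN_REVIEW"        # tie between annotators
    return top_label

print(adjudicate({"ana": "cat", "ben": "cat", "eva": "dog"}))  # -> cat
print(adjudicate({"ana": "cat", "ben": "dog"}))                # -> NEEDS_ADMIN_REVIEW
```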
We also considered the management of annotations to identify elements that allow the evolution or structuring of the life cycle of a CA process. For this, we divided annotation management into two groups: performed by the users, and performed by the users with the system. Considering the management by users:
• Shared mechanisms for discussion: these mechanisms (such as chats and forums) allow users to evolve the annotation process by solving disagreement points and clearing up doubts.
• Annotation status: it allows users to quickly identify whether they have open issues and the status of the process. It helps users better manage time and project progress.
• Annotation turns: defining annotation turns can improve inter- and intra-annotator agreement; like algorithms, humans can improve performance with training.
• Consolidation mode: it can define checkpoints for users to discuss the process.
• Annotation curation by roles/expertise: defining roles or expertise levels enables more experienced users to manage the process. Since they are more experienced, they can resolve ties, add new ontology terms, and curate the annotation done by other users, improving the quality of the annotated data.
• Timeline: it allows users, especially administrators, to manage the process considering time and the paths followed by annotators.
Considering the management by users and the system, we identified:
• User and AI turns: the annotation evolution can be measured according to the turns taken between user and system, as in [85], and at each turn, the user can decide whether to continue. A minimal sketch of such a turn-based loop is shown below.
• AI improvement by user interaction: this evolution occurs as the user interacts with the system; it promotes system training that helps provide the user with better suggestions and pre-annotated data.
• Database enrichment: the richer the datasets, the more meaningful they are, and they can be used by algorithms to extract data relations and information that is not evident.
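As a concrete illustration of annotation in turns between a user and an AI model (as in [85]), the sketch below alternates model proposals with user corrections and retraining; the function names, the model interface, and the stopping rule are placeholders we assume, not APIs from the cited works.

```python
# A minimal sketch of user-AI annotation turns, in the spirit of the turn-based
# frameworks cited above. `model`, `ask_user`, and the stopping rule are
# placeholder assumptions, not APIs from the surveyed papers.
def annotation_turns(model, items, ask_user, max_turns=5):
    labeled = []                                                  # (item, label) pairs so far
    for _ in range(max_turns):
        proposals = [(item, model.predict(item)) for item in items]  # AI pre-annotates
        corrections = ask_user(proposals)                         # user accepts, edits, or rejects
        if not corrections:                                       # user decides to stop the turns
            break
        labeled.extend(corrections)                               # corrections: list of (item, label)
        model.fit([x for x, _ in labeled],                        # retrain on the corrected labels
                  [y for _, y in labeled])
    return model, labeled
```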

Mapping results
Combining the results from the five SQ that helped answer our RQ, we could obtain additional insights into the findings. The goal was to relate the answers to each SQ and possibly identify gaps and opportunities. Below we have highlighted the relations that we thought were most interesting.
In Section 4.1, we combined information identifying tools and their domain (Figure 1). It reinforces that the medical domain has the highest number of works. This is an area where the data are highly private and difficult to collect and annotate because it requires experts [88].
Education, linguistics, cultural heritage, and biomedicine were the other contexts where we identified the most papers. We classified most articles as general purpose because there was no specific application domain. However, several of them were related to data annotation to enrich documents and make them machine-readable, as is the case of [6, 103]. Works such as [37, 92] were focused on the annotation of videos to make them context-loaded, and in [100], the authors intended to promote video accessibility for visually impaired people.
We summarized the articles according to the type of data they annotate. Most solutions are focused on text annotation (36 works), and almost all application domains adopted an approach to support it. Image annotation is the second that appears the most (27 works). It is adopted in most medical domain papers (14). Video annotation (12 papers) appears in some contexts. The authors deal with audio annotation in only 2 papers, [82] and [63]. Solutions that support multimedia annotation (14 papers) are more robust in features and are mainly related to solutions for the general application domain. These solutions aim to annotate different data types and use more complete GUIs with sliders, permission to draw on images and videos, or the addition of textual notes to the data. The CoUX tool [80] allows user experience (UX) specialists to annotate data such as video recordings of think-aloud sessions with users and can use tags for classification. It also has a shared environment to discuss with colleagues and access project information.
We also identified whether the authors validated or evaluated the tools and approaches. It is interesting to know what kind of evaluations were performed to analyze the quality of the work. We identified nine (9) types of evaluations. Several papers did more than one type of assessment, considering users and system performance, comparisons, and others. Twenty-nine (29) papers did not evaluate the quality of their solution. Most of the works carried out formative evaluations, testing the quality metrics of the algorithms adopted and the tool performance by the authors (27). We identified twenty-four (24) user studies, i.e., evaluations with people using the proposed solution and evaluating its use [2, 9, 21, 22, 35, 41, 49, 52, 54, 55, 59, 68, 80, 82, 83, 85, 89, 92, 97, 100, 102-104], and six (6) pilot studies, which serve as a user test before finishing the tool [30, 43, 44, 60, 75, 77]. Five (5) papers presented use cases [16, 18, 54, 70, 90], two (2) presented proofs of concept [37, 93], and one (1) presented a quasi-study [81]. The authors also presented comparisons of their solutions with the state of the art, mainly related to algorithms that perform annotation. In addition, usability tests were also used to evaluate the tools.
The SQ answers helped us think about which approaches and tools support CA. One of the exclusion criteria required that a work be more than just a tool for multiple people to annotate a corpus or a Mechanical Turk solution. We focused on works where CA involved more engagement among the actors collaborating in annotating the data. In the next section, we discuss the results and present a taxonomy based on the answers we obtained from our data analysis.

DISCUSSION
In this section, we present and discuss a taxonomy that characterizes the CA dimensions identified through the analysis of the articles, some implications, and the study limitations.

Taxonomy
To summarize our findings, we built a taxonomy from the results of the article analysis. We classified CA into three significant categories: Annotation characterization, Collaboration, and User Experience (UX) strategies. In the first category, we considered the type of data annotated, the purpose of the annotation, the players involved, the kind of solution, the domain context, and the users. So, we mapped the questions related to the context of the annotations, to whom they are intended, and who participates in them. In the second category, we considered when the annotation occurred, the strategies that facilitate collaboration, and the management of the CA process. Finally, in the UX strategies category, we considered strategies that support users in collaborating to annotate data. For space reasons, we will not elaborate on the third level of the taxonomy, which has already been explained in the SQs.
5.1.1 Annotation Characterization. Figure 2 presents this category and its subdivisions. We identified the type of data each work aimed to annotate and observed that most works deal with text annotation in different contexts. One of the purposes most articles deal with is promoting the enrichment of data to be used in systems that perform some machine processing (i.e., AI systems, which are systems with any AI technology in their composition). This kind of data can also be used to manage and develop more knowledge. Labeling data to inform AI systems is the second purpose with the most papers. This kind of software allows AI systems to evolve and to help humans annotate data, reducing annotation work that can be expensive and time-consuming.

Figure 2: Annotation characterization subcategories
We also define players as a subcategory that indicates who participates in the annotation. We identified that CA can occur between (human) users, between a user and an AI system, or between multiple users and an AI system. In the first case, users can collaborate in different ways without involving AI algorithms. The collaboration between users and an AI system often involves making annotations in turns between them or the user providing annotations when the AI requests them. In the case of multiple users and AI, there is usually a system where users can collaborate, in addition to having support from AI algorithms suggesting annotations and providing pre-annotated data.
We identified that the solutions presented are models or templates (which can be part of the solution), web-based tools, browser extensions, a plugin for Microsoft Word, and a desktop tool. Users were often domain experts. However, several papers had non-experts as users. Some works combined expert and non-expert users, with the first group generally validating annotations or holding system administration privileges.
5.1.2 Collaboration. Figure 3 presents this category and its subdivisions. The first subcategory defines when the collaboration occurs. We identified three moments: (i) when users interact, (ii) when the AI interacts with the user, and (iii) when the user interacts with the AI. In the first case, collaboration occurs during annotation, e.g., analyzing annotations from co-workers, comments, tags, and other information provided by colleagues, querying data, and consolidating or adjudicating annotations. The user collaborates with the AI by performing parameter tuning and when directly responding to annotation requests from the AI to train it. The AI collaborates with the user when it suggests annotations or data that the user could annotate and provides pre-annotated data.
The second subcategory defines the strategies that allow collaboration between players. We identified several possibilities, such as consulting annotations from colleagues as a reference (which appears in most tools) and allowing users to collaborate on the same document, even with different access permissions. Defining groups or shared workspaces is another strategy to promote collaboration. In the annotation management subcategory, we identified strategies for organizing a workflow for CA. Some solutions offer devices such as mechanisms for discussion and consolidation of the annotations. Curation of annotations according to the annotator's role (level of expertise or administrator) is another strategy identified in the papers, as is the definition of an annotation status or of annotation turns (which can be between users or between user and AI). The data are enriched through collaboration and carry information that identifies the users' path to that point or the level of training of the AI algorithms.
5.1.3 UX Strategies. Figure 4 presents this category and its subdivisions. This category points to elements implemented in the interfaces of tools that support CA. The first subcategory lists UX strategies to support collaboration. In the results, we discuss them in Section 4.3, such as a shared view of documents, discussion spaces, implementation of roles and privacy control, and others. The second subcategory points to resources that allow annotation and what the user interface offers to perform the task. The tools that allow annotation of multimedia, videos, and images are the ones that provide the most resources for annotation, such as sliders to categorize elements, freehand drawing, drawing of geometric shapes, and notes with free-text comments, among others. Text annotation tools offer categories, free text, ontologies, and tags as ways to support annotation. In cases of audio annotation, tools especially provide category selectors. We found other resources for annotating data, such as audio annotation (for educational purposes), presenting similar elements or objects to facilitate annotation by comparison, and others, as shown in Figure 4.

Implications
The findings summarized in the taxonomy and the results have implications for the design of future approaches and tools to support CA and for our research directions.
First, we can mention the users involved in this task. Several authors present solutions that involve more than one type of user. Sometimes, the non-expert user annotates the data, which then receives an evaluation by an expert who corrects the annotations and certifies their quality [96, 98]. Considering this aspect, allowing users to assume roles with privacy control, access levels, and responsibilities is essential for this type of tool. Shared spaces received different names in the works (group, project, and workspace). They are fundamental when the users are domain experts, who generally seek the CA process to extend their knowledge.
Second, it is desirable to have spaces to visualize the annotation progress (status), besides allowing users to express themselves freely and to use different resources to accomplish the annotation (drawings, ontologies, tags). The resources available are directly connected to the kind of data and the application domain of the data. Most works in the medical domain involve annotating large WSI images, which require annotations such as drawings over regions of interest, tags related to parts of the image, and text comments that justify the annotations performed. As a result of the annotation, these works often seek to expand the amount of annotated data to assist other specialists in making diagnoses. These data can be used to feed AI algorithms that can pre-annotate data, since annotation is a laborious task.
Also related to the users, the level of freedom of annotation is directly related to the expertise of the annotator. More freedom is given to expert users when one of the purposes of annotation is to promote learning. Works involving general users usually allow them to annotate using a set of categories, whether or not they are part of an ontology. The focus is more on annotation that will be performed by several people, thus increasing the level of confidence in the data. In these works, the authors do not focus on providing many communication resources among users; only those with administrative roles can carry out more interactions with other users beyond seeing annotations from others.
When one of the purposes of annotation is to feed AI models, authors mainly use specific ontologies and categories to feed these models. In the case of image annotation, regions of interest and categories are the main information these algorithms use. Formal ontologies and other categorizations are widely adopted in this context as they allow data to become machine-readable. This allows the data to serve as input for recommendations, information retrieval, and other AI algorithms. Reducing the manual effort of annotating is one of the most common goals in works whose purpose is training AI models.
Several works use the active learning strategy for the AI model to request annotations from users. Recently, this strategy has received attention for reducing the annotation burden in the process. However, there is still a lack of studies that point out how the AI requests these data from the user and presents the annotations it makes. Some works cite NLP and deep learning algorithms that run under the application and can pre-annotate data, but it is unclear whether users have any control over the data that feed these models, and whether every annotation is used or only the ones that fit some quality criteria. None of the works reported how the user accesses the information of the AI models embedded in the annotation tool or whether they must directly indicate that they want to update the model based on the newly annotated data.
The previous observations suggest three research directions: (i) promoting interfaces that allow users to manage the data that will serve as input to AI models; (ii) developing interfaces that allow users and AI models to perform the CA process in turns and incrementally, which is mainly related to strategies involving active learning that do not explain how models request data from users and how the annotation workflow occurs; and (iii) carrying out studies that assess both the quality of annotation interfaces and collaboration resources, identifying best practices.

Limitations
Some limitations affect our study. The first is the delimitation of our research between January 2010 and January 2023. We chose 2010 because we wanted to observe the panorama of the last decade. At the date of paper submission, there are likely other published papers that could be of interest to this study. In addition, as our search strategy captured 548 papers, we set several exclusion criteria to identify viable works to analyze and may have excluded original ideas in theses and dissertations. The digital libraries and the format of our search string, which focused on articles with "collaborative annotation" in their text, may have prevented us from identifying relevant articles. Expanding our search string and considering other digital libraries, such as Wiley Interscience, could mitigate these limitations. Still, we chose to limit the scope of the search due to time constraints and the specificity of our research to develop our AI-based system.
Inaccuracy in data extraction and misclassification of articles may occur in our results, since the information was extracted by one researcher and discussed with other, more experienced researchers. The researchers discussed weekly the inclusion and exclusion criteria, doubts, and findings about the articles. Performing discussions throughout the systematic mapping process was a way to mitigate this limitation. Following the usual aim of systematic mapping studies, we categorized the selected articles according to five questions and identified relevant studies by conducting a qualitative evaluation.

CONCLUSIONS
Collaborative annotation is an approach that has grown in recent years, given the possibility of annotating data faster and more accurately, so that information can be extracted from these data, for example, using information retrieval and recommendation algorithms. We did not find any work that pointed out what the solutions aiming to support CA offer users and what types of solutions are developed for this purpose. For this reason, we carried out this systematic literature review to understand and summarize what has been developed and can be used in new tools and solutions for this purpose.
We identified 548 works using the search string we developed, considering four digital libraries (ACM, IEEE, Elsevier ScienceDirect, and Google Scholar). Using our exclusion criteria, we selected 91 articles. We analyzed the articles based on five SQ derived from our RQ and obtained as a result the approaches, tools, users involved, strategies for annotation, and collaboration in the annotation. To summarize our findings, we created a taxonomy divided into three aspects: annotation characterization, collaboration, and UX strategies that support both the annotation and the collaboration.
The motivation for this survey was the design of an AI-based system that aims to support use cases of knowledge-intensive processes. The results indicate interesting findings on models, tools, and approaches to inform our AI-based system regarding CA. We identified two topics for the CA research agenda that need more investigation: (1) CA as more than a task to "feed AI models" - we believe that a CA approach is a way to build and share new knowledge in the context of knowledge-intensive processes, not just a way to "feed AI models"; (2) people are the main players in CA - the collaboration of people and AI is one perspective for an AI-based system, but the collaborative aspects among people are also very important.
As experts create and share new knowledge among themselves, the AI-based system needs to be in that loop.
We plan to explore those research topics in future studies. In one study, we will compare the proposals and results of papers that involve user studies and identify aspects that make the CA process difficult or easy. In another study, we plan to assess how to improve collaboration between humans and AI models to reduce the amount of data that needs to be annotated.

Figure 1: Tools according to domain application
