Automating Research Data Management Using Machine-Actionable Data Management Plans

Many research funders mandate researchers to create and maintain data management plans (DMPs) for research projects that describe how research data is managed to ensure its reusability. A DMP, being a static textual document, is difficult to act upon and can quickly become obsolete and impractical to maintain. A new generation of machine-actionable DMPs (maDMPs) was therefore proposed by the Research Data Alliance to enable automated integration of information and updates. maDMPs open up a variety of use cases enabling interoperability of research systems and automation of data management tasks. In this article, we describe a system for machine-actionable data management planning in an institutional context. We identify common use cases within research that can be automated to benefit from machine-actionability of DMPs. We propose a reference architecture of an maDMP support system that can be embedded into an institutional research data management infrastructure. The system semi-automates creation and maintenance of DMPs, and thus eases the burden for the stakeholders responsible for various DMP elements. We evaluate the proposed system in a case study conducted at the largest technical university in Austria and quantify to what extent the DMP templates provided by the European Commission and a national funding body can be pre-filled. The proof-of-concept implementation shows that maDMP workflows can be semi-automated, thus reducing the workload on involved parties and increasing the quality of information. The results are especially relevant to decision makers and infrastructure operators who want to design information systems in a systematic way that can utilize the full potential of maDMPs.

To sum up, a DMP in its current form does not reach its full potential as a satisfactory, integral part of research, which enables the flow of information and the automation of data management processes from which many stakeholders can benefit.
The Research Data Alliance (RDA), which is an international organization promoting data sharing and data-driven research, identified the limitations of current DMPs and took steps toward "active" DMPs [43]. A standardized machine-actionable (meta-)data model [14,19,39] for DMPs is a prerequisite for DMPs to become "active," to enable the exchange of DMP information between research systems, and to enable the automation of RDM tasks throughout the research data lifecycle. The RDA DMP Common Standards working group produced an official recommendation [29] that describes an application profile for machine-actionable DMPs (maDMPs). Thus, the RDA established a common way to model information that is typically described in DMPs.
However, to fully realize the potential of maDMPs as a way to exchange and act on information about data used and produced by researchers, all stakeholders involved in RDM must be connected by information systems that manage and exchange information in an automated way. For that purpose, services that provide structured machine-actionable information, which can be fed into a DMP, and services that can consume this information and act upon it are required.
In this article, we design a domain-agnostic system that utilizes the RDA's recommendation on maDMPs. We explore the feasibility of automation of typical data management processes using maDMPs in an institutional context, such as a research institution or university with its systems, services, and stakeholders. This involves (i) identification of relevant stakeholders at a research institution and related organizations (e.g., funders) and their requirements for a machine-actionable data management planning support system, (ii) development and description of workflows/business processes for machine-actionable data management planning, (iii) description of the system architecture on an enterprise level that can serve as a reference architecture for maDMPs at institutions, and (iv) development of a proof-of-concept implementation that shows selected features of the described architecture.
We evaluate the proposed system in a case study at TU Wien, which is the largest technical university in Austria, by implementing selected workflows and associated services of the architecture as a proof-of-concept, connecting with the institutional infrastructure of systems and services of TU Wien. We follow the design science methodology [13,33] and use the proof-of-concept implementation to evaluate the DMP efficiency gain by assessing the degree of automation achieved for data management planning tasks. We also assess the effectiveness with which the resulting maDMP can meet the stakeholder requirements. To evaluate how the maDMP meets funder requirements, we use the DMP templates of the European Commission and a national funding body in Austria and assess to what extent they can be covered.
The information system described in this article shows how machine-actionable data management planning can be embedded into the landscape of an institutional research data management infrastructure. Thus, our work helps decision makers and infrastructure operators at research institutions systematically design information systems that can utilize the full potential of maDMPs.
The article is structured as follows. Section 2 describes related work. Section 3 describes the requirements engineering carried out for this work. Section 4 describes the proposed architecture for an institutional maDMP support system. Section 5 describes the proof-of-concept implementation of selected workflows and services of the proposed architecture in a real-world environment. Section 6 evaluates the degree of automation achieved for DMP tasks with the proof-of-concept implementation. Section 7 provides the conclusion and future work.

RELATED WORK
maDMPs are a new generation of DMPs. The central characteristic of an maDMP is that its information is modeled in a machine-actionable way. The term machine-actionable is associated with the Findable, Accessible, Interoperable, Reusable (FAIR) principles to express that machines should be able to autonomously take action on digital objects [48]. Machine-actionability for DMPs is achieved by modeling the semantic information with the use of controlled vocabularies and standards [26], as well as by using persistent identifiers (PIDs) to reference specific entities, such as people, institutions, funders, grants, datasets, or repositories [41,42]. RDM still has low maturity at many research institutions [9]. Universities maintain data in several information silos, each of them engineered to serve a specific vertical application. Data about key entities, such as people, publications, courses, and projects, is scattered across them and difficult to correlate due to the diversity in format, metadata, conventions, and terminology used [22]. When a DMP is machine-actionable, it can become a living (active) document [28] since information can be integrated from various sources [27] and automatically be updated. maDMPs facilitate the exchange of information across research systems and enable new use cases as described in the work of Simms et al. [42]. The transition from DMPs to maDMPs can be seen as a transition from Level 1 Content Management to Level 3 Knowledge Management as defined in the work of Tuzhilin [47]. This is because maDMPs model the actual information and are not simple text documents whose structure is hard for machines to interpret. The benefits of using structured information over traditional text-based approaches to automate document management were discussed in the work of Lee et al. [21].
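To make this concrete, the following minimal sketch shows how DMP information can be modeled in a machine-actionable way. The field names loosely follow the RDA DMP Common Standard, and all identifiers (DOI, ORCID) are placeholders rather than real records:

```python
import json

# A deliberately simplified maDMP fragment. Field names loosely follow the
# RDA DMP Common Standard; this is illustrative, not a complete or
# authoritative instance. PIDs (ORCID, DOI) make entities unambiguous.
madmp = {
    "dmp": {
        "title": "DMP for an example project",
        "dmp_id": {"identifier": "https://doi.org/10.0000/example-dmp", "type": "doi"},
        "contact": {
            "name": "Jane Researcher",
            "contact_id": {"identifier": "https://orcid.org/0000-0000-0000-0000",
                           "type": "orcid"},
        },
        "dataset": [
            {
                "title": "Raw measurements",
                "dataset_id": {"identifier": "https://doi.org/10.0000/example-dataset",
                               "type": "doi"},
                "distribution": [
                    {"license": [{"license_ref": "https://creativecommons.org/licenses/by/4.0/",
                                  "start_date": "2021-01-01"}]}
                ],
            }
        ],
    }
}

serialized = json.dumps(madmp, indent=2)  # exchangeable between systems
restored = json.loads(serialized)         # machines can act on the fields
```

Because the license, embargo start, and dataset identity are explicit fields rather than prose, a consuming service can act on them without natural language processing.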
The RDA DMP Common Standards working group [39] developed a common data model/metadata application profile for DMPs [29]. The recommendation was developed as a collaborative effort by consulting stakeholders and collecting user stories as described in the work of Miksa et al. [26]. It is capable of describing various entities involved in data management planning such as the DMP itself, projects, funding, contributors, costs, datasets, and their relations. An overview of the RDA DMP Common Standard data model is given in Figure 1.
The principles described in the work of Miksa et al. [28] specify what is required to bring the maDMP vision to life and to realize use cases from stakeholders involved in RDM. Among other things, the principles suggest embedding maDMPs in the workflows of stakeholders to enable automation and to use a common data model for the exchange of DMP information. Further automation can be enabled by the machine-actionable description of resources such as data policies or components of the research data ecosystem such as repository systems. maDMPs do not replace other systems for managing data [30,38], storing data [3], or citing data [16] but complement them to establish research data services [20,45].
The research community also identified potential use cases [2,41,42] and requirements [26] for maDMPs and supporting systems. This article breaks down these use cases into specific processes and describes how they can be realized in an institutional context.

REQUIREMENTS ENGINEERING
Requirements engineering plays a crucial role in the development process of an information system as it aims at providing a complete and accurate requirement specification [6]. In this section, we describe how we collected requirements for the information system.
Since there were no well-established use cases for machine-actionable Data Management Plans (maDMPs), we started our work in 2017 by identifying specific requirements. We co-organized workshops and consultations to identify, together with the scientific community, typical use cases for maDMPs. Specifically, the outputs from the IDCC workshop held in Edinburgh in 2017 that gathered almost 50 participants from Africa, the United States, Australia, and Europe form the basis of our work [42]. We further refined the requirements in physical workshops and virtual meetings. Together with the RDA DMP Common Standards working group, we co-organized, between October 9, 2017, and November 30, 2017, an open consultation in which 108 user stories were collected. They express the viewpoints of funders, institutions, repository operators, research support, researchers, and service providers [26]. The participants contributing their user stories represented different institutions, such as universities, medical universities, universities of economics, technical universities, and social data archives [24]. Thus, the requirements identified in the consultations are neither limited to the use case discussed in this work nor to a particular type of institution, such as a technical university.
Based on the requirements, we developed workflows (Section 3.1) and graphical mockups (Section 3.2) that reflect these use cases. The workflows underwent an initial evaluation during a workshop hosted at the International Conference on Theory and Practice of Digital Libraries in 2018 [25]. A group of about 15 participants was divided into breakout groups and asked to review a series of workflows that could potentially be automated in an maDMP ecosystem. Each group had to answer the following questions:
• What would you change in the workflows?
• Do the workflows fit into the context of your organization/domain?
• Which other stakeholder needs should be addressed?
• Which other systems can be used?
• What else could be automated?
The mockups were tested and refined with the help of relevant stakeholders at TU Wien but also external bodies such as a representative of a major research funding agency in Austria.
We are aware that more requirements engineering needs to be conducted. However, with our exploratory work, we hope to advance the discussion and contribute to a better understanding of maDMPs and the surrounding processes.

Automated Workflows
The design of the workflows is framed by the process of creating an initial DMP at an early stage of a project [26] but also considers data management tasks at later stages, such as when data is ready for submission into a repository system [28]. Figure 2(a) depicts a high-level workflow of creating an initial DMP and shows that services and stakeholders of an RDM infrastructure are involved in the process [26]. In this high-level workflow, the researcher starts a DMP, and administrative information such as the researcher's affiliation is fetched from related information systems to pre-fill the DMP. Next, the researcher estimates the expected size and type of the research data and gets assistance in storage booking, cost estimation, and license selection for datasets. The DMP is a living document that evolves over time. Hence, especially in the initial DMP, the information cannot be considered final, and its correctness may be subject to review by other stakeholders; for example, storage estimates made by researchers may need to be revised by the ICT operator. Figure 2(b) shows potential interactions between stakeholder services using an maDMP as a medium for information exchange at a later stage of the project [28]. Information contained in the maDMP, such as the specified license or embargo period for a dataset, can be used to facilitate the submission of research data to the selected repository by automatically assigning these fields in the repository system. A repository operator service can return a list of PIDs pointing to the submitted datasets and provide the associated costs to update the maDMP. A funder service can use the information to check how the DMP was implemented.
The high-level workflows described so far remained vague, and it was not clear what exactly should be done at each step. Hence, we modeled nine sub-workflows using the Business Process Model and Notation (BPMN). Figure 3 gives an overview of the high-level BPMN workflows, which form the foundation for our system architecture design. We describe all workflows in a technical report [36]. In the following, we present one of the workflows.
The Specify Size and Type workflow, illustrated in Figure 4, deals with the specification of research data that will be (re-)used and generated during the project:
(1) Cite data: If existing, citable datasets are being reused, their unique and resolvable PIDs, such as DataCite digital object identifiers (DOIs), can be entered, and the associated metadata can be retrieved and imported into the DMP. Many repositories support the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) standard, and other repositories, such as GitHub, provide a REST API to collect metadata. Citing reused datasets in a DMP enables the data discovery use case for datasets referenced in different DMPs. Retrieved metadata such as the license of the reused dataset clarifies the terms of use [5] and can help avoid pitfalls already in the planning phase.
(2) Specify output data: The researcher can specify which data will be generated during the project. The researcher can specify the expected types, file formats, and volumes of research data manually or get automated support by uploading datasets that are analyzed (file format, size, other metadata) by a file identification and characterization tool. If entire datasets are already known or available, they can be uploaded for analysis. Alternatively, sample data can be uploaded and analyzed, the expected number of similar files can be specified, and an estimate of the total size can be calculated.
(3) Classify/tag data: The researcher can classify/tag the data, such as by assigning labels.
(4) Save: The workflow concludes with storing the details of the research data specification in the DMP.
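The size estimation in step (2) of this workflow can be sketched as follows. This is a simplified, hypothetical helper: it extrapolates the mean size of the uploaded samples to the expected number of similar files, and a production system would delegate format identification to a characterization tool such as FITS rather than the standard library.

```python
import os
import mimetypes

def estimate_output_size(sample_paths, expected_file_count):
    """Estimate total output volume from a few sample files.

    Hypothetical sketch of the 'specify output data' step: the mean size
    of the uploaded samples is extrapolated to the expected number of
    similar files. Format detection here uses `mimetypes` as a stand-in
    for a real file characterization tool.
    """
    sizes = [os.path.getsize(p) for p in sample_paths]
    formats = {mimetypes.guess_type(p)[0] or "unknown" for p in sample_paths}
    mean_size = sum(sizes) / len(sizes)
    return {
        "detected_formats": sorted(formats),
        "estimated_total_bytes": int(mean_size * expected_file_count),
    }
```

The resulting dictionary could feed directly into the dataset description of the maDMP and into downstream storage cost estimation.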

Graphical Mockups
In Section 3.1, we described automated workflows for data management planning using BPMN and textual descriptions. BPMN is a notation used to facilitate discussions about business processes between non-technical and technical people by using a common language. However, from a user interface (UI) design perspective, it is not straightforward to derive UI requirements from the BPMN models. UI mockups present a quick opportunity to test concepts with stakeholders and receive valuable feedback for the further design. Therefore, we developed interactive, graphical UI mockups [35] using Balsamiq for five different stakeholder groups:
(1) Researcher
(2) Research Support
(3) Information and Communications Technology (ICT) Operator
(4) Management
(5) Funder
All of these stakeholders are involved in the data management planning workflows and related use cases. The graphical UI mockups are composed of about 60 wireframes. The wireframes are linked with each other and contain interactive, clickable elements. We published the graphical mockups on GitHub Pages as a PDF and with a link to the interactive version hosted at Balsamiq Cloud. Visitors of Balsamiq Cloud could provide comments with markers directly on the wireframes. Figure 5 shows a sample wireframe designed for a researcher. It allows researchers to specify what data will be used by defining tags that represent a dataset, such as "Raw data." Researchers describe the dataset by selecting from pre-defined options, such as information on type, size, and format. Researchers can also simply upload sample files instead; the system will then detect the file format and size. The user has to specify how many "similar" files will be used/produced in the project.
Additionally, we met in person with representatives of the stakeholder groups located in Austria, presented the mockups, and documented their feedback.
The feedback collected contributed to the improvement of the graphical mockups. In this work, we went through several iterations of incorporating feedback, with v1.2 [35] being the latest version of the mockups.

ENTERPRISE ARCHITECTURE
In this section, we introduce an architecture for machine-actionable data management planning in the context of a research institution and its infrastructure. A research institution or university with its various stakeholders, systems, and services can be viewed as an enterprise. We therefore use a software engineering technique called enterprise architecture (EA) modeling to describe the architecture from various perspectives, highlighting different aspects. The purpose of the architecture description is to give a comprehensive overview of the system by using different architectural views to highlight different system aspects, capture significant architectural decisions, and communicate them to various stakeholders.
In our work, we defined architectural goals and associated requirements and constraints by assessing the drivers for change of an institutional Research Data Management (RDM) infrastructure. We integrated the workflows derived from community use cases into the business process layer of the architecture. For the business processes and sub-processes, we derived application services and components that can implement them. By arranging architectural components around business capabilities, we achieved a modular architecture that is polyglot and enables self-contained components to be independent of programming language and technology.
Due to space limitations, we cannot present all the views. The full description of the architecture with all its views (use case view, logical view, process view, implementation view, deployment view) can be found in the work of Oblasser [34]. In the rest of this section, we describe the main building blocks of the proposed architecture. Figure 6 shows the business process views of our architecture. In Figure 6(a), the stakeholders and the business services are depicted; for example, the Data Management Planning service serves the researcher and is realized by the Data Management Planning business process. The Data Management Planning business process consists of sub-processes, which are listed in Section 3.1.
The Get Storage business process implements the storage provisioning for the researcher. The Get Help business process realizes the DMP Support service for the researcher and research support.
The result of the Data Management Planning business process is the DMP business object, representing the actual DMP. Data management cost is an output of the Configure Storage and Estimate Cost sub-process. Figure 6(b) shows business services of the ICT operator and the library. The Storage Demand Overview service serves the ICT operator. The library is served by the following business services: Preserve Data service, Open Access Publishing service, and Issue DOI service.
The layered view depicted in Figure 7 shows the relations between the elements of the business layer (yellow), the application layer (blue), and the technology layer (green). Each layer provides services that serve the layer above. In this way, it can be traced which business process is served by which application service and implemented by which application component. Further, it can be traced which application components are served by which technology service and realized by which technology. The illustrated application layer only partly shows how application components interact. For example, the CRIS API serves the Administrative Data Collector component; in other words, the Administrative Data Collector component fetches administrative data from the CRIS API. The application services include the following (Table 1):
• Repository Recommender: web service for recommending repositories (API)
• Metadata Standard Discovery: web service for discovering metadata standards (API)
• Metadata Importer: web service for importing metadata (API)
• File Characterizer: web service for uploading and analyzing file samples, identifying their file formats, and providing characterization metadata (API)
• Notifier: notification service to deliver messages to users (API)
• Administrative Data Collector: web service for importing administrative data from information systems such as CRIS or ORCID (API)
• Repository Ingestor: web service for ingesting data into a supported repository (API)
• Service Broker: web service for brokering services of an ICT service provider; the broker provides a catalog of services and enables their provisioning (API)
• Service Catalog Controller: web service for registering service brokers from different providers and providing an aggregate catalog of available ICT services (API)
• Help Desk: web application serving the help desk GUI
• ICT Dashboard: web application serving the ICT dashboard GUI
Details about how application components are integrated and how data flows between application components are depicted in the implementation view described in the work of Oblasser [34].
The technology layer (green) at the bottom of Figure 7 suggests that application components are realized as web application services. Some web application services are intended to be stateless, such as the Metadata Importer, whereas others require a database to persist data, such as the DMP Store. We identified 13 application services that are required to implement the business processes (Table 1). The services vary in complexity and form. Some services are intended to provide a GUI (graphical user interface), whereas others offer an API (application programming interface). The system is designed to be deployed on an institutional level, but some of the services could be shared with other institutions or the public. For example, the Repository Recommender service, which consumes DMP and policy information in a standard format, could be deployed publicly. The same applies to the Metadata Standard Discovery, the File Characterizer, the Metadata Importer, and the Repository Ingestor services under certain conditions. The Help Desk service could be shared between institutions.

IMPLEMENTATION
In this section, we describe an implementation of the proposed maDMP support system in an institutional context. We chose TU Wien as an example institution to show how the proposed system can be embedded in the system and service landscape of an institution.
This section is structured as follows. First, we introduce the TU Wien case study. Second, we present the DMap tool, which implements selected automated workflows using the TU Wien infrastructure and other systems and services.

TU Wien Case Study
TU Wien is a major university in Austria with more than 27,000 students, around 5,000 employees, and eight faculties. It has departments, information systems, and ICT services that are relevant to our maDMP use cases.
The automated workflows described in Section 3.1 involve three main stakeholders, namely the researcher, research support, and ICT operator. At TU Wien, DMP-related research support is offered by the Center for Research Data Management and the European and International Research Support (EIRS) team. TU.it is the in-house ICT operator that provides ICT services for employees and researchers. The services offered range from storage solutions to virtual server hosting to the provision of High Performance Computing (HPC) infrastructure and more.
TU Wien Information Systems and Services (TISS) is the central information system that supports university operations. TISS includes a searchable address book of employees, students, and organizational units. TISS also contains a project database used to manage research projects and their funding. Both sub-systems contain relevant information for DMPs. The address book serves as a source of personal information about researchers and other employees, such as person ID, email, or affiliation. The project database contains relevant project and funding information, such as project ID, title, duration, funder, or grant ID. TISS provides a public REpresentational State Transfer (REST) API to fetch information from the address book and the project database.
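Pre-filling a DMP from such a REST API can be sketched as below. The payload field names are hypothetical, not the actual TISS schema, and the HTTP call is abstracted behind a function (in a real deployment it would be, e.g., a `requests.get(...).json()` call against the live API) so that the mapping logic stays testable:

```python
def prefill_contact(fetch_person):
    """Map a person record from an institutional REST API onto maDMP
    contact fields. `fetch_person` abstracts the HTTP call; the payload
    field names below are illustrative, not the real TISS schema."""
    person = fetch_person()
    return {
        "name": f"{person['first_name']} {person['last_name']}",
        "mbox": person["email"],
        "contact_id": {"identifier": person["person_id"], "type": "other"},
    }

# Stubbed API response used here in place of a live HTTP request.
def fake_fetch():
    return {"first_name": "Jane", "last_name": "Doe",
            "email": "jane.doe@example.org", "person_id": "tiss:12345"}
```

Importing the person ID directly from the source system, rather than retyping it, is what makes the resulting contact entry unambiguous.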

DMap Tool
To show the applicability of the EA described in Section 4 in a real scenario, we developed a proof-of-concept tool named DMap. DMap uses the infrastructure of TU Wien and its systems and services. In DMap, we implemented selected use cases from the set of workflows described in Section 3.1. DMap is the implementation of the DMP App application service and related application components described in the EA, and it therefore realizes the service used by researchers and research support staff. Other application services described in the EA, such as the ICT Dashboard or the Help Desk, are not implemented as part of this work. The DMap UI was implemented according to the mockup design described in Section 3.2. Figures 8 and 9 show screenshots of DMap in action. Further DMap screenshots can be viewed in the slides [33] resulting from the presentation of DMap to the Open Science community at the 15th RDA plenary in Helsinki in October 2019.
DMap is separated into two decoupled services: a backend and a frontend service. The communication between the frontend and the backend is realized as resource-based, synchronous communication over the Hypertext Transfer Protocol (HTTP). Hence, the backend provides a Representational State Transfer (REST) Application Programming Interface (API) that serves as a gateway to various kinds of resources, such as DMPs stored in the database or project information located in the external TISS. JSON (JavaScript Object Notation) is used as the payload format for the transfer of structured information between the backend and the frontend. The frontend runs in a web browser and provides the UI to the user.
The backend is implemented as a Spring Boot application with an embedded Apache Tomcat servlet engine using Java 8. The communication with the DMP database is done via a data access abstraction layer provided by Spring Data. Currently, DMap uses the document database MongoDB to persist DMPs, but it can easily be replaced with any other datastore thanks to this abstraction layer.
The frontend has been developed with the JavaScript framework Vue.js and makes use of several libraries to provide a scalable and maintainable architecture. For example, Vue Router is used for client-side routing between different views, which enables us to create a modern single-page web application. The UI is composed of self-contained and reusable Vue components that contain HTML (Hypertext Markup Language) template code, JavaScript code, and Cascading Style Sheets (CSS) style information. A central Vuex state store is used to maintain a consistent state of the frontend application. To communicate with the backend, an HTTP service based on the Axios library has been implemented.
The RDA DMP Common Standard for maDMPs is the native format for the tool. Thus, the information is modeled and structured from the beginning. This differs from other tools, which mostly structure questionnaires (i.e., consist of question-answer hierarchies).
We implemented workflows to assist with the specification of research data, license selection, and repository selection. DMap can export an maDMP as JSON that complies with the RDA DMP Common Standard. We implemented a service broker that represents a standard interface to the services provided by TU.it, the local ICT operator. We also integrated the CRIS of TU Wien to fetch structured information about projects and researchers. Table 2 provides an overview of the implemented business processes (cf. Section 3.1). We introduced some implementation simplifications compared to the original specification of the workflows and the architecture description. Specifically, we did not model machine-actionable data policy elements, described as one of the 10 principles of maDMPs [28]. We also simplified the proposed repository recommendation service by integrating a repository search in DMap, backed by the Registry of Research Data Repositories (re3data), rather than implementing an actual recommender system. To implement a repository recommendation service as described, relevant maDMP metadata must be aligned with repository metadata and machine-actionable data policies, and a suitable recommendation model must be developed. These simplifications have no effect on the later evaluation.
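The simplified repository search can be sketched as a filter over locally cached repository records. The record field names below are illustrative and do not reflect the actual re3data schema; real harvested records are considerably richer:

```python
def search_repositories(repos, text=None, subject=None, open_access=None):
    """Filter a list of repository records (e.g., harvested from a
    registry such as re3data) by free-text match on the name and by
    simple facet filters. Field names are illustrative only."""
    results = []
    for repo in repos:
        if text and text.lower() not in repo["name"].lower():
            continue  # free-text filter on the repository name
        if subject and subject not in repo["subjects"]:
            continue  # facet filter: subject classification
        if open_access is not None and repo["open_access"] != open_access:
            continue  # facet filter: access policy
        results.append(repo)
    return results
```

A recommender system, by contrast, would rank repositories against maDMP metadata and machine-actionable policies instead of applying user-chosen filters.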

EVALUATION
In this section, we describe how we evaluate the proposed information system for maDMPs. The implemented proof-of-concept tool DMap described in Section 5.2 serves as the vehicle for our assessment of the usefulness of the system. In Section 6.1, we assess the system efficiency by evaluating the processes implemented in DMap. For this purpose, we analyze the extent to which the creation of a DMP in DMap is automated and simplified compared to the conventional way of writing a DMP in free form.
In Section 6.2, we evaluate the maDMP created with DMap with regard to its completeness of information that is relevant for funders. The funders' requirements for DMPs are currently expressed in funder-specific DMP templates. Therefore, we evaluate how well the questions of two major DMP templates (the Austrian Science Fund (FWF) [4] and the European Commission's Horizon 2020 programme [11]) can be answered with the information contained in their machine-actionable counterparts.

Degree of Automation
We analyzed the processes involved in creating a DMP from the perspective of a researcher who has to write a DMP for a research project. We set the conventional way of writing a DMP in free form as the baseline for our analysis. In the analysis, we focused on the key steps of creating a DMP. For each of the steps, we assessed the degree of automation achieved with DMap. The results of the assessment are summarized in Table 3. The evaluation shows that most of the processes are semi-automated. Compared to the conventional free-form way of writing a DMP, DMap presents an interactive way of creating a DMP. Instead of being faced with a blank page, the user gets assistance for every step. This is achieved by integrating external information systems (TISS, re3data) and tools (FITS, EUDAT license selector) into DMap, providing a suitable UI (Section 3.2) for various tasks, and offering the user pre-defined options to choose from.
By integrating external services and tools, we simplified the user experience by providing a single UI instead of requiring the user to switch between different UIs. A positive side effect is that data from external information systems is imported into the tool in a controlled manner by interacting with the respective service APIs. This enables unambiguous referencing of entities such as persons (TISS ID), projects (project ID), grants (grant ID), licenses (uniform resource identifier), or repositories (re3data ID) in an maDMP. A manual approach to referencing entities is error-prone, for example, due to typos.
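The typed identifier objects that make such references unambiguous can be sketched as follows; the identifier/type structure mirrors the *_id objects of the RDA DMP Common Standard, while the concrete values are invented for illustration:

```python
# Sketch: referencing entities by typed identifier instead of free text.
# The identifier/type pairs mirror the RDA DMP Common Standard's *_id
# objects; the concrete values below are made up for illustration.
def make_id(identifier: str, id_type: str) -> dict:
    """Build a typed identifier object as used throughout an maDMP."""
    return {"identifier": identifier, "type": id_type}

person = make_id("https://orcid.org/0000-0000-0000-0000", "orcid")
license_ref = make_id("https://creativecommons.org/licenses/by/4.0/", "url")
repository = make_id("https://www.re3data.org/repository/r3dexample", "url")
```

Because each reference carries both the identifier and its scheme, a consuming system can resolve it without guessing, which is exactly what a manually typed name cannot guarantee.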
The Horizon 2020 DMP template survey results [15] show that users wish for more guidance in terms of recommendations and drop-down options to choose from. In our earlier work [37], we discuss a recommender system for repositories. However, in DMap, we did not implement a repository recommender system but provided the user with the ability to find suitable repositories by applying filters and text search. Therefore, there is higher potential for automation than realized in DMap. DMap provides drop-down options for the assignment of contribution roles, the specification of research data types and sizes, and the selection of licenses. We also set default values to simplify the user experience. For example, the license for datasets is set to CC-BY by default, and the project end date is automatically set as the date from which the license should be active. However, further drop-down options can be provided, such as a list of suitable metadata standards to select from.
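The pre-filling of defaults just described could look as follows; the function and field names are illustrative rather than DMap's actual API, and the license structure loosely follows the RDA DMP Common Standard's license object:

```python
from datetime import date

# Sketch of pre-filling defaults as described above: CC BY as the default
# dataset license, and the project end date as the license start date.
# Function and field names are illustrative, not DMap's actual API.
CC_BY = "https://creativecommons.org/licenses/by/4.0/"

def apply_defaults(dataset: dict, project_end: date) -> dict:
    """Fill in default license information for a dataset, if absent."""
    dataset.setdefault("license", [{
        "license_ref": CC_BY,
        # License becomes active when the project ends.
        "start_date": project_end.isoformat(),
    }])
    return dataset

ds = apply_defaults({"title": "Survey results"}, date(2021, 12, 31))
```

Using `setdefault` means user-supplied values always take precedence over the defaults, which keeps the assistance unobtrusive.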
The selection and automated cost estimation for active data storage (Get Cost/Storage workflow) were not integrated with DMap and therefore could not be evaluated. However, the implementation of the TU.it service broker (not described in this article) shows that we can implement a standard interface for interaction with ICT services that enables us to configure ICT services and calculate costs based on the respective cost models. By integrating service brokers into a data management planning application, we can automate the discovery of suitable ICT services by showing a catalog of services to the user and provide semi-automated assistance in service configuration and cost estimation. In this evaluation, we assessed the level of automation from the perspective of a researcher. However, there are other stakeholders of an maDMP support system who can benefit from automated processes. For example, the ICT operator can get automated support in the management of ICT resources by being integrated into the data management planning workflow. The Research Office can be notified if an ethics approval is needed. The funder can semi-automatically evaluate maDMPs because of their machine-actionable format. At the time of writing, these use cases remain aspirational and need to be evaluated after they are implemented.
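A cost estimation exposed by such a service broker could be as simple as the following sketch; the flat price-per-terabyte-per-year model and the numbers are hypothetical, as a real broker would obtain the cost model from the ICT operator:

```python
# Sketch of a service-broker cost estimation: a flat price-per-terabyte-
# per-year model. The cost model and the default price are hypothetical;
# a real broker would return them from the ICT operator's service catalog.
def estimate_storage_cost(size_tb: float, years: float,
                          price_per_tb_year: float = 40.0) -> float:
    """Estimate active-storage cost for a given volume and duration."""
    return size_tb * years * price_per_tb_year

# e.g. 5 TB of active storage kept for a 3-year project
cost = estimate_storage_cost(5, 3)  # 5 * 3 * 40.0 = 600.0
```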

Completeness
As funders are the primary recipients of DMPs, we evaluate how well an maDMP can meet the funders' requirements. For this evaluation, we created an maDMP in the RDA DMP Common Standard v1.0 with our proof-of-concept tool DMap. We then used the DMP templates of two major funders (the FWF [4] and the European Commission's Horizon 2020 programme [11]), which contain sets of questions, and evaluated to what extent these questions can be answered with the information contained in the maDMP.
For each of the 31 questions in the H2020 template and 27 questions in the FWF template, we assess whether the question can be answered completely/partially/not at all with the information contained in the maDMP. If the question can be answered completely or partially, we list the corresponding maDMP fields. Thus, we manually created a table with a mapping between questions from the templates and fields in the maDMPs. Table 4 presents the results for the FWF DMP template grouped by their categories. Figure 10 depicts the analysis results for both DMP templates. The analysis shows similar results for both DMP templates, although the questions differ in topic and granularity. More than half of the questions (58.1% H2020, 59.3% FWF) of both templates can be completely answered with the information contained in the maDMP. Of these questions, many can be answered with the type-safe, machine-actionable information in the maDMP, such as PIDs, uniform resource identifiers, numerical values, or data fields with a controlled vocabulary, whereas others can be answered with the free-form text fields of the maDMP. For example, the maDMP uses DOIs for all data objects, and ORCIDs and internal staff identifiers for all actors involved.
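The reported shares can be reproduced from the per-category question counts; the counts below are back-computed from the stated percentages and template sizes (31 H2020 questions, 27 FWF questions) and are therefore a reconstruction, not an excerpt from Table 4:

```python
# Reproducing the completeness percentages from per-category counts.
# Counts are back-computed from the reported percentages and template
# sizes (31 H2020 questions, 27 FWF questions).
def shares(counts: dict) -> dict:
    """Percentage share of each answer category, rounded to one decimal."""
    total = sum(counts.values())
    return {k: round(100 * v / total, 1) for k, v in counts.items()}

h2020 = shares({"complete": 18, "partial": 7, "none": 6})
fwf = shares({"complete": 16, "partial": 6, "none": 5})
# h2020 -> {'complete': 58.1, 'partial': 22.6, 'none': 19.4}
# fwf   -> {'complete': 59.3, 'partial': 22.2, 'none': 18.5}
```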
About one-fifth to one-quarter (22.6% H2020, 22.2% FWF) of the questions from both templates can be answered partially, and about one-fifth of the questions (19.4% H2020, 18.5% FWF) cannot be answered. One reason for this is that some of these questions require human narrative for which the maDMP does not provide suitable data fields, or the information in the maDMP is not specific enough to adequately answer the questions. For example, the question "How are you planning to document this information?" in the Documentation and Metadata section of the FWF template has no matching field in the maDMP.
As long as funder requirements for DMPs do not converge, it is difficult to design a data model that supports all kinds of questions from different DMP templates. The maDMP specification contains a minimal set of common fields used across different templates and use cases. There are developments to harmonize the requirements for DMPs among funding organizations by establishing core requirements for DMPs [40]. Another solution is to develop an extension to the maDMP specification that adds funder-specific fields missing from the core specification. However, this can result in a deluge of extensions and a lack of interoperability, which in turn contradicts the concept of a common standard for maDMPs.
We performed the evaluation on the templates that are most common for the considered use case. However, it is possible to map maDMPs to other templates. For example, during the hackathon on maDMPs organized by the RDA DMP Common Standards working group, one of the groups managed to achieve a 100% accurate mapping for the template of the National Science Foundation in the United States [7]. This shows that maDMPs can be used in non-European contexts as well.

CONCLUSION AND OUTLOOK
maDMPs are a way to exchange information about data used and produced by researchers. Research institutions can build their research data infrastructure around maDMPs to bring researchers together with departments and services. Thus, they can integrate all stakeholders involved in RDM by connecting their information systems to manage and exchange information in an automated way. By automating tasks for data management planning, researchers can be supported on their journey to good data management practice. For that purpose, services that provide structured machine-actionable information, which can be fed into an maDMP, and services that can consume this information and act upon it are required.
In this article, we explored use cases of maDMPs in the context of a research institution or university with its systems and services and proposed an architecture for an information system that supports machine-actionable data management planning. We described a system architecture on an enterprise level that can serve as a reference architecture for other institutions that plan to automate data management. A proof-of-concept implementation shows that the use cases are feasible and tasks can be semi-automated, but human interaction and free-form text cannot be avoided completely.
The benefits and limitations of the proposed solution can vary depending on the specific deployment context, such as the number of implemented processes or services. The effort and expenses depend on the starting point of the specific institution. The EA and processes described in this article can help all stakeholders identify needed changes in existing services and better coordinate investments to create value for everyone involved in RDM.
maDMPs are still in the early adoption phase, and more research is required. In this work, we focused on providing automated support for the creation of maDMPs and enabling related use cases of institutional stakeholders. Future work will elaborate on the use cases of the research funder, including DMP monitoring and automated support for reviewing DMPs.