Abstract
This article studies the problem of automated information processing from large volumes of unstructured, heterogeneous, and sometimes untrustworthy data sources. The main contribution is a novel framework called Machine Assisted Record Selection (MARS). Instead of today’s standard practice of relying on human experts to manually decide the order in which records are processed, MARS learns the optimal record selection via an online learning algorithm. It further integrates algorithm-based record selection and processing with human-based error resolution to achieve a balanced task allocation between machine and human. Both fixed and adaptive MARS algorithms are proposed, leveraging different levels of statistical knowledge about the existence, quality, and cost associated with the records. Experiments using semi-synthetic data, generated from real-world patient record processing in the UK national cancer registry, demonstrate a significant (3- to 4-fold) performance gain over fixed-order processing. MARS represents one of the few examples demonstrating that machine learning can assist humans with complex jobs by automating the associated triaging tasks.
1 INTRODUCTION
It is widely believed that with the advances of artificial intelligence (AI) and machine learning (ML), machines are taking over human jobs [3, 27]. Closer studies have argued that this is not entirely true—lower-level jobs that can be rigorously streamlined are much more susceptible to being replaced by machines than jobs that require “intelligence” and “creativity”, such as artists, architects, IT specialists, and public relations professionals [1, 7]. However, these views should also be taken with a grain of salt, as the boundaries between such low-level and high-level jobs are constantly shifting and becoming much more obscure with AI and ML [2].
Historically, tools were invented to assist humans to complete tasks, and today’s machines should be no different, even with much more advanced computation and reasoning capabilities. It is, however, a new and challenging problem to design the framework that integrates machines with humans so that the former can seamlessly assist the latter to complete a task, one that can accommodate the ever-increasing “intelligence” of machines. Solving this problem has significant societal ramifications, as it is central to the ongoing concern of job losses to AI [23].
This article aims at developing one such framework, using data entry systems as a particular example. The objective of a data entry system is to construct and maintain a structured database that contains accurate information for the required fields, drawn from possibly heterogeneous, unstructured, and sometimes untrustworthy data sources. Among the many tasks in a data entry system, the “low intelligence” processing tasks have already been automated with computer vision (CV) and natural language processing (NLP) tools. Nevertheless, the “high intelligence” control task, which determines which record to process next among the large volume of unstructured data records, is significantly harder, as it requires understanding the quality of the data sources and making educated predictions. In today’s reality, this is done exclusively by human experts, and there is a strong desire to develop ML algorithms to automate the record selection process [16]. Doing so would further reduce the workload of data entry clerks and allow them to focus on other high-value tasks.
However, achieving this goal faces several unique challenges. Data records are heterogeneous, as they come from different sources and may contain different sets of reports with overlapping information. They can also be unstructured, leading to large variations in processing cost. Last but not least, the quality of different “raw” records may vary significantly: record content may be missing or inconsistent due to data collection and conversion errors. Recognizing processing errors alone is often complex enough that domain expert knowledge is an indispensable requirement in contemporary data processing systems.
In this article, we aim at addressing the aforementioned challenges by proposing a new Machine Assisted Record Selection (MARS) paradigm to automate the record selection process in data entry systems. We use clinical data registry as an example to highlight the proposed paradigm, but the methodology is general enough to be applied to other applications in healthcare data processing and beyond. To handle heterogeneous reports with a large variation of processing cost, MARS has two sets of algorithms, fixed-order greedy (FOG) and varying-order greedy (VOG), that sequentially select the next record and the subset of features to be processed. MARS is principled—there is no “black box” in the decision making. This is because we leverage two critical features (that are known to the domain expert). First, the cost can be reduced by removing the realized features from future report examination. Second, the set of features that are yet to be extracted vary based on the past observations, and the optimal selection of the next report should be adjusted in an adaptive manner. When no prior knowledge about the records is given, online MARS is proposed that simultaneously learns such knowledge and selects the reports.
In addition, identifying errors or inconsistencies in the heterogeneous patient records for clinical data registration often requires medical domain knowledge or societal background information, which can be better handled by well-trained cancer registration staff than by today’s AI agents. On the other hand, by allowing machines to make decisions on record selection, human experts have a reduced workload, which can lead to fewer human errors and shorter task delays. The overall MARS paradigm demonstrates that pursuing performance gains does not necessarily lead to unfairness between humans and machines, as our design achieves “the best of both worlds”—each exploits its strengths and avoids its shortcomings, leading to improved system efficiency and fairness. This is further validated in the semi-synthetic experiments, which build on a data entry processing pipeline with data generated from statistics obtained from real-world patient records and data entry clerk processing in the UK national cancer registry. We see a significant (3–4 fold) performance improvement from adapting the order of report examination based on past observations, compared with the fixed-order algorithm.
The rest of this article is organized as follows. The proposed MARS system, together with the problem formulation is described in Section 2. The proposed algorithmic framework is presented in Sections 3 and 4. Theoretical performance analysis of MARS is given in Section 5. Numerical experiments are reported in Section 6, and related literature is discussed in Section 7. Finally, Section 8 concludes the article.
2 SYSTEM DESCRIPTION AND PROBLEM FORMULATION
2.1 Data Entry from Heterogeneous Sources
In this work, we use cancer registration as an exemplary setting to illustrate the proposed MARS framework for general data processing systems. The goal of a cancer registry is to create a high-quality clinical database that consists of relevant features for a given tumour. To accomplish this goal, the data entry processor takes incoming patient records that are collected from participating hospitals and clinics. These records arrive irregularly and sometimes unpredictably at the cancer registry center, with varying degrees of data quality. Each record is assumed to consist of several reports, e.g., a pathology report, a clinical meeting report, and a surgery report. Any of these reports may or may not contain the features needed for cancer registration, and this only becomes clear after the report has been processed. Figure 1 illustrates this problem with a simple example: the required features in the registry are scattered across multiple reports of a patient’s record, and the order in which these reports are processed makes a big difference to the effectiveness and efficiency of data entry processing.
Fig. 1. Exemplary record with 3 reports containing relevant features for cancer registry.
2.2 Probabilistic Model
We use \( N \) and \( D \) to denote the maximum number of features and the maximum number of reports in a patient’s record. Associated with each record is a random feature indicator matrix \( X \in \lbrace 0, 1\rbrace ^{N \times D} \), with each element \( X_{i,j} \) assumed to be distributed independently as \( X_{i,j} \sim {\rm Bernoulli} (p_{i,j}) \), where \( X_{i,j} = 1 \) indicates that feature \( i \) exists in report \( j \), and 0 otherwise. The column and row vectors of \( X \) are denoted as \( X_{{\cdot }j} \) and \( X_{i{\cdot }} \), and for a subset \( \mathcal {I} \subseteq \lbrace 1,\ldots ,N\rbrace \), \( X_{\mathcal {I},j} \) is a column vector whose \( i \)th element is \( X_{i,j} \) if \( i\in \mathcal {I} \) and 0 otherwise. In addition, each report \( j \) is associated with a quality parameter \( \epsilon _j \), which denotes the probability that any given feature is populated with an incorrect data item. We thus also have a random quality indicator matrix \( Y \in \lbrace 0, 1\rbrace ^{N \times D} \), where \( Y_{i,j} \sim {\rm Bernoulli} (1-\epsilon _j) \) is 1 if the item is correct, and 0 otherwise. Lastly, processing each report \( j \) and examining whether feature \( i \) is correctly populated incurs a cost (e.g., processing time), which is modeled as a bounded independent random variable \( C_{i,j} \). We do not specify its distribution but assume \( \mathbb {E}[C_{i,j}] = \alpha _{i,j} \) and \( \mathbb {E}[ \Vert C_{{\cdot }j}\Vert _1 ] = \sum _{i=1}^{N} \alpha _{i,j} \doteq \alpha _j \). We denote the random cost matrix \( C = [C_{i,j}] \in \mathbb {R}_{+}^{N \times D} \) with mean \( A = [\alpha _{i,j}] \).
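As a concrete illustration, the generative model above can be sketched in a few lines. The sizes and parameter values below are toy assumptions of ours, and since the model only requires the costs \( C_{i,j} \) to be bounded independent variables with mean \( \alpha_{i,j} \), the exponential draw is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

N, D = 6, 4                          # max features / max reports (toy sizes)
P = rng.uniform(0.1, 0.9, (N, D))    # existence probabilities p_{i,j}
eps = rng.uniform(0.0, 0.3, D)       # per-report error probabilities epsilon_j
A = rng.uniform(0.5, 2.0, (N, D))    # mean costs alpha_{i,j}

def sample_record(P, eps, A, rng):
    """Draw one record: feature indicators X, quality indicators Y, costs C."""
    N, D = P.shape
    X = rng.binomial(1, P)                           # X_{i,j} ~ Bernoulli(p_{i,j})
    Y = rng.binomial(1, np.tile(1.0 - eps, (N, 1)))  # Y_{i,j} ~ Bernoulli(1 - eps_j)
    C = rng.exponential(A)                           # illustrative cost draw, mean A
    return X, Y, C

X, Y, C = sample_record(P, eps, A, rng)
```

The realizations `X`, `Y`, and `C` correspond to the \( x \), \( y \), and \( c \) that remain hidden from the control unit until the reports are processed.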
Before proceeding to the system description, we emphasize that the statistical modeling of \( X \), \( Y, \) and \( C \) reflects the unknown and varying characteristics of the patient records. The realizations \( x \) (what features actually exist in the report), \( y \) (what existing features are incorrectly populated), and \( c \) (the processing cost of extracting these features and examining them) of a record are unknown until they are processed by MARS, with a control unit making the decisions while facing these uncertainties to assist the overall data entry workflow.
2.3 System Description
Figure 2 illustrates the relationships among various entities in the proposed data entry system and how they interact with each other in MARS. Note again that we focus on the decision problem of which report to examine in the control unit of Figure 2, and how the machine interacts with a human expert for error resolution. Specifically, the control unit adopts a policy \( \pi \) to determine which report to process next. At time step \( t \), let \( {\tilde{\mathcal {D}}(t)} \) be the set of reports in \( \tilde{\mathcal {D}} \) that have not yet been examined. We use \( \mathcal {N}(t) \) to denote the set of features whose correct data items are still unknown after time \( t \). The control unit selects report \( \tilde{D}(t) \) and the processing unit extracts relevant features within \( \mathcal {N}(t-1) \) from this report. A feature indicator matrix \( x_{\mathcal {N}(t-1),\tilde{D}(t)} \) is revealed to the control unit, and the actual extracted features are sent for human examination, which returns the error indicator matrix \( y_{\mathcal {N}(t-1),\tilde{D}(t)} \) together with the processing cost. The extracted features are then re-processed with error resolution, and the existing and correct data items are finally stored in the data registry, i.e., \( \begin{equation*} \tilde{x}_{\mathcal {N}(t-1),\tilde{D}(t)} = x_{\mathcal {N}(t-1),\tilde{D}(t)} \wedge y_{\mathcal {N}(t-1),\tilde{D}(t)}. \end{equation*} \) The current realization is stored and the processing continues to \( t+1 \).
Fig. 2. The MARS framework for record selection in building cancer registry.
2.4 Problem Formulation
The performance of policy \( \pi \) at time \( t \) is measured by the number of successfully collected features (1) \( \begin{equation} \sigma (t) = N - |\mathcal {N}(t)|. \end{equation} \) The goal for designing policy \( \pi \) is to maximize the expected number of successfully collected features at the end of budget \( T \) \( \begin{equation*} \max _{\pi } \mathbb {E} \left[ \sigma (T) \right], \end{equation*} \) with minimal expected total cost \( \begin{equation*} \mathbb {E} \left[ \sum _{t=1}^{T} \left\Vert c_{\mathcal {N}(t-1),\tilde{D}(t)} \right\Vert _{1} \right]. \end{equation*} \)
The report selection policy design can be posed in both fixed-order and adaptive settings, both of which assume that the probabilistic information of \( X_{i,j} \), \( Y_{i,j}, \) and \( C_{i,j} \) is available to the MARS control unit a priori. In a fixed-order setting, the selection of any future reports after \( t \) does not depend on the actual realizations \( x, y \), and \( c \) in the past. Essentially, the fixed-order policy \( \pi _{\sf {fo}} \) re-orders the set of report types \( \mathcal {D} \) and obtains \( \tilde{\mathcal {D}} \) using only the knowledge of \( \lbrace P, \epsilon , A\rbrace \).
In the adaptive setting as illustrated in Figure 2, policy \( \pi _{\sf {a}} \) sequentially selects a non-repetitive report from \( \mathcal {D} \). Let us define the realization \( \psi (t) \) as the set of selected reports, observed feedback, and actual cost at the end of time \( t \) (2) \( \begin{equation} \psi (t)=\left\lbrace \left(\tilde{D}(\tau), x_{\mathcal {N}(\tau -1),\tilde{D}(\tau)}, y_{\mathcal {N}(\tau -1),\tilde{D}(\tau)}, c_{\mathcal {N}(\tau -1),\tilde{D}(\tau)} \right)^{t}_{\tau =1} \right\rbrace . \end{equation} \) The policy for the adaptive setting is formally defined as a mapping from the past realizations to the next report to choose (3) \( \begin{equation} \tilde{D}(t+1) = \pi _{\sf {a}}\left(\psi (t), \lbrace P, \epsilon , A \rbrace \right). \end{equation} \)
In both settings, the learner has access to \( \lbrace P, \epsilon , A\rbrace \) as an input of the problem. Such assumptions are reasonable because for certain use cases, these values can be estimated from, e.g., the existing registries, representing a quantitative interpretation of domain expert guidance. However, there are also situations where such prior domain knowledge does not exist, and the system is built entirely from a clean slate. When this information is unavailable a priori, an online setting is motivated where the proposed solution needs to simultaneously determine the future actions and learn the problem-dependent variables. This problem is addressed in Section 4.
3 REPORT SELECTION ALGORITHMS
Three categories of data entry algorithms are presented in this section: non-adaptive, adaptive, and online. The differences primarily lie in whether the a priori statistical knowledge is available, and whether the actions can be based on actual realizations of the reports. This section focuses on developing both non-adaptive and adaptive policies for the data entry problem where the learner has access to the values of \( p_{i,j} \) and \( \epsilon \), which can be estimated from existing cancer registries if available. The difference between the two policies is that the adaptive algorithm takes into account the realization \( \psi (t) \) when choosing the next report, while the non-adaptive algorithm does not. As a result, the non-adaptive algorithm always outputs the same sequence of reports for all patients.
To facilitate the discussion, we summarize the main notations in Table 1. We also note that all proposed algorithms in Sections 3 and 4 have theoretical performance analyses that are presented in Section 5.
Table 1. Table of Notations
3.1 FOG
In the fixed-order setting, since no realization can be observed to facilitate the future decisions, we can absorb \( \epsilon _j \) into the “effective” existence probability \( \tilde{p}_{i,j} = p_{i,j} (1-\epsilon _j) \). In other words, only the probability that a correct feature is collected matters when making the report selection decision.
Generally, the optimal solution can be computed from an exhaustive search of all possible combinations, where for each chosen \( \tilde{\mathcal {D}} \) of \( T \) reports, we have (4) \( \begin{eqnarray} \mathbb {E} \left[ \sigma (T) \right] \doteq F(\tilde{\mathcal {D}}) = \sum _{n=1}^{N} F_n(\tilde{\mathcal {D}}), \end{eqnarray} \) where (5) \( \begin{equation} F_n(\tilde{\mathcal {D}}) = 1 - \prod _{d \in \tilde{\mathcal {D}}} \left(1 - \tilde{p}_{n,d} \right). \end{equation} \)
For a given set of reports \( \mathcal {D} \), adding a new report \( d \) would result in an increased expected reward vector for all features (6) \( \begin{equation} \Delta (d | \mathcal {D}) = \left[ F_1(\lbrace d \cup \mathcal {D} \rbrace) - F_1(\lbrace \mathcal {D} \rbrace), \ldots , F_N(\lbrace d \cup \mathcal {D} \rbrace) - F_N(\lbrace \mathcal {D} \rbrace) \right], \end{equation} \) while increasing the average total cost by \( \alpha _{d} \). The FOG algorithm sequentially selects a new report \( d \) that maximizes the marginal sum increase normalized by the average additional cost \( \begin{equation*} \frac{|| \Delta (d | \mathcal {D}) ||_{1} }{ \alpha _{d} }. \end{equation*} \) FOG is formally presented in Algorithm 1.
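The greedy ratio rule above admits a compact sketch. This is a plain-Python rendering of the selection logic, not the exact pseudocode of Algorithm 1; `P_tilde` holds the effective probabilities \( \tilde{p}_{i,j} \) and `alpha` the per-report mean costs \( \alpha_j \):

```python
import numpy as np

def fog(P_tilde, alpha, T):
    """Fixed-order greedy: pick up to T reports, each maximizing the marginal
    coverage gain ||Delta(d | D)||_1 normalized by the average cost alpha_d."""
    N, D = P_tilde.shape
    chosen = []
    miss = np.ones(N)  # prob. that feature i is still uncovered by chosen reports
    for _ in range(min(T, D)):
        best, best_ratio = None, -1.0
        for d in range(D):
            if d in chosen:
                continue
            # F_n(D u {d}) - F_n(D) = miss_n * p_tilde_{n,d}, summed over n
            gain = float(np.sum(miss * P_tilde[:, d]))
            ratio = gain / alpha[d]
            if ratio > best_ratio:
                best, best_ratio = d, ratio
        chosen.append(best)
        miss *= (1.0 - P_tilde[:, best])
    return chosen
```

Because the choice depends only on \( \lbrace \tilde{p}_{i,j}, \alpha_j \rbrace \), the returned order is the same for every patient, as expected for a fixed-order policy.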

3.2 VOG
The difference between the adaptive and non-adaptive settings is that the adaptive one takes into account the realization \( \psi (t) \) when choosing the next report. This also allows for potential error resolution before the next decision is taken.
We use \( \mathcal {D}(t) \) and \( \mathcal {D}_{\sf {res}}(t) \) to denote the set of examined reports and the set of remaining reports after time step \( t \), respectively. At time \( t \), the algorithm has access to not only \( \mathcal {D}(t-1) \), but also the realization \( \psi (t-1) \). The importance of the actual realization is two-fold. First, we can now apply the greedy procedure only to those features that still have reward 0 (meaning that the features are unknown, due to either no or incorrect realizations in previous steps). Second, after the current decision \( d \) is made, we can observe the feature realizations, thus only keeping the realized features with the correct data, with the help of domain experts.
The complete VOG algorithm is presented in Algorithm 2. We first define \( \bar{\Delta }(d|\psi (t-1)) \) as the conditional normalized expected gain for choosing report \( d \) at time \( t \). For each \( d \in \mathcal {D}_{\sf {res}}(t-1) \), we have (7) \( \begin{equation} \bar{\Delta }(d|\psi (t-1)) = \frac{ (1-\epsilon _d) \mathbb {E}\left[\sum _{i \in \mathcal {N}(t-1)}X_{i,d} \right] }{ \mathbb {E}\left[\sum _{i \in \mathcal {N}(t-1)}C_{i,d} \right] } = \frac{ \sum _{i \in \mathcal {N}(t-1)} \tilde{p}_{i,d} }{ \sum _{i \in \mathcal {N}(t-1)} \alpha _{i,d} }. \end{equation} \) Note that in Equation (7), the numerator represents the expected number of additional correct features registered after examining report \( d \), while the denominator represents the expected cost associated with it; both are based only on fields that have not been registered before \( t \). The adaptive algorithm then greedily chooses the report with the greatest \( \bar{\Delta }(d|\psi (t-1)) \) at time \( t \) (step 2), and then extracts features from the selected report, resulting in a realized feature vector \( x_{\mathcal {N}(t-1), d} \) (step 4).
The design for error resolution relies on a human-in-the-loop action, as it typically requires expert knowledge to recognize errors or inconsistencies in the reports. At this stage, a registration staff member examines the populated features and flags the ones with errors (step 5). The adaptive algorithm thus only keeps the features that are populated with correct data for future evaluation (step 6). The total cost of extracting features from the selected report, expert examination, and feature correction is then observed by the control unit in MARS (step 7).
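A minimal sketch of one VOG iteration, combining the gain in Equation (7) with the keep-only-correct update \( \tilde{x} = x \wedge y \). Variable names are ours, and the expert examination is simulated by a supplied indicator vector:

```python
import numpy as np

def vog_step(remaining_reports, unknown, P_tilde, A):
    """Pick the remaining report with the largest conditional gain per unit
    cost (Eq. (7)), restricted to the still-unknown features."""
    idx = sorted(unknown)
    best, best_val = None, -1.0
    for d in remaining_reports:
        num = P_tilde[idx, d].sum()   # expected number of new correct features
        den = A[idx, d].sum()         # expected processing cost
        val = num / den if den > 0 else 0.0
        if val > best_val:
            best, best_val = d, val
    return best

def register(unknown, x_col, y_col):
    """Keep features that exist (x) AND pass expert examination (y), i.e.
    x_tilde = x AND y; return the updated set of still-unknown features."""
    confirmed = {i for i in unknown if x_col[i] and y_col[i]}
    return unknown - confirmed
```

Repeating `vog_step` and `register` until the budget \( T \) is exhausted, and shrinking `unknown` after each step, reproduces the adaptive behavior described above.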

We remark that the VOG design for the adaptive record selection problem not only allows both the machine and the human expert to exploit their strengths, but also improves the interpretability of the final outcome of an AI algorithm via human involvement, not just at the last stage of decision making but throughout the entire process. In addition, fairness can easily be incorporated, as domain experts can influence the extracted features by nulling some data items. The singular criterion of error resolution can be extended to incorporate fairness criteria to achieve a more balanced objective.
3.3 Queuing Delay
The expected queueing delay of a patient record is proportional to the expected total cost under the different algorithms. For FOG, the set of features to be extracted remains the same, i.e., \( \mathcal {N}(t) = [N] \) regardless of \( t \), so the expected total cost is \( \sum _{j \in \mathcal {D}_{\sf {fog}}} \alpha _j \). The VOG algorithm, on the other hand, has an expected total cost \( \mathbb {E} [ \sum _{t=1}^{T} \Vert c_{\mathcal {N}(t-1),\tilde{D}(t)} \Vert _{1}] \), which is difficult to analyze but can be shown to be superior to that of FOG for the same set of selected reports. Intuitively, the adaptive algorithm prioritizes the reports that, on average, generate more useful (unknown) information while incurring less cost. Hence, for a given total-cost budget, more records can be processed, reducing the queueing delay. This aspect is further evaluated in the experiments.
4 ONLINE ALGORITHMS
The online setting is arguably a more important and challenging use case, because an accurate estimate of the report “quality” may not always be available a priori. In fact, such issues are not limited to AI algorithms—senior data entry clerks often know from experience which reports or sources are more likely to provide the needed information, whereas junior clerks have to gain such knowledge by trial and error. This aspect can be tackled by a new online learning-based control unit design, which is the focus of this section.
As it is clear from Algorithms 1 and 2, only the effective existence probability \( \tilde{p}_{i,j} \) impacts the algorithms. Hence, without loss of generality and to simplify the notation, we ignore \( \epsilon \) and only focus on estimating \( P \). In addition, the cost distribution \( A \) needs to be learned online as well for the VOG algorithm (it is not needed in FOG).
The online algorithm is designed as a clean-slate solution with no prior knowledge of the quality and processing cost of the records, so that the learner determines policies over multiple episodes. In each episode, it selects \( T \) reports dictated by a policy and observes the realizations of the reports that are selected as feedback. Decisions of the control engine are based on the feedback that it has received in previous episodes.
4.1 Online FOG
It is possible to form a direct estimate of the unknown \( p_{i,j} \) from the selected reports and collected data items. However, since future decisions are based on these estimates, the algorithm needs to account for the estimation uncertainty. For simplicity, we consider the cost to be fixed and the same for all reports. With this assumption, the fixed-order online problem can be considered a particular instance of the probabilistic maximum coverage (PMC) problem described in [4]. By applying the principle of [4] to Algorithm 1, we obtain the fixed-order online greedy record processing in Algorithm 3.

For each element of the Bernoulli distribution matrix, Algorithm 3 maintains an average of the previous observations (\( \hat{p}_{i,j} \)) and an exploration bonus (\( \rho _{i,j} \)) based on the number of such observations (\( K_{i,j} \)). In each episode, the exploration bonuses are added to the averages to obtain an “optimistic” estimate of \( P \). Then, Algorithm 1 is run with these optimistic estimates as input to determine a policy.
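The optimistic-estimate machinery can be sketched as follows. The exact exploration bonus \( \rho_{i,j} \) in Algorithm 3 follows [4]; the constant in the bonus below is an illustrative choice of ours:

```python
import numpy as np

def update(p_hat, K, i, j, observation):
    """Incremental mean update after observing X_{i,j} = observation (0 or 1)."""
    K[i, j] += 1
    p_hat[i, j] += (observation - p_hat[i, j]) / K[i, j]

def optimistic_estimates(p_hat, K, episode):
    """Sample means plus an exploration bonus that shrinks with the number of
    observations K_{i,j}, clipped back to valid probabilities."""
    bonus = np.sqrt(1.5 * np.log(max(episode, 2)) / np.maximum(K, 1.0))
    return np.minimum(p_hat + bonus, 1.0)
```

In each episode, Algorithm 1 would then be run with `optimistic_estimates(p_hat, K, episode)` in place of the true \( P \) to determine that episode's policy.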
4.2 Online VOG
The online problem can be viewed as a variation of adaptive submodular maximization in a bandit setting, as discussed in [9]. The difference is that [9] deals with binary or categorical realizations of items, while our work handles vectorial realizations of reports. We adapt the OASM algorithm of [9] as Algorithm 4. Unlike Algorithm 3, this algorithm also maintains averages of the reciprocal costs, that is, \( \beta _{i,j}\doteq 1/\alpha _{i,j} \). For simplicity, we assume that all costs are lower bounded by 1, but our results can be generalized to any positive lower bound on the costs. This assumption guarantees that \( \beta _{i,j}\in (0,1] \).
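A rough sketch of a per-report optimistic index for the online adaptive case. The precise index and bonus in Algorithm 4 follow the OASM construction of [9]; the surrogate \( \sum_i \hat{p}_{i,d}\hat{\beta}_{i,d} \) below stands in for \( \sum_i \hat{p}_{i,d} / \sum_i \hat{\alpha}_{i,d} \) and is an assumption of ours, not the paper's exact formula:

```python
import numpy as np

def optimistic_index(p_hat, beta_hat, K, unknown, d, episode):
    """Optimistic score for report d: inflate both the existence-probability
    estimates and the reciprocal-cost estimates beta = 1/alpha by exploration
    bonuses; beta lies in (0, 1] since costs are lower bounded by 1."""
    idx = sorted(unknown)
    bonus = np.sqrt(1.5 * np.log(max(episode, 2)) / np.maximum(K[idx, d], 1.0))
    p_ucb = np.minimum(p_hat[idx, d] + bonus, 1.0)
    b_ucb = np.minimum(beta_hat[idx, d] + bonus, 1.0)
    return float((p_ucb * b_ucb).sum())
```

The VOG greedy rule of Algorithm 2 would then select, in each step, the remaining report with the largest optimistic index, and update `p_hat`, `beta_hat`, and `K` from the observed realizations.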

5 PERFORMANCE ANALYSIS
The performance study in this section provides insight into the proposed algorithms. In particular, we see that the different principles embedded in the algorithms (monotone submodularity for FOG and VOG, and upper confidence bounds for the online algorithms) help to bound their performance.
5.1 FOG and VOG
For a deterministic set of costs, which could be different across reports, the analysis is a simple application of the budgeted maximum coverage problem in [13], and we arrive at the following result leveraging monotone submodularity.
\( \mathcal {D}_{\sf {fog}} \) obtained from Algorithm 1 achieves: (8) \( \begin{equation} F(\mathcal {D}_{\sf {fog}}) \ge \left(1-\frac{1}{e} \right) \max _{\tilde{\mathcal {D}} \subseteq \mathcal {D}: |\tilde{\mathcal {D}} | \le T} F(\tilde{\mathcal {D}}). \end{equation} \)
This is an application of the well-known \( (1-1/e) \)-approximation [8] once we verify that function \( F_n(\cdot) \) is monotone submodular, which is straightforward using the definition.□
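For completeness, the check can be written out explicitly. The marginal gain of adding report \( d \) to a set \( \mathcal{D} \) for feature \( n \) follows directly from Equation (5):

```latex
F_n(\mathcal{D} \cup \{d\}) - F_n(\mathcal{D})
  = \tilde{p}_{n,d} \prod_{d' \in \mathcal{D}} \bigl(1-\tilde{p}_{n,d'}\bigr)
  \;\ge\; 0
  \qquad \text{(monotonicity)},
```

```latex
\tilde{p}_{n,d} \prod_{d' \in \mathcal{B}} \bigl(1-\tilde{p}_{n,d'}\bigr)
  \;\le\;
\tilde{p}_{n,d} \prod_{d' \in \mathcal{A}} \bigl(1-\tilde{p}_{n,d'}\bigr)
  \qquad \text{for } \mathcal{A} \subseteq \mathcal{B},\; d \notin \mathcal{B}
  \qquad \text{(submodularity)},
```

since each additional factor \( (1-\tilde{p}_{n,d'}) \le 1 \) can only shrink the product, i.e., the marginal gain is non-increasing in the chosen set.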
The analysis for Algorithm 2 is much more involved due to the simultaneous impacts of sequential rewards and costs. If we assume the same constant cost for all reports, we have Theorem 2.
The expected reward \( f_{\sf {avg}}(\pi) \) of Algorithm 2 achieves: (9) \( \begin{equation} f_{\sf {avg}}(\pi) \ge \left(1-\frac{1}{e}\right)f_{\sf {avg}}(\pi ^*), \end{equation} \) where \( \pi ^* \) is the optimal adaptive policy, and all costs are assumed constant and equal.
We first define two operations on policies. The policy truncation \( \pi _{[i]} \) denotes a new policy obtained by running policy \( \pi \) for at most \( i \) steps. The policy concatenation \( \pi _1 @ \pi _2 \) denotes a new policy obtained from first running \( \pi _1 \) to its finish and then running \( \pi _2 \) without considering the realizations collected from \( \pi _1 \), i.e., running \( \pi _2 \) from a fresh start. If \( \pi _2 \) chooses a report \( d \) that has been examined previously by \( \pi _1 \), it gets the same realization \( x_{{\cdot }, d} \) as before.
For the optimal adaptive policy \( \pi ^* \) that is allowed to run \( T \) steps, we have the following inequality for the greedy policy \( \pi \): (10) \( \begin{align} f_{\sf {avg}}(\pi ^*) &\le f_{\sf {avg}}(\pi _{[i]} @ \pi ^*) \\ & \le f_{\sf {avg}}(\pi _{[i]}) + T\left(f_{\sf {avg}}(\pi _{[i]} @ \pi ^*_{[1]}) - f_{\sf {avg}}(\pi _{[i]})\right) \\ & \le f_{\sf {avg}}(\pi _{[i]}) + T\left(f_{\sf {avg}}(\pi _{[i+1]}) - f_{\sf {avg}}(\pi _{[i]})\right). \end{align} \) The first inequality follows from the monotonicity of \( \sigma \): examining more reports never decreases the reward. The second inequality is due to the submodularity of \( \sigma \). The third inequality holds because \( \pi \) is a greedy policy.
Let us define \( \delta _i \doteq f_{\sf {avg}}(\pi ^*) - f_{\sf {avg}}(\pi _{[i]}) \). After rearranging the terms in the above inequality, we obtain \( \delta _{i+1} \le (1 - \frac{1}{T}) \delta _{i} \). Without loss of generality, we assume the greedy policy is also allowed to run \( T \) steps. In this case, we have \( \delta _T \le (1 - \frac{1}{T})^T \delta _{0} \lt \frac{1}{e}\delta _{0} \), where for this last inequality we have used the fact that \( 1 - x \lt e^{-x} \) for all \( x \gt 0 \). Since \( \delta _0 = f_{\sf {avg}}(\pi ^*) \) (a policy truncated at zero steps collects no reward), rearranging the terms proves Theorem 2.□
5.2 Online Learning
The online algorithms estimate \( P \) and \( A \) while simultaneously selecting reports, which can lead to suboptimal performance. This performance is measured by comparing the cumulative reward with the cumulative reward that could have been achieved by the algorithms with full knowledge of \( P \) and \( A \) from the very beginning of the process; the expected difference between the two quantities is defined as the regret. For the online FOG case, the regret up to episode \( \tilde{T} \) can be written as \( \begin{align*} \text{Reg}_{\sf {fog}}(\tilde{T}) = \tilde{T}\left(1-\frac{1}{e}\right)\max _{\tilde{\mathcal {D}}}F(\tilde{\mathcal {D}}) - \mathbb {E}\left[ \sum _{\tau =1}^{\tilde{T}}F(\mathcal {D}_{\tau }) \right] ~, \end{align*} \) where \( \mathcal {D}_{\tau } \) is the policy determined by the learner at episode \( \tau \). For the adaptive case, the regret up to episode \( \tilde{T} \) can be written as \( \begin{align*} \text{Reg}_a(\tilde{T}) = \tilde{T}f_{\sf {avg}}(\pi ^g) - \mathbb {E}\left[ \sum _{\tau =1}^{\tilde{T}}f_{\sf {avg}}(\pi _{\tau }) \right] ~, \end{align*} \) where \( \pi ^g \) is the greedy policy given in Algorithm 2 and \( \pi _{\tau } \) is the policy determined by the learner at episode \( \tau \).
Theorem 3 bounds the regret of Algorithm 3.
The regret of Algorithm 3 up to episode \( \tilde{T} \) is bounded as \( \begin{align*} \text{Reg}_{\sf {fog}}(\tilde{T}) \le \sum _{ i\in [N],\, j\in [D]:\, \Delta _{\min }^{i,j}\gt 0 } \frac{12N^2D^2\log \tilde{T}}{\Delta _{\min }^{i,j}} + \left(1+\frac{\pi ^2}{3} \right)ND\Delta _{\max } ~, \end{align*} \) where \( \Delta _{\min }^{i,j}=(1-1/e)\max _{\tilde{\mathcal {D}}}F(\tilde{\mathcal {D}}) - \max _{\tilde{\mathcal {D}}:j\in \tilde{\mathcal {D}}}F(\tilde{\mathcal {D}}) \) and \( \Delta _{\max }=(1-1/e)\max _{\tilde{\mathcal {D}}}F(\tilde{\mathcal {D}}) - \min _{\tilde{\mathcal {D}}}F(\tilde{\mathcal {D}}) \).
The proof of Theorem 3 can be derived by noting that it is a special case of Theorem 1 in [4]. For more details, see Section 4.2 in [4], which describes how Theorem 1 is applied to the PMC problem.
Theorem 4 bounds the adaptive regret of Algorithm 4.
The adaptive regret of Algorithm 4 up to episode \( \tilde{T} \) is bounded as \( \begin{align*} \text{Reg}_a(\tilde{T}) \le N^2T\sum _{j=1}^D\ell _j+\frac{1}{3}N^2D(D+1)T ~, \end{align*} \) where \( \Delta _j=\min _{\psi :j\ne \pi ^g(\psi)}(\bar{\Delta }(\pi ^g(\psi)|\psi)-\bar{\Delta }(j|\psi)) \) and \( \ell _j=\lceil 6(N+N^2)^2\log \tilde{T}/\Delta _j^2\rceil \).
The proof of Theorem 4 is technically very complicated and is deferred to Appendix A.
6 EXPERIMENTS
6.1 Setup
In the experiments, new patient records (e.g., the national, local, and other feeds in the Scottish Cancer Registry [15]) arrive daily following a Poisson arrival process with rate \( \lambda \) (patients per day). The records are generated, stored, and processed as described in Section 2. If the accumulated cost in one day reaches a predefined budget (8 working hours), the processor stops for that day. The remaining reports in the queue wait to be processed in subsequent days.
The control unit in the data entry processor employs the algorithms detailed in the previous sections to determine the report processing order. In addition, we allow an early exit when the number of successfully collected features exceeds a predefined threshold \( \eta \). This is a meaningful setting that is widely adopted in practice, particularly for varying reports, because not all data may be available for each patient. We assume that the processor can examine at most \( T \) reports for each patient. If the number of registered features does not reach \( \eta \), this particular processing attempt is marked as a failure.
As discussed before, the registered features may contain errors. We draw the error probability \( \epsilon _d \) uniformly from \( [0,1] \). In the constant cost case, the cost \( \alpha _d \) of examining each report is fixed at one, while in the varying cost case it is drawn from an exponential distribution with mean one. When the VOG algorithm invokes human-in-the-loop error resolution, it incurs 20% additional (human) cost, a figure obtained from practice [15].
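The error and cost parameters above can be sampled as follows; the dimension `D` and variable names are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
D = 128                                     # number of report types (assumed)
eps = rng.uniform(0.0, 1.0, size=D)         # error probability per report type
cost_constant = np.ones(D)                  # constant-cost case: alpha_d = 1
cost_varying = rng.exponential(scale=1.0, size=D)  # varying-cost case: Exp(1)
HUMAN_OVERHEAD = 1.2                        # 20% extra cost for error resolution
```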
Whether a feature is registered from a report depends on the underlying probability matrix \( P \). We generate \( P \) so that only a small number of entries in each column are non-zero; hence, only a small set of features is likely to be registered from a particular type of report. This is a reasonable assumption because medical reports fall into different “topics” and each topic may contain very different information. For example, medical test reports are likely to contain bio-markers but are very unlikely to cover treatment plans or diagnosis codes. We also ensure that features overlap across reports to a certain degree, which is very common in reality: for example, gender and age information is present in almost all medical reports.
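One possible way to generate such a sparse \( P \) with a few shared "common" features is sketched below. The matrix sizes, sparsity level, and probability ranges are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
N, D = 50, 16                   # features x report types (assumed sizes)
P = np.zeros((N, D))

# A few features (e.g., gender, age) appear in almost every report type.
common = rng.choice(N, size=3, replace=False)
for j in range(D):
    # Each report type's "topic" covers only a handful of features.
    topic = rng.choice(N, size=5, replace=False)
    P[topic, j] = rng.uniform(0.5, 1.0, size=5)
    P[common, j] = rng.uniform(0.8, 1.0, size=3)  # overlap across all columns
```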
6.2 Baselines
We compare the proposed FOG algorithm and VOG algorithm with two baseline approaches. The first baseline (Random) goes through the reports in a random order. The algorithm stops when either \( \eta \) features have been registered or it has gone through all \( T \) reports. This baseline mimics a real-world case where inexperienced data entry staff (possibly without rigorous training) randomly process the incoming patient reports.
The second baseline algorithm (Naive Order) is designed to resemble a one-size-fits-all report selection guideline. This baseline mimics a real-world situation where a detailed, step-by-step data entry procedure is provided to the staff, who are asked to strictly follow the documented instructions. It orders reports by the expected number of features per unit cost, i.e., \( \sum _i P_{ij} / \alpha _j \). The algorithm does not adapt to the partial realization \( \psi (t) \) obtained during processing.
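The Naive Order rule amounts to a single ranking computation. A minimal sketch, with a toy `P` and `alpha` of our own choosing:

```python
import numpy as np

# N=2 features, D=3 report types (toy numbers, for illustration only)
P = np.array([[0.9, 0.1, 0.4],
              [0.8, 0.2, 0.4]])
alpha = np.array([2.0, 1.0, 1.0])   # examination cost per report type

score = P.sum(axis=0) / alpha       # sum_i P_ij / alpha_j
naive_order = np.argsort(-score)    # process highest-scoring reports first
```

Because the scores are computed once from \( P \) alone, the resulting order is the same for every patient, regardless of what has been observed so far.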
6.3 Main Results
Our experiments are based on an anonymized dataset collected from the UK national cancer registry. The dataset contains detailed information about the operation of the cancer registry, including (1) each patient record’s arrival and processing times, (2) the number of fields (features) successfully extracted from each report, and (3) whether the patient was deemed successfully registered. The dataset covers the five-year period from 2014 to 2018. Due to its highly sensitive nature, we extracted key statistics that faithfully represent the real-world cancer registry (shown in Table 2). Based on these statistics, we perform two representative experiments, whose results are reported in this section. Table 2 lists the configuration for the FOG and VOG adaptive algorithms, while Table 3 lists that for the online algorithms; both settings closely mirror the real scenario. In the first experiment, a total of 60,000 patients are added to the queue at an average rate of 2,000 patients per day. We evaluate performance under both constant cost and varying random cost. The simulation is repeated independently 20 times.
Table 2. Basic Experiment Configuration
In Table 4, we present the results averaged across the 20 independent runs, together with standard deviations. We measure the processing capacity of each algorithm as the average number of successfully registered patients per day. We are also interested in how long a patient’s record has to wait before being processed, which is reflected in the wait time (days) metric. Last but not least, the average cost metric tracks the average time (minutes) spent processing each patient. A successful algorithm should have high capacity, low average cost, and low wait time.
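The three metrics can be computed directly from per-patient simulation logs. The log fields below (`arrival_day`, `processed_day`, `cost_minutes`, `success`) are our assumed bookkeeping, not the paper's data schema.

```python
def summarize(logs, num_days):
    """Compute (capacity, wait time, average cost) from per-patient records."""
    successes = [r for r in logs if r["success"]]
    capacity = len(successes) / num_days                    # patients per day
    wait = sum(r["processed_day"] - r["arrival_day"] for r in logs) / len(logs)
    avg_cost = sum(r["cost_minutes"] for r in logs) / len(logs)  # minutes
    return capacity, wait, avg_cost
```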
Average Cost is measured in Minutes. Wait Time is measured in Days. The bold values indicate the best performance.
Table 4. Summary of the First Experiment Result
The VOG algorithm achieves the best performance among all the algorithms studied, even though it incurs 20% more cost due to human error resolution. Compared with the second-best algorithm, FOG, the processing capacity of VOG is 3 to 4 times higher, and we see a similar order of reduction in its average cost and wait time. This confirms our proposition that an efficient algorithm should adapt to previous realizations and update its policy accordingly.
Turning to a comparison of the constant and varying cost cases, we see that the AI-powered algorithms do even better in the varying cost cases. The algorithms are able to exploit the fact that some reports have lower cost and higher marginal values than others.
The first experiment does not exhibit strong heterogeneity: all patient records have the same number of reports. In reality, as discussed earlier in the article, patient reports often vary significantly. For example, if the physical condition of a particular patient does not allow invasive medical tests, the corresponding reports will not be available. In the second experiment, we therefore focus on record heterogeneity and only allow a random subset of reports to be available for each patient. The cardinality of the random subset is uniformly distributed between \( D_{min} = 64 \) and \( D_{max} = 128 \), so that at most \( 50\% \) of the reports are unavailable for a given patient record. Table 5 presents the experiment results. We observe that VOG again achieves the top performance, mainly because it can react to the adversarial situation in which reports may be randomly missing.
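Sampling the available-report subset described above is straightforward; this sketch uses the stated \( D_{min} \) and \( D_{max} \), with the function name as our own assumption.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
D, D_min, D_max = 128, 64, 128

def sample_available_reports():
    """Draw the set of report indices that exist for one patient."""
    k = rng.integers(D_min, D_max + 1)           # cardinality ~ U[64, 128]
    return rng.choice(D, size=k, replace=False)  # distinct report indices
```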
Average Cost is measured in Minutes. Wait Time is measured in Days. The bold values indicate the best performance.
Table 5. Summary of the Experiment Result When at Most 50% of Reports are Unavailable
6.4 Performance of Online Algorithms
The online FOG and VOG algorithms are evaluated in the same experiment. Figure 3 illustrates how the processing capacity of the online algorithms grows over time as their estimate of \( P \) improves; they gradually converge to their corresponding benchmarks. Furthermore, the relatively slow convergence observed in this experiment highlights the value of prior knowledge. In reality, the statistical estimate need not come from online operation alone: leveraging data entry systems from other similar tasks may provide a good initial estimate, which simplifies online operation and speeds up convergence.
Fig. 3. The daily processing capacity for online and benchmark algorithms. Benchmark algorithms are given the true \( P \) .
6.5 Impact of Human Error Resolution
In this section, we compare the performance of the adaptive algorithm with and without human-in-the-loop error resolution. When the algorithm decides to register a report \( d \), it obtains a realization \( x_{d} \) that is subject to error: each feature in \( x_{d} \) may be invalid with probability \( \epsilon _d \). If no error correction is made, the algorithm takes the potentially erroneous realization as the truth and proceeds as usual; as a result, the features it registers may contain errors. When a data entry clerk performs error correction, all invalid features in \( x_{d} \) are removed and the algorithm is presented with the correct information only, at the price of additional cost.
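The error model and the two correction modes can be sketched as follows. All names and the 20% overhead placement are illustrative assumptions consistent with the setup described above, not the authors' code.

```python
import random

def observe(features, eps_d):
    """Each observed feature is independently invalid with probability eps_d."""
    invalid = {f for f in features if random.random() < eps_d}
    return set(features), invalid

def register(features, eps_d, cost, human_correction):
    """Return (registered features, incurred cost) for one report."""
    realization, invalid = observe(features, eps_d)
    if human_correction:
        # A clerk strips the invalid features, at 20% extra (human) cost.
        return realization - invalid, cost * 1.2
    # Otherwise the erroneous realization is taken as the truth.
    return realization, cost
```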
When evaluating whether a patient is successfully registered, we remove all invalid features from the data registry; if the number of registered features still exceeds the threshold \( \eta \), we mark the registration as a success. We calculate the average success rate in addition to the capacity and average cost defined previously.
The performance summary is given in Table 6. Without error correction, the algorithm tends to terminate prematurely because it is over-optimistic about the number of features registered so far: it can process more patient records, but at the cost of a lower success rate. Even accounting for both effects, the algorithm with error resolution still outperforms the one without in terms of the number of daily successful registrations.
| Error Correction | Capacity | Average Cost | Success Rate |
|---|---|---|---|
| Yes | 63.0 \( \pm \) 0.3 | 9.2 \( \pm \) 0.03 | 95.3% \( \pm \) 0.2% |
| No | 46.2 \( \pm \) 0.5 | 7.1 \( \pm \) 0.02 | 54.0% \( \pm \) 0.4% |
Average Cost is measured in Minutes.
Table 6. Performance Comparison of Adaptive Algorithms with and without Human Involvement
6.6 Experiment Results with Missing Reports
In practice, not all types of reports are available for each patient. For example, if the physical condition of a patient does not allow an invasive medical test, the corresponding test reports will not be acquired. Here, we test how well the algorithms adapt to unavailable reports. We assume that only a random subset of reports is available for each patient, with cardinality uniformly distributed between \( D_{min} \) and \( D_{max} \); in this simulation, at most \( 50\% \) of the reports are unavailable for some patients. Table 7 reports the performance. VOG again achieves the top performance because it can react to the adversarial situation in which reports are missing by design.
Average Cost is measured in Minutes. Wait Time is measured in Days. The bold values indicate the best performance.
Table 7. Summary of the Experiment Result when at Most 50% of Reports are Unavailable
7 RELATED WORKS
Information processing from data is an active research area. For example, how to extract useful features from big data has been extensively studied [10, 14, 20], and handling “big data” has become increasingly important in recent years due to the explosion of generated data [12, 18]. In particular, processing unstructured medical reports using NLP is relevant to our work and has seen significant advances over the past decade: the methods have evolved from bag-of-words models [5] to more sophisticated deep-learning-based models [6, 21], and validation studies [11, 22] have shown that NLP-based data registration tools achieve human-level performance on certain tasks. Our work differs from these information processing techniques in that the focus is on automated decision making with potential human expert involvement. In other words, these specific information processing techniques are complementary to MARS.
The implications of ML and AI for job creation, losses, and transfers have been among the most debated topics due to their significant societal and economic ramifications. The authors of [2] discuss the key implications of AI for the workforce by drawing on a rubric of AI capabilities. It is recognized that, although the capabilities of ML systems have grown impressively over the past decade, they are still far from suitable for all tasks [23]. The proposed MARS framework is a concrete example of the philosophy of integrating AI and ML to assist humans in completing tasks.
The MARS framework is also broadly related to human-in-the-loop machine learning (HITL-ML) [24] and human-machine collaboration. The main principle of HITL-ML is to let humans track changes and intermediate results of the iterative ML process, thus accelerating iteration and providing quick, responsive feedback. This design principle has led to successful applications in image classification [26] and text analytics [25]. Human-machine collaboration [19] has been discussed in the context of object detection [17], where computer vision models and multiple human inputs are integrated to generate object annotations. However, that line of research focuses on reducing iteration delay and improving the performance of ML workflows, whereas our work focuses on close collaboration between humans and AI.
8 CONCLUSIONS
We have shown that with today’s ML capabilities, especially in the online learning setting, it is possible to automate complex triaging tasks and let machines assist humans with “high intelligence” jobs. The MARS framework was proposed using information processing in data entry systems as a particular example; it improves the efficiency and quality of sophisticated report selection tasks (such as cancer registration) that involve heterogeneous records, which can be highly unstructured with erroneous or inconsistent data items. The machine intelligence is principled (no “black box”): it relies on the monotone and submodular properties of the success probabilities, as well as upper confidence bounds in the online setting. The performance improvement of the proposed MARS framework was evaluated through experiments using semi-synthetic data that mimic real-world cancer patient records. Perhaps more importantly, we demonstrated that MARS not only improves overall system performance but also achieves a balanced human-machine interaction that could have important social benefits.
Appendix
A Proof of Theorem 4
First, we re-write the adaptive regret as (11) \( \begin{align} \text{Reg}_a(\tilde{T}) &= \mathbb {E}\left[\sum _{\tau =1}^{\tilde{T}}\left(f_{avg}(\pi ^g)-f_{avg}(\pi _{\tau }) \right)\right] \nonumber \\ &\le \mathbb {E}\left[\sum _{\tau =1}^{\tilde{T}}\sum _{j=1}^D\sum _{t=1}^T\mathbb {I}_{j,t,\tau }\left(f_{avg} \left(\pi ^g\right)-f_{avg}(\pi _{\tau }) \right)\right] \nonumber \\ &\le \mathbb {E}\left[N\sum _{\tau =1}^{\tilde{T}}\sum _{j=1}^D\sum _{t=1}^T\mathbb {I}_{j,t,\tau } \times \mathbb {I}\lbrace \exists i\in \mathcal {N}_{\tau }(t-1):K_{i,j}(\tau)\le \ell _j\rbrace \right] \end{align} \) (12) \( \begin{align} &+ \mathbb {E}\left[N\sum _{\tau =1}^{\tilde{T}}\sum _{j=1}^D\sum _{t=1}^T\mathbb {I}_{j,t,\tau } \times \mathbb {I}\lbrace \forall i\in \mathcal {N}_{\tau }(t-1),K_{i,j}(\tau)\gt \ell _j\rbrace \right], \end{align} \) where \( \mathbb {I}_{j,t,\tau } \) is the indicator of the event that policy \( \pi _{\tau } \) selects report \( j \) in time step \( t \) of episode \( \tau \) whereas \( \pi ^g \) would have selected another report, and \( \mathcal {N}_\tau (t) \) denotes the set of unknown features at the end of time step \( t \) in episode \( \tau \).
Then, the term in Equation (11) can be bounded trivially as \( \begin{align*} (11) \le N^2T\sum _{j=1}^D\ell _j ~. \end{align*} \)
In order to bound Equation (12), we need to introduce some new notation. Let \( \Psi _{j,t} \) denote the set of realizations \( \psi \) such that \( \mathbb {I}_{j,t,\tau }=1 \) if \( \psi _{\tau }(t-1)=\psi \), where \( \psi _{\tau }(t) \) denotes the realization observed at the end of time step \( t \) in episode \( \tau \). Given a realization \( \psi \), let \( \mathcal {N}\langle \psi \rangle \) be the corresponding set of features that are still unknown and \( j^*\langle \psi \rangle \) be the report that would be selected by \( \pi ^g \). Finally, let \( \begin{align*} f(p_{\cdot j},\beta _{\cdot j}|\psi) \doteq \frac{\sum _{i\in \mathcal {N}\langle \psi \rangle } p_{i,j}}{\sum _{i\in \mathcal {N}\langle \psi \rangle } 1/\beta _{i,j}} ~. \end{align*} \)
Note that \( f(p_{\cdot j},\beta _{\cdot j}|\psi) \) is monotonic, meaning \( f(p_{\cdot j},\beta _{\cdot j}|\psi)\lt f(p^{\prime }_{\cdot j},\beta ^{\prime }_{\cdot j}|\psi) \) if \( p_{i,j}\lt p^{\prime }_{i,j} \) and \( \beta _{i,j}\lt \beta ^{\prime }_{i,j} \) for all \( i\in [N] \). It is also Lipschitz continuous, meaning \( \begin{equation*} |f(p_{\cdot j},\beta _{\cdot j}|\psi)-f(p^{\prime }_{\cdot j},\beta ^{\prime }_{\cdot j}|\psi)| \le (N+N^2)\max _{i\in \mathcal {N}\langle \psi \rangle }\max \left\lbrace |p_{i,j}-p^{\prime }_{i,j}|,|\beta _{i,j}-\beta ^{\prime }_{i,j}| \right\rbrace . \end{equation*} \) Then, for all reports \( j \) and time steps \( t \), we have \( \begin{align*} \sum _{\tau =1}^{\tilde{T}} \mathbb {I}_{j,t,\tau }\mathbb {I}\lbrace \forall i\in \mathcal {N}_{\tau },K_{i,j}(\tau)\gt \ell _j\rbrace & \le \sum _{\tau =\ell _j+1}^{\tilde{T}} \mathbb {I}\lbrace \exists \psi \in \Psi _{j,t}: f(\bar{p}_{\cdot j}(\tau),\bar{\beta }_{\cdot j}(\tau)|\psi) \\ & \ge f(\bar{p}_{\cdot j^*\langle \psi \rangle }(\tau),\bar{\beta }_{\cdot j^*\langle \psi \rangle }(\tau)|\psi) \wedge \forall i\in \mathcal {N}\langle \psi \rangle , K_{i,j}(\tau)\gt \ell _j\rbrace . \end{align*} \)
Note that the event we are left with implies that, for some \( \psi \in \Psi _{j,t} \), at least one of the following events must be true: (13) \( \begin{align} \exists i\in \mathcal {N}\langle \psi \rangle : \hat{p}_{i,j^*\langle \psi \rangle }(\tau) &\le p_{i,j^*\langle \psi \rangle }(\tau)-\rho _{i,j^*\langle \psi \rangle }(\tau) \nonumber \\ \exists i\in \mathcal {N}\langle \psi \rangle : \hat{\beta }_{i,j^*\langle \psi \rangle }(\tau) &\le \beta _{i,j^*\langle \psi \rangle }(\tau)-\rho _{i,j^*\langle \psi \rangle }(\tau) \nonumber \\ \exists i\in \mathcal {N}\langle \psi \rangle : \hat{p}_{i,j}(\tau) &\ge p_{i,j}(\tau)+\rho _{i,j}(\tau) \nonumber \\ \exists i\in \mathcal {N}\langle \psi \rangle : \hat{\beta }_{i,j}(\tau) &\ge \beta _{i,j}(\tau)+\rho _{i,j}(\tau) \nonumber \\ f(p_{\cdot j^*\langle \psi \rangle }(\tau),\beta _{\cdot j^*\langle \psi \rangle }(\tau)|\psi) &\lt f(p_{\cdot j}(\tau),\beta _{\cdot j}(\tau)|\psi) + 2(N+N^2)\sqrt {\frac{3\log \tau }{2\ell _j}} ~. \end{align} \) To see why, assume that all of these events except the one in Equation (13) are false. Then, we have (14) \( \begin{align} f(p_{\cdot j^*\langle \psi \rangle }(\tau),\beta _{\cdot j^*\langle \psi \rangle }(\tau)|\psi) & \lt f(\bar{p}_{\cdot j^*\langle \psi \rangle }(\tau),\bar{\beta }_{\cdot j^*\langle \psi \rangle }(\tau)|\psi) \end{align} \) (15) \( \begin{align} &\le f(\bar{p}_{\cdot j}(\tau),\bar{\beta }_{\cdot j}(\tau)|\psi) \nonumber \\ &\le f(p_{\cdot j}(\tau),\beta _{\cdot j}(\tau)|\psi)+(N+N^2)\max _{i\in \mathcal {N}\langle \psi \rangle }2\rho _{i,j}(\tau) \\ &\le f(p_{\cdot j}(\tau),\beta _{\cdot j}(\tau)|\psi)+2(N+N^2)\sqrt {\frac{3\log \tau }{2\ell _j}} \nonumber ~, \end{align} \) where Equation (14) is due to the monotonicity of \( f \) and Equation (15) is due to the Lipschitz continuity of \( f \); that is, the event in Equation (13) must then hold. The probabilities of all these events except the one in Equation (13) can be bounded using Hoeffding’s inequality.
We have \( \begin{align*} &\mathbb {P}\left(\exists \psi \in \Psi _{j,t}, \exists i\in \mathcal {N}\langle \psi \rangle : \hat{p}_{i,j^*\langle \psi \rangle }(\tau) \le p_{i,j^*\langle \psi \rangle }(\tau)-\rho _{i,j^*\langle \psi \rangle }(\tau) \right) \\ &\le \sum _{i=1}^{N}\sum _{j^*=1}^D\sum _{k=1}^{\tau } \mathbb {P}\left(\hat{p}_{i,j^*}(\tau) \le p_{i,j^*}(\tau)-\rho _{i,j^*}(\tau)|K_{i,j^*}(\tau)=k \right) \\ &\le \sum _{i=1}^{N}\sum _{j^*=1}^{D}\sum _{k=1}^{\tau } \tau ^{-3} \\ &\le ND\tau ^{-2} \end{align*} \) and \( \begin{align*} & \mathbb {P} \left(\exists \psi \in \Psi _{j,t}, \exists i\in \mathcal {N}\langle \psi \rangle : \hat{p}_{i,j}(\tau) \ge p_{i,j}(\tau)+\rho _{i,j}(\tau) \right) \\ &\le \sum _{i=1}^{N}\sum _{k=1}^{\tau } \mathbb {P}\left(\hat{p}_{i,j}(\tau) \ge p_{i,j}(\tau)+\rho _{i,j}(\tau)|K_{i,j}(\tau)=k\right) \\ &\le \sum _{i=1}^{N}\sum _{k=1}^{\tau } \tau ^{-3} \\ &\le N\tau ^{-2} ~. \end{align*} \) The remaining two events can be bounded in the same way.
When \( \ell _j=\lceil 6(N+N^2)^2\log \tilde{T}/\Delta _j^2\rceil \), the event given in Equation (13) cannot occur: for all \( \psi \in \Psi _{j,t} \), we have \( \begin{align*} &f(p_{\cdot j^*\langle \psi \rangle }(\tau),\beta _{\cdot j^*\langle \psi \rangle }(\tau)|\psi) - f(p_{\cdot j}(\tau),\beta _{\cdot j}(\tau)|\psi) -2(N+N^2)\sqrt {\frac{3\log \tau }{2\ell _j}} \\ &= \bar{\Delta }(j^*\langle \psi \rangle |\psi)-\bar{\Delta }(j|\psi)-2(N+N^2)\sqrt {\frac{3\log \tau }{2\ell _j}} \\ &\ge \bar{\Delta }(j^*\langle \psi \rangle |\psi)-\bar{\Delta }(j|\psi)-\Delta _j \\ &\ge 0 ~. \end{align*} \)
Then, the term in Equation (12) can simply be bounded as \( \begin{align*} (12) &\le \sum _{j=1}^{D}\sum _{t=1}^{T}\sum _{\tau =1}^{\infty } 2N(1+D)\tau ^{-2} \le \frac{\pi ^2}{3}ND(D+1)T ~. \end{align*} \) This completes the proof of Theorem 4.
Footnotes
1 Human expert involvement for error resolution is the current state of the art, to the best of the authors’ knowledge. It is, however, conceivable that this task could be automated with advanced ML and AI techniques in the future, which is itself an interesting research question worth investigating.
2 For example, the national cancer registry in the United Kingdom receives approximately 6,000 to 10,000 records per week for 40 tumour types.
3 In theory, computers do not have to be bounded by a limited budget. However, since the MARS framework has a human component, and in order to make a fair comparison, we enforce the same budget for all methods.
REFERENCES
- [1] 2015. Creativity vs. robots. The Creative Economy and The Future of Employment. Nesta, London (2015).
- [2] 2017. What can machine learning do? Workforce implications. Science 358, 6370 (2017), 1530–1534.
- [3] 2019. Bridging near- and long-term concerns about AI. Nature Machine Intelligence 1, 1 (2019), 5–6.
- [4] 2013. Combinatorial multi-armed bandit: General framework and applications. In Proceedings of the 30th International Conference on Machine Learning. 151–159.
- [5] 2013. Knowledge extraction and outcome prediction using medical notes. In Proceedings of the ICML Workshop on Role of Machine Learning in Transforming Healthcare.
- [6] 2019. A guide to deep learning in healthcare. Nature Medicine 25, 1 (2019), 24.
- [7] 2017. The future of employment: How susceptible are jobs to computerisation? Technological Forecasting and Social Change 114 (2017), 254–280.
- [8] 2005. Submodular Functions and Optimization. Elsevier.
- [9] 2013. Adaptive submodular maximization in bandit setting. In Advances in Neural Information Processing Systems 26. Curran Associates, Inc., 2697–2705.
- [10] 2009. Information theoretic feature extraction for audio-visual speech recognition. IEEE Transactions on Signal Processing 57, 12 (2009), 4765–4776.
- [11] 2013. Validating a natural language processing tool to exclude psychogenic nonepileptic seizures in electronic medical record-based epilepsy research. Epilepsy & Behavior 29, 3 (2013), 578–580.
- [12] 2017. Signal Processing and Networking for Big Data Applications. Cambridge University Press.
- [13] 1999. The budgeted maximum coverage problem. Information Processing Letters 70, 1 (1999), 39–45.
- [14] 2019. Quaternion-based multiscale analysis for feature extraction of hyperspectral images. IEEE Transactions on Signal Processing 67, 6 (2019), 1418–1430.
- [15] 2019. Scottish Cancer Registry. Retrieved from https://www.isdscotland.org/Health-Topics/Cancer/Scottish-Cancer-Registry/. [Online; accessed: 2020-02-06].
- [16] 2019. Machine learning in medicine. New England Journal of Medicine 380, 14 (2019), 1347–1358.
- [17] 2015. Best of both worlds: Human-machine collaboration for object annotation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
- [18] 2014. Big data analysis with signal processing on graphs: Representation and processing of massive data sets with irregular structure. IEEE Signal Processing Magazine 31, 5 (2014), 80–90.
- [19] 2019. Machines as teammates: A research agenda on AI in team collaboration. Information & Management 57, 2 (2019), 103174.
- [20] 2002. Fast principal component extraction by a weighted information criterion. IEEE Transactions on Signal Processing 50, 8 (2002), 1994–2002.
- [21] 2017. Deep EHR: A survey of recent advances in deep learning techniques for electronic health record (EHR) analysis. IEEE Journal of Biomedical and Health Informatics 22, 5 (2017), 1589–1604.
- [22] 2006. Consistency and accuracy of diagnostic cancer codes generated by automated registration: Comparison with manual registration. Population Health Metrics 4, 1 (2006), 10.
- [23] 2018. The impact of artificial intelligence on work. Retrieved from https://royalsociety.org/topics-policy/projects/ai-and-work/. [Online; accessed: 2020-06-01].
- [24] 2018. Accelerating human-in-the-loop machine learning: Challenges and opportunities. In Proceedings of the 2nd Workshop on Data Management for End-To-End Machine Learning. ACM, New York, NY, 4 pages.
- [25] 2019. A study on interaction in human-in-the-loop machine learning for text analytics. In Joint Proceedings of the ACM IUI 2019 Workshops – Explainable Smart Systems.
- [26] 2015. LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv:1506.03365. Retrieved from https://arxiv.org/abs/1506.03365.
- [27] 2019. Viewpoint: Human-in-the-loop artificial intelligence. Journal of Artificial Intelligence Research 64, 1 (2019), 243–252.