Supporting Business Document Workflows via Collection-Centric Information Foraging with Large Language Models

Knowledge workers often need to extract and analyze information from a collection of documents to solve complex information tasks in the workplace, e.g., hiring managers reviewing resumes or analysts assessing risk in contracts. However, foraging for relevant information can become tedious and repetitive over many documents and criteria of interest. We introduce Marco, a mixed-initiative workspace supporting sensemaking over diverse business document collections. Through collection-centric assistance, Marco reduces the cognitive costs of extracting and structuring information, allowing users to prioritize comparative synthesis and decision making processes. Users interactively communicate their information needs to an AI assistant using natural language and compose schemas that provide an overview of a document collection. Findings from a usability study (n=16) demonstrate that when using Marco, users complete sensemaking tasks 16% more quickly, with less effort, and without diminishing accuracy. A design probe with seven domain experts identifies how Marco can benefit various real-world workflows.


INTRODUCTION
Knowledge workers derive insights from information to accomplish complex information tasks, with the intention to solve problems, plan actions, and make decisions. For instance, a business analyst may want to determine the best negotiation strategy given related business contracts, a hiring manager may want to select candidates from a pool of resumes, or a researcher may want to conduct a literature review across multiple scholarly articles. In service of their goals, people need to triage, search through, and make sense of copious information throughout their document collections. These goal-driven processes can be viewed within a sensemaking framework (Figure 2), consisting of two interconnected loops of foraging and sensemaking activities [59,60]. Prior work has studied how technology can support this sensemaking process within domains such as exploratory online research [13,14,47,62] or scholarly synthesis [42,43,80]. But one area that has received less attention is document-centered assistance within the workplace [38].
To address this gap, we investigate the potential for digital assistance to support information foraging and sensemaking over business document collections. We first conducted formative interviews with 12 knowledge workers across diverse business functions, seeking to understand the current tools, strategies, and pain points within their workflows. We found that despite working with a diversity of documents, e.g., legal or financial contracts, and exhibiting a wide range of goals, participants expressed a common challenge of searching for information across their documents. Current foraging processes were described as tedious and time-consuming, involving manually searching for, extracting, and organizing information into structured representations such as a spreadsheet, before repeating for all subsequent documents. Our findings revealed that participants' predominant pain points focused on information foraging, which often comprised the bulk of their workflow, despite being an intermediary albeit critical step in service of their own specialized sensemaking goals, such as to derive insights or inform decision making. Given the manual nature of foraging processes and the lack of supportive tools, many participants were optimistic about the potential for AI assistance to complement their workflows.
In this work, we propose a collection-centric interaction paradigm in which knowledge workers primarily engage with their documents as a cohesive collection rather than as discrete documents. We reify this vision within a novel interactive system, Marco, that leverages AI assistance to help knowledge workers forage for similar information across many documents, organize gathered information, and synthesize collection-level insights. Marco presents a workspace integrating three views: Notebook View, Table View, and Document View (Figure 1). Within Marco, users build up a notebook of cells containing rich text serving as a note-taking space and actions executed over a document collection (Notebook View).

Figure 1. Panels: Table View, Notebook View, Document View. [...] View provides a collection-level overview. Actions in the Notebook View encode relevant information within result tables, with one row per document (1). Responses can be verified with in-context highlights within the Document View (2). Results across actions are concatenated into a Table View to support collection-level analysis (3).
Actions in Marco allow users to delegate foraging tasks to AI assistance, such that users can instead prioritize their focus on sensemaking tasks (Figure 2). For instance, users can perform a lexical or semantic search across each document in their collection in parallel with a Search action, ask questions of their documents with an Ask action, or summarize documents along desired dimensions with a Summarize action. Actions can be executed over a single document, multiple documents, or by default, the entire collection. Foraged information is organized into a tabular schema, allowing users to inspect specific details within documents and also compare across documents. As actions are executed, Marco joins the foraged information into an aggregate table (Table View), aiding sensemaking. Finally, Marco actively recommends actions tailored to users' specific document collections, defined goals, and past actions, to kickstart or encourage future foraging directions.
Through a controlled usability study and a design probe with domain experts, we sought to answer three research questions:
RQ1. How does Marco impact users' performance and experience sensemaking over business document collections?
RQ2. How do knowledge workers make use of Marco's features when working with business document collections?
RQ3. How do knowledge workers perceive and interact with imprecise AI assistance in Marco?
Findings from our usability study demonstrate the efficacy of Marco when compared to a baseline approach. We found participants completed tasks 16% more quickly using Marco, and self-reported that information was easier to find and required less effort to synthesize, with no difference in confidence or accuracy. Interaction logs and qualitative analysis complemented these findings, highlighting how Marco's suite of actions reduced tedium in information foraging processes, supported collection-level analyses, and encouraged verification of imperfect AI assistance. Our subsequent design probe identified opportunities for Marco to accelerate sensemaking within real-world business workflows and suggested additional considerations for supporting users in reconciling with imprecise AI assistance.
In summary, this paper contributes:
• Insights from a formative study highlighting challenges people face when working with collections of business documents and four design goals that emerged to inform intelligent sensemaking support tools for business workflows.
• Marco, a novel interactive workspace with a suite of natural language, document-centered actions that facilitate foraging and sensemaking over business document collections.
• Findings from a usability study (n=16) showing Marco improves efficiency and reduces effort in information-seeking tasks over document collections, and insights from a design probe with domain experts (n=7) suggesting how such support can benefit real-world business document workflows.

Figure 2. Panels: Foraging Loop, Sensemaking Loop. Use actions to search for information, ask questions, and summarize one or more documents in the collection.

Early sensemaking support focused on helping people explore collections of online web pages. For example, Scatter/Gather uses clustering to browse large online collections [21], SenseMaker focuses on foraging across heterogeneous web pages [6], and faceted search techniques help filter online content [25,34,67]. Some work has introduced approaches to improve the foraging process by reducing the costs of finding and saving relevant information [24,45,51,62,68], while others have focused on assisting in the creation of schemas to encode the foraged information [13-16,24].
The most benefit to sensemaking has arguably been observed in systems that more closely integrate foraging and sensemaking loops, for example within exploratory online search [14,30,33,56,62,74], programming [48-50], mobile information exploration [75], or scholarly literature review [42,58,80]. In our work, we contribute to this rich thread of research, but focus on the context of sensemaking over business documents in the workplace, an area in which document-centered assistance has seen limited progress. Furthermore, we demonstrate a novel means of integrating the foraging and sensemaking loops. By automatically "mapping" users' queries across many documents, Marco both helps reduce the costs of foraging and creates the organizational structures that facilitate cross-document comparison. Closing the loop, Marco then uses these structures to suggest subsequent foraging actions.

Document Consumption in the Workplace
Knowledge work spans a variety of professions and work-oriented goals. An underlying characteristic of these workers is high expertise in tacit or declarative knowledge (knowing the facts of the work) and procedural knowledge (knowing how to do the work) [35]. In the space of document processes, several works have sought to understand knowledge workers' practices [1,66]. An in-depth diary study with knowledge workers from diverse functions estimated document-related activities accounted for 82% of users' working time [1]. Russell et al. observed that data extraction and encoding from documents accounted for 75% of analysts' time when working with document collections [66]. A more recent survey reported knowledge workers spend anywhere from 1-3 hours simply attempting to locate information or a specific document [79].
Toward addressing these challenges, Jahanbakhsh et al. characterized the various types of document-centric assistance that could support knowledge workers. They found that needs over documents varied for different document types, and included factual, reasoning, and overview types of questions [38]. Recent work has started to identify opportunities for AI assistance in document processes, for instance through Q&A systems [38,76] and general-purpose LLM-powered tools [12,22]. However, these works indicate that adoption and use of automation tools for document-centered assistance have remained limited [38]. Moreover, a case study with knowledge workers and AI assistants reported a learning curve for users to teach AI assistance their tacit knowledge about their working context [27]. Marco aims to support knowledge workers' sensemaking activities over document collections by leveraging their domain expertise to complement AI capabilities.
Prior work has explored interactions to support document-centric tasks focusing on specific types of documents, such as legal documents [31,65], scientific documents [32,52,53,58], humanities studies [54], and patents [32,37]. These systems make use of common characteristics present in these document types to support domain-specific search and analysis tasks. For instance, Passages supports scientists and patent examiners in managing document provenance and organizing relevant text selections across documents in one view [32]. PaperForager helps users conduct a literature review by reducing the cost of transitioning between browsing a collection and reading an individual page of interest [53]. In our work, we aim to understand the needs of knowledge workers in different business-related domains and equip Marco with general-purpose tools that support complex information tasks and analyses over a variety of document types.

Human-LLM Interaction
Large language models (LLMs) have engendered myriad applications to support sensemaking, e.g., within online research [74], scholarly research [43], and argumentative writing [81]. In conversational applications such as ChatGPT [57] and Bard [29], LLMs have demonstrated impressive capabilities in answering users' open-domain questions, and can be further refined to answer questions given user-provided documents. However, unlike question answering over individual documents [82], the non-linear and dynamic workflows of sensemaking motivate exploration of a novel design space of human-LLM interactions. Recent work has shown LLM-powered applications can go beyond a linear, chat-like interaction paradigm, for instance by transposing text-based responses into flexible graphical representations [40], enabling recursive summarization [43], or supporting multilevel exploration of information [74].
Through Marco, we envision how LLMs can be used to support flexible sensemaking processes over document collections. Marco draws on numerous LLM capabilities, from information extraction for semantic search and multi-document question answering, to recommendations for follow-up foraging directions. Despite their success, one well-known challenge of LLMs is their tendency to hallucinate [7,39,55,70]. In the context of knowledge work, users therefore have to calibrate their trust in these models, such as through manual verification of the models' responses.
Guiding principles for mixed-initiative user interfaces in which "intelligent services and users may often collaborate efficiently to achieve the user's goals" were proposed over two decades ago by Horvitz [36], and later modernized into 18 design guidelines for human-AI interaction in AI-infused systems [4]. We lean on many of these guidelines to build interactions within Marco which adapt to user context ("Show contextually relevant information"), reduce friction in collaboration between users and intelligent agents ("Support efficient invocation and dismissal"), recover from imperfect AI systems ("Support efficient correction of errors"), and learn from users' interactions over time ("Learn from user behavior"). Through the design of Marco, we offer an initial vision of how these principles of human-AI interaction can be adapted to mixed-initiative, LLM-powered systems for document assistance.

UNDERSTANDING BUSINESS DOCUMENT COLLECTION WORKFLOWS
To better understand the challenges people encounter within current document-centered business workflows, we conducted a formative interview study with knowledge workers across various functional areas of business.

Participants
We recruited 12 participants from within a large software organization using purposive sampling (Table 1). Participants spanned diverse functional areas, including finance, procurement, legal, and management, and were employed in sectors across technology, education, and healthcare. All participants were personally responsible for or managed teams whose responsibilities involved reviewing large collections of documents. One participant had less than 5 years of experience, one had 5-10 years of experience, six had 11-20 years of experience, and four had more than 20 years of experience performing tasks involving business documents. Participants were thanked for their time but not compensated.

Procedure and Analysis
We conducted semi-structured interviews, asking participants about the primary tasks they conducted for their role, the types and volume of documents they typically review, their strategies and goals while reviewing, and their current usage or perception of AI assistance for their tasks. Interviews were conducted remotely, recorded, and transcribed. One author went through the transcripts and coded them for themes using an open thematic analysis process [9,10]. The research team then discussed and iterated upon the themes, informing a set of clear design goals.

Findings
Participants described various document collections, document-centric goals, and information foraging and organizational strategies when working with business documents. Despite this diversity, all participants highlighted a common challenge in how their current workflows lacked the appropriate tools to support repetitive and tedious information foraging needs over many documents. Below, we highlight the main findings of our study.
Knowledge workers rely on structured representations to facilitate consistent information foraging and enable comparison across documents. Participants often mentioned workflows in which they sought to synthesize information across multiple business documents to inform decision making. For instance, one procurement specialist who reviewed collections of vendor contracts to optimize future negotiations described needing to "extract information, consolidate, and then make some meaning out of it" (P9). Another financial planning analyst who compiled reports for executives by summarizing patterns from previous earnings calls characterized their workflow as "evaluating a whole bunch of documents ... connecting the dots" (P1). To reason across documents, participants needed to effectively review and compare documents along multiple dimensions of interest. However, performing this comparison was challenging, as business documents (even those within the same collection, e.g., a set of legal contracts) can vary in length, structure, and content. Current reviews of multiple documents are typically conducted by numerous human analysts across a functional team, and therefore what information is extracted from each document and how it is organized can differ. Multiple rounds of review might be required for certain tasks, as one analyst described: "We have an analyst go in and review that contract, especially paying close attention to any non-standard terms or conditions. Then depending on the dollar value, there might be a second review, either by a peer or by a manager" (P8). To encourage consistency across documents and people, participants mentioned strategies for guiding their reviews: using a pre-defined form or checklist (P7, P8), referring to guidelines or a company playbook (P4, P6, P9, P11), organizing documents into descriptive folders with tags (P10-P12), or using a spreadsheet or document to capture key details while reviewing (P2, P10-P12). To complement users' current workflows, Marco supports creating structured representations that emulate the organizational strategies participants described, facilitating consistent review and comparison across documents.
Knowledge workers extract information using a combination of retrieval approaches, depending on the nature of the information needed to reach their goals. Throughout their work, participants sometimes needed to extract a single value accurately from a single document (e.g., a date or payment amount). Other times they sought longer-form answers to specific questions (e.g., "How is payment structured?"), or needed to extract query-specific excerpts from a document (e.g., "Find all termination conditions in this contract"). Many participants also reported reasoning over similar information across multiple documents (e.g., "Which candidate has the most experience?" or "Which contract has the earliest termination date?"). Overall, we found that participants sought different types of information in their documents to meet their goals, often organizing these different information types within their structured representations. These findings prompted Marco's suite of natural language actions, which serve to emulate these document-centered foraging strategies corresponding to users' information needs. For instance, Marco helps users easily extract specific factual information (i.e., lexical search), search for document snippets semantically relevant to a query (i.e., semantic search), and answer natural language questions over documents.
Extracting information across document collections is often repetitive, time-consuming, and tedious. As a result, the process is typically incomplete or prioritized by risk. Participants described workflows that mainly consisted of systematic and repetitive extraction of information from their documents, reflecting prior work that has suggested the extraction and encoding of information are often the most time-consuming processes in sensemaking [66]. Participants worked with between tens and hundreds of documents at a time, each varying in length from a few pages (<7) to hundreds (>300) and taking anywhere from a few minutes to several hours to review. None of the participants reported using AI assistance (e.g., LLM-powered applications) to support their document-centric tasks. They instead relied on established strategies, such as keyword search (i.e., Control+F) or skimming documents to manually search for the necessary information. However, due to the length and density of jargon within business documents, these strategies were cognitively demanding and potentially haphazard, as it felt "easy to miss a needle in the haystack" (P9). Moreover, current tools only allowed participants to search over a single document at a time, regardless of whether they needed to eventually execute similar searches over every document in their collection. As a result, participants reported struggling with the sheer volume of documents they encountered, often unable to review everything. Some employed a heuristic strategy, choosing to review only the riskiest documents, for instance by prioritizing contracts with the highest contract values (P7-P9). Assistance in identifying relevant information within their documents could significantly improve productivity, as one participant described: "If those [data] are made available to me, then I know where to look for. That'll reduce my time by 30-40%, even 50%. Because then I don't have to read through the entire contract" (P9). These findings suggest the potential for automation to both reduce tedium and improve the coverage of documents reviewed in users' current workflows.
While optimistic about AI assistance, knowledge workers desire agency and establish trust through manual verification. AI-powered systems can provide invaluable support in information foraging, but they also inevitably err [4,36] and can lack the specific expertise required for users' specialized workflows and goals. Participants described tasks that they were skeptical AI assistance could complete independently, such as those requiring complex reasoning or understanding of subtle nuances within documents. Participants emphasized that their years of experience and contextual understanding (e.g., within a specific organization) were important in making sense of the information. For instance, P7 commented: "Our review from an accounting standpoint is very subjective. So it's not black and white all the time. Just because termination for convenience was found in a clause, it goes two paths: it resulted in a journal entry or it did not. Still, it's not always a journal entry." To build trust in AI assistance, participants articulated their need to understand how and why the AI arrived at a particular result. Participants therefore desired the ability to retain agency and saw the AI as a helpful co-pilot. They stressed the importance of having means to efficiently inspect and assess the output of AI assistance, such as through references to "the exact language" from a document (P5). Marco addresses this need by indicating the provenance of extracted information and allowing users to edit or remove any erroneous AI responses.

Design Goals
Summarizing our findings, we suggest an effective system for streamlining user workflows with business document collections should support the following goals:
[D1] Integrate useful structured representations to help users review and organize information extracted from their documents, with efficient navigation between representations.
[D2] Provide unified support for common information extraction approaches used in business document-centric workflows: lexical and semantic search, intra-document querying, and cross-document synthesis.

THE MARCO SYSTEM
Guided by the insights from our formative study, we developed Marco, an interactive workspace that supports sensemaking by leveraging AI assistance to reduce the costs of foraging over document collections. Marco's user interface consists of three integrated views (Figure 1). The Notebook View allows users to use natural language to extract information across their collection using AI assistance. As users interact with the Notebook View, foraged information is automatically aggregated in the Table View.

Notebook View
We refer to the primary area in which users extract and schematize information with Marco as a sensemaking notebook (akin to the evidence file from sensemaking theory [60]). A notebook is composed of cells, adapting a block-based document metaphor from computational notebooks (e.g., Jupyter Notebooks [41]). Marco provides lightweight affordances for cell manipulation, allowing users to create, delete, hide, duplicate, or clear cells in a notebook.
A cell can be one of three types: Text, Action, and AI Suggestion. Users create Action and Text cells on-demand, and AI Suggestion cells are suggested in response to users' actions. Figure 3 highlights the different types of cells available in Marco.
[...] the separate queries. Importantly, results returned by Search are verbatim snippets from each document. This both allows users to trust the provenance of the extracted information and provides a glimpse into the actual language used in the document.
Ask Action. The second action, Ask, allows users to retrieve answers to document-centered information-seeking questions. Questions are specified via natural language and Marco returns answers based on information either within individual documents (Ask[Each Document]) or across all documents in the collection (Ask[My Collection]). Users can select from these two types of Ask actions depending on the context for which their question is most appropriate. The first type, Ask[Each Document], allows users to ask the same question of each document in the collection, and retrieve an independent answer for each. Ask[Each Document] supports tasks for which users need to extract similar information across all of their documents. For instance, a hiring manager may ask their collection of resumes, "What programming languages has this candidate used in the past?" and expect a separate answer for each of their candidates. Responses to Ask[Each Document] are returned as a table, with rows for documents and a column for the query. Unlike Search, Marco returns an LLM-generated answer for each document rather than an extracted snippet.
On the other hand, more complex information needs may combine both information extraction and synthesis over multiple documents. For instance, a hiring manager could ask, "Which of these candidates have prior experience training machine learning models?" For these types of questions, the Ask[My Collection] action is better suited. This action first identifies one or more pieces of information required to answer the question (e.g., "experience training machine learning models"), extracts the relevant information from each document (using Search), and then synthesizes a concise answer to the question informed by the extracted snippets. The output of Ask[My Collection] consists of two components: first, the synthesized answer, and second, a table with the information Marco extracted to inform its synthesized answer (the same verbatim snippets as returned from Search), which can serve as evidence for a user to verify the system's answer or reasoning process.
Summarize Action. The third action, Summarize, provides users with a short summary of each of their documents. As with the first two actions, results from a Summarize action are presented as a table with rows representing documents and a single column containing a document summary. By default, Marco returns a general summary for each document. Users can further specify dimensions to focus on within the generated summaries. For instance, a hiring manager may want a summary of each of their candidates, but with a specific focus on their leadership skills.
Altogether, these actions meet users' different information needs, offering reliable evidence extracted from each document and generated answers to expedite synthesis in sensemaking (Figure 4). By default, actions execute over an entire collection. However, in working toward their goals, users often begin to filter down the set of relevant information (i.e., documents) they care about. To help focus their exploration, Marco supports drilling down into a collection by selecting a specific subset of documents to execute an action over (D2), thus returning fewer results within the table that users need to review. Finally, users can edit or remove any text within an Action cell, providing full control over inaccurate or irrelevant AI-generated results within the notebook (D4).

AI Suggestion Cell.
To assist in the sensemaking process, Marco provides periodic guidance to users through AI Suggestion cells, which contain recommendations for up to three additional information-seeking queries relevant to a user's goals (D3). To bootstrap the information foraging process, an AI Suggestion cell greets users with several starting queries when a notebook is initially created. As users accumulate action cells in their notebook, Marco leverages their foraging history to provide relevant follow-up queries that may inspire subsequent sensemaking directions. AI Suggestion cells are non-intrusively placed below the most recently created cell, and can be accepted (i.e., turned into a pre-populated Action cell) or dismissed with a single click.

Table View
Individual result tables within each Action cell are automatically aggregated into an overview table within a Table View, which aims to mirror common organizational artifacts (e.g., spreadsheets) created in current business workflows. Each row in the overview table represents one document in the collection, and each column represents one of the dimensions for which an action was created and executed in the notebook. Using this view, users can easily compare multiple dimensions across multiple documents, reviewing their information foraging history in a single structured representation (D1). Users can also filter and reorder columns to control the exact presentation of information, and export this view to a CSV file to save, reuse, and share their work.
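
The aggregation described above can be illustrated with a minimal sketch. It assumes each action's result table is available as a mapping from document to answer; the function names `aggregate_results` and `to_csv` and the data layout are hypothetical and are not taken from Marco's implementation.

```python
import csv
import io

def aggregate_results(doc_ids, action_results):
    """Join per-action result tables into one overview table.

    action_results maps a column name (the action's query) to a dict of
    {doc_id: answer}. Documents an action did not cover get an empty cell.
    """
    columns = list(action_results)
    rows = []
    for doc_id in doc_ids:
        row = {"document": doc_id}
        for col in columns:
            row[col] = action_results[col].get(doc_id, "")
        rows.append(row)
    return columns, rows

def to_csv(columns, rows):
    """Serialize the overview table with one row per document."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["document"] + columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

docs = ["a.pdf", "b.pdf"]
results = {
    "Languages": {"a.pdf": "Python", "b.pdf": "Go"},
    "Summary": {"a.pdf": "Senior engineer"},  # action run on a subset only
}
columns, rows = aggregate_results(docs, results)
print(to_csv(columns, rows))
```

Keeping one row per document means an action executed over a subset of the collection simply leaves blank cells for the remaining documents.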

Document View
To help build trust in AI assistance, Marco provides context linking, an interactive feature enabling users to click on any document-grounded snippet within the other two views to open the corresponding document with attribution highlighted (D1, D4). This interaction allows extracted information to serve as an "index" or entry point into a document, reducing the cognitive costs of switching between document snippets in the other two views and the original source documents. Documents opened via this interaction are rendered in a Document View, placed adjacent to the other two views, and equipped with standard document functionalities (e.g., highlighting, text annotation, and keyword search).

System Architecture
To enable its suite of interactive features, Marco's architecture combines a preprocessing pipeline for ingesting collections of documents and various NLP services for executing user-created actions and suggesting follow-up actions.

Preprocessing a Collection of PDF Documents
When a collection of PDF documents is uploaded to Marco, each document is preprocessed to minimize latency during subsequent user interactions. Documents are processed with the PDF Extract API [2], extracting content and structural information into a structured JSON format. Sentences are split from the raw text of each document, embedded into a 384-dimensional dense representation using multi-qa-MiniLM-L6-cos-v1 (an encoder model tuned for semantic search and trained with self-supervised contrastive learning) from the SentenceTransformers framework [64], and indexed along with relevant text metadata using OpenSearch [69], an open-source vector database.
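
As a rough illustration of the split, embed, and index steps, the sketch below uses only the standard library. The regular-expression sentence splitter, the `toy_embed` hashed bag-of-words function, and the in-memory list are all stand-in assumptions: Marco uses a learned encoder and OpenSearch, neither of which is reproduced here.

```python
import hashlib
import math
import re

EMBED_DIM = 384  # dimensionality reported for multi-qa-MiniLM-L6-cos-v1

def split_sentences(text):
    """Naive splitter (assumption): break after ., !, or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def toy_embed(sentence):
    """Stand-in for a real encoder: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * EMBED_DIM
    for token in re.findall(r"\w+", sentence.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % EMBED_DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def build_index(documents):
    """documents: {doc_id: raw_text}. Returns a flat list of index records."""
    index = []
    for doc_id, text in documents.items():
        for position, sentence in enumerate(split_sentences(text)):
            index.append({
                "doc_id": doc_id,
                "position": position,  # metadata needed to restore document order
                "text": sentence,
                "vector": toy_embed(sentence),
            })
    return index

index = build_index({"contract-1": "Payment is due in 30 days. Either party may terminate."})
print(len(index), index[1]["text"])
```

Storing each sentence's position alongside its vector is what later allows retrieved text to be reassembled in reading order.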

Handling Actions over Individual Documents.
Several actions operate over an individual document (Search, Ask[Each Document], Summarize), but are repeated for each document in the collection independently. For each document, we first retrieve relevant context (i.e., 30 chunks with the greatest cosine similarity to the embedded query). The retrieved chunks are sorted by the order they appear in the document and then concatenated with a query to form a few-shot prompt for an LLM (gpt-3.5-turbo). The specific formatting of the query depends on the action type. For instance, Ask uses a user's query verbatim, while Search uses a template, "Search for {user's query}." To allow for a more responsive user experience, we minimize latency by parallelizing across documents and streaming back LLM responses for each document independently. In this way, users can typically start reviewing information from their documents within 1-2 seconds. We opt to use the gpt-3.5-turbo model for these actions (instead of more performant models, e.g., gpt-4) due to its lower cost and latency.
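The per-document retrieval steps above can be sketched in plain Python. This is a simplified illustration rather than Marco's code: the chunk tuple layout, the function names, and the use of a thread pool for parallelism are assumptions, and the embedding and LLM calls are left out (a caller would supply vectors and an `answer_one` function). Only the 30-chunk cutoff, the re-sorting by document order, and the two query formats come from the description.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_context(query_vec, chunks, k=30):
    """Pick the k most similar chunks, then restore document order.

    chunks: list of (position, text, vector) tuples (hypothetical layout).
    """
    top = sorted(chunks, key=lambda c: cosine(query_vec, c[2]), reverse=True)[:k]
    return [text for _, text, _ in sorted(top, key=lambda c: c[0])]


def format_query(action, user_query):
    """Ask passes the query verbatim; Search wraps it in a template."""
    if action == "ask":
        return user_query
    if action == "search":
        return f"Search for {user_query}."
    # The formatting for other action types is not specified in the text.
    raise ValueError(f"unsupported action type: {action}")


def build_prompt(action, user_query, context):
    """Concatenate ordered context chunks with the formatted query."""
    return "\n".join(context) + "\n\n" + format_query(action, user_query)


def run_over_collection(documents, answer_one):
    """Apply the same single-document action to every document in parallel."""
    with ThreadPoolExecutor() as pool:
        return dict(zip(documents, pool.map(answer_one, documents)))
```

Re-sorting the retrieved chunks by position keeps the prompt context in reading order, which is the behavior the paragraph describes. The few-shot examples that precede the context in the actual prompt are omitted here.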

Answering Queries with Collection Context.
To handle actions that need to synthesize information across a collection (e.g., Ask[Collection]), we design a multi-phase prompting approach (Figure 5), inspired by similar decomposition strategies for answering complex queries with LLMs (e.g., [44,61]). Based on a user's query, gpt-4 is first prompted to identify the relevant document attributes necessary to answer the query. These attributes could include some a user had already previously searched for, in addition to other missing attributes. For each missing attribute, Marco executes a Search action over each individual document, extracting and saving relevant information into an evidence table. The context for a second prompt is then formed by joining the relevant information from each document for each of the identified attributes. Finally, gpt-4 is prompted using the context and original query.
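The control flow of this multi-phase approach can be sketched as follows. The sketch is illustrative only: `identify_attributes`, `search`, and `answer` are hypothetical callables standing in for the two gpt-4 prompts and the per-document Search action, and the pipe-delimited context format is an assumption, not the format Marco uses.

```python
def answer_over_collection(query, documents, evidence,
                           identify_attributes, search, answer):
    """Answer a collection-level query via attribute decomposition.

    evidence: dict mapping attribute -> {document: snippet}; it holds results
        from prior user searches and is updated in place with new ones.
    identify_attributes, search, answer: caller-supplied callables
        (hypothetical stand-ins for LLM prompts and the Search action).
    """
    # Phase 1: ask the model which document attributes the query depends on.
    attributes = identify_attributes(query)

    # Phase 2: run a per-document search only for attributes not yet collected.
    for attr in attributes:
        if attr not in evidence:
            evidence[attr] = {doc: search(doc, attr) for doc in documents}

    # Phase 3: join the evidence for the identified attributes, one line per document.
    lines = []
    for doc in documents:
        cells = "; ".join(f"{a}: {evidence[a].get(doc, '')}" for a in attributes)
        lines.append(f"{doc} | {cells}")
    context = "\n".join(lines)

    # Phase 4: prompt the model with the assembled context and the original query.
    return answer(context, query)
```

Because the evidence table is shared with earlier searches, attributes the user has already explored cost no additional search calls.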
4.4.4 Generating Suggested Actions. AI suggestions for initial or follow-up actions are generated using a few-shot prompting strategy with an LLM (gpt-3.5-turbo). The prompt incorporates the user's goal, up to three documents from the document collection (truncated to the first thousand characters of full text), and the user's interaction history (i.e., any prior queries to an action cell).

4.4.5 Implementation. Marco was built as a standalone web application, with JavaScript and React [73] for user interface components. All preprocessing and language understanding services were implemented in Python, using zero-shot and few-shot prompting with LLMs accessed through OpenAI APIs. LLMs were prompted with a sampling temperature of 0 and max generation length of 256 tokens (except AI Suggestion cells, for which a temperature of 0.7 and max generation length of 128 tokens were used). Exact prompts for all actions are provided in Appendix B.4.

STUDY 1: CONTROLLED USABILITY STUDY
To evaluate the efficacy of Marco's design and interactive features (RQ1 and RQ2), we conducted a controlled usability study, comparing Marco and a baseline approach. We used a 2×1 within-subjects study design, counterbalancing system conditions across participants to control for order effects and minimize biases. Participants completed one task with Marco (Marco) and one task with the system default file manager and document viewer (Baseline). The baseline condition was chosen to reflect the current workflows participants described in our formative study, namely the use of a text editor or spreadsheet (e.g., Google Docs and Sheets) and a PDF reader (e.g., Acrobat Reader or system default).

Participants
We recruited 16 participants (9 female, 7 male; average age of 28.2, SD = 4.5) via group messaging channels within a large software organization. Participants consisted of 6 software engineers, 3 PhD students, 2 business analysts, 1 Master's student, 1 program manager, 1 UX designer, 1 learning architect, and 1 sales employee. Participants were required to be 18 years or older and able to read documents written in English. All participants reported having previously used some LLM-powered application (e.g., ChatGPT), and three participants indicated experience developing applications or conducting research with LLMs.

Figure 5: Overview of Marco's multi-step pipeline for answering users' queries over a document collection. Marco first identifies attributes required to answer the query, some of which may already have been retrieved by prior user queries and others which may be missing (1). For each missing attribute, Marco executes a search to extract relevant snippets of information from each document (2), and saves the new search results into the aggregate table (3). Search results for each of the required attributes are formatted into a prompt and sent to an LLM (4), whose response is displayed to the user (5).

Procedure
We first introduced the study and obtained consent. We then guided participants through an interactive tutorial of Marco, highlighting the intended usage and limitations of Marco's AI-powered features. Afterwards, participants were asked to role-play as knowledge workers in two scenarios with business document collections, completing the following two tasks:
(1) Hiring. Participants assumed the role of a hiring manager for an entry-level technology analyst position at a financial company, and were given a collection of 15 candidates' resumes. Resumes were one to two pages long and curated from publicly available university resume books.
(2) Cleaning. Participants assumed the role of an office manager for a start-up company looking for a new cleaning service provider, and were given a collection of 10 contracts from potential providers. The contracts were pulled from online templates and modified to be two to four pages long.
To ensure both tasks could be completed in a single study session and also include a non-trivial number of documents, we adjusted the content and length of several documents. We also used fewer contracts in the Cleaning task since contracts tended to be longer than the resumes in the Hiring task.
For each of the two tasks, participants answered three questions. One question was intended to be more conducive to keyword search, while the other two questions involved reading and reasoning over more of each document's text. Additional details for the two tasks can be found in Appendix B. Each question asked participants to select one or more documents in the collection as their final answer. Participants were limited to seven minutes per question, allowed to submit before time expired, and submitted partial progress if time expired. The allotted time was iteratively determined through several pilot studies. The final time limit was chosen to allow most participants to manually review each document in Baseline when completing the tasks. We believe reducing the allotted time would increase the difficulty of the task and thereby the comparative advantage of Marco, but we leave these studies for future work.
After both tasks were completed, participants completed a system usability survey and a demographics survey. Any remaining time in the study was used to allow participants to share their overall experience using Marco and provide additional feedback. All interviews were recorded, transcribed, and analyzed for qualitative insights following an open thematic analysis [9,10]. Studies ranged between 60 and 70 minutes long. Participants were compensated with $30 (USD) upon completion of the study.

Measures
We recorded the following measures for each question:
• Accuracy: A score from 0 to 1 indicating overlap between a participant's final answer and a predetermined ground truth answer. Accuracy was calculated as the sum of true positives and true negatives, divided by the number of documents in the associated collection for the question.
• Time: The amount of time a participant spends answering a question, measured from when a participant finishes reading a question to submission of a final answer.
• Confidence: A participant's confidence in their answer. Specifically, their response to the question "How confident do you feel in your answer?" on a 5-point Likert scale ranging from "Not at all confident" to "Extremely confident."
• Ease: A participant's ease of identifying the information used to arrive at their answer. Specifically, their response to the question "How difficult was it to find the information you needed to reach an answer?" on a 5-point Likert scale ranging from "Extremely easy" to "Extremely difficult."
• Effort: A participant's amount of cognitive effort required to arrive at their answer. Specifically, their response to the question "How mentally demanding was completing the task?" on a 5-point Likert scale ranging from "Extremely low" to "Extremely high."
We also measured overall usability with a system usability scale consisting of ten 5-point Likert scale questions [11].
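As a worked illustration of the Accuracy measure defined above, the following Python sketch computes the score for one question. The function name and the set-based representation of answers are assumptions for illustration; the formula itself (true positives plus true negatives, divided by collection size) follows the definition in the list.

```python
def selection_accuracy(selected, ground_truth, collection):
    """Fraction of documents classified correctly for one question.

    selected: documents the participant chose as their final answer.
    ground_truth: documents in the predetermined correct answer.
    collection: all documents offered as answer choices.
    """
    selected, ground_truth = set(selected), set(ground_truth)
    # True positives: chosen documents that belong in the answer.
    true_pos = len(selected & ground_truth)
    # True negatives: documents correctly left unselected.
    true_neg = sum(
        1 for doc in collection if doc not in selected and doc not in ground_truth
    )
    return (true_pos + true_neg) / len(collection)
```

For a collection of 10 contracts with three correct documents, a participant who selects two of the correct documents plus one incorrect one has 2 true positives and 6 true negatives, giving an accuracy of 0.8. Note that under this definition a participant who selects nothing still receives credit for every true negative (0.7 in the same example).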

Participants' Usage Patterns of Marco (RQ2).
We provide insight into usage patterns with Marco through an analysis of participants' interaction logs and semi-structured interview responses. We refer to participants with the pseudonyms P1-16. Interaction counts are provided in Appendix Table 3. In the Baseline condition, participants completed tasks by opening and scanning each document in turn, often relying on keyword search and structural and visual cues (e.g., section headings) to identify passages to read. Participants using Marco tended to have fewer interactions with individual documents. They instead created Search and Ask actions, the specific type and quantity of which varied across participants. On average, across three questions, participants created 2.3 (SD = 1.3) Search, 2.4 (SD = 1.6) Ask[Each Document], and 1.7 (SD = 1.3) Ask[My Collection] actions. Participants preferred Search when comparing multiple criteria across their collection at once, searching for 2.0 queries per action on average (SD = 0.94, Mdn = 2.0), and up to as many as six queries in a single action (P16).

System Usability and Suggested Improvements.
The average and median SUS scores were 74.5 and 71.3 respectively, indicating strong overall usability. Participants also suggested several improvements. For instance, some participants found that the need to select an action appropriate to a specific information need added cognitive overhead. Instead, based on a user's query, the system could determine which type of action cell to create and whether generative or extractive results are more likely desired (P2-P5). Others suggested providing clearer visual indicators to distinguish between LLM-generated text from Ask and extractive document snippets from Search (P1, P14, P15).

STUDY 2: DESIGN PROBE WITH KNOWLEDGE WORKERS
We conducted a second qualitative study using Marco as a design probe with domain experts whose responsibilities included reviewing large sets of documents. The goal of this study was to understand how Marco might support their current real-world workflows (RQ2) and how they perceive and interact with potentially imprecise AI assistance (RQ3).

Participants
We recruited 7 participants (3 female, 4 male) via group messaging channels within a large software organization (Table 2). Unlike in the usability study, we required participants' work responsibilities to include reviewing large sets of documents. Two of the participants had between 1 and 5 years of experience, and the remaining five participants had more than 15 years of experience working with business documents. One participant reported using LLM-powered applications to support his work (P1). Participants were thanked for their time but not compensated.

Procedure and Analysis
The study began with introductions and a discussion of the participants' roles, responsibilities, and goals when reviewing document collections. Next, a study facilitator guided participants through a walkthrough of Marco using a document collection of 10 contracts from the usability study (§5.2), highlighting Marco's features. At each step, participants were encouraged to talk through what they encountered and their reactions. The study concluded with a semi-structured interview to understand how Marco could support the current workflows participants had described at the start of the study. For confidentiality reasons, we opted to demonstrate Marco's capabilities with a pre-selected document collection instead of users' own work documents. Interviews were conducted remotely, recorded, and transcribed for analysis. One author went through the transcripts and coded them for themes using an open thematic analysis process [9,10]. The research team then discussed and iterated upon the themes until reaching consensus.

Results
Participants reiterated similar information needs and challenges in their current workflows as those found in our formative study (§3.3). For instance, one operations analyst described how his team often receives 15-20 documents every 30 minutes to review within 24 hours (P2). To meet their goals, all participants described spending considerable time and manual effort extracting information from various documents. Given the substantial volume and cognitive load, participants prioritized their focus, saying, "We don't check everything. That's impossible." (P6). Next, we report how Marco could support participants' current workflows and identify areas for improvement. We organize our findings under themes that emerged from our qualitative analysis.

Figure: Results (combined) for Baseline and Marco. Measures shown: Time (seconds); "It was difficult to find information needed to arrive at an answer."; "It was mentally demanding to arrive at an answer."

6.3.1 Accelerating information foraging and facilitating re-use. Participants described having a basic set of recurrent questions they needed to answer over multiple documents and analyses (P1-P4, P6, P7): "We just go through our checklist of five or six questions and then mentally just go through the document and try to answer it" (P2). Marco could lower the foraging costs for repetitive questions, enabling participants to instead focus on validating and analyzing the results. Participants also appreciated how Marco provided structure to reuse analyses (through defined actions) over their documents and apply the same analyses over a new set of documents.

Different actions support different sensemaking use cases.
Most participants appreciated having different types of actions to collect information, which supported various use cases (P1, P2, P4, P5). In some cases, a generated answer over the collection (Ask[My Collection]) could provide participants with sufficient detail, while for other analyses, a Search action with document snippets better supported users' goals of familiarizing themselves with each document's language. A finance analyst described how Marco's design aligned well with existing workflows, for instance with the two types of analyses performed by her team (recommending a strategy and cross-checking details): "I love the fact that it's all documents, each document, a custom set of documents that you want where you could choose . . . that's really important because that's how people are working" (P4). One participant, a legal specialist, found the distinction between actions unnecessary, and suggested Marco could instead understand users' intents based on their queries (P3).
Participants had varying preferences for which actions would best support their unique workflows. Those whose workflows involved processing document-by-document found most use for Ask[Each Document] since it closely matched their working mental model (P1, P2, P7). Ask[My Collection] was seen as useful for making comparisons (e.g., different contract vendors) (P1, P6), identifying documents with relevant terms or statistics (P2, P4), and uncovering patterns across documents (P5). Finally, Search provided an additional mechanism for confirming results (P2, P3, P5). Table 2 lists some of the specific actions participants desired.

Domain experts preferred Notebook View for analysis and Table View for synthesis.
By extracting and aggregating information in one organized workspace, all participants described Marco as better at supporting sensemaking tasks (e.g., drawing comparisons, observing patterns, and identifying non-standard language) compared to their current workflows. Participants described the Notebook View as better suited for deeper analysis and exploration of actions and queries to use with different types of analyses (P3, P4). On the other hand, the Table View provided a better overview of the information once the analyses were complete (P3, P4, P7) and subsequently was seen as more useful for "day-to-day operational tasks" (P3). A global sourcing analyst suggested the Table View columns could be fixed and queries re-applied for day-to-day diagnostics, "If you have a few questions, most important things you identify, ask those questions and then you just change it to this view and you can export this showing the most relevant information side by side. I like that." (P1). To improve its utility, participants suggested being able to save queries as presets in the Table View and adding customizable column filters for different types of analyses (P3, P4).

6.3.4 Verifiable AI assistance can provide value despite imperfections. Participants described how Marco could help augment their current processes, even if the AI assistance was imperfect (P2, P4, P6, P7): "Even if it's not 100%, even just catching stuff we might miss" (P2). For most users, using Marco was described as just another round of review (out of several) and not a replacement for their final review. An operations manager described it as, "I feel like this AI chatbot is more to help us . . . The actual work is being done by us" (P7). She elaborated that in real use, when an analyst cannot find some information, they would raise a concern for additional review to a colleague or manager. From this perspective, Marco was seen as another reviewer in their operations. Document highlights that connected to results provided by Marco were important in building confidence when using the system. Highlights made verification easy when needed. Participants emphasized that for risky business cases, verification would always be required (P4, P6). Highlights were also seen as helpful to guide further reading when generated answers lacked detail, "It's useful because maybe the answer we get might be small, just a summary . . . Linking back will give us a step-by-step, a detailed process" (P7).
Participants suggested ways Marco could increase their confidence in its generated responses. First, the system should ensure responses are consistent for similar queries from different actions. Participants might apply multiple similar queries to arrive at the same answers. Providing several mechanisms to verify responses increases users' confidence (P2, P3, P5). Second, the system should offer lengthier answers that provide explanations, which participants preferred over succinct answers. Knowledge work involves understanding, not just reporting a correct answer. As one participant described, "I want the AI to help me actually understand it and not just provide me the answer" (P1). Third, the system should avoid overconfidence and communicate when human review is needed (P1, P2). Participants viewed Marco as an additional thought partner in their workflow. As when working with other colleagues, perfection was not expected, so the system should ask for help when needed. For instance, they suggested Marco could communicate uncertainty in answers or flag items requiring further human review.

DISCUSSION
This work was motivated by knowledge workers' need to extract and analyze information from a collection of business documents to solve complex information tasks. Our interviews with knowledge workers in various business functions describe participants' diverse document-centric workflows. Despite this diversity, we identify common challenges with information extraction that motivate the design of Marco. Our observations highlight how current tools at most enable users to access information in documents individually, resulting in tedious and repetitive processes when attempting to complete sensemaking tasks over a collection. We reaffirm that adoption and use of document-centered automation tools in the workplace remains limited [38], as we also found limited use of reading assistance beyond interfaces with simple document annotation and manipulation. A study by Adler et al. [1] in the late 90s provides a detailed characterization of knowledge workers' reading activities at a time when processes were just starting to move from paper to digital. We found that while document processes described by participants were mostly digital, challenges with information extraction persist. We hypothesized these challenges remain because most digital reading tools mirror manual paper-like reading metaphors. Our goal with Marco was to consider an interaction paradigm in which knowledge workers primarily interact with their documents altogether as a holistic knowledge base rather than as individual documents.
To support knowledge workers' wide range of goals, Marco was designed such that users could communicate their information needs to an AI assistant using natural language and interactively build a schema that provides an overview of a document collection. Users remain in control over what type of information is relevant for their specialized analyses but delegate the extraction and organization of information to an AI assistant through actions. We found Marco's approach had a positive impact on users' productivity. Results from the controlled usability study found participants completed information-seeking tasks 16% faster and with less effort using Marco when compared to a manual approach. One participant reflected on how using Marco reduced the overall workload, saying "I definitely feel it saves a lot of my time of going back and forth in the document to search stuff, so it's really helping me in the kind of mental demand and the amount of work I need to do." (P1). While the various actions in Marco provided users flexibility in addressing diverse information needs, some users experienced a learning curve due to the distinctions between these actions and having to craft the right prompt. Future work can explore how AI assistance could help users surface their intent or refine their prompts through better conversational repair strategies [3].
Ensuring the reliability of imprecise AI assistance is a common strategy users adopt to mitigate the impact of errors in AI-enabled interactions [4,28]. In both the usability study and design probe with experts, we observed two common strategies participants used to build confidence with Marco and to recover from errors. First, we observed participants often used different actions in concert for a single query (e.g., both Search and Ask) as a means to cross-check retrieved results and surface any inconsistencies. Domain experts described how in workplace contexts where complex information tasks drive high-impact decisions, multiple people would be working together to verify information in different ways, and Marco could provide an additional redundant layer of review to inform their decision making. Second, we observed when participants had low confidence or encountered any inconsistency, most opted to manually verify the accuracy of AI responses. Action cells in Marco provided an organized table of results with evidence from each document (e.g., extractive document snippets in Search). This structure allowed users to quickly skim results across all documents and link to specific documents for additional verification when needed. In the usability study, few participants completed the tasks without opening a document and we found no significant differences in task accuracy and self-reported confidence when using Marco compared to Baseline. These observations imply that Marco provided adequate mechanisms for users to build trust in, and rely on, tasks delegated to Marco, despite not manually reading and reviewing each document individually.
Taken together, findings from both studies suggest that Marco can provide users a productivity boost without impacting work quality. Hearkening back to the "overview first, zoom and filter, then details-on-demand" mantra for visual information seeking [71], knowledge workers using Marco's collection-centric interaction paradigm need only attend to and reason over the relevant portions of each document, delving into specific documents for details as needed. When relying on AI assistance, the ability to expand on relevant document details is especially important in supporting human evaluation and verification. However, additional studies are needed to understand real-world scenarios where overreliance on AI results may start to impact quality. For instance, several domain experts described scenarios where a review may entail working with thousands of documents and/or tight timelines, making it impossible to manually verify all document details. The strategies we observed with Marco for cross-checking and verification could also become ineffective in these scenarios and potentially encourage overreliance [5]. Future work can investigate additional interactions that provide guardrails and mitigate risk in these high-stakes scenarios, for example by providing explorable uncertainty visualizations or further scaffolding results through clustering [5,18,26,46]. Our findings moreover illustrate how AI assistance can reshape traditional foraging and sensemaking activities. In these AI-augmented sensemaking processes, humans, previously burdened with cognitively taxing and tedious foraging tasks, now bear a different set of responsibilities: specifying intents, delegating processes, and evaluating the results of AI assistance.

Limitations
Most of the document-centric workflows in our studies center around information in long-form, primarily text, documents. Thus Marco was designed to primarily support text information foraging. Document workflows can also involve multimodal content (e.g., visual, structural, auditory). For instance, an analyst designing a new marketing campaign might need to synthesize information present in text, image, or video content. We see exciting opportunities in leveraging multimodal LLMs to support interactions over more diverse content [78]. These interactions could expand on the types of action cells used for information foraging, as well as the representations used to organize that information (e.g., data visualizations, graphics) [17].
In our controlled usability study, we limited the number and length of documents in each collection. Task questions were also designed to be answerable within seven minutes, ensuring completion within an hour-long session. Despite these constraints, we believe the specific document categories and task questions capture the types of information-seeking queries found in our formative study. Many workflows required extracting information from documents according to established domain-specific criteria and organizing the information into a representation to share with others. Our evaluations were therefore designed around the challenges inherent in these tasks. In contrast, other sensemaking tasks that are largely exploratory (e.g., learning a new skill, debugging code, or creative writing) may not be as ideally compatible. Nevertheless, we believe Marco can offer some utility toward these exploratory tasks. As users iteratively brainstorm new dimensions to evaluate, Marco's suite of actions allows users to quickly extract and compare relevant information for each dimension across many documents.
Beyond exploratory tasks, complex information tasks where needs may not be as well-defined could span multiple hours or days. As constraints on our controlled study could have affected participants' information foraging and decision making behavior, in future work we intend to conduct a longitudinal field study with Marco involving a diverse set of business-related knowledge workers and document collections. This in-the-wild study could offer additional insights into scenarios where Marco enhances users' workflows and areas where it may fall short.

CONCLUSION
We presented Marco, a mixed-initiative workspace leveraging large language models to support workflows with business documents. Through a suite of natural language actions that reflect common information-seeking approaches to document review, Marco aims to improve consistency and reduce tedium in searching for information across many documents. A usability study found Marco helped people search for and reason over information across document collections more quickly and with less effort compared to an existing baseline approach. A design probe with knowledge workers further showed how the design of Marco's actions and views aligned with real-world workflows. Overall, our studies highlight how business workflows even today remain manual and tedious, with low adoption of technological support. We believe Marco offers a glimpse into how document-centered AI assistance can be integrated to complement users' processes, with simple affordances that build appropriate trust in automation. We hope this work can inspire future exploration of the opportunities offered by recent automated language understanding capabilities and envision new mixed-initiative systems for document-centered assistance.

B USABILITY STUDY DETAILS

B.1 Apparatus
Each participant completed the study on a provided Apple MacBook M1 Pro with 16GB of RAM running macOS Ventura 13.4. No external monitor was used. Before beginning the study, participants were instructed on how to quickly navigate between desktops, as the Qualtrics survey for collection of task responses, the file manager and PDFs used in the control condition, and Marco were provided on separate desktops. All participants were comfortable with using a computer to browse documents and expressed little to no difficulty using the provided apparatus.

B.2 Scenario-based Tasks
Participants in the usability study completed the following two scenario-based tasks. For each task, they answered three timed questions pertaining to information across a document collection. The answer choices for each question included all documents within the collection, and participants selected one or more documents as their final answer.
Figure: Table View with an action cell affixed to the table. This action cell executes identically to cells in the Notebook View, but populates the columns in this table directly rather than returning results within an independent cell.

Task 1. You are a hiring manager for an entry-level technology analyst role at a large financial organization, Acme Inc. You have received resumes from 15 potential candidates. Review the collection of resumes. Your goal is to identify promising candidates to invite for an on-site interview at your offices.
(1) Candidates must have a strong education background in a relevant technical field. We want to filter all candidates who do not meet this criteria. Which candidates DO NOT have a degree in Computer Science, Mathematics, or Engineering?
(2) Candidates should ideally have experience with at least one programming language and have some prior experience working in the financial sector. Which candidates meet BOTH of these criteria?
(3) The best candidates should be familiar with two processes often used by technology analysts at Acme Inc: 1) statistical data analysis and 2) financial risk analysis. Which candidates have demonstrated skills or experience relevant to BOTH of these two processes?

Task 2. Your financial technology company, Acme Inc., has just relocated to a new office space in San Francisco. You need to hire a cleaning service provider. You have received 10 contract offers from several providers in the area. Review the collection of contracts. Your goal is to identify providers that provide Acme Inc. with good benefits in their contract.
Instead you want a one-time payment for each service that includes pricing for any required equipment, materials, and tools, as well as fees. Note: Payment-related fees (e.g., late payment, missed payment, overtime) are still acceptable. Which providers will bill you a one-time payment per service (including all fees)?
(3) You haven't had experience with any of these providers so you want to make sure the contract termination is flexible. In case things don't work out you want a contract that allows termination at any time with written notice. You might want the ability to take action if the quality of service provided does not meet your needs. Which providers have flexible terms for termination AND allow you to take action if service is unsatisfactory?

B.3 Marco Usage Details
Table 3 presents the number of interactions participants had with Marco's features throughout the usability study. Participants used different LLM-powered actions to complete their tasks, and often sought to verify the results returned from the LLM by checking the original document context.

Answer the user's question based on information in the following table. The table contains information extracted directly from a collection of documents which may be relevant to the user's query. When possible, try to provide a concise explanation for your answer based on the information provided in the table. If the question cannot be answered given the provided information, respond "I don't know".

AI Suggestions
You are a intelligent document reading assistant helping users explore and understand their documents to achieve their goals ({Goal}).
The following shows examples of the types of documents the user is working with (but are not the exact documents they have).
The user has previously searched for the following things within their documents: {Searches}
The user has previously asked the following questions of their documents: {Questions}
Suggest up to two other searches and questions they could ask their documents. These search queries and questions should be answerable given the document and nothing else. Respond with exactly one JSON object with two keys: "suggested_searches" and "suggested_questions". If no relevant searches or questions can be asked, respond with an empty list for both "suggested_searches" and "suggested_questions".
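The response format requested by this prompt can be checked defensively before suggestions are shown to a user. The following is a minimal sketch, not Marco's implementation: the function name `parse_suggestions`, the fallback to empty lists on malformed replies, and the reading of "up to two" as a per-key cap are all assumptions made for illustration.

```python
import json

# Keys requested by the AI Suggestions prompt.
EXPECTED_KEYS = ("suggested_searches", "suggested_questions")


def parse_suggestions(raw_reply: str, limit: int = 2) -> dict:
    """Parse a model reply into the two expected lists.

    Hypothetical helper: malformed replies fall back to empty lists,
    and each list is capped at `limit` items (an assumed reading of
    "up to two").
    """
    empty = {key: [] for key in EXPECTED_KEYS}
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        return empty
    if not isinstance(data, dict):
        return empty

    result = {}
    for key in EXPECTED_KEYS:
        value = data.get(key, [])
        if not isinstance(value, list):
            value = []
        # Keep only string entries and enforce the cap.
        result[key] = [item for item in value if isinstance(item, str)][:limit]
    return result
```

A reply that is not valid JSON, or that omits one of the keys, yields empty lists for the affected keys rather than raising an error.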

Figure 1: Marco is a mixed-initiative workspace for sensemaking over document collections. Marco integrates three views: a Document View renders a PDF document, a Notebook View provides document-centered actions leveraging LLMs, and a Table View provides a collection-level overview. Actions in the Notebook View encode relevant information within result tables, with one row per document (1). Responses can be verified with in-context highlights within the Document View (2). Results across actions are concatenated into a Table View to support collection-level analysis (3).

Figure 2: (a) The sensemaking process consists of two iterative loops of activity: information foraging and sensemaking [60]. (b) Marco was designed to support different stages of the sensemaking process in business workflows. Using natural language, users delegate foraging tasks to AI assistance (foraging loop), enabling users to focus on verifying AI responses, refining information schemas, and synthesizing information (sensemaking loop). Solid lines indicate capabilities available in Marco.

Figure 3: Marco's Notebook View comprises several types of cells. Action cells provide collection-centric AI assistance for users' dynamic information needs. Ask[Each Document] answers the same question for each document separately (1a), Ask[My Collection] answers questions synthesizing information across a collection (1b), Search extracts information verbatim from each document (1c), and Summarize generates a user-guided summary for each document (1d). Text cells serve as a note-taking space (2), and AI Suggestion cells provide follow-up actions to continue the foraging process (3).

Figure 4: Marco supports four strategies for information foraging across a collection. Search returns snippets extracted directly from each document, Ask[Each Document] and Summarize return LLM-generated answers grounded in each document, and Ask[My Collection] returns a combination of both extracted evidence and an LLM-generated answer synthesizing the evidence.

Figure 7: Subjective responses for task questions. Participants found tasks easier and less effortful with Marco than Baseline.

Figures 8, 9, and 10 provide screenshots from Marco to highlight common configurations of views in the user interface.

Figure 8: Marco with both Document and Notebook Views opened side-by-side. This configuration allows users to perform collection-level foraging and sensemaking actions with their notebook, while also verifying details from PDFs on-demand.

Figure 9: Marco's Notebook View. Users primarily interact with Marco by creating action cells (outlined in purple) to execute information gathering tasks automatically over their entire collection. Marco encodes execution results as an interactive tabular schema embedded within each individual cell. A left-aligned file manager is also shown, which allows users to see a list of documents in their collection and open individual files, and can be hidden away when unneeded.

Figure 10: Marco's Table View with three executed queries. Users can filter, search, and reorder columns. Users can also add new columns directly from the Table View with an action cell affixed to the table. This action cell executes identically to cells in the Notebook View, but populates the columns in this table directly rather than returning results within an independent cell.

(1) You are especially interested in ensuring the provider includes carpet cleaning and window cleaning as part of the services offered. Which providers DO NOT list BOTH carpet cleaning and window cleaning in their services provided?
(2) You want to know ahead of time how much you will be paying for each service. You don't want to pay for services hourly.
Sample document: {Up to 1000 characters of Sample Document 1}
Sample document: {Up to 1000 characters of Sample Document 2}
Sample document: {Up to 1000 characters of Sample Document 3}

Table 1: Participants in our formative study.
Table View to facilitate comparison across the collection. The Document View allows users to view individual documents in their original format, providing on-demand context for extracted information. Next, we describe the design of Marco's user interface and interactive features, referencing our prior design goals D1-D4.
4.1.1 Text Cell. Text cells render rich text editors, with common tools such as lists, text emphasis, and Markdown styling. Text cells allow users to record their ongoing sensemaking processes, providing context and structure to surrounding cells. For instance, users can create in-situ notes as they explore their documents, reducing context switching to external note-taking applications.

4.1.2 Action Cell. To facilitate the exploration of a large document collection, Marco equips users with AI-powered document assistance through Action cells, which execute user information-seeking queries as actions (D3). Users create Action cells with a slash command, i.e., typing the slash character ('/'), in any empty Text cell. Three available actions are presented in a drop-down menu: Search, Ask, and Summarize. These specific actions are motivated by common document-centered information needs revealed in the formative interviews (D2).

Search Action. The first action, Search, allows users to extract information from their documents relevant to one or more search queries. Results are returned as a table, with rows for documents and columns for each search query. Users can select from two types of searches: lexical and semantic. Lexical search returns short document snippets containing an exact keyword match to a user's query, while semantic search extracts short document snippets which are semantically similar to a user's query. Search performs semantic search by default, and users can execute a lexical search by enclosing their query in quotes. Users can also search for multiple information needs in a single Search action with comma-separated queries. Columns in the result table then correspond to each of
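The Search input conventions described in this section (comma-separated queries, semantic by default, quoted for lexical) can be illustrated with a short sketch. It is not Marco's implementation: `SearchQuery`, `parse_search_input`, and `lexical_search` are hypothetical names, and the case-insensitive matching and fixed character window for snippets are assumptions. Semantic search is omitted because the section does not specify how it is computed.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchQuery:
    text: str
    mode: str  # "lexical" or "semantic"


def parse_search_input(raw: str) -> list[SearchQuery]:
    """Split a Search input into one query per result-table column.

    Assumption: queries are separated by plain commas, so a comma
    inside a quoted query is not supported by this sketch.
    """
    queries = []
    for part in raw.split(","):
        part = part.strip()
        if not part:
            continue
        quoted = len(part) >= 2 and part[0] == part[-1] and part[0] in "\"'"
        if quoted:
            inner = part[1:-1].strip()
            if inner:
                queries.append(SearchQuery(inner, "lexical"))
        else:
            # Unquoted queries default to semantic search.
            queries.append(SearchQuery(part, "semantic"))
    return queries


def lexical_search(document: str, keyword: str, window: int = 40) -> list[str]:
    """Return short snippets around each exact keyword match.

    Assumptions: matching is case-insensitive, and a snippet is the
    match plus `window` characters of context on each side.
    """
    snippets = []
    for match in re.finditer(re.escape(keyword), document, re.IGNORECASE):
        start = max(0, match.start() - window)
        end = min(len(document), match.end() + window)
        snippets.append(document[start:end].strip())
    return snippets
```

For example, `parse_search_input('"net payment", termination notice')` yields one lexical query and one semantic query, which would correspond to two columns in the result table.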

Identify missing attributes -> Search for missing attributes -> Update overview table -> Identify relevant attributes -> Form collection-level QA query with context -> Prompt LLM -> Form response

Marco as Sensemaking Support (RQ1). Figures 6 and 7 summarize how quantitative measures for the task questions varied between the two system conditions. A main effect analysis of system condition on Time found participants completed questions more quickly with Marco than with Baseline (M = 297.3, SD = 89.1) ($\chi^2(1) = 8.26$, $p = .004$). On average, participants completed each question with Marco in 249.6s (SD = 88.8s) and with Baseline in 297.3s (SD = 89.1s), a 16% difference. Participants were slightly more accurate with Baseline (91.9% accuracy, SD = 8.3) than with Marco (88.1% accuracy, SD = 10.1), though this difference was not significant ($\chi^2(1) = 3.35$, $p = .067$). A similar analysis for system condition on Confidence found no difference ($\chi^2$

Table 2: Overview of knowledge workers in the design probe, their documents, and how they would use Marco.
How quickly do we need to pay after invoicing? What are obligations in terms of pricing? What is my fixed capacity? What sort of rate protection do we have? What is the term growth? What is the cost? What products are included? Who is the supplier? What risks are associated with <clauses>? Do these contracts have any unfair or unfavorable provisions? Which documents discuss <action>? Which of these deals have the <identifier number>? Which of these deals have a flexible start date? Which of these deals have a keep end date pricing? How many <identifiers> are listed? What are unit prices? Does it say auto-renewal?
Search: net payment terms, start and end dates
P3 Legal | Contracts
Ask: Which contracts have <X> days notice period for termination? What are all agreements that include a DPA? Who are all customers that have analytic products and less than 30 day notice period for breach? Which contracts have <agreement terms e.g., termination for convenience>? How frequent was <term> mentioned? What questions are asked on <X>? Which of the <company> contracts do we need to let roll through the cycle? What is the commencement date? Is there guidance on <event outage>? Have you seen <event outage>? In the context of these docs, should I do <event action>? Do you have an opinion on what do when <event outage> happens? Which have <X> terms? Which contracts are expiring? What is the opt out clause? How do we pay?
Ask: Which process documents are about <negotiation deal>? Where is <agreement term>? Which agreements have an addendum changing a billing cycle?

Table 4: Prompts used in Marco. {} refers to a placeholder.

You will be given excerpts from a document and a search query. Extract the spans of text that are most relevant to the query, if any, word for word from the document. Respond with a JSON object with a single key "snippets" and a list of the extracted spans. If there no relevant spans of text, respond with a JSON object with a single key "snippets" and an empty list.

You will be given a document and a question. Based on the information contained in the document, answer the question to the best of your abilities. If you cannot find the answer in the document, respond by saying that the document does not mention this information. Use the following examples as a guide:

Suggest what attributes, if any, are still needed to answer a user's question. Only suggest a new attribute if a relevant one is not already in the following list. If no new attribute is needed, just return an empty list.
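The snippet-extraction prompt in Table 4 asks for spans copied word for word and returned under a single "snippets" key. One way to check that property after the fact is sketched below. This is an illustrative assumption rather than a description of Marco: the helper name `verbatim_snippets` is hypothetical, and whitespace is normalized before comparison on the assumption that PDF text extraction may introduce line breaks inside a span.

```python
import json


def _normalize(text: str) -> str:
    """Collapse runs of whitespace so line breaks do not block a match."""
    return " ".join(text.split())


def verbatim_snippets(raw_reply: str, document: str) -> list[str]:
    """Keep only returned snippets that occur word for word in the document.

    Hypothetical helper: replies that are not a JSON object with a
    "snippets" list are treated as containing no snippets.
    """
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, dict):
        return []
    snippets = data.get("snippets", [])
    if not isinstance(snippets, list):
        return []

    normalized_doc = _normalize(document)
    kept = []
    for snippet in snippets:
        # Discard non-string and empty entries, then require an exact match.
        if isinstance(snippet, str) and snippet.strip():
            if _normalize(snippet) in normalized_doc:
                kept.append(snippet)
    return kept
```

Snippets that pass this check can be located in the source text, which is the precondition for showing them as in-context highlights.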