Designing Accessible Obfuscation Support for Blind Individuals’ Visual Privacy Management

Blind individuals commonly share photos in everyday life. Despite substantial interest from the blind community in being able to independently obfuscate private information in photos, existing tools are designed without their inputs. In this study, we prototyped a preliminary screen reader-accessible obfuscation interface to probe for feedback and design insights. We implemented a version of the prototype through off-the-shelf AI models (e.g., SAM, BLIP2, ChatGPT) and a Wizard-of-Oz version that provides human-authored guidance. Through a user study with 12 blind participants who obfuscated diverse private photos using the prototype, we uncovered how they understood and approached visual private content manipulation, how they reacted to frictions such as inaccuracy with existing AI models and cognitive load, and how they envisioned such tools to be better designed to support their needs (e.g., guidelines for describing visual obfuscation effects, co-creative interaction design that respects blind users’ agency).


INTRODUCTION
Photo sharing is an important activity for blind individuals to access visual information in daily life, socialize, keep memories, and express themselves [3,13,25,28,33,81].At the same time, visual privacy is a major concern with photos taken by blind people, due to accessibility challenges in reviewing and evaluating photos that contain private objects or information [3,13,25,28,54].Blind individuals are thus exposed to higher security risks and more impression management issues when engaging in photo sharing compared to sighted counterparts [4,5,26,81].
Recent research has begun to explore privacy protection features for blind people to manage their visual content (e.g., [8,71,87]).One promising approach is to leverage computer vision for detecting and obfuscating (e.g., blurring, removing) private content in blind individuals' photos [8,16].Accessibility researchers have interviewed blind individuals' perspectives on the use and design of this type of AI-assisted visual privacy tool, noting both needs and concerns from the community [8,71].In particular, blind people desire control over obfuscation decisions and would like tools to be designed more accessibly to support such control [8,71].Such a tool should inform blind users of potential private content in their photos and empower them to decide, manipulate, and evaluate the obfuscation of this content.Still, what interaction designs could support these goals and how blind people would react to using such a tool are under-explored questions.
To bridge this research gap, we prototyped and evaluated a midfdelity screen reader-accessible obfuscation tool, building on insights from past work (e.g., [8,38,71,87]).To examine the capability of existing AI technologies in providing accessible visual obfuscation support while also exploring what an ideal system could ofer, we prepared two versions of the prototype, one using of-the-shelf AI models (i.e., Segment Anything Model [36], BLIP2 [42], Chat-GPT [52]) and one using Wizard-of-Oz, human-authored guidance.We employed the prototype as a probe in a study with 12 blind participants.Participants edited private photos, including those from the BIV-PRIV [66] dataset (photos with fake "prop" private objects taken by blind people) and participants' own non-private photos.We focused on the following research questions: • RQ1: When given the opportunity to apply computer vision methods for managing private visual content in photos, what are blind people's mental models of these methods?• RQ2: How do blind people approach these computer vision methods and why? • RQ3: What design opportunities exist to reduce friction in blind people's experiences with these methods?
Our fndings reveal that blind participants had varied levels of pre-existing understanding of relevant visual concepts (e.g., background versus foreground, blur, inpainting) but were quick to learn and make use of these options.Still, participants experienced a range of frictions in using the prototype to manage private visual content (e.g., inaccuracies with the of-the-shelf prototype version, general difculties with envisioning and evaluating obfuscation results, heavy cognitive load).Accordingly, participants ofered design ideas to alleviate these frictions, such as allowing users to more freely make obfuscation decisions with only supporting input from AI and non-visual communications that clearly indicate obfuscated private content' visibility.We discuss how this feedback could inform both the design of more accessible visual obfuscation interfaces and adaptations to the underlying computer vision models that support blind users' sense-making of obfuscation efects.
In summary, our work makes the following contributions: (1) an empirical understanding of how blind people approach AI-assisted obfuscation manipulations for managing private visual content, (2) design insights for reducing frictions noted in blind people's use of AI-assisted visual privacy obfuscation tools, and (3) an example prototype design of an AI-assisted screen-reader accessible visual obfuscation tool.

BACKGROUND
Our research is informed by prior literature on blind photography, visual privacy, privacy-preservation technology, and accessible visual content-sharing support.

Privacy Concerns Related to Blind People's Photos
Blind people take and share photos for a range of purposes, from visual information access to social interaction and self-expression [3,13,25,28,33,81].These photos commonly feature text, outdoor scenery, people, food, vehicles, crafts, plants, household items, and so on [3,33].In sharing photos, however, blind people face challenges with reviewing and evaluating the photo content compared to their sighted counterparts [3,13,25,28,54].As a result, photos shared by blind people often unintentionally contain private information [4,5,26,81].Blind people are aware of privacy risks involved in photo sharing and often feel concerned when engaging with cameras [4,5,13,33].
Concerns include both inadvertently disclosing their own information and breaching others' privacy (i.e., multiparty privacy conficts) [6,7,73,89].Recent work has examined blind people's privacy concerns and risks related to visual content sharing, a concept termed visual privacy (e.g., [7,31,72,73]).Private visual content categories that blind people are particularly concerned with include: fnancial (e.g., bank account details, credit cards), medical (e.g., medical documents, prescription pill bottles, pregnancy tests), people (especially naked bodies and faces), and location or identifcation (e.g., digital screens, letters, papers with addresses and names) [7,26,73].Blind individuals are also concerned about photos that may negatively infuence others' perceptions of them (i.e., impression management), such as unfattering or embarrassing shots or unorganized homes, and activities that may be misinterpreted as bad behaviors [7,73].
A range of factors infuence blind people's comfort with photo sharing.For example, they are generally more willing to share with close friends and family [7].In particular, they often work with sighted friends and family for visual information access and management [6,87].In doing so, however, blind individuals worry about the lack of independence as well as the potential for compounded risk of sharing sensitive information with close social ties [31,87].In turn, some individuals have become accustomed to sharing private content with remote visual interpretation services to access important information (e.g., Be My Eyes [2], Aira [1]), though their willingness to do so depends on the type of service as well as data handling and access policies [7,71,73].Finally, other sharing considerations include: (1) the potential for disclosing bystanders' private information [7], (2) the impact on intimate personal relationships or broader social interactions if private information is disclosed [73], (3) the burden of choosing between the right to information access vs. others' profting from their data [72], and (4) whether the information is shared knowingly (e.g., with a visual interpretation service to gain access to visual information) or inadvertently (e.g., in the background of a photo) [73].These concerns mostly align with parallel visual privacy research eforts involving sighted people (e.g., [45]).
In this paper, we aim to advance technology design that gives blind people more control in managing their visual private content themselves through non-visual information access and photo manipulation.Additionally, blind people's visual privacy is generally researched within the context of visual interpretation services, yet their photo-sharing practices span a much wider range of contexts, many of which have been considered important by general privacy research (e.g., social media [12,24]).Our research helps reduce this research gap by considering a range of common photo-sharing scenarios of blind people.

Accessible Obfuscation Design
Prior work has proposed a range of privacy-enhancing approaches, including but not limited to access control mechanisms (e.g., [23,69,79,80]), privacy policy measures (e.g., [22,78]), and privacy features that detect, fag, and limit sensitive information (e.g., [18,44,65]).Among them, obfuscation has been highlighted as particularly promising for protecting blind people's visual privacy [7,8,18,26,71].Obfuscation is "the deliberate addition of ambiguous, confusing, or misleading information to interfere with surveillance and data collection" [16], which allows hiding specifc private areas in a photo while still displaying important information.Blind people also envision that obfuscation could help focus recipients' visual attention, which is useful in interactions with visual interpretation services [8,71].
While obfuscation could be applied in many forms-such as blurring (e.g., [32,67,75,84]), overlaying with stickers (e.g., [14,43,61]), silhouette/blacking out (e.g., [14,53]), inpainting [51,55], pixelating [29]-each form has pros and cons.For example, in an interview study, some blind people were concerned that blurring may be less reliable than blacking-out due to being more easily reversible, while some also value the potential for blurring to allow at least a vague understanding of the overall visual content without disclosing specifc private information [71].Obfuscation can also be applied to specifc AI-detected private objects (e.g., [9,39,76,85]), though such an approach is difcult, as privacy is contextual [10,50]-so are the types of objects to be obfuscated [10].Work with sighted users shows that people consider a utility-privacy trade-of when making obfuscation decisions about a photo [10,29,35], which has led to proposals to protect privacy without sacrifcing utility, such as using avatars [46] and activity-oriented partial obfuscation [10].
Most existing obfuscation design work is geared toward sighted people, with only a recent focus on the needs of blind users.In an interview study, Alharbi et al. [8] explored blind people's perspectives on accessible obfuscation tool design.Their participants viewed obfuscation to be potentially useful but desired information and control over the obfuscation decision to ensure alignment between their intention and the automated obfuscation results [8].Similarly, Stangl et al. [71] interviewed blind individuals on their expectations for obfuscation tools, noting hypothetical concerns around accuracy, processing delays, and reduced agency and control over their visual content.Moving beyond interview studies, Zhang et al. [87] developed ImageAlly, a prototype system that automatically detects private objects in an image, surfaces that information to a blind user, and, if desired, supports the user in handing of the image to a trusted ally for obfuscation, rather than allowing the blind user to edit the image independently.
Despite growing interest in obfuscation, prior work on supporting blind users in independently managing private visual content has been limited to interviews to capture projected perceptions.Our study instead explored how blind participants make use of an interactive prototype designed to support them in independently making decisions about when and how to obfuscate an image.

Accessible Visual Content Sharing Support
In the context of visual content sharing and editing, blind people tend to desire image descriptions beyond those typically recommended by general guidelines (e.g., [17,57,83]) and have more concerns around description accuracy [34,63,88].For example, aesthetics and potential experiences triggered by photo content are considered relevant to photo-sharing decisions by blind individuals [30,34,63,88].Information related to spatial positions of objects and modifcations on image content is critical for visual layout editing tasks [11,56,63].Because of the abundance of visual details needed, cognitive load is a key challenge.Prior work suggested providing a quick, intuitive visual summary and opportunities to further explore image details [38,63].As many blind people lack understanding of visual concepts and design standards, support for learning in these areas is also important [41,59].Building on these insights, our study explored how non-visual image editing support should be designed to improve private photo obfuscation accessibility.

METHOD
We conducted 12 user studies to understand how blind individuals approach using computer vision methods to independently manipulate visual privacy obfuscation and gain actionable insights for accessible tool design.Participants used two versions of a midfdelity obfuscation prototype to manipulate private photo content: (a) a Wizard-of-Oz version (with functionalities pre-confgured by researchers) for probing design feedback without distraction from algorithmic inaccuracies, and (b) an of-the-shelf version (implemented with newest of-the-shelf models) to explore how inaccuracies in AI models may infuence participants' use of them.In this section, we describe the prototype and study design.

Prototype Design and Implementation
3.1.1Prototype Design.Informed by blind users' desire to control obfuscation decisions with manageable cognitive load [8,71,87], this prototype provides support for users to non-visually manipulate private objects in photos.The prototype automatically detects user-specifed private objects in photos and allows users to decide whether and how to obfuscate them.The prototype is presented through a simple one-page user interface that prioritizes easy navigation using screen readers and consists of two main components: (1) an explore image section and (2) an edit image section (as shown in Figure 1).
Explore Image: The prototype frst presents users with a highlevel caption and then a touch-based explorer for learning photo layout-as users touch diferent areas of the photo, object names and text surrounding the area are announced.We designed this feature to provide photo descriptions at diferent granularity in supporting better interpretation, as inspired by [38].
Edit Image: The prototype detects potential private objects and displays each object's caption in the 'Item' drop-down menu (without a particular order).From this menu, users could choose to obfuscate any detected objects or the background of the photo focus.Users could also confgure a small set of most common obfuscation settings (based on [9,29,44]): from the 'Style' drop-down menu, users can choose to blur, blackout, or erase (i.e., removing a private object and inpainting background to fll the area) the private content; from the 'Shape' drop-down menu, they can confgure the obfuscation area either to ft the exact shape of the private object or to be a bounding box (rectangle) fully enclosing the object.We use this set of options as a starting point to elicit participants' preferences for visual obfuscation manipulation choices.Upon applying the obfuscation, users can review the resulting photo in the explore image section.
Figure 1: Prototype user interface design for (1) explore image section, including a high-level caption (left) and a touch-based explorer with captions displayed for each object bounding box (middle), (2) edit image section (right).The prototype allows users to explore the image and obfuscate private objects by choosing from a list of all potential private objects detected by the system.The interface was designed to mimic a standalone smartphone app but was implemented as a webpage for study participants' easy access.Displayed captions are from the Wizard-of-Oz version.Please see Supplementary Materials for object captions from the of-the-shelf version.
3.1.2Prototype Implementation.We implemented two versions of the prototype that difered in underlying methods to describe image content and detect private objects: Of-the-shelf Version: This implementation employed existing AI models for describing photos, identifying private objects, and obfuscating those objects.This version allowed us to understand whether and how existing AI models can support non-visual obfuscation interpretation and decision-making, with a focus on blind participants' reactions to likely inaccuracies.We adopted an approach used by the Caption-Anything image processing tool [82] by employing the Segment Anything Model (SAM) [36] and BLIP2 [42] to locate and caption objects in an image.We frst used SAM to segment all objects in an image, constructed a bounding box for each segment, and cropped the image area within the bounding box to feed into BLIP2 for caption generation.Then, to detect whether an object belongs to any user-specifed private categories, we used the ChatGPT-3.5API [52] to process each caption with the prompt: "Does the following sentence mention anything related to a [private object category]?Answer yes or no.The sentence is: there is [caption]." This implementation choice was informed by two considerations: (1) SAM and BLIP2 both do not require training on blind people's private images (the collection of which can be challenging and potentially unethical) and thus produce better results on photos used in our study than common of-the-shelf object detection models; (2) SAM and BLIP2 are easily accessible and do not require additional fne-tuning as most state-of-the-art models do, which increases the replicability of our implementation approach for future research.For consistency, our prototype generated high-level captions through BLIP2 as well.
Although this processing pipeline is capable of detecting any private object category, we limited our implementation to fve private categories that previous work identifed as especially concerning for blind individuals: (1) medical, (2) fnancial, (3) personally identifable information, (4) impression management-related, and (5) faces [7,26,73].Limiting our set to fve object categories was useful for two practical reasons: (1) by instructing all participants to focus on obfuscating the same private object categories, we could more consistently analyze their reactions to the prototype design; (2) models for segmenting and captioning images often require extensive computational power and can cause unreliability during user studies-limiting private object categories allowed us to preprocess images prior to the study sessions.To achieve improved model performance, we used concrete object names to represent the fve abstract private object categories in the ChatGPT prompts: credit card, pill bottle, human, sexual product, and paper document.We intentionally left the paper document a broad category to allow users to decide the privacy risks of the document content themselves, as suggested by [8].
The of-the-shelf models used in this prototype version tend to produce the following inaccuracies: (a) BLIP2 could mis-categorize objects, such as "mangoes" as "potatoes"; (b) SAM could segment sub-parts of an object, leading to duplicated objects identifed by the prototype (we used intersection-over-union to remove duplicates but left smaller sub-parts in case users want to hide only a small area of an object); (c) BLIP2 could inaccurately describe unclear or abstract image areas, such as those with obfuscation efects (e.g., blurred).We focused on understanding how participants react to these inaccuracies in the study.

Wizard-of-Oz Version:
To understand what participants' experiences could be in the future with even more accurate underlying models, we implemented a Wizard-of-Oz version of the prototype.In this version, researchers assessed the results of of-the-shelf models and authored a ground truth for each result.Due to the high performance of existing tooling for optical character recognition, image visual efect application, and image segmentation, we limited the scope of the Wizard-of-Oz components to two automation tasks: (a) identifying, locating, and describing objects and (b) detecting private objects.As the of-the-shelf version, these researcherannotated results (e.g., all objects' descriptions and bounding boxes, detected private object list) were manually inserted into the prototype system prior to the study, so that participants could operate the two prototype versions in the same manner.Two researchers collaboratively generated this ground truth information.Following image description best practices (as suggested by [17,27,57,74,83]), we decided on six object description rules: (1) for each object, focus on describing what the object is, its salient characteristics (e.g., color, identity, number, pattern), and actions; (2) if the object's bounding box includes another object underneath the primary object, describe this spatial relationship (e.g., a black cat lying on the couch); (3) if there are multiple objects of the same type close to each other, provide one description for them to avoid confusion (e.g., one description for three mangoes, instead of three descriptions for each mango); (4) for obfuscated areas, describe their corresponding visual efects, shapes, and colors if relevant (e.g., a blurry rectangle with yellow and white colors, a black human silhouette); (5) for visual artifacts left from obfuscation (e.g., unnatural in-painting), briefy describe what the unnatural area looks like to the annotators (e.g., a moving, blurry blue object); ( 6) for high-level image caption, describe all salient objects in the image.These description rules served as a starting point for us to explore image description best practices in the context of visual obfuscation manipulation.
For both the of-the-shelf and Wizard-of-Oz version, we programmatically applied visual obfuscation efects-(1) blackout: set all pixels of the obfuscation area black; (2) blur: applied a Gaussian blur with a high radius value (80) to the obfuscation area; (3) erase: inpainted the obfuscation area with surrounding background using the LaMa (large mask method) tool powered by the SOTA AI Model [62,77].The touch-based explorer additionally featured Microsoft Azure AI's optical character recognition model [48] for text detection.To ease access for study participants, we implemented this mobile application prototype as a webpage and instructed participants to use it through their smartphones.A demo video is included for this prototype in the Supplementary Materials.

Participants
We recruited 12 blind participants through the National Federation of the Blind mailing list and word of mouth.Participants had to be at least 18 years old, identify as blind or legally blind, and have experience taking photos.To ensure consistent screen reader behavior with our prototype, we limited recruitment to iPhone users.As shown in Table 1, participants were 24-59 years old ( = 36, = 40.2),with eight identifying as female and four as male (openended description for gender), and all self-reporting to be either totally blind (N = 7) or with some light perception (N = 5).Their visual condition onset ranged from birth to 40 years old, with the majority beginning at birth (N = 7).Five participants had no visual memory, three had limited visual memory, and four had signifcant visual memory.In terms of photo-sharing experiences, participants most commonly shared photos 'once a week' (N = 7), followed by 'once a month' and 'less than once a month' (N = 2 for both), with only one (P7) sharing once a day or more.Participants' experience with photo editing was more limited-the majority had never edited a photo (N = 8), with three editing less than once a month and one editing approximately once every month.For photo sharing and editing, participants used mobile phones (N = 12) and desktop or laptop computers (N = 3) with screen readers (N = 11) as well as remote sighted assistance (N = 8).

Study Protocol
Participants flled out a short pre-study survey about their demographics and experiences with photography tasks, and then participated in a 90-minute remote study session via Zoom.The full protocol is included in the Supplementary Materials.Below, we detail the photo choices for the obfuscation tasks before summarizing the study procedure.

Obfuscation Task Photo Selection.
To provide a degree of ecological validity for the study tasks, we selected photos primarily from the BIV-PRIV dataset [66]-a dataset that contains photos taken by blind people of fake "prop" private objects, such as medical and fnancial documents, pill bottles, and sensitive objects that could raise privacy concerns (e.g., condoms, pregnancy tests).
We selected one photo from each of fve especially concerning categories (i.e., (1) medical, (2) fnancial, (3) personally identifable information, (4) impression management-related, and (5) faces).The frst four of these photos were from BIV-PRIV, while the face photo is a stock photo from [58], as photos with faces were not included in BIV-PRIV.We used one of these fve photos (a credit card) to familiarize participants with the prototype (Table 2), while the remaining four photos were reserved for the main photo obfuscation tasks.Table 3 shows the image descriptions and object detections for each of the main obfuscation task photos for both the of-theshelf and Wizard-of-Oz prototype versions.Each of these photos was presented to participants alongside a photo sharing scenario, such as "You took a photo of some newly bought mangoes for a fruit review post on social media" for photo (c) and "You took a photo of your new ofce space to post on social media" for photo (a) (Table 3).

Participant
Gender Table 2: Photos used in familiarization task (left, showing a credit card) and to demonstrate a detection error (right, showing a condom box that is undetected as a private object).For these two photos, we only presented caption and private object detection results (associated privacy category indicated in the bracket) from the of-the-shelf implementation to participants, as the left photo was used only for introducing participants to the interface elements and the right photo was meant to show potential of-the-shelf model inaccuracies.
We also selected one detection error demo photo.While the of-theshelf models made minor mistakes in describing most photos from the BIV-PRIV dataset, they produced more false negative object detections for photos that were visually crowded.We included one of these photos where the prototype missed detecting a condom box (Table 2) to understand how participants react to this type of inaccuracy.
Last, we included a personal non-private photo.In the pre-study survey, participants had an option to voluntarily upload a photo they had taken recently to edit with our prototype.To protect participants' privacy, we asked for this photo to not contain any actual private information, but instead, the researcher picked a nonprivate object from each photo's background to ask participants to obfuscate.While this approach did not provide an opportunity for participants to obfuscate their private information, it allowed us to learn about how participants may experience our prototype diferently with their own photos compared to others'.

3.3.2
Procedure.The study session was conducted via Zoom and included three parts: (1) initial understanding and familiarization task, (2) main image obfuscation tasks, and (3) post-study interview.Participants were required to join the Zoom call from a smartphone and share their phone screen during the study tasks (with consent).Prior to the study, we emailed them instructions for accessing the mobile prototype website and asked them to keep the site open in a browser tab when joining the call to avoid additional browsing that may increase privacy disclosure risks.With a researcher's support, all participants were successful in setting up the study environment.Table 3: Overview of the four photos used for the main photo obfuscation tasks, showing descriptions and private object detection results (associated privacy category indicated in the bracket) from both the of-the-shelf and Wizard-of-Oz implementations.
Each participant edited two of these four images using the prototype, in addition to the detection error demo photo (Table 2) and an optional personal photo that they could bring themselves.The of-the-shelf version of photo (a) shows an example of multiple bounding boxes inaccurately detected for one object.

Initial understanding and familiarization task:
The researcher frst guided participants through editing the credit card photo shown in Table 2 using the of-the-shelf prototype version.In introducing the prototype, we asked participants to imagine they had confgured the system to detect the fve private object categories (i.e., credit card, pill bottle, human, sexual product, and paper document) and explained that in the future it could be confgured to detect other specifc objects of interest.
To gauge initial understanding of relevant visual concepts, participants were instructed to read through all obfuscation options in the 'Edit Image' section and describe what they expected each option would do to the photo.
The researcher then provided a verbal description to clarify each option, along with a tactile metaphor for the obfuscation styles as follows: • Background vs. primary object: "...hiding everything behind the most prominent object in the image, if there is one.For example, if you took a photo of an apple on a kitchen countertop, the apple should be the primary object, and everything else on the counter-top is considered background." • Obfuscation style: -Black out: "Removing the content a user wants to hide, leaving the area black", with tactile metaphor, "Imagine a plastic plate with tactile patterns that depict the shape of the United States.We cut a part of the plate so that you can't tell that the depicted shape is of the United States anymore, but you can feel a hole on the plate." -Blur: "Making content that a user wants to hide less clear by adding noise to the area of the image'', with tactile metaphor, "putting a soft fabric on top of the same plastic plate, so that you can feel the shape on the plate less clearly." -Erase: "Removing the content a user wants to hide, and flling it in with non-sensitive content that naturally blends into the photo," with tactile metaphor, "we again cut a part of the plate but replace it with another piece of plastic with a diferent outline that blends into the rest of the plate seamlessly." • Bounding box vs. exact shape: "Choose the hidden area to either exactly ft the shape of the private object, or a rectangle that encloses the object." After the familiarization session, researchers examined participants' understanding of these concepts again by asking them to provide a defnition for each in their own languages.
Image obfuscation tasks: Participants independently reviewed and obfuscated three to four photos, depending on whether they opted to work on a personal photo.They were instructed to think aloud during the tasks and "make decisions about what you want to do in each scenario based on your feelings, judgment, and relevant past experiences-imagine you would share the obfuscated photo on social media."The frst two photos were randomly assigned from the set of four main task photos (ensuring an equal number of participants to process each photo).At the end of editing each photo, we asked participants about: (a) considerations in deciding what/how to obfuscate (e.g., "How did you decide that this image task is completed?","Why did you decide to manipulate the image this way?", "If you were sharing this image to a [coworker/visual interpretation assistant] instead, would you edit the image diferently?How?"); (b) experiences with the obfuscation interaction (e.g., "How would you describe your experience of exploring and editing images with our system so far?", "How ready would you feel if you were to share the photo?");(c) design feedback (e.g., "How useful or not do you fnd the information provided for this photo?","What additional information would you like to know?What suggestions do you have for presenting information?").For the two main task photos, participants were given the of-the-shelf prototype version for one photo and the Wizard-of-Oz version for the other (order counterbalanced).They were initially generally informed that the two tasks made use of diferent algorithms and were given more information upon the completion of both tasks: "task _ is an ideal version of the tool that works fully accurately, whereas task _ is the version that is currently possible through existing algorithms, which can be inaccurate".They were then instructed to try out the of-the-shelf prototype with the detection error demo photo (Table 2) and share how they envision such inaccuracies to infuence their use of AI-assisted privacy obfuscation tools.Last, participants who opted in also tried out the prototype (of-the-shelf version) on their own photos.
Post-study interview: Participants were asked how they felt about the overall idea of using this type of application to support their visual privacy management needs.The researcher also probed for benefts and frictions they foresee in using this type of tool and how participants may use it diferently in real life.The interview ended with an open-ended conversation on how the system could be better designed.

Data and Analysis
Our data collection happened through (1) observational notes, (2) screen recordings of participants' interactions with the prototype, and (3) transcribed audio recordings of the study.Upon study completion, all video clips irrelevant to the prototype interaction were removed, and the remaining clips were cropped to contain only the prototype interface.We adopted a thematic analysis approach in analyzing our qualitative data, as outlined by Braun and Clark [15].The frst author reviewed all transcripts and observational notes to develop an initial codebook and coded through all data.The second and third author then randomly selected half of the coded transcripts to review.They then collaboratively iterated on the codebook and extracted key themes.The frst and ffth author noted down all user actions in using the prototype to triangulate with the qualitative data.

FINDINGS
We report on participants' understanding of visual obfuscation concepts, workfow with the obfuscation tasks, aspects of the prototype they found to be challenging, as well as the related design feedback they provided.

Understanding of Visual Obfuscation Concepts (RQ1)
Pre-existing understanding: The majority of participants (N = 9) learned about the concept of hiding private content in photos prior to attending the study sessions, though none of them had ever obfuscated content in a photo.Specifcally, participants had diferent levels of pre-existing understanding of visual concepts involved in obfuscation manipulation.First, half (N = 6) of the participants clearly understood the diference between foreground and background (e.g., "space like surrounding the main objects in the photo" (P7)), while the other half felt vague or confused about it: "I'm not sure what the whole background is, I guess...any furniture, people, or anything that are in the background?" (P3).Among the obfuscation styles, blacking out was most commonly understood (N = 12), followed by blurring (N = 8), and lastly erasing (i.e., removing a private object and inpainting background to fll the area) (N = 4).Participants were particularly unsure about what would happen to an area once private objects were erased, for example: "Does that get rid of it?Like maybe just take it out altogether...I don't know if there would be anything for you to see though" (P5).Last, most participants (N = 8) also understood how obfuscation could be applied to diferent locations and diferently shaped areas, though some were confused about the relative sizes of bounding boxes compared to the enclosed objects.Participants' pre-existing understanding partially came from personal visual memories (e.g., P11 considered blurring familiar because her previous vision as blurry) and conceptual knowledge, such as "from kind of the concept and analogies" (P6).
Understanding after explanation: With a verbal description and tactile metaphor, all participants felt generally confdent in understanding the above-mentioned visual concepts and were ready to make obfuscation decisions accordingly: "I feel like the descriptions are very easily understandable...I can grasp very quickly what I need to do" (P4).However, they found envisioning some obfuscation results to be less straightforward, such as more complex combinations of options (e.g., the background of a bounding box) and the outcome of erasing.Participants generally understood the erasing option as being able to "completely getting rid of" (P2, P3, P6, P7, P10) the private object, but some could not envision what the area would look like after: "Would there still be like the foral print in the background?Would it erase the pill bottle?And then there wouldn't just be this random spot on the couch?" (P3).P7, for example, assumed "a blank spot" (P7) in the result photo.Confusions on these options sometimes caused participants to avoid selecting them.
In summary, our participants were comfortable learning about common visual concepts involved in obfuscation manipulations, while many had related pre-existing understanding.However, envisioning certain obfuscation results can be challenging, requiring the obfuscation tool to provide efective communication.

User Workfow and Decision Making (RQ2)
4.2.1 Workflow Overview.Figure 2 presents a summary of participants' general workfow.All obfuscation tasks began with an initial exploration of the original photo.On average, each initial photo exploration took 84.9 seconds ( = 21; = 272; = 71.6).Typically, participants made use of a combination of the high-level caption and the touch-based exploration (N = 11).In most cases, participants quickly checked the high-level caption to get a general idea about the photo and used the explorer feature for further details.For some tasks, they also went back and forth between the two to understand how they corresponded to each other (N = 4).Occasionally, only one of the two features was used (explorer only for 5 out of 41 tasks; high-level caption only for 3 out of 41 tasks).P6, for example, felt that the high-level caption was enough for his initial exploration of photo 2: "I didn't necessarily feel like I needed to know the placement of the images at the moment" (P6).
In using the touch-based exploration, most participants wanted to gain an initial understanding of the photo layout (N = 10) and identify potential private objects (N = 9): "just trying to see if there was anything potentially that could give of any information" (P7).The majority tried gaining this information by both (1) touching diferent areas for spatial information and (2) swiping left and right to go through all objects-"Swiping is defnitely easier but...you wouldn't know exactly where it was" (P8)-though many would like to directly hear verbal spatial descriptions.From this exploration, participants were sometimes able to quickly detect and locate private objects, but were also often left unsure, especially with incorrect or insufcient AI-generated captions from the of-the-shelf version of the prototype.
After exploring the original photo, participants then focused on whether and how to apply obfuscation.The total time spent making obfuscation decisions for one photo was on average 128.9 seconds ( = 59; = 260; = 63.5).Participants relied on both their own judgement from the earlier exploration (e.g., "based on the description, it looks like what I want is just to show the mangoes" (P6)) and the system-detected private objects in forming the obfuscation decisions (detailed considerations in Section 4.2.3).After applying an obfuscation, participants always reviewed its efects through the high-level caption (14 out of 41 tasks) and/or touch-based exploration (all tasks).For approximately 60% of all tasks, participants were able to make the fnal decision in one try, whereas for the remaining 40%, they were less sure what manipulations would work best and adopted a trial and error approach by testing out and reviewing a range of options (N = 9), as P4 did to ensure that the focus of the photo, a cat, was unafected by the obfuscation: "I will black it out this time, and I'm gonna hide the exact shape just to see what it does, partly cause I don't want it to get on the cat." The highest number of obfuscation adjustments our participants made for one photo was seven times by P5, followed by P9, who also changed and reviewed his obfuscation fve times: "I was experimenting with how each one will be" (P9).
In reviewing obfuscation changes, participants in general needed to explore through all objects to evaluate the area that had been afected and ensure the absence of private information (N = 12).Some (N = 3) attempted to identify the exact change efciently by memorizing the location of the private area in the original photo: "I could tell kind of where I needed to touch to know that object had been hidden" (P4).However, this approach was not always reliable and at times led to misunderstanding.For example, P8 missed a metal door near the private object when she initially explored an photo, so when she heard a door announced post-obfuscation, she thought it was an outcome of the obfuscation process: "I think where the card was, it now describes it as a middle door with a hole in it...it's funny how it picks up diferent things" (P8).P1 and P11 also tried to use the order of objects appearing in the image explorer as an anchor to track where the changes were supposed to happen, though our prototype was not designed to keep objects in the same order and thus did not support this approach.Participants also used various strategies to check potential inaccuracies (N = 10), such as noting inconsistency in captions (N = 9) and using common sense: "It could be a toy or something.I don't think you'd actually put a dog in the plastic bag" (P9).

4.2.2
Experience with the Study Prototype.Overall, participants found the prototype easy to operate (N = 11).However, they did not feel confdent enough that they would be willing to share all photos they had obfuscated.Only 11 obfuscation results from the 24 main photo tasks were considered ready to share on social media, with six of them deemed absolutely not shareable, and six of them deemed difcult to decide (one task was not completed due to a technical issue).
While participants found both the of-the-shelf and Wizard-of-Oz prototypes to be straightforward, many mentioned having a more positive experience with exploring and obfuscating photos in the Wizard-of-Oz tasks (N = 8)-they felt clearer about whether the result photos were shareable or not: "When I tried to remove the bottle, it still described it as like a bottle being there (Table 5), so I knew you could still kind of see it" (P3).In contrast, participants found it difcult to judge the result of an obfuscated photo when the of-the-shelf tools provided inaccurate descriptions: "...it said something about there being a blank laptop screen...it didn't really make a lot of sense to me" (P3).Section 4.3.1 provides further description of this specifc concern.Still, even with the Wizard-of-Oz prototype version, participants mentioned a range of frictions that they experienced, including inefective obfuscation communication (Section 4.3.2) and high cognitive load (Section 4.3.3). 4 presents the obfuscation choices made by participants across four private content types in the independent image obfuscation tasks.In terms of the obfuscation style, participants generally preferred blacking out or erasing but commonly chose blurring for human faces (5/5 participants who completed the task).For obfuscation shape, the exact shape seemed to be overall preferred, as chosen by all for both paper documents and sexual products, by all but one for human faces, and all but two for pill bottles.For the two tasks where participants had the option to choose between obfuscating the image foreground and background, more of them chose the former.
First, all participants considered the level of perceived privacy an obfuscation edit provides for diferent private content types.For example, when dealing with highly sensitive private objects, such as a pill bottle or a condom package, participants commonly felt that erasing the object would be the safest, as blacking out and blurring both risk catching viewers' attention and suggesting the appearance of something private: "I just fgured it would draw less attention...your eyes would go to the blurry part to try and make out what it was" (P7).Seven participants avoided using blurring, as viewers "could sort of squint and see" (P4) the hidden content and that technology could "take that image and bring it more into focus to where it can be read" (P11).Participants were also aware of risks related to disclosing the shapes of private objects.For example, P8 was concerned that the shape of a person alone may be identifable and decided to use the bounding box option: "to kinda hide a little more as to who the person was." When participants were particularly concerned about privacy disclosure or unsure about private content location, they chose to hide the entire background: "Recognizing that there's a lot of clutter with some text that may or may not be there, it was just easier to almost aggressively hide everything" (P6).Participants' willingness to share the resulting images after obfuscation also varied across private content types.In particular, the majority of participants who worked on the photo (b) (couch with a pill bottle, as in Table 3) did not want to share it even after obfuscation (4 votes out of 6).Many mentioned that they were especially concerned about revealing private medical information, such as: "I'm not as ready as I would like just because the bottle is still on the couch...it doesn't say that they can see the text, but I would still be kind of cautious because that was a lot of information" (P5).In comparison, the other obfuscated photos all received more votes for being ready to share compared to not shareable.
Second, participants were commonly concerned about obfuscation afecting information a photo is supposed to deliver (N =  Obfuscation choices are ranked in popularity within each category.In summary, blackout style, exact shape, and foreground seem to be the most popular, though the choices varied based on private content categories and related considerations, as described in Section 4.2.3.*For the two tasks where the image focus was the background (e.g., a whole ofce space), participants were not provided the option to obfuscate the entire background.*One participant (P8) chose not to make any edits for two image tasks and one participant (P12) did not complete one of the tasks due to a technical issue.Therefore, three of the four categories only had obfuscation choices from fve participants, whereas the other one had six.11).For example, P12 considered how hiding an entire credit card may impede the original photo sharing goal-to fnd the owner of a lost wallet: "Not everybody is gonna have a Morgan Chase card.I don't.But I've got a tan wallet.When I can show someone a picture on social media of both items without identifying information that narrows down the population of people that it likely belongs to" (P12).He therefore hoped to hide just the text on the card but not the card design.Some participants also wanted to hide more content to prevent irrelevant content from distracting the information they intend to deliver: "there's a lot of stuf that people don't really need to know that I would almost want to hide everything but the animal" (P5).A number of them (N = 4) associated certain obfuscation styles with cultural meanings that they considered appropriate for only specifc scenarios: "I know that they used to black out people's eyes in police lineups...that would at least show that there was something there but we can't tell you anything about it" (P10).Third, some participants considered the visual presentation of obfuscated photos in making decisions (N = 9).For example, participants were concerned about the resulting photo looking "weird" (P5, P6), "funny" (P5), or "unnatural" (P1, P2).Specifcally, P1 and P2 considered blacking out the background behind a cat unnatural: "I didn't want the cat to be shown in the air" (P1).P9 was also concerned that mangoes on top of a piece of blacked out paper will appear as "burned mangoes" (P9).Participants occasionally wanted the obfuscated photos "clean" (P6, P8) and "attractive" (P3).For example, P6 mentioned wanting to "edit that image even further and maybe put the mangoes in the center of the image, " and P3 considered removing unattractive items, such as "a radiator" (P3) if the photo was meant for showing a new ofce space.
Participants mentioned that the above considerations would likely shift across photo-sharing contexts.For example, when the recipients are remote sighted assistants, participants generally felt less privacy concerns with sharing unaltered or blurred photos: "cause I mean they signed confdentiality agreements" (P4).In contrast, participants wanted safer obfuscation options (e.g., entire background, black out, erase) when sharing with coworkers and social media, depending on how close their relationships are: "...if I'm sending the picture to the news, it's gotta be perfect.If I'm putting it on social media, it's gotta be close, and then if it's to send it to my brother...he's not gonna care if there's something in there" (P10).Further, some participants only felt the need to refne their photos in more formal or public occasions-for example, P6 would only consider cleaning up a photo if it was for social media, and similarly, for business-related photos, P2 wanted to "make sure that (the photo) looks professional before I post it."

Frictions and Design Insights (RQ3)
Participants experienced a range of frictions in reviewing and obfuscating the photos, some innate to using AI-assisted photo obfuscation tools and some likely addressable through design.

Efects of Inaccuracy with the
Of-the-shelf Prototype.Despite strategies for identifying inaccuracies (as described in Section 4.2.1),we observed that all participants were misled or confused by captions sometimes generated from the of-the-shelf models.For example, the models tend to generate inconsistent object captions for an object (e.g., mangoes described as lemons or potatoes, a stufed animal as a dog or bear) when its surrounding area was obfuscated diferently-with which, participants felt insecure: "I can't be sure if it was a teddy bear or a dog... the frst one told me it was a stufed teddy bear and then the second one told me it was a dog" (P12).For objects that were consistently mis-described (e.g., a purple condom bag as a purple toy), participants had no clue that there was inaccuracy: "Well, obviously I got confused.Because I took it for its word" (P12).Inaccurate captions led to the most confusion when describing an obfuscated part of a photo.Participants found it surprising and "ridiculous" (P10) that the caption model attempted to identify blurred, blacked-out, or distorted areas as objects, and in doing so producing false positive object detections (as Table 5 presents): "It is kinda strange how it's diferent depending on which way you hide it...more things are appearing" (P10).Participants were sometimes unsure what the system actually did and thus were hesitant to share the photo: "It's hard to tell, you know, what is accurate and what isn't" (P6).
For inaccuracies related to private objects being mis-detected, participants had mixed feelings.Some considered such errors unsurprising: "I guess it would be tough to have it recognize all kinds of boxes for some stuf in it.So I don't think it [detection errors] can

Original Erase & Exact Blackout & Bounding Box Blur & Exact
"A blue bottle of pills sitting on a bed." "A blue pill bottle with small blurry texts on the couch." "A blue paint brush is being used to paint on a piece of fabric." "A black background with a white clock on it." "A blue and white cup sitting on a table." Of-the-shelf

Image Caption
A couch with a blue and black foral pattern.
A person sitting on a couch with a laptop.
A close up of a couch with a foral pattern.
A white and black foral patterned couch.

Wizard-of-Oz
A close up of a beige colored couch in foral patterns, with a blue pill bottle in the corner.
The pill bottle has blurry Image Caption small texts.
"A moving, blurry blue object on the couch.""A black rectangle." A close up of a beige colored A close up of a beige colored A close-up of a beige-colored couch in foral patterns, with couch in foral patterns, with couch in foral patterns, with a moving, blurry blue object in a black rectangle in the a blue bottle partially painted the corner.
corner. in beige in the corner.
"A blue bottle partially painted in beige on the couch." Table 5: Example obfuscation applications on the photo (b) (as in Table 3) with descriptions for the obfuscated area (displayed next to its bounding box) and high-level image captions provided by both Wizard-of-Oz and of-the-shelf implementations.
be avoided" (P8).They felt that as long as the caption models were correct and let them know the existence of potentially private objects, they could physically explore and retake the photos to avoid privacy risks (N = 5).For photos participants took themselves, they also generally had a better sense of the objects in the scene and felt more confdent in correctly judging the accuracy of the descriptions (N = 6).Other participants, however, were more concerned: "You can't (always) go back in time and remove it in person.You know, the picture is the picture and you got to be able to remove it after the fact" (P10).
To mitigate the risks of inaccuracy with automated image editing tools, participants described several possible approaches: • Obfuscation Freedom: Participants commonly desired the freedom to select any object in the photo to be obfuscated (N = 11), given that they did not expect AI to be able to fully accurately detect objects considered private to them across contexts: "I like having more control over what I'm able to potentially hide" (P6).Participants wished to further select a specifc part of an object-e.g., "texts" (P5, P8, P12), "just the face" (P11) or a specifc photo area with multiple objects: "maybe divide into 4 or 6 squares...and you could choose to get rid of one" (P11).P5 and P7 also desired to make the selection directly in the image explorer, while P4 wanted to gradually apply obfuscations layer by layer: "you hid the pill bottle, but it still showed up as a blue thing.What if you could go in and edit it again and hide the blue thing?" (Table 5).While not wanting their obfuscation options to be limited by AI, participants did note that AI could be helpful in reminding them of potentially private objects, given that they themselves could also miss some objects-"in case I missed it, say hey here're possibly private items. . .do you want to double check?" (P9).
• Multiple Information Sources: Three participants considered checking the obfuscated photo descriptions with another AI as a way to gain more information and confdence before sharing photos.They thus suggested the tool to incorporate other AI algorithms' assessments for users' easy access: "building a really quick way to send it to another AI application like, you know, ChatGPT or Be My Eyes" (P6).Similarly, participants also desired ways to incorporate sighted assistance more efortlessly through the application.
In particular, many would like to assess the accuracy of this tool with sighted input prior to using it in real life: "if I'm editing 10 photos, then if I confrmed with another person and noticed it's making no mistakes...it's like a relationship like you built up the trust" (P9).• Communication about AI Accuracy: A number of participants also suggested including more information about how well the AI performs to set users' expectations, such as through quantitative measures (e.g., confdence score) (N = 2) and instructions that explain what the tools tend to pick up to encourage critical thinking from the users themselves (as suggested by P6).

• Improvement of Model Accuracy with Visual Efects:
Besides design improvements, the AI models themselves should consider improving performance in captioning not only clear images but also visual efects, such as blurriness, distortion, and shapes in solid colors (N = 6).

Inefective Obfuscation Communication.
As mentioned in Section 4.1, envisioning obfuscation results can be difcult for blind participants.Descriptions for obfuscation manipulation efects are thus critical.Our prototype communicated obfuscation efects through a high-level summary of the resulting photo and a touch-based exploration of objects identifed in the scene.While most participants (N = 8) reacted positively to this approach, we learned that it did not fully support their interpretation needs.First, participants experienced challenges in locating and evaluating obfuscated areas (N = 9).As suggested in Section 4.2.1, participants were often unsure which description was meant for the obfuscated area and needed to explore through many unafected objects to arrive at where they were trying to review.
Participants (N = 6) also found that the object descriptions did not provide the most efective information for them to "determine whether the efect there was applied successfully" (P6), including to what extent the obfuscation had hidden the private information: "I wouldn't feel very comfortable because I'm not sure of how blurred out this image is.I'd need some kind of reassurance" (P12).Participants felt that the mere absence of a description for the private object did not provide enough reassurance: "it says it's blurry.I'm assuming you can't tell what the pill bottle says, but I don't really like that" (P3).
Further, participants felt they lacked information about the original photo's composition needed to make informed obfuscation decision, such as objects' relative positions (N = 10)-"...is the woman sitting in a ffth chair like away from the table, or is she actually at the table?"(P7), and what was in the foreground of the photo versus background (N = 5).In turn, they were unsure of the objects that would be afected by hiding diferent obfuscation choices.
Accordingly, we present design insights related to challenges with conveying the output of an obfuscation action: At the same time, many participants also felt that the information (N = 6) and options (N = 3) provided in our system were just right: "It was just enough to get an idea of what the picture was, but not enough to be overwhelmed" (P5).In fact, despite the concern around information overload, the majority of participants mentioned at least one additional type of information or functionality they would like the system to provide on top of the existing ones (N = 8), including but not limited to "colors" (P12, P9, P3) of objects, "room decorations" (P3) and "people" (P3, P7, P10), as well as the option to crop a photo (N = 4) and editing photos beyond obfuscation (e.g., photo touch-up, flter) (N = 5).Balancing participants' desire for exploration and concern for cognitive load is therefore a challenge.
Participants mentioned the following relevant design improvements: • Minimal design: Overall, participants valued simplicity (N = 7):"Our favorite apps are like the ones that got one option.You turn it on and it works" (P10).They (N = 6) found photos with a smaller number of object captions much easier to interpret and in turn wished to combine repetitive captions-"a stufed animal with blue bandana, we know it's (also identifed as) a teddy bear...just merge the two descriptions" (P12)-or describe objects that belong to the same categories in one caption, while clearly indicating the total number and locations of these objects.A number of participants also particularly desired a "minimal use of sound" (P4) in reducing their cognitive load (P10, P12, P4): "SeeingAI makes noise like music when you're sliding your fnger around...I fnd it a little bit annoying to be honest" (P10).• Confgurable design: While the tool design should be overall minimal, it should also accommodate the varied preferences for how much and what information and functions should be provided.Many participants (N = 6) appreciated being able to choose between quickly checking the highlevel summary and diving deeper into the photo exploration.Following this approach, they further suggested options to get additional information or functionality as they desire, such as through a "button where you could get all the details of the image" (P5), a non-visual "zoom in" (P7, P12) function to get more information of a focused area, a "setting page" (P9) to personalize information included in photo descriptions (e.g., colors, text identifcation, people characteristics, position), as well as "two modes...a photographer mode where you can go in and do fancy touch-ups...then there is a simple (obfuscation) mode" (P4).
Overall impression: Despite frictions experienced with the prototype, all participants were excited about the overall idea of a screen reader-accessible, AI-powered obfuscation tool-many were eager to use it in everyday life: "I wish this was actually real that I could take pictures and edit like this" (P5).Even with existing frictions, some participants felt they would still make use of this prototype in certain ways, such as for more casual scenarios (N = 5) or "to take a quick picture to send to someone" (P6).Their urgent desire stems from desires for independence and control over visual content: "just being able to have more of an awareness of the things that are in image and being able to be more of an active participant in that process...I just really like being able to do stuf like this independently and have an accessible tool that allows you to do it efciently...since we never really had that opportunity" (P6).Some commented on the necessity of this tool when sighted help is not available: "There's a lot of people who don't have someone who can help them edit their pictures" (P3).Even when needing to check obfuscated results with a sighted person, participants appreciated being able to control the photo themselves frst: "I may elicit a close friend who cited to make sure that the picture didn't have any elements left that shouldn't be shared, but I'll still use it to do most of the editing independently myself before" (P11).Participants all believed that with design improvements to reduce these frictions, this tool would bring signifcant positive impact to their life: "I think this is a program that has very high potential if...you take the suggestions and comments that the participants give you" (P11).

DISCUSSION
Our study explored an AI-assisted obfuscation tool design to support blind individuals in independently controlling and editing private visual content in their photos.Our fndings revealed that blind participants were able to use our prototype to interpret and manipulate visual obfuscation details based on considerations related to utility-privacy trade-ofs [10,29,35], albeit sometimes encountering frictions related to accuracy, non-visual communication, and cognitive load.Participants proposed directions for future design ideas to address these frictions and were hopeful that, with such improvements, AI-assisted obfuscation tools would support their agency in private visual content management.These fndings extend prior visual privacy management support for blind individuals (e.g., [8,87]) with new insights for AI-based tool design to allow more user agency in non-visually manipulating visual privacy obfuscation.Here, we discuss the implications of these fndings.

The Role of AI in Accessible Visual Privacy Obfuscation
Participants generally appreciated the level of user control provided in the prototype.Besides controls recommended by prior interview studies (e.g., dismiss/consent obfuscation) [8,71], our study shows that options for confguring obfuscation styles and control over the obfuscation area can help blind users manipulate images to meet needs across private content types, recipients, and visual presentation needs.Further controls could even be useful, such as being able to obfuscate any object in the image, adjust obfuscation characteristics (e.g., degree of blurriness), and crop the image.These additional user-initiated manipulation features could help mitigate the risk that users could feel restricted to only AI-based decisions, especially the inaccurate ones.However, more freedom also increases efort and cognitive load.Balancing user agency and efort is thus a non-trivial design goal that requires considering contextual and personal factors.
In turn, we suggest shifting the role of AI in the obfuscation process based on users' needs, adopting Chung et al. [19]'s framework for creative support tool design.Existing obfuscation approaches mainly focus on an implementation-aiding role where the AI makes most execution decisions for users (what and how to obfuscate).This approach could beneft individuals who have less desire or capacity to confgure obfuscation details in a given situation.However, at the user's command, the system should be able to perform an evaluation-aiding role that provides information key to the obfuscation decision (e.g., private object visibility prior and after obfuscation) but not overpowering what the blind user intends to do with the information.Another important role that has rarely been considered in blind individuals' obfuscation support is ideation-aiding.Participants in our study commonly had difculty envisioning obfuscation efects and would like the system to provide non-visual previews to ease their decisions.Future AI-assisted obfuscation tools should consider including further guidance in these previews, such as ranking options by resulting photo's utility and remnants of private content.
Beyond choosing from these roles, more refned customization could further help meet users' personal needs, such as (a) specifying a user-defned obfuscation style for a specifc private object category (e.g., blur for human face, black out for text) but leaving all other decisions to AI, or (b) allowing only some automated obfuscation (e.g., automatic obfuscation of a particularly concerning category) and keeping the AI on an evaluation-aiding role most of the time.

The Role of Sighted Help in Accessible
Visual Privacy Obfuscation Our fndings also revealed a common desire for sighted help to check obfuscation results, especially when participants felt unsure about system accuracy.Blind individuals' collaborative visual privacy management with sighted friends and family is not new [7,87].While efective and potentially constructive for interpersonal relationships [87], this approach leaves blind individual's privacy management to the availability and reliability of sighted people and entails potential undesired social cost as well as misalignment in obfuscation goals.Participants in our study commented on the value of being able to make an initial obfuscation attempt before involving sighted assistance, for reasons such as independence, efciency, privacy concerns, and autonomy.
Based on these insights, future research can reconsider the role of sighted help in visual privacy management.For example, participants emphasized the need to gradually build trust in an AI-assisted obfuscation system-for initial usage, they envisioned needing more sighted help to assess and familiarize themselves with such a tool.Future tool design could consider mechanisms to support such collaborative assessment of the tool, such as providing a record of sighted feedback of a model's performance across diferent types of photos or a set of example photos for pairs to test and discuss.Beyond this initial learning phase, blind individuals and sighted assistants may fnd it useful at times to work together on the obfuscation manipulation, for which future tool design should reference mixedability collaboration design insights (e.g., [20,21,37,47,56,60]).

Photo Sense-making to Support Obfuscation Manipulation
Our participants made obfuscation decisions based on how they envisioned viewers may interpret the resulting photo (Section 4.2.3),refecting prior work with sighted users [10,29,35].However, these decisions rely heavily on sense-making of the obfuscated photos, posing more challenges for blind individuals compared to sighted counterparts.
Extending work on image sense-making support for blind people (e.g., [38,49,70]), we note some needs similar to general photo exploration (e.g., inclusion of spatial information in caption, hierarchical access to photo content, variation in visual information wants, preference for objects presented as a list) [38,70], but also other needs unique to private photo obfuscation.In particular, participants needed descriptions beyond object labels-they desired concrete information about the visibility status of the private object and visual appearance of the obfuscated area-which existing tools for visual interpretation fail to support.Computer vision models are known to work less accurately with blurry or dark photos taken by blind individuals [26], and in our study, obfuscated areas often resulted in new false positive object detections.One solution could be to develop models or pipelines that are able to identify obviously obfuscated areas (e.g., blurred or blacked out) rather than attempting to classify those pixels as a non-obfuscated object.These areas could also be described, for example, as "blurry" or otherwise manipulated.
Future tool design should also consider better guiding blind users to understand the results of an obfuscation, such as by summarizing the diferences between the obfuscated and original imagessuggested by our participants and in prior work [56], using multimodalities to facilitate visual change perceptions [68], and letting users switch between a number of versions quickly to make the contrasts between diferent options more salient.Regarding the varied information-wants people may have for an obfuscated photo area, involving a visual question answering mechanism could be particularly helpful [71].

Accessible Image Editing Beyond Obfuscation
Our participants showed strong interest in using features of our prototype for general image editing, echoing interest from the blind community in visual content consumption and creation (e.g., [40,63,86]).To date, research on non-visual photo editing support has still been sparse [13,54,81].Some of our design recommendations could apply directly to this general image editing space-such as providing non-visual previews for diferent visual efects and caution around cognitive demand, though other needs would likely difer.For example, many participants were interested in aesthetic photo touch-up, for which feedback around an edited photo's artistic characteristics, such as styling, mood, angle and lighting, as well as the appearance of the focal fgure (e.g., person, animal, scenery) would likely matter more compared to what is needed for visual privacy obfuscation.Towards this extension, future research could consider existing general visual art description guidelines [40] and explore how such guidelines may or may not apply from an editor's perspective.This knowledge would be critical to future development of AI as well as training of sighted help for assisting photo review and editing.

Limitations
Because of the early stage of this research, our paper focused on a qualitative, exploratory study, using a preliminary prototype design that relied on pre-processing photos.This approach inherently limited what tasks our participants could do, including what photos they obfuscated and the types of objects surfaced to them.Although this approach allowed us to gain an understanding of initial reactions to such AI-based support, these insights may or may not generalize to use in the feld.For example, all photos in our study contained only one private object, due to characteristics of available private visual dataset [66], which limited us from exploring design considerations relevant to situations where multiple items need to be obfuscated (e.g., ranking and categorizing detected private objects).Future studies should consider building a higher-fdelity prototype to further examine the efectiveness of AI-assisted visual obfuscation tools and related design considerations including and beyond the ones proposed in our work.To build such a prototype, technical innovations in the underlying computer vision models are necessary towards better processing of private photos taken by blind people, innovating multimodal models to segment and edit user-specifed visual content, as well as computational optimization that allows on-device computation for users' privacy preservation.Future work could also consider exploring ways to refne the Caption-Anything & ChatGPT private object detection approach, such as by including the OCR result in object captions for ChatGPT to process or incorporating alternative large language models to enhance the detection of captions relevant to diferent privacy categories.Further, we did not obtain an in-depth understanding of participants' past photo editing and obfuscation experience (e.g., editing tool usage) which would likely afect their reactions to new image editing tools.We encourage future studies to further situate accessible visual privacy obfuscation tool design in blind individuals' frst-hand experiences.Last, all authors of this work are sighted and could have potentially brought bias to the design and research practices.We practice refexivity [64] and have sought to center design ideas from blind individuals' perspectives.

CONCLUSION
In this work, we explored how blind individuals react to and make use of a preliminary prototype design for obfuscating private visual content in their photos.Through 12 user studies, we uncovered blind participants' mental models and usage patterns with private photo content manipulations, factors that infuenced their obfuscation decisions, frictions they experienced with the prototype design, and their design feedback on this line of tools.Overall, participants were excited for potential opportunities to gain more control on visual privacy through this tool, though they emphasized on the importance of reducing frictions related to inaccuracy, poor obfuscation descriptions, and cognitive load.Their specifc design ideas inform future accessibility design and computer vision research to reconsider the roles of AI and human assistance as well as alternative visual description practices and model development in supporting accessible photo editing, for visual privacy preservation and beyond.

Figure 2 :
Figure 2: An overview of participants' general workfow in using the prototype to obfuscate private content in photos.
Familiarization Task Photo Detection Error Demo Photo Photo caption: a credit card and a wallet sitting on a table Photo caption: a white Private object: the jpmorgan dress with a stufed bear on it chase black card (credit card) Private object: none

Table 4 :
Participants' obfuscation choices across private content categories used in the independent image obfuscation tasks.