Pic2Tac: Creating Accessible Tactile Images using Semantic Information from Photographs

Figure 1: Pic2Tac is a novel system that automatically translates visual photographs into accessible tactile images, using computer vision algorithms to identify salient foreground objects and background regions. These are replaced with tactile words: icons for objects and patterns for the background. Together, they form a binary image that can be printed on swell paper, making them tactile. Our studies demonstrate that participants can reliably interpret tactile images through touch, entirely independent of sight.

ABSTRACT
We introduce Pic2Tac, a novel system that automatically converts photographs into tactile images. It offers an alternative way to communicate visual information that is difficult to express using braille or alternative text. Current methods for creating tactile images are either limited in representation or require handmade artefacts. Pic2Tac employs a unique approach that avoids a literal representation of image content (e.g. contours). Instead, it detects salient semantic content within photographs and translates it into tactile images using dedicated 'tactile words'. Foreground objects are represented using icons, and patterns are used for background regions. The resulting binary image is printed on swell paper, where black regions rise to form a tactile image. Studies involving 60 participants, both sighted and with visual impairments, demonstrate the effectiveness of these tactile images in communicating semantic meaning. Our findings show that tactile and visual descriptions of scenes matched significantly. Overall, Pic2Tac is an affordable way to create accessible tactile images, costing only 1.50 USD per sheet.

INTRODUCTION
Pictures are an important means of communication. Once-forgotten memories come to life as they are reminisced and celebrated through photographs among family and friends. Visual elements, such as pictures and graphics, make news more engaging, facilitate education, and develop connections on social media, where approximately two billion photographs are uploaded daily. The visually impaired community, too, should have equal access to the enriching world of visual pictures and experience the wonders of storytelling without barriers.
Our aim is to develop an automatic, cost-effective system that facilitates accessibility to photographic content through tactile means. Given the considerable disparity between visual and tactile senses (with the former being approximately 1000 times more acute than the latter) [2,54], images must be simplified to ensure comprehension through tactile exploration. Alt-text is a common assistive technology that provides audible or text-based descriptions of images. However, its utility falls if users must, for instance, simultaneously listen to both alt-text and a friend's conversation. Approximately 2.2% of visually impaired individuals worldwide are also deaf, which creates a unique challenge for accessibility [47]. These individuals have the option of using braille to access textual descriptions, but may still face difficulties, as braille itself does not translate pictures into tactile form and images can only be comprehended when alt-texts are available [17]. It is widely believed that only a small portion of the blind community can read braille, as the learning process is tedious and time-consuming, and learners often lack motivation and resources [19]. Additionally, alt-text and braille are both language-specific, which restricts their universal applicability. With these issues in mind, we seek a complementary system for communicating picture content via touch; an inclusive, easy-to-learn design accommodating all abilities would prove immensely beneficial, facilitating multi-sensory learning and enhancing overall accessibility [18,33,36,37].
Tactile replicas of masterpieces can be found in museums and art galleries [9,46], and books may have embossed pages, allowing them to be enjoyed through touch. While these manual creations offer unique value and a specific kind of enjoyment for artists and audiences, they are time-consuming to create and require significant financial resources.
We aim to reduce costs by automating the process of translating photographs into tactile form. Until recently, this process relied on literal simplifications such as edge maps [54,55] or object boundary detection [14]. However, literal representations can be difficult to understand for two main reasons. Firstly, the underlying image processing algorithms often introduce spurious edges and boundaries whilst omitting important ones, leading to difficulties in interpretation. Secondly, even with "perfect" edges, the vast variance in the shape of object boundaries (e.g., a person sitting, standing, or running) further complicates interpretation. We argue that the automatic translation of photographs into tactile images should be based on the semantic content of images. In this regard, we build upon the research by Pakėnaitė et al. [42], who detected and classified the salient objects (e.g. 'person', 'dog', 'car') in photographs. These objects were then represented using icons akin to road signs, chosen for their simplicity, readability, and universality. However, these images were missing the crucial background context, which plays a pivotal role in shaping the narratives or descriptions used to elucidate the foreground. This work aims to tackle the challenge of conveying background information in photographs through touch, thereby enriching the overall context.
We propose a picture-to-tactile method named Pic2Tac, which automatically translates photographs into images that are suitable for tactile display. A feature that distinguishes Pic2Tac from previous methods is its ability to translate both the foreground and background of photographs. For the foreground, the method involves representing salient objects using recognisable icons, as described in prior work [42]. However, because background regions like sky, grass, and water can take various shapes, icons are not appropriate; instead, we employ semantically meaningful tactile patterns. For instance, the pattern for 'sky' fills the identified sky region, distinctly differentiated from the pattern used for 'water' or other elements. For a visual representation, please refer to Figure 1.
We conducted three evaluation studies involving 60 participants to assess the effectiveness of tactile patterns in communicating semantic information in photographs; more specifically, addressing the following research questions: RQ 1: Can tactile images comprising patterns successfully convey the semantic contents of photographic backgrounds? RQ 2: How can both background and foreground image content be optimally presented simultaneously? RQ 3: How does background information influence the interpretation of foreground? Our key findings are: (i) participants described tactile images similarly to how sighted observers describe corresponding photographs; (ii) depicting the foreground and background of a photograph side-by-side is advantageous; and (iii) participants constructed narratives and used relational language to interpret tactile images, demonstrating a deeper understanding beyond simple semantic content listing.
In summary, our work makes the following contributions:
• A novel system for automatically translating both the foreground and background of photographs into tactile images.
• A language-independent approach that can be learned in less than an hour, ensuring accessibility regardless of the user's linguistic background.
• Low consumable costs, providing an economical approach.
The latter characteristic is due to our output being designed for printing on swell paper, which features a layer of heat-reactive microcapsules. Black markings are raised when a heat fuser is used, creating an embossed image. The cost per sheet is approximately 1.50 USD. Swell paper printers are comparably priced to, if not less expensive than, alternatives like 3D printers or devices based on raising tactile pins, such as the Graphiti refreshable tactile display (www.orbitresearch.com/product/graphiti). These printers are readily available in communal locations such as libraries or schools.

RELATED WORK
The need to communicate visual information through non-visual sensory channels is undeniable. This has led to the exploration of automatic image-based approaches that can complement conventional methods like alt-text and braille. In this section, we will focus on some of the relevant literature in this field. We will also introduce the fundamental limitations of perception that must be taken into account when designing tactile images.

Creating Tactile Images Using Automated Techniques
Efforts have been made to automate the translation of various picture types, such as graphics [31], paintings [10], and notably, photographs [1,55,58], into tactile forms through the application of computer vision techniques. One common approach is to use literal simplification methods like edge maps [54,55], using (for example) the Canny operator [8]. However, edge detection can introduce noise and spurious edges, resulting in challenging interpretations, especially for complex shapes. Others utilise segmentation to isolate objects, create silhouettes, and draw their boundaries [38]. Despite the improvement over edge maps, silhouette shapes depend on factors such as object pose and perspective, which presents interpretation challenges even when viewing visually. We advocate for a symbolic approach instead of a literal translation of visual information into tactile images, an argument supported by work to depict both 3D models and 2D photographs. Panotopoulou et al. [43] used non-linear cameras to project 3D models onto the plane. This technique allows objects to be viewed simultaneously from multiple perspectives. Despite the distortion in the resulting images, they correctly connect all parts of the objects and maintain the correct relative size. The authors found these images highly effective in conveying 3D shapes through touch, compared to standard linear projections. However, our approach uses photographs, not 3D models, as input. Pakėnaitė et al. [42] detected and recognised salient foreground objects, which they replaced with class icons, suitably scaled and rearranged to avoid overlap, creating a collage of tactile words. This approach effectively communicates semantic content, but having black icons on a white background opens wide variations in interpretation. For example, an image showing a clock icon could be perceived as a clock on a tower building or a bedside alarm clock. Our aim is to offer background context to reduce this ambiguity.

Perception of Tactile Images
The successful communication of visual information through touch relies on understanding how the human brain processes multisensory information [6]. Due to the limited bandwidth of touch compared to vision [2,20,25,30,54], tactile images must convey semantic content in a straightforward manner [13].
Human capacity for information processing is known to be limited [16,48]. A study revealed that successive presentation was better than simultaneous presentation for an audio display [6], indicating that perception can become overwhelmed with too much information. Little comparable work exists for tactile displays, so consideration is needed for how to combine tactile outputs, in particular for our attempt to translate both foreground and background information from visual photographs.
Tactile maps designed for orientation purposes, and mathematical diagrams such as graphs and charts, often utilise tactile patterns and textures to differentiate between various colours or areas [5,53]. Recommendations outlined in sources like "Guidelines and Standards for Tactile Images, 2010" [4] are often followed, but a standardised system at an international level is yet to be established [53]. Consequently, there is currently no universally accepted framework for effectively translating visual photographs specifically. The introduction of iconography for foreground objects of photographs has proven useful [13,42], facilitating easy learning and identification through touch. This approach acknowledges the challenging nature of haptic identification of objects on a 2D medium, even with basic raised line drawings [23,28,56].
Information in the background of an image contributes to overall interpretation and context. Elements such as the sky, walls, and roads often lack distinct shapes, making icons unsuitable for representing them. One approach is to employ patterns within semantically segmented regions; research suggests that people can memorise between 9 and 12 tactile patterns through touch [26,32,41,45,52]. The perception of dots and lines changes when specific measurements such as gap distances [24] and thickness [27] vary. Therefore, a set of patterns must be chosen carefully, with consideration for the finer details of tactile sensory characteristics, to ensure the intelligibility, enjoyment, and consistency of the tactile experience.
In summary, the automatic translation of visual images into tactile equivalents is valuable yet challenging, as it requires the use of semantic rather than literal information. Previously, only foreground objects were represented in tactile form, leading to high ambiguity; our contribution addresses this limitation by incorporating background information through carefully designed patterns, serving as the tactile equivalent of words.

Pic2Tac: FROM PICTURE TO TACTILE
We have created an automatic system called Pic2Tac, designed to translate photographs into tactile images. This system employs state-of-the-art computer vision algorithms to separate the foreground from the background in the input picture. The output consists of binary images featuring symbols selected from a small library of tactile words. Icons are used for foreground objects, while patterns represent background regions. Both foreground and background images are suitable for tactile display on swell paper, either as an all-in-one composition or arranged side-by-side. Figure 2 provides an illustrative overview.

Identifying background and foreground
Our contribution is the development of the Background strand, which works as follows. Firstly, foreground objects are cut out; in-painting techniques [39,40] then fill the "holes" created, yielding a background-only picture. Next, we run the benchmark semantic segmentation network from the ADE20K MIT Scene Parsing Benchmark, loading pre-trained models: HRNetV2 as an encoder [51] and C1 as a decoder [59,60]. ADE20K contains around 27K scene images, including indoor, urban, and natural scenes. HRNetV2 is a recently proposed model that retains high-resolution representations without the traditional bottleneck design, and C1 is a one-convolution module. The purpose of semantic segmentation is to divide the background picture into different regions that can be identified as part of a small set of "stuff", a term commonly used in the computer vision literature. Currently, we use the set: Stuff = {Walking Path, Road, Ground, Mountain, Snow, Sky, Stairs, Wall, Water}, intended to cover common background regions seen in photographs of indoor, urban, and natural settings [7]. However, for different themes of photographs (e.g. sports, museums, shopping, cooking), other sets could be considered. Furthermore, we manually grouped certain background classes to simplify the representation. For example, classes such as 'pavement' and 'crosswalk' are grouped into 'Walking Path'; 'floor', 'grass', 'field' and 'sand' into 'Ground'; 'building' and 'tower' into 'Wall'; and 'river', 'sea' and 'lake' into 'Water'. Automatic ways to group such background classes are left for future work, but manual grouping allows users the flexibility to configure the tactile words used by our system based on personal preferences and requirements.
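To illustrate the grouping step, the sketch below collapses a per-pixel label map into one binary mask per Stuff class. This is not the authors' implementation: it assumes the HRNetV2+C1 segmenter has already produced an HxW integer label map, and the member class names are illustrative (the exact ADE20K label names differ between the full dataset and the 150-class scene-parsing subset).

```python
import numpy as np

# Grouping of segmentation classes into the 9-element Stuff set described above.
# Class names are illustrative; the exact ADE20K label names differ by label set.
STUFF_GROUPS = {
    "Walking Path": {"sidewalk", "pavement", "crosswalk", "path"},
    "Road":         {"road"},
    "Ground":       {"floor", "grass", "field", "sand", "earth"},
    "Mountain":     {"mountain", "hill"},
    "Snow":         {"snow"},
    "Sky":          {"sky"},
    "Stairs":       {"stairs", "stairway"},
    "Wall":         {"wall", "building", "tower"},
    "Water":        {"water", "river", "sea", "lake"},
}

def group_labels(label_map, class_names):
    """Collapse an HxW integer label map (output of the segmentation network)
    into one boolean mask per Stuff class; unmatched pixels later get a blank fill."""
    masks = {}
    for stuff, members in STUFF_GROUPS.items():
        member_ids = [i for i, name in enumerate(class_names) if name in members]
        masks[stuff] = np.isin(label_map, member_ids)
    return masks
```

Because the grouping lives in a plain dictionary, users could reconfigure it for other photograph themes without touching the segmentation network.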
Background elements, such as the sky or a road, often occupy significant portions of a photograph and can take on various shapes; their silhouettes lack distinctiveness. Therefore, icons are not appropriate forms of tactile words for the background. Instead, we opt to draw the border of background regions with a separating line and fill them with tactile patterns to signify their class. We also shrank the fill area, creating gaps so that the borderlines can be discerned effectively. This representation offers cues regarding the proportion of an image belonging to a particular background class, a detail that may not be achievable through the use of icons.
We have carefully selected nine patterns (as seen in Figure 3) based on the principles outlined in [4,45]. These patterns are intentionally kept simple, acknowledging the inherent limitations of our sense of touch. The specific mapping between these patterns and background classes is arbitrary. While certain patterns, such as wavy lines representing water, exhibit a weak semantic connection by resembling the background class, the extent to which such connections hold remains an open question. More importantly, we employed recommended patterns, maintained a consistent encoding scheme, and minimised the need for users to memorise intricate mappings as much as possible. This approach aims to facilitate the learning process, enabling individuals to interpret tactile patterns as background elements. To ensure tactile interpretability, we fixed the scale of the selected patterns for all image sizes. The spacing between textured lines was kept consistent, touchable, and intelligible. For example, each pixel is around 0.26 mm, so pattern lines were at least 0.5 mm thick (2 pixels wide), and the gaps between such lines were at least 2.8 mm. In our context, the goal was not to perceive individual dots but to have the textures "feel" different. For instance, the pattern dots for 'Ground' had a 1.06 mm diameter with a 1.58 mm gap between dots, while the pattern dots for 'Sky' had a 0.53 mm diameter and a 0.79 mm gap between dots. These chosen patterns are intelligible on embossed paper and other tactile displays [45]. To manage complexity and ensure easy memorisation [26,32,41,45,52], the size of the set Stuff is limited to 9. Regions that match nothing in the set Stuff are replaced with a blank fill. To visually distinguish between different regions, borders are presented as a 1.1 mm wide line with a gap of 1.1 mm on each side. Finally, to ensure practicality, images are scaled no larger than 288 x 480 pixels (3 x 5 inches), which is approximately the size of a person's palm [54,57].
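These measurements translate directly into pixel geometry. The sketch below shows one way the mm-to-pixel arithmetic and a dot-pattern fill could be realised; it is our illustrative reading of the stated figures, not the authors' code, and the 0.26 mm/pixel scale (roughly 98 DPI, since 25.4 / 0.26 ≈ 97.7) is taken from the text.

```python
import numpy as np

MM_PER_PIXEL = 0.26  # stated scale; 25.4 / 0.26 is roughly 98 DPI

def mm_to_px(mm):
    """Convert a physical measurement to whole pixels, never rounding to zero."""
    return max(1, round(mm / MM_PER_PIXEL))

def dot_pattern(region_mask, dot_mm, gap_mm):
    """Fill a boolean region mask with a square grid of raised dots, e.g.
    dot_pattern(mask, 1.06, 1.58) for 'Ground' or dot_pattern(mask, 0.53, 0.79)
    for 'Sky', using the measurements given in the text."""
    h, w = region_mask.shape
    radius = mm_to_px(dot_mm) // 2            # dot radius in pixels
    pitch = mm_to_px(dot_mm + gap_mm)         # centre-to-centre dot spacing
    yy, xx = np.mgrid[0:h, 0:w]
    dy = (yy % pitch) - pitch // 2
    dx = (xx % pitch) - pitch // 2
    dots = (dy * dy + dx * dx) <= radius * radius  # black near each grid point
    return dots & region_mask                 # clip the pattern to the region
```

Line-based patterns and the 1.1 mm borders would follow the same conversion, which is why fixing the pattern scale across image sizes keeps every texture within the stated tactile tolerances.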
As seen in Figure 2, we incorporated the Foreground strand as described in [42], which has been demonstrated to convey foreground information effectively. It uses Mask R-CNN [21], trained on the MS-COCO dataset [34], to recognise object instances with pixel-level accuracy. A second network, PiCANet [35], determines the salience of all identified objects; the most salient objects (e.g. 'person', 'dog', 'car') are then matched against the icon library. The objects are replaced with a class icon, scaled to fit and moved to avoid overlap where necessary. The result is a collage of black icons on a white background, representing the foreground.
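A hedged sketch of how this detection-plus-saliency filter could look is given below. It uses torchvision's COCO-trained Mask R-CNN as a stand-in for the detector; the saliency map is assumed to come from a PiCANet-style network (not shown), and the thresholds are illustrative rather than the paper's values.

```python
import torch
import torchvision

# COCO-pretrained Mask R-CNN instance segmenter (stand-in for the paper's detector).
detector = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT").eval()

def salient_objects(image, saliency_map, score_thr=0.7, sal_thr=0.5):
    """Return (coco_label, box) pairs for confident detections in salient regions.

    image        -- 3xHxW float tensor in [0, 1]
    saliency_map -- HxW float tensor from a saliency network (PiCANet in the paper)
    """
    with torch.no_grad():
        pred = detector([image])[0]
    kept = []
    for label, score, mask, box in zip(pred["labels"], pred["scores"],
                                       pred["masks"], pred["boxes"]):
        if score < score_thr:
            continue
        m = mask[0] > 0.5                    # binarise the instance mask
        if m.any() and saliency_map[m].mean() > sal_thr:
            kept.append((int(label), box.tolist()))  # icon placement happens later
    return kept
```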

Tactile outputs
Notably, our system supports two types of tactile image printing from an input photograph: (1) "all-in-one": in this type, both the foreground and background of the input photograph are combined on a single sheet of swell paper. Each icon representing an object in the photograph is placed inside a fitted blank box. This placement helps distinguish foreground from background in the tactile image. (2) "side-by-side": here, the foreground and background of the input photograph are depicted separately on two different sheets. The foreground image is enclosed within a bounding box to indicate the image border. This bounding box provides users with spatial relation cues and helps determine the placement and size of the foreground object in the tactile image.
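The composition of the two output modes can be sketched as simple binary-image operations. The following is our illustrative interpretation of the "fitted blank box" and bounding-box behaviours described above, with margin and frame widths chosen arbitrarily.

```python
import numpy as np

def all_in_one(background, icons, margin=8):
    """Overlay icons onto the background image, each inside a fitted blank box.

    background -- HxW boolean array (True = black = raised on swell paper)
    icons      -- list of (icon_bitmap, top, left) placements from the foreground strand
    margin     -- blank border in pixels separating each icon from background patterns
    """
    out = background.copy()
    for icon, top, left in icons:
        h, w = icon.shape
        out[max(0, top - margin):top + h + margin,
            max(0, left - margin):left + w + margin] = False  # clear the fitted box
        out[top:top + h, left:left + w] = icon
    return out

def framed_foreground(foreground, border=2):
    """Enclose the foreground collage in a bounding box for side-by-side printing,
    giving spatial cues about placement and size within the image border."""
    out = foreground.copy()
    out[:border, :] = True
    out[-border:, :] = True
    out[:, :border] = True
    out[:, -border:] = True
    return out
```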
In the next section, we systematically evaluate the suitability of tactile patterns for conveying photographic backgrounds. We also show that background context influences the interpretation of foreground content.

USER STUDIES
This section presents the results of three user studies conducted to evaluate the efficacy of Pic2Tac's tactile outputs. It outlines participant details, study procedures, and results, offering valuable insights into designing tactile experiences from photographs.
Participants. Our studies were designed to test the ability of tactile words to communicate semantic background. For clarity, we made no particular hypothesis regarding visual impairment; this is left for future work. Rather, we believe that gaining an understanding of tactile perception is an important focus for this first study. To assess the utility of the tactile words, we used two non-intersecting groups, each of 30 participants: the "tactile" and the "visual" group. This allowed us to compare descriptions based on visual perception with descriptions of corresponding images based on tactile exploration in Study 1. No control groups were needed for Studies 2 & 3 because they compared different representational forms for tactile images. Ages for the tactile group ranged from 17 to 59 years, with a mean age of 35. The participants in the visual group were anonymous.
The tactile group completed all studies sequentially, averaging a total duration of 1 hour with short breaks interspersed between studies. The tactile group comprised 9 individuals with visual impairments and 21 sighted participants; all except congenitally blind participants were blindfolded during the test. It is important to note that we are testing the communicative ability of tactile words, so the inclusion of blindfolded participants does not replicate or simulate the actual lived experience of individuals with visual impairments [3]; our evaluation tests the intelligibility of images through touch.
An inclusive design perspective [36] would benefit everyone. For example, tactile images may be useful to sighted individuals in situations where they cannot look at something, or where a multisensory presentation could be beneficial [18,33,37]. It is noteworthy that participants in the tactile group were drawn from different countries, and the tactile tests were conducted in each participant's mother language (the visual group used English). One of the participants was Deaf and used sign language; another was deafblind and employed tactile sign language to describe the tactile images. The tactile group was asked about any previous exposure to tactile images; see the results in Table 1.

Study 1: Conveying Photographic Backgrounds through Tactile Patterns
RQ 1: Can tactile images comprising patterns successfully convey the semantic contents of photographic backgrounds? We began by evaluating whether the descriptions provided for a tactile image experienced through touch closely correspond to those given for the original photograph experienced through sight. The goal is to determine the extent of similarity between the two sets of descriptions, as this would indicate how much of the background context has been preserved by our translation from visual to tactile representation.
Procedure. We used photograph inputs with no foreground objects; the tactile group experienced images through touch via an in-person study setup, and the visual group experienced images online. Neither group was aware of the other.
Our training has two steps, detailed below for the tactile group. The visual group underwent analogous training using photographs. Participants first familiarised themselves with the patterns, our tactile words, as shown in Figure 3. Subsequently, the participants were helped to compile these tactile words into the equivalent of a sentence. Specifically, they were given Figure 4 and were told, "There is a sky, wall, and road with some empty regions for trees, since there is no relevant pattern for them. It's a picture of a church". During testing, participants received five new pictures (Figure 5). The tactile group was given tactile images, while the visual group received photographs. Both groups received the images independently, in a random order. For each image, participants were first instructed to list all the patterns they identified, then provide an overall image description. To avoid cognitive overload, they had 1 minute to touch and verbalise each image, with a 10-second warning before the time expired. No feedback was given to the participants regarding their responses.
Results. Figure 5 presents a word cloud representation of participants' responses, adhering to a design recommendation for better visualisation [22].
On average, participants in the tactile group recognised 75% of the patterns; blindfolded sighted participants recognised 74% (σ = 19.98) and visually impaired participants recognised 76% (σ = 13.47). As expected, the group that viewed photographs visually had no difficulty recognising the background stuff. A single dominant word was used to describe each tactile image: "field", "beach", "bridge", "mountains" and "windows". These words, except the last, were shared with the visual group. Therefore, they accurately represent the scenes depicted in the corresponding photographs. In the case of image 5e, the photograph shows paintings, but the word "windows" was deduced as the closest match, as paintings were not provided with a tactile pattern. For image 5c, the deduction of "bridge" from limited data can be considered an impressive achievement of human cognition. The analysis also indicates that the dominant words were mostly shared between sighted and visually impaired groups, suggesting that tactile images are read similarly by both groups.
Examining differences between the visual and tactile groups revealed a notable trend: adjectives were more frequently employed by the visual group, though surprisingly, the tactile group also used adjectives, even without direct visual cues. For instance, a sighted participant in the visual group and a congenitally blind one in the tactile group both used the word "beautiful" to describe the same image. This finding underscores the dynamic nature of descriptive language across sensory modalities and raises questions about the interplay between visual and tactile perception in the linguistic representation of images.

Study 2: Optimal Depiction of Background with Foreground
RQ 2: How can both background and foreground image content be optimally presented simultaneously? In other words, should they be combined "all-in-one" or arranged "side-by-side"?
Procedure. All participants were from the tactile group, and everyone was blindfolded except for those who were congenitally blind. The experiment began with a short training session that introduced the tactile icons for foreground objects and an example side-by-side tactile image (Figure 6). During testing, participants were given both all-in-one and side-by-side images, as shown in Figure 7 and Figure 8 respectively. In each case, they were first asked to list all patterns and icons, then give an overall description. For a deeper understanding of our approach, we introduced a new clock icon (not included in the training phase). Finally, participants were asked to state their preference: whether the background and foreground should be presented all-in-one or side-by-side.
Results. Recall that the icon of a clock was not included in the familiarisation phase. It was recognised only 33% of the time on average in all-in-one images, and 50% of the time in side-by-side images. In more detail, 29% (σ = 46.3) of sighted and 44% (σ = 52.7) of visually impaired participants recognised the clock icon in the all-in-one image. Similarly, 54.76% (σ = 49.76) of sighted and 38.89% (σ = 48.59) of visually impaired participants recognised the clock icon in side-by-side images.
In most cases, identification rates were higher for side-by-side images than for all-in-one images, potentially due to the overwhelming amount of information presented in the latter. This could explain why, during discussions, 80% of the 30 participants expressed a subjective preference for side-by-side depictions over all-in-one. In more detail, 7 of 9 visually impaired participants and 18 of 21 sighted participants preferred the side-by-side format.

Study 3: Background Conditions & Descriptions of Foreground
RQ 3: How does background information influence the interpretation of foreground?

Procedure. To understand how the interpretation of foreground icons is influenced by background context, we used five different foregrounds. Each foreground was displayed side-by-side with two different backgrounds, as shown in Figure 8.

Results. Figure 8 shows not only each foreground and its two backgrounds, but also the dominant narratives people used. (There were 30 × 2 = 60 descriptions in total, one per image per participant.) It is clear that background context strongly influences how the foreground icons are semantically interpreted. It is worth noting that all icons were identified correctly 95% of the time (see Study 2). However, the narrative description constructed depends on the background. For example, a clock icon in isolation is just a clock. Yet within a context, the icon could signify a clock on a wall within a room, or a clock positioned high on a tower. These results chime with our intuition: context does matter.
Broader observations made informally during the studies are also of interest. Some participants mentioned that they visualised a car as either stationary or on the move based on the background. Some explained that the pattern for 'sky' often led them to decide whether the image was indoors or outdoors. Prepositions of place (e.g. "in front of", "next to", "behind") and verbs (e.g. "driving", "cycling", "walking", "climbing") were used, suggesting that participants attempted to create narratives for the images and relate the objects to one another, rather than simply listing the semantic content. Participants were multi-national; a participant in Morocco identified the tower in Figure 8d as the "Kutubiyya Mosque", whereas one living near the seaside identified it as a "lighthouse", and another identified it as "Old Joe", the name given to a clock tower near their home. This implies that the understanding of tactile images is affected by the individual's locality, which is a reasonable observation.

DISCUSSION
Pic2Tac automatically translates photographs into tactile display images using tactile words. User studies have yielded promising results; the appendix contains a gallery of further results. These were not used in testing but do showcase some of the image variety that Pic2Tac is capable of. Below are some key discussion points regarding potential applications, further insights, and societal impacts.
Potential Applications. Pic2Tac is a fast and cost-effective alternative to hand-made images or bespoke artefacts found in galleries. It offers easy training compared to braille systems. Users have the liberty to modify the icons, patterns, and their interpretations to align with their preferences and specific applications. The resulting output can be directed to any device capable of accommodating binary tactile imagery at a sufficiently high spatial resolution [45]. Because of these characteristics, Pic2Tac holds potential value in real-life scenarios. Making photographs in textbooks accessible through touch could enhance the learning experience, including for those who previously relied solely on verbal descriptions. Likewise, tourist sites could supply tactile guides, and libraries or commercial print and photocopying services could enable customers to print their photographs and share their memories interactively. If new and cost-effective tactile display devices become available for personal use, the range of applications will expand significantly.
Insights from the Perceptual Psychology of Tactile Image Recognition. Successfully delivering visual information through touch depends not only on the mechanics of the sensory substitution device but also on understanding how the human brain processes multisensory information [6]. Background patterns were recognised in all studies, showing that carefully chosen patterns can easily be distinguished [45]. However, the limited bandwidth of touch means tactile images must carry lower information content than visual images, so some ambiguity is inevitable. Yet only 9 patterns were used to successfully communicate the semantic information in 20 different photographs. This aligns with previous research suggesting the use of a limited number of symbols to avoid overload [26,32,41,52].
Most participants preferred the side-by-side presentation of background and foreground, as icons were easier to localise and identify compared to all-in-one images. Separation of background and foreground potentially eased the perceptual load and shortened the time for exploring the image [15,33]. However, some participants found that they had to put effort into mental mapping when information was separated, and some believed that, with time and practice, they could adapt to all-in-one images. Future research should explore representing foreground objects more distinctly to enhance all-in-one tactile picture reading [13].
For the initial study of Pic2Tac, our focus was to test the ability of tactile words to communicate semantic background. Therefore, formulating hypotheses regarding performance or experiential distinctions between sighted and visually impaired participants was left for future work. Nevertheless, a pattern emerges from the available data: visually impaired participants consistently demonstrated superior performance in identifying patterns across all studies. Notably, over half the participants entered the study with no prior exposure to tactile picture reading, the majority belonging to the sighted category. This prompts consideration of whether such a discrepancy should be taken into account in the design of tactile representations. Formulating definitive conclusions proves challenging due to limitations in sample size and potential biases inherent in the study's design. Despite these challenges, the findings serve as a foundation for future research and highlight the importance of embracing inclusive design principles.
Societal Impacts on the Blind Community. Participants were surprisingly complimentary about our tactile images. "Beautiful" was the word of one congenitally blind participant, who liked how stimulating some of the patterns felt. Some said they found the participation session fun, like putting pieces of a puzzle together. Through discussions, it emerged that all low-vision participants had given up learning braille because they found it too challenging. Late-blind participants well experienced in braille said they had not used it for some time but might reconsider using it again. Our output tactile images are not only easy to use and seemingly popular, but they could also provide a way to practise tactile reading skills by keeping the touch senses stimulated engagingly.
Of the two congenitally blind participants, one gave impressive results and could closely identify most scenes, including "Big Ben" for London Bridge (Figure 5c) and "Car in front of the Eiffel Tower" for the Paris image (Figure 8j). Despite their lack of visual experience, their near-perfect narratives indicate that they interpreted such images symbolically. They suggested their success may be due to their collection of tactile pictures and souvenirs, including many figurines of famous landmarks. Their facility with such a collection agrees with existing results from Klatzky et al. [29], who showed 20 participants successfully recognising 100 common objects by touch within 1-2 seconds.
A deafblind co-author specifically designed the outputs to be tactile with no added sound cues. This means that deafblind individuals could participate in the study with few adjustments. As hearing has the second-best information processing capacity after sight [6], most assistive technologies for the blind community are audio-based [20]. The deafblind community is thought to be one of society's most vulnerable groups [49]. A previous survey highlights the lack of availability of tactile materials for students with visual impairments [44]. We have created a new approach for translating photographs that is accessible to a wider community, and potentially inclusive in that it could provide a multi-sensory experience of images for the sighted as well [36].
Universally Accessible. An interesting observation was made when participants identified the image of a clock tower (Figure 8d) based on their cultural background. Despite coming from three different countries with diverse cultural norms, all participants comprehended our tactile outputs and provided meaningful descriptions influenced by their local culture. Moreover, participants described the images in their native languages, demonstrating the universal understanding of patterns for tactile images, unlike braille.

LIMITATIONS AND FUTURE WORK
Through our observations, patterns and familiar icons were consistently better recognised in side-by-side images than in all-in-one images. The use of bounding boxes in foreground images provides users with essential spatial relation cues, facilitating an understanding of the placement and size of the foreground object in the tactile image. However, the side-by-side representation inevitably disrupts certain aspects of locational correspondence. Future research might explore potential solutions to this challenge, such as incorporating gridlines, to establish a more reliable locational reference.
The lived experience of people with visual impairments cannot be accurately replicated with a blindfold [3]. A definite avenue for future work is to test our method outside an experimental setting and observe its applications in different real-life settings. Sustained use of Pic2Tac with personal customisation over months or years is also an interesting open question, as our studies were necessarily much shorter in time scale.
The preference for describing image content varies across different sources, such as news, social media, and employment websites [50]. Consequently, the algorithm could be improved to output images optimised for different cases. Whilst prior work shows that only a small number of patterns can be memorised [26,32,41], adapting an expanded set of distinct symbols for different real-life scenarios is one opportunity for future research. Furthermore, simple icons for foreground objects do not reveal any sensitive personal information. While this is efficient for privacy reasons, it might be worth exploring the translation of additional personal details to improve the experience of reminiscing about photographs. Moreover, investigating alternative symbolic representations for incorporating additional cues, like colours and day-night differentiation, could address some disparities observed between the tactile and visual groups, thereby enriching the overall image description framework. Another potential research direction involves developing interfaces that enable the adaptation of patterns and icons, catering to specific scenarios like shopping or sports, and allowing personalisation based on cultural or individual preferences.
Previous work has shown that users tend to comprehend audio-tactile graphics better when they are interactive [11,12]. While Pic2Tac was purposely made to be a tactile-only system for accessing photographs, future work could investigate the use of additional functions such as sound cues.

CONCLUSION
Pic2Tac is a unique system designed to automatically convert photographic input into tactile images that convey semantic content, including foreground and background details. Through three user studies, we have demonstrated the effectiveness of our method in comparison with visual images. We observed that presenting the foreground and background of a tactile image side-by-side is the preferred approach. This approach offers several advantages, including aiding end users in constructing narratives and using relational language to interpret tactile images more proficiently. The potential contribution to the community seems substantial; one participant said, "It could break a barrier for people like me who have lost sight suddenly due to an accident later in life. Patterns can be quickly learned, unlike Braille which requires a minimum of weeks of training". Our approach is simple and reliable. The outputs were printed on embossed paper for the user studies but can also be sent to any suitable device. Participants enjoyed tactile interactions with our images, finding the patterns stimulating and gaining access to visually engaging content. Our approach is also suitable for public spaces like libraries or classrooms. In summary, we have provided a novel approach for communicating content via touch; a small technical step perhaps, but one with a potentially large impact on different communities worldwide.

Figure 2: Pic2Tac in overview, from photograph to tactile image. There are two strands, background and foreground (highlighted in blue). The input is a photograph and the output is a pair of binary images for background and foreground. The final output can be all-in-one or side-by-side.

Figure 4: A tactile image of a photograph, used as an example for tactile image reading familiarisation. Participants in the visual group viewed the photograph; participants in the tactile reading group touched the tactile image.

Figure 5: Images used in Study 1: tactile outputs on the left and visual photographs on the right, independently experienced by the tactile and visual groups respectively. Word clouds provide a qualitative summary, with font sizes varying to reflect the frequency of each mentioned word. The percentage of mentions of each word, out of all words for the image, is indicated in superscript. Synonyms are organised in columns. (Enlarge images to avoid the Moiré effect.)

Figure 6: Training materials for Study 2: tactile icons for foreground objects and a side-by-side tactile image example.

Figure 7: How should foreground and background be presented together? 80% of participants preferred side-by-side rather than all-in-one.

Figure 8: Participants' descriptions of the foreground are heavily influenced by the background it is placed on. Here we show the most common narratives for a collection of foregrounds on two backgrounds. (Enlarge images to avoid the Moiré effect.)

Table 1: Distribution of participants (%) by level of prior experience with tactile images.