Sound Designer-Generative AI Interactions: Towards Designing Creative Support Tools for Professional Sound Designers

The practice of sound design involves creating and manipulating environmental sounds for music, films, or games. Recently, an increasing number of studies have adopted generative AI to assist in sound design co-creation. Most of these studies focus on the needs of novices, and less on the pragmatic needs of sound design practitioners. In this paper, we aim to understand how generative AI models might support sound designers in their practice. We designed two interactive generative AI models as Creative Support Tools (CSTs) and invited nine professional sound design practitioners to apply the CSTs in their practice. We conducted semi-structured interviews and reflected on the challenges and opportunities of using generative AI in mixed-initiative interfaces for sound design. We provide insights into sound designers’ expectations of generative AI and highlight opportunities to situate generative AI-based tools within the design process. Finally, we discuss design considerations for human-AI interaction researchers working with audio.


INTRODUCTION
The practice of sound design is the creative use of sounds [89] to produce innovative music compositions and build convincing cinematic experiences for films and games [70]. This practice involves the creation of new sounds and manipulation of existing ambient sound recordings, such as a dog barking or footsteps. In the last few years, generative models for audio have been applied to creating music [1,24,60] and have moved from being exclusively a research endeavor to finding practical applications [4,8,30]. Such models are well studied for their potential to support co-creation in the human-AI interaction literature [61,62]. And yet, despite the growing adoption of AI models as co-creation tools for music production [28], very few empirical studies exist to assess their potential to offer new possibilities to the practice of sound design and foley sound synthesis.
Our everyday sonic environment is usually composed of not just music or speech, but also a myriad of environmental sounds [89]. Often, sound designers work with environmental sounds that lack the rhythmic and harmonic structures normally found in musical compositions. Further, the parameters used for synthesizing such sounds are different from those for musical sounds. For instance, when synthesizing footstep sounds, sound designers are more likely to be interested in manipulating object and material properties (such as the type of floor, shoes, etc.) than acoustic features such as pitch or loudness, which are typically associated with music. Similarly, AI models trained on environmental sounds differ from AI models for music in the way they are controlled or steered [72], using either musical or material attributes during generation. Thus, AI-based tools used to assist sound designers need to be studied through specifically designed studies.
Most human-AI interaction studies for audio focus on the applicability of such steerable interfaces to empower novice users in their creative goals [7,36,61,62,102]. Expert sound design practitioners spend years developing their creative design process and building inventories of sounds to apply in their next design project [89]. As such, their needs, expectations, and ways of working with AI-based tools necessarily differ from those of novices. Thus, in this paper, we aim to explore: How can generative AI-based co-creation tools assist expert sound designers in their creative practice?
We developed two interactive generative AI models as Creative Support Tools (CSTs) [35,80] to explore the potential of this technology in assisting sound designers. As in [49], we used an experimental design strategy of deploying interfaces in real-world contexts to provoke discussions and answer research questions. We deployed our CSTs with sound designers to gather information about their expectations of AI, and about the current challenges and opportunities for generative AI in their practice. Further, we captured the designers' interpretations of AI by designing the interactivity of our CSTs with an element of ambiguity [37]. While we developed two CSTs in this study, we did not aim to compare them with each other. Instead, we aimed to provide designers with two unique ways of interacting with AI-based tools to gather their reflections [34,49] on using those tools.
We introduced our CSTs to nine professional sound designers and asked them to apply the tools in a creative endeavor in their practice. We conducted semi-structured interviews with the participants to reflect on their creative goals and the sounds they created using the CSTs. We gained three key insights through inductive reflexive thematic analysis [9] of the interviews:
• First, we outline an AI-assisted sound design process, showing how sound designers situate AI-based tools within their design process in practice. While performing creative tasks, sound designers used AI models to iterate quickly when creating novel sounds and as an alternative to manual field recording activities. They also used such sounds as layers to give the perception of plausibility to unreal sounds.
• Next, we found how sound designers worked with unpredictability and ambiguity and developed an intuition for interacting with and controlling AI-generated sounds. We also found that designers often recognized failure modes in AI-generated output and worked towards ways of using ambiguity in their sound design.
• Lastly, we furthered our understanding of sound designers' expectations of generative AI to build convincing cinematic experiences, in terms of creator agency and owning the creative process.
In summary, our contributions are three-fold: (1) we developed a novel understanding of generative AI in supporting creative exploration for the practice of sound design; (2) we developed two AI-based CSTs for future studies on using audio generative AI as a tool for sound designers; and (3) we offered five design recommendations for future human-AI interaction research for sound design.

BACKGROUND & RELATED WORK

2.1 Sound Design
Sound design is a multi-faceted practice that is both highly technical and artistic in nature and involves creating 'new' sounds. Susini et al. [89] define 'new' sounds as those that cannot be found in existing sound databases, or recorded sounds that cannot be used in a given context without being manipulated or modified. Sound design is the deliberate use of such sounds to create an 'atmosphere', mood, or feeling [70] in music composition or other media. Typically, sound designers focus on working out the sonic details or timbres (tone or color of the sound) required to enrich or complement the visual information presented in films and games [19]. They also focus on communicating additional non-verbal information through interactions in games or product design [89].
Previously, Wallas [95] proposed a generalized creativity model consisting of four phases: preparation, incubation, illumination, and verification. Similarly, for the specific purpose of sound design, Susini et al. [89] proposed a model involving three discrete successive stages, Analysis assisted by Exploration, Creation, and Validation [58,89], with the last two set in an iterative loop until the sound converges towards an optimized solution [47]. The Analysis stage is a research-focused phase, where designers work to understand the perceptual requirements of the project using their own knowledge and background in psychoacoustics and sound cognition. It also involves the purposeful exploration of a large inventory of existing sounds, as well as field recording (recording outside of the studio) of new sounds. In the Creation stage, designers manipulate the sounds or synthetically create new sounds in line with the specifications from the Analysis stage. This stage may also layer together various sound samples to create montages as a final artifact. The final stage consists of Validating the created sound specimens, either informally based on the designer's intuition or more formally using listening tests (especially while designing sounds for products [58]).
Throughout the sound design process, designers need to employ different modes of working: as a researcher during the Analysis and Exploration phase, as a programmer employing their tools-based expertise during the Creation phase, and as a qualitative researcher or tester during the Validation phase. Through these phases, they also employ different listening techniques, such as causal, semantic, or reduced listening [19]. Such listening techniques help designers associate sounds with sources (causal), associate information or meaning with them (semantic), or focus on fine-grained, timbre-specific details of the sound (reduced) when listening. These modes of working help them develop 'Sonic Vocabularies' [47] and 'Sound Palettes' for various current and future projects.
In [64], Lubart suggests that computers can partner with humans as an enabler, a guide, or a partner or colleague. Similarly, in recent human-AI interaction literature, Weisz et al. [96] suggest that users may view an AI agent as filling the role of an assistant, partner, or collaborator, and that establishing the role of AI in a user's workflow will help users understand how to interact with it effectively. Thus, in this paper, we aim to situate AI-based Creative Support Tools in the context of the sound design process and a sound designer's way of working.

AI-based creativity support tools
AI algorithms have enabled the building of Creative Support Tools (CSTs) [35,80] that are either fully autonomous or support co-creation as Mixed-Initiative Creative Interfaces (MICIs) [29,86]. Interactive CSTs have been studied for their co-creation capabilities in visual arts [17,27,43,53,71], in writing [14,23], in fashion [50], in UX design and engineering [38,54,63,67], and in new musical interface design [91]. In the field of audio, machine learning models have long been used for creating music [32], from established tools such as Wekinator [33] to the more recent GAN-based music performance art [90]. Further, co-creation tools for composing music have been studied for novice co-creation [36,61,62,66,102] as well as with expert musicians [7,45,88]. By co-creation, we allude to human-centered AI's [81] ability to leverage a trained prior to empower human creators with novel means to generate creative artifacts. In the sound design space, Scurto et al. [77] developed tools based on reinforcement learning algorithms and studied user exploration behaviors for the generated high-dimensional parameter spaces. We take inspiration from and extend their work to situate generative audio AI models in a typical sound design process and to explore designers' needs and expectations of such co-creation tools.
In [43], Hertzmann argues that "all art algorithms, including methods based on machine learning, are tools for artists; they are not themselves artists". In that light, we position the interfaces in our study as AI-based MICIs, where a user expresses their intent through a set of control parameters and the AI establishes its initiative and agency by generating an output based on its trained prior. Previously, researchers argued that unpredictability and non-determinism are detrimental to the user experience of an AI system [5]. Recently, Caramiaux et al. [17] showed that such emergent behavior is embraced rather than considered a limitation in the domain of AI-generated visual arts. We take inspiration from their work to explore aspects of such non-determinism in audio AI in the specific context of sound design.

Interactive Generative AI models for Audio
Currently, a multitude of AI architectures exists for generatively modeling environmental sounds. Each architecture solves a certain set of problems and has its own limitations. For instance, autoregressive architectures (models that predict future values based on past values), such as Recurrent Neural Networks [48,99], WaveNet [92], or Transformers [94], trained on raw audio can generate sounds of indefinite duration, but their response times are usually long, which makes their adoption in practice difficult. On the other hand, models based on the Generative Adversarial Network (GAN) [39] are responsive but generate samples of a pre-defined duration (usually a few seconds in length). Such models are more expressive than autoregressive architectures [100] in their ability to generate novel sounds or morphs [41]. Diffusion-based models [44,55,84,85,101] have recently become popular as an alternative to GANs. Although such architectures can generate better-quality sounds than GANs, GAN-based architectures can currently generate sounds faster, and in real time, compared to diffusion models [97]. Further, a type of GAN architecture called StyleGAN [52] provides significant improvements over other GAN architectures. Thus, in this paper, we focus on using StyleGANs to develop our CSTs for sound design.
Broadly, two approaches exist for controlling generation from AI models. In the first, AI models are trained on labeled datasets in a supervised way [40, pg. 137], and generation is controlled using pre-defined labels. In the second, AI models are trained on unlabelled datasets, and controllability is inferred using unsupervised methods [40, pg. 142]. This approach is especially useful for environmental sounds, as they can be recorded easily 'in-the-wild' but are difficult to reliably annotate with labels. StyleGANs [52] can be trained on such large unlabeled environmental sound datasets to generate an expressive high-dimensional learned representation called a "latent space". This latent space can be used to search for, generate, and manipulate new sounds. Recently, researchers have been working at the intersection of explainable AI (XAI) and the arts to explore novel ways of exploring such latent spaces for creative endeavors [13,34]. Various algorithms exist to facilitate the exploration of this high-dimensional latent space in a human-understandable and unsupervised way. For instance, in [51], the authors developed an Example-Based Framework (EBF) to search or query the latent space of a pre-trained GAN using synthetically generated sounds. Further, in [79], the authors developed a Semantic Factorization algorithm (SeFa) to find vectors for control in the latent space of the model, which can be used to manipulate semantically meaningful attributes of the sounds. In this paper, we used EBF and SeFa to interact with our underlying StyleGAN model to control the generation of sounds in our CSTs.
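To make the three latent-space operations above (generating, manipulating, and searching for sounds) concrete, the sketch below implements their mechanics in Python against a toy generator. The random matrix standing in for a trained StyleGAN, the latent size of 512, and the spectrogram shape are illustrative assumptions only, not details of the models used in this paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained StyleGAN generator: maps a latent vector z
# to a (tiny) "spectrogram". A real model would be a PyTorch network;
# this placeholder only makes the mechanics runnable.
W = rng.normal(size=(512, 128 * 64))

def generate(z: np.ndarray) -> np.ndarray:
    """Map a 512-d latent vector to a fake 128x64 spectrogram."""
    return np.tanh(z @ W).reshape(128, 64)

# 1) Generation: new sounds come from new latent vectors.
z = rng.normal(size=512)
spec = generate(z)

# 2) Manipulation: editing a sound = moving z along a direction d.
d = rng.normal(size=512)
d /= np.linalg.norm(d)
edited = generate(z + 3.0 * d)  # same sound, shifted along one attribute

# 3) Search (the EBF idea): find the stored latent whose output best
# matches a query sound, here by spectrogram distance.
bank = rng.normal(size=(100, 512))           # candidate latents
query = generate(bank[42])                   # pretend this is a target sound
dists = [np.linalg.norm(generate(v) - query) for v in bank]
best = int(np.argmin(dists))                 # index of the closest match
```

In practice the search step is more involved (e.g., an encoder or optimization in latent space rather than brute-force distance), but the interaction pattern is the same: a reference sound in, a nearby point in the latent space out.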

AUDIO GENERATIVE AI CST DESIGN
We designed and implemented two generative AI audio Creative Support Tools (CSTs) for co-creation in this study. We designed the interactivity for our interfaces based on the principles for AI controllability outlined in Weisz et al. [96]: by using (1) domain-specific controls (for interface-1) and (2) technology-specific controls (for interface-2). Domain-specific controls make use of audio descriptors or acoustic parameters to control the generation from an AI model. Technology-specific controls, on the other hand, are generic controls that depend on the generative algorithm and are not necessarily related to the audio domain. Such technology-specific controls allow users to perform manipulations or edits directly in the latent space of the generative model and are typically effective in making changes to the semantic attributes of a sound. For both interfaces, we adopted an interaction pattern of turn-taking [76], where a user modifies the control parameters to interact with the underlying AI model and the AI responds based on its trained prior.
Both interfaces used the same underlying StyleGAN model and differed only in how the generation was controlled. Further, both interfaces provided opportunities to interact with two StyleGANs: (1) one trained on a dataset of 'Hits & Scratches' called the Greatest Hits Dataset [73], and (2) another trained on a dataset of 'Environmental Sounds' from the DCASE 2023 Foley Sound Synthesis Challenge [21]. Using the 'Hits & Scratches' model, the sound designers could generate and explore a small set of timbres related to the impact sounds made by a drumstick hitting various hard and soft surfaces. Using the 'Environmental Sounds' model, the sound designers could generate and explore more complex timbres and sounds, such as dog barks, footsteps, gunshots, motor vehicles, rain, and keyboard clicks. Further, on both interfaces, we added preset sound configurations, which participants could test during the study. These presets included parameter settings for timbres such as impact sounds on hard and soft surfaces, or environmental sounds such as a medium-sized dog barking. All underlying AI models were built and trained using PyTorch [74] and ran on a single RTX 3090 GPU. The interfaces were built using web technologies such as Streamlit and ReactJS to run in web browsers for ease of access. Please see appendix A.2 for architecture and implementation details for both interfaces.

Interface-1 - Using domain-specific controls
For interface-1, we employed domain-specific controls [96] based on acoustic parameters such as frequency band, impulse width, fade-in, fade-out, etc., to guide the generation of the sounds. For this interface, we used the EBF framework outlined in [51]. In EBF, a set of domain-specific controls is used to create a synthetic sound using signal processing techniques. This sound is then used to "query" or "search" the latent space of the StyleGAN for a matching, AI-generated sound. A conceptual diagram and screenshot of this interface are shown in Figure 1. A user of this interface conveys their ideas to the AI model by designing a synthetic sound. The AI model in turn uses the synthetic sound to search for and generate a matching, more realistic sound. The resulting audio for both the synthetic reference and the AI-generated sound is displayed on the webpage. Additionally, we provided visual feedback to the users by displaying the spectrogram for each sound along with the audio on the webpage. We included this spectrogram visualization to allow the participants to focus on the spectromorphology of the sounds [82], or how the frequencies in the sound change or morph over time.
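As a rough illustration of how a synthetic reference sound might be rendered from controls like these, the sketch below builds a band-limited noise burst shaped by a fade-in/fade-out envelope. The synthesis recipe, function name, and default values are our assumptions for illustration, not the implementation used in the interface:

```python
import numpy as np

def synth_reference(sr=16000, dur=1.0,
                    band=(200.0, 2000.0),   # frequency band (Hz)
                    impulse_width=0.05,     # burst length (s)
                    fade_in=0.005,          # attack (s)
                    fade_out=0.02):         # release (s)
    """Render a synthetic query sound from a few acoustic controls.

    Illustrative only: the EBF interface exposes similar controls, but
    the exact signal chain here is an assumption for demonstration.
    """
    n = int(sr * dur)
    # Band-limited noise: zero out white noise outside the band
    # in the frequency domain, then transform back.
    spectrum = np.fft.rfft(np.random.default_rng(0).normal(size=n))
    freqs = np.fft.rfftfreq(n, d=1.0 / sr)
    spectrum[(freqs < band[0]) | (freqs > band[1])] = 0.0
    noise = np.fft.irfft(spectrum, n)
    # Impulse envelope: fade in, sustain, fade out, then silence.
    env = np.zeros(n)
    w = int(sr * impulse_width)
    fi, fo = int(sr * fade_in), int(sr * fade_out)
    env[:w] = 1.0
    env[:fi] = np.linspace(0.0, 1.0, fi)
    env[w - fo:w] = np.linspace(1.0, 0.0, fo)
    sig = noise * env
    return sig / (np.abs(sig).max() + 1e-9)   # normalize to [-1, 1]
```

A sound rendered this way can then serve as the query for the latent-space search described above, e.g. `synth_reference(band=(4000.0, 8000.0), impulse_width=0.01)` for a short, bright click.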
While we designed this interface to provide opportunities for reflection [34,37] by giving greater flexibility in generating multiple types of synthetic sounds, not all synthetic references resulted in meaningfully matching AI-generated sounds. This unpredictability in the AI-generated sounds is due to the limitations of the training data used to train the GAN. We allowed this unpredictability in this interface by design, to gather our participants' intuitions about AI limitations.

Interface-2 - Using technology-specific controls
For interface-2, we employed technology-specific controls [96] based on the SeFa algorithm outlined in [79]. In SeFa, dimensions for controlling generation are extracted by performing an eigendecomposition of the learned weights of the StyleGAN. That is, using eigendecomposition, the weight matrix of the StyleGAN is factorized into basis vectors, which can then be used to perform latent space manipulations that control semantic audio descriptors of a sound. Such semantic dimensions are usually unlabeled and typically open to user interpretation. Users usually interpret each semantic dimension by performing and observing a few edits made by changing that dimension on a sound. We chose the top 10 dimensions (those with the top 10 eigenvalues after eigendecomposition; see appendix A.2.2) found by the algorithm to perform sound edits on this interface. A conceptual diagram and screenshot of this interface are shown in Figure 2. As for interface-1, we displayed the spectrogram along with the resulting audio on this interface. We designed this interface to provide opportunities for reflection [34] by leaving the dimensions unlabeled. We allowed the designers to interpret this ambiguity in the dimensions based on their intuition.
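The core of the SeFa extraction step can be sketched in a few lines: take the eigenvectors of A^T A, where A is a style-projection weight matrix, and treat those with the largest eigenvalues as the edit directions. The random matrix below is a stand-in for a weight tensor read from a trained StyleGAN checkpoint, and the matrix shape and edit magnitude are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a style-projection weight matrix of a trained StyleGAN
# (latent dim 512 -> layer dim 1024); in practice this tensor would be
# read from the model checkpoint.
A = rng.normal(size=(1024, 512))

# SeFa idea: the most "meaningful" edit directions are the eigenvectors
# of A^T A with the largest eigenvalues, i.e. the latent directions the
# layer amplifies the most.
eigvals, eigvecs = np.linalg.eigh(A.T @ A)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]
directions = eigvecs[:, order[:10]].T        # top-10 unlabeled controls

# Editing: slide a latent vector along direction k by changing alpha,
# then listen to the regenerated sound to interpret what k "means".
z = rng.normal(size=512)
alpha = 2.5
z_edited = z + alpha * directions[3]
```

Because the resulting directions carry no labels, the interface surfaces them simply as ten sliders; the user's interpretation of each slider emerges from listening to a few edits, as described above.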

USER STUDY

4.1 Participants
With ethics approval obtained from the University, we recruited nine professional sound design practitioners (six male, two female, one preferred not to say) for this study through snowball sampling. We used this sampling strategy to reach not just academic, but also professional sound designers working in the industry. Starting with the authors' existing network, we then asked individual participants whether they knew of other practitioners interested in participating in our study. In our email, we indicated the study would take at least 1.5 hours to complete. Our sample size was thus pragmatic, based on the number of sound designers who were willing to invest the time in this study. Participants had diverse backgrounds in sound design, from designing sounds for products, movies, music, and games to creating sound for data sonification projects (Table 1). The median self-reported experience in sound design was 10 years (Min = 3 years, Max = 48 years). Participants were offered USD 45 gift cards as a token of appreciation for their time in the study.

Procedure
Figure 3: Overview of the study procedure

An overview of the procedure is shown in Figure 3. The participants were sent a link to a web page outlining the task instructions. This web page included a 3-minute introductory video explaining the tasks and providing a brief overview of the interfaces. To minimize any order effects, participants were randomly assigned to two groups. The first group attempted the tasks with interface-1 before interface-2. The second group performed the tasks in the reverse order.
To familiarize participants with our interfaces, we first asked them to complete a short, close-ended, predefined task. Subsequently, we asked them to complete an open-ended creative task to generate sounds they might use in their own practice or performance. As our participants were located in different parts of the world, they were asked to perform these tasks at their own pace and time and to record their screen activity when performing the open-ended creative task. This approach was adapted for our purpose from the video-cue recall method [15,16] from the interactive arts literature.
We subsequently conducted a semi-structured interview to gauge the participants' experience and feedback on the generative AI interfaces. Our server logs indicated that, overall, the participants spent a median of 46.28 minutes (Min = 24.11 minutes, Max = 2 hours 31.6 minutes, SD = 44.46 minutes) exploring and familiarizing themselves with the interfaces. As instructed, participants recorded their screen activity when performing their open-ended creative tasks. For these creative tasks, the screen recordings for each interface were a median of 2.44 minutes long (Min = 1 minute, Max = 20.25 minutes, SD = 5.36 minutes). We asked the participants to send us their screen recordings in advance of the interview. The interviewer watched the screen recordings before conducting the interview and highlighted parts of the recording where participants employed different exploration strategies when using the interfaces. During the interview, we discussed the participants' creative goals using the highlighted parts of the recordings as discussion prompts. The recordings were used as discussion prompts only and not as data for analysis in themselves. All interviews were conducted remotely and lasted a median of 40 minutes (Min = 32 minutes, Max = 60 minutes, SD = 10.19 minutes). Please see Appendix A.1 for the interview questions.

Table 1 (excerpt): Participant backgrounds
• Designed sound to build brand experiences for various international brands and airport authorities. Designed "sonic identities" for brands, ranging from sound installations for their public spaces to product sounds, e.g., the sound of a car's engine or doors opening and closing. Also focusing on data sonification projects.
• Sound design for vehicle or gardening simulation video games, working directly with environmental soundscapes. Designing quad ambiance and sound effects, and implementing them in the game engine.
• Sound designer and composer. Designed sounds for over 40 games. Also worked on sound design for films as well as some movie trailers.
• Electroacoustic music composer using Ableton Live and FL Studio.
• Music composition for ambient/rock and experimental/avant-garde genres. Worked for theatre and other projects that are in between sound design and music. Co-leader for a desktop Foley system. Also writing music software.

Data Analysis
Due to the exploratory nature of this work, we chose an inductive, reflexive thematic analysis (TA) approach [9] for analyzing the interview transcripts. One author conducted all interviews. Two authors (including the interviewer) collaboratively analyzed the data using a bottom-up approach. We first familiarized ourselves with the transcripts by individually and independently reading them at least twice. We then coded the transcripts with quotes relevant to our research objectives. Next, we collaboratively combined and refined our codes using Atlas.TI. As recommended in [10], we used a semantic coding strategy during our coding process, where each code captures a semantic observation. For instance, a quote from a participant such as "some randomness (in the AI-generated output) is always refreshing" is considered one code. Quotes from other participants making similar observations may also be tagged to the same code. This code, amongst other similar codes, is then organized under a theme such as "Non-determinism assists creativity". Through this process, we iteratively refined and identified 76 codes. We used affinity diagramming to collaboratively organize the codes into 12 themes. These themes are organized under the three sections of the results below.
In reflexive TA, meaning is not "excavated" [11] from the data, but is subjectively generated through a researcher's interpretation of the data [9]. This nature of the analysis makes it difficult to formalize a sample size or define data saturation (the minimum number of participants needed before stopping data collection) [11]. Thus, instead of defining data saturation for this study, we deliberately sought a varied group of participants based on their geographic location, background in sound design, and number of years of experience in sound design. In this way, we tried to gather diverse views and opinions of AI during our study.

THEMATIC ANALYSIS FINDINGS
In the following three subsections, we organize the themes from our inductive, reflexive thematic analysis into three meta-themes: (1) an AI-assisted sound design process; (2) working with unpredictability and ambiguity; and (3) sound designers' expectations of AI for sound design.

5.1 An AI-assisted sound design process

5.1.1 Fast iterative exploration. Sound designers are always looking for new sounds to use in their work. "Like if you're working on a sci-fi game, then you can't just use run-of-the-mill sounds. And just so people are always looking for new sounds, like a new palette so to speak" (P1). Some commonly shared frustrations our participants observed in their current design process were around the manual processes of creating new sounds on tight deadlines or low budgets. Creating and manipulating new sounds takes time, and it can be frustrating, as "a lot of back and forth happens when someone (a client) has something on their mind that they can't verbalize and then you're trying to figure out what they want" (P1). In such cases, being able to quickly and iteratively create novel sound samples using AI is beneficial.
P1: "It's really useful to be able to go through 20 iterations in less than half the time that it would take me to do it in the traditional way. And then because you can adjust so many parameters so quickly, then you're not stopping and changing things. You're not editing waveforms, you're not changing plugins. So I think it is really useful [...] I think people tend to overstate what creativity is. But to me personally, it is to be able to go through a lot of things quickly and to select the right bit of sound for that purpose."

5.1.2 An alternative source to field recording. Often, sound designers sourced new sounds by field recording them and further processing them to develop new sound palettes. Such manual recording activities can be frustrating, as designers cannot always control a recording situation. "You can't tell everyone in a city, 'Be quiet for a second. I need to record this thing'" (P4). Typically, a 5-second audio recording takes a couple of hours to clean, denoise, and process before use. In such cases, AI-generated sounds can be considered a suitable and convenient alternative for "finding interesting source material" (P7).
P4: "Most likely it would be I spend a day with the interface making a bunch of sounds and I just record all of them. I'd delete the ones that I don't think will be useful and I'd keep all the rest [...] I'd almost treat this like field recording in a sense, but instead of me actually going outside to record it, I am going into this interface to capture it."

5.1.3 Creating unreal but tangible sound palettes. The bulk of sound in a film is usually added in post-production [70]. Sound designers typically develop and use a custom palette of sound effects for each film [98]. "In sci-fi movies [...] we want to give people the kind of 'metal' feelings. That this world is made from science, and not really an actual world. To feel that it's a different world compared to my living world" (P3). Thus, designers are often on the lookout for unreal, but plausible-sounding sound elements that assist in building immersive experiences for the consumers of such media. Using the AI-assisted sound design tools in this study, designers were able to create such fantastical or alien, but tangible-sounding palettes.
P4: "Obviously you can make stuff like this in a synthesizer, but the problem is it sounds like a synthesizer, it doesn't sound real. [...] And while this (AI-generated sound) doesn't sound like something that's real, because it's in some way based on something that is a real recording, it still has a kind of tangible quality to it. And that's kind of what the value is. You can make synthetic sounds that still sound somewhat like there's a real object doing it."

Although the models used in this study were not trained with the goal of generating unreal sounds, the interactivity encoded in them enabled the designers to generate such sound palettes. Five designers (P1, P3, P4, P5, P7) noted that the generative AI tools were better used for generating such sounds rather than replicating real-world recordings.
P7: "We are always on the hunt for those kinds of elements where we can layer something that actually exists with something that does not exist to enhance immersion for the consumer. Those are the elements that are more interesting for me personally. If I want to have the recording of a falling tree, I can just go out and record it. I don't need a tool for that."

Further, six sound designers (P1, P3, P4, P5, P7, P8) we interviewed said that they rarely used sounds from their own libraries or external databases as-is in their projects. They usually processed the recordings through 'effect chains' (i.e., using a Digital Audio Workstation (DAW) to process sounds through a chain of effects such as adding/removing distortion, reverb, etc.) to fit the requirements of different projects. They found the generative AI tools in the study useful as part of such effect chains. The interactivity in the tools could be used to extract textural components from various sounds, which could be used as layers to enrich other recorded or synthesized sounds.
P7: "(Describing their creative task result) For me, that would be like a sci-fi layer, or that could be used in some trailers when there is something popping up. Or when a spaceship flies by. You can use that as a sweetener." Interviewer: "What is a sweetener?" P7: "Yes, say you have a sound, but then you put something (.) on top of it like spices. And then it's like, wow! That's new!"

5.1.4 Annoying, but Fun! Both AI-based tools in this study embodied non-determinism in the ways of controlling the generated sounds, using either synthetic sound queries (interface-1) or unlabeled dimensions (interface-2). Our designers appreciated this nature of the AI-based tools for its ability to allow exploration and serendipitous discovery of novel sounds, even when the sounds were not in line with the participant's original task goal. For instance, when performing his open-ended task with interface-2, P2 said:

P2: "I understood that I was exploring and there was some discovery. So every once in a while you'll hear me say, 'Oh, I like that!'. Even though it wasn't necessarily exactly what I was looking for, it had something that I liked."

Further, most designers noted that while this exploratory nature of the AI-based tool was fun, it would be annoying or frustrating to work with on task-oriented work on a regular basis, especially on a deadline.

P4: "Well, one thing I found fun was seeing how the AI responded to the synthetic reference and how it didn't listen to me, right? So sometimes I made a change and it didn't quite reflect that and I found that interesting. But if I worked with this every day and I was on a project with a deadline and I really wanted it to listen to me, then I'd imagine it would stop being fun and it would start becoming frustrating trying to get it to do those specific things."

Working with unpredictability and ambiguity
5.2.1 Exploration strategies. For interface-1 (domain-specific controls), the general exploration strategy we found amongst designers was 'broader first, then narrower'. For instance, participant P5 said she would experiment broadly first, say using a wider range of frequencies, and then narrow down to the specific perceptual outcomes she had in mind by employing reduced listening (reduced listening, as explicated in section 2.1, is when designers concentrate on the sound for its own sake, as a sound object, independently of its causes or meaning [56]).
P5: "And as I say in the synthetic reference, this worked quite well because the sounds were like (MIMICKING THE SOUND OF A CICADA TRILLING), so I selected frequencies that are typical without too much thinking. Let's say higher frequencies. I did everything quite rough, not knowing the system, and then trying to achieve this to get as close as I could (to the goal)."

For interface-2 (technology-specific controls), to understand the parameter space they were exploring, participants employed multiple strategies: (1) simply playing around with each control and observing its effect on the generated sound (P1, P2, P5, P8, and P9); (2) 'Systematic Change without Compounding', where the parameters are first reset to their original positions and only one parameter is changed at a time to observe or isolate its effect (P4); or (3) a 'Min-Max' strategy, observing the generated output at the minimum and maximum limits of a parameter's range (P3, P7). While P3 used the 'Min-Max' strategy to clearly isolate the change made by a parameter, P7 used it to see how far he could push a control to get something "new or weird" (P7) out of it.
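Strategies (2) and (3) can be expressed programmatically. In the sketch below, `generate` is a hypothetical stand-in for a parametric generative CST that maps a parameter dictionary to a sound:

```python
def systematic_change(generate, base_params, deltas):
    """'Systematic Change without Compounding' (P4): reset to the base
    parameters before every trial and vary one parameter at a time,
    so each observed change is isolated."""
    outputs = {}
    for name, delta in deltas.items():
        params = dict(base_params)   # reset: changes never compound
        params[name] += delta
        outputs[name] = generate(params)
    return outputs

def min_max(generate, base_params, name, lo, hi):
    """'Min-Max' (P3, P7): audition a parameter at both extremes of
    its range to isolate (or exaggerate) its effect."""
    return [generate({**base_params, name: v}) for v in (lo, hi)]
```

Either helper returns candidate sounds the designer can audition one by one, mirroring the reset-then-tweak workflow participants described.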
Overall, we observed that the participants who approached the exploration with both interfaces systematically discovered new sounds and were generally satisfied with the exploration, even when the outcomes did not match their original goals. One participant (P6), who reportedly approached the exploration randomly and without a goal, found it difficult to get any satisfactory output and gave up performing the task. Although other participants discovered interesting new sounds from their explorations, they expressed a desire for more predictability in the controls so as to be able to use the tools regularly.

5.2.2 Opportunities from ambiguity.
As outlined in Section 3, both interfaces used the same underlying AI model, but with different interactivity mechanisms, governed by different levels of ambiguity, to control the generation of sounds. All designers in this study noted that while both interfaces could generate unpredictable outputs from the AI models, the controls on interface-1 (domain-specific controls) were more intuitive and comprehensible than those on interface-2 (technology-specific controls). This was primarily because interface-2 had (1) unlabeled controls, and (2) a higher number of controls than interface-1. When using interface-2, some designers noted that the exploration seemed like "trial and error" (P7), while others (P2, P3, P4, P5) found that this "lowest form of control" (P4) gave them greater opportunities for exploration as there were more control parameters to "twist" (P3).
P4: "One thing, of course, is it's less intuitive in the sense that nothing's labeled, [. . .] but by not giving it a name, it actually made more sense in a way, because you just see that as an abstract quality, the AI is doing something with it. So just naming them arbitrarily kind of made you pay attention to what they were actually doing more and not expecting something that it wasn't going to do. The lack of specificity makes it feel open in a different way."

Further, when using interface-2, all designers expressed the need to be able to label the dimensions based on their own preferences. Designers gravitated towards labeling the dimensions based on either semantic changes (P1, P6, P8) or acoustic changes (P2, P7) they observed in the generated output.

5.2.3 Modes of working with audio interfaces. Although designers indicated that labeling dimensions would enable them to use the interfaces better, especially when using interface-2 (technology-specific controls), two designers (P3, P5) reported that they relied on auditioning to understand the role of each parameter, even when using labeled controls on interface-1 (domain-specific controls). Such designers built an intuitive knowledge of the effect of each control parameter on the generated sounds and did not rely on the descriptions provided on the interface.

P5: "Usually I don't even read descriptions much. I just listen to what comes out. It's a nicer way of exploring for me. And then when I'm familiar, I can control it."

Further, while using interface-1, we noticed five designers (P1, P2, P3, P5, P8) stopped listening to the synthetic sounds and focused on listening to the effect of the parameter change directly on the matching AI-generated sound itself. Reading and observing the changes on the synthetic spectrogram was sufficient for them to understand the effect of the change they made, and thus they could focus more on the effect of their changes on the AI-generated output.
P2: "Since [..] the AI-generated (sound) was really what I was exploring, [..] and I can read the spectrograms well enough to know that I just didn't have to go through that intermediate step. So spectrograms were helpful in kind of building out what the goal was."

Finally, two designers (P3, P8) found it easier to create atomic units of sounds, such as a single impact sound or a single dog bark, then fix and edit the important semantic and perceptual aspects of that single unit, and subsequently loop or repeat it in a DAW. This gave them better control of the creative process in adjusting the variability of the sounds to their liking.
P3: "I want to create the sound that is, actually can be used in my work. I think it should be one - how to say, one should sound, not (THUD THUD THUD THUD). Only one (THUD). If I need more of this, I can copy-paste (loop or repeat it in a DAW)."
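P3's copy-paste workflow amounts to concatenating a one-shot unit with silence. A minimal sketch; the `gap_s` spacing parameter is a hypothetical stand-in for grid placement in a DAW:

```python
import numpy as np

def loop_one_shot(unit, times, gap_s, sr=16000):
    """Repeat a single generated 'atomic' sound (e.g., one THUD)
    into a sequence, mimicking copy-pasting the clip in a DAW
    with a fixed gap of silence between repetitions."""
    gap = np.zeros(int(gap_s * sr), dtype=unit.dtype)
    return np.concatenate([np.concatenate([unit, gap])
                           for _ in range(times)])
```

In practice, a designer would vary `gap_s` (or the unit itself) per repetition to control the variability of the result, as P3 and P8 described.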

5.2.4 Understanding unpredictability of the response. As outlined in section 3.1, for interface-1 (domain-specific controls) we gave greater flexibility in generating the synthetic sounds, although not all synthetic sounds resulted in meaningfully matching AI-generated sounds. For instance, the Greatest Hits dataset was limited to a certain range of rate of impact (number of impact events per second). When designers tried to query the AI model at higher rates, the model generated unpredictable responses. During our interviews, we discussed the nature of the generated sounds and asked our participants if they understood the reasons behind the AI's unpredictability due to its limitations. Three participants (P2, P4, P5) were familiar with the idea that AI is limited by its training data. Participant P2 had experience with building and using AI models, and participants P4 and P5 were familiar with popular generative models such as ChatGPT [3] or DALL-E 2 [2]. Participants' prior experience with the limitations of generative AI, across different modalities, might have made it easier for them to reconcile their understanding of the failure modes in our interfaces, especially when the changes they made did not align with their expectations. For instance, while explaining the unpredictable response from interface-1, P4 said:

P4: "Often, changes in the synthetic reference didn't clearly correlate to the changes in the AI-generated sound. [. . .] Sometimes the fade-out parameter didn't really do that much to the AI-generated sound."

Interviewer: "Can you tell me why you think that is happening?"

P4: "Why? Not exactly sure why it wasn't following along exactly, but I'm guessing it's because it's trained on a certain kind of response that already has a certain type of fade-out innate in it, and so when you change the fade-out, there's only so much it can change based on what kind of input it has had."

Sound designers' expectations of generative AI
5.3.1 Cinematic effect over accuracy. Through our interviews, we found that interviewees focused mostly on the overall perceptual aspects of the sounds they worked with. Aspects such as where the sound originated from were not necessarily important to them.
For instance, although we set up our AI CSTs to generate 'Hits & Scratches' impact sounds made by a drumstick, the sound designers used the models to create novel base sounds and sweeteners for footsteps (P1), fantastical 'adolescent monsters' (P3), trilling cicadas (P5), sci-fi whooshes and flying machines (P7), and layers over a percussive drum beat (P8).
P1: "I think the most important thing, whether it's movies or games, is not accuracy so much, but immersion. So the footsteps that you hear in a movie do not sound like that in the real world."

P4: "So a lot of other AI seem to be trying to replace a creator so that someone can get sounds who don't know how to make them, whereas this one seems more useful for someone who already knows how to make sounds but just wants to add to their arsenal by having another tool."

In [89], Susini et al. emphasized that sound design as a practice is not just concerned with generating new sounds, but is also associated with a designer-led, research-oriented design process grounded in psychoacoustics and sound cognition. Although most generative AI systems focus exclusively on the generation of new sounds, they do not focus on "what the sound should do, or what it should be" (P5). As such, the results from our interviews suggest that the best use of AI is as a Creative Support Tool, as part of a larger creative process owned and controlled by the designer.
P5: "I would like to keep the ownership of the creative process. I imagine the sound as it should be because it comes from a long research [. . .] The creative design process is much more than making the sounds. It is more about knowing what you want and finding the right tools. [. . .] So if the AI is also part of the research process, it could have good ideas."

Finally, our results indicate that AI algorithms have the technological capability to provide creators with novel means of creating sound for their work that traditional signal-processing techniques cannot offer. For instance, in our study, we observed two such instances where designers were able to discover novel base sounds for their sound palettes during exploration, or extract sweeteners or textural components to layer over other sounds (see section 5.1.3). This capability to modify audio signals in novel ways gives creators greater opportunities to create new artifacts.
P4: "The approach where it is more about creating the individual units of sound rather than the finished product of sound, makes much more sense. It seems at least to be more achievable than what AI seems to be doing in the visual space. Because it doesn't always necessarily understand composition, it gets things roughly in place. What I've seen on people using AI for sound is that it's good to get good approximations, but not necessarily always to do things all the way."

5.3.3 Need for focus on AI for sound design. Most current research in audio synthesis focuses on music and speech production, and very little work exists to model environmental sounds [78]. This feeling was conveyed by P4 during the interview:

P4: "A lot of the applications you're seeing right now are kind of in the infant stages a lot of the time. From what I've seen so far in sound there haven't been that many great uses of AI so far, at least ones commercially available or available on the market. And a lot of that, I think, is because they're taking a more music approach where they're trying to streamline the job of a music producer."

Further, given the recent surge in text-to-audio models, two designers (P4, P5) felt that AI models that needed to be prompted using text would be a barrier for sound design, which needs granular, continuous, and "intimate control" (P2) to design sounds. Developing controls over AI models where designers can "leverage their current skills" (P5), instead of learning newer ways to prompt AI models, would be more beneficial for creator use.

DISCUSSION
In this study, we sought to investigate how generative AI technologies could support sound design practitioners in their creative work. We found that AI-based CSTs could assist sound designers in their creative process by providing means to iterate over ideas quickly, by generating fantastical and novel-sounding elements, and by reducing the need to manually source individual artifacts via field recording for their creative work. Further, we found that although the unpredictability of controlling the AI-generated artifacts assisted in the serendipitous discovery of new sounds, the exploratory nature and unpredictability of controlling the generation could be a hindrance to task-oriented work. In our study, the sound designers employed various strategies while exploring the design space generated by the AI-based CSTs. These strategies helped them better understand the limitations of the generation capabilities of AI-based tools. Finally, while AI algorithms are usually incentivized to accurately replicate real-world sounds, we found that sound designers were more interested in the overall perceptual aspects of a sound than in its accuracy. We thus found that AI-based CSTs could easily be integrated as part of a larger creative design process, owned and controlled by the designer. Such CSTs can produce novel sound elements that sound designers can incorporate into their compositions as layers over other sounds, or use as individual components, prioritizing cinematic effect over accuracy in their compositions.

AI assistance in the practice of sound design
Recently, human-AI interaction researchers have been increasingly interested in understanding how mixed-initiative creative interfaces (MICIs) [29] can be applied in a work setting in different domains of creative work [68,69,96]. In our work, we respond to these questions in the context of sound design by proposing a mode of working with generative AI where designers perform exploration and creation using AI-based CSTs. Findings from our exploratory study suggest that such tools can assist in fast iterative exploration (section 5.1.1) to help sound designers find novel sounds to use in their work. This finding is in line with recent research on CSTs in the visual domain, in music composition, and in storytelling, where algorithmic tools were used predominantly for idea generation [14,20,22,53,61,62]. Further, such AI-based tools can generate synthetic surrogates of real-life sensory information (such as, in our case, field recordings (section 5.1.2)), which can constitute realistic and convincing alternatives to this information. Consequently, (sound) designers would be able to save the time and resources they would need to obtain this information in the first place. This observation could be extended beyond the realm of sound to include visuals and other sensory modalities.
In [17], researchers note that while the unpredictability (section 5.2) emerging from AI-based tools supports creativity, it can be a hindrance to task-oriented creative work. We further this understanding for sound design (section 5.1.4) and find that sound designers might overcome this limitation by performing exploration (section 5.2.1) as a separate, focused task [26], employing "reduced listening" (P5), to "build a library" (P4) of novel sound palettes for use in their projects. The possibility of using CSTs in this way to generate novel individual units of sounds, instead of entire compositions, gives professionals another tool "in their arsenal" (P4) and more ownership of their creative process (section 5.3.2).

Constrained and Unconstrained Randomness
Previously, researchers have investigated the role of constrained and unconstrained randomness in interactive systems on user experience [59,93]. In [59], using the example of a music-listening interactive system, the authors observe that, at times, unconstrained randomness can contribute to a rich user experience (such as serendipity).
They also note that this positive experience depends upon the size of the audio library, where large libraries can have detrimental effects on the listener experience. In such cases, adding constraints to randomness (by constraining content) gives users the ability to manipulate or control the affective state of their user experience. We observe this duality of unpredictability and constraint in our study. Our 'Hits & Scratches' impact sounds model was smaller and more constrained in the variety of sounds generated than the 'Environmental Sounds' model, which generated sounds from seven classes. Our participants found models with large variances in timbres, such as the environmental sounds model, detrimental to targeted creative exploration. For instance, participant P7 reported: "The variety of sounds that I got out of the (environmental sounds model) was very extreme. I think that a tool that offers such a broad variety of results is like a two-edged sword." Further, our interface-1 was constrained to exploring the AI's latent space using only synthetic sounds, whereas interface-2 provided means for unconstrained exploration directly in the latent space of the model. While using our CSTs, P6 reported: "(Interface-1) was just like playing with an old synthesizer or something. It was quite easy to grab things and just tweak them and see what happened. (With interface-2) none of these settings did anything I was expecting at all." Our findings thus indicate that constraints implemented by either smaller models (such as the 'Hits & Scratches' model) or by using synthetic sounds for steering the CSTs helped designers better understand the capabilities of the AI (see section 5.2.4) than larger models or interface-2 did.

Reflections on designing and implementing AI-based tools for sound design
On selecting interactive AI models: While we implemented two CSTs in this study, our aim was not to compare them with each other, but to provide our participants with two unique ways of interacting with the underlying AI model. While selecting algorithms for interactivity, we aimed to explore algorithms that worked primarily in a post-hoc fashion (i.e., worked on existing pre-trained GAN models). We found that using methods such as SeFa [79], we could integrate any available pre-trained GAN models from existing marketplaces [31,46,103]. Further, using methods such as EBF [51] enabled us not only to use domain-specific controls for exploration but also to additionally constrain multi-class large audio models using class-based soft constraints [83]. Using these soft constraints, the designers could target their exploration toward the part of the latent space oriented to that class. We thus found both these methods effective in providing a wide range of options for exploration [81] within our CSTs. Such methodologies for designing interactivity over AI models can easily be extended to other modalities, such as images. In light of the recent environmental impact [25] of training large generative AI models, we suggest future CSTs, for all modalities including sound design, could make use of existing pre-trained models by leveraging such post-hoc methods for interactivity.
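As a rough illustration of the post-hoc idea, SeFa-style methods factorize a pre-trained GAN's first projection layer in closed form, without any retraining. A minimal NumPy sketch; the generator weight and the latent-edit helper are illustrative, not our exact implementation:

```python
import numpy as np

def sefa_directions(weight, k=10):
    """SeFa-style closed-form factorization: the top eigenvectors of
    A^T A, where A is the generator's first projection layer, are the
    unit latent directions causing the largest change in the output."""
    eigvals, eigvecs = np.linalg.eigh(weight.T @ weight)  # ascending
    order = np.argsort(eigvals)[::-1][:k]                 # k largest
    return eigvecs[:, order].T                            # (k, latent_dim)

def edit_latent(z, directions, dim, amount):
    """Move a latent code along one discovered (unlabeled) direction,
    i.e., what one slider on interface-2 does."""
    return z + amount * directions[dim]
```

Each row of `directions` would back one unlabeled slider; the class-based soft constraints mentioned above would additionally restrict `z` to a class-oriented region of the latent space.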
On visualizing sounds: While designing our interfaces, we visualized the spectrogram of the generated sound because the controls on both interfaces modified the spectromorphology [82] of the sound. Interestingly, through our interviews, we found that these visualizations provided means for the designers to describe their creative goals in spectromorphological terms. For instance, participants used terms such as "seeing the individual events" (P2), "fade-in is quite long" (P4), or "removing the initial transient and softening it to leave the body and tail" (P4). Previously, researchers in explainable AI (XAI) for the arts [12,13] used latent space visualizations to explain or debug their creative goals. We build upon this work and suggest that spectrogram visualizations could provide a great means for designers to communicate their creative goals and to understand the output of AI-based CSTs.
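The visualization itself is a standard log-magnitude short-time Fourier transform. A minimal NumPy sketch; the frame size and hop length are illustrative defaults, not the values used in our interfaces:

```python
import numpy as np

def log_spectrogram(audio, n_fft=512, hop=128):
    """Minimal STFT log-magnitude spectrogram of a mono signal:
    window overlapping frames, take the real FFT of each, and
    convert magnitudes to decibels."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([audio[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return 20 * np.log10(mag + 1e-8)  # shape (n_frames, n_fft//2 + 1)
```

Plotted with time on one axis and frequency on the other, this is the image on which designers read transients, fades, and event counts.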

Ambiguity in interactive user control
On interface-2, we deliberately left the dimensions unlabeled to allow the designers to interpret the dimensions based on their intuition. The ambiguity in the dimensions made the exploration "more open" (P4) (section 5.2.2), and different participants came up with different semantic or acoustic explanations for the effect of each dimension on the edited sound (section 5.2.1). Participant P6 reported that Dimension 6 on the interface seemed to semantically change whether the source of the sound was "outside or inside the room". Further, P1 reported that Dimensions 7 and 10 were similar to acoustic high-pass and low-pass filters, and P3 commented that Dimension 10 changed the pitch of the sound. By naming the dimensions differently, using semantic or acoustic labels, the designers were able to use the sound design space in their creative work in a personalized way. Further, with interface-2, participants had to adopt a more varied set of strategies to meaningfully explore the sound design space (section 5.2.1) than with interface-1. Therefore, although interface-2 opened up more personalized avenues for the designers to interact with the AI, the ambiguity in the dimensions got in the way of its agentive flow [57], a highly engaging state of interacting with an AI-based CST. The ambiguity of the controls in interface-2 made the designers focus more on the intricacies of the system itself, rather than on their creative output.

DESIGN RECOMMENDATIONS FOR HUMAN-AI INTERACTION IN SOUND DESIGN
In this section, we outline five design recommendations for interactive generative AI. We specifically report quotes capturing rich insights from our expert practitioners to inspire our readers.
DR1: Design interactivity using intuitive controls. Among our participants, P2 and P9 had extensive prior experience designing audio interfaces, synthesizers, and programming desktop foley systems. Their advice on designing a good, perceptually relevant set of controls for sound synthesis systems is as follows. They suggest a good control should be:
• Perceptually monotonic: If you moved a control forward to change the sound by an amount X, then moving it further in the same direction should do more of X.
• Perceptually linear: This principle builds upon monotonic controls. If you moved a control by an amount X in the forward direction, and then moved it the same amount in the reverse direction, both changes to the sound should be perceptually the same.
• Perceptually orthogonal: If you have multiple controls, a change in one control should be independent of the others.
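The first of these principles can be checked numerically for any candidate control. In the sketch below, `control_to_feature` is a hypothetical mapping from a control value to a scalar perceptual feature (e.g., an estimated brightness score):

```python
import numpy as np

def is_perceptually_monotonic(control_to_feature, lo=0.0, hi=1.0, steps=20):
    """Check the 'perceptually monotonic' principle: sweeping the
    control upward should move the (assumed scalar) perceptual
    feature consistently in one direction."""
    xs = np.linspace(lo, hi, steps)
    ys = np.array([control_to_feature(x) for x in xs])
    diffs = np.diff(ys)
    return bool(np.all(diffs >= 0) or np.all(diffs <= 0))
```

Analogous numerical checks could be written for the linearity and orthogonality principles, e.g., comparing feature deltas for forward and reverse moves, or correlating features across controls.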
These principles are especially important when developing technology-specific controls (as on interface-2), because those controls are extracted by an algorithm from the latent space of a generative model. We thus propose that future human-AI interaction researchers focus on constraining such algorithms to yield specific changes based on these principles.

DR2: Variety is a two-edged sword. The general trend in large language model and image generation research is to build large, overarching, generalizable AI models that cater to generating a large variety of images, art, or text. A similar trend is observed in audio, where a single large audio model generates music, environmental sounds, as well as speech [6,60]. Such large audio models can perform well as tools for exploration but are less useful for task-oriented work. This is particularly due to the complexity of the learned latent space: small changes in the parameter space of such models can lead to large perceptual changes in the generated sounds. Participant P7 termed this variety a "two-edged sword". We thus propose that future interactive AI applications for sound design focus on giving designers the ability to explore smaller models trained on a more targeted range of sounds, or provide means to constrain the exploration of large audio models based on class, semantics, or other perceptual aspects of the sound (see section 6.3).

DR3: More cinematic effect than accuracy. In sections 5.3.1 and 5.1.3 we showed that our participants valued the perceptual aspects of the generated sounds and the AI's ability to generate 'unreal but tangible' sound palettes more than the accuracy or the origin of the sound. Currently, most audio AI algorithms objectively incentivize the replication of real-world sounds. While real-world sound replications are useful as an alternative to field recording (section 5.1.2), they will have very limited use in generating novel sound palettes. We thus propose that there is value in pursuing a research approach where
AI models "do not replicate real life too well" (P4) and are able to extract textures and patterns from sounds in ways that current signal processing techniques cannot. This approach would give artists and creators more creative tools in their arsenal, rather than simply automating the generation of real-world sounds that they can record easily.

DR4: Seeing sounds as an alternative to listening. Previously, Cartwright et al. [18] demonstrated that using visual representations of sounds, such as spectrograms, yielded better annotations for sound events than audio alone. Visual spectrogram representations of the sounds gave annotators an opportunity to 'glance-and-click' on the sound events while listening, which improved the accuracy of the collected annotations. In our study, we make a similar observation. At times, the designers used the spectrograms on the interfaces as a proxy for listening. "It's very nice to have the spectrogram because this gives you a good forecast. It is a good shortcut to imagine how it will sound like so you can even not listen to it" (P5). We thus propose that using such visual representations of sounds can reduce the cognitive load associated with making small edits and stopping to listen to the generated sounds, especially when doing exploratory work.

DR5: Improving the explainability of dimensions. As observed in sections 5.2.3 and 5.2.2, although most designers found the ambiguity in dimensions a hindrance to task-oriented work, they observed that the ability to personalize the dimension names would improve the usability of such tools and the explainability of the dimensions (especially with interface-2). "With the 10-D interface, I found myself wanting to change the label after I explored it so that I could remember what it did for me" (P2). Further, in our conversations with P6 and P7, we observed that for understanding and learning controls on synthesizer interfaces, designers usually relied on not just the names
of the controls, but also their ranges and units. For instance, units such as 'dB per octave' are associated with filtering frequencies. P6 observed that on interface-2 all dimensions operated in a range of [-5, +5] with no units, which made it difficult to memorize the function of each control. We thus propose that future human-AI interaction research encompass dimensional controllability for sound models, rescaling the ranges and adjusting or assigning units on controls to fit existing conventions on commercial synthesizer interfaces.
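As a small illustration, an unlabeled [-5, +5] dimension could be remapped onto a familiar unit range; the filter-cutoff target range below is a hypothetical example, not a control from our interfaces:

```python
def rescale_control(value, src=(-5.0, 5.0), dst=(20.0, 20000.0), log=True):
    """Map an unlabeled [-5, +5] slider onto a familiar unit range,
    here a hypothetical filter cutoff in Hz, spaced logarithmically
    as on commercial synthesizer interfaces."""
    t = (value - src[0]) / (src[1] - src[0])  # normalize to [0, 1]
    if log:
        return dst[0] * (dst[1] / dst[0]) ** t
    return dst[0] + t * (dst[1] - dst[0])
```

Logarithmic spacing matters for frequency-like quantities, where equal slider moves should correspond to equal perceptual (octave) steps rather than equal Hz steps.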

LIMITATIONS AND FUTURE WORK
Audio AI research is evolving rapidly, with newer innovations in building larger, faster, and better-quality generative audio architectures. While we used StyleGANs [52] to build our CSTs, alternatives based on AI architectures such as Diffusion [60] are emerging. Although we have tried to keep our inferences on assessing the potential of AI for sound design free from technical constraints or usability issues, new modes of interactivity will change how designers perceive and use AI. Therefore, more research will be needed in the future to understand how the practice of sound design evolves along with newer AI models.
We conducted this study with nine professional sound designers with diverse geographic locations, years of experience, and sound design backgrounds. Although we present a rich description of how AI-based CSTs can be used by sound designers in a work setting, given the qualitative nature of our study, our findings might not generalize to broader populations. Further, the participants used the AI CSTs for only a few tasks. Our future work will focus on capturing patterns of usage, as well as studying the different parameter exploration strategies in depth in a professional work setting, over longer periods, and in various phases within the sound design project cycle.

CONCLUSION
In this paper, we investigated how sound designers can use generative audio AI models in their creative practice. We designed and implemented two interactive audio AI CSTs and invited nine professional sound designers to apply the CSTs in their practice. Through semi-structured interviews, we gathered insights on how to situate AI-based tools in the sound design process, sound designers' ways of working with unpredictability and ambiguity in AI, and their expectations of generative AI-based tools. Further, we reported five design recommendations for future interactive AI-based creative support tools for sound design. Through this work, we hope to bring focus to this area of interactive audio AI and to explore opportunities to improve AI assistance in the practice of sound design.

A APPENDIX: SUPPLEMENTARY MATERIALS

A.1 Semi-structured Interview Questions
As discussed in section 4.2, our interview consisted of three parts:
• Participant's background and experience: Through these questions, we focused on capturing the participant's experience with sound design.
- Can you describe some of the projects that you typically work with?

A.4 Attribution for icons and images
Most images in this paper were created by the authors using a combination of various drawing tools. Some visual icons were sourced from the following websites:
• In Figure 1: the sound designer icon is sourced from Flaticon.com; the domain-specific controls icon is sourced from a "slider" icon by Inggit Jaya from thenounproject.com.
• In Figure 2: the sound designer icon is sourced from Flaticon.com;

Figure 1 :
Figure 1: A conceptual diagram (a), and screenshot (b) of interface-1. (a) A sound designer can use the domain-specific controls in ○1 to generate a synthetic reference sound, seen in ○2. This synthetic reference sound is used to "query" or "search" the latent space of an AI model, shown in ○3, to generate a matching AI-generated sound in ○4. (b) The screenshot shows the placement of the controls and the synthetic and generated sounds as viewed by the designer on the web interface. Please see Appendix A.2 for a link to a Google Colaboratory version of this interface, and A.4 for image attributions.

Figure 2 :
Figure 2: A conceptual diagram (a), and screenshot (b) of interface-2. (a) To edit the audio in ○1 such that the number of impacts in the sound increases, a sound designer can use the technology-specific controls extracted from the latent space of a StyleGAN, shown in ○2, to perform direct latent space manipulation, shown in ○3 and ○4, resulting in the edited audio sample in ○5. (b) The screenshot shows the placement of the controls and the generated sounds as viewed by the designer on the web interface. Please see Appendix A.2 for a link to a Google Colaboratory version of this interface, and A.4 for image attributions.

Figure 4 :
Figure 4: Architectural components driving the audio AI interfaces used in the study

Table 1 :
Participant Details. Original sound creation and implementation for online and theatrical films, games, music production, and live performances. Field recording. Post-production work includes dialogue editing, sound mixing, audio restoration, and foley mixing. Educator for sound design.
Creative agency and ownership. Currently, most research in generative AI focuses on building omnipotent intelligent agents that can do it all: agents that can create art or compose music directly, instead of being an enabler for creativity. While tools with greater AI agency may work well for novice users, for sound design experts there are more opportunities for AI as an enabler rather than as a creator in itself.