While anthropologists have frequently taken artificial intelligence (AI), machine learning (ML), and data science as objects of study (among many other examples, see Besteman and Gusterson 2019; Forsythe 2001; Helmreich 1998; Seaver 2022; Wilf 2013), the field is perhaps the last in the humanities or social sciences to consider integrating these new technical practices into its repertoire of research methods. However, a small segment of anthropologists is beginning to do so (see Pedersen 2023). For the most part, anthropological appropriation of AI and related technical practices (e.g., ML and data science) has focused on developing systems that collect and analyse ethnographic data, with the latter tendency largely mirroring the adoption of computational text analysis in the digital humanities (e.g., Bamman et al. 2014) and computational social sciences (Edelmann et al. 2020).
Nevertheless, anthropologists have yet to consider how such technologies can be developed as a means of depicting or performing cultural practice in ways that transcend the basic limitations of traditional forms of ethnographic media (i.e., text, film, etc.). This is surprising for at least two reasons. First, for nearly the entirety of the field's modern history, anthropologists have continually explored modes of sharing ethnographic knowledge beyond text, such as moving image (De Brigard 1995), sound recording (Feld 1991), live performance (Turner and Turner 1982), and installation art (Hartblay 2018). In this context, it is rather odd that the field largely overlooks the possibility of algorithmic forms of ethnographic media, though such experiments are not entirely absent (Collins et al. 2017; Underberg and Zorn 2013). Second, as Diana Forsythe (2001) suggested more than two decades ago, AI systems, particularly those built to execute extant human practices, are nearly always engaged in the work of ethnographic depiction and performance, whether AI researchers recognise this or not. As she illustrated, this is largely because producing an AI system that successfully performs a given human practice requires designers to pursue a prolonged, detailed process of gaining practical understanding of these actions through embodied co-presence with an expert practitioner. In Forsythe's analysis, this essentially means that ethnographic participant observation (or something so closely resembling it as to be indistinguishable) is a basic requirement for building humanlike AI. While her work has been pivotal in the anthropology of science and technology, those drawing from her work largely overlook one of its basic implications: AI is ultimately a means of computationally modelling and performing the thinking, sensing, and acting that constitute the core of human practice.
It is a mode of posing hypotheses and articulating claims about particular forms of ‘humanness’, which are then enacted through technological performance (see also Laurel 1993; Suchman 2006).
Still, Munk et al. (2022) have recently offered a practical demonstration of the use of ML to build a computational ethnography of human practices of interpretation in ordinary social interactions. Their experiment focuses on the digitally native practice of Facebook reactions and aspires towards the creation of an algorithm that ‘passes for’ a user of the platform. Like other recent examples of the adoption of such techniques in anthropology, a significant portion of their work focuses on building a system that analyses thousands of Facebook posts and user reactions. Drawing on this dataset, their system attempts to predict how users might respond to various kinds of content. Their work highlights the way that contemporary technical practices like ML and data science often function as a means of modelling how human beings make sense of their world and respond to it. That is, it uses these new computational strategies to generate performant ethnographic models of a contemporary social practice: encountering sensory phenomena and indicating one's emotional response to these stimuli.
Despite the power of their experiment, however, much remains unexplored. Interpreting and responding to social media content is indeed now a vital aspect of social life, but it drastically differs from the domain of face-to-face encounters in real time, embodied co-presence. Online interactions may take place quickly, but their temporality is fundamentally discrete. By contrast, there is a continuous flow of energy and sensation between human beings present before one another in face-to-face social interaction. Similarly, online interactions (as well as their analysts) privilege the textual and segmental dimensions of communication that overwhelmingly dominate digital social space. This leaves open the question of how such computational ethnographic media might be built to display competence in nonsegmental and nonverbal communicative forms, which, despite their near obviation in digital social life, remain vital to human interaction. Above all, one must not overlook the tremendous difference between predicting likely moves in an interaction and experiencing the aftermath of those actions. If one wishes to build a system that ‘passes for’ a member of a given social world (as Munk et al. suggest), the system needs both to take actions and to deal with the consequences of those choices as its human interlocutors evaluate them.
This article illustrates how anthropologists might develop ethnographic media forms, based in AI and its descendant paradigms, that perform the perceptual and interactive practices of making sense of the world and responding to it in real-time, face-to-face, embodied co-presence. It does so through a discussion of an experimental ethnographic practice in which I have designed a variety of artificially intelligent virtual performers of free improvisation, a musical subculture within the contemporary avant-garde. Drawing on several years of ethnographic fieldwork in Berlin, Chicago, and the San Francisco Bay Area as a woodwind performer active in free improvisation, I have designed these systems to spontaneously compose and perform music while listening and responding to fellow improvisers. In other words, I have built these systems to take the place of a human improviser by programming them to listen to and make sense of what other improvisers are doing, parsing their audible actions into data that the systems analyse as they perform with human musicians. As I argue in what follows, such systems represent a powerful alternative ethnographic media form that transcends the basic limitations of fixed media, or formats such as text, film, or sound recording that cannot simulate the inherent indeterminacy and contingency of interaction with other (human) subjects.
Aside from designing these systems, most of my fieldwork has focused on subjecting them to the critique of human musicians active in free improvisation, the performance practice I have built these systems to engage in. This largely consists of inviting improvisers to play with these systems and compare them to a human performer. In just the same way that AI is always an expression of its designers' implicit assumptions and claims about humanness as a performance, encounters between my systems and human improvisers foreground the question of how ‘human’ is both a biological category as well as a role to be performed, with specific rules and conventions of performance varying considerably depending on the human practice in question. Because the systems I have constructed are not human, and yet feel very much like human (musical) interlocutors, encounters with these systems enable subjects to articulate and critique various elements of humanness as a performance that they would not otherwise address in their engagement with other (biologically human) improvisers. Among other reasons, this is due to the fact that human beings typically only critique the ‘humanness’ of other human beings at incidents of ‘misanthropic’ behaviour (howsoever this pejorative variable may be locally or contextually defined). And yet humanness as a performance consists of far more than what manifests at the boundary of misanthropy and includes a repertoire that permeates a variety of ordinary registers of human sociality. Likewise, as I detail later on, the boundary of humanness is often conflated with a subject's self-conception of the boundary between the practices that define their particular social milieu and its Others.
This article focuses on moments when the question of ‘humanness’ emerges in terms of an agent's ability to accurately make sense of the other (human) agent's intentions and preferences as a social interaction proceeds, and to do so entirely through a nonverbal domain. I highlight these moments because this layer of system design most closely overlaps with a specific practice within AI-descendant disciplines that has been of key interest in recent anthropology: data science (see Douglas-Jones et al. 2021). The claims about humanness I highlight in this article hinge entirely upon a system's ability to accurately predict present or near-future actions based on an ongoing digestion of real-time audio data. That is, the layer of the system that most closely corresponds with the work of data science is the layer that human subjects often implicate as the key determinant of their ability to experience humanness in the system's interactive presence.
This discussion aims to demonstrate how encounters between AI and the people whose practices such systems are based on are an important ethnographic site for the anthropological study of ‘humanness’ as performance in all its dizzying (and perhaps disturbing) variety. It enjoins anthropologists to expand their mode of engagement with AI from analysis and critique to development and use. While anthropologists have been slow (at best) to appropriate such technologies for their own field's purposes, encounters of the kind I describe here are the centre of gravity of a key domain for contemporary applied anthropology: user experience, human-computer interaction, and related practices at the intersection of engineering and design. If the encounters I describe here enable us to observe how AI elicits the unsettling variability of how subjects conceive performative humanness, the same is likely true for the vast swathes of contemporary technical industries where such encounters are taking place, regardless of whether they occur in the sector of ‘pure’ or ‘applied’ research.
An Interactive Algorithmic Ethnographic Performer…
Anthropologists have always shown an interest in media and modalities for sharing ethnographic knowledge and thought beyond the classic ethnographic medium of text. At the same time, exploration of novel ethnographic media has nearly always remained within the category of ‘fixed media’, or media forms for which the timeline of content remains set and predetermined1. Despite the dynamism of fixed media like moving image or sound recording, fixed media do not fundamentally respond to their user's actions. The only exception is the user's timeline browsing (i.e., flipping pages, seeking and skipping to a particular time marker in a film or audio recording). Hence they are an inherently one-way experience. To state the obvious, one can look at or listen to such media, but they can never look back, listen, or more importantly, respond.
Anthropological fixation on fixed media is particularly striking given continually rising interest in ‘multimodal’ and ‘sensory’ ethnographies. Multimodality calls for an awareness of the diverse media and sense modalities that form contemporary social life as well as for a pursuit of media and presentation formats beyond text. Sensory ethnography denotes a parallel concern that foregrounds the primacy of sensation in the constitution of social and cultural life. Both paradigms are deeply concerned with the frequent inability of the classic ethnographic text to convey to a reader the specificity of auditory, visual, kinetic, tactile, and other sensory features of the field site. While this work has raised interest in alternative ethnographic media, exploration remains resolutely limited to the domain of fixed media. This is rather ironic given that sensation is (at the very least) a bi-directional process. Sensation is the basis of interaction, even in heavily mediated and sensorially underwhelming modalities like SMS or email. One senses, responds, and thus generates sensory effects that other agents register and respond to through their own actions. Despite a deep concern for all of this, sensory and multimodal anthropologies fixate upon media forms that are incapable of representing—or really, performing—the multidirectionality that is integral to sensation as social practice.
Of course, we increasingly live in a world in which we are inundated with technologies and media forms that possess this basic capacity for (at least) bi-directional interaction through the senses in the form of chatbots, automated voice assistants, robots and all manner of other machines acting like people. As Forsythe (2001) and Suchman (2006) have argued, such technologies always contain their designer's assumptions about humanness as a performance. As such, it is common to say that they ‘represent’ such assumptions. Yet representation is entirely inadequate as a conceptual framework for understanding what a machine is doing when it speaks, listens to us, and pretends to be a person. It is not representing conversation or social interaction; it is performing these integral actions of social life, howsoever poorly. This is much like the practice of performing ‘scenes’ the ethnographer encountered in their fieldwork, a practice known variously as the ‘ethnography of performance’ (Conquergood 1985), ‘ethnographic performance’ (Saldaña 1998) or ‘performance ethnography’ (Jones 2002).
Yet regardless of terminology, these practices all preserve a basic distinction between audience and performer, a differential of agency and authorship materially (de)substantiated in the proverbial, invisible ‘fourth wall’2. For this reason, I refer to the machines that act like people I describe in this article as interactive algorithmic ethnographic performers rather than performances. Much like a human performer, they are capable of continually producing distinct performances anew through their interactions with others3. More importantly, as others have pointed out regarding experiments in multimodal ethnography, they ‘generate relations with research participants’ (Dattatreyan and Marrero-Guillamón 2019: 220). While this is true of fixed media forms of multimodal ethnography, something fundamentally different obtains for the interactive algorithmic ethnographic performer. It very literally generates unusual relations between ethnography and its human interlocutor because it does not allow its human counterpart to engage with the media form as a user or an audience member. By sensing its human interlocutor, it turns both itself and its human counterpart into interacting performers.
As an experimental ethnographer, I have designed such systems based on my fieldwork and experience as a performer of free improvisation (primarily on woodwinds) in Berlin, Chicago, and San Francisco. This has largely taken place through an intuitive, creative process of playing with other musicians, observing them in performance, and listening to (or watching) recordings, and then using all of these experiences as a basis for building a machine that plays like the musicians I have encountered. A machine that does this—that ‘passes for human’—must do more than just generate the sounds that human beings might; it must also actively sense its environment, analyse this data, devise responses, and execute them, all in real time and simultaneously. As a result, my work in constructing these systems has focused on interpreting my own listening processes as well as those of others and using code as a way of writing field notes based on these experiences of participant observation. As others have noted about ethnography very generally and multimodal ethnography specifically, while the musicians I construct with computing machinery are certainly inspired by human players, they are necessarily also my creative ‘invention’ (Dattatreyan and Marrero-Guillamón 2019) rather than strictly neutral, documentarian, objectivist description.
Physically, these systems consist of microphones, assorted cables, an audio interface4, computer, and loudspeaker. The algorithms I have designed in the programming environment Max/MSP are based on my observations of other players (including through recordings) as well as on my own interactions with them as an improviser. Functionally, these systems take the role of a fellow improviser by composing and performing music in real time while also listening (through microphones and digital signal processing) and adjusting their course of action in response to their human interlocutors and anything else in their acoustic environment. Sonically, the system's behaviour resembles that of a typical performer of free improvisation, which focuses on avoiding pulse and harmony in favour of exploring noisy or inharmonic timbres (see Smalley 1997 for a description of these sound types). While human players are typically constrained to the physical possibilities of their instruments, voices, or whatever they use to perform, the improvising systems I have designed exploit the possibilities of contemporary sound synthesis and are capable of performing a variety of different instruments.
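To give readers unfamiliar with such systems a concrete sense of what a listen-analyse-respond loop involves, the following sketch models one deliberately simplified slice of the behaviour described above: tracking a partner's loudness frame by frame and thinning out the system's own playing when the partner is busy. This is a hypothetical Python analogue, not the actual Max/MSP patches behind my systems; the class and parameter names (VirtualImproviser, window, density) are illustrative inventions of this example rather than features of the real design.

```python
import math
import random

def rms(frame):
    """Root-mean-square amplitude of one audio frame (a list of samples)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

class VirtualImproviser:
    """A toy listen-analyse-respond loop: remember a partner's recent
    loudness and answer densely when they are quiet, sparsely when busy."""

    def __init__(self, window=8):
        self.window = window   # how many recent frames to remember
        self.history = []      # recent RMS values for the human player

    def listen(self, frame):
        """Ingest one frame of the partner's audio and update the memory."""
        self.history.append(rms(frame))
        self.history = self.history[-self.window:]

    def density(self):
        """Probability of playing right now: inverse of the partner's level,
        so the system leaves space when its interlocutor is loud and active."""
        if not self.history:
            return 0.5  # no information yet: play about half the time
        level = sum(self.history) / len(self.history)
        return max(0.0, 1.0 - min(level * 4.0, 1.0))

    def respond(self, rng=random):
        """Decide on one action for the current moment."""
        d = self.density()
        if rng.random() < d:
            return {'action': 'play', 'amplitude': round(0.2 + 0.6 * d, 2)}
        return {'action': 'rest'}
```

A real improvising system would, of course, analyse many features beyond loudness (spectral content, onset density, silence) and generate actual synthesised sound rather than symbolic ‘actions’, but the basic cycle of sensing, analysis, and contingent response is of the same kind.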
Because others have used the term ‘algorithmic ethnography’ to describe related practices, it is necessary to clarify how an algorithmic ethnographic performer differs. The algorithmic ethnographic performer is distinct from the project of an ethnography of algorithms (Christin 2020; Seaver 2018), or the study of how human practitioners produce, revise, and experience (digital) algorithms. I am also not referring to ‘digital anthropology’ or the ethnographic study of cultural practices that take place primarily through digital technologies (e.g., Boellstorff 2015). Moreover, I am not referring to the recreation of the observational and analytical faculties of a human ethnographer in the form of software (Zheng et al. 2015).
…of Free Improvisation…
The term ‘free improvisation’ is rather vague. There are numerous practices—musical or otherwise—that involve a significant degree of freedom and are thoroughly improvisatory. Hence it is important to note that I use this term to refer to a specific cultural practice with its origins in the work of Black American jazz musicians in the 1960s. However, despite the racial identity of the artists who first brought this practice to prominence, the three scenes I have studied in Berlin, Chicago, and San Francisco are all overwhelmingly white and male. While the term ‘improviser’ could refer to any performer (whether of music or not) whose practice significantly involves improvisation, this term (rather than a more specific one) is how performers active in free improvisation have referred to fellow practitioners. Unless otherwise noted, my use of this term is consistent with how it was used by performers of free improvisation in my fieldwork.
Sonically, performers of free improvisation tend to avoid several canonical structures of musical organisation such as harmony, pulse-based rhythm, and explicit, pre-determined form, and instead privilege exploration of the full timbral possibilities of their instruments, voices, or other sound-making apparatuses5. Socially, practitioners of free improvisation strive to enact an egalitarian ethos within scenes dedicated to this musical practice through a network of consistent patterns of verbal and nonverbal communication between participants, constituting a set of social routines that define this practice in distinction from others (Banerji 2023). Among other themes, these patterns contribute to a sensation of creative freedom, as participants avoid any behaviours that might be perceived as instructing, directing, or criticising other players in the scene.
…and its Critics
Much of my fieldwork has been dedicated to staging encounters between improvisers and the virtual improvisers I have designed over the past several years. Most of this work has focused on testing one system in particular, known as ‘Maxine’6, which I first designed in 2009 (Banerji 2012, 2016)7. These encounters have largely consisted of private meetings I have arranged for improvisers to meet with me at my studio for the purpose of playing with Maxine. The improvisers I invite for this purpose are typically those who I have gotten to know through my involvement in free improvisation as a performer as well as a researcher with an interest in this musical practice. Upon meeting, I allow improvisers to play with Maxine for as long as they like and then ask them to discuss how the system compares to a human performer. I also invite performers either to wait to see if the piece ends the way it typically does between improvisers (i.e., with mutual silence and the resumption of eye contact, which is typically avoided during play itself; see Banerji 2023) or to stop and comment on the system's behaviour whenever they would like (rather than waiting till later, by which point they may forget what they had to say).
Initially, my purpose in testing Maxine in this manner was the same purpose that any designer has in doing the same: refining the system's design through feedback from relevant human practitioners. However, a far less instrumental and much more meaningful social scientific and humanistic utility of this practice quickly emerged. Because of the commitment to egalitarianism that improvisers strive to uphold (even if they do not always succeed)8, they tend to avoid instructing, directing, or criticising fellow players. By stark contrast, improvisers are far more forthright about these matters when they encounter Maxine. In many cases, Maxine has been criticised for actions I myself have taken in playing with others but never been confronted about. Improvisers themselves also often remark upon how these encounters with Maxine enable this to happen in a way that is wholly unheard of in their routine interactions with one another as players. Such was the case for ‘Torsten’9, a white Swedish bassist in his 40s, during a session with Maxine at his apartment in Berlin in April 2015. In the midst of a long list of complaints about the system, he remarked upon how he had the same complaints about other players, but that he would never feel comfortable confronting them about these issues. As he put it, ‘I wish I could tell other people things like this!’
Encounters with Maxine elicit commentary on many topics. Given the system's gendered name, gender performance is a frequent theme. Similarly, since listening is an integral element of this social practice, improvisers often discuss their sense of how ‘well’ the system listened, as well as their definitions of ‘good’ listening, which tend to be highly variable. Other themes include notions of freedom and egalitarianism as they manifest themselves in sonic expressive practice, the structural role of silence, the definition of ‘improvisation,’ and of course the notion of ‘humanness’ as a performance. On this last theme, improvisers have regularly critiqued the system's humanness in terms of its ability to make sense of the real time stream of sound (and thus audio data) that emerges in a given piece. In other words, the element of the design of this system that most closely corresponds with data science is often the one that most prominently figures in debates about whether it behaves as a human improviser would or not.
The factors that lead improvisers to feel more comfortable criticising Maxine than they would fellow (human) players are both specific to this practice and general to human beings encountering such systems. On one level, the greater ease improvisers experience in criticising Maxine is entirely a result of the egalitarianism they aspire to put in place in these scenes. Whereas this egalitarianism compels them to conceal their judgements in routine circumstances, encounters with Maxine enable them to express these easily. On another, however, the ease improvisers experience in critiquing Maxine is likely a more general human phenomenon. All issues of variation aside, human beings tend not to openly, verbally question the humanness of how their (biologically human) interlocutors conduct themselves in face-to-face social encounters unless this involves ‘misanthropic’ behaviour (howsoever this variable may be situationally defined). Conversely, questioning Maxine's humanness comes easily, since it is always questionable whether or how a machine can successfully perform humanness, this being true even for the most optimistic advocates of AI.
Amy and Maxine
As it is for many improvisers, Maxine's humanness was a key theme of this system's encounter with Amy, a white double-bassist in her late 30s based in Oakland, CA. One afternoon in March of 2014, we met at Amy's rehearsal space, which was located on a sparse residential street in a quiet neighbourhood in North Oakland. Like many improvisers, Amy was pleasantly surprised by how much playing with Maxine felt like playing with another human improviser. After a brief talk in which we spoke about our various recent musical activities around town, Amy and Maxine played a brief improvised duo lasting just under four minutes. In this piece, I set Maxine to play a synthesised version of ‘inside piano,’ a contemporary avant-garde instrumental practice in which a pianist works with both the keys of the instrument while also manipulating the strings directly with various objects and their hands. Rather than directly diving into criticism of the system, Amy facetiously comments that she needs time to ‘get used to [Maxine's] aesthetic’ in the same way she would with a real player. However, she also notes feeling that Maxine has a ‘short attention span’ in terms of its ability to focus on a single idea for a prolonged period of time.
For their second piece together, I chose to move Maxine to the system's electric guitar setup, which I had designed with much inspiration from Derek Bailey, a white British guitarist who was one of the practice's early exponents and whose playing style featured extensive manipulation of the instrument's basic sonic properties. This piece lasts just over two minutes. As before, Amy is pleased with Maxine's playing, but still finds that the system resembles an improviser who fails to stay with one idea long enough, at least according to Amy's tastes.
As an improviser myself, I had a vague idea of what Amy meant. Still, I was unsure just when Amy might have wanted Maxine to stay with an idea for a longer period of time or if this depended, perhaps, on what sounds Maxine was making at a given point in time. In situations like these, I often encourage the human improviser to stop in the middle of the piece to remark on what they are thinking rather than wait till the end of the piece. Stopping like this is wholly out of the ordinary in routine social interactions between improvisers, even if they are utterly quotidian for musicians in other arenas. Pausing in this manner is the hallmark of rehearsing as a social activity and invites the group to reflect on any ‘problems’ that have emerged. Given the commitment to egalitarianism in free improvisation (even if somewhat superficial), it is unusual for players to stop in the middle of a piece in the manner I was suggesting. Instead, players tend to wait till the piece ends as it does customarily with mutual silence followed by resumed eye contact between players.
In their third piece together, Maxine again plays the electric guitar setup from before. At my suggestion, perhaps, Amy chooses to end this piece in the middle rather than waiting for the customary silence that defines a typical ending. She stops just before the two-minute mark. About ten seconds before she stops, she and Maxine create a loosely interlocking pattern in which Amy plays several similar short note sequences with Maxine playing distorted guitar tones at the end of each of Amy's phrases10. Maxine stops doing so for around two and a half seconds before Amy signals me to turn off the machine so she can speak. She is disappointed that the system, whom she refers to as ‘he’11, suddenly ‘backed off a little bit’ right when they ‘were startin’ to develop a thing’. Presumably, Amy is referring to the possibility of building tension on a particular idea or ‘thing’ at this point in the piece.
Amy turns the conversation from mere matters of coordination and mutual interactional sensemaking to the issue of humanness itself. Amy felt the system withdrew from the interaction ‘in the way that a real person might not have’. On a very superficial level, Amy claims that the system fails to perform humanness. Critical to this failure is the system's inability to accurately predict Amy's intentions based on the stream of sonic data available at that point in the interaction. That is, the ability to engage this basic data scientific task correctly is what Amy points to in explaining why the system falls outside her definition of performative humanness. Yet the comment is as much about humanness as it is about ethnographic verisimilitude. Amy is telling me, like so many others have, that the system falls short as an ethnographic medium depicting (or really, performing) free improvisation as a social practice.
We decide to try another piece with Maxine on guitar. At around two minutes and ten seconds into this piece, Amy plays three descending melodic figures with varying lower target notes12. Each phrase decreases in tempo. In the midst of this, Maxine plays intermittent mangled guitar tones, which closely resemble the way an electric guitarist (like Derek Bailey, perhaps) might use the heel of their hands or everyday objects like screws, barbecue skewers, or other objects to achieve the same sonic effect. At the end of these three figures, Amy stands silently over her bass, listening to the system for ten seconds, during which Maxine continues with sparsely distributed distorted guitar tones.
Amy starts laughing; I smile in response. Breaking our silence, I admit to her that ‘I felt like you were trying to end there’ before I shut the system off so we can talk.
‘I was trying to end there…and you know I thought we were together. But then, it's ok, you know? This happens with real people, too, like “oh, ok. Little guitar solo moment.”’
In this instance, Amy finds, for better or worse, that Maxine's performance certainly evinces humanness. However, it evinces a humanness that reminds Amy of improvisers who are annoying (or at least amusingly irritating) rather than a human improviser she trusts or cherishes. Yet the error is very similar to what Amy points out as an issue in the previous piece. Maxine is unable to detect and subsequently predict Amy's intended direction for how the rest of the improvisation should proceed, or in this case, conclude. In the previous piece, Amy claims that the system is not like ‘a real person’ because it fails to parse her intentions. In this one, Amy still finds that Maxine is in error and has yet again failed to take notice of where she wants the piece to go. However, here she finds that this error is what reminds her of a person. Even if the system performs an undesirable humanness (from Amy's perspective), humanness is nevertheless what it gives off. Likewise, though it falls short of Amy's ideal, the system is still an accurate ethnographic rendering of the practice and its human practitioners, as she concedes.
Considering these last two pieces together, it seems that Amy's initial claim that Maxine is not like ‘a real person’ may have less to do with a strict claim about humanness as some kind of generic trait and more to do with Amy's conception of the norms of free improvisation as a form of social interaction. Not ‘a real person’ might really translate to ‘a person I find annoying’ or ‘a person who fails to uphold some abstract ideal of how this form of music-making should proceed’. Conversely, Amy uses the phrase ‘a real person’ not to refer to just any person, but with the implicit assumption that ‘a real person’ refers to an ideal improviser. Of course, in just the next piece, Maxine turns out to be much like a ‘real person’. I point to Amy's experience with Maxine not only because it exemplifies the way that participants construct humanness in terms of an agent's ability to use sense perception in a pragmatically appropriate way within social interaction but also because it features a pattern common to how several other improvisers have discussed Maxine's humanness. Whereas they may initially assert that the system bears no resemblance to a human player, they often subsequently (and unconsciously) change their stance to suggest that Maxine is like a human improviser, but one they find vexatious. The term ‘human’ often functions as a stand-in for the subject's conception of the key features of members of their own social milieu or grouping rather than some universally valid repertoire of characteristically ‘human’ behaviours.
Locating the Anthropology of Humanness
Maxine is just one of hundreds, if not thousands, of extant machines built to act like people. Like all these other machines, both Maxine's design and its reception in interactions with human interlocutors raise the question of what humanness consists of as a set of performative features. Similarly, all of this work pushes the point that there is very little unity in the concept of ‘humanness’ as performance, especially when compared to the relative stability and coherence of biological definitions of humanness. For every improviser who asserts that one element of Maxine's behaviour is ‘human’, another asserts that the same is ‘not human’. Again, the human/nonhuman distinction may be taken at face value, but in many cases, it is simply a cypher for other concerns or issues. Despite what subjects might claim, ‘not human’ is just as likely to refer to ‘not an ideal human’ or ‘not one of us’ as it is to refer to a system that truly fails to evince any humanness at all. As I have suggested in this article, much of the distinction between ‘human’ and ‘nonhuman’ may hinge upon whether the system in question possesses the ability to parse and respond to data it receives in the midst of an interaction. This is not to say that other elements of the system's behaviour are irrelevant to evaluations of humanness. Rather, my point is that the perceptual faculty may be a key element of how human beings define humanness as a performance.
While anthropologists are still only slowly warming up to the idea of creating systems like Maxine as a means of understanding and ‘depicting’ (for lack of a better term) contemporary cultural practices like free improvisation, this work has long been a central aspect of AI research in industrial settings and commercial applications. Developing AI systems that evince ‘humanness’ in social interaction with a (biologically) human being is a key goal shared by hundreds of engineers working in this domain (see Svenningsson and Faraon 2019, among many other examples). Likewise, insights similar to those I present here increasingly represent a consensus among AI researchers interested in encoding humanness in their systems. That is, engineers increasingly recognise that ‘humanness’ is far from universal (see Lortie and Guitton 2011).
At the same time, anthropologists, particularly those who have moved on from academia, continually find themselves very near to these practices within technology industries, particularly in the domain of human-computer interaction (HCI) or user experience research (UXR). Given the rapid proliferation of AI-based technologies in the wake of ChatGPT, anthropologists in HCI or UXR are increasingly likely to be engaged in applied ethnographic work concerned with the technological performance of humanness. Likewise, they are also likely to be involved in primary research to understand how expectations for the performance of humanness vary (see Rodwell 2022). Ironically, then, it is far more likely that anthropologists in applied settings will be better positioned to ethnographically examine the technological reconfiguration of the concept of humanness as performance than those in academia.
Acknowledgements
The author thanks the Fulbright U.S. Young Journalist Program and the Berlin Program for Advanced German and European Studies for financial support for fieldwork for this project. The author also thanks the organisers of this special issue and the participants of the related panel at the American Anthropological Association annual meeting in 2021.
Notes
Despite its utility for describing distinctions between media in various domains, the term ‘fixed media’ appears to have its primary discursive origin in the history of avant-garde musical composition in the twentieth century (N. Collins et al. 2013: 125–127). Again, there are still some important exceptions to the dominance of fixed media in anthropological experimentation with alternative formats (S. G. Collins et al. 2017; Underberg and Zorn 2013).
The fourth wall is certainly often ‘broken’, yet the fact that it is referred to as ‘breaking’ at all clearly indicates that it is a marked behaviour rather than the unmarked norm.
Again, it is odd that such performant, interactive media forms are yet to become a clearly established presence in the repertoires of anthropological ethnography given continual calls for modes which are ‘multisensorial rather than text based, performative rather than representational, and inventive rather than descriptive’ (Dattatreyan and Marrero-Guillamón 2019: 220).
This is a device that converts audio signals from analogue to digital and vice versa.
It is common for performers of free improvisation to appropriate objects not designed as instruments for use as instruments (e.g., balloons, sewing machines, rocks; see Atton 2012).
Despite its name, I never intended for the system to embody a feminine social presence in social interaction. Nevertheless, my mostly male interlocutors have frequently described its behaviour in this manner, which readily illustrates how divergent ideas about social femininity can be. Many participants have also spontaneously shifted the pronouns they use for the system, including neuter, as they discuss its behaviour. There is no question that critiques of this system reveal ideologies about gender performance, but this is a topic deserving of a longer discussion than what is possible here.
Maxine is hardly the first system of its kind; for a partial review of such systems, see Banerji 2018.
Veena Das might call this an example of ‘moral striving’, a term she uses to highlight the basic fact that much of what subjects aspire to on the moral plane differs significantly from what they are able to achieve. Despite the differential, striving continues (Das 2010). If egalitarianism is the goal, improvisers fall short on one major point, which is the stark lack of nonwhite, nonmale members in these scenes despite frequent pronouncements that they are ‘open to everyone’ (Corbett 2016: 1); see Banerji 2021.
I have used pseudonyms for all ethnographic interlocutors I mention in this article.
To hear an audio recording of this portion of Amy and Maxine's duo, please visit https://www.youtube.com/watch?v=FjNJM0rlLec.
She later clarifies that this is because I had explained that Maxine's guitar setup was very much inspired by Derek Bailey.
To hear an audio recording of this portion of Amy and Maxine's duo, please visit https://www.youtube.com/watch?v=AmWmF9XIxZA.
References
Atton, C. (2012), ‘Genre and the Cultural Politics of Territory: The Live Experience of Free Improvisation’, European Journal of Cultural Studies 15, no. 4: 427–441.
Bamman, D., T. Underwood, and N. A. Smith (2014), ‘A Bayesian Mixed Effects Model of Literary Character’, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics 1 (Long Papers), 370–379.
Banerji, R. (2012), ‘Maxine's Turing Test: A Player-Program as Co-Ethnographer of Socio-Aesthetic Interaction in Improvised Music’, in Philippe Pasquier, Arne Eigenfeldt, and Oliver Bown (eds.), Proceedings of the 1st International Workshop on Musical Metacreation (MUME 2012) (Palo Alto, CA: AAAI Press), 2–7.
Banerji, R. (2016), ‘Balancing Defiance and Cooperation: The Design and Human Critique of a Virtual Free Improviser’, in Hans Timmermans (ed.), Proceedings of the International Computer Music Conference (Utrecht, the Netherlands: HKU University of the Arts Utrecht), 48–53.
Banerji, R. (2018), ‘De-Instrumentalizing HCI: Social Psychology, Rapport Formation, and Interactions with Artificial Social Agents’, in Michael Filimowicz and Veronika Tzankova (eds.), New Directions in Third Wave Human-Computer Interaction (1: Technologies; Heidelberg, Germany: Springer Nature), 43–66.
Banerji, R. (2021), ‘Whiteness as Improvisation, Nonwhiteness as Machine’, Jazz and Culture 4, no. 2: 56–84.
Banerji, R. (2023), ‘Free Improvisation, Egalitarianism, and Knowledge’, Jazz Perspectives (ahead of print), 1–24.
Besteman, C. and H. Gusterson (2019), Life by Algorithms: How Roboprocesses Are Remaking Our World (Chicago, IL: University of Chicago Press).
Boellstorff, T. (2015), Coming of Age in Second Life: An Anthropologist Explores the Virtually Human (Princeton, NJ: Princeton University Press).
Christin, A. (2020), ‘The Ethnographer and the Algorithm: Beyond the Black Box’, Theory and Society 49, no. 5: 897–918.
Collins, N., M. Schedel, and S. Wilson (2013), Electronic Music (Cambridge, England: Cambridge University Press).
Collins, S. G., et al. (2017), ‘Ethnographic Apps/Apps as Ethnography’, Anthropology Now 9, no. 1: 102–118.
Conquergood, D. (1985), ‘Performing as a Moral Act: Ethical Dimensions of the Ethnography of Performance’, Text and Performance Quarterly 5, no. 2: 1–13.
Corbett, J. (2016), A Listener's Guide to Free Improvisation (Chicago, IL: University of Chicago Press).
Das, V. (2010), ‘Moral and Spiritual Striving in the Everyday: To Be a Muslim in Contemporary India’, in Anand Pandian (ed.), Ethical Life in South Asia (Bloomington: Indiana University Press), 232–53.
Dattatreyan, E. G. and I. Marrero-Guillamón (2019), ‘Introduction: Multimodal Anthropology and the Politics of Invention’, American Anthropologist 121, no. 1: 220–228.
De Brigard, E. (1995), ‘The History of Ethnographic Film’, Principles of Visual Anthropology (Berlin: de Gruyter), 13–43.
Douglas-Jones, R., A. Walford, and N. Seaver (2021), ‘Introduction: Towards an Anthropology of Data’, Journal of the Royal Anthropological Institute 27, no. S1: 9–25.
Edelmann, A., et al. (2020), ‘Computational Social Science and Sociology’, Annual Review of Sociology 46: 61–81.
Feld, S. (1991), Voices of the Rainforest (Compact Disc, Rykodisc RCD 10173).
Forsythe, D. (2001), Studying Those Who Study Us: An Anthropologist in the World of Artificial Intelligence (Stanford, CA: Stanford University Press).
Hartblay, C. (2018), ‘This Is Not Thick Description: Conceptual Art Installation as Ethnographic Process’, Ethnography 19, no. 2: 153–182.
Helmreich, S. (1998), Silicon Second Nature: Culturing Artificial Life in a Digital World (Berkeley: University of California Press).
Jones, J. L. (2002), ‘Performance Ethnography: The Role of Embodiment in Cultural Authenticity’, Theatre Topics 12, no. 1: 1–15.
Laurel, B. (1993), Computers as Theatre (Reading, MA: Addison-Wesley).
Lortie, C. L. and M. J. Guitton (2011), ‘Judgment of the Humanness of an Interlocutor is in the Eye of the Beholder’, PLoS One 6, no. 9: 1–7.
Munk, A. K., A. G. Olesen, and M. Jacomy (2022), ‘The Thick Machine: Anthropological AI between Explanation and Explication’, Big Data & Society 9, no. 1: 20539517211069891.
Pedersen, M. A. (2023), ‘Editorial Introduction: Towards a Machinic Anthropology’, Big Data & Society 10, no. 1: 1–9.
Rodwell, E. (2022), ‘Artificial Speech Is Culture: Conversational UX Designers and the Work of Usable Conversational Voice Assistants’, Proceedings of the International Conference on Human-Computer Interaction, 137–148.
Saldaña, J. (1998), ‘Ethical Issues in an Ethnographic Performance Text: The “Dramatic Impact” of “Juicy Stuff”’, Research in Drama Education 3, no. 2: 181–196.
Seaver, N. (2018), ‘What Should an Anthropology of Algorithms Do?’, Cultural Anthropology 33, no. 3: 375–385.
Seaver, N. (2022), Computing Taste: Algorithms and the Makers of Music Recommendation (Chicago, IL: University of Chicago Press).
Smalley, D. (1997), ‘Spectromorphology: Explaining Sound-Shapes’, Organised Sound 2, no. 2: 107–126.
Suchman, L. A. (2006), Human-Machine Reconfigurations: Plans and Situated Actions (New York: Cambridge University Press).
Svenningsson, N. and M. Faraon (2019), ‘Artificial Intelligence in Conversational Agents: A Study of Factors Related to Perceived Humanness in Chatbots’, Proceedings of the 2019 2nd Artificial Intelligence and Cloud Computing Conference, 151–161.
Turner, V. and E. L. B. Turner (1982), ‘Performing Ethnography’, The Drama Review: TDR 26, no. 2: 33–50.
Underberg, N. M. and E. Zorn (2013), Digital Ethnography: Anthropology, Narrative, and New Media (Austin: University of Texas Press).
Wilf, E. Y. (2013), ‘Sociable Robots, Jazz Music, and Divination: Contingency as a Cultural Resource for Negotiating Problems of Intentionality’, American Ethnologist 40, no. 4: 605–618.
Zheng, K., et al. (2015), ‘Computational Ethnography: Automated and Unobtrusive Means for Collecting Data In Situ for Human–Computer Interaction Evaluation Studies’, in V. L. Patel, T. G. Kannampallil, and D. R. Kaufman (eds.), Cognitive Informatics for Biomedicine: Human Computer Interaction in Healthcare (Cham: Springer International Publishing), 111–140.