On Transcription

Samira Bose examines the figure of the transcriber through the TV show Mindhunter, Lacan’s stenotypist, and her own experiences in magazines and archives.


Last year, I conducted a series of interviews with my archivist colleagues. We recorded them on my phone, perched between the three of us, as we leaned awkwardly forward each time to speak (much of the recording was punctuated by my insistent refrain: “LOUDER”). I had prepared what I considered to be “crisp” questions, but of course we meandered a lot—it was a thinking-out-loud kind of session—and in the auto-transcript a significant portion of the text was “uhhh” and “I just feel like…” and some gossip.

Already familiar with what my colleagues might be trying to say, and with a tight framework, I swiftly—perhaps even hastily—edited out our tentative speculations, which, in the final transcripts, ended up seeming more like decisive statements. After their publication, I sent them to our colleague, Paul, who raised a thoughtful provocation—he said he wished there was a video accompanying the transcript, as “so much more would be conveyed from hearing your tones and inflections as you all spoke about the many different aspects.”

This led me to think about the practice of transcribing—my own journey with it, how I’ve encountered it in things I’ve read or watched, as well as anecdotes from conversations with friends and colleagues. In this piece, I share a series of notes on transcribing and its process of listening, typing out, editing, what makes it into the final draft, and what’s left out. At a moment when even podcasts are generating automated scripts, I’m interested in centring and reflecting on the figure of the transcriber, how they mediate between a live event or audio and the typed transcript, and what they may learn in the process.


*   *   *


My own journey with transcribing began during a winter internship I did as an undergrad at a long-form investigative magazine in New Delhi. It’s typically expected in print journalism that a large part of an internship is transcribing interviews for the staff reporters, which for someone entitled and in their early 20s can seem terribly tedious. Back then, we didn’t use any auto-transcribing tools, and I had to listen to and type out every word and time stamp manually, nearly rote memorising everything by virtue of all the attention required.

There is an absorption of the materials that happens—almost at a granular level. Focus is imperative. Detailed attention to every word is essential—versus, for example, listening to a podcast in a distracted state. There’s also the stretching of time: a person is said to speak seven times faster than they write, and transcription decelerates the time of speaking to match that of writing.

I was surprised at the politically controversial transcripts I was entrusted with (OK, my internship contract specified I couldn’t legally disclose their specifics), and was made privy to the juiciest exposés long before they were circulated. I was secretly taking great pleasure in the reporters’ probes, and their impressively manipulative methods of extracting leads or confessions, but couldn’t stomach some of the more painful parts of the discussions, which one is assumed to be able to endure slowly, continuously, as part of the task.

When I read the carefully edited pieces later, it was interesting how selective the quotes were. Of course, it makes sense the information is made legible in a politically purposeful way, but somehow I found myself disappointed about something trivial I couldn’t explain—while I remembered the voice being quoted, I couldn’t really hear the tone, the pitch, the cadence of those who had been interviewed. Some of the interviews I transcribed had been with prison inmates, and some were furtive meetings with anonymous sources met at crowded metro station cafes.

The recordings were full of the atmosphere of the meetings—the clanging of metal bars, the shuffling of feet, the bustle of a canteen at midday, even, I swear, the nervous tapping of fingers on the table-top—and as a transcriber I removed all that when typing out what was actually being said, with the words then made available as sources to serve as factual evidence. The transcript needed to be as verbatim as possible; the narrative or mood was not the point. And yet, it was the atmosphere that enabled those lengthy exchanges. I felt somehow as if between the audio recording—which as a medium is able to capture and hold not only voice but the surroundings—and the text transcript, there was a flattening. I will attempt to probe this tension in this piece, between what the transcript is supposed to do, and where the transcribers can take it.

I want to be clear: I admire the journalists for foregrounding injustice using varied sources, including interviews. At the time, I didn’t really know where to go with my feelings of disappointment, which are rightfully irrelevant to what’s at stake—but it made me think about the figure of the transcriber. Transcription may seem like it’s just a technical, manual requisite, but in fact transcribers are individuals—they think, feel, discern, make mistakes, and therefore to some extent participate in or affect what goes into the final document or archive.


*   *   *


What Paul said about my interviews with my colleagues reminded me of my time at the magazine, and it also led me to revisit a transcript in Asia Art Archive’s Collections that I’ve enjoyed and read multiple times. The document has elements that hint at how transcribers intervene, change, or evolve methodologies to make the spoken event legible, or to evoke the tone of the conversation.

The deployment of punctuation—or paying attention to how it is wielded—can portray an emotion or tone, or at least evoke parts of the conversation that are difficult to capture in words alone. In the archive of artist and pedagogue Gulammohammed Sheikh is a conversation between him and his friend, artist Bhupen Kakkar, titled “A Dialogue on Drawing.” I find that this one transcript alone offers several examples. The first and possibly easiest is to put an emotion or action in parentheticals or brackets. In this case, it’s laughter:



Adding “(laughs)” entirely changes how the same sentence is read. While reading “(laughs)” does give a sense of Sheikh’s tone—and of the conversation more generally—it also feels amiss in that we do not know if Kakkar smiled in response, or if he wanted to go in a more serious direction. Are they sharing an inside joke? Is it an awkward or tentative laugh?

Another way in which the transcript has attempted to capture the informality of the conversation is by purposefully keeping the bits of Hindi that would flow as part of what’s being said. Not only is it kept, it’s also underlined:



The “ye achchha nahin hai” means “this is not good.” Elsewhere:



The term “riyaz” can translate to “practice” (as in classical music), but the “ke chalo” stands out in that it sort of means “let’s go” (though it’s used more along the lines of “okay fine”). What I want to highlight is the casualness, and that it can’t quite be said in English. The underlining, without adding a translation anywhere, is also of interest—it may be a kind of stance they’re taking, or relate to the audience they’re addressing (those who will understand these loanwords).

In transcriptions, inverted commas feel like a conundrum. Sometimes when reading out a passage or quoting in person, we either open or close the quote by stating it, or use a signal with our index and middle fingers to “show” inverted commas. However, when read in a transcript, it’s more complicated. It’s obvious when they’re directly quoting, but what is the tone in which it’s said in the conversation? An example:



How do “life” and “reality” sound when Sheikh is saying them out loud? Were the inverted commas added later to highlight the way in which the terms are used, or was it because of how they were said?

Perhaps my favourite punctuation mark in transcripts is the ellipsis, which Kakkar and Sheikh use abundantly here. It can be used to illustrate so much. As good friends, the two appear to be finishing each other’s sentences:



There is the cut off, or the helpful intervention when one is floating or lost in thought:



There is also a kind of post-processing where the ellipsis is written on and transformed by hand. The ellipsis becomes like a dotted line to be written on:



The purposeful harnessing of subtle interventions such as the one highlighted above can enhance the way in which readers receive the text, and shift the transcript from being a mere record to an interview, an exchange.


*   *   *


As I had mentioned earlier, during my internship I didn’t have access to auto-transcript generating tools, so using them now has attuned me to a shift from the deep listening bit of transcription, more towards the editorial—a messy, skeletal auto-transcript is generated for us to re-organise, punctuate, and render more legible. However, it comes with some caveats. A source of tear-jerking laughter in our office has been the “translation” of the names of artists from our India collections, which have roots in varied South Asian languages. Mrinalini Mukherjee is understood as “McCurdy person,” K. G. Subramanyan is in fact perceived as “Katie,” and Jyoti Bhatt gets a Western persona in the guise of “Jody Bart.”

The auto-transcript heightens awareness of what one tends to leave out while transcribing manually. This includes starting sentences with “So,” saying “you know” every few lines, and ending everything with “right?” These do not stand out as much in person in spoken conversations, but are somewhat frustrating in transcripts as they contribute significantly to the word count and length, and also interfere with the flow of reading. Their presence in auto-transcripts is prominent, and I usually let them slip in every once in a while for “authenticity” (even as it feels somehow manipulative).



These fillers characterise our everyday speech and interactions, and it is precisely their quotidian but ubiquitous presence that is a conundrum for me when I edit. Within my art-world-kind-of-writing spaces, I’m often given leeway and even encouraged to make my published interviews appear more conversational. What is the place of such “authenticity” in institutional uses of transcription?

My Colombo-based friend, Pramodha Weerasekera, told me about a project she was involved with as a law student, where she worked on transcriptions of conversations involving speakers of Sri Lankan English. This was in the context of a partnership to build a linguistics corpus of Sri Lankan English between her university in Colombo and one in a small town near Frankfurt, Germany.

One of her tasks was to check transcriptions edited by Linguistics students based in Germany, through what is called a “native speaker check.” In a voice note she stated, “In India you all say, ‘no?’ as a confirmation at the end of a statement, but in Sri Lanka Sinhala speakers say ‘ne?’ or ‘men,’ which really doesn’t make sense to anyone outside of context…it’s just an additional thing.”

These native speaker checks, by her and others, were for these kinds of “additional words, the accent, and the ‘broken’ English” that are very difficult for the Germans to gauge through linguistic barriers—and the task was to clarify them with linguistic markers through transcription software. The point was for the transcripts to be made legible in a Standard English linguistics corpus format, highlighting elements of speech that may reveal contextual twists of phrase or terms, but without any further explanation.

Pramodha now hates transcribing with a vengeance.


*   *   *


The title sequence for Mindhunter is a careful preparation of a reel-to-reel recording device, which concludes with the mic pointing at the audience. Set to an uncanny soundtrack, the otherwise prosaic devising is disrupted by violent images of bloodied limbs for the swiftest millisecond before returning to the recorder. It serves as a warning of the content of the show—serial killers—but I’m most interested in how a recording device focused on conversation invokes the event of the violence, and is haunted by it.



The series is set in late-1970s and early-1980s USA, and is a semi-fictional account of FBI agents who introduce the term “serial” to distinctive kinds of murders taking place in those decades. They work on developing criminal profiling techniques by interviewing serial killers in jail, as a way to recognise their motives and key characteristics, and to prevent such murders in the future. The show is a whole lot of conversations, which are recorded and transcribed, and serve as research materials—the viewers learn about the violence mostly through dialogue, and somehow its evocation in speech makes it even more chilling and eerie.

There was one incident of transcript interference that particularly struck me. In the ninth episode of the first season, Born to Raise Hell, the agents interview Richard Speck, an infamous serial killer who brutally murdered eight student nurses in Chicago in 1966. They start the recording and begin the interview by sticking to their formal script, but one of the agents—Holden, who is considerably emboldened by this episode—realises they aren’t getting through to him. He then makes a misogynistic and offensive statement, almost as if to build rapport with Speck (rather than provoke). It doesn’t work, and Speck spirals and ends up throwing the baby bird he is nurturing into a fan. After the interview, Holden, like my colleague Paul, says this:


Image: Screenshot from Mindhunter, Season 2, Episode 9.


Later on in their office, as their tape is being transcribed by their relatively meek colleague, Holden goes and pressures him to redact his statement so as to not get into trouble. Time passes, but Speck files a complaint against Holden, stating that Holden was mentally harassing him, which leads to an investigation by the FBI Office of Professional Responsibility. When the team goes up, they only take the redacted transcript with them, and eventually hope the incident passes. The meek transcriber, however, later feels morally and ethically obligated, and sends the tape in anonymously, leading to serious consequences in the later episodes.

I don’t care at all for the FBI’s questionable moral compass, but this episode raised two questions for me: What forces intervene in the crossing over of the event and its record into the transcript? Why and when do transcribers feel they have to intervene?


Image: Screenshot from Mindhunter, Season 2, Episode 9.


*   *   *


In early June, Suvani Suri visited our Asia Art Archive in India office for a chat. Suri is an artist who works with and around sound. We spoke about this and that, but especially about her recently published essay, “The Search for Hassaina’s Song and Other Phonophanies,” which is her sojourn in the digitised recordings of the Linguistic Survey of India, some of which are available in the Digital South Asia Library at the University of Chicago. The LSI was spearheaded by an ambitious Irish linguist, George A Grierson, who was an administrator on behalf of colonial British India. A part of Suri’s essay ruminates on the “excesses” that make their way into the recordings. She writes,

I returned to the cacophony of my questions: whose voices were brought into this massive archive? What about the uninvited excesses that may have trickled in, puncturing their way through and slinking into the material? What were the vocalisations being mined for? Can the process of extracting and canning languages be merely one of preservation? As they stay locked in material archives for decades, do they change? Ripen? Shimmer? Mature? Mutate?

Suvani’s provocations present a compelling conundrum for the transcriber—what can be described as excess? I’ve already referred to atmosphere, tones, cadences, and the manner in which these can be hinted at within a transcript; however, if these cannot be brought in, where do the transcribers go with what they’ve heard?

While we were chatting, I mentioned one of my recent freelance transcribing projects for a senior art critic who had been interviewed for a podcast. I expressed how clear and lucid she was, and how much easier it was to transcribe her speech than my own or that of my peers, who tend to add “uh” and “like” and “um” every few words. Suvani revealed that, in fact, she had been the sound editor for the podcast and struggled with how much mumbling there was, which she was told to edit out, and that she had used a transcript to make sense of the editing—we later figured out that she had been sent my transcript.

This made me think further about the process of interpreting a mumble in a transcript, which can be done through an awareness of context, by making sense of the rest of the discussion, or can be deciphered by Googling. Both the sound-editor and transcriber are trying to make the event of the recording legible, which requires them to listen again and again. Here’s where the more speculative aspects of the transcriber’s task come in, where they have to follow a hunch, where they’re taken outside the closed space between the recording and the text.

However, there are a number of notations transcribers can use if unable to make sense of the recording, including [garbled], [inaudible], [unintelligible], or [unclear]. Unless those present during the event of recording intervene in the editing of the transcript, the mumble is marked, and the reader too can only speculate what had been said.


*   *   *


This piece has thus far been an attempt to think about the transcriber as responsive, and the act of transcription as a productive one—but what are the disruptive possibilities in the process of transcribing? Artist Aasma Tulika, during her residency at Ashkal Alwan in Beirut (2019-20), conducted two “exercises of listening with automated transcription.” In the first, she asked a number of residents to recite a document they had to sign as part of Lebanese foreign affairs protocol. They were not allowed any belongings when they entered, so they left only with their memories of what they had read before signing. Their testimonies were recorded using the Google live transcription app, and while their statements were fairly similar, the app rendered them with minor errors and differences.

I suppose I have been speaking about the transcriber-as-person who marks the transcript through human difference. Here, it is the machine, drawing on its limited algorithm, that marks the transcript through its errors of interpretation. The app’s job is to simply render the recitation into text, but the absurdist and frankly hilarious statements that come out of the exercise are striking—the person-transcriber’s task, even as editor of the automated transcript, is to make sense.

In the second exercise, a group surrounds a laptop playing a YouTube video of a vitriolic anchor on an aggressive Indian news channel. This is also being transcribed by the Google live transcription app. The group claps, sings, knocks, and drowns out the anchor’s hate speech, and the app catches their varied sound-making. When I encountered this exercise, I was struck by how it isn’t just about listening carefully, but a “refusal to listen.”


*   *   *


In Transcribing Lacan’s Lectures: Memoirs of a Disgruntled Keybasher Turned Psychoanalyst, Maria Pierrakos talks about her time as psychoanalyst Jacques Lacan’s stenotypist for his annual Seminars in Paris from 1967 to 1979, and elaborates on her firm view of him being an “imposter.” While transcribers use recordings, stenotypists transcribe live events; and in Maria’s moment, they were primarily women with typewriters. She writes, “The stenotypist must be a faithful, silent instrument; she must make herself as transparent as possible. Her existence must only be noticed in the quality of her reports.” And yet, as she sat in the front row and easily transcribed Lacan’s slow speaking voice, she allowed herself to “be transported by my feelings and associations,” which eventually formed her critiques of Lacan’s theoretical frameworks that shape this publication and also her own work.

Unlike the audience that swooned with awe for the “Master’s” words, she noticed the purposeful seduction and control that she claimed obscured the more interesting aspects of the research from her. She describes at length, and with a lot of wit, the atmosphere of the room, the manner in which audiences would respond and react to—even imbibe and imitate—Lacan’s statements and gestures. She describes this kind of religiosity as typical of French intelligentsia at that time.

My favourite bit in the piece is a photograph which apparently appeared frequently in the press at the time, which Pierrakos describes as “showing a handsome Lacan with a fine head of white hair, his hand outstretched illustrating some truth or other, and next to him a frog-like, sulky-looking me.”


Image: Lacan leading a seminar, image attribution unknown.


Pierrakos writes about what’s going on with the sulky figure off-centre (herself). Beyond a source of income, she hoped to “vicariously absorb psychoanalytic theory through osmosis” through her stenotyping, as she was also in training and under analysis. Apart from the intense listening to the seminars in person, later on she would also transcribe his recorded voice over thousands of pages as it boomed in her apartment—and from the beginning to the very end, she felt deceived. I cannot offer much in terms of the psychoanalytic frameworks she speaks of, but I’m interested in how her knowledge of Lacan’s work, and subsequent critique, emerges from her immersion as a stenotypist, stemming from the stretched time spent transcribing his voice.

This may seem to be mentioned in passing, but it’s significant when thinking about the gendered connotations of their association. Pierrakos writes that she “awards the gold medal for boorishness” to Jacques Lacan, who, in the twelve years she worked with him, never addressed her directly or by name, always patronisingly as “keybasher.” The keybasher was always paying attention.

Speaking of the knowledge of the transcriber, Pierrakos wrote that stenotypists such as herself had enough data to do an exhaustive sociological examination of group structures in France at any point in time, but that her interest was in the “interplay” between the person and the subject being discussed, the kind of persona they put on, their performance, the way they dressed, their “tone of voice, whether warm, dry or flat.”

She writes, “This is why verbatim records make such dull reading: apart from those dealing with purely technical subjects, when the spoken word is frozen into written speech, with no literary effects, this only goes to show how discourse is about so much more than words.”

And yet, Dr Wendy Carr, lead researcher and criminal psychologist on the Mindhunter team, conducts investigations and uses transcripts for an entirely different end, at one point saying:


Image: Screenshot from Mindhunter, Season 1, Episode 2.


The tension remains.


*   *   *


Etymologically, “transcribe” is cut into two terms in Latin: trans “across, beyond; over” and scribere “to write” (from PIE root skribh- “to cut”). In many ways, it’s inherent to its practice to cut out. It is not so much that I lament the loss in the process, as much as I’m interested in what the transcriber can do or “write” within what appears to be a straightforward exercise—what is made to cross over between what is heard and what is typed out?



Samira Bose is Asia Art Archive in India’s Curator.

The images of transcript excerpts were drawn from "A Dialogue on Drawing" between Gulammohammed Sheikh and Bhupen Kakkar, from the catalogue "Drawing '94'," published to accompany an exhibition by Espace Gallery at All India Fine Arts & Crafts Society (AIFACS) Gallery, New Delhi.



Samira BOSE

Tue, 4 Jun 2024
