Recent & Upcoming Talks

2023

Some remarks on forensic voice analysis

The human voice cannot be equated with truly biometric evidence such as fingerprints or DNA. Yet our everyday experience suggests otherwise, since we spontaneously rely on a person’s voice to confirm their identity. The tension between science and popular belief is therefore very pronounced. This tension is regularly relayed by the media covering the rare trials in which the voice of the accused plays a central role. After several decades punctuated by controversy, forensic voice analysis is now developing in a calmer climate. Among other things, it benefits from a fruitful dialogue between academics and law-enforcement laboratories, encouraged by jointly funded projects. After an overview of the context surrounding the application of voice comparison methods in forensic science, I will present the three strands of my recent involvement in collaborations with the Service National de Police Scientifique: 1) the acoustic-phonetic characterization of between-speaker variation using artificial neural networks, 2) a critical analysis of the clichés about forensic voice analysis conveyed by police TV series, and 3) the study of stereotypes associated with a person’s accent by means of event-related potentials in EEG.

2022

Voice and scientific evidence in police TV series: the VoCSI-Telly project

2021

From group to individual, from corpus to experiment, from spectrogram to deep learning for phonetics

Since scientists’ individual epistemological preferences inevitably shape the output of their research, this thesis starts with a presentation of the author’s position with respect to a number of methodological issues pertaining to the field of contemporary phonetics. Such concepts as corpora, experimental techniques, quantitative methods, and the role of technology are discussed with the aim of making the author’s scientific values and biases more explicit. The following chapters offer a selection of research works the author has carried out since his PhD in 2008. They show an evolution from corpus-based acoustic phonetics to more experimental protocols involving a great diversity of instruments and data types. From the automatic classification and acoustic-articulatory description of British Isles accents to the development of the gradient phonemicity hypothesis, and from the study of speech rhythm to psycholinguistic experiments with French learners of English, the thesis covers the main findings and highlights how this wide array of interests and methods has served two consistent goals: an agnostic approach to new puzzles, and the ability to help students develop their own scientific identity efficiently. The final part of the thesis addresses the forthcoming paradigm shift that deep learning will bring about in many academic fields, with illustrations from the author’s recent work.

2020

From forensic voice comparison to the sociophonetic N400

In forensic science, voice comparison consists in determining whether or not two voice recordings come from the same individual. Our objective in the (ongoing) ANR VoxCrim project is to establish a certified standard defining the conditions required for applying a calibrated protocol in this field. Beyond the highly technical aspects of acoustic modelling (which I will not go into here), at least two factors influence how forensic voice analysis is perceived: 1) the representation of speech technologies in the collective consciousness, which is known to depend heavily on television series, and 2) the stereotypes associated with a person’s voice. The latter is the focus of my presentation. We recently conducted an electroencephalography study using the event-related potential technique in order to reveal the expectations that a socially marked accent creates in listeners’ minds. When the semantic content of an utterance was not congruent with these stereotyped expectations, an N400 effect was observed. It therefore appears that accent-related stereotypes are activated within an extremely short time and seem to involve the same mechanisms as those governing lexical-semantic processing.

2019

Britishness in British heavy metal: a sociophonetic perspective

Somebody’s pronunciation says a lot about who they are: geographical origins, education level, or socioeconomic background. The way we speak contains a great deal of information about our identity and how we present ourselves to the rest of the world. This observation applies not only to speaking; it also holds true for singing. In 1983, sociolinguist Peter Trudgill published a study entitled Acts of Conflicting Identity: the sociolinguistics of British pop-song pronunciation, in which he identified a tendency for British pop artists like The Beatles to adopt an American-like pronunciation when singing. He concluded that this Americanization in songs by British artists was a consequence of the American domination of the music industry at that time, because it seemed appropriate for artists to sound American when performing a predominantly American activity. The aim of this study is to see whether this principle applies to Heavy Metal in its original form as a distinctly British genre. Indeed, Heavy Metal was born out of the daily struggles of Northern England’s working-class youth facing de-industrialization in the late 1970s and early 1980s, and can thus be considered emblematic of a particular Northern British, and more generally British, identity. To study whether British Heavy Metal has followed the popular pattern of Americanization, or whether it has maintained its British roots in terms of pronunciation, we analyzed sung and spoken productions by two British Heavy Metal bands: Iron Maiden and Def Leppard. We focused on two pronunciation features that distinguish Northern British English from Southern British English and, in turn, Southern British English from American English. The first one relates to the pronunciation of the two types of vowels found in words such as foot (/ʊ/) and strut (/ʌ/), and is technically referred to as the FOOT-STRUT Split.
The second pronunciation feature we looked at concerns the consonant /t/ when found in the middle of a word, such as in water, atom or better. In American English, this /t/ is pronounced in such a way that it sounds more like a /d/. This process is known as T Voicing and is one of the main distinguishing characteristics between British and American English. The following questions were thus addressed: 1. Based on analyses of the FOOT-STRUT Split and T Voicing, do Iron Maiden and Def Leppard show a tendency to Americanize their pronunciation in their sung productions? 2. If so, what are the factors influencing this preference? Is it only a question of imitation as Trudgill suggests or do other factors come into play?

Phonetics and Artificial Intelligence: ready for the paradigm shift?

Modern phonetics has relied, to a large extent, on researchers’ ability to extract patterns from visual representations of speech. In this respect, if linguists were medical doctors, phoneticians would be radiologists. Speaking of radiologists, recent progress in artificial intelligence has made it possible for certain deep learning algorithms to outperform human pathologists at detecting abnormalities in medical images (Litjens et al., 2017). If the analogy holds, it is fair to ask whether artificial intelligence can beat phoneticians at their own game or, at least, constitute a significant addition to their toolbox. My contention is that the advent of deep learning opens up a whole new research programme for the humanities in general, and phonetics in particular. While deep neural networks (DNNs) have been duly praised for bringing about a major breakthrough in applied fields like automatic speech (Hinton et al., 2012) or image (Simonyan & Zisserman, 2015) recognition, we are only just starting to realize how fundamental research in our field can benefit from them (Ferragne et al., 2019; Pellegrini & Mouysset, 2016). There are at least three reasons why DNNs will trigger a paradigm shift in phonetics. Firstly, unlike other quantitative techniques, DNNs can extract relevant representations from the speech signal without the need for a human expert to provide the system with hand-picked features (see Goodfellow et al., 2016, for a comprehensive account of DNN properties). As a result, typical workflows now boast improved reproducibility; previously unnoticed parameters may be brought to light; and manual segmentation – a major bottleneck in phonetic analysis – is no longer needed in some cases. Secondly, deep learning will contribute to bringing the old parsimony-driven paradigm to a close.
There is a whole record of experimental research that demonstrates that mental phonetic representations are detailed and multidimensional (Pierrehumbert, 2016). So, now that the high-dimensionality taboo has been broken and increasingly powerful and cheap computing resources have become available, the time is just right for the emergence of DNNs in phonetics, with their rich inputs and outputs. Thirdly, the current focus on explainability in the deep learning community has led to effective methods to visualize what DNNs learn (Chattopadhay et al., 2018). My claim here is that scientific findings based on visuals are key to bridging the divide between the hard sciences and the humanities. And “visible speech”, the powerful synaesthetic cornerstone of contemporary phonetics, is more than ever legitimized by DNN-based methods. Moreover, such techniques undoubtedly represent the logical alternative to the modern unreasonable urge to (over)use inferential statistics and its misleading probability values. I will illustrate these claims with examples taken from ongoing work in this nascent research field. I will more specifically focus on how convolutional neural networks used in image recognition and computer vision can be adapted to the study of phonetics. I will discuss the advantages and shortcomings of this novel approach, and I hope to show that while deep learning lies at the intersection of experimental and corpus phonetics, it offers the best of both worlds.