SYNOPSIS SERIATION: A COMPUTER MUSIC PIECE MADE WITH TIME–FREQUENCY SCATTERING AND INFORMATION GEOMETRY

Vincent Lostanlen, LS2N, CNRS, Nantes, France
Florian Hecker, Edinburgh College of Art, The University of Edinburgh, Edinburgh, UK

ABSTRACT

This article presents Synopsis Seriation (2021), a musical work generated with the help of the computer. The central idea consists in reorganizing fragments of tracks from a pre-existing multichannel work so as to produce a stereo stream. We call "seriation" the search for the greatest timbral similarity between successive fragments within each channel, as well as between the left and right channels. Yet, since the number of permutations of a set is the factorial of its cardinality, the space of possible sequences is too vast to be explored directly by a human. Instead, we formalize seriation as an NP-complete optimization problem of the "traveling salesperson" type and present an evolutionary algorithm which yields an approximate solution to it. Within this framework, we define the timbre dissimilarity between two fragments by means of tools from wavelet analysis (time–frequency scattering) and from information geometry (Jensen–Shannon divergence). For this piece, we ran the seriation algorithm on a corpus of four works by Florian Hecker, including Formulation (2015). The record label Editions Mego, Vienna, released Synopsis Seriation on CD, together with a booklet of infographics on time–frequency scattering designed in partnership with the design studio NORM, Zurich.

1. INTRODUCTION

In mathematics, the seriation problem seeks to arrange the elements of a finite set U into a sequence u_1, ..., u_N in such a way that the distances d(u_i, u_j) are small if and only if |i − j| is also small [12]. Seriation bears a resemblance to the traveling salesperson problem (TSP), which aims to minimize the average distance d(u_i, u_{i+1}) between adjacent elements in the sequence.

Drawing inspiration from these mathematical ideas, the piece Synopsis Seriation (2021, see Figure 1) consists of a sequence of musical parts whose ordering in time reflects similarity in timbre. The set U corresponds to an unstructured collection of musical material: in our case, various pre-existing creations gathered under the name of Seriation Input. Seriation Input amounts to 283 minutes of audio in total, comprising hundreds of musical parts.

Figure 1. Album cover of Synopsis Seriation, released in March 2021 by Editions Mego, Vienna. The CD imprint represents the time–frequency scattering transform of the piece, which serves as a feature for its segmentation and structuring. Graphical design by NORM, Zurich. Website: https://editionsmego.com/release/EMEGO-256

The search space of all possible sequences is too vast to be explored manually. Indeed, the number of possible arrangements of U is equal to N! = N × (N − 1) × ... × 2. This number exceeds one million for N > 10 and one billion for N > 13. Coping with such a combinatorial explosion thus requires the help of the computer.

In this article, we describe the algorithmic workflow which has led to the synthesis of Synopsis Seriation. On a conceptual level, the workflow involves a virtual agent which "listens" to Synopsis Input, segments it into temporal parts, and ultimately rearranges those parts so as to maximize the auditory similarity between adjacent parts.
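To make the combinatorial stakes concrete, the short Python sketch below scores every possible ordering of a handful of segments by the sum of timbre dissimilarities between adjacent segments and keeps the best one. The dissimilarity matrix D is random and purely hypothetical; exhaustive search of this kind is only feasible for very small N, which is precisely why the piece relies on the evolutionary heuristic described in Section 4 rather than on brute force.

    import itertools
    import numpy as np

    # Hypothetical pairwise timbre dissimilarities between N = 6 segments.
    N = 6
    rng = np.random.default_rng(0)
    A = rng.random((N, N))
    D = (A + A.T) / 2            # symmetric dissimilarity matrix
    np.fill_diagonal(D, 0.0)     # a segment is identical to itself

    def path_cost(order):
        """Sum of dissimilarities between adjacent segments in the sequence."""
        return sum(D[i, j] for i, j in zip(order[:-1], order[1:]))

    # Brute force over all N! = 720 orderings. Around N = 13, the number of
    # orderings already exceeds six billion, hence the need for heuristics.
    best = min(itertools.permutations(range(N)), key=path_cost)
    print(best, path_cost(best))

An evolutionary solver explores the same search space, but by mutating and recombining candidate orderings instead of enumerating them all.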
One originality of our approach is that the virtual agent operates purely in the audio domain, without resorting to an external notation system such as MIDI or MusicXML. Furthermore, the agent does not assume that the input follows a traditional structure of repeated sections, such as verse-chorus or AABA forms. Lastly, the agent assigns each part of Seriation Input to either the left or the right channel of a stereophonic output by optimizing a joint objective of temporal consistency and binaural (left-right) consistency.

Figure 2. Flowchart of the computational stages involved in the synthesis of Synopsis Seriation: time–frequency scattering (Section 2), information geometry (Section 3), and evolutionary computing (Section 4). Time–frequency scattering is the acoustic frontend, information geometry performs sequential changepoint detection, and evolutionary computing solves a variant of the traveling salesperson problem (TSP). See the paragraph below for details.

Our proposed seriation procedure is akin to a family of digital audio effects known as concatenative synthesis [24]. Generally speaking, concatenative synthesis operates by assembling short audio segments taken from a large corpus so as to achieve a certain similarity objective. In this sense, our choice of audio descriptor (time–frequency scattering) and segmentation algorithm (generalized likelihood ratios) could potentially apply to real-time concatenative synthesis frameworks, such as CataRT [23]. However, we note that CataRT produces sounds according to a local target specification, expressed in terms of sound descriptors or via an example sound. By contrast, Synopsis Seriation does not rely on a predefined target; instead, it formulates a global problem of combinatorial optimization (the TSP) and arranges all segments of Synopsis Input accordingly. This formulation guarantees a one-to-one mapping between the audio material in Synopsis Input and Synopsis Seriation.

The flowchart in Figure 2 summarizes the technical components of Synopsis Seriation. To begin with, Section 2 presents the acoustic frontend of the virtual listening agent: namely, time–frequency scattering. Time–frequency scattering is an operator whose architecture resembles spectrotemporal receptive fields (STRF) in auditory neurophysiology and convolutional neural networks (convnets) in deep learning. Section 3 presents the algorithm which segments the Synopsis Input audio stream into parts. This algorithm is a numerical application of information geometry, a field of research at the intersection of statistical modeling and differential geometry. Section 4 presents the algorithm which rearranges the segmented parts and produces the stereophonic piece Synopsis Seriation. This algorithm is massively parallel and converges by evolutionary optimization. Section 5 presents the CD booklet of Synopsis Seriation, which contains computer-generated visualizations of time–frequency scattering as well as graphical design creations that summarize the functioning of the virtual listening agent. Lastly, Section 6 discusses the link between Synopsis Seriation and prior works on the spatiotemporal structuring of music, notably Iannis Xenakis's Diatope.
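The following sketch chains the three stages of Figure 2 on a synthetic signal. It is a deliberately naive stand-in, not the method of the paper: the frontend is a plain magnitude spectrogram rather than time–frequency scattering, the changepoint detector merely thresholds spectral flux rather than applying information geometry, and the reordering is a greedy nearest-neighbor pass rather than evolutionary optimization. All function names and parameters are assumptions made for illustration.

    import numpy as np

    def toy_frontend(x, frame=1024, hop=512):
        """Magnitude spectrogram, a crude stand-in for time-frequency scattering."""
        windows = [x[i:i + frame] * np.hanning(frame)
                   for i in range(0, len(x) - frame, hop)]
        return np.abs(np.fft.rfft(np.stack(windows), axis=1))

    def toy_changepoints(F):
        """Indices of frames whose spectral flux exceeds twice the mean flux."""
        flux = np.linalg.norm(np.diff(F, axis=0), axis=1)
        return np.flatnonzero(flux > 2 * flux.mean()) + 1

    def toy_seriation(segments):
        """Greedy nearest-neighbor ordering of segments by mean-spectrum distance."""
        centroids = [seg.mean(axis=0) for seg in segments]
        order, remaining = [0], set(range(1, len(segments)))
        while remaining:
            last = centroids[order[-1]]
            nxt = min(remaining, key=lambda j: np.linalg.norm(centroids[j] - last))
            order.append(nxt)
            remaining.remove(nxt)
        return order

    # Synthetic input: two sine tones followed by white noise, one second each.
    sr = 22050
    tones = [np.sin(2 * np.pi * f * np.arange(sr) / sr) for f in (440, 880)]
    noise = 0.5 * np.random.default_rng(0).normal(size=sr)
    x = np.concatenate(tones + [noise])

    F = toy_frontend(x)
    segments = np.split(F, toy_changepoints(F), axis=0)
    print(toy_seriation(segments))   # one permutation of the detected segments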
2. TIME–FREQUENCY SCATTERING

Time–frequency scattering comprises three stages. The first stage is a constant-Q transform (CQT) followed by pointwise complex modulus. The second stage is a convolutional operator in the time–frequency domain, with wavelets in time and in log-frequency, again followed by pointwise complex modulus. The third stage is a local averaging of every scattering coefficient over the time dimension.

Figure 3. Interference pattern between wavelets ψ_α(t) and ψ_β(log₂ λ) in the time–frequency domain (t, log₂ λ), for different combinations of amplitude modulation rate α and frequency modulation scale β. Darker shades of red (resp. blue) indicate higher positive (resp. lower negative) values of the real part.

2.1. Constant-Q wavelet transform

We build a filter bank of Morlet wavelets of center frequency λ > 0 and quality factor Q = 12 via the equation

\psi_\lambda(t) = \lambda \exp\!\left(-\frac{\lambda^2 t^2}{2Q^2}\right) \times \big(\exp(2\pi i \lambda t) - \kappa\big),   (1)

where the corrective term κ guarantees that each ψ_λ has one vanishing moment, i.e., a null average. We discretize the center frequency variable as λ = ξ 2^(−j/Q), where j is an integer and ξ is a constant. In this way, there are exactly Q wavelets per octave in the filter bank. To make sure that the filter bank covers the Fourier domain unitarily, the center frequency of the first wavelet (j = 0) should lie at the midpoint between the center frequency of the second wavelet (j = 1) and the center frequency of the complex conjugate of the first wavelet, hence:

\xi = \frac{1}{2}\left(2^{-1/Q}\,\xi + (f_s - \xi)\right) = \frac{f_s}{3 - 2^{-1/Q}},   (2)

where f_s denotes the sampling frequency. The CD standard f_s = 44.1 kHz yields ξ = 21 448 Hz. We set the number of wavelets to J = 96, hence a range of J/Q = 8 octaves below ξ. The minimum frequency is 2^(−8) ξ ≈ 84 Hz.

Let the asterisk symbol (∗) denote the convolution product. Given a signal x(t) of finite energy, we define its CQT as the following time–frequency representation:

U_1 x(t, \lambda) = \left| x \ast \psi_\lambda \right|(t) = \left| \int_{\mathbb{R}} x(\tau)\, \psi_\lambda(t - \tau)\, \mathrm{d}\tau \right|,   (3)

indexed by time t and wavelet center frequency λ.

2.2. Spectrotemporal receptive field

For the second layer of the joint time–frequency scattering transform, we define two wavelet filter banks: one over the time dimension and one over the log-frequency dimension. In both cases, we set the wavelet profile to Morlet (see Equation 1) and the quality factor to Q = 1. With a slight abuse of notation, we denote these wavelets by ψ_α(t) and ψ_β(log λ), even though they do not have the same shape as the wavelets ψ_λ(t) of the first layer, whose quality factor is equal to Q = 12. Frequencies α, hereafter called amplitude modulation rates, are measured in Hertz (Hz) and discretized as a geometric progression proportional to ξ and indexed by an integer n. Frequencies β, hereafter called frequency modulation scales, are measured in cycles per octave (c/o) and discretized, with both signs (±), as a geometric progression proportional to Q⁻¹ and indexed by an integer n. The edge case β = 0 corresponds
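As a companion to Equations (1)–(3), here is a minimal NumPy sketch of the first scattering stage only. It is an illustrative reimplementation under simplifying assumptions (fixed wavelet support, no subsampling, FFT-based "same" convolution), not the code used to produce the piece.

    import numpy as np

    fs = 44100                        # CD sampling rate (Hz)
    Q, J = 12, 96                     # quality factor and number of wavelets
    xi = fs / (3 - 2 ** (-1 / Q))     # Equation (2): approximately 21448 Hz

    def morlet(lam, support=2 ** 16):
        """Sampled Morlet wavelet of center frequency `lam` in Hz, after Equation (1).
        The corrective term kappa enforces a zero average; it is numerically
        negligible for Q = 12 but kept for completeness."""
        t = (np.arange(support) - support // 2) / fs
        kappa = np.exp(-2 * np.pi ** 2 * Q ** 2)
        return (lam * np.exp(-lam ** 2 * t ** 2 / (2 * Q ** 2))
                * (np.exp(2j * np.pi * lam * t) - kappa))

    def convolve_same(x, h):
        """FFT-based convolution, trimmed to the length of x."""
        n = len(x) + len(h) - 1
        y = np.fft.ifft(np.fft.fft(x, n) * np.fft.fft(h, n))
        start = (len(h) - 1) // 2
        return y[start:start + len(x)]

    def scattering_first_layer(x):
        """U1: modulus of the wavelet convolutions of Equation (3), one row per wavelet."""
        return np.stack([np.abs(convolve_same(x, morlet(xi * 2 ** (-j / Q))))
                         for j in range(J)])

    # Example: first-layer coefficients of a 0.25-second linear chirp.
    t = np.arange(int(0.25 * fs)) / fs
    x = np.sin(2 * np.pi * (200 * t + 4000 * t ** 2))   # 200 Hz up to about 2200 Hz
    U1 = scattering_first_layer(x)
    print(U1.shape)                                      # (96, 11025)

The second and third stages, omitted here, would convolve the rows and columns of U1 with the modulation wavelets ψ_α and ψ_β of Section 2.2 and then average over time.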