Structural Analysis of Large Amounts of Music Information (SALAMI)
The final audio-oriented project focuses on recorded music. The stated goal of Structural Analysis of Large Amounts of Music Information (SALAMI) is “to develop a state-of-the-art infrastructure for conducting research in music structural analysis.” In its emphasis on tool development and research methodology, it shares a great deal with other Digging Into Data initiatives such as Data Mining with Criminal Intent and Mining a Year of Speech. Like the Mining a Year of Speech project, SALAMI is solidly based in an interdisciplinary research tradition: music information retrieval (MIR). Although the history of MIR is not as long as that of computational linguistics, its leading researchers have been developing tools and conventions for working with digital music data over the past decade and are poised to “scale up” to work with “large amounts of music information.”
While from a strictly computational point of view digitized speech recordings may share a lot with music recordings, the kinds of questions one might ask about music data are naturally very different from the kinds of questions one might ask about recorded speech. Musicologists are interested in audio signals in all their complexity, so MIR specialists interested in developing computer tools must go beyond the simple alignment of notation with a recording to identifying the formal and structural characteristics of pieces, collections, and genres of music.
Structural analysis, through musical annotation, is a skill widely taught to students learning music theory. The sequencing of elements such as verse, chorus, and bridge can form recognizable patterns characteristic of different musical genres, and the variations in these patterns are part of what makes an individual piece unique. By comparing annotations of different compositions it is possible to explore theories about how different composers stretch the limits of the genres in which they work, or how one composer’s work may influence another. It would be impossible, however, to perform structural analysis of very large music corpora manually: for one, it would be too time-consuming and expensive, and for another, since structural analysis is an interpretive exercise rather than an empirically exact science, it is impossible to guarantee that one person’s analysis of a piece of music would be the same as another’s.
The SALAMI team sought to test the accuracy of a range of computer algorithms designed to detect musical structures, measure the performance of these algorithms against human annotators, make adjustments to the algorithms as necessary, and, finally, produce a large web-accessible corpus of analyses of several hundred thousand recordings. With these data, musicologists could explore structural similarities among diverse pieces as well as examine in greater detail the empirically measurable characteristics of music that relate to how we understand a piece’s individual parts. The team was divided into three groups, each of which worked independently on a set of clearly defined tasks, a distributed arrangement similar to that of the DMCI partners.
Partners at McGill University worked with a team of students to produce a “ground truth” set of musical annotations against which to measure the performance of the computer algorithms. In order to ensure that the “ground truth” set was representative of the range of genres included in the entire corpus assembled for the project, researchers focused on using recordings for which they had reasonably detailed and reliable metadata. Partners at Oxford and the University of Southampton devised a theoretical model that provides a way to express the relationship between machine analysis of recordings and the musicological concepts that govern how scholars understand their structures. This includes a method for representing musical annotations graphically as sections and hierarchies and a way of describing these annotations using Linked Open Data. The team at the University of Illinois built the computer infrastructure for collecting and analyzing hundreds of thousands of musical works, including popular and classical music, jazz, folk music, world music, and a variety of live recordings.
When the results of five different structural analysis algorithms were compared to the human annotations of the “ground truth” recordings, the segmentations identified by some algorithms aligned more closely with the human annotations than others did. The highest level of similarity, however, was found between the analyses produced when more than one person annotated the same piece. In the judgment of the SALAMI team, the results underscored the need for methodologies that incorporate comparisons of multiple analyses of recordings rather than seeking to perfect one definitive algorithm. To this end, they have designed an interactive interface within which scholars can examine the results of multiple structural analysis algorithms at the same time [See below]. Within this interface, a user may examine and play back individual segments of a piece and decide for himself or herself how accurate each method of analysis is for that piece. The outputs of the different algorithms, as well as the human-generated annotations, are aligned along a timeline and color-coded for easy comparison. By facilitating interactions with complex data sets through visual means, the SALAMI visualizer provides for musicologists what Voyant Tools offers to historians working with the Old Bailey database using the DMCI methodology. The multi-layered and color-coded display achieves something like what the collaborators on Digging Into the Enlightenment hope ultimately to achieve: a way of indicating a range of possible options, or a level of uncertainty about the information depicted in a visualization.
Comparison of the segmentation analysis of one musical piece performed by various algorithms with ground-truth data produced for the SALAMI project
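One common way to quantify how closely two segmentations of the same piece agree is a boundary F-measure: a boundary counts as a match if the other analysis places a boundary within a small time tolerance. The following Python sketch illustrates the idea only. The text does not state which measure the SALAMI team used, so the metric, the 0.5-second tolerance, the function name, and the boundary times are all assumptions invented for illustration.

```python
from typing import Sequence


def boundary_f_measure(
    reference: Sequence[float],
    estimated: Sequence[float],
    tolerance: float = 0.5,
) -> float:
    """Return the F-measure of estimated segment boundaries (in seconds)
    against reference boundaries.

    A reference boundary is matched by at most one estimated boundary
    lying within `tolerance` seconds of it.
    """
    ref = sorted(reference)
    est = sorted(estimated)
    i = j = hits = 0
    # Walk both sorted lists in step, pairing the closest unmatched boundaries.
    while i < len(ref) and j < len(est):
        if abs(ref[i] - est[j]) <= tolerance:
            hits += 1
            i += 1
            j += 1
        elif est[j] < ref[i]:
            j += 1  # estimated boundary is too early to match anything later
        else:
            i += 1  # reference boundary has no estimated partner
    if hits == 0:
        return 0.0
    precision = hits / len(est)  # share of estimated boundaries that matched
    recall = hits / len(ref)     # share of reference boundaries that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical boundary times for one piece (not SALAMI data).
human_a = [0.0, 15.2, 45.8, 76.1, 120.4]
human_b = [0.0, 15.0, 46.1, 76.3, 120.4]
algorithm_x = [0.0, 14.9, 30.5, 76.9, 100.2, 120.0]

print("human A vs. human B:    ", boundary_f_measure(human_a, human_b))
print("human A vs. algorithm X:", boundary_f_measure(human_a, algorithm_x))
```

Scoring every pair of analyses in this way, human against human as well as human against algorithm, is one way to arrive at the kind of side-by-side comparison the visualizer presents graphically.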
The SALAMI team’s final goal is to produce multi-layered analyses of approximately 200,000 pieces of music and make them accessible to users through their interactive visualizer. This “scaling up” of the project posed a couple of problems for the team. Performing analyses with the algorithms tested for this project consumed roughly five to six minutes of compute time per piece. To generate analyses for all of the pieces in the corpus, a typical computer would take five to six years, a highly impractical prospect. For this reason, completing the project requires the team to run its calculations on a supercomputer. The University of Illinois partners have reconfigured each of the algorithms they have tested to run on a standard supercomputer, and they have compressed and migrated the corpus to a mass storage device from which the supercomputer can search, retrieve, and decompress recordings as needed during calculations. Once they have identified and scheduled time on a supercomputer cluster, they plan to publish data for the entire corpus. While the team has already presented its work at numerous conferences, arguably the most important outcome of the project will be its published data.
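The case for a supercomputer can be illustrated with a back-of-envelope calculation. The Python sketch below takes the five-to-six-year serial estimate given above and divides it across a number of parallel workers. It assumes that each piece can be analyzed independently of the others, so that the work divides evenly. The function name, the worker counts, and the `efficiency` parameter are hypothetical and are not drawn from the project's actual configuration.

```python
def wall_clock_days(serial_years: float, workers: int, efficiency: float = 1.0) -> float:
    """Estimate elapsed days when a serial workload is spread over parallel workers.

    serial_years: time a single machine would need for the whole corpus
    workers:      number of processors working at once (hypothetical)
    efficiency:   fraction of ideal speed-up actually achieved, in (0, 1]
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the interval (0, 1]")
    return serial_years * 365 / (workers * efficiency)


# Illustrative worker counts only; real cluster allocations will differ.
for workers in (1, 100, 1000):
    low = wall_clock_days(5, workers)
    high = wall_clock_days(6, workers)
    print(f"{workers:>5} workers: about {low:,.1f} to {high:,.1f} days")
```

Under these idealized assumptions, the elapsed time shrinks in proportion to the number of workers, which is why moving the corpus and the algorithms onto a cluster turns a multi-year job into a feasible one. Real runs would also incur storage, decompression, and scheduling overheads that this sketch ignores.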
Another aspect of this project worth emphasizing is the labor necessary to prepare the “ground truth” set; funding this labor consumed a significant share of the project’s resources. As with the copious tests of forced alignment algorithms required for the Mining project and the tedious hand-correction of metadata necessary for the Dynamic Variorum Editions initiative, computationally intensive research, especially research that employs new methodologies, often requires the validation of results through other means. While perhaps disappointing to the creators of the structural analysis algorithms used for SALAMI, the finding that these algorithms did not perform as well as human annotators on the test corpus was highly significant, raising new questions about how the human brain processes and understands music. This finding will give MIR specialists much food for thought in the coming years.
- J. Stephen Downie (University of Illinois Urbana Champaign, US) is a music information retrieval and computational musicology specialist based at the Graduate School of Library and Information Science at UIUC who led the NSF-funded portion of the project, which, once complete, will have generated hundreds of thousands of structural analysis files for musical pieces.
- David De Roure (formerly University of Southampton, now University of Oxford, UK) is a computer scientist with expertise in distributed information systems, Web 2.0, and Semantic Web technologies. He served as the Principal Investigator of the JISC-funded portion of the project, which included the development of a standardized ontology for musical structures based upon the Resource Description Framework (RDF).
- Ichiro Fujinaga (McGill University, Canada), Associate Professor of Music Technology, is the Principal Investigator of the SSHRC-funded portion of the project. He directed the preparation of the open source “ground truth” data against which the team measured the performance of the structural analysis algorithms.
Advisors, data contributors, and other contributors
- Mert Bay (University of Illinois Urbana Champaign, US)
- John Ashley Burgoyne (McGill University, Canada)
- Alan B. Craig (University of Illinois Urbana Champaign, US)
- Tim Crawford (Goldsmiths, University of London, UK)
- Andreas Ehmann (University of Illinois Urbana Champaign, US)
- Benjamin Fields (Goldsmiths, University of London, UK)
- Linda Frueh (Internet Archive, US: data contributor)
- Eric J. Isaacson (Indiana University, US)
- Lisa Kahlden (Anthology of Recorded Music, Database of Recorded American Music: data contributor)
- Kevin R. Page (Oxford e-Research Centre, University of Oxford, UK)
- Yves Raimond (British Broadcasting Corporation, UK)
- Jordan B. L. Smith (formerly McGill University, Canada, now Queen Mary, University of London, UK)
- Michael Welge (NCSA, University of Illinois Urbana Champaign, US)
Christa Emerson, David Adamcyk, Elizabeth Llewellyn, Meghan Goodchild, Michel Vallières, Mikaela Miller, Parker Bert, Rona Nadler, and Rémy Bélanger de Beauport
De Roure, D., K. R. Page, B. Fields, T. Crawford, J. S. Downie, and I. Fujinaga. 2011. An e-Research approach to Web-scale music analysis. Philosophical Transactions of the Royal Society A. 369: 3300–17.
Smith, J. B. L., J. A. Burgoyne, I. Fujinaga, D. De Roure, and J. S. Downie. 2011. Design and creation of a large-scale database of structural annotations. Proceedings of the International Society for Music Information Retrieval Conference. Miami, FL. 555–60.
Ehmann, A., M. Bay, J. S. Downie, I. Fujinaga, and D. De Roure. 2011. Exploiting music structures for digital libraries. Proceedings of the Joint Conference on Digital Libraries. Ottawa, ON. 479–80.
Other writings and media
De Roure, D., J. S. Downie, and I. Fujinaga. 2010. SALAMI: Structural analysis of large amounts of music information. Proceedings of the UK e-Science All Hands Meeting 2010, Cardiff, Wales.
Lectures and talks
Osaka Symposium on Digital Humanities. Osaka, Japan. 2011. J. S. Downie, D. De Roure and I. Fujinaga. Large-scale music audio analyses using high performance computing technologies: Creating new tools, posing new questions.
Joint Conference on Digital Libraries, Ottawa, ON. 14 June 2011. I. Fujinaga. The structural analysis of large amounts of music (SALAMI) project.
[released 2012/02/16]: http://salami.music.mcgill.ca/index.php/2012/02/salami-release-0-1/