20 Years of Perfect Pitch in Popular Music - How Elastic Audio Technology Forced a Change in Aesthetics, Perception and Listener Expectation
An Empirical Analysis Based on Quantitative Research
carried out and written by
Have music creators retuned the listening ear?
For his book Perfecting Sound Forever, Greg Milner interviewed the legendary multi-platinum, Grammy Award-winning audio engineers and brothers Chris and Tom Lord-Alge about perfect pitch in current pop music:
Autotune has done as much as Pro Tools itself to change the sound of pop music. Today it’s used on “pretty much every fuckin’ record out there,” Tom Lord-Alge says. (Greg Milner, Perfecting Sound Forever)
Renowned music and pop-culture writer Simon Reynolds likewise rates the likelihood of near-total pitch-correction penetration as high in his in-depth essay How Auto-Tune Revolutionized the Sound of Popular Music:
Chances are that any vocal you hear on the radio today is a complex artifact that’s been subjected to an overlapping array of processes. (Simon Reynolds, How Auto-Tune Revolutionized the Sound of Popular Music)
Specialised vocal producer Chris O’Ryan’s account of how he uses pitch-manipulation technology underlines these statements. O’Ryan works with star artists such as Katy Perry, Mary J. Blige and Justin Bieber. Drawing on their conversations, Reynolds describes O’Ryan’s workflow:
At the extreme, the recording of the singer might take three or four hours, and then he’ll spend two to four days working it over in Melodyne, on his own. (Simon Reynolds, How Auto-Tune Revolutionized the Sound of Popular Music)
The frequently repeated claim that close to 100% of vocals in today’s pop music are pitch-corrected cannot be proven. Such a quantitative survey would not help audio professionals much anyway. Music producers might gain more from conclusions drawn from empirically gathered listener perceptions, that is, from finding out to what degree listeners are already accustomed to vocals produced with perfect pitch. Their overall preferences, together with the adjectives they assign to vocals treated with different intensities of correction, might indicate whether pitch manipulation has already become the accepted norm, and at what intensity. In her essay Autotune, Labor, and the Pop-Music Voice, Catherine Provenzano reflects on the power that music producers today exercise over their clients’ most intimate art, the presentation of their voice:
The listening that autotuned versions represent entails, in a very real sense, a deafening, a silencing, of what was once heard and hearable, in favor of what it is imagined the listener wants to hear. (Catherine Provenzano, The Relentless Pursuit of Tone)
But what kind of vocal pitching does the listener actually want to hear after more than twenty years of the constant pitch-correction penetration presumed above? The following study investigates this and asks: Has software-based pitch correction of vocals in popular music changed aesthetics, perception and listener expectations so fundamentally that it has now become mandatory for producers? Provenzano further conjectures that music creators are actively driving this ongoing process:
And as the sound of autotuned work becomes naturalized, unheard though not unhearable, they work subtly toward a retuning of the listening ear and a reskilling of the listening body. (Catherine Provenzano, The Relentless Pursuit of Tone)
To give producers usable insight into the actual state of this “retuning of the listening ear”, the study set up a survey with professionally produced, carefully detailed real-world listening tests. Will the gathered data confirm the subjective impressions cited above?
Six dedicated music production samples manufactured for listening tests
The listening tests used two songs with three versions each. All six samples, each around 30 seconds long, were meticulously produced to model real-world pitch-correction scenarios exactly.
The playbacks of both songs were final production masters. Both vocal tracks were the ones used on the released masters. Final processing was identical across each song’s three versions, and only the amount and type of pitch correction differed. The first song is a current pop song, released in 2018 in the survey participants’ native language, German. Its original Pro Tools sessions were fully available for the study because the study’s author produced the track. The second song is a 37-year-old classic hit record. Its original vocal track and original playback track were retrieved from YouTube in surprisingly good quality. The dry historic vocal was retro-treated with emulations of the then-standard 1980s Urei 1176 compression and Lexicon 480 reverb to match the presumed original treatment. Playback and vocal were meticulously mixed down and remastered until they were indistinguishable from the original record. Re-tuning of playback and vocal was assisted by FFT analysis in iZotope Insight. The resulting “new” original was then fine-adjusted by ear until it partly cancelled out the original record when overlaid with it in reversed phase.
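Both checks described above, the pitch match and the reversed-phase overlay, can be expressed as numbers. The following Python sketch illustrates them under stated assumptions. It is not the procedure used inside iZotope Insight or in this study. It assumes a frequency reading already taken from an FFT analyzer and a standard reference of A4 = 440 Hz. It also assumes two sample-aligned mono signals at the same sample rate, given as plain lists of floats. The helper names `cents_offset` and `null_residual_db` are hypothetical. The first converts a measured frequency into a deviation in cents. The second measures how far the phase-inverted sum of original and remake falls below the original’s level.

```python
import math


def cents_offset(measured_hz: float, reference_hz: float = 440.0) -> float:
    """Deviation of a measured frequency from a reference, in cents (1/100 semitone).

    Assumption: reference_hz = 440.0 (A4) is the intended tuning standard.
    """
    return 1200.0 * math.log2(measured_hz / reference_hz)


def _rms(samples: list[float]) -> float:
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def null_residual_db(original: list[float], remake: list[float]) -> float:
    """Level of the phase-inverted sum (original + (-remake)) relative to the
    original's RMS level, in dB. More negative means a closer match;
    -inf means a perfect null.

    Assumption: both signals are already sample-aligned and equally long.
    """
    if not original or len(original) != len(remake):
        raise ValueError("signals must be non-empty and of equal length")
    residual = [o - r for o, r in zip(original, remake)]  # reversed-phase overlay
    reference_level = _rms(original)
    if reference_level == 0.0:
        raise ValueError("original signal is silent")
    residual_level = _rms(residual)
    if residual_level == 0.0:
        return float("-inf")  # complete cancellation
    return 20.0 * math.log10(residual_level / reference_level)
```

Take a playback measured at 442 Hz instead of 440 Hz. It sits about 7.85 cents sharp. A remake whose residual lies 20 dB below the original has cancelled 90% of the original’s RMS amplitude. A “partial” cancellation, as described above, shows up as a finite negative value.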
All six newly produced samples were used strictly for research purposes and only for a very limited period. Potential copyright issues were avoided by using temporary, private, streaming-only playlists (figure 01). None of the files was made available elsewhere.
The pop song used the compiled final vocal track with all the processing of its production master, but without pitch manipulation. The singer is an established professional who meets recording-studio standards. He had been cast for this record in a nationwide search among around forty singers, and this is the very song that won him the album production deal.
The classic hit record used its final vocal track. It certainly had no software-based pitch correction, since none was available in 1983.
For the Melodyne versions, both songs’ vocal tracks were pitch-edited syllable by syllable by the study’s author, a highly experienced professional record producer whose clients include German No. 1 chart artists. The goal was to preserve both singers’ natural attitude and vocal expression, with only minor, tasteful vibrato treatment, while achieving perfect pitch throughout the performance.
The pop song’s Melodyne version is exactly the final version released to German radio in 2018.
Both songs’ Autotune versions were based on the Melodyne-edited tracks, with Autotune’s automatic pitch-correction algorithm applied on top. The goal was to match exactly the modern Autotune approach heard on many hit records: perfect pitch, artistically digitized for modern pop radio, but without overly obvious tuning artefacts. Reference tracks for processing intensity were hit records as diverse as Michael Bublé’s It’s a Beautiful Day, Linkin Park’s Numb and Keith Urban’s Horses. The settings for the Autotune versions were derived from those that mixing engineer Eric Valentine reveals on YouTube while deconstructing his Pro Tools ITB (in-the-box) mix session for Horses. Eric Valentine (producer/engineer; Keith Urban, Slash, Maroon 5) on Autotune in his mix for Horses:
“It’s just part of the sound and the style of this poppier stuff.” (Eric Valentine, YouTube)
Michael Bublé speaks frankly about the Autotune used on his global hit It’s a Beautiful Day:
“I need to get on pop radio. And if my songs don't sound like all the other songs, I'm not getting on pop radio.” (Bryan Taylor for The Globe and Mail)
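In the Autotune versions, an automatic correction was layered over the manual, syllable-by-syllable Melodyne edits, pulling each sung note towards a target pitch. The following Python sketch is a rough conceptual illustration of that idea only. It is not Antares’ algorithm, and it is not the setting used in this study. It assumes pitch detection has already produced a per-frame frequency track in Hz. It also assumes a chromatic target scale with A4 = 440 Hz. The `glide` parameter is a hypothetical stand-in for a retune-speed control.

```python
import math

A4_HZ = 440.0  # assumed tuning reference


def hz_to_cents(f_hz: float) -> float:
    """Frequency expressed in cents relative to A4."""
    return 1200.0 * math.log2(f_hz / A4_HZ)


def cents_to_hz(cents: float) -> float:
    """Inverse of hz_to_cents."""
    return A4_HZ * 2.0 ** (cents / 1200.0)


def correct_pitch_track(track_hz: list[float], glide: float = 1.0) -> list[float]:
    """Pull each frame of a pitch track towards its nearest chromatic semitone.

    glide (hypothetical) is the fraction of the remaining correction applied
    per frame: 1.0 snaps instantly (the hard, audibly 'tuned' sound), while
    smaller values make the correction lag, so scoops and vibrato partly survive.
    """
    if not 0.0 < glide <= 1.0:
        raise ValueError("glide must be in (0, 1]")
    applied = 0.0  # correction currently applied, in cents
    corrected = []
    for f in track_hz:
        sung = hz_to_cents(f)
        target = round(sung / 100.0) * 100.0  # nearest semitone (100 cents apart)
        needed = target - sung
        applied += glide * (needed - applied)  # move the correction towards what is needed
        corrected.append(cents_to_hz(sung + applied))
    return corrected
```

With `glide=1.0`, frames at 445 Hz and 438 Hz both come out at 440 Hz. With `glide=0.5`, the first 445 Hz frame only moves halfway, to about 442.5 Hz. Manual editing as in Melodyne instead decides on each note individually, which is why the Melodyne versions can keep more of the natural expression.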
Survey with real-world listening tests
Survey procedure and listening conditions
The survey was carried out online using Google Forms. Participants were invited via e-mail, various Facebook groups and private Facebook profiles, and invitations were restricted to Germany. Participants were asked to listen to the audio examples via dedicated playlists on Hearthis, a service quite similar to SoundCloud. Only one participant reported technical issues while listening. Participants were encouraged to listen in random order and as often as they wanted. They were also asked to use earphones or loudspeakers if possible, so that they could better hear small differences. They were told that the three versions of each song differed only in the vocal version used. To avoid biased decisions, they were not told that the vocal versions used different amounts of pitch correction. The three versions of each song were placed in a different random running order on each of the two song playlists. More than 2,300 plays were counted on the six files during the survey period. Assuming these plays came from participants, that is roughly 14 plays per participant, or more than two per file. This large number suggests that participants were highly engaged.
164 people took part in the survey. A few of them apparently skipped specific questions on purpose: they evidently went straight to choosing their favorites without answering the introductory questions. These forms can still be used for the absolute evaluation numbers. However, they cannot contribute to those parts of the relative analysis that compare listener groups. Only one obviously unengaged form was excluded, and all other submitted forms were consistent. For this reason the absolute numbers vary between questions and do not always sum to 163. The respective numbers of participants are therefore stated in the graphics.
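The counting rule above has two parts. A form that skips the introductory questions still counts towards the absolute totals for a question it answered. It drops out of the listener-group comparison. The following Python sketch shows one way to apply that rule. The field names `background` and `favorite` are hypothetical, and the example forms are invented toy data for illustration, not results from this survey.

```python
from collections import Counter, defaultdict


def absolute_counts(forms: list[dict], question: str) -> Counter:
    """Count the answers to one question over every form that answered it."""
    return Counter(f[question] for f in forms if f.get(question) is not None)


def relative_shares_by_group(
    forms: list[dict], question: str, group_key: str = "background"
) -> dict[str, dict[str, float]]:
    """Percentage of each answer within each listener group.

    Forms that skipped the group question (or the question itself) are left
    out here, although they still count in absolute_counts.
    """
    by_group: defaultdict[str, Counter] = defaultdict(Counter)
    for f in forms:
        group, answer = f.get(group_key), f.get(question)
        if group is None or answer is None:
            continue
        by_group[group][answer] += 1
    return {
        group: {answer: 100.0 * n / sum(counts.values()) for answer, n in counts.items()}
        for group, counts in by_group.items()
    }


# Hypothetical toy forms, NOT survey data: the third form skipped the background question.
example_forms = [
    {"background": "musician", "favorite": "Melodyne"},
    {"background": "musician", "favorite": "original"},
    {"favorite": "Autotune"},
]
```

Here `absolute_counts(example_forms, "favorite")` counts all three answers. `relative_shares_by_group(example_forms, "favorite")` only reports the musicians, split 50/50. This is why the group totals can sum to less than the absolute totals.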
Participants come in almost equal shares from three background groups: music listeners, musicians, and audio engineers/music producers. The last group is referred to as professionals in the survey analysis. People with musical backgrounds are over-represented because these groups were disproportionately invited and interested. The age distribution closely resembles that of the German population as a whole. As expected, streaming is the dominant preferred medium of music consumption. Personal genre taste varies widely, from Adult Contemporary to Urban.
This broad spread across musical backgrounds, age groups and consumption habits should give the survey more than absolute numbers. Comparing relative numbers should also reveal trends and relationships. So that readers can judge its objectivity, it should be noted that the study’s author is a professional music producer.
Choice of songs
Two specific songs were chosen for the survey. Two songs with three versions each was considered the maximum, so as not to overstrain participants’ attention span. The first was an obvious choice: a modern pop (but not dance-music) arrangement in the listeners’ native language, German. A moderately electronic track was chosen for a reason. Heavily electronic chart music is by now so obviously dominated by autotuned voices that listeners might have heard that vocal sound as genre-specific. Overly electronic examples might therefore have skewed the results towards Autotune. This is especially likely if participants were listening to English-language music as a foreign language, which might make a song sound more artificial anyway. On the other hand, styles that more often live outside the charts, such as Alternative Rock or Singer/Songwriter music, might have biased the survey too far in the other direction. “Alles so wie es sein muss” by German pop duo HIER und JETZT was chosen as a suitable compromise. During its short run on German radio, it was broadly accepted both by stations playing AC repertoire and by more modern mainstream pop stations. This in-between quality made it hard for the track to achieve high rotation on either format, but makes it a good candidate for testing with a wide range of listeners.
The second track is a unique experiment. How would listeners react to a pitch-corrected version of a singer’s voice? They have heard it in its natural form throughout his global career of more than twenty years, plus another twenty years of heavy rotation on AC radio. Besides this enormous familiarity, a second reason was certain to push the absolute evaluation numbers towards the original vocal: Michael Jackson’s extremely expressive vocal style was never meant to be autotuned in the first place. It was therefore clear from the start that the absolute numbers would not be representative. The relative numbers, however, might be all the more meaningful, especially if the final test results pointed in the same direction as the pre-tests had.