Speech melody

From The Cambridge Guide to Teaching English to Speakers of Other Languages:

Chapter 8 - Pronunciation

Published online by Cambridge University Press:  07 September 2010

When talking about pronunciation in language learning we mean the production and perception of the significant sounds of a particular language in order to achieve meaning in contexts of language use. This comprises the production and perception of segmental sounds, of stressed and unstressed syllables, and of the ‘speech melody’, or intonation. Also, the way we sound is influenced greatly by factors such as voice quality, speech rate and overall loudness. Whenever we say something, all these aspects are present simultaneously from the very start, even in a two-syllable utterance such as Hello!

Pronunciation plays a central role in both our personal and our social lives: as individuals, we project our identity through the way we speak, and also indicate our membership of particular communities. At the same time, and sometimes also in conflict with this identity function, our pronunciation is responsible for intelligibility: whether or not we can convey our meaning. The significance of success in L2 (second language) pronunciation learning is therefore far-reaching, complicated by the fact that many aspects of pronunciation happen subconsciously and so are not readily accessible to conscious analysis and intervention.

All this may explain why teachers frequently regard pronunciation as overly difficult, technical or plain mysterious, while at the same time recognising its importance. The consequent feeling of unease can, however, be dispelled relatively easily once a basic understanding has been achieved.

  • Pronunciation
  • By Barbara Seidlhofer
  • Edited by Ronald Carter, University of Nottingham, and David Nunan, The University of Hong Kong
  • Book: The Cambridge Guide to Teaching English to Speakers of Other Languages
  • Online publication: 07 September 2010
  • Chapter DOI: https://doi.org/10.1017/CBO9780511667206.009


Speech melody as articulatorily implemented communicative functions

Published in Speech Communication, 1 July 2005. DOI: 10.1016/j.specom.2005.02.014


Speech Melody

Variation in speech melody is an essential component of normal human speech. Equipment that could produce a voicing buzz has long been available, but only relatively recently have devices been built that allow the user to come close to realistically mimicking the pitch variation of natural speech. Pitch refers to human perception, that is, whether one perceives sounds as ‘high’ or ‘low.’ The basic unit of speech melody is the intonation phrase (IP), which is a complete pattern of intonation. IPs are built around the nucleus, the only obligatory element of an IP. The nucleus consists of a single syllable on which a variety of nuclear tones are realized. In some languages a rising intonation pattern is probably the most frequent way of producing questions in colloquial speech. Commands are often said with a falling pattern. This is particularly the case if a superior is talking to an inferior and there is no possibility of discussing the issue.


PLOS ONE

Poetic speech melody: A crucial link between music and language

Winfried menninghaus.

1 Department of Language and Literature, Max Planck Institute for Empirical Aesthetics, Frankfurt, Germany

Valentin Wagner

Christine a. knoop, mathias scharinger.

2 Phonetics Research Group, Department of German Linguistics & Marburg Center for Mind, Brain and Behavior, Philipps-University Marburg, Marburg, Germany

Associated Data

Raw data and analysis scripts are available at Open Science Framework (DOI: 10.17605/OSF.IO/96JRM ).

Research on the music-language interface has extensively investigated similarities and differences of poetic and musical meter, but largely disregarded melody. Using a measure of melodic structure in music––autocorrelations of sound sequences consisting of discrete pitch and duration values––we show that individual poems feature distinct and text-driven pitch and duration contours, just like songs and other pieces of music. We conceptualize these recurrent melodic contours as an additional, hitherto unnoticed dimension of parallelistic patterning. Poetic speech melodies are higher order units beyond the level of individual syntactic phrases, and also beyond the levels of individual sentences and verse lines. Importantly, autocorrelation scores for pitch and duration recurrences across stanzas are predictive of how melodious naive listeners perceive the respective poems to be, and how likely these poems were to be set to music by professional composers. Experimentally removing classical parallelistic features characteristic of prototypical poems (rhyme, meter, and others) led to decreased autocorrelation scores for pitches, independent of spoken renditions, along with reduced ratings for perceived melodiousness. This suggests that the higher order parallelistic feature of poetic melody strongly interacts with the other parallelistic patterns of poems. Our discovery of a genuine poetic speech melody has great potential for deepening the understanding of the music-language interface.

Introduction

Throughout most of its long history, poetry was part of the oral tradition and sung to genuine musical melodies [ 1 ]. This implies that many poems were, in all likelihood, not invented independently of musical melodies. With (Western) high-art poetry, however, this direct link between poems and melodies has become increasingly blurred. Modern poetry primarily constitutes a tradition of written texts that were and are read (silently or aloud), but no longer sung. Nor is there evidence that authors wrote these poems with specific melodies in mind. To be sure, many poems impose higher prosodic regularity on language by virtue of implementing special metrical patterns. However, metrical stress and/or duration patterns alone are not known to predict genuine melodic pitch contours. Still, numerous authors of poems of the modern written tradition keep using the word “song” (German Lied or Gesang, French chant, etc.) in the titles both of individual poems and of entire volumes of poetry; this applies at least across all major Western literary traditions. Similarly, the modern discourse on poetry has time and again emphasized musical properties of poems and a close link between poetry and music. It thus seems that, even though these poems are typically not sung, they still somehow convey an impression of song-likeness and melodiousness. It is precisely this persistent phantom of lyrical melody in the written tradition of poetry that our study investigated. Our ambition was to show that this widespread emphasis on the song-likeness of written poetry is not just based on a vague comparison, let alone a mere phantom, but is actually reflective of a special, fully language-dependent, and objectively measurable type of repeated pitch and duration contours.

Recent brain imaging research has provided substantial evidence for such a close link between language and music processing in general ([ 2 , 3 ]). It is, however, still far from clear which formal features are actually shared by songs and spoken poems. For instance, both classical poetics and modern accounts have discussed similarities and differences between poetic and musical meter ([ 4 , 5 ]). Melodic features, however, have not been an issue in this context. The 18th-century author Joshua Steele ([ 6 ]) is an exception to this rule: he proposed improvised musical (i.e., pitch-based) notations for poems, but did not relate these notations to perception. The present study is the first to extract in a systematic fashion the structure of pitch and duration sequences characteristic of individual poems from the acoustic sound envelope of various spoken renditions of these poems, and to correlate these measures with subjectively perceived melodiousness. In this effort, we exclusively focused on poems featuring a sustained rhyme, meter and stanza structure that was particularly predominant in 19th-century Europe. This clearly is a limitation; however, the statistical measure we propose for poetic speech melody allows us to control the hypothetical melody effect for a potential dependence specifically on the prosodic hyper-regularity driven by meter.

Ratings for the perceived “melodiousness” of an utterance have repeatedly been used to capture perceptual features of spoken language (e.g. [ 7 , 8 ]). A “melodious” sound is often contrasted with a “monotonous” sound ([ 7 ]); by implication, a melodious sound envelope should be rich in variation. Moreover, there is broad consensus that pitch is indeed essential for both musical and speech-related prosody ([ 9 , 10 ]).

However, linguistic research on pitch has hardly gone beyond the phrase or sentence level. Recurrent pitch and duration contours characteristic of entire poems can only be captured by extending the analysis of such contours far beyond these narrow limits. Specifically, the analogy with musical songs suggests that the level of stanzas––which typically encompasses multiple individual phrases––is first and foremost the higher-order textual level of poems that is most likely to support recurrent melodic patterns. Notably, compositional arrangement in recurrent stanzas is also exactly what sets poetry of the type analyzed here most conspicuously apart from other genres of (literary) language ([ 11 ]). At the same time, conformity to a specific stanza pattern is a precondition for larger textual units to be sung to a given musical melody.

Accordingly, the arrangement in stanzas and the recognition of an inherent language-based recurrence of pitch contours ––which we henceforth call “poetic speech melody”––should be intrinsically linked. We therefore expected that the stanza level should be particularly important for the construct of a poetic speech melody characteristic of entire poems. In order to be able to test this assumption, we included two additional units of analysis that are likewise important compositional building blocks of poems: all individual as well as rhyming lines only.

Pitch contours repeated over larger units of linguistic utterances are prosodic (phonological) parallelisms that implement structures of recurrence across these units. In this sense, poetic speech melody adds a novel dimension to the parallelism hypothesis of poetic language ([ 8 , 12 , 13 , 14 ]). This hypothesis stipulates that poetic language in general––whether found in traditional poems, free verse, political campaign slogans, commercial ads, or elsewhere––differs from ordinary language primarily by implementing multiple patterns of linguistically optional, yet perceptually salient, recurrence. The pertinent features include a great variety of often co-occurring phonological, syntactic, and semantic parallelisms (such as alliteration, assonance, anaphora, etc.), which are also frequently used in contexts other than rhymed and metered verses.

Most parallelistic features are limited to individual words, phrases and sentences, and hence not continued throughout entire texts (such as alliteration, assonance, anaphora, etc.). In contrast, meter, strophic structure and stanza-based recurrent pitch contours are higher order parallelistic features that imply a continuous parallelistic patterning across larger textual units. Non-continuous parallelistic features are likely to interact far less strongly, both with one another and with continuous parallelistic features, than the continuous features do among themselves. For instance, an alliteration can be removed or replaced by another one without necessarily affecting the overall metrical and strophic patterning. However, removing the ongoing metrical structure of poems should significantly affect their melodic structure, too, as a musical melody cannot stay the same if its meter is altered or even completely eliminated. We therefore expected that the repetition of pitch contours across stanzas should be more sensitive to the experimental removal of poetic meter than to changes that primarily affect only individual words and syllables. At the same time, pitch contours of individual musical melodies can by no means be derived from their underlying meter only. Accordingly, we expected that poetic speech melody, too, cannot be reduced to a mere effect of meter.

Multi-parallelistic sentences and texts are linguistic analogues to multi-layered structures of symmetry, repetition and variation which are well-established as core features of the aesthetic appeal of both music and visual objects ([ 15 , 16 , 17 , 18 ]). Therefore, the focus on parallelistic structures in sentences and texts has the potential to allow for comparisons across aesthetic domains. Moreover, sensitivity to and priming through repetition is a general and highly important mechanism of human language perception and learning ([ 19 ]). Thus, the construct we propose relies on basic mechanisms of both aesthetic perception in general and language perception in particular. Notably, these mechanisms determine language use in general, including ordinary speech. This includes the near-regular distribution of stressed and unstressed syllables (e.g. [ 20 , 21 ]) and the alternation of characters with large vs. small vertical spatial extent as well as the alternation of spaces between characters (e.g. [ 22 ]).

Our study specifically tested three hypotheses. Hypothesis 1 predicted for the set of 40 original poems that quantitative melodiousness scores obtained by autocorrelations of syllable pitch and duration should correlate with subjectively perceived melodiousness and hence be predictive of aesthetic appreciation. We extracted the melodic structure of these poems by identifying pitch and duration values for each syllable. This effectively allowed us to treat syllables in speech just like notes in music (for a similar approach, see [ 23 ] and [ 24 ]). In order to analyze the recurrence structure of pitch and duration, we subjected pitch and duration values to autocorrelation analyses ([ 25 ]). Autocorrelation is the cross-correlation of a time-series signal with itself at different time points; it has previously been used for detecting meter and melodic properties in music ([ 26 , 27 , 28 ]).

A second hypothesis informed our recourse to experimental modifications of the original poems. If poetic speech melody is indeed a higher order parallelistic structure of pitch and duration contours repeated across stanzas, it should be closely associated with the specific selection and combination of phonetic and prosodic building blocks implemented by the poet in the original text. Consequently, Hypothesis 2 stipulates that increasing alterations and rearrangements of the wording should interfere with the hypothetical overarching melodic structure and reduce both objectively measured and subjectively perceived melodiousness.

If this hypothesis regarding different versions of the same poems holds true, it adds to the evidence for the existence of poetic speech melody that can be obtained through comparing different poems. Moreover, if modification-driven differences in melodiousness as measured by pitch and duration autocorrelations are largely convergent for the same poems across a variety of speakers, this would strongly support the notion that the construct of poetic speech melody is essentially speaker-independent.

Finally, we tested whether composers’ choices to set specific poems to music or not correlate with our measure of poetic speech melody. The affirmative option was our more speculative Hypothesis 3.

Materials and methods

Ethics statement.

For all reported experiments, written informed consent was obtained from all participants, in accordance with the Declaration of Helsinki. All experiments were approved by the Ethics Council of the Max Planck Society.

Stimulus material

Poem corpus.

We used a selection of 40 relatively unknown poems from the later 18th to the mid-20th century (written on average 160 years ago). The poems were originally collected and all of the other versions created for a study on how poetic diction influences emotional responses, specifically, feelings of joy, sadness, and being moved, and how these emotional responses correlate with aesthetic virtue attributions (specifically, beauty and melodiousness) as well as overall liking ([ 8 ]). All of the poems feature regular meter and either an ABAB (cross rhyme), AABB (pair rhyme), or ABBA (enclosed rhyme) scheme. On average, the poems consist of 4 ± 1 stanzas with 16 ± 4 lines and a total of 136 ± 40 syllables. Half of the poems were subsequently set to music, most prominently by romantic composers (information retrieved from the LiederNetArchive, http://www.lieder.net/lieder/index.html ). Poem modification was done in four steps of removing parallelistic features, thus yielding five versions of each poem (see Table 1 ).

The gloss column of Table 1 provides more detailed explanations of the specific modifications. The last column gives the average number of syllables of the original and modified versions of the selected 40 poems.

Table 1.

Version A (Original)
  Example: Ah Sun-flower! weary of time, / Who countest the steps of the Sun: / Seeking after that sweet golden clime / Where the travellers journey is done.
  Gloss: Four verse lines, anapaestic trimeter (with verse-initial variation), cross-rhymed (abab).
  Number of syllables [± SD]: 136 ± 40

Version B (Meter, no rhyme)
  Example: Ah Sun-flower! weary of time, / Who countest the steps of the Sun: / Seeking after that sweet golden place / Where the travellers journey is through.
  Gloss: Rhyme removed by replacing the verse-final words in lines 3 and 4.
  Number of syllables [± SD]: 136 ± 38

Version C (Rhyme, no meter)
  Example: Ah Sun-flower! weary of the time, / Who countest all the steps of the Sun: / You are seeking after that sweet golden clime / Where the journey of the traveller is done.
  Gloss: Meter removed by adding syllables to break up the pattern.
  Number of syllables [± SD]: 145 ± 42

Version D (No rhyme, no meter)
  Example: Ah Sun-flower! weary of the time, / Who countest all the steps of the Sun: / You are seeking after that sweet golden place / Where the journey of the traveller is through.
  Gloss: Integration of the modifications B and C.
  Number of syllables [± SD]: 147 ± 41

Version E (No rhyme, no meter, further parallelistic properties removed)
  Example: Ah Sun-flower! weary of the time, / Who countest all the phases of the Sun: / Pursuing that sweet golden place / Where the journey of the traveller is through.
  Gloss: Removal of the alliteration in “steps”/”sun” and the alliteration and assonance in “seek”/”sweet”.
  Number of syllables [± SD]: 142 ± 42

Importantly, the modifications of the various parallelistic features did not affect several other features that are likewise characteristic of the type of poetry used in our study. For instance, a low degree of narrative content, unmediated evocations of highly personal and highly emotional speech situations, and the frequent addressing of an absent person/agent who is or was highly significant for the lyrical speaker are found across all versions of the poems. Moreover, non-parallelistic features of poetic diction (such as metaphor, ellipsis, etc.) were also kept as constant as possible. Finally, non-metered and non-rhymed poems account for a substantial share of 20th-century poetry. For all these reasons, the modified versions that were relatively low in parallelistic features were also readily accepted as poems. Table 1 illustrates the steps of modification we employed on our set of 40 German poems. In order to make these steps intelligible to a broader readership, we illustrate them with an English analogue based on the first stanza of the poem “Ah Sun-flower!” by William Blake. (For a detailed German example of all differences between versions A and E, see Supplementary Materials in Menninghaus et al., 2017.)

Spoken renditions of the poem corpus

Professional speaker.

The 200 stimuli overall (40 poems in five versions each) were recited by a professional speaker who is a trained actor, certified voice actor and speech trainer. The digital recordings (sampling rate 44.1 kHz, amplitude resolution 16 bits) were made in a sound-attenuated studio. The speaker was instructed to recite all poem versions with a relatively neutral expression and without placing too much emphasis on a personal interpretation. Errors during reading were corrected by repeating the respective parts of the poems.

Synthetic voices

In order to obtain speaker-independent evidence of poem-based pitch and duration recurrences, we opted for several control conditions. One control condition involved computer-generated voices in a text-to-speech application (natively implemented in OS X 10.11). All 5 versions of the 40 poems were synthesized using a male and a female voice at a syllable rate of ~4 syllables/s. We used the voices called ANNA (female) and MARKUS (male) in their standard settings. The algorithm first translates text input into a phonetic script and then synthesizes each word using pre-recorded speech templates. Global prosodic features are applied by default and triggered by punctuation (e.g., question intonation is triggered by a question mark). We decided to use these voices on the basis of their overall acceptable voice quality, which was judged to be superior to many other text-to-speech synthesis applications, including some that allow for more detailed control over acoustic-phonetic properties.
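For orientation, such a synthesis batch could be scripted as in the minimal sketch below, which drives the built-in OS X `say` command from Python. This is illustrative only: the directory layout and file names are hypothetical, and it assumes a system on which the German voices Anna and Markus are installed.

```python
import subprocess
from pathlib import Path

# Hypothetical batch synthesis: render each poem version with the two
# built-in OS X German voices named in the text (default settings).
POEMS_DIR = Path("poems")        # assumed: one .txt file per poem version
OUT_DIR = Path("synthesized")
OUT_DIR.mkdir(exist_ok=True)

for voice in ("Anna", "Markus"):
    for txt in sorted(POEMS_DIR.glob("*.txt")):
        out_file = OUT_DIR / f"{txt.stem}_{voice.lower()}.aiff"
        # `say -v <voice> -f <textfile> -o <audiofile>` renders the text
        # to an AIFF file with the voice's standard settings.
        subprocess.run(
            ["say", "-v", voice, "-f", str(txt), "-o", str(out_file)],
            check=True,
        )
```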

Nonprofessional speakers

Another control involved 10 nonprofessional native German speakers (4 males, 6 females, mean age 29 ± 7 years) who read a feasible subset of 8 original poems (version A) and 8 modified versions (version E) of these poems. Participants in this production study were recruited from the participant pool of the Max Planck Institute for Empirical Aesthetics and received monetary compensation. They were asked to read the 16 poems in randomized order, naturally and in a standing position in a sound-attenuated recording studio (sampling rate 44.1 kHz, amplitude resolution 32 bits). They were also asked to avoid strong expressivity in their renditions. Errors during reading were corrected by repeating the respective parts of the poems. Prior to acoustic analyses, the recorded poems were modified in order to match the written versions (replacement of erroneous passages, removal of non-text-based additions). All experimental procedures were ethically approved by the Ethics Council of the Max Planck Society and were undertaken with the written informed consent of each participant. On average, the recording session took about one hour per participant.

Acoustic analyses

Our acoustic analyses focused on the primary acoustic cue of linguistic pitch, i.e., the fundamental frequency of sonorous speech parts (F0, [ 29 , 30 ]), and on syllable duration.

Preprocessing

Digitized poem renditions were automatically annotated using the Munich Automatic Segmentation system (MAUS), and annotation grids were imported into the phonetic software application PRAAT ([ 31 ]). Annotation was based on syllabic units; this was motivated by the observation that the syllable is a core linguistic unit in poetry ([ 32 ]) and the minimal unit in the prosodic hierarchy ([ 33 ]). The annotation was manually inspected and corrected by a trained phonetician and native speaker of German. Corrections included marking silent periods larger than 200 ms as pauses and shifting syllabic boundaries to zero crossings in order to arrive at consistent cuttings, as is usual in phonetic annotation.

In principle, the measure of recurrence we used is entirely independent of manual chunking or manual pre-processing. However, we opted for an approach that includes a ‘hand-made’ control and fine-tuning of the exact syllable boundaries; we expected that this additional effort could improve the correlations between our statistical textual measure and the data for subjective perception. (A quantification of the degree to which our manual intervention actually improved the correlations was, however, beyond the scope of our paper.)

For all analyses, our syllable-based pitch estimation followed the approach in Hirst [ 34 ]. In a first pass, the fundamental frequency of the entire poem was calculated using an autocorrelation approach, with the pitch floor at 60 Hz and the pitch ceiling at 700 Hz. The 25% and 75% quartiles (Q25 and Q75) of pitches were identified and used for determining the pitch floor and ceiling in the second pass of fundamental frequency estimation. The second-pass pitch floor was 0.25 * Q25, while the pitch ceiling was 2.5 * Q75. Pitch extraction is illustrated in Fig 1 . Averaged pitch and duration values for each speaker/voice are provided in Table 2 .
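As an illustration of this two-pass procedure, the following Python sketch uses the parselmouth library (a Python interface to the PRAAT analysis engine) rather than the study's own PRAAT workflow. The quartile multipliers are taken from the text above; the function interface is ours.

```python
import numpy as np
import parselmouth  # Python interface to the PRAAT analysis engine

def two_pass_pitch(wav_path):
    """Two-pass F0 estimation in the spirit of Hirst's approach as
    described in the text: a wide first pass, then speaker-adapted
    limits derived from the first-pass pitch quartiles."""
    snd = parselmouth.Sound(wav_path)

    # First pass: wide search range (pitch floor 60 Hz, ceiling 700 Hz).
    first = snd.to_pitch(pitch_floor=60.0, pitch_ceiling=700.0)
    f0 = first.selected_array["frequency"]
    voiced = f0[f0 > 0]  # unvoiced frames are reported as 0 Hz

    # Speaker-adapted limits from the 25% and 75% quartiles (Q25, Q75).
    q25, q75 = np.percentile(voiced, [25, 75])

    # Second pass with the adapted floor and ceiling given in the text.
    return snd.to_pitch(pitch_floor=0.25 * q25, pitch_ceiling=2.5 * q75)
```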

Fig 1. A. Top: The digitized speech signal was annotated, using syllabic units (example poem: August von Platen [1814], Lass tief in dir mich lesen). Bottom: The pitch contour was obtained by a two-pass fundamental frequency (F0) estimation. From sonorous parts, the mean pitch at three measurement positions was calculated. B. Mean pitch values were mapped onto semitones, using the MIDI convention, and syllable duration was mapped onto musical length. For illustration purposes, the resulting notation was shifted two octaves up. C. Discrete pitch and duration values were subjected to autocorrelation analyses. Apart from an overall measure of autocorrelation strength, the study focused on autocorrelation values at lags that correspond to poetic structure, such as (all) individual lines, rhyming lines, and stanzas.

Table 2. Averaged pitch and duration values for each speaker/voice. Each speaker in the control group of 10 speakers produced a subset of 16 poems (8 original [A] versions, 8 modified [E] versions). The syllable rate is computed as the number of non-silent syllabic units per time unit.

Speaker                        | Mean syllable pitch [Hz ± SD] | Mean syllable rate [1/s ± SD] | Number of poems
Professional speaker           | 99 ± 6                        | 3.0 ± 0.2                     | 200 (40x A, B, C, D, & E, respectively)
Synthetic voice MARKUS (male)  | 102 ± 2                       | 4.1 ± 0.2                     | 200
Synthetic voice ANNA (female)  | 173 ± 3                       | 4.1 ± 0.2                     | 200
Nonprofessional speaker 1      | 196 ± 3                       | 4.2 ± 0.3                     | 16 (8x A, 8x E)
Nonprofessional speaker 2      | 217 ± 5                       | 2.9 ± 0.2                     | 16
Nonprofessional speaker 3      | 218 ± 6                       | 3.3 ± 0.2                     | 16
Nonprofessional speaker 4      | 129 ± 3                       | 3.1 ± 0.2                     | 16
Nonprofessional speaker 5      | 214 ± 7                       | 3.4 ± 0.3                     | 16
Nonprofessional speaker 6      | 242 ± 3                       | 2.9 ± 0.2                     | 16
Nonprofessional speaker 7      | 227 ± 5                       | 3.6 ± 0.2                     | 16
Nonprofessional speaker 8      | 141 ± 2                       | 3.6 ± 0.2                     | 16
Nonprofessional speaker 9      | 137 ± 3                       | 4.0 ± 0.2                     | 16
Nonprofessional speaker 10     | 126 ± 2                       | 3.2 ± 0.2                     | 16

Following Patel et al. [ 23 ], we computed the mean pitch for each syllable across the three measurement positions beginning, middle, and end. In rare cases (<0.2% of all data points), pitch could not be determined and was interpolated based on the pitches of neighboring syllables. In addition to syllable-based pitch information, we also calculated the physical duration of each syllable. We excluded pauses from the pitch and duration analyses.
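A minimal sketch of this per-syllable aggregation step follows. It assumes a pitch lookup function and syllable boundary times from the annotation; the names and interfaces are ours, not the study's.

```python
import numpy as np

def syllable_pitch_series(pitch_at, syllable_bounds):
    """Mean pitch per syllable across the three measurement positions
    (beginning, middle, end), with interpolation of the rare syllables
    for which no pitch could be determined.

    pitch_at:        callable, time (s) -> F0 in Hz, or NaN if undefined
                     (hypothetical interface to a pitch track).
    syllable_bounds: sequence of (start, end) times per syllable.
    """
    means = []
    for start, end in syllable_bounds:
        samples = [pitch_at(t) for t in (start, (start + end) / 2, end)]
        samples = [v for v in samples if np.isfinite(v)]
        means.append(np.mean(samples) if samples else np.nan)

    means = np.asarray(means, dtype=float)
    idx = np.arange(len(means))
    missing = np.isnan(means)  # <0.2% of data points in the study
    means[missing] = np.interp(idx[missing], idx[~missing], means[~missing])
    return means
```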

Pitch transformation

Pitch and duration values were discretized, using MIDI conventions, in order to arrive at a more music-analogous basis for subsequent pitch analyses. This implied that raw pitch values (in Hz) were transformed into semitones on the MIDI scale, with numeric values ranging from 21 to 108, using the standard MIDI conversion

d = 69 + 12 · log2(F / 440)

with d = MIDI pitch value and F = raw pitch (in Hz).

Syllable durations were mapped onto musical note durations, with the simplifying assumption that a whole note corresponds to 1 s. The smallest note duration values were mapped to a 16th note, which thus corresponded to a minimal syllable duration of 62.5 ms. For illustration purposes only, the MIDI pitch values were transposed two octaves up (see Fig 1B ).
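The discretization can be expressed compactly in code. The sketch below applies the Hz-to-MIDI conversion given above and the whole-note-equals-one-second duration mapping; the clipping to the 21-108 range reflects the value range stated in the text, and the function names are ours.

```python
import numpy as np

def hz_to_midi(f0_hz):
    """Raw pitch (Hz) -> MIDI semitone number (A4 = 440 Hz = note 69),
    rounded and clipped to the range 21-108 stated in the text."""
    midi = 69 + 12 * np.log2(np.asarray(f0_hz, dtype=float) / 440.0)
    return np.clip(np.round(midi), 21, 108).astype(int)

def duration_to_sixteenths(dur_s):
    """Syllable duration (s) -> length in 16th notes, assuming a whole
    note corresponds to 1 s (so one 16th note = 62.5 ms); the shortest
    syllables map to a single 16th note."""
    steps = np.round(np.asarray(dur_s, dtype=float) / 0.0625)
    return np.maximum(steps, 1).astype(int)

# Example: a syllable at 180 Hz lasting 150 ms maps to MIDI note 54
# with a length of two 16th notes (i.e., an eighth note).
print(hz_to_midi([180.0]), duration_to_sixteenths([0.150]))
```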

Autocorrelation analyses

We performed autocorrelation analyses on the time series of syllable/note pitches and syllable/note durations. The discrete autocorrelation R at lag L for a signal y(n) of length N was calculated as

R(L) = [ Σ_{n=1..N−L} y(n) · y(n+L) ] / [ Σ_{n=1..N} y(n)² ]

i.e., normalized such that R(0) = 1. Autocorrelations were determined for lags L = 0 up to 90% of the length of the respective time series.

The significance of each autocorrelation value was estimated using a permutation analysis. For this purpose, autocorrelations were computed for 10,000 randomly shuffled syllable sequences. For each time lag, the absolute value of the autocorrelation was compared to the autocorrelation value for the original syllable sequence of each poem. If the autocorrelation value for the shuffled sequence was smaller than the autocorrelation value for the original sequence in more than 95% of all 10,000 permutations (α < 0.05), the hypothesis that the autocorrelation value of the original sequence equaled the autocorrelation value of the random sequence at the corresponding lag was rejected, and the autocorrelation value in question was then considered “significant”. Only significant autocorrelations were used for subsequent analyses.
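Both the autocorrelation measure and the permutation test can be sketched in a few lines of Python. The normalization below matches the formula given above (whose exact form was reconstructed), and the handling of the comparison follows the significance criterion described in the text; this is an approximation, not the study's own code.

```python
import numpy as np

def autocorr(y):
    """Discrete autocorrelation R(L) for lags 0..90% of the series
    length, normalized so that R(0) = 1 (see the formula above)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    denom = np.sum(y * y)
    return np.array([np.sum(y[: n - lag] * y[lag:]) / denom
                     for lag in range(int(0.9 * n) + 1)])

def significant_lags(y, n_perm=10_000, alpha=0.05, seed=1):
    """Permutation test as described in the text: an autocorrelation
    value counts as 'significant' if the shuffled sequence's value is
    smaller than the original's in more than (1 - alpha) of the
    permutations."""
    rng = np.random.default_rng(seed)
    r_orig = np.abs(autocorr(y))
    smaller = np.zeros_like(r_orig)
    for _ in range(n_perm):
        r_perm = np.abs(autocorr(rng.permutation(y)))
        smaller += r_perm < r_orig  # shuffled value below the original
    return (smaller / n_perm) > (1.0 - alpha)
```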

For the main study, we determined the average distances (in number of syllables) between syllables of successive stanzas, rhyming lines only, and all verse lines, as we hypothesized that pitch and duration contours yield recurrent patterns across the main compositional building blocks of the poems (most prominently the stanza). For the modified poem versions from which rhymes were removed, distances (in number of syllables) between rhyming lines were determined based on where the rhyming word would have been found, had it not been replaced by a non-rhyming counterpart. Subsequently, we determined the mean autocorrelation value of each poem rendition for these three textual units (all individual lines, rhyming lines only, stanzas).
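To make the lag logic concrete, the following sketch averages autocorrelation values at the lags corresponding to one textual unit. The example poem structure and all names are hypothetical.

```python
import numpy as np

def mean_autocorr_at_lags(acf, lags):
    """Mean autocorrelation at the (rounded) structural lags that fall
    within the computed lag range."""
    lags = [int(round(lag)) for lag in lags]
    inside = [lag for lag in lags if 0 < lag < len(acf)]
    return float(np.mean([acf[lag] for lag in inside])) if inside else float("nan")

# Hypothetical poem structure: 4 stanzas of 4 lines, 8 syllables per line.
syllables_per_line = [8] * 16
line_lag = syllables_per_line[0]          # distance between successive lines
stanza_lag = sum(syllables_per_line[:4])  # distance between successive stanzas
# Usage (with `autocorr` from the sketch above and a syllable pitch series):
# mean_autocorr_at_lags(autocorr(pitches), [stanza_lag, 2 * stanza_lag])
```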

Melodiousness rating data

Ratings of subjectively perceived melodiousness (given on a 7-point scale) were collected for all poem versions (A-E). Because our hypothesis considers the original version A as the most melodious one and we were interested in the hypothetical decrease of perceived melodiousness relative to this version, we consistently used version A as anchor version against which we separately compared all other versions.

Participants and procedure

Procedure and experimental setup were similar to the study reported in ([ 8 ]). Overall, 320 students (224 women and 96 men) participated (80 for each of the individual comparisons of version A with versions B, C, D, and E), with a mean age of 23.6 years (SD = 4.3, min = 18, max = 42). Each participant listened to two versions (the original and one of the modified versions) of four poems recited by the professional speaker and was subsequently asked to rate these two versions on several 7-point scales capturing emotional responses (positive and negative affect, sadness, joy and being moved) and dimensions of aesthetic evaluation (melodiousness, beauty, and liking). For the present study, we exclusively focus on the melodiousness ratings.

The mean melodiousness ratings obtained for the original version (A) across the four different data sets did not significantly differ from one another ( F (3,117) = 2.05, p = 0.110; see Fig 2 ). We therefore collapsed the ratings for version A across the four data sets.

Fig 2. Notably, melodiousness ratings for the original poems (version A) did not differ between experiments.

Statistical analyses

Correlation analyses.

Correlation analyses were all based on Spearman’s rank correlation coefficients (carried out in the statistical software package R, Version 3.4, The R Foundation, Vienna, 2017).

Analyses of variance

We report all results as a mixed-effect ANOVA with F -values that were estimated by the lmerTest package ([ 35 ]). Post hoc analyses were calculated based on the multcomp package in R ([ 36 ]) and consisted of Bonferroni-adjusted t -tests with z -transformed t -values.

Correlations

For the dependent variable mean autocorrelation (for the three textual units of line, rhyme, and stanza) we calculated separate models for (a) the 200 renditions by the professional speaker and the two synthetic voices and for (b) the 16 renditions by the 10 nonprofessional speakers. All models included poem as a random variable and the fixed effects poem version (A to E for the professional speaker and computer voice models, and A vs. E for the nonprofessional speaker model), acoustic measure (pitch or duration), and speaker (professional speaker vs. the two synthetic voices for model (a); the 10 nonprofessional speakers for model (b)), as well as all possible interactions. Finally, the models also included the fixed effect textual unit (all lines, rhyming lines, stanzas).
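As a rough Python counterpart to these R/lmerTest models (for orientation only), one could fit a linear mixed model with a random intercept per poem using statsmodels. The column names are hypothetical, and this simplified specification omits aspects of the original models (e.g., the exact interaction structure and the denominator degrees of freedom handled by lmerTest).

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical tidy data: one row per rendition x acoustic measure x
# textual unit, with the mean autocorrelation as the outcome.
df = pd.read_csv("autocorr_long.csv")  # illustrative file name

# Random intercept for poem; interacting fixed effects as in model (a).
model = smf.mixedlm(
    "mean_autocorr ~ version * measure * speaker + textual_unit",
    data=df,
    groups=df["poem"],
)
print(model.fit().summary())
```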

In order to further examine the relationship between autocorrelations and melodiousness ratings, we calculated additional models for all poem versions as recited by the professional speaker, with mean ratings as the dependent variable. The model included the covariate mean autocorrelation (together with the fixed effects acoustic measure and textual unit ).

Finally, we were interested in whether autocorrelations would vary depending on whether the respective poems were set to music or not. For this reason, we calculated a model for the original poems with the dependent variable mean autocorrelation (across stanzas) and the fixed effect set to music (1 = yes, 0 = no). We additionally ran mixed-effects logistic regressions ([ 37 ]) with set to music as the dependent variable and autocorrelation (across stanzas) as the fixed effect.
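A simplified, non-mixed version of this set-to-music analysis could look as follows in Python (the study used mixed-effects logistic regression; the random poem effect is omitted here, and the column names are hypothetical).

```python
import pandas as pd
import statsmodels.formula.api as smf

# One row per original poem: stanza-lag autocorrelation as predictor,
# whether the poem was set to music (1 = yes, 0 = no) as outcome.
df = pd.read_csv("original_poems.csv")  # illustrative file name
fit = smf.logit("set_to_music ~ stanza_autocorr", data=df).fit()
print(fit.summary())
```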

Results

Original poems: Correlations between autocorrelation scores and subjective melodiousness ratings

Focusing on the original poems only, we first correlated the mean melodiousness ratings obtained from the four different data sets involving four different groups of participants (see “Participants”) with the mean autocorrelation scores of pitch and duration across stanzas as extracted from the rendition of these poems by a professional speaker. There was a significant correlation effect for pitch-based autocorrelations (rho = 0.31, t = 2.00, p < 0.05) but not for duration-based autocorrelations (rho = 0.19, t = 1.32, p = 0.20, see Fig 3). Thus, importantly, our statistical measure of the melodiousness of speech captures objective differences in the acoustic rendition of different poems that are predictive of the subjective impressions of melodiousness while listening to these poems.

Fig 3. Pitch-based autocorrelations were significantly correlated with melodiousness ratings.

We further analyzed whether pitch- and duration autocorrelation values varied depending on the respective meter of the original poems in our corpus. Performing a two-sample t-test, we first compared the mean ratings obtained for the iambic poems (N = 27) with those obtained for the trochaic ones (N = 13). The test revealed no significant difference in melodiousness ratings for the two groups of poems ( t = 0.61, p = 0.54). Next, we examined whether there was an interaction between meter in general (be it iambic or trochaic) and acoustic property (pitch, duration) with respect to the mean autocorrelations across stanzas. The corresponding model did not show a significant interaction ( F (1,38) = 0.74, p = 0.39) and hence no effect of meter ( F (1,38) = 0.01, p = 0.92). That is, neither melodiousness ratings nor autocorrelation scores depend on the meter of the original poems.

Effects of poem modification and autocorrelation lag

Professional speaker and synthetic voices.

Mean autocorrelations decreased as a function of poem version (F(4,3471) = 71.65, p < 0.001): values were highest for the original versions (A) and lowest for the most modified versions (E). This effect interacted with textual unit (F(8,3471) = 7.98, p < 0.001): modifications most strongly affected autocorrelations computed across stanzas ( Fig 4B ). The analysis also yielded a main effect of speaker (F(2,3471) = 25.97, p < 0.001), with higher autocorrelations for the professional speaker than for either of the synthetic voices. The main effect of textual unit (F(2,3471) = 55.12, p < 0.001) reflected the following scaling of autocorrelations: all lines < rhyming lines only < stanzas. This effect crucially depended on acoustic measure dimension (textual unit x acoustic measure: F(2,3471) = 16.78, p < 0.001) and was further influenced by speaker (speaker x acoustic measure x textual unit: F(4,3471) = 14.12, p < 0.001). Notably, the scaling all lines < rhyming lines only < stanzas particularly held for pitch and for the professional speaker ( Fig 4A ). No other main effects or interactions depended on speaker (all Fs < 2, p > 0.19).

Fig 4. A. The scaling of autocorrelations with textual unit (all lines < rhyming lines only < stanzas) crucially depended on acoustic dimension and on speaker (PS: professional speaker, SV: synthetic voice). B. Illustration of the poem version effect for the professional speaker. The strongest effect is seen for pitch-based autocorrelations across stanzas. Error bars indicate standard errors of the mean.

Poem modification led to decreased autocorrelation values (F(1,833) = 191.40, p < 0.001), and autocorrelation values scaled with textual unit as in the previous analysis (i.e., all lines < rhyming lines only < stanzas; main effect textual unit: F(1,833) = 49.67, p < 0.001). The significant textual unit x acoustic measure interaction (F(2,833) = 14.87, p < 0.001) revealed that this scaling order only held for pitch. Post hoc analyses showed that, for pitch, the stanzas showed larger autocorrelation values than the rhyming lines (z = 3.96, p < 0.01). Inversely, for duration, the stanzas showed smaller autocorrelation values than the rhyming lines (z = -3.46, p < 0.01). This effect further depended on poem version and was driven by the original poems (significant interaction textual unit x acoustic measure x poem version: F(2,833) = 7.16, p < 0.001). The stanzas > rhyming lines relation held for pitch (z = 4.53, p < 0.001), and the rhyming lines > stanzas relation held for duration (z = -3.60, p < 0.001), but no differences were found for the modified poems (pitch: z = 0.75, p = 0.91; duration: z = −1.12, p = 0.53; Fig 5). The main effect of speaker (F(9,833) = 2.93, p < 0.05) revealed speaker-dependent differences in the autocorrelations. Importantly, the effect of speaker did not show significant interactions with any of the other effects (all Fs < 2, p > 0.12).

Fig 5. Overall, autocorrelations are higher for original than for modified poems, but differ depending on acoustic dimension (pitch or duration) and textual unit (all lines, rhyming lines only, stanzas). Error bars indicate standard errors of the mean.

Correlations of autocorrelation scores and subjective melodiousness ratings across poem modifications

Correlating the melodiousness ratings obtained for all 200 poem versions (40 poems in five variants each) with the autocorrelation scores of each of these poem versions revealed that melodiousness ratings strongly depended on the poems’ modification (F(4,156) = 42.41, p < 0.001), with decreasing melodiousness ratings for increasing levels of poem modification (see Fig 2). The linear decrease of melodiousness ratings with increasing modification [coding version A as 0, versions B and C as 1, and versions D and E as 2 and 3, respectively] is substantiated by a significant Spearman correlation (rho = −0.59, p < 0.001; based on the mean melodiousness ratings per poem version).

In the model comprising the mean autocorrelation values of all poem versions as fixed effect, there was a significant interaction of textual unit and mean autocorrelation (F(2,1104) = 2.32, p < 0.05) that depended on poem modification, as seen in the three-way interaction of textual unit x poem modification x mean autocorrelation (F(8,1104) = 2.23, p < 0.05). The decomposition of these interactions revealed that mean autocorrelations at line-lags never correlated with melodiousness ratings (i.e., independent of modification, rho = 0.06, t = 1.22, p = 0.23), whereas mean autocorrelations at rhyme-lags (rho = 0.11, t = 2.04, p < 0.05) and even more so at stanza-lags (rho = 0.21, t = 4.34, p < 0.01) correlated positively with melodiousness ratings, with the original poem version showing by far the strongest correlation with the autocorrelation measure. Thus, we observe an overall positive correlation of mean autocorrelations and melodiousness ratings relatively independent of modification. This finding indicates that our statistical measure of melodiousness captures statistical differences of the phonetic signal that correlate with perceptual differences not just for the prototypical rhymed and metered poems but likewise for their far less prototypical versions. Although not substantiated by a significant interaction, we looked at correlations between ratings and autocorrelations at stanza-lags separately for pitch- and duration-based autocorrelations (Fig 6). These correlations proved to be significant for both pitch (rho = 0.10, p < 0.05) and duration (rho = 0.14, p < 0.01).

Fig 6. These correlations involve all poem versions.

Whereas the removal of ongoing meter required systematic changes of the wording or at least the word order throughout all lines of the poems, the other three modifications affected only individual words and were altogether very subtle. Given that our data also show substantial individual variance regarding the ratings for all versions, it is fairly remarkable that we did find a significant correlation between autocorrelation scores and melodiousness ratings for all poem versions. Anticipating that this correlation should be far more pronounced when looking at the end points of the experimental modifications only, we computed an additional correlation analysis for versions A and E only. Results strongly confirmed this expectation: The correlations for both pitch (rho = 0.22, p < 0.05) and duration (rho = 0.36, p < 0.05) were highly significant when looking at the pooled data from versions A and E. By contrast, when looking at the pooled data from versions B, C, and D, correlations did not reach significance, neither for pitch (rho = 0.05, p = 0.62) nor for duration (rho = −0.03, p = 0.74).

Autocorrelations and musical settings

Whether or not a poem has been set to music ( musical setting 1 or 0) correlates with mean autocorrelations ( F (1,399) = 18.68, p < 0.001). This effect differs depending on the textual unit (all individual lines, rhyming lines only, or stanzas; interaction musical setting x textual unit : F (2,427) = 2.90, p = 0.05). Poems set to music particularly show higher autocorrelations across stanzas ( z = 4.50, p < 0.001, Fig 7 ; all other comparisons z < 2, p > 0.19). A mixed-effects logistic model further confirmed that musical settings are predicted by overall autocorrelation across stanzas ( z = 3.40, p < 0.001). Again, we found a stronger predictive effect for pitch ( z = 2.84, p < 0.01) than for duration ( z = 2.27, p < 0.05). Notably, this finding was obtained based solely on the original poems, and hence independent of any experimental modification.

Fig 7. Autocorrelation values are higher for poems that have been set to music than for poems that have not been set to music. This particularly holds for autocorrelations across stanzas. Error bars indicate the standard error of the mean.

Discussion

The most seminal finding of our study is that pitch contours of original poems show a highly recurrent and largely speaker-independent structure across stanzas. These recurrent pitch contours are an important higher order parallelistic feature that had previously escaped attention both in linguistics and in literary scholarship. Crucially, the quantitative measure of pitch recurrence across stanzas correlated significantly with listeners’ melodiousness ratings, lending strong support to our hypothesis that relevant and distinctive dimensions of melodic contour in spoken poetry can indeed be approximatively captured by a fairly simple and abstract autocorrelation measure. Moreover, the fact that mean subjective melodiousness ratings for the 40 original poems nearly converged for four independent groups of participants is already in itself a remarkable finding that strongly hints at some objective correlate of these ratings.

As anticipated, pitch and duration autocorrelations decreased as other parallelistic properties of the poems were experimentally removed. Importantly, across all poem versions, higher degrees of pitch recurrence predict higher subjective melodiousness ratings, and they already do so for the original (unmodified) poems alone, independent of any experimental modification we performed. Duration recurrences also predict melodiousness ratings when analyzed across all poem versions, but not when only the original poems are considered. The absence of an effect of duration autocorrelation for the original poems may suggest that the construct of melody in spoken poetry is mainly based on discrete pitches, in close resemblance to music.

Overall, this pattern of findings suggests that our measure of melodic recurrence––autocorrelations of pitch and duration––is indeed not only suited for analyzing classical metered and rhymed poems of the type that was preeminent in 19th-century Europe. From a technical point of view, the measure can easily be applied to all types of speech. It is widely acknowledged that every type of speech has an inherent rhythm and beat (e.g. [ 20 , 21 ]), i.e., a (quasi-)regular distribution of phonetic (speech sound and prosodic) features in time. Our measure is well-suited to capture these distributions in future research. Given that we found an overall positive correlation of melodiousness ratings with autocorrelations across different levels of modification (i.e., relatively independent of meter and rhyme), we expect that the perceptual consequences of melodic recurrence also hold beyond poetry, albeit to a lesser degree.

Furthermore, pitch recurrences were also predictive of whether or not specific poems were set to music. Our study is thus the first to operationalize the phantom of a “song”-like poetic speech melody of spoken poems by recourse to a measure that can quantify it. It is also the first to empirically illustrate the powerful effects poetic speech melody can exert on the aesthetic evaluation of poetry by nonprofessional listeners as well as on decisions of composers to set particular poems to music.

At first sight, consistencies in syllable pitch (and duration) structure within larger constituents of spoken language may not seem surprising, as previous research on intonation contours and linguistic rhythm (e.g. [ 38 , 39 ]) has revealed that phrase endings are prosodically marked, in that, for instance, pitches yield a downward movement or a falling contour, and that this prosodic marking may co-occur with phrase-final lengthening ([ 40 , 41 ]).

Furthermore, prosody is differently treated in trochaic and iambic meter. The inverse strong-weak and weak-strong patterning of syllables characterizing trochees and iambs is accompanied by prosodic cues that are analogously interpreted across different languages and even mark an important distinction for a nonhuman species ([ 42 ]). Stated in the so-called iambic/trochaic law ([ 43 , 44 ]), the strong-weak patterning in trochees is brought about by higher intensity and pitch in strong and lower intensity and pitch in weak syllables. On the other hand, the weak-strong patterning in iambs corresponds to a difference in syllable duration, with a relatively short syllable in weak position and a relatively long syllable in strong position.

Thus, to a certain degree, metrical structure alone already supports a regular patterning of pitches and durations. This may certainly be one factor that explains why original poems show high autocorrelations based on these measures. However, this explains neither the correlations of pitch (and partly also of duration) autocorrelations with melodiousness ratings, nor the relationship between pitch structure and the likelihood of a poem being set to music. Since a bit more than two thirds of our original poems feature iambic meter and the remaining ones trochaic meter, the aforementioned duration emphasis of iambs should have prevailed over the pitch emphasis of trochees and should, on balance, have resulted in a stronger duration than pitch effect. However, we here report precisely the opposite, namely, a stronger predictive power of the pitch autocorrelations. Moreover, we did not find any significant correlations between autocorrelation values and meter (be it iambic or trochaic).

We therefore suggest that the construct of melody in spoken poetry is neither a mere phantom implicitly endorsed by the longstanding tradition to call poems “songs” nor a mere side effect of metrical structure. Rather, it is a measurable, quantifiable entity of its own that explains effects that are not otherwise predictable by existing paradigms and methods of analyzing linguistic prosody.

To be sure, we are fully aware that our analyses cannot provide a full theory of melody or melodic features (for research in this direction, see e.g. [ 45 , 46 ]). Clearly, the autocorrelation scores are exclusively linked to degrees of repetition and not to specific harmonic qualities of the tone sequences. However, for all its abstractness, the predictive power of this statistical measure both for subjectively perceived melodiousness and for decisions of composers to set specific poems to music strongly suggests that the measure does have a bearing on genuine aesthetic perception.

We certainly acknowledge that melody in music and melody in speech differ in certain respects. For instance, the pitch range in speech is far narrower than in musical melody ([47, 48]). Nevertheless, the proposed measure of pitch- and duration-based autocorrelations appears to be a fruitful one that at least approximately captures melodic properties of both music and language.

As predicted by our theoretical considerations, similar pitch sequences were most prominently found across stanzas. After all, it is primarily the stanza pattern that is consistently repeated in poems, whereas individual lines frequently vary in the number of syllables. Since recurrent meter and rhyme patterns have been shown to enhance prosodic fluency ([14]), it is likely that recurrent melodic contours also contribute to such parallelism-driven fluency effects, which in turn enhance aesthetic appreciation ([49]). In fact, we propose that our results can largely be explained by reference to the ease-of-processing hypothesis of aesthetic liking.

Crucially, the stanza effect in our study was consistently present regardless of the speaker. It was most pronounced when poems were recited by humans (professional or nonprofessional). Surprisingly, even the poem versions recited by synthetic voices confirmed the melody effect at the stanza level, highlighting the independence of poetic speech melody from its actual rendition by any particular speaker and hence its strong reliance on an inherent textual property. The ratings of perceived melodiousness likewise correlated most strongly with stanza-based pitch and duration autocorrelations.

The strong effect of the poem modifications on melodiousness, as spontaneously rated by non-expert listeners, suggests that poetry recipients are highly sensitive to multiple co-occurring and strongly interacting parallelistic patterns and are capable of rapidly integrating these patterns into a complex percept. Such automatic detection and integration of multiple optional patterns of poetic parallelism can be conceived as an analogue to the low-level perception of multiple symmetries and other autocorrelations in complex visual aesthetics ([50]).

Moreover, parallelistic patterning has been shown to enhance the memorability of poetic language ([51]). As genuine musical melodies clearly support the memorability of their underlying lyrics, it is worth investigating the extent to which poetic speech melody likewise increases verbatim recall and potentially the privileged storage of poems or other texts in memory ([51]).

Finally, our analyses of poetic speech melody reveal a hitherto unknown reason why some poems have been set to music while others have not: the higher the degree of pitch recurrence of corresponding syllables across the stanzas of a given poem, the higher the likelihood that it has been set to music. Our findings thus provide an empirical basis for the view that melodic aspects of poetry are inherent properties of the verbal material itself ([52]), and that an intuitive awareness of these properties seems to guide composers in finding the "right" musical melody ([53]).

Regarding the relationship of linguistic prosody to music, our methods advance attempts by Halle and Lerdahl [54] to introduce generative methods for "text setting" (i.e., setting texts to music) by capitalizing on the fact that linguistic and musical prosody are similar ([9, 55, 56]). Our findings, particularly the predictive power of stanza-related autocorrelations for musical settings, suggest that composers are not only aware of the relationship between linguistic prosody and music, but also have the skills to implement the transformation that Halle and Lerdahl [54] described.

In his novel War and Peace ([57]), Tolstoy evocatively refers to this notion of a genuine, inherent melodiousness of poetic language: "The uncle sang […] in the full and naive conviction that the whole meaning of a song lies in the words only and that the melody comes of itself, and that […it] exists only as a bond between the words." In the end, this is exactly our finding: there is indeed a melody emerging from the words alone, from the process of selecting and combining them, and this auto-emergent melody bestows additional musical coherence on the entire word sequence.

We certainly acknowledge that the relation between language and music is unlikely always to be as straightforward as our analyses of a specific type of poetry suggest. For instance, prose, too, can be set to music (operatic recitative), and some texts set to music may not show pronounced musical contours (monotonic chanting in certain religious traditions). Furthermore, many poems feature no sustained rhyme, metrical pattern, or stanza structure (e.g. [24]). Even in these cases, however, our measure of pitch and duration recurrence may well be able to shed light on the relations between intrinsic language-dependent intonation and musical melody, as well as between intrinsic linguistic rhythm and musical beat.

Summing up, our data strongly support the notion that the spontaneous recognition of recurrent melodic patterns extends well beyond music proper and the expectations of tonal harmony with which such patterns are associated in music. Our study shows that spoken texts exhibit, in their compositional entirety, a genuine and consistent patterning of recurrent pitch and duration contours; that melodiousness of this type can be captured statistically using the same metrics across acoustic domains; and that recipients readily project their intuitive percept of inherent language-based melody onto melodiousness ratings in a way that is highly consistent with our statistical measure of melodiousness. Our study thus turns the phantom of a poetic speech melody in spoken texts into a non-metaphorical, unquestionably real and measurable entity.

Finally, both classical ([58]) and modern poetics ([13]) suggest that poetic language differs only gradually, not categorically, from ordinary language. Seen in this light, the perceptual sensitivity to poetic speech melody that we report in this study is unlikely to be acquired exclusively through repeated exposure to poetry. We therefore expect the construct introduced here to be helpful in making progress on other issues that have thus far remained fairly elusive, specifically melody-like structures in rhetorical speeches, spoken religious liturgy, and other types of ritual language rich in parallelistic structures. The measure may also help make progress on the difficult issue of "prose rhythm" ([59]), provided that in prose, too, higher-order recursive structures can be identified that are analogues, if less rigid ones, of the lines and stanzas of poetry, and that can hence serve as reference units for more fine-grained autocorrelation analyses.

Acknowledgments

We wish to express our thanks to Alexander Lindau for his help with the recordings of the nonprofessional speakers and to Andreas Pysiewicz for his support in recording the entire poem corpus. We also thank Anna Roth, Miriam Riedinger, and Julia Vasilieva for their support during data preprocessing and text annotation, and Julia Merrill for her support on questions regarding the music-language interface. Finally, we are grateful to Nicola Leibinger-Kammüller for directing our attention to Tolstoy's remark about an auto-emergent melody of verbal sequences, and to all colleagues who shared inspirations and comments regarding our methods: Eugen Wassiliwizky, Michaela Kaufmann, Pauline Larrouy-Maestri, Elke Lange, Alessandro Tavano, and David Poeppel.

Funding Statement

The research presented here was funded by the Max Planck Society.

Research project

Melody in speech

All languages use melody in speech, primarily via rises and falls of the pitch of voice. Such pitch variation is pervasive, offering a wide spectrum of nuance to sentences – an additional layer of meaning. For example, saying “yes” with a rising pitch implies a question (rather than an affirmation). Melody is essential for communication in social interaction.

Vici

Languages employ melody in diverse ways to convey different layers of meaning in speech. In languages like Standard Dutch, a wide range of sentence-level meanings is conveyed by pitch variation, such as asking questions, highlighting important information, signaling intention, and conveying attitude or emotion.

This, however, is not the only level at which melody conveys meaning. The majority of the world's languages (60-70%) are tone languages, which use pitch variation to distinguish individual word meanings (e.g., Mandarin shi means 'yes' with a falling tone but 'stone' with a rising tone). Tone language speakers are nevertheless able to use pitch variation to express sentence-level meanings, too.
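As a standard textbook illustration (ours, not drawn from the project description itself), the Mandarin syllable ma yields four different words depending on its tone alone:

```text
mā   high level tone         'mother'
má   rising tone             'hemp'
mǎ   dipping (falling-rising) 'horse'
mà   falling tone            'scold'
```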

Worldwide, tone languages vary widely. Two well-recognized differences are particularly relevant. One concerns the form of word-level melody: some languages mainly use pitch levels (high, mid, low) to distinguish words, while others employ pitch contours as well (e.g., rising vs. falling). The other concerns the function of word-level melody: in some languages, tones not only distinguish words but also mark grammatical function (e.g., different tenses of a verb or different cases of a noun). Thus, in a tone language, multiple layers of information, at both the word and the sentence level, are conveyed in the same melodic signal.

While we recognize that these different layers of meaning, as well as the behavioral and neuro-cognitive aspects of producing and interpreting them, are tightly intertwined, how they are connected, and how those connections may be manifested differently across the world's languages, remains poorly understood. This project therefore proposes to address the following significant yet unresolved questions:

  • How do typologically different tone languages differ in how speakers vary pitch to signal both word-level tone and sentence-level intonation?
  • How do these differences affect the way listeners use pitch variation to recognize words and interpret sentences?
  • Do tonal system differences also modulate language-specific brain networks involved in deriving meanings from melodic signals?

To address these questions, this interdisciplinary proposal includes well-controlled systematic comparisons of the way pitch variations relate to word- and sentence-level meanings in typologically different tonal systems. The aim of this research program is to understand the general and language-specific mechanisms that guide the production, comprehension, and neural processing of pitch variation in tone languages.


Intonation, the rise and fall of pitch in speech melody, is crucial in American English. It’s intertwined with rhythm, defining the language’s character.


Video Text:

Today I’m going to talk about intonation. I’ve touched on this subject in various other videos without ever explicitly defining it. And today, that’s what we’re going to do. But I’m also going to reference these other videos, and I really encourage you to go watch those as well.

If you’ve seen my videos on word stress, then you’ve already heard me talk a little about pitch. Stressed syllables will be higher in pitch, and often a little longer and a little louder than unstressed syllables. And there are certain words that will have a stress within a sentence, content words. And certain words that will generally be unstressed, and those are function words. For information on that, I invite you to watch those videos.

Intonation is the idea that these different pitches across a phrase form a pattern, and that those patterns characterize speech. In American English, statements tend to start higher in pitch and end lower in pitch. You know this if you've seen my video questions vs. statements. In that video, we learned that statements, me, go down in pitch. And questions, me?, go up in pitch at the end. So these pitch patterns across a phrase that characterize a language are little melodies. For example, the melodies of Chinese. If you haven't already seen the blog I did on the podcast Musical Language, I encourage you to take a look at that. It talks about the melody of speech.

Understanding and using correct intonation is a very important part of sounding natural. Even if you're making the correct sounds of American English, if you're speaking in the speech patterns, or intonation, of another language, it will still sound very foreign.

Intonation can also convey meaning or an opinion, an attitude. Let's take for example the statement 'I'm dropping out of school' and the response 'Are you serious?' Are you serious? A question going up in pitch conveys, perhaps, an open attitude, concern for the person. Are you serious? But, are you serious? Down in pitch, more what you would expect of a statement: are you serious? The same words, but when intoned this way, it conveys a judgment. Are you serious? A negative one. I don't agree that you should be dropping out of school. I'm dropping out of school. Are you serious? I'm dropping out of school. Are you serious? With the same words, very different meanings can be conveyed. So intonation is the stress pattern, the pitch pattern, of speech. The melody of speech. If you've read my bio on my website, you know melody is something I'm especially keen on, as I studied music through the master's level. Yes, that was yours truly, thinking a lot about melody. Now, you know that in American English, statements will tend to go down in pitch.

Let’s look at some examples. Here we see two short sentences. Today it’s sunny. I wish I’d been there. And you can see for both of them, that the pitch goes down throughout the sentence. Here we have two longer sentences, and though there is some up and down throughout the sentences, for both sentences, the lowest point is at the end. I’m going to France next month to visit a friend who’s studying there. It’s finally starting to feel like spring in New York.

The software I used to look at the pitch of those sentences is called Praat, and there's a link in the footer of my website, so it's at the very bottom of every page. I hope you're getting a feel for how important intonation is to sounding natural and native in American English. I hope you'll listen for this as you listen to native speakers, and that, if you haven't already done so, you'll go to my website so you can hear these examples several times to get the melody. That's it, and thanks so much for using Rachel's English.
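For readers who want to examine pitch contours the same way, Praat can also be scripted from Python through the parselmouth library. Here is a minimal sketch (the filename is a placeholder):

```python
# pip install praat-parselmouth
import parselmouth

snd = parselmouth.Sound("sentence.wav")   # placeholder recording
pitch = snd.to_pitch()                    # Praat's default pitch tracker
f0 = pitch.selected_array["frequency"]    # F0 in Hz; 0 where unvoiced
times = pitch.xs()                        # frame times in seconds

# Print the voiced frames to see where the contour rises and falls.
for t, f in zip(times, f0):
    if f > 0:
        print(f"{t:.2f} s  {f:.1f} Hz")
```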


The Melody of Speech: What the Melodic Perception of Speech Reveals about Language Performance and Musical Abilities


Section outline:

  • 1. Introduction
  • 1.1. Assessing Musical Abilities
  • 1.2. Assessing Pronunciation Skills and the Melodic Perception of Speech
  • 2. Materials and Methods
  • 2.1. Participants
  • 2.2. Educational Status
  • 2.3. Musical Measurements
  • 2.3.1. Musical Background
  • 2.3.2. Musical Aptitude: Advanced Measures of Music Audiation
  • 2.3.3. Singing Ability
  • 2.4. Language Measurements
  • 2.4.1. Language Background
  • 2.4.2. Language Performance
  • 2.4.3. Melodic Language Ratings
  • 2.5. Short-Term Memory Measurement
  • 2.6. Testing Procedure
  • 2.7. Statistical Analysis and Procedure
  • 3.1. Descriptives of the Measurements
  • 3.2. Statistical Results 1: Relationships among the Selected Variables (Correlations and Regression Models)
  • 3.3. Statistical Results 2: Group Differences for High vs. Low Melodic Language Perceivers (t-Tests for Independent Samples)
  • 3.4. Statistical Results 3: Interactions between the Musical Status and the High and Low Melodic Language Perceivers on the Language Performance Tasks (Two-Way ANOVA)
  • 4. Discussion
  • 4.1. Correlational Analysis and Regression: Pronunciation
  • 4.2. Melodic Perception of Languages and Performance
  • 4.3. Musical Abilities, Musical Status, and the Melodic Perception of Speech
  • 5. Conclusions
  • Supplementary Materials; Author Contributions; Institutional Review Board Statement; Informed Consent Statement; Data Availability Statement; Acknowledgments; Conflicts of Interest

Abbreviations used:

Abbreviation | Meaning
AMMA | Advanced Measures of Music Audiation
AMMA rhythm | Rhythmic AMMA score
AMMA tonal | Tonal AMMA score
ES | Educational status
High melodic LP | High melodic language perceivers
Low melodic LP | Low melodic language perceivers
Melodic P | Mean of the composite score of all five melodic ratings
No of FL | Number of foreign languages spoken
P | Perception
PR | Pronunciation score
PR total | Mean composite score of all five language performance measurements
STM | Short-term memory
Descriptive statistics of the measurements:

Variables | Mean (M) | Standard Deviation (SD)
Melodic ratings for Chinese | 5.85 | 2.39
Melodic ratings for Japanese | 5.83 | 2.45
Melodic ratings for Russian | 5.73 | 2.24
Melodic ratings for Tagalog | 6.91 | 1.92
Melodic ratings for Thai | 4.70 | 2.11
Melodic P (composite of the five melodic ratings) | 5.80 | 1.37
Chinese pronunciation (PR) | 2.37 | 0.83
Japanese pronunciation (PR) | 4.82 | 1.38
Russian pronunciation (PR) | 3.60 | 1.35
Tagalog pronunciation (PR) | 2.36 | 1.18
Thai pronunciation (PR) | 1.64 | 0.73
PR total (composite of the five pronunciation scores) | 2.96 | 0.89
AMMA rhythm | 28.70 | 4.26
AMMA tonal | 25.86 | 5.10
Melodic singing ability | 5.98 | 1.50
Rhythmic singing ability | 6.77 | 1.18
Short-term memory (STM) | 15.23 | 3.84
Correlations among the selected variables:

Variable | Melodic P | Melodic singing ability | Rhythmic singing ability | AMMA tonal | AMMA rhythm | STM | ES | No. of FL
PR total | 0.466 ** | 0.512 ** | 0.501 ** | 0.401 ** | 0.324 ** | 0.503 ** | 0.231 * | 0.503 **
Melodic P | – | 0.168 | 0.181 | 0.203 | 0.225 * | 0.235 * | 0.309 ** | 0.304 **
Melodic singing ability | – | – | 0.964 ** | 0.434 ** | 0.446 ** | 0.244 * | 0.283 ** | 0.370 **
Rhythmic singing ability | – | – | – | 0.419 ** | 0.417 ** | 0.259 * | 0.254 * | 0.370 **
AMMA tonal | – | – | – | – | 0.789 ** | 0.120 | 0.127 | 0.208
AMMA rhythm | – | – | – | – | – | 0.194 | 0.048 | 0.227 *
STM | – | – | – | – | – | – | 0.079 | 0.201
ES | – | – | – | – | – | – | – | 0.367 **
Stepwise regression (dependent variable: pronunciation (PR) total):

Predictor | Partial Correlation (pr) | p-Value
Step 1: R = 0.52, F(1, 80) = 30.25, p < 0.001
No. of FL (foreign lang.) | 0.52 | < 0.001
Step 2: R = 0.65, F(1, 79) = 19.73, p < 0.001
No. of FL (foreign lang.) | 0.49 | < 0.001
STM | 0.45 | < 0.001
Step 3: R = 0.71, F(1, 78) = 12.41, p < 0.001
No. of FL (foreign lang.) | 0.46 | < 0.001
STM | 0.44 | < 0.001
AMMA tonal | 0.37 | < 0.001
Step 4: R = 0.74, F(1, 77) = 8.79, p = 0.004
No. of FL (foreign lang.) | 0.41 | < 0.001
STM | 0.42 | < 0.001
AMMA tonal | 0.34 | 0.002
Melodic P total | 0.32 | 0.004
Step 5: R = 0.77, F(1, 76) = 6.9, p = 0.010
No. of FL (foreign lang.) | 0.33 | –
STM | 0.40 | < 0.001
AMMA tonal | 0.24 | 0.031
Melodic P total | 0.34 | 0.002
Melodic singing ability | 0.29 | 0.010
Group comparisons for low vs. high melodic language perceivers (t-tests for independent samples):

Variables | Low melodic LP: Mean (SE) | High melodic LP: Mean (SE) | t | df | p | r
Chinese PR * | 2.11 (0.12) | 2.62 (0.12) | −3.02 | 84 | < 0.003 | 0.31
Japanese PR * | 4.30 (0.21) | 5.32 (0.18) | −3.68 | 84 | < 0.001 | 0.37
Russian PR | 3.20 (0.19) | 3.97 (0.21) | −2.75 | 84 | < 0.007 | 0.29
Tagalog PR * | 1.98 (0.15) | 2.72 (0.19) | −3.02 | 84 | < 0.003 | 0.31
Thai PR * | 1.33 (0.09) | 1.93 (0.11) | −4.21 | 84 | < 0.001 | 0.42
PR total * | 2.58 (0.12) | 3.33 (0.13) | −4.24 | 84 | < 0.001 | 0.42
Melodic singing ability | 5.72 (0.24) | 6.23 (0.21) | −1.60 | 84 | 0.11 | 0.17
Rhythmic singing ability | 6.57 (0.18) | 6.96 (0.18) | −1.53 | 84 | 0.13 | 0.16
AMMA tonal | 24.95 (0.72) | 26.73 (0.81) | −1.63 | 84 | 0.11 | 0.18
AMMA rhythm | 27.83 (0.68) | 29.52 (0.60) | −1.87 | 84 | 0.07 | 0.20
STM | 14.45 (0.58) | 15.98 (0.58) | −1.87 | 84 | 0.07 | 0.20

Source: Christiner, M.; Gross, C.; Seither-Preisler, A.; Schneider, P. The Melody of Speech: What the Melodic Perception of Speech Reveals about Language Performance and Musical Abilities. Languages 2021, 6, 132. https://doi.org/10.3390/languages6030132



Janáček and speech melodies

Leoš Janáček based vocal melodies in his operas on the concept of nápěvky mluvy (speech melodies)—patterns of speech intonation as they relate to psychological conditions—rather than on a strictly musical basis. He used such melodic motives, characterizing a specific person in a specific dramatic situation, in both vocal and orchestral parts, enabling him to integrate the two parts into a compact unit for the utmost dramatic effect.

This according to "Význam nápěvků pro Janáčkovu operní tvorbu" (The significance of speech melodies in Janáček's operas) by Milena Černohorská, an essay included in Leoš Janáček a soudobá hudba (Leoš Janáček and contemporary music; Praha: Panton, 1963, pp. 77–80).

Janáček found the source of speech melodies in the spoken phrases of people from various social and cultural backgrounds, recorded in real-life situations. During his ethnomusicological research in Moravia and Slovakia in the 1920s, Janáček not only recorded songs and music but also wrote down the melodies of dialogue fragments and of singers' comments on specific songs.

Recently discovered autographs of Janáček's fieldwork notes in the collection of the Etnologický ústav AV ČR, pracoviště Brno, with transcriptions of nápěvky mluvy, were published in Janáčkovy záznamy hudebního a tanečního folkloru. I: Komentáře (Janáček's records of traditional music and dances. I: Commentaries) by Jarmila Procházková (Brno: Etnologický ústav AV ČR, 2006).

Today is Janáček's 160th birthday! Above, examples of nápěvky mluvy that he transcribed in Čičmany, Slovakia, on 20 August 1911; below, the finale of his Jenůfa, a work often cited for its use of the speech-melody concept.



Cantopop and Speech-Melody Complex


It is generally accepted that speech and melody are distinct perceptual categories, and that one is able to overcome perceptual ambiguity to categorize acoustic stimuli as either of the two. This article investigates the speech-melody experience of listening to Cantonese popular songs (henceforth Cantopop songs), a relatively uncharted territory in musicological studies. It proposes a speech-melody complex that embraces native Cantonese speakers' experience of the potentialities of speech and melody before they come into being. The speech-melody complex, I argue, does not stably contain the categories of speech or melody in their full-blown, asserted form, but concerns the ongoingness of the process of categorial molding, which depends on how much contextual information the listeners value in shaping and parsing out the complex. It follows, then, that making a categorial assertion implies a breakthrough of the complex. I then complicate the speech-melody complex with the concept of "anamorphosis" borrowed from the visual arts, a concept that calls into question the signification of the perceived object by perspectival distortion. When reconfigured in the sonic dimension, "anamorphic listening," I suggest, is less about the point at which one listens to some "distorted" sonic object and more about one's processual experience of negotiating the hermeneutic values of their different listening-ases. The listener engages, then, in the process of molding and remolding, creating and negating, the two enigmatic categories, creating new sonic objects along the way. Through my analysis of two Cantopop songs and interviews with native Cantonese speakers, I suggest that Cantopop may invite an anamorphic listening and that, more broadly, it serves as an important yet thus far under-explored genre for theorizing the relationships between music and language.



A Musical Approach to Speech Melody

Ivan Chow

  • Department of Psychology, Neuroscience & Behaviour, McMaster University, Hamilton, ON, Canada

We present here a musical approach to speech melody, one that takes advantage of the intervallic precision made possible by musical notation. Current phonetic and phonological approaches to speech melody either assign localized pitch targets that impoverish the acoustic details of the pitch contours or merely highlight a few salient points of pitch change, ignoring all the remaining syllables. We present here an alternative model using musical notation, which has the advantage of representing the pitch of all syllables in a sentence, as well as permitting a specification of the intervallic excursions among syllables and the potential for group averaging of pitch use across speakers. We tested the validity of this approach by recording native speakers of Canadian English reading unfamiliar test items aloud, spanning from single words to full sentences containing multiple intonational phrases. The fundamental-frequency trajectories of the recorded items were converted from hertz into semitones, averaged across speakers, and transcribed into musical scores of relative pitch. Doing so allowed us to quantify local and global pitch changes associated with declarative, imperative, and interrogative sentences, and to explore the melodic dynamics of these sentence types. Our basic observation is that speech is atonal. The use of a musical score ultimately has the potential to combine speech rhythm and melody into a unified representation of speech prosody, an important analytical feature not found in any current linguistic approach to prosody.
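For reference, the hertz-to-semitone conversion mentioned above is the standard one. Relative to a reference frequency (for instance a speaker's mean F0, one common choice; the specific reference is an analysis decision), a frequency f in hertz maps to:

```latex
\mathrm{st}(f) = 12\,\log_2\!\left(\frac{f}{f_{\mathrm{ref}}}\right)
```

so that one octave corresponds to 12 semitones, and contours from speakers with different voice ranges can be averaged as speaker-independent intervals.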

Introduction

It is common to refer to the pitch properties of speech as "speech melody" in the study of prosody (Bolinger, 1989; Nooteboom, 1997; Ladd, 2008). However, is this simply a metaphorical allusion to musical melodies, or does speech actually have a similar system of pitch relations as music? If it does not, what is the nature of speech's melodic system compared to that of music? A first step toward addressing such questions is to look at speech and music using the same analytical tools and to examine speech as a true melodic system comprised of pitches (tones) and intervals. This is the approach that we aim to implement and test in the present study. In fact, it was the approach that was adopted in the first theoretical treatise about English intonation, namely Joshua Steele's An Essay Toward Establishing the Melody and Measure of Speech to be Expressed and Perpetuated by Peculiar Symbols, published in 1775. Steele laid out a detailed musical model of both the melody and rhythm of speech (we will only concern ourselves with the melodic concepts here). He represented syllabic pitch as a relative-pitch system using a musical staff and a series of "peculiar symbols" that would represent the relative pitch and relative duration of each spoken syllable of an utterance. The key innovation of Steele's approach from our standpoint is that he attempted to represent the pitches of all of the syllables in the sentences that he analyzed. Another advantage of his approach is that his use of the musical score allowed for both the rhythm and melody of speech to be analyzed, both independently of one another and interactively.

This is in stark contrast to most contemporary approaches to speech melody in linguistics that highlight a subset of salient syllabic pitches and thereby ignore all the rest of the melodic signal in a sentence, assuming a process of interpolation between those salient pitches. Many such approaches are based on qualitative labeling of pitch transitions, rather than acoustic quantification of actual pitch changes occurring in an utterance. At present, no musical elements are incorporated into any of the dominant phonetic or phonological models of speech melody. These models include autosegmental metrical (AM) theory (Bruce, 1977; Pierrehumbert, 1980; Beckman and Pierrehumbert, 1986; Gussenhoven, 2004; Ladd, 2008), the command-response (CR) model (Fujisaki and Hirose, 1984; Fujisaki et al., 1998; Fujisaki and Gu, 2006), and the "parallel encoding and target approximation" model (Xu, 2005; Prom-on et al., 2009). Perhaps the closest approximation to a musical representation is Mertens' (2004) Prosogram software, which automatically transcribes speech melody and rhythm into a series of level and contoured tones (see also Mertens and d'Alessandro, 1995; Hermes, 2006; Patel, 2008). Prosogram displays pitch measurements for each syllable by means of a level, rising, or falling contour, where the length of each contour represents syllabic duration (Mertens, 2004). However, this seems to be mainly a transcription tool, rather than a theoretical model for describing the melodic dynamics of speech.

Prosody vs. Speech Melody vs. Intonation

Before comparing the three dominant models of speech melody with the musical approach that we are proposing (see next section), we would first like to define the important terms "prosody," "speech melody," and "intonation," and discuss how they relate to one another, since these terms are often erroneously taken to be synonymous. "Prosody" is an umbrella term that refers to variations in all suprasegmental parameters of speech, including pitch but also duration and intensity. By contrast, "speech melody" and "intonation" refer strictly to the pitch changes associated with speech communication, where "intonation" is a more restrictive term than "speech melody". "Speech melody" refers to the pitch trajectory associated with utterances of any length. This term does not entail a distinction as to whether pitch is generated lexically (tone) or post-lexically (intonation), or whether the trajectory (or a part thereof) serves a linguistic or paralinguistic function.

While "speech melody" refers to all pitch variations associated with speech communication, "intonation" refers specifically to the pitch contour of an utterance generated post-lexically and associated with the concept of an "intonational phrase" (Ladd, 2008). Ladd (2008) defines intonation as a linguistic term that involves categorical discrete-to-gradient correlations between pattern and meaning. Intonation differs from pitch changes associated with "tones" or "accents", which are determined lexically and which are associated with the syllable. By contrast, paralinguistic meanings (e.g., emotions and emphatic force) involve continuous-to-gradient correlations (Ladd, 2008). For example, the angrier someone is, the wider the pitch range and intensity range of their speech (Fairbanks and Pronovost, 1939; Murray and Arnott, 1993).

Contemporary Phonological Models of Speech Melody

In this section, we review three dominant models of speech melody: AM theory, the CR model, and the parallel encoding and target approximation (PENTA) model. Briefly, AM theory only highlights phonologically salient melodic excursions associated with key elements in intonational phrases, including pitch accents and boundary tones (Pierrehumbert, 1980; Liberman and Pierrehumbert, 1984). On the other hand, CR imitates speech melody by mathematically generating pitch contours, and connecting pitch targets so as to create peaks and valleys along a gradually declining line (Cohen et al., 1982; Fujisaki, 1983). Finally, PENTA assigns a pitch target to each and every syllable of an intonational phrase. Each target is mathematically derived from a number of factors, including lexical stress, narrow focus, modality, and position of the syllable within an intonational phrase. The final pitch contour is then generated as an approximation of the original series of pitch targets, in which distance between pitch targets is reduced due to contextual variations (Xu, 2005, 2011).

Autosegmental Metrical Theory

The ToBI (Tone and Break Index) system of prosodic notation builds on assumptions made by AM theory (Pierrehumbert, 1980; Beckman and Ayers, 1997). Phonologically salient prosodic events are marked by pitch accents (represented in ToBI as H*, where H means high) at the beginning and middle of an utterance; the end is marked by a boundary tone (L-L%, where L means low); and the melodic contour of the entire utterance is formed by interpolation between pitch accents and the boundary tone. Under this paradigm, pitch accents serve to mark local prosodic events, including topic words, narrow focus, and lexical stress. Utterance-final boundary tones serve to convey modality (i.e., question vs. statement; continuity vs. finality). Pitch accents and boundary tones are aligned with designated stressed syllables in the utterance and are marked with a high (H) or low (L) level tone. In addition, pitch accents and boundary tones can further combine with a series of H and L tones to convey different modalities, as well as other subtle nuances in information structure (Hirschberg and Ward, 1995; Petrone and Niebuhr, 2014; German and D'Imperio, 2016; Féry, 2017). Consequently, the melodic contour of an utterance is defined by connecting pitch accents and boundary tones, whereas strings of syllables between pitch accents are unspecified with regard to tone and are treated as transitions. AM is considered a "compositional" method that treats prosody as a generative and combinatorial system whose elements consist of the abovementioned tone types. This compositionality might suggest a mechanistic similarity to music, with its combinatorial system of scaled pitches. However, the analogy ultimately does not work, in large part because the tones of ToBI analyses are highly underspecified at the pitch level: the directionality of pitch movement is marked, but not the magnitude of the change.
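As a schematic illustration (our own simplified example, following common MAE-ToBI conventions), a declarative and a yes/no question over the same words differ only in their sparse tonal specification, with everything between the pitch accent and the edge tones left to interpolation:

```text
Statement:  Marianna made the marmalade.
                H*                  L-L%    (high accent, falling edge)

Question:   Marianna made the marmalade?
                L*                  H-H%    (low accent, rising edge)
```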

Command-Response Model

Fujisaki and Hirose (1984) and Fujisaki and Gu (2006) proposed the CR model based on the physiological behavior of the human vocal apparatus. In this model, declination is treated as the backbone of the melodic contour (Cohen et al., 1982; Fujisaki, 1983). Declination is a reflection of the physiological conditions of phonation: speech occurs during exhalation. As the volume of air in the lungs decreases, the amount of air passing through the larynx also decreases, as does the driving force for vocalization, namely subglottal pressure. This results in a decrease in the frequency of vocal-fold vibration. The CR model replicates this frequency change by way of a gradual melodic downtrend as the utterance progresses. In this model, the pitch range of the melodic contour is defined by a topline and a baseline. Both lines decline as the utterance progresses, although the topline declines slightly more rapidly than the baseline, making the overall pitch range gradually narrower (i.e., more compressed) over time. In addition to declination, tone commands introduce localized peaks and valleys along the global downtrend. Although tone commands do not directly specify the target pitch of the local peaks and valleys, they are expressed as mathematical functions that indicate the strength and directionality of these localized pitch excursions. AM and CR are similar in that pitch contours are delineated by sparse tonal specifications, and in that syllables between tone targets are treated as transitions whose pitches are unspecified. However, the two models differ in that tone commands in the CR model are not necessarily motivated by phonologically salient communicative or linguistic functions. These commands are used only to account for pitch variations in order to replicate the observed pitch contours. This difference thus renders the CR model largely descriptive (phonetic), rather than interpretive (phonological), as compared with AM theory.
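For reference, the standard formulation of the model (after Fujisaki and Hirose, 1984) expresses the log-F0 contour as a baseline plus the superposed responses to phrase and accent (tone) commands; we reproduce it here from the published literature rather than from the text above:

$$\ln F_0(t) = \ln F_b + \sum_{i=1}^{I} A_{pi}\, G_p(t - T_{0i}) + \sum_{j=1}^{J} A_{aj} \left[ G_a(t - T_{1j}) - G_a(t - T_{2j}) \right]$$

$$G_p(t) = \begin{cases} \alpha^2 t\, e^{-\alpha t} & t \ge 0 \\ 0 & t < 0 \end{cases} \qquad G_a(t) = \begin{cases} \min\left[ 1 - (1 + \beta t)\, e^{-\beta t},\; \gamma \right] & t \ge 0 \\ 0 & t < 0 \end{cases}$$

where $F_b$ is the speaker's baseline frequency, $A_{pi}$ and $A_{aj}$ are the magnitudes of the phrase and accent commands, $T_{0i}$, $T_{1j}$, and $T_{2j}$ are their onset and offset times, and $\alpha$, $\beta$, and $\gamma$ are constants governing the impulse and step responses.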

Parallel Encoding and Target Approximation Model

PENTA (Xu, 2005; Prom-on et al., 2009) takes an articulatory-functional approach to representing speech melody. It aims to explain how speech melody works as a system of communication. Recognizing the fact that different communicative functions are simultaneously conveyed by the articulatory system, PENTA begins with a list of these functions and encodes them in a parallel manner. Each syllable obligatorily carries a tone target. The resulting melodic movement for each syllable is generated as an approximation of a level or dynamic tone target. The pitch target of each syllable is derived from its constituent communicative functions, which coexist in parallel (e.g., lexical, sentential, and focal). Pitch targets are then implemented in terms of contextual distance, pitch range, strength, and duration. The implementation of each pitch target is said to be approximate, as pitch movements are subject to contextual variations. According to Xu and Xu (2005), the encoding process can be universal or language-specific. In addition, this process can vary due to interference between multiple communicative functions when it comes to the rendering of the eventual melodic contour. In other words, how well the resulting contour resembles the target depends on factors such as contextual variation (anticipatory or carry-over, assimilatory or dissimilatory) and articulatory effort. PENTA is similar to the CR model in that the fundamental-frequency (F0) trajectory of an utterance is plotted as “targets” based on a number of parameters. Such parameters include the directionality of the pitch changes, the slope of the pitch target, and the speed at which a pitch target is approached. Nonetheless, PENTA sets itself apart from CR and AM in that it establishes a tone target for every syllable, whereas CR and AM only assign pitch accents/targets to syllables associated with localized, phonologically salient events (e.g., pitch accents, boundary tones).
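The sketch below illustrates the core idea of syllable-by-syllable target approximation. It is a deliberately simplified first-order version of our own devising (the published qTA implementation of PENTA uses a third-order critically damped system), with all parameter values invented:

```python
import numpy as np

def approximate_target(m, b, f0_start, duration, rate=40.0, n=50):
    """Move F0 exponentially from its starting value toward a linear
    pitch target y(t) = m*t + b over the course of one syllable. A
    first-order lag is used here purely for illustration."""
    t = np.linspace(0.0, duration, n)
    target = m * t + b                        # the underlying pitch target
    f0 = target + (f0_start - b) * np.exp(-rate * t)
    return f0

# Two consecutive syllables: a high level target (220 Hz), then a falling
# target. Each syllable starts where the previous one left off, which is
# how carry-over contextual variation arises in this framework.
syll1 = approximate_target(m=0.0, b=220.0, f0_start=180.0, duration=0.2)
syll2 = approximate_target(m=-200.0, b=200.0, f0_start=syll1[-1], duration=0.2)
print(f"end of syllable 1: {syll1[-1]:.1f} Hz; end of syllable 2: {syll2[-1]:.1f} Hz")
```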

Perhaps the only contemporary system that combines rhythm and melody in the same analysis is Rhythm and Pitch, or RaP (Dilley and Brown, 2005; Breen et al., 2012). While based largely on AM’s system of H’s and L’s to represent tones, Breen et al. (2012, p. 277) claim that RaP differs from ToBI in that it “takes into account developments in phonetics, phonology and speech technology since the development of the original ToBI system.” Instead of using numbers to represent relative boundary strength, as on the “Breaks” tier in ToBI, RaP uses “X” and “x” to mark three degrees of prominence (strong beat, weak beat, and no beat), as well as “))” and “)” to mark two degrees of boundary strength. On the “rhythm” tier, strong beats are assigned to lexically stressed syllables based on metrical phonology (Nespor and Vogel, 1986, 2007). In addition, the assignment of prominence follows the “obligatory contour principle” (Leben, 1973; Yip, 1988) by requiring that prominent syllables be separated from one another by at least one non-prominent syllable, as well as by differences in the phonetic realization of content vs. function words. Although RaP sets itself apart from other systems by acknowledging the existence of rhythm and beats (i.e., pockets of isochronous units in lengthy syllable strings) as perceptual parameters, it still treats rhythm and pitch as completely separate, rather than integrated, parameters, and makes no provision for analyzing or accounting for potential interactions between syllabic duration and pitch.

Toward a Musical Approach

While all of the linguistic models discussed here claim to represent speech prosody, they largely ignore the fact that speech rhythm is integral to speech prosody and that rhythm and melody interact. As such, these models succeed at representing only some aspects of speech prosody, and are limited in their ability to capture the larger picture. The use of musical notation to represent speech prosody offers several advantages over AM theory and PENTA. First, the use of the semitone-based chromatic scale provides a more precise characterization of speech melody than the impoverished system of acoustically unspecified H and L tones found in ToBI transcriptions. As pointed out by Xu and Xu (2005), AM theory is strictly a linear model in that the marking of one tone as H or L essentially depends on the pitch of its adjacent syllables (tones). It is hence impossible to analyze speech melody beyond the scope of three syllables under the AM paradigm. In addition, the use of semitones might in fact provide a superior approach to describing speech melody than plotting pitch movements in hertz, since semitones correspond to the logarithmic manner in which pitches (and by extension intervals) are perceived by the human ear, although the auditory system clearly has a much finer pitch-discrimination accuracy than the semitone (Oxenham, 2013). Furthermore, musical notation can simultaneously represent both the rhythm and melody of speech using a common set of symbols, a feature that no current linguistic model of speech prosody can aspire to. As such, the use of musical notation not only provides a new and improved paradigm for modeling speech melody in terms of intervals, but it also provides a more precise and user-friendly approach that can be readily integrated into current prosody research to further our understanding of the correspondence between prosodic patterns and their communicative functions. Speech melody denoted by musical scores can be readily learned and replicated by anyone trained in reading such scores. As a result, transcribing speech prosody with musical notation could ultimately serve as an effective teaching tool for learning the intonation of a foreign language.

Finally, with regard to the dichotomy in linguistics between “phonetics” and “phonology” (Pierrehumbert, 1999), we believe that the use of musical notation to represent speech melody should be first and foremost tested as a phonetic system guided by the amount of acoustic detail present in the observed melodic contours. These details presumably serve to express both linguistic and paralinguistic functions. The correspondence between specific prosodic patterns and their meanings would then fall under the category of phonological research, using the musical approach as a research tool to further understand the communicative functions of speech melody. For example, the British school of prosodic phonology has historically taken a compositional approach to representing speech melody and its meaning, in which melody is composed of tone-units. Each tone-unit contains one of six possible tones (Halliday, 1967, 1970; O’Connor and Arnold, 1973; among others) – high-level, low-level, rise, fall, rise-fall, and fall-rise – each of which conveys a specific type of pragmatic information. For example, the fall-rise often suggests uncertainty or hesitation, whereas the rise-fall often indicates that the speaker is surprised or impressed. The length of a tone-unit spans from a single word to a complete sentence. The “tonic syllable” is the essential part of the tone-unit that carries one of the six abovementioned tones. Stressed syllables preceding the tonic are referred to as “heads”; unstressed syllables preceding the head are referred to as “pre-heads.” Finally, unstressed syllables following the tonic are referred to as the “tail.”

The principal aim of the current study is to examine the utility of a musical approach to speech melody and to visualize the results quantitatively as plots of relative pitch using musical notation. In this vocal-production study, we had 19 native speakers of Canadian English read aloud a series of 19 test items, spanning from single words to full sentences containing multiple intonational phrases. These sentences were designed to examine declination, modality, narrow focus, and utterance-final boundary tones. We chose to analyze these particular features because their correspondence to linguistic meaning is relatively well-defined and because their implementation is context-independent. In other words, the melodic patterns associated with the test sentences remain stable when the sentences are placed within various hypothetical social contexts (Grice and Baumann, 2007; Prieto, 2015). We transcribed participants’ melodic contours into relative-pitch representations down to the level of the semitone using musical notation. The aim was to provide a detailed quantitative analysis of the relative-pitch properties of the test items, demonstrate mechanistic features of sentence melody (such as declination, pitch accents, and boundary effects), and highlight the utility of the method for averaging productions across multiple speakers and visualizing the results on a musical staff. In doing so, this analysis would help revive the long-forgotten work of Steele (1775) and his integrative method of representing both speech rhythm and melody using a common system of musical notation. A companion musical model of speech rhythm using musical notation is presented elsewhere (Brown et al., 2017).

Materials and Methods

Participants

Nineteen participants (16 females, mean age 19.8) were recruited from the introductory psychology mass-testing pool at McMaster University. Eighteen of them were paid a nominal sum for their participation, while one was given course credit. All of them were native speakers of Canadian English. Two-thirds of the participants had school training or family experience in a second language. Participants gave written informed consent for taking part in the study, which was approved by the McMaster Research Ethics Board.

Test Corpus

Participants were asked to read a test corpus of 19 test items ranging from single words to various types of sentences, as shown in Table 1. This corpus included declarative sentences, interrogatives, an imperative, and sentences with narrow focus specified at different locations. The purpose of using this broad spectrum of sentences was to analyze different prosodic patterns in order to construct a new model of speech melody based on a syllable-by-syllable analysis of pitch.


TABLE 1. Sentences in the test corpus.

In addition to examining the melody of full sentences, we used a building-block approach that we call a “concatenation” technique in order to observe the differences in the pitch contours of utterances between (1) citation form (i.e., a single word all on its own), (2) phrase form, and (3) a full sentence, which correspond, respectively, to the levels of prosodic word, intermediate phrase, and intonational phrase in the standard phonological hierarchy (Nespor and Vogel, 1986). For example, the use of the concatenation technique resulted in the generation of corpus items that spanned from the single words “Yellow” and “Telephone,” to the adjectival phrase “The yellow telephone,” to the complete sentences “The yellow telephone rang” and “The yellow telephone rang frequently.” This allowed us to compare the tune of “yellow” in citation form to that in phrases and sentences. Gradually increasing the length of the sentences allowed us to observe the corresponding pitch changes for all the words in the sentences.

Before the experiment began, participants filled out questionnaires. They were then brought into a sound-attenuated booth and seated in front of a computer screen. Test sentences were displayed using Presentation® software (Neurobehavioral Systems, Albany, CA, United States). All vocal recordings were made using a Sennheiser tabletop microphone and recorded at a 44.1 kHz sampling rate as 16-bit WAV files using Presentation’s internal recording system. Before the test sentences were read, participants performed warm-up tasks that served to assess their vocal range and habitual vocal pitch. These included throat clears, coughs, sweeps to the highest and lowest pitches, and the reading of the standard “Grandfather” passage.

Participants were next shown the items of the test corpus on a computer screen and were asked to read them aloud in an emotionally neutral manner as if they were engaging in a casual conversation. The 19 items were presented in a different random order for each participant. Each item was displayed on the screen for a practice period of 10 s during which the participant could practice saying it out loud. After this, a 10 s recording period began as the participant was asked to produce the utterance fluently twice without error. The second one was analyzed. In the event of a speech error, participants were instructed to simply repeat the item. For words that were placed under narrow focus, the stressed word or syllable was written in capital letters (e.g., “My ROOMmate had three telephones”).

In order to transcribe the pitch contour of the recorded speech, we analyzed the F0 trajectory of the digitized speech signal using Praat (Boersma and Weenink, 2015), an open-source program for the acoustic analysis of speech. Steady-state parts of the voiced portion of each syllable – including the vowel and preceding voiced consonants – were manually delineated, and the average pitch (in Hz) was extracted. This was done manually for all 2,337 syllables (123 syllables × 19 participants) in the dataset. In a number of situations, the terminal pitch of a test item was spoken in creaky voice, such that a reliable pitch measurement was not obtainable for that syllable. When this occurred, it affected either the last syllable of a single word spoken in citation form or the last syllable of the final word of a multi-word utterance. In both cases, it was necessary to discard the entire item from the dataset: while the preceding syllabic pitches could be estimated with accuracy, the absence of the last syllable would render the last interval in the group analysis inaccurate if the other syllables were included. This affected 13% of the 361 test items (19 items × 19 participants).
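For readers who wish to reproduce this kind of measurement, the sketch below shows one possible workflow using parselmouth, a Python interface to Praat. The file name and time window are placeholders, and the automated pitch track stands in for the manual delineation described above:

```python
import parselmouth  # Python interface to Praat

# Placeholder file name and a hypothetical, manually delineated voiced
# span (in seconds) for one syllable.
snd = parselmouth.Sound("speaker01_item05.wav")
pitch = snd.to_pitch()  # Praat's default autocorrelation-based F0 tracker

t_start, t_end = 0.42, 0.55  # hypothetical steady-state voiced portion
times = pitch.xs()
f0 = pitch.selected_array["frequency"]  # 0.0 wherever no pitch is found

mask = (times >= t_start) & (times <= t_end) & (f0 > 0)
mean_f0 = f0[mask].mean() if mask.any() else float("nan")
print(f"mean F0 over the syllable: {mean_f0:.1f} Hz")
```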

Pitch changes (intervals) were converted from Hz into “cents change” using the participant’s habitual pitch as the reference for the conversion, where 100 cents is equal to one equal-tempered semitone in music. Conversion from Hz to semitones allows for a comparison of intervals across gender and age (Whalen and Levitt, 1995), as well as for group averaging of production. In order to obtain an estimate of a participant’s habitual pitch, we took the mean frequency of the productions of all the items in the test corpus, excluding entire items that were discarded due to creaky voice. Musical intervals were assigned after the group averaging had been completed. Intervals were assigned to the nearest semitone, assuming the 12-tone chromatic scale, where a ±50-cent criterion separated adjacent chromatic pitches. It is important to note that our transcriptions are no more accurate than the semitone level and that we did not attempt to capture microtonality in the speech signal. Hence, it sufficed for us to assign an interval to the closest reasonable semitone. For example, a major second, which is a 200-cent interval, was defined by pitch transitions occurring anywhere in the span from 150 to 249 cents. It is also important to note that “quantization” to the nearest interval was only ever done with the group data, and that all single-participant data were kept in their raw form in cents throughout all analyses. For the full-corpus analysis of intervals presented in Figure 8, intervals are shown in raw form without any rounding to semitone categories.
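The conversion and the quantization step are simple enough to state in a few lines of code. The sketch below is a minimal illustration using invented frequency values; the function names are ours:

```python
import numpy as np

def hz_to_cents(f_hz, ref_hz):
    """Convert a frequency to cents relative to a reference pitch
    (100 cents = one equal-tempered semitone)."""
    return 1200.0 * np.log2(f_hz / ref_hz)

def quantize_to_semitone(cents):
    """Round a cents value to the nearest semitone using the +/-50-cent
    criterion described above; applied only to group-averaged data."""
    return 100.0 * np.round(cents / 100.0)

# Hypothetical example: a speaker with a habitual pitch of 210 Hz
# produces syllables at 235 Hz and 198 Hz.
habitual = 210.0
for f in (235.0, 198.0):
    c = hz_to_cents(f, habitual)
    print(f"{f:.0f} Hz -> {c:+.0f} cents -> {quantize_to_semitone(c):+.0f} cents")
```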

As a normalization procedure for the group results, the intervals were averaged across the 19 speakers and then placed onto a treble clef for visualization, with middle G arbitrarily representing the mean habitual pitch of the speakers. Transcriptions were made with Finale PrintMusic 2014.5. Note that this approach presents a picture of the relative pitch – but not the absolute pitch – of the group’s productions, where the absolute pitch was approximately one octave (females) or two octaves (males) lower than what is represented. Virtually all of the single-participant productions fit within the range of a single octave, represented in our transcriptions as a span from middle C to the C one octave above, resulting in roughly equal numbers of semitones on either side of the G habitual pitch. For the transcriptions presented in Figures 1–7, only sharps are used to indicate non-diatonic pitches in a C major context. In addition, sharps apply only to the measure they are contained in and do not carry over to the next measure of the transcription.
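As a final illustration of the transcription step, the helper below maps a group-averaged semitone offset onto a chromatic note name, anchored at the G above middle C (MIDI note 67), following the convention just described. The helper is our own sketch, not part of any notation software:

```python
# Map a group-averaged semitone offset (relative to the habitual pitch,
# notated as the G above middle C) onto a chromatic note name for staff
# transcription.
CHROMATIC = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def offset_to_note(semitones_from_habitual, anchor_midi=67):  # 67 = G4
    midi = anchor_midi + int(semitones_from_habitual)
    name = CHROMATIC[midi % 12]
    octave = midi // 12 - 1          # MIDI convention: note 60 -> C4
    return f"{name}{octave}"

print(offset_to_note(0))   # G4  (habitual pitch)
print(offset_to_note(+2))  # A4  (two semitones above)
print(offset_to_note(-3))  # E4  (three semitones below)
```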


FIGURE 1. Concatenation of “yellow” and “telephone” to form “The yellow telephone.” The figure shows the average relative-pitch changes associated with the syllables of the test items. For Figures 1–7, the transcriptions are shown using a treble clef, with the habitual pitch arbitrarily assigned to middle G. All intervals are measured in semitones with reference to participants’ habitual pitch, down to the nearest semitone, as based on a 50-cent pitch window around the interval. (A) Citation form of “yellow.” (B) Citation form of “telephone.” (C) Concatenation to form the adjectival phrase “The yellow telephone.” Notice the contour reversal for “llow” (red box) compared to the citation form in (A). The curved line above the staff indicates a melodic arch pattern in Figures 1–7.


FIGURE 2. Concatenation of “Saturday” and “morning” to form “Saturday morning.” (A) Citation form of “Saturday.” (B) Citation form of “morning.” (C) Concatenation to form the phrase “Saturday morning.” Notice the contour reversal for “tur” and “day” (red box) compared to the citation form in (A).


FIGURE 3. Expansion of “Alanna” to “Alanna picked it up.” (A) Citation form of “Alanna.” (B) Expansion to create the sentence “Alanna picked it up.” Notice the contour reversal for “nna” (red box) compared to the citation form in (A).


FIGURE 4. Expansion to form longer sentences starting with “The yellow telephone.” As words are added to the end of the sentence, declination is suspended on the last syllable of the previous sentence such that the characteristic drop between the penultimate and final syllables can serve to mark the end of the sentence at the newly added final word. (A) Melodic pattern for “The yellow telephone.” (B) Melodic pattern generated by adding the word “rang” to the end of the sentence in (A). The red box highlights the difference in pitch height between the syllables “le-phone” in (A,B), demonstrating the suspension of declination occurring on these syllables in (B). (C) Melodic pattern generated by adding the word “frequently” to the end of the sentence in (B). The red box around “phone rang” highlights the difference in pitch height between these syllables in (B,C). The point of suspension of declination has moved from “phone” to “rang” in (C).


FIGURE 5. Melodic contours for long sentences consisting of two intonational phrases, characterized by two melodic arches. Both sentences in this figure contain phrases based on items found in Figures 2–4. The sentence in (A) combines sentence B in Figure 4 and sentence B in Figure 3. The sentence in (B) combines sentence C in Figure 2 and sentence B in Figure 4. The melodic contour of a long sentence consisting of two intonational phrases shows two arched patterns, similar to those in the sentences presented in Figure 4. These sentences provide further evidence of contour reversals, melodic arches, suspensions of declination, and terminal drops. See text for details.


FIGURE 6. Identical sentences but with narrow focus placed sequentially on different words. All four panels have the same string of words, but with narrow focus placed on (A) my, (B) roommate, (C) three, or (D) telephone. Pitch rises are observed on the focus word in all instances but the last one. The symbol “>” signifies a point of focus or accent. For ease of presentation, only the stressed syllables of roommate and telephone are shown in block letters in the transcription.


FIGURE 7. Sentence modality. A comparison is made between (A) an imperative sentence, (B) a yes–no interrogative, and (C) a WH-interrogative. Of interest here is the absence of declination for the imperative statement in (A), as well as the large terminal rise at the end of the yes–no interrogative in (B).

All of the transcriptions in Figures 1–7 are shown with proposed rhythmic transcriptions in addition to the observed melodic transcriptions. While the purpose of the present study is to quantify speech melody, rather than speech rhythm, the study is in fact a companion to a related one about speech rhythm (Brown et al., 2017). Hence, we took advantage of the insights of that study to present approximate rhythmic transcriptions of the test items in all of the figures. However, it is important to keep in mind that, while the melodic transcriptions represent the actual results of the present study, the rhythmic transcriptions are simply approximations generated by the second author and are in no way meant to represent the mean rhythmic trend of the group’s productions as based on timing measurements (as they do in Brown et al., 2017). In other words, the present study was not devoted to integrating our present approach to speech melody with our previous work on speech rhythm, which will be the subject of future analyses.

The results shown here are the mean pitches relative to each participant’s habitual pitch, where the habitual pitch is represented as middle G on the musical staff. While we do not report variability values in the transcriptions, we did measure the standard deviation (SD) for each syllable. The mean SD across the 123 syllables in the dataset was 132 cents, or 1.32 semitones. For all 19 test items, the last syllable always had the largest SD value. When the last syllable of the test items was removed from consideration, the SD decreased to 120 cents, or 1.2 semitones.

Phrasal Arches

Figures 1A,B show the citation forms of two individual words having initial stress, namely yellow (a two-syllable trochee) and telephone (a three-syllable dactyl). As expected for words with initial stress, there is a pitch rise on the first syllable (Lieberman, 1960; Cruttenden, 1997; van der Hulst, 1999), followed by a downtrend of either two semitones (yellow) or three semitones (telephone). Figure 1C shows the union of these two words to form the adjectival phrase “The yellow telephone.” Contrary to a prediction based on a simple concatenation of the citation forms of the two words (i.e., two sequential downtrends), there is instead a contour reversal for yellow, such that there is now a one-semitone rise in pitch between the two syllables (red box in Figure 1C), rather than a two-semitone drop. “Telephone” shows a slight compression of its pitch range compared to citation form, but no contour reversal. The end result of this concatenation to form an adjectival phrase is a melodic arch pattern (shown by the curved line above the staff in Figure 1C), with the pitch peak occurring, paradoxically, on the unstressed syllable of yellow. The initial and final pitches of the phrase are nearly the same as those of the two words in citation form.

Figures 2A,B show a similar situation, this time with the initial word having three syllables and the second word having two syllables. As in Figure 1, the citation forms of the words show the expected downtrends: three semitones for Saturday and two semitones for morning. Similar to Figure 1, the joining of the two words to form a phrase results in a contour change, this time a flattening of the pitches for Saturday (Figure 2C, red box), rather than the three-semitone drop seen in citation form. A similar type of melodic arch is seen here as for “The yellow telephone.” As with that phrase, the initial and final pitches of the phrase are nearly the same as those of the two contributing words in citation form. “Morning” shows a small compression, as was seen for “telephone” in Figure 1C.

Figure 3 presents one more example of the comparison between citation form and phrasal concatenation, this time where the word of interest does not have initial stress: the proper name Alanna (an amphibrach foot). Figure 3A demonstrates that, contrary to expectations, there is no pitch rise on the second (stressed) syllable of the word; instead, that syllable was spoken with a pitch identical to that of the first syllable. This is followed by a two-semitone downtrend toward the last syllable of the word. Adding words to create the sentence “Alanna picked it up” again produces a contour reversal, creating a melodic arch centered on the unstressed terminal syllable of Alanna (Figure 3B, red box).

Sentence Arches

Figure 4 picks up where Figure 1 left off. Figure 4A reproduces the melody of the phrase “The yellow telephone” from Figure 1C. The next two items create successively longer sentences by adding words to the end, first the word “rang” and then the word “frequently.” Figure 4B shows that the downtrend on “telephone” that occurred when “telephone” was the last word of the utterance is minimized. Instead, there is a suspension of declination by a semitone (with reference to the absolute pitch, even though the interval between “le” and “phone” is the same in relative terms). The downtrend then gets shifted to the last word of the sentence, where a terminal drop of a semitone is seen. Figure 4C shows a similar phenomenon, except that the word “rang” is now part of the suspended declination. The downtrend in absolute pitch now occurs on “frequently,” ending the sentence slightly below the version ending in “rang.” Overall, we see a serial process of suspension of declination as the sentence gets lengthened. One observation that can be gleaned from this series of sentences is that the longer the sentence, the lower the terminal pitch, suggesting that longer sentences tend to have a larger pitch range than shorter sentences. This is also shown by the fact that “yellow” attains a higher pitch in this sentence than in the shorter sentences, resulting in an overall range of five semitones, compared to three semitones for “The yellow telephone.” Hence, for longer sentences, expansions occur at both ends of the pitch range, not just at the bottom.

Figure 5 compounds the issue of sentence length by examining sentences with two distinct intonational phrases, each sentence containing a main clause and a subordinate clause. The transcriptions now contain two melodic arches, one for each intonational phrase. For illustrative purposes, the phrases of these sentences were all designed to contain components that are found in Figures 1–4. For the first sentence (Figure 5A), the same suspension of declination occurs on the word “rang” as was seen in Figure 4C. That this is indeed a suspension process is demonstrated by the fact that the second intonational phrase (the subordinate clause) starts on the last pitch of the first one. The second phrase shows a similar melody to that same sentence in isolation (Figure 3B), but the overall pattern is shifted about two semitones downward and the pitch range is compressed, reflecting the general process of declination. Finally, as with the previous analyses, contour reversals are seen for both “yellow” and “Alanna” compared to their citation forms, creating melodic arches.

A very similar set of melodic mechanisms is seen for the second sentence (Figure 5B). A suspension of declination occurs on “morning” (compared to its phrasal form in Figure 2C), and the second intonational phrase starts just below the pitch of “morning.” The phrase “On Saturday morning” shows an increase in pitch height compared to its stand-alone version (Figure 2B). In the latter, the pitches for Saturday are three unison pitches, whereas in the longer sentence, the pitches for Saturday rise two semitones, essentially creating an expansion of the pitch range for the sentence. This suggests that longer sentences map out larger segments of pitch space than shorter sentences and that speakers are able to plan ahead by creating the necessary pitch range when a long utterance is anticipated. The second phrase, “the yellow telephone rang,” has a similar, though compressed, intervallic structure compared to when it was a stand-alone sentence (Figure 4B), indicating declination effects. In addition, the phrase occurs lower in the pitch range (by 1–2 semitones) compared to both the stand-alone version and its occurrence in the first phrase of Figure 5A, as can be seen by the fact that the transition from “phone” to “rang” is G to F# in the first sentence and F to E in the second. Overall, for long sentences consisting of two intonational phrases, the melody of the first phrase seems to be located in a higher pitch range and shows larger pitch excursions compared to the second intonational phrase, which is both lower in range and compressed in interval size. In other words, more melodic movement happens in the first phrase. As was seen for the set of sentences in Figure 4, expansions in pitch range for longer sentences occur at both ends of the range, not just at the bottom.

Narrow Focus

Figure 6 examines the phenomenon of narrow focus, where a given word in the sentence is accented in order to place emphasis on its information content. Importantly, the same string of words is found in all four sentences in the figure. All that differs is the locus of narrow focus, which was indicated to participants using block letters for the word in the stimulus sentences. Words under focus are well known to have pitch rises, and this phenomenon is seen in all four sentences, where a pitch rise is clearly visible on the word under focus and, more specifically, on its stressed syllable in the case of the polysyllabic words “roommate” and “telephone.” All sentences showed terminal drops between “le” and “phones,” although this drop was largest in the last sentence, where the pitch rise occurred on “telephone” and thereby led to an unusual maintenance of high pitch at the end of the sentence. Perhaps the major point to be taken from the results in Figure 6 is that each narrow-focus variant of the identical string of words had a different melody. Another interesting effect is the contour inversion for “roommate” that occurs when this word precedes the pitch accent (the one-semitone rise in Figures 6C,D), compared to when it follows it (Figure 6A) or is part of it (Figure 6B). This suggests that, in the former cases, speakers maintain their pitch in the high range in preparation for an impending pitch accent later in the sentence.

Sentence Modality

Figure 7 looks beyond declaratives to examine both an imperative statement and two types of interrogatives, namely a yes–no and a WH question (where WH stands for question-words like what, where, and who). Figure 7A presents a basic command: “Telephone my house!”. The sentence shows a compressed pitch pattern at a relatively high part of the range, but with a small melodic arch to it, perhaps indicative of the high emotional intensity of an imperative. One noticeable feature here is the loss of the terminal drop that is characteristic of declarative sentences and even citation forms. Instead, pitch is maintained at one general level, making this the most monotonic utterance in the dataset. Perhaps the only surprising result is that the initial stressed syllable “Te” has a slightly lower pitch than the following syllable “le” (79 cents in the raw group data), whereas we might have predicted a slightly higher pitch for the first syllable of a dactyl, as seen in the citation form of “telephone” in Figure 1B. Hence, a small degree of arching is seen with this imperative sentence. This stands in contrast to when the first word of a sentence is under narrow focus, as in Figure 6A (“MY roommate had three telephones”), where that first word clearly shows a pitch rise.

Figures 7B,C present a comparison between the two basic types of questions. The results in Figure 7B conform to the predicted pattern of a yes–no question in English, with its large pitch rise at the end (Bolinger, 1989; Ladd et al., 1999; Ladd, 2008; Féry, 2017). The terminal rise of four semitones is one of the largest seen in the dataset. The melodic pattern preceding the terminal rise is nearly flat, hence directing all of the melodic motion to the large rise itself. Two features of this sentence are interesting to note. First, whereas long declarative sentences tend to end about three semitones below the habitual pitch, the yes–no question ended a comparable number of semitones above the habitual pitch. Hence, the combination of a declarative sentence and a yes–no interrogative maps out the functional pitch range of emotionally neutral speech, which is approximately eight semitones, or the interval of a minor 6th. Second, the melodic pattern for “telephone” during the terminal rise is opposite to that in citation form (Figure 1B). Next, Figure 7C presents the pattern for the WH question “Whose telephone is that?”. The melody is nearly opposite in form to the yes–no question, showing a declining pattern much closer to a declarative sentence, although it lacks the arches seen in declaratives. In this regard, it is closer to the pattern seen with the imperative, although with a larger pitch range and a declining contour. Potential variability in the intonation of this question is discussed in the “Limitations” section below. Overall, the yes–no question and the WH-question show strikingly different melodies, as visualized here with notation.

Interval Use

Figure 8 looks at the occurrence of interval categories across all productions of the 19 test items by the 19 participants. A total of 1,700 intervals was measured after discarding items having creaky voice on the terminal syllable. Among the intervals, 37% were ascending (0 cents is included in this group), while 63% were descending. The mean interval size was -45 cents. Fully 96% of the intervals sit in the range of -400 to +400 cents. In other words, the majority of intervals lie between a descending major third and an ascending major third, spanning a range of eight semitones, or a minor 6th. The figure shows that speech involves small intervallic movements, predominantly unisons, semitones, and whole tones, or microtonal intervals in between them. A look back at the transcriptions shows that speech is quite chromatic (on the assumption that our approximation of intervals to the nearest semitone is valid). It is important to point out that the continuous nature of the distribution of spoken intervals shown in Figure 8 is quite similar to the continuous nature of sung intervals for the singing of “Happy Birthday” found in Pfordresher and Brown (2017). Hence, spoken intervals appear to be no less discrete than sung intervals.


FIGURE 8. Frequency distribution of interval use in the test corpus. This figure presents the relative frequency of pitch intervals across the 19 test items and the 19 participants. The y-axis represents the absolute frequency of each interval from a pool of 1,700 intervals. Along the x-axis are the intervals expressed as cents changes, where 100 cents is one equal-tempered semitone. Descending intervals are shown in red on the left side, and ascending intervals are shown in blue on the right side, where the center of the distribution is the unison interval having no pitch change (i.e., two repeated pitches), which was color-coded as blue. 96% of the intervals occur in the span of -400 to +400 cents.
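The tabulation behind a figure of this kind is straightforward; a minimal sketch, using invented syllable pitches in place of the real dataset, is shown below:

```python
import numpy as np

# Hypothetical per-syllable pitches (in cents relative to habitual pitch)
# for a handful of productions; in the study, intervals were computed
# between successive syllables within each test item.
items_cents = [
    [0, 100, 0, -200, -300],    # invented contour 1
    [100, 200, 0, -100, -300],  # invented contour 2
]

intervals = np.concatenate([np.diff(item) for item in items_cents])

ascending = np.sum(intervals >= 0)   # unisons grouped with ascending,
descending = np.sum(intervals < 0)   # following the convention above
print(f"ascending (incl. unison): {ascending}, descending: {descending}")
print(f"mean interval: {intervals.mean():.0f} cents")

# Bin the intervals into 100-cent (semitone-wide) bins for a frequency
# distribution like the one in Figure 8.
counts, edges = np.histogram(intervals, bins=np.arange(-450, 551, 100))
print(counts)
```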

Large intervals are rare. They were only seen in situations of narrow focus (Figure 6) and in the yes–no interrogative (Figure 7B), in both of which they were ascending intervals. Large descending intervals were quite rare. A look at the ranges of the sentences across the figures shows that the longest sentences had the largest ranges. Expansion of the range occurred at both the high and low ends, rather than simply involving a deeper declination all on its own, suggestive of phonatory planning by speakers. However, even the longest sentences sat comfortably within the span of about a perfect fifth (seven semitones), with roughly equal sub-ranges on either side of the habitual pitch.

It is difficult to address the question of whether there are scales in speech, since even our longest sentences had no more than 15 pitches, and the constituent intonational phrases had only about 8 pitches. If scaling is defined by the recurrence of pitch classes across a melody, then the overall declination pattern that characterizes the melody of speech does not favor the use of scales. If nothing else, there seems to be a coarse type of chromaticism to the pitch pattern of speech, with semitones (or related microtonal variants) being the predominant interval type beyond the unison. Our working hypothesis is that scaling is a domain-specific feature of music, and that speech is basically an atonal phenomenon by comparison, which makes use of a weak type of chromaticism, operating within the compressed pitch range of standard speech production.

We have presented an analysis of speech melody that differs from all contemporary approaches in linguistics but that has similarities to Joshua Steele’s 1775 attempt to capture the melody of speech using symbols similar to musical notation on a musical staff. Compared to other current approaches that merely indicate points of salience or transition in the speech signal, our method permits a quantification of all of the relevant pitch events in a sentence, and does so in a manner that allows for both comparison among speakers and group averaging. This permits a global perspective on speech melody, in addition to simply considering pitch changes between adjacent syllables/tones. We have used this method to analyze a number of key phonetic and phonological phenomena, such as individual words, intonational phrases, narrow focus, and modality. In all cases, the results have provided quantitative insight into these phenomena in a manner that approaches relying on qualitative graphic markers like H(igh) and L(ow) cannot achieve.

The general method that we are presenting here consists of three major components: (1) a method for transcribing and thus visualizing speech melody, ultimately uniting melody and rhythm; (2) use of the transcriptions to analyze the structural dynamics of speech melody in terms of intervallic changes and overall pitch movement; and (3) a higher-level interpretation of the pitch dynamics in terms of the phonological meaning of intonation, as well as potential comparisons between language and music (e.g., scales, shared prosodic mechanisms). Having used Figures 1–7 to demonstrate the visualization capability of musical transcription, we will now proceed to discuss the results in terms of the dynamics of speech melody.

Some Melodic Dynamics of Speech

Figure 9 attempts to summarize the major findings of the study by consolidating the results into a generic model of sentence melody for a long declarative sentence containing two principal intonational phrases (as in Figure 5). Before looking at the full sentences in the corpus, we first consider the citation forms of the individual polysyllabic words that were analyzed. All of them showed the expected phenomenon of a pitch rise on the stressed syllable. This was seen with the words yellow, telephone, Saturday, and morning in Figures 1–4, but only minimally with Alanna, which showed a pitch drop on the last syllable but not a pitch rise on the stressed syllable.


FIGURE 9. Generic model of the melody of a long declarative sentence. The left side of the figure shows the approximate pitch range for the emotionally neutral intonation of a standard speaker, with the habitual pitch marked in the center, and the functional pitch range mapped out as pitch space above and below the habitual pitch. See text for details about the mechanisms shown. P4, perfect 4th.

Looking now to the melodic dynamics of phrases and full sentences, we noted a number of reproducible features across the corpus of test items, as summarized graphically in Figure 9 .

(1) The starting pitch of a sentence tended to be mid-register, at or near a person’s habitual vocal pitch (represented in our transcriptions as middle G). An analysis of the pitch-range data revealed that the habitual pitch was, on average, five semitones or a perfect 4th above a person’s lowest pitch.

(2) Sentences demonstrated an overall declination pattern, ending as much as four semitones below the starting pitch, in other words very close to participants’ low pitch. Much previous work has demonstrated declination of this type for English intonation (Lieberman et al., 1985; Ladd et al., 1986; Yuan and Liberman, 2014). The exception in our dataset was the yes–no interrogative, which instead ended at a comparable number of semitones above the habitual pitch. The combination of a declarative and a yes–no interrogative essentially mapped out the functional pitch range of the speakers’ productions in the dataset.

(3) That pitch range tended to span about 4–5 semitones in either direction from the habitual pitch for the emotionally neutral prosody employed in the study, hence close to an octave range overall.

(4) Longer sentences tended to occupy a larger pitch range than single words or shorter phrases. The expansion occurred at both ends of the pitch range, rather than concentrating all of the expansion as a greater lowering of the final pitch.

(5) Sentences tended to be composed of one or more melodic arches, corresponding more or less to intonational phrases.

(6) Paradoxically, the peak pitch of such arches often corresponded with an unstressed syllable of a polysyllabic word, typically the pitch that followed the stressed syllable.

(7) This was due to the contour reversal that occurred for these words when they formed melodic arches, as compared to the citation form of these same words, which showed the expected pitch rise on the stressed syllable.

(8) The pitch peak of the arch was quantified intervallically as spanning anywhere from 1 to 3 semitones above the starting pitch of the sentence.

(9) However, melodic arches and other types of pitch accents (like narrow focus) underwent both a pitch lowering and compression when they occurred later in the sentence, such as in the second intonational phrase of a multi-phrase sentence. In other words, such stress points showed lower absolute pitches and smaller pitch excursions compared to similar phenomena occurring early in the sentence. Overall, for long sentences consisting of two intonational phrases, the melodic contour of the first phrase tended to be located in a higher part of the pitch range and showed larger pitch excursions compared to the second intonational phrase, which was both lower and compressed.

(10) For sentences with two intonational phrases, there was a suspension of declination at the end of the first phrase, such that it tended to end at or near the habitual pitch. This suggests that speakers were able to plan out long sentences at the physiological level and thereby create a suitable pitch range for the production of the long utterance. It also suggests that the declarative statement is a holistic formula, such that changes in sentence length aim to preserve the overall contour of the formula.

(11) Sentences tended to end with a small terminal drop, on the order of a semitone or two. The exceptions were the imperative, which lacked a terminal drop, and the yes–no interrogative, which instead ended with a large terminal rise.

(12) The terminal pitch tended to be the lowest pitch of a sentence, underlining the general process of declination. Again, the major exception was the yes–no interrogative.

(13) For declarative sentences, there was a general pattern such that large ascending intervals occurred early in the sentence (the primary melodic arch, Figure 9 ), whereas the remainder of the sentence showed a general process of chromatic descent. This conforms with an overarching driving mechanism of declination.

(14) The overall pitch range tended to be larger in longer sentences, and the terminal pitches tended to be lower as well, by comparison to single words or short phrases.

(15) Speech seems to be dominated by the use of small melodic intervals, and hence pitch proximity. Unisons were the predominant interval type, followed by semitones and whole tones, a picture strikingly similar to melodic motion in music (Vos and Troost, 1989; Huron, 2006; Patel, 2008).

(16) Our data showed no evidence for the use of recurrent scale patterns in speech. Instead, the strong presence of semitones in the pitch distribution suggested that a fair degree of chromaticism occurs in speech. Hence, speech appears to be atonal.

Interpreting the Results in Light of Linguistic Models of Speech Melody

Having summarized the findings of the study according to the musical approach, we would like to consider standard linguistic interpretations of the same phenomena.

When pronounced in isolation, the stressed syllables of polysyllabic words such as “yellow” and “Saturday” were aligned with a high pitch. The melodic contour then dropped two semitones for the second syllable, resembling an utterance-final drop. On the other hand, when “yellow” and “Saturday” were followed by additional words to form short phrases, the melodic contour seen in citation form was inverted, resulting in pitch peaks on the unstressed syllables of these words. Figure 10 presents a direct comparison between a ToBI transcription and a musical transcription of “The yellow telephone.” AM theory postulates that the pitch drop in the citation forms of “yellow” and “telephone” represents the transition between the pitch accent (on the stressed syllable) and the boundary tone. In “The yellow telephone,” the one-semitone rise would be treated as a transition between the first H* pitch accent on “yel-” and the H* of the “H*-L-L%” tune on “telephone.” But this rise is never treated as a salient phonological event. This change motivates AM theory to consider the “H*-L-L%” tune as compositional, such that it can be associated with utterances of different lengths (Beckman and Pierrehumbert, 1986). Nonetheless, it is not clear why H* entails a one-semitone rise, whereas L-L% is manifested by a two-semitone drop.


FIGURE 10. Comparing ToBI transcription and musical transcription. The figure presents a comparison between a ToBI transcription (A) and a musical transcription (B) of “The yellow telephone.” (B) is a reproduction of Figure 1C.

The observed phrase-level arches – with their contour reversals on polysyllabic words – ultimately lead to the formation of sentence arches in longer sentences. The results of this study indicate that, in general, the melodic contours of utterances consisting of simple subject–verb–object sentences tend to be characterized by a series of melodic arches of successively decreasing pitch height and pitch range. Paradoxically, these arches very often peaked at a non-nuclear syllable, as mentioned above. Additional arches were formed as the sentence was lengthened by the addition of more words or intonational phrases. Moreover, declination was “suspended” when additional syllables were inserted between the pitch accent and the boundary tone (Figure 4) or when intonational phrases were added as additional clauses to the sentence (Figure 5). To the best of our knowledge, no linguistic theory of speech melody accounts for this suspension. In addition, speakers adjust their pitch range at both ends when producing a long utterance consisting of two intonational phrases. Again, as far as we know, the ability to represent pitch-range changes across a sentence is unique to our model. With both phrases sharing the same overall pitch range, the pitch range occupied by each phrase becomes narrower. The first phrase starts at a higher pitch than normal and occupies the higher half of the shared pitch range, while the second phrase begins at a lower pitch and occupies the lower half of the pitch range. At the end of the second phrase, the phrase-final drop is reduced.

To a first approximation, our melodic arches map onto the intonational phrases of phonological theory, suggesting that these arches constitute a key building block of speech melody. For standard declarative sentences, the arches show a progressive lowering in absolute pitch and a narrowing in relative pitch over the course of the sentence, reflecting the global process of declination. Melodic contours of English sentences have been characterized as consisting of different components when comparing the British school with AM theory (Cruttenden, 1997; Gussenhoven, 2004; Ladd, 2008). Despite this, Collier (1989) and Gussenhoven (1991) described a “hat-shape” melodic pattern for Dutch declarative sentences that might be similar to what we found here for English. Whether we are describing the sentence’s speech melody holistically as an arch or dividing the melodic contour into components, we are essentially describing the same phenomenon.

Comparing the different readings of “My roommate had three telephones” when narrow focus was placed on “my,” “roommate,” “three,” and “telephones” (see Figure 6), the results revealed that the stressed syllable of the word under focus was generally marked by a pitch rise of as much as three semitones, except when it occurred on the last word of the sentence, where this pitch jump was absent. Pitch peaks were aligned to the corresponding segmental locations. Both observations are consistent with current research on narrow focus and pitch–segmental alignment in spoken English (Ladd et al., 1999; Atterer and Ladd, 2004; Dilley et al., 2005; Xu and Xu, 2005; Féry, 2017). Xu and Xu’s (2005) prosodic study of narrow focus in British English indicated that, when a word is placed under narrow focus, the pre-focus part of the sentence remains unchanged. This effect is observed in the initial part of the sentences in Figures 6C,D, in which the melody associated with “my roommate had” remains unchanged in the pre-focus position. Secondly, Xu and Xu (2005) and Féry (2017) reported that the word under narrow focus is pronounced with a raised pitch and expanded range, whereas the post-focus part of the sentence is pronounced with a lower pitch and a more restricted pitch range. These effects were also observed by comparing the sentences in Figures 6C,D with those in Figures 6A,B: the latter part of the sentence, “had three telephones,” was pronounced in a lower and more compressed part of the pitch range when it was in the post-focus position. Overall, the use of a musical approach to describe narrow focus not only allows us to observe previously reported effects on the pre-, in-, and post-focus parts of the sentence, but it also provides a means of quantifying these effects in terms of pitch changes.

Research in intonational phonology in English indicates that imperative and declarative sentences, as well as WH-questions, are generally associated with a falling melodic contour, whereas the correspondence between speech melody and yes–no (“polar”) questions is less straightforward. Yes–no questions with syntactic inversion (e.g., “Are you hungry?”) are generally associated with a falling melodic contour, whereas those without inversion (e.g., “You are hungry?”) are associated with a rising contour (Crystal, 1976; Geluykens, 1988; Selkirk, 1995). In addition, questions typically involve some element of high pitch (Lindsey, 1985; Bolinger, 1989), whereas such features are absent in statements. While our results are in line with these observations, the comparison of statement and question contours using melodic notation allows us to pinpoint the exact amplitude of the final rises and falls associated with each type of question. Furthermore, it allows us to represent and quantify the difference in global pitch height associated with questions as opposed to statements. This phonologically salient feature is missing in AM and CR, which only account for localized pitch excursions.

Advantages of a Musical Approach Over Contemporary Linguistic Approaches

The Introduction presented a detailed analysis of the dominant approaches to speech melody in the field of phonology. We would now like to consider the advantages that a musical approach offers over those linguistic approaches.

Use of Acoustic Data

Many analyses of speech melody in the literature are based on qualitative representations that show general trajectories of pitch movement in sentences (e.g., Cruttenden, 1997). While useful as heuristics, such representations are inherently limited in scope. Our method is based first and foremost on the acoustic production of sentences by speakers. Hence, it is based on quantitative experimental data, rather than qualitative representations.

Quantification and Specification of the Melodic Intervals and Pitch Ranges in Speech

This is in contrast to the use of qualitative labels like H and L in ToBI transcriptions. The musical approach quantifies, and thus characterizes, the diverse manners of melodic movement in speech in order to elucidate the dynamics of speech melody. In ToBI, an H label suggests a relative rise in pitch compared to preceding syllables, but that rise is impossible to quantify with a single symbol. The conversion of pitch changes into musical intervals permits a precise specification of the types of pitch movements that occur in speech. This includes both local effects (e.g., syllabic stress, narrow focus, terminal drops) and global effects (e.g., register use, the size of a pitch range, melodic arches, intonational phrases, changes with emotion). Ultimately, this approach can elucidate the melodic dynamics of speech prosody, both affective prosody and linguistic prosody.

Analysis of All Syllables in an Utterance

This is again in contrast to methods like ToBI that only mark salient pitch events and ignore the remainder of the syllables. Hence, the musical method can provide a comprehensive analysis of the pitch properties of spoken sentences, including the melodic phenomena analyzed here, such as pitch-range changes, post-focus compression, lexical stress, narrow focus, sentence modality, and the like. This is a feature that the musical model shares with PENTA.

Relative Pitch as a Normalization Procedure for Cross-Speaker Comparison

The use of relative pitch to analyze melodic intervals provides a means of normalizing the acoustic signal and comparing melodic motion across speakers. Normalization can thus be done across genders (i.e., different registers) and across people with different vocal ranges; in fact, any two individual speakers can be compared using this method. Using relative pitch eliminates many of the problems associated with analyzing speech melody using absolute pitch in Hz. No contemporary linguistic approach to speech melody provides a reliable method of cross-speaker comparison.
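A minimal sketch of this normalization, assuming syllable-level mean F0 values and using each speaker’s utterance mean as the reference pitch (the choice of reference is our assumption; any speaker-specific anchor would serve):

```python
import numpy as np

def normalize_to_relative_pitch(f0_trace_hz, ref_hz=None):
    """Re-express an F0 trace in semitones relative to a speaker-specific
    reference, so traces from different registers become directly comparable."""
    f0 = np.asarray(f0_trace_hz, dtype=float)
    ref = f0.mean() if ref_hz is None else float(ref_hz)
    return 12.0 * np.log2(f0 / ref)

# Hypothetical syllable F0 means (Hz) for the same sentence from two speakers.
female = [220.0, 262.0, 247.0, 208.0, 196.0]
male   = [110.0, 131.0, 123.0, 104.0,  98.0]

print(np.round(normalize_to_relative_pitch(female), 2))
print(np.round(normalize_to_relative_pitch(male), 2))
# The two normalized traces are nearly identical, even though the male
# speaker is roughly an octave lower in absolute terms.
```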

Group Averaging of Production

Along the lines of the last point, converting Hz values into cents or semitones opens the door to group averaging of production. Averaging is much less feasible using Hz due to differences in pitch range, for example between women and men. Group averaging using cents increases the statistical power and generalizability of the experimental data compared to methods that use Hz as their primary measurement.
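Continuing the sketch above, group averaging then reduces to ordinary arithmetic on the normalized values. The per-speaker cent values below are invented for illustration.

```python
import numpy as np

# Hypothetical per-speaker syllable pitches (cents re each speaker's own mean F0)
# for a five-syllable sentence; rows = speakers, columns = syllables.
cents = np.array([
    [  0.0,  310.0,  190.0, -120.0, -260.0],
    [ 20.0,  280.0,  210.0, -100.0, -300.0],
    [-10.0,  330.0,  170.0, -140.0, -240.0],
])

mean_contour = cents.mean(axis=0)
sem = cents.std(axis=0, ddof=1) / np.sqrt(cents.shape[0])

for i, (m, e) in enumerate(zip(mean_contour, sem), start=1):
    print(f"syllable {i}: {m:+7.1f} ± {e:5.1f} cents")
# Averaging in cents is meaningful because equal cent steps are equal pitch
# ratios, regardless of each speaker's absolute range in Hz.
```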

Characterizing Variability in Production

A transcriptional approach can be used to capture pitch-based variability in production, as might be associated with regional dialects, foreign accents, or even speech pathology (e.g., hotspots for stuttering; Karniol, 1995). As we will argue in the Limitations section below, it can also capture the variability in the intonation of a single sentence across speakers, much as our analysis of narrow focus did in Figure 6, showing that each variant was accompanied by a distinct melody.

A Unification of Melody and Rhythm

Virtually all approaches to speech prosody look at either melody or rhythm alone. Following the landmark work of Joshua Steele in 1775, we believe that our use of musical notation provides an opportunity for such a unification. We have developed a musical model of speech rhythm elsewhere (Brown et al., 2017). That model focuses on meters and the relative duration values of syllables within a sentence. We used approximate rhythmic transcriptions in the current article (Figures 1–7) to demonstrate the potential of employing a combined analysis of melody and rhythm in the study of speech prosody. We hope to carry out a combined rhythm/melody study as the next phase of our work on the musical analysis of speech; a toy data structure for such a unified representation is sketched below.
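As a toy illustration of what a unified melody-plus-rhythm representation might look like in code, consider the structure below. The SyllableNote type and all of its values are our own assumption for illustration, not the model of Brown et al. (2017).

```python
from dataclasses import dataclass

@dataclass
class SyllableNote:
    """One syllable as a musical event: pitch in semitones re a reference,
    and duration in beats (e.g., an eighth note = 0.5 in 4/4)."""
    text: str
    pitch_st: float
    beats: float

# A hypothetical unified transcription of "My roommate had three telephones".
score = [
    SyllableNote("my",       0.0, 0.5),
    SyllableNote("room-",    3.0, 1.0),
    SyllableNote("-mate",    2.0, 0.5),
    SyllableNote("had",     -1.0, 0.5),
    SyllableNote("three",   -2.0, 1.0),
    SyllableNote("tel-",    -3.0, 0.5),
    SyllableNote("-e-",     -4.0, 0.5),
    SyllableNote("-phones", -5.0, 1.0),
]

print(f"{len(score)} syllables over {sum(n.beats for n in score)} beats")
```

Any renderer or analyzer can then iterate over the same structure, which is the practical sense in which melody and rhythm become a single representation.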

As mentioned in the Introduction, RaP (Rhythm and Pitch) is perhaps the one linguistic approach that takes into account both speech rhythm and melody, albeit as completely separate parameters (Dilley and Brown, 2005; Breen et al., 2012). RaP considers utterances as having “rhythm,” which refers to pockets of isochronous units in lengthy strings of syllables (at least 4–5 syllables, and up to 8–10 syllables). In addition, “strong beats” associate with lexically stressed syllables based on metrical phonology. RaP is the first recent model to make reference to the musical element of “beat” in describing speech rhythm, implying that some isochronous units of rhythm exist at the perceptual level. However, its assignment of rhythm and prominence relies heavily on transcribers’ own perception, rather than on empirical data.

Speech/Music Comparisons

The use of musical notation for speech provides a means of conducting comparative analyses of speech and music. For example, we explored the question of whether speech employs musical scales, and concluded provisionally that it does not. Many other questions about the relationship between speech prosody and music can be explored using transcription and musical notation. This is important given the strong interest in evolutionary models that relate speech and music (Brown, 2000, 2017; Mithen, 2005; Fitch, 2010), as well as cognitive and neuroscientific models that show the use of overlapping resources for both functions (Juslin and Laukka, 2003; Besson et al., 2007; Patel, 2008; Brandt et al., 2012; Bidelman et al., 2013; Heffner and Slevc, 2015). For example, it would be interesting to apply our analysis method to a tone language and attempt to quantify the production of lexical tones in speech, since lexical tone is thought of as a relative-pitch system composed of contrastive level tones and/or contour tones.

The Score Allows a Person’s Intonation to Be Reproduced by Someone Else

The use of a musical score is the only visual method that allows one person to reproduce the prosody of another. Hence, the score can be “sung” much the way that music is. While this is certainly an approximation of the pitch properties of real speech, it is unquestionably a major improvement over any existing method in linguistics, including Prosogram. A system integrating speech rhythm and melody could enable the development of more effective pedagogical tools for teaching intonation to non-native language learners. Moreover, knowledge gleaned from this research can be applied to improve the quality and naturalness of synthesized speech.
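As a toy demonstration that a notated melody is machine-producible as well, the following sketch renders a (semitones, beats) score as a sequence of sine tones and writes it to a WAV file. The score values, reference pitch, and tempo are all invented for illustration; this is a crude monotimbral stand-in for actual singing, not the authors’ tooling.

```python
import numpy as np
import wave

SR = 22050  # sample rate (Hz)

def synthesize_score(score, ref_hz=200.0, beat_sec=0.3, path="melody.wav"):
    """Render a list of (pitch_st, beats) events as sine tones: a crude way
    of 'singing' a transcription back."""
    samples = []
    for pitch_st, beats in score:
        freq = ref_hz * 2.0 ** (pitch_st / 12.0)  # semitones -> Hz
        t = np.arange(int(SR * beat_sec * beats)) / SR
        samples.append(0.3 * np.sin(2 * np.pi * freq * t))
    pcm = (np.concatenate(samples) * 32767).astype(np.int16)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(SR)
        w.writeframes(pcm.tobytes())

# Hypothetical score: (semitones re a 200 Hz reference, duration in beats).
synthesize_score([(0, 1), (3, 2), (2, 1), (-1, 1), (-2, 2), (-5, 2)])
```

Sine tones obviously discard voice quality and timing detail; the point is only that the notation carries enough information to be re-performed.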

Limitations

In addition to the advantages of the musical approach, there are also a number of limitations of our study and its methods. First, we used a corpus of relatively simple sentences. We are currently analyzing a second dataset that contains longer and more complex sentences than the ones used in the present study, including sentences with internal clauses. Second, our pitch analysis is approximate and is no more fine-grained than the level of the semitone: all of our analyses rounded the produced intervals to the nearest semitone. If speech uses microtonal intervals and scales, then our method at present is unable to detect them. Likewise, our association of every syllable with a level tone almost certainly downplays the use of contour tones (glides) in speech. Hence, while level tones should be quite analyzable with our method, our approach does not currently address intra-syllable pitch variability, which would be important for analyzing contour tones in languages like Mandarin and Cantonese. Prosogram permits syllabic pitches to be contoured rather than level, but our approach currently errs on the side of leveling out syllabic pitches. In principle, contour tones could be represented as melismas in musical notation by notating the two pitches that make up the endpoints of the syllable and using a “portamento” (glide) symbol to indicate the continuity of pitch between those endpoints. A similar approach could even be used to represent non-linguistic affective vocalizations.
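To make the leveling limitation concrete, here is a sketch that rounds each syllable to a single level semitone while flagging syllables whose internal pitch span suggests a contour tone that the leveling would erase. The 1-semitone glide threshold and the F0 tracks are arbitrary assumptions for illustration.

```python
import numpy as np

def classify_syllable(f0_samples_hz, glide_threshold_st=1.0):
    """Round a syllable to its nearest level semitone (re the syllable mean),
    but flag it as a potential contour tone if the within-syllable pitch
    span exceeds a threshold (notate such cases as melisma/portamento)."""
    f0 = np.asarray(f0_samples_hz, dtype=float)
    st = 12.0 * np.log2(f0 / f0.mean())
    span = float(st.max() - st.min())
    level = round(float(np.median(st)))
    kind = "contour (melisma/portamento)" if span > glide_threshold_st else "level"
    return level, round(span, 2), kind

# Hypothetical within-syllable F0 tracks (Hz).
print(classify_syllable([200, 202, 199, 201]))   # -> level tone
print(classify_syllable([180, 200, 220, 240]))   # -> rising glide, flagged
```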

The current approach requires that users be familiar with musical notation and the concept of musical intervals. Will this limit the adoptability of the approach? In our opinion, it is not much more difficult to learn how to read musical notation than it is to learn how to read ToBI notation, with its asterisks and percentage signs. In principle, pitch contours should be easily recognizable in musical notation, even for people who cannot read it: the direction and size of intervals should be easy to detect, since musical notation occurs along a simple vertical grid, and pitch changes are recognizable as vertical movements, much like lines representing F0 changes. In contrast to this ease of recognition, ToBI notation can be complex; the fact that H*H means a flat tone is completely non-intuitive for people not trained in ToBI. The most “musical” part of musical notation relates to the interval classes themselves. This type of quantification of pitch movement is not codable at all with ToBI and thus represents a novel feature contributed by musical notation.

Our sample showed a gender bias in that 16 of the 19 participants were female. The literature suggests that females show greater F0 variability than males (Puts et al., 2011; Pisanski et al., 2016) and that they have a higher incidence of creaky voice (Yuasa, 2010). Creaky voice was, in fact, a problem in our analysis, and this might well have been due to the high proportion of females in our sample. Future studies should aim for a more balanced gender representation than we were able to achieve here.

Finally, while our normalization of the speech signal into semitones provides a strong advantage in that it permits group averaging, such averaging also comes at the cost of downplaying individual-level variability. Perhaps instead of averaging, it would be better to look at the families of melodies for a single sentence produced by a group of speakers, and put more focus on individual-level variability than on group trends. In order to illustrate the multiple ways that a single sentence can be intoned, we revisit the WH-question that was analyzed in Figure 7C: “Whose telephone is that?”. Figure 11 uses rhythmic transcription to demonstrate three different manners of intoning this question, the first of which was used in Figure 7C (for simplicity, a single G pitch is used in all transcriptions). Each variant differs based on where the point of focus is, as shown by the word in block letters in each transcription. We chose the version in Figure 11A for our group analysis in Figure 7C, since the melodic pattern of the group average best fit that pattern, with its high pitch on “whose,” rather than on “telephone” or “is.” Hence, while the examination of group averages might tend to downplay inter-participant variability, the transcriptional approach is able to capture the family of possible variants for a given sentence and use them as candidates for the productions of individuals and groups.

FIGURE 11. Variability in the intonation of a single sentence. The figure uses rhythmic notation to demonstrate three possible ways of intoning the same string of words, where stress is placed on either whose (A), telephone (B), or is (C). It is expected that different melodic patterns would be associated with each rendition, based on where the point of focus is, which would be an attractor for a pitch rise. The proposed point of narrow focus is represented using bold text in each sentence. The symbol “>” signifies a point of focus or accent. The pattern in (A) was most consistent with the analyzed group-average melody shown in Figure 7C.

The musical method that we are presenting here consists of three major components: (1) a method for transcribing and thus visualizing speech melody, ultimately uniting melody and rhythm into a single system of notation; (2) use of these transcriptions to analyze the structural dynamics of speech melody in terms of intervallic changes and pitch excursions; and (3) a higher-level interpretation of the descriptive pitch dynamics in terms of the phonological meaning of intonation, as well as potential comparisons between speech and music (e.g., scales, shared prosodic mechanisms). Application of this approach to our vocal-production experiment with 19 speakers permitted a quantitative analysis of how syntax, utterance length, narrow focus, declination, and sentence modality affect the melody of utterances. The dominant linguistic models of speech melody are incapable of accounting for such effects in a quantifiable manner, whereas a musical analysis can represent them easily and comprehensively, such that all syllables are incorporated into the melodic model of a sentence. Most importantly, the use of a musical score has the potential to combine speech melody and rhythm into a unified representation of speech prosody, much as Joshua Steele envisioned in 1775 with his use of “peculiar symbols” to represent syllabic pitches. Musical notation provides the only available tool capable of bringing about this unification.

Author Contributions

IC and SB analyzed the acoustic data and wrote the manuscript.

Funding

This work was funded by a grant from the Natural Sciences and Engineering Research Council (NSERC) of Canada to SB.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We thank Jordan Milko for assistance with data collection and analysis, and Nathalie Stearns and Jonathan Harley for assistance with data analysis. We thank Peter Pfordresher and Samson Yeung for critical reading of the manuscript.

References

Atterer, M., and Ladd, D. R. (2004). On the phonetics and phonology of “segmental anchoring” of F0: evidence from German. J. Phon. 32, 177–197. doi: 10.1016/S0095-4470(03)00039-1

Beckman, M. E., and Ayers, G. (1997). Guidelines for ToBI Labeling, Version 3. Columbus, OH: Ohio State University.

Beckman, M. E., and Pierrehumbert, J. B. (1986). Intonational structure in Japanese and English. Phonology 3, 255–309. doi: 10.1017/S095267570000066X

Besson, M., Schön, D., Moreno, S., Santos, A., and Magne, C. (2007). Influence of musical expertise and musical training on pitch processing in music and language. Restor. Neurol. Neurosci. 25, 399–410.

Bidelman, G. M., Hutka, S., and Moreno, S. (2013). Tone language speakers and musicians share enhanced perceptual and cognitive abilities for musical pitch: evidence for bidirectionality between the domains of language and music. PLoS One 8:e60676. doi: 10.1371/journal.pone.0060676

Boersma, P., and Weenink, D. (2015). Praat: Doing Phonetics By Computer Version 5.4.22. Available at: http://www.praat.org/ [accessed October 8, 2015].

Bolinger, D. (1989). Intonation and its Uses: Melody in Grammar and Discourse. Stanford, CA: Stanford University Press.

Brandt, A., Gebrian, M., and Slevc, L. R. (2012). Music and early language acquisition. Front. Psychol. 3:327. doi: 10.3389/fpsyg.2012.00327

Breen, M., Dilley, L. C., Kraemer, J., and Gibson, E. (2012). Inter-transcriber reliability for two systems of prosodic annotation: ToBI (Tones and Break Indices) and RaP (Rhythm and Pitch). Corpus Linguist. Linguist. Theory 8, 277–312. doi: 10.1515/cllt-2012-0011

Brown, S. (2000). “The ‘musilanguage’ model of music evolution,” in The Origins of Music , eds N. L. Wallin, B. Merker and S. Brown (Cambridge, MA: MIT Press), 271–300.

Brown, S. (2017). A joint prosodic origin of language and music. Front. Psychol. 8:1894. doi: 10.3389/fpsyg.2017.01894

Brown, S., Pfordresher, P., and Chow, I. (2017). A musical model of speech rhythm. Psychomusicology 27, 95–112. doi: 10.1037/pmu0000175

Bruce, G. (1977). Swedish Word Accents in Sentence Perspective. Malmö: LiberLäromedel/Gleerup.

Cohen, A., Collier, R., and ‘t Hart, J. (1982). Declination: construct or intrinsic feature of speech pitch? Phonetica 39, 254–273. doi: 10.1159/000261666

Collier, R. (1989). “On the phonology of Dutch intonation,” in Worlds Behind Words: Essays in Honour of Professor FG Droste on the Occasion of his Sixtieth Birthday , eds F. J. Heyvaert and F. Steurs (Louvain: Leuven University Press), 245–258.

Cruttenden, A. (1997). Intonation, 2nd Edn. Cambridge: Cambridge University Press. doi: 10.1017/CBO9781139166973

Crystal, D. (1976). Prosodic Systems and Intonation in English. Cambridge: CUP Archive.

Dilley, L., and Brown, M. (2005). The RaP (Rhythm and Pitch) Labeling System. Cambridge, MA: Massachusetts Institute of Technology.

Dilley, L., Ladd, D. R., and Schepman, A. (2005). Alignment of L and H in bitonal pitch accents: testing two hypotheses. J. Phon. 33, 115–119. doi: 10.1016/j.wocn.2004.02.003

Fairbanks, G., and Pronovost, W. (1939). An experimental study of the pitch characteristics of the voice during the expression of emotion. Commun. Monogr. 6, 87–104. doi: 10.1080/03637753909374863

Féry, C. (2017). Intonation and Prosodic Structure. Cambridge: Cambridge University Press. doi: 10.1017/9781139022064

Fitch, W. T. (2010). Evolution of Language. Cambridge: Cambridge University Press.

Fujisaki, H. (1983). “Dynamic characteristics of voice fundamental frequency in speech and singing,” in The Production of Speech , ed. P. F. MacNeilage (New York, NY: Springer), 39–55.

Fujisaki, H., and Gu, W. (2006). “Phonological representation of tone systems of some tone languages based on the command-response model for F0 contour generation,” in Proceedings of the Tonal Aspects of Languages , Berlin, 59–62.

Fujisaki, H., and Hirose, K. (1984). Analysis of voice fundamental frequency contours for declarative sentences of Japanese. J. Acoust. Soc. Jpn. E5, 233–242. doi: 10.1250/ast.5.233

Fujisaki, H., Ohno, S., and Wang, C. (1998). “A command-response model for F0 contour generation in multilingual speech synthesis,” in Proceedings of the Third ESCA/COCOSDA Workshop (ETRW) on Speech Synthesis , Jenolan Caves, 299–304.

Geluykens, R. (1988). On the myth of rising intonation in polar questions. J. Pragmat. 12, 467–485. doi: 10.1016/0378-2166(88)90006-9

German, J. S., and D’Imperio, M. (2016). The status of the initial rise as a marker of focus in French. Lang. Speech 59, 165–195. doi: 10.1177/0023830915583082

Grice, M., and Baumann, S. (2007). “An introduction to intonation-functions and models,” in Non-Native Prosody, eds J. Trouvain and U. Gut (Berlin: Mouton de Gruyter), 25–52.

Gussenhoven, C. (1991). “Tone segments in the intonation of Dutch,” in The Berkeley Conference on Dutch Linguistics , eds T. F. Shannon, and J.P. Snapper (Lanham, MD: University Press of America), 139–155.

Gussenhoven, C. (2004). The Phonology of Tone and Intonation. Cambridge: Cambridge University Press. doi: 10.1017/CBO9780511616983

Halliday, M. A. K. (1967). Intonation and Grammar in British English. The Hague: Mouton. doi: 10.1515/9783111357447

Halliday, M. A. K. (1970). A Course in Spoken English: Intonation. London: Oxford University Press.

Heffner, C. C., and Slevc, L. R. (2015). Prosodic structure as a parallel to musical structure. Front. Psychol. 6:1962. doi: 10.3389/fpsyg.2015.01962

Hermes, D. J. (2006). Stylization of Pitch Contours. Berlin: Walter de Gruyter. doi: 10.1515/9783110914641.29

Hirschberg, J., and Ward, G. (1995). The interpretation of the high-rise question contour in English. J. Pragmat. 24, 407–412. doi: 10.1016/0378-2166(94)00056-K

Huron, D. (2006). Sweet Anticipation: Music and the Psychology of Expectation. Cambridge, MA: MIT Press.

Juslin, P. N., and Laukka, P. (2003). Communication of emotions in vocal expression and music performance: different channels, same code? Psychol. Bull. 129, 770–814. doi: 10.1037/0033-2909.129.5.770

Karniol, R. (1995). Stuttering, language, and cognition: a review and a model of stuttering as suprasegmental sentence plan alignment (SPA). Psychon. Bull. 117, 104–124. doi: 10.1037/0033-2909.117.1.104

Ladd, D. R. (2008). Intonational Phonology , 2nd Edn. Cambridge: Cambridge University Press. doi: 10.1017/CBO9780511808814

Ladd, D. R., Faulkner, D., Faulkner, H., and Schepman, A. (1999). Constant “segmental anchoring” of F0 movements under changes in speech rate. J. Acoust. Soc. Am. 106, 1543–1554. doi: 10.1121/1.427151

Leben, L. (1973). Suprasegmental Phonology. Cambridge, MA: MIT Press.

Liberman, M., and Pierrehumbert, J. (1984). “Intonational invariance under changes in pitch range and length,” in Language Sound Structure, eds M. Aronoff and R. T. Oehrle (Cambridge, MA: MIT Press), 157–233.

Lieberman, P. (1960). Some acoustic correlates of word stress in American English. J. Acoust. Soc. Am. 32, 451–454. doi: 10.1121/1.1908095

Lieberman, P., Katz, W., Jongman, A., Zimmerman, R., and Miller, M. (1985). Measures of the sentence intonation of read and spontaneous speech in American English. J. Acoust. Soc. Am. 77, 649–657. doi: 10.1121/1.391883

Lindsey, G. A. (1985). Intonation and Interrogation: Tonal Structure and the Expression of a Pragmatic Function in English and Other Languages. Ph.D. thesis, University of California, Los Angeles, CA.

Mertens, P. (2004). Quelques aller-retour entre la prosodie et son traitement automatique. Fr. Mod. 72, 39–57.

Mertens, P., and d’Alessandro, C. (1995). “Pitch contour stylization using a tonal perception model,” in Proceedings of the 13th International Congress of Phonetic Sciences , 4, Stockholm, 228–231.

Mithen, S. J. (2005). The Singing Neanderthals: The Origins of Music, Language, Mind and Body. London: Weidenfeld and Nicolson.

Murray, I. R., and Arnott, J. L. (1993). Toward the simulation of emotion in synthetic speech: a review of the literature on human vocal emotion. J. Acoust. Soc. Am. 93, 1097–1108. doi: 10.1121/1.405558

Nespor, M., and Vogel, I. (1986). Prosodic Phonology. Dordrecht: Foris.

Nespor, M., and Vogel, I. (2007). Prosodic Phonology, 2nd Edn. Berlin: Walter de Gruyter. doi: 10.1515/9783110977790

Nooteboom, S. (1997). The prosody of speech: melody and rhythm. Handb. Phon. Sci. 5, 640–673.

O’Connor, J. D., and Arnold, G. F. (1973). Intonation of Colloquial English: A Practical Handbook. London: Longman.

Oxenham, A. J. (2013). “The perception of musical tones,” in Psychology of Music , 3rd Edn, ed. D. Deutsch (Amsterdam: Academic Press), 1–33.

Patel, A. D. (2008). Music, Language and the Brain. Oxford: Oxford University Press.

Petrone, C., and Niebuhr, O. (2014). On the intonation of German intonation questions: the role of the prenuclear region. Lang. Speech 57, 108–146. doi: 10.1177/0023830913495651

Pfordresher, P. Q., and Brown, S. (2017). Vocal mistuning reveals the nature of musical scales. J. Cogn. Psychol. 29, 35–52. doi: 10.1080/20445911.2015.1132024

Pierrehumbert, J. (1999). What people know about sounds of language. Stud. Linguist. Sci. 29, 111–120.

Pierrehumbert, J. B. (1980). The Phonology and Phonetics of English Intonation. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA.

Pisanski, K., Cartei, V., McGettigan, C., Raine, J., and Reby, D. (2016). Voice modulation: a window into the origins of human vocal control? Trends Cogn. Sci. 20, 304–318. doi: 10.1016/j.tics.2016.01.002

Prieto, P. (2015). Intonational meaning. Wiley Interdiscip. Rev. Cogn. Sci. 6, 371–381. doi: 10.1002/wcs.1352

Prom-on, S., Xu, Y., and Thipakorn, B. (2009). Modeling tone and intonation in Mandarin and English as a process of target approximation. J. Acoust. Soc. Am. 125, 405–424. doi: 10.1121/1.3037222

Puts, D. A., Apicella, C. L., and Cárdenas, R. A. (2011). Masculine voices signal men’s threat potential in forager and industrial societies. Proc. R. Soc. Lond. B Biol. Sci. 279, 601–609. doi: 10.1098/rspb.2011.0829

Selkirk, E. (1995). “Sentence prosody: intonation, stress, and phrasing,” in Handbook of Phonological Theory , ed. J. Goldsmith (Cambridge: Blackwell), 550–569.

Steele, J. (1775). An Essay Towards Establishing the Melody and Measure of Speech to be Expressed and Perpetuated by Certain Symbols. London: Bowyer and Nichols.

van der Hulst, H. (1999). “Word stress,” in Word Prosodic Systems in the Languages of Europe , ed. H. van der Hulst (Berlin: Mouton de Gruyter), 3–115.

Vos, P. G., and Troost, J. M. (1989). Ascending and descending melodic intervals: statistical findings and their perceptual relevance. Music Percept. 6, 383–396. doi: 10.2307/40285439

Whalen, D. H., and Levitt, A. G. (1995). The universality of intrinsic F0 of vowels. J. Phon. 23, 349–366. doi: 10.1016/S0095-4470(95)80165-0

Xu, Y. (2005). Speech melody as articulatorily implemented communicative functions. Speech Commun. 46, 220–251. doi: 10.1016/j.specom.2005.02.014

Xu, Y. (2011). Speech prosody: a methodological review. J. Speech Sci. 1, 85–115.

Xu, Y., and Xu, C. X. (2005). Phonetic realization of focus in English declarative intonation. J. Phon. 33, 159–197. doi: 10.1016/j.wocn.2004.11.001

Yip, M. (1988). The obligatory contour principle and phonological rules: a loss of identity. Linguist. Inq. 19, 65–100.

Yuan, J., and Liberman, M. (2014). F0 declination in English and Mandarin broadcast news speech. Speech Commun. 65, 67–74. doi: 10.1016/j.specom.2014.06.001

Yuasa, I. P. (2010). Creaky voice: a new feminine voice quality for young urban-oriented upwardly mobile American women? Am. Speech 3, 315–337. doi: 10.1215/00031283-2010-018

Keywords : speech melody, speech prosody, music, phonetics, phonology, language

Citation: Chow I and Brown S (2018) A Musical Approach to Speech Melody. Front. Psychol. 9:247. doi: 10.3389/fpsyg.2018.00247

Received: 03 July 2017; Accepted: 14 February 2018; Published: 05 March 2018.

Copyright © 2018 Chow and Brown. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Steven Brown, [email protected]

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.


How Phoneticians Measure Speech Melody


Phoneticians transcribe connected speech by marking how words and syllables go together and by following how the melody of language changes. Many phoneticians indicate pauses with special markings (such as [ǀ] for a short break and [ǁ] for a longer break). You can show changes in pitch with a drawn line called a pitch plot, or with more sophisticated schemes such as the tones and break indices (ToBI) system; a minimal pitch plot is sketched below.
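Here is a minimal, hypothetical example of such a pitch plot in Python; the F0 contour is invented, with NaNs standing in for unvoiced stretches as a pitch tracker would report them.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical F0 contour (Hz) for "You are hungry?" sampled every 10 ms;
# NaN marks unvoiced stretches, which leave gaps in the plotted line.
t = np.arange(0, 0.9, 0.01)
f0 = np.full_like(t, np.nan)
f0[5:30] = np.linspace(190, 175, 25)    # "You are"
f0[35:85] = np.linspace(170, 260, 50)   # "hungry?" with a final rise

plt.plot(t, f0)
plt.xlabel("Time (s)")
plt.ylabel("F0 (Hz)")
plt.title("Pitch plot of a rising yes/no question (illustrative)")
plt.show()
```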

Here are some important terms to know when considering speech melody:

Compounding: When two words come together to form a new meaning (such as "light" and "house" becoming "lighthouse"). In such cases, more stress is given to the first part than to the second.

Focus: Also known as emphatic stress. When stress is used to highlight part of a phrase or sentence.

Juncture: How words and syllables are connected in language.

Intonational phrase: Also known as a tonic unit, tonic phrase, or tone group. A pattern of pitch changes that matches up in a meaningful way with a part of a sentence.

Lexical stress: When stress plays a word-specific role in language, such as in English, where you can't put stress on the wrong "syl-LAB-le."

Sentence-level intonation: The use of spoken pitch to change the meaning of a sentence or phrase. For example, an English statement usually has falling pitch (high to low), while a yes/no question has rising pitch (low to high); see the sketch after this list.

Stress: Relative emphasis given to certain syllables. In English, a stressed syllable is louder, longer, and higher in pitch.

Syllable: A unit of spoken language consisting of a single uninterrupted sound formed by a vowel, diphthong, or syllabic consonant, with optional sounds before or after it.

Tonic syllable: An important concept in many theories of prosody: the syllable that carries the most pitch change in an intonational phrase.

ToBI: Tones and break indices. A set of conventions used for working with speech prosody. Although originally designed for English, ToBI has now been adapted to work with a few other languages.
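As a companion to the sentence-level intonation entry above, here is a minimal, hypothetical sketch that labels an utterance as rising or falling from the slope of its final stretch of F0. The 25% tail window and all F0 values are assumptions for illustration.

```python
import numpy as np

def terminal_contour(f0_trace_hz, tail_fraction=0.25):
    """Label an utterance 'rising' or 'falling' from the slope of its final
    stretch of F0 -- the cue described under sentence-level intonation."""
    f0 = np.asarray([x for x in f0_trace_hz if x > 0], dtype=float)  # drop unvoiced frames
    n = max(2, int(len(f0) * tail_fraction))
    slope = np.polyfit(np.arange(n), f0[-n:], 1)[0]  # Hz per frame over the tail
    return "rising" if slope > 0 else "falling"

# Hypothetical frame-level F0 (Hz) for a statement and a yes/no question.
statement = [210, 205, 200, 195, 185, 175, 160, 150]
question  = [190, 185, 180, 178, 185, 205, 230, 255]

print(terminal_contour(statement))  # -> falling
print(terminal_contour(question))   # -> rising
```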

About the Author

This article is from the book Phonetics For Dummies.

William F. Katz, PhD, is Professor of Communication Sciences and Disorders in the School of Behavioral and Brain Sciences at the University of Texas at Dallas, where he teaches and directs research in linguistics, speech science, and language disorders. He has pioneered new treatments for speech loss after stroke, and he studies an unusual disorder known as "foreign accent syndrome."



