Implications from Music Generation for Music Appreciation
Amy K. Hoover, Paul A. Szerlip, and Kenneth O. Stanley
Department of Electrical Engineering and Computer Science
University of Central Florida
Orlando, FL 32816-2362 USA
{ahoover@eecs.ucf.edu,paul.szerlip@gmail.com,kstanley@eecs.ucf.edu}
Abstract
This position paper argues that fundamental principles that
are exploited to achieve effective music generation can also
shed light on the elusive question of why humans appreciate
music, and which music is easiest to appreciate. In particular,
we highlight the key principle behind an existing approach
to assisted accompaniment generation called functional scaffolding
for musical composition (FSMC). In this approach,
accompaniment is generated as a function of the preexisting
parts. The success of this idea at generating plausible accompaniment
according to studies with human participants suggests
that perceiving a functional relationship among parts in
a composition may be essential to the appreciation of music
in general. This insight is intriguing because it can help to
explain without any appeal to traditional music theory why
humans with no knowledge or training in music can nevertheless
find satisfaction in coherent musical structure.
Introduction
Among the most fundamental questions about the human experience
of music is why we appreciate it so universally
and what makes some pieces more appealing than others
(Hanslick, 1891; Sacks, 2008; Frith, 2004; Gracyk, 1996).
There are many possible approaches to addressing these
questions, from studies of expectation fulfillment (Huron,
2006; Schmuckler, 1989; Pearce and Wiggins, 2012; Abdallah
and Plumbley, 2009) to cultural factors (Balkwill and
Thompson, 1999; Peddie, 2006). Our aim in this paper is to
propose an alternative route to addressing the fundamental
basis for music appreciation, by beginning with an approach
to music generation and from its mechanics drawing implications
for at least one key underlying ingredient in the appreciation
of music. The motivation is that the process of
designing an effective music generator implicitly forces the
designer to confront the basis of music appreciation as well.
After all, a music generator is of little use if its products are not
appealing.
Particularly revealing would be a simple principle that can
almost always be applied. The simpler such a principle,
the more plausible that it might explain some aspect of music
appreciation. One approach to assisted music generation
based on such a simple principle is called functional scaffolding
for musical composition (FSMC) (Hoover and Stanley,
2009; Hoover, Szerlip, and Stanley, 2011b,a; Hoover et
al., 2012). Our position is that the principle at the heart of
this approach, initially conceived as a basis for generating
accompaniment, offers a unique hint at the machinery behind
human musical appreciation. In this way, it can contribute
to explaining in part both when and why humans appreciate
music.
Functional Scaffolding for Musical
Composition (FSMC)
The FSMC approach is based on the insight that music is
at heart a pattern of notes played over time with some regularity.
As a result, one way to conceptualize music is as a
function of time. Formally, for any musical voice, the pattern
of pitches and the pattern of durations and rests can be
expressed together as a vector function of time f(t) that outputs
both pitch and rhythm information. In practice, to generate
a sequence of notes, f could be queried at every time t
and the complete output sequence would constitute the pattern.
The parts played by each instrument in an ensemble
piece could also be output simultaneously by such a function.
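
The view of a voice as a function of time can be illustrated with a minimal sketch. The encoding below (MIDI pitch numbers, a boolean rest flag, integer time steps) and the names `Voice`, `f`, and `render` are assumptions made for illustration, not the representation used in FSMC itself.

```python
from typing import Callable, List, Tuple

# A voice maps a time step t to (MIDI pitch, note_on), where note_on says
# whether a note sounds at t or the voice rests. This encoding is illustrative.
Voice = Callable[[int], Tuple[int, bool]]

C_MAJOR_ARPEGGIO = [60, 64, 67, 72]  # C4, E4, G4, C5 as MIDI numbers


def f(t: int) -> Tuple[int, bool]:
    """Toy voice: cycle through an arpeggio, resting on every fourth step."""
    pitch = C_MAJOR_ARPEGGIO[t % len(C_MAJOR_ARPEGGIO)]
    note_on = (t % 4) != 3
    return pitch, note_on


def render(voice: Voice, n_steps: int) -> List[Tuple[int, bool]]:
    """Query the voice at every time step to obtain the full pattern."""
    return [voice(t) for t in range(n_steps)]


pattern = render(f, 8)
```

Querying `f` at every step yields the complete sequence, so the pattern is fully determined by the function rather than stored note by note.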
This perspective is helpful for music generation when
combined with the insight that all the instrumental sequences
(i.e. each track) in a single piece must be somehow
related to each other. For example, in a popular rock piece,
the drum pattern, say d(t), typically establishes the rhythm
for the rest of the piece. Therefore, the bass pattern, b(t),
which helps structure the harmonic form, will by necessity
depend in some way upon the drum pattern. This idea of
relatedness between parts can be expressed more formally
by saying that the bass pattern is a function of the drum pattern,
which can be expressed by a function h that relates b(t)
to d(t): b(t) = h(d(t)). Building on the drum and bass patterns,
vocalists and other instrumental parts can then explore
more complicated melodic patterns that are themselves also
related to the established rhythmic and chord patterns. It follows
then that not only can each of these instrumental parts
be represented as a function of time, but that they are indeed
functions of one another.
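
The relation b(t) = h(d(t)) can be made concrete with a toy example. The drum encoding (1 for a hit, 0 otherwise) and the particular rule inside `h` are hypothetical choices for illustration only; any other transform of the drum pattern would serve equally well.

```python
from typing import List

# Hypothetical encoding: the drum pattern d(t) is 1 on a hit and 0 otherwise.
DRUM_HITS = [1, 0, 0, 1, 0, 0, 1, 0]


def d(t: int) -> int:
    """Drum pattern as a function of time (repeats every eight steps)."""
    return DRUM_HITS[t % len(DRUM_HITS)]


def h(drum_value: int) -> int:
    """Illustrative transform: play MIDI pitch 36 on a hit, rest (0) otherwise."""
    return 36 if drum_value == 1 else 0


def b(t: int) -> int:
    """Bass pattern defined as b(t) = h(d(t))."""
    return h(d(t))


bass_line: List[int] = [b(t) for t in range(8)]
```

Because `b` is computed only from `d`, the bass necessarily sounds where the drum does, which is the dependence between parts described above.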
Beyond just observations, these insights imply a practical
opportunity for generating musical accompaniment. By
casting instrumental parts as functions of each other, the
problem of accompaniment is illuminated in a new light:
Proceedings of the Fourth International Conference on Computational Creativity 2013 92
[Figure 1 diagram: the accompaniment g(t) = h(f(t)) is derived from the scaffold f(t), with h represented by a CPPN.]
Figure 1: Representing and Searching for Accompaniments with FSMC. The function f(t), which is depicted by a piano
keyboard, represents the human composition called the scaffold, from which the computer-generated accompaniments are
created. A possible such accompaniment, g(t), is shown atop and depicted by the image of a computer. Each accompaniment is
internally represented by a helper function, h(f(t)), which is represented by a special type of artificial neural network (ANN)
called a compositional pattern producing network (CPPN). Like ANNs, CPPNs can theoretically approximate any continuous
function. Thus these CPPNs represent h, which transforms the scaffold into an accompaniment.
Given an existing part f(t), the problem of formulating an
appealing accompaniment becomes the problem of searching
for accompaniment g(t) such that g(t) complements
f(t). Applying a search algorithm directly to
finding such a function g(t) would be difficult because the
search space is vast; the search can instead be significantly
constrained by searching for h(f(t)), as depicted in Figure
1. The major benefit of this approach is that because h is
a function of the part it will accompany, it cannot help but
follow to some extent its contours. Therefore, the idea for
generating accompaniment in FSMC is to search with the
help of a human user for a function h(f(t)), where f(t) is a
preexisting part or scaffold. By searching for a transforming
function instead of an explicit sequence of notes, the plausibility
of output accompaniments is enhanced. In effect, f(t)
provides the functional scaffolding for the accompaniment.
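
One way to see why searching for h rather than g constrains the outcome: wherever the scaffold repeats, any accompaniment of the form h(f(t)) must repeat as well. The sketch below assumes a deliberately simple three-weight form for h (not a CPPN) and a normalized pitch encoding; both are illustrative assumptions.

```python
import math
import random
from typing import Callable, List


def make_random_h(rng: random.Random) -> Callable[[float], float]:
    """Return a random smooth transform h; its three weights are the search space."""
    w1, w2, bias = (rng.uniform(-2.0, 2.0) for _ in range(3))
    return lambda x: w1 * math.sin(w2 * x) + bias


# Scaffold f(t): normalized pitches of a motif that repeats after four steps.
scaffold: List[float] = [0.0, 0.4, 0.7, 0.4, 0.0, 0.4, 0.7, 0.4]

rng = random.Random(0)
h = make_random_h(rng)
accompaniment = [h(x) for x in scaffold]  # g(t) = h(f(t))

# The repetition in the scaffold is inherited by the accompaniment,
# whatever weights were drawn for h.
repeats_like_scaffold = accompaniment[:4] == accompaniment[4:]
```

A direct search over g would have to discover this repetition by chance, whereas every candidate h yields it automatically.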
The idea in FSMC that searching for h(f(t)) can
yield plausible accompaniment to f(t) can be exploited
in practice by programming a search algorithm to explore
possible variations of the function h. In fact,
this approach has been tested extensively in practice
through an implementation called MaestroGenesis
(http://maestrogenesis.org/), whose results have been
reported in a number of publications (Hoover, Szerlip, and
Stanley, 2011b,a; Hoover et al., 2012). In MaestroGenesis,
the function h is represented by a special kind of artificial
neural network called a compositional pattern producing
network (CPPN). A population of candidate CPPNs is
evolved interactively by allowing a human user to direct the
search algorithm by picking his or her favorite candidate accompaniments
to produce the offspring for the next generation.
Thus the representation of the transforming function is
a CPPN (which is a kind of function approximator) and the
search algorithm is interactive evolution (which is an evolutionary
algorithm guided by a human; Takagi, 2001). A full
technical description is given in Hoover, Szerlip, and Stanley
(2011a), Hoover, Szerlip, and Stanley (2011b), and Hoover
et al. (2012).
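
The interactive loop described above can be sketched roughly as follows. This is a simplified stand-in under stated assumptions: the genome is a fixed three-weight function rather than an evolving CPPN, mutation is plain Gaussian noise, and the `choose` callback replaces the human listener. None of these details are taken from MaestroGenesis; the cited papers give the actual method.

```python
import math
import random
from typing import Callable, List, Sequence

Genome = List[float]  # weights of a tiny fixed-form function (not a full CPPN)


def express(genome: Genome, scaffold: Sequence[float]) -> List[float]:
    """Apply h (parameterized by the genome) to each scaffold value."""
    w_sin, w_lin, bias = genome
    return [math.tanh(w_sin * math.sin(x) + w_lin * x + bias) for x in scaffold]


def mutate(genome: Genome, rng: random.Random, sigma: float = 0.3) -> Genome:
    """Perturb every weight with Gaussian noise."""
    return [w + rng.gauss(0.0, sigma) for w in genome]


def evolve(
    scaffold: Sequence[float],
    choose: Callable[[List[List[float]]], List[int]],
    generations: int = 3,
    pop_size: int = 6,
    seed: int = 0,
) -> List[Genome]:
    """Interactive evolution: `choose` stands in for the human listener."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        candidates = [express(g, scaffold) for g in population]
        parents = [population[i] for i in choose(candidates)]
        # The next generation consists of mutated copies of the chosen parents.
        population = [mutate(rng.choice(parents), rng) for _ in range(pop_size)]
    return population


def pick_first_two(candidates: List[List[float]]) -> List[int]:
    """Placeholder for the user's picks; a real system would play audio."""
    return [0, 1]


final_population = evolve([0.0, 0.5, 1.0, 0.5], pick_first_two)
```

The only fitness signal is the selection returned by `choose`, which is what distinguishes interactive evolution from an evolutionary algorithm with a programmed objective.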
Interestingly, listener study results from FSMC-generated
music showed that musical pieces with accompaniments
that were generated purely through functional relationships
were indistinguishable from fully human composed pieces
(Hoover, Szerlip, and Stanley, 2011a). In fact, some
fully human composed pieces were rated more mechanical-sounding
than those that were only partially human composed.
Similarly positive results were also reported in other
studies (Hoover and Stanley, 2009; Hoover, Szerlip, and
Stanley, 2011b,a; Hoover et al., 2012). Although more variation
in the initial human composition (i.e. a polyphonic
versus monophonic scaffold) provides more richness from
which to work, as Hoover et al. (2012) show, plausible accompaniments
can nevertheless be generated from as little
as a single monophonic starting melody. Furthermore, often
in MaestroGenesis even the first generation of candidate
accompaniments, which are randomly-generated CPPNs,
sounds plausible because the functional relationship ensures
at least some relationship between the scaffold and its accompaniment
(Hoover and Stanley, 2009).
From Music Generation to Music
Appreciation
These results are of course relevant to progress in music generation,
but they hint at a deeper implication. In particular,
it is notable that MaestroGenesis (and FSMC behind it)
has almost no musical knowledge programmed into it. In
fact, the only real musical rule in the program is that CPPN
outputs are forced to be interpreted as notes within the key
of the scaffold track. Aside from that, MaestroGenesis has
no knowledge of chords, rhythm, progression, melody, harmony,
dissonance, style, genre, or anything else that a typical
music generator might have (Simon, Morris, and Basu,
2008; Chuan, 2009; Ebcioglu, 1990). It thus relies almost
entirely on the functional relationship between the scaffold
and the accompaniment to achieve plausibility. In effect, the
functional relationship causes the accompaniment to inherit
the gross structure of the scaffold, thereby endowing it with
many of the same aesthetic properties. Thus the key observation
behind this position paper is that establishing such
a functional relationship between different parts of a song
seems to be sufficient on its own to achieve plausible musical
structure.
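
The single musical rule mentioned above, interpreting outputs as notes within the key of the scaffold, might look something like the following. The source does not specify the mapping, so the output range [-1, 1], the major scale, the two-octave span, and the function name `to_note_in_key` are all assumptions.

```python
from typing import List

MAJOR_SCALE_STEPS = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets of a major scale


def to_note_in_key(output: float, tonic: int = 60, octaves: int = 2) -> int:
    """Map a network output in [-1, 1] to a MIDI pitch in the major key of `tonic`."""
    clamped = max(-1.0, min(1.0, output))
    n_degrees = len(MAJOR_SCALE_STEPS) * octaves
    # Scale [-1, 1] onto the available scale-degree indices.
    index = min(int((clamped + 1.0) / 2.0 * n_degrees), n_degrees - 1)
    octave, degree = divmod(index, len(MAJOR_SCALE_STEPS))
    return tonic + 12 * octave + MAJOR_SCALE_STEPS[degree]


notes: List[int] = [to_note_in_key(x) for x in (-1.0, -0.2, 0.3, 1.0)]
```

Whatever value the transforming function produces, the resulting pitch always lies in the chosen key, so no further harmonic knowledge is encoded.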
This observation is intriguing because it implies a hypothesis
about the nature of musical appreciation: If a functional
relationship alone is sufficient to achieve musical plausibility
in the experience of human listeners, then perhaps musical
appreciation itself is at least in part the result of perceiving
a functional relationship between different parts of
a composition. That is, functional relationships, which are
mathematical properties of patterns that do not require any
specific musical knowledge to perceive, could explain why
listeners without any musical training or expertise nevertheless
experience and appreciate music and separate it firmly
from cacophony. In effect the human is appreciating the
functional relationship that binds different parts of a composition
together.
If true, this hypothesis can explain to some extent when
humans will or will not appreciate a composition. For example,
the harder it is to perceive how one part is functionally
related to another, the less pleasing that piece may be.
Such functional relationships are potentially perceived not
only between different instrumental parts or tracks, but also
within a single instrumental part played over time. That is, if
a functional relationship can be perceived between an earlier
sequence of notes and a later one, then the entire sequence
may succeed as musically plausible or even appealing.
At the same time, it may also explain why some compositions
are more difficult to enjoy. For example, some research
in computer music explores the sonification of non-auditory
data (Cope, 2005; Park et al., 2010; Vickers, 2005). Typically,
the user inputs semi-random data to a computer model
(e.g. cellular automata, swarms, etc.) that outputs music.
While the output is a function of the input, because the initial
seed does not stem from inherently musical events, the
outputs are often difficult for non-creators to immediately
understand. However, as these systems develop, composers
begin to build musical frames for anticipation and expectation.
While there can certainly be beauty in such pieces, the
audience needs some familiarity, like the composer, with the
style to begin to perceive the important relationships.
Though perhaps not explicitly, composers have long intuited
the importance of incorporating functional relationships
into compositions. Not only are musical lines regularly
translated, inverted, and reflected, but some logarithmic
and modular transformations (and set theory concepts) predate
the mathematical formalisms themselves (Risset, 2002;
Harkleroad, 2006). The implicit nature of the composers’
insight is that people appreciate these functional transformations
within a “relatable” musical context.
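
The classical transformations mentioned above are themselves simple functions of a musical line. In the sketch below, reading "translated" as transposition and "reflected" as retrograde (reversal in time) is an assumed interpretation, and the MIDI pitch encoding is illustrative.

```python
from typing import List

Melody = List[int]  # MIDI pitches


def transpose(melody: Melody, semitones: int) -> Melody:
    """Translate every pitch by a fixed interval."""
    return [p + semitones for p in melody]


def invert(melody: Melody) -> Melody:
    """Mirror each pitch around the first note of the melody."""
    axis = melody[0]
    return [2 * axis - p for p in melody]


def retrograde(melody: Melody) -> Melody:
    """Reflect the melody in time by reversing it."""
    return melody[::-1]


theme: Melody = [60, 62, 64, 67]
variations = {
    "transposed": transpose(theme, 5),
    "inverted": invert(theme),
    "retrograde": retrograde(theme),
}
```

Each variation is fully determined by the theme, which is the sense in which one passage is functionally related to another.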
In fact, much of compositional musical theory was developed
to produce consistent aesthetic results (Payne, 1995;
Christensen, 2002). By following certain heuristics and established
patterns, composers ensure that pieces fit within
particular styles and genres. Such heuristics generally encompass
narrow sets of phenomena, e.g. waltzes, counterpoint,
jigs, etc. The hypothesis that functional relationships
provide a general principle for musical appreciation provides
a unifying perspective for all such disparate stylistic
conventions: At some level, all of them ultimately establish
some kind of functional relationship among the parts of a
composition.
Furthermore, this perspective suggests that as long as
they are perceptible (i.e. not so complex as to sound cacophonous),
relatively simple functions likely exist that generate
relationships among parts that are aesthetically appealing
yet not related to any genre, rule, or heuristic currently
taught or even yet conceived. For example, music generated
with FSMC exhibits a range of complexity, suggesting
little restriction on the type of function necessary to create
plausible accompaniments. Some of the most appealing accompaniments
are generated from very simple relationships
(Hoover, Szerlip, and Stanley, 2011b,a), while sometimes
more complex relationships between melody and percussion
are also appreciated by listeners (Hoover and Stanley,
2009). To some extent this theory thereby suggests, without
any other musical theory, when breaking the rules might
be appealing and when it might not: as long as a functional
relationship among parts can still be perceived, the average
listener will not necessarily react negatively to breaking established
conventions.
Sometimes it can also take a while to habituate to styles
or genres that do not follow conventional rules. For example,
atonal pieces can be difficult to enjoy for the uninitiated.
Interestingly, the functional hypothesis provides a potential
insight into why such unconventional styles can become appealing
with experience. The explanation is that initially the
functional relationships among different parts in such an unconventional
context are difficult to perceive because the relationships
are both complex and unfamiliar. Therefore, the
brain initially struggles to identify the functional relationship.
However, over time, repeated exposure familiarizes
the listener with the kinds of transformations that are typical
in the new context, such that eventually the brain can pick
out functional relationships that once were too complex to
perceive. At that point, it becomes possible to appreciate
the music. In fact, as noted by Huron (2006), the patterns
associated with a particular style are designed to elicit emotions
by playing on the listeners’ expectations. Those expectations
can be viewed as mediated by the kinds of functional
relationships with which the listener is familiar.
Music theory provides many heuristics for composing
plausible types of music like fugues or walking bass lines.
But as any musician knows, simply following such rules
without the elusive element of inspiration results in plausible
yet dry-sounding pieces. A good musician must know the
standard rules for composition and also when to break them,
but the problem of when to break the rules in music theory is
less understood than the standard rules for composition. The
insight that a rule is well-broken if it still preserves a perceptible
functional relationship provides a possible direction for
studying this issue further.
Conclusions
This type of general theory follows directly from taking
a minimalist approach to music generation. Approaches
that rely on acquiring or enumerating all the complexities
of music of certain types or composers of certain types,
such as through statistical inference (Rhodes, Lewis, and
Müllensiefen, 2009; Kitani and Koike, 2010) or grammatical
rules (Holtzman, 1981; McCormack, 1996), cannot probe
the possibility of deeper underlying principles than the rules
that are apparent at the surface. In contrast, FSMC and MaestroGenesis
took the minimalist approach to music generation
by predicating everything only on functional relationships.
While a potential criticism of such an approach is that
it is too simplistic to capture all the subtlety of sophisticated
musical composition, its benefit in a scientific context is that
it isolates a single phenomenon so that the full implication
of that phenomenon can be tested. The result is a simple
hypothesis that reduces musical theory to a mathematical
principle, i.e. perceiving functional relationships, that can
plausibly be appreciated even by listeners without musical
training. It also becomes a tool for music generation, as in
MaestroGenesis, that does not require enumerating complex
rules. While functional relationships need not constitute the
entire explanation for all musical appreciation, they are an
appealing ingredient because of their simplicity and possibility
for future study – they suggest that within the mind of
a composer perhaps at some level such a function is realized
as the overall pattern of a musical piece is first conceived.
In a broader context, explaining the appreciation of music
through perceiving functional relationships also connects
musical appreciation to non-musical aesthetics. After all,
across the spectrum of art, architecture, and even human
beauty, symmetry, repetition, and variation on a theme are
paramount. It is notable that all such regularities ultimately
reduce to one instance of a pattern being functionally related
to another. Given that we appreciate such relationships in so
many spheres of our experience, that music too would draw
from such an affinity follows elegantly.
Acknowledgments
This work was supported in part by the National Science
Foundation under grant no. IIS-1002507 and also by an NSF
Graduate Research Fellowship. Any opinions, findings, and
conclusions or recommendations expressed in this material
are those of the author(s) and do not necessarily reflect the
views of the National Science Foundation.
References
Abdallah, S., and Plumbley, M. 2009. Information dynamics:
Patterns of expectation and surprise in the perception
of music. Connection Science 21(2).
Balkwill, L.-L., and Thompson, W. F. 1999. A cross-cultural
investigation of the perception of emotion in music: Psychophysical
and cultural cues. Music Perception 43–64.
Christensen, T. 2002. The Cambridge History of Western
Music Theory. Cambridge University Press.
Chuan, C.-H. 2009. Supporting compositional creativity
using automatic style-specific accompaniment. In Proc.
of the CHI Computational Creativity Support Workshop.
Cope, D. 2005. Computer Models of Musical Creativity.
Ebcioglu, K. 1990. An expert system for harmonizing
chorales in the style of J. S. Bach. Journal of Logic Programming
145–185.
Frith, S. 2004. What is bad music? In Washburne, C., and
Derno, M., eds., Bad Music: The Music We Love to Hate.
Routledge. 15–36.
Gracyk, T. 1996. Rhythm and Noise: An Aesthetics of Rock.
IB Tauris.
Hanslick, E. 1891. On the Musically Beautiful: A Contribution
Towards the Revision of the Aesthetics of Music.
Novello Ewer and Company.
Harkleroad, L. 2006. The Math Behind the Music. Outlooks.
Cambridge University Press.
Holtzman, S. R. 1981. Using generative grammars for music
composition. Computer Music Journal 5(1):51–64.
Hoover, A. K., and Stanley, K. O. 2009. Exploiting functional
relationships in musical composition. Connection
Science Special Issue on Music, Brain, & Cognition
21(2):227–251.
Hoover, A. K.; Szerlip, P. A.; Norton, M. E.; Brindle, T. A.;
Merritt, Z.; and Stanley, K. O. 2012. Generating a complete
multipart musical composition from a single monophonic
melody with functional scaffolding. In Proceedings
of the Third International Conference on
Computational Creativity (ICCC-2012, Dublin, Ireland).
Hoover, A. K.; Szerlip, P. A.; and Stanley, K. O. 2011a. Generating
musical accompaniment through functional scaffolding.
In Proceedings of the Eighth Sound and Music
Computing Conference (SMC 2011).
Hoover, A. K.; Szerlip, P. A.; and Stanley, K. O. 2011b.
Interactively evolving harmonies through functional scaffolding.
In Proceedings of the Genetic and Evolutionary
Computation Conference (GECCO-2011). New York,
NY: The Association for Computing Machinery.
Huron, D. 2006. Sweet Anticipation: Music and the Psychology
of Expectation.
Kitani, K. M., and Koike, H. 2010. ImprovGenerator: Online
grammatical induction for on-the-fly improvisation
accompaniment. In Proceedings of the 2010 Conference
on New Interfaces for Musical Expression (NIME 2010).
McCormack, J. 1996. Grammar based music composition.
Complex Systems 96:321–336.
Park, S.; Kim, S.; Lee, S.; and Yeo, W. S. 2010. Composition
with path: Musical sonification of geo-referenced
data with online map interface. In Proceedings of the International
Computer Music Conference.
Payne, S. K. D. 1995. Tonal Harmony: With an Introduction
to Twentieth. McGraw-Hill.
Pearce, M. T., and Wiggins, G. A. 2012. Auditory expectation:
The information dynamics of music perception and
cognition. Topics in Cognitive Science 4(4):625–652.
Peddie, I. 2006. The Resisting Muse: Popular Music and
Social Protest. Ashgate Publishing Company.
Rhodes, C.; Lewis, D.; and Müllensiefen, D. 2009. Bayesian
model selection for harmonic labelling. Mathematics and
Computation in Music 107–116.
Risset, J.-C. 2002. Computing musical sound. In Assayag,
G.; Feichtinger, H. G.; and Rodrigues, J. F., eds.,
Mathematics and Music: A Diderot Mathematical Forum.
Springer-Verlag. chapter 13.
Sacks, O. 2008. Musicophilia: Tales of Music and the Brain.
Vintage Canada.
Schmuckler, M. A. 1989. Expectation in music: Investigation
of melodic and harmonic processes. Music Perception:
An Interdisciplinary Journal 7(2):109–149.
Simon, I.; Morris, D.; and Basu, S. 2008. MySong: Automatic
accompaniment generation for vocal melodies. In
Proc. of the Twenty-Sixth Annual SIGCHI Conference on
Human Factors in Computing Systems, 725–734. ACM.
Takagi, H. 2001. Interactive evolutionary computation: Fusion
of the capacities of EC optimization and human evaluation.
Proceedings of the IEEE 89(9):1275–1296.
Vickers, P. 2005. Ars informatica–ars electronica: Improving
sonification aesthetics. In Understanding and Designing
for Aesthetic Experience Workshop at HCI 2005 The
19th British HCI Group Annual Conference.