Generating a Complete Multipart Musical Composition from a Single Monophonic Melody with Functional Scaffolding

Amy K. Hoover, Paul A. Szerlip, Marie E. Norton, Trevor A. Brindle, Zachary Merritt, and Kenneth O. Stanley
Department of Electrical Engineering and Computer Science
University of Central Florida
Orlando, FL 32816-2362 USA
{ahoover@eecs.ucf.edu, paul.szerlip@gmail.com, marie.norton@knights.ucf.edu, tabrindle@gmail.com, zbmerritt@gmail.com, kstanley@eecs.ucf.edu}

Abstract

This paper advances the state of the art for a computer-assisted approach to music generation called functional scaffolding for musical composition (FSMC), whose representation facilitates creative combination, exploration, and transformation of musical concepts. Music in FSMC is represented as a functional relationship between an existing human composition, or scaffold, and a generated accompaniment. This relationship is encoded by a type of artificial neural network called a compositional pattern producing network (CPPN). A human user without any musical expertise can then explore how the accompaniment should relate to the scaffold through an interactive evolutionary process akin to animal breeding. While the power of such a functional representation has previously been shown to constrain the search to plausible accompaniments, this study goes further by showing that the user can tailor complete multipart arrangements from only a single original monophonic track, thus enabling creativity without the need for musical expertise.

Introduction

Among the most important functions of any approach to enhancing human creativity is what Boden (2004) terms transformational creativity. That is, key creative obstacles faced by human artists and musicians are the implicit constraints acquired over a lifetime that shape the structure of the search space. By offering an instance of the search space (e.g. of musical accompaniments) with a radically different structure, a creativity-enhancing program can potentially liberate the human to discover unrealized possibilities. In effect, the familiar space of the human artist is transformed into a new structure intrinsic to the program. Once the user is exposed to this new world, as a practical matter the program must provide the user the ability to explore and combine concepts within the newly-conceived search space, which corresponds to Boden's combinatorial and exploratory classes of creativity (Boden, 2004). That way, the user experiences a rich and complete creative process within a space that was heretofore inconceivable.

The danger with transformational creativity in computational settings is that breaking hard-learned rules may feel unnatural and thereby unsatisfying (Boden, 2007). Any attempt to facilitate transformational creativity should respect the relationships between key artistic elements even as they are presented in a new light. Thus for a given domain, such as musical accompaniment, a delicate balance must be struck between unfettered novelty and respect for essential structure.

Many approaches to generating music focus on producing a natural sound at the cost of restricting creative exploration. Because structure is emphasized, the musical space is defined by rules that constrain the results to different styles and genres (Todd and Werner, 1999; Chuan, 2009; Cope, 1987). The necessity for a priori rules potentially facilitates the combination of musical structures or the exploration of the defined space, but precludes transformational outcomes.
In contrast, musical structures in the approach examined in this paper, functional scaffolding for musical composition (FSMC), are defined as the very functions that relate one part of a piece to another, thereby enabling satisfying transformational creativity (Hoover, Szerlip, and Stanley, 2011a,b). Based on the idea that music can be represented as a function of time, FSMC inputs a simple, isolated musical idea into a function that outputs accompaniment that respects the structure of the original piece. The function is represented as a special type of artificial neural network called a compositional pattern producing network (CPPN). In practice, the CPPN inputs existing music and outputs accompaniment. The user-guided creative exploration itself is facilitated by an interactive evolutionary technique that in effect allows the user to breed the key functional relationships that yield accompaniment, which supports both combinatorial and exploratory creativity (Boden, 2004) through the crossover and mutation operators present in evolutionary algorithms. By representing music as relationships between parts of a multipart composition, FSMC creates a new formalism for a musical space that transforms its structure for the user while still respecting its fundamental constraints.

Hoover, Szerlip, and Stanley (2011a,b) showed that FSMC can produce accompaniments that are indistinguishable by listeners from fully human-composed pieces. However, the accompaniment in these studies was only a single monophonic instrument, leaving open the key question of whether a user with little or no musical expertise can generate an entire multipart arrangement with this technology from just a single-instrument monophonic starting melody. If that were possible, then anyone with only the ability to conceive a single, monophonic melody could in principle expand it into a complete multilayered musical product, thereby enhancing the creative potential of millions of amateur musicians who possess inspiration but not the expertise to realize it. This paper demonstrates that FSMC indeed makes such an achievement possible.

Background

This section relates FSMC to traditional approaches to automated composition and previous creativity-enhancing techniques.

Automatic Music Generation

Many musical representations have been proposed before FSMC, although their focus is not necessarily on representing the functional relationship between parts. For example, long before FSMC, Holtzman (1980) creates a musical grammar that generates harp solos based on the physical limitations imposed on harp performers. Similarly, Cope (1987) derives grammars from the linguistic principles of haiku to generate music in a particular style. These examples and other grammar-based systems are predicated on the idea that music follows grammatical rules and thus, by modeling musical relationships as grammars, they represent the important structures of music (Roads, 1979; McCormack, 1996). While grammars can produce a natural sound, deciding which aspects of musical structure should be represented by them is often difficult and ad hoc (Kippen and Bel, 1992; Marsden, 2000).

Impro-Visor helps users create monophonic jazz solos by automatically composing any number of measures in the style of famous jazz artists (Keller et al., 2006). Styles are represented as grammars that the user can invoke to complete compositions.
Creativity enhancement in Impro-Visor occurs through the interaction of the user's own writing and the program's suggestions. When users have difficulty elaborating musical concepts, they can access predictions of how famous musicians would approach the problem within the context of the current composition. By first learning different professional compositional techniques, students can then begin developing their own personal styles. While Impro-Visor is an innovative tool for teaching jazz styles to experienced musicians, it emphasizes emulating prior musicians over exploration.

Enhancing Creativity in Music Composition

A problem with traditional approaches to music composition is that standard representations can potentially limit creative exploration. For instance, MySong generates chord-based accompaniment for a vocal piece from hidden Markov models (Simon, Morris, and Basu, 2008). Users select any vocal piece and MySong outputs accompaniment based on a transition table, a weighting factor that permits greater deviation from the table, and a musical style (e.g. rock, big band). MySong thus allows users to create accompaniment for their own melodies in a variety of different predefined styles from which users cannot deviate. Zicarelli (1987) describes an early interactive composition program, Jam Factory, that improvises on human-provided MIDI inputs from rules represented in transition tables. Users manipulate the output in several ways, including the probability distributions of eight different transition tables (four each for rhythm and pitch). Users are given more creative control in designing and consulting the transition tables, but the increased flexibility results in unnatural outputs that thereby limit the utility of the main algorithms (Zicarelli, 2002). The approach described by Chuan (2009) balances user control by training transition tables on only a few user-provided examples. The tables then reflect the "style" inherent in the examples and can generate chord-based accompaniment for a user's own piece.

While each of these systems offers users varied levels of control, rule manipulation alone may not be sufficient to access all three forms of creativity described by Boden (2004). For example, the representations cannot easily combine musical ideas or transform the musical space (due to inherent rule restrictions). Alternatively, most interactive evolutionary computation (IEC) (Takagi, 2001) approaches facilitate creativity through the evolutionary operators of crossover and mutation, and require human involvement in the creative process. In GenJam, a human player and computer "trade fours," a process whereby the human plays four measures and the computer "answers" them with four measures of its own (Biles, 1998). Musical propositions are mutated and combined into candidates that the user rates as good or bad. Similarly, Jacob (1995) introduces a system in which human users rate, combine, and explore musical candidates at three different levels of the composition process, and Ralley (1995) generates melodies by creating a population from mutations of a provided user input. Finally, CACIE creates atonal pieces by concatenating musical phrases as they are generated over time (Ando and Iba, 2007). Each phrase is represented as a tree structure that users can interactively evolve or directly manipulate.
However, most such systems impose explicit musical rules conceived by the developer to constrain the search space of possible accompaniments, thus narrowing the potential for discovery.

Previous Work in FSMC

The FSMC approach in this paper is based on previous work by Hoover, Szerlip, and Stanley (2011a,b), who focused on evolving a single monophonic accompaniment for a multipart MIDI. These accompaniments are generated through two functions, one each for pitch and rhythm, that are represented as compositional pattern producing networks (CPPNs), a special type of artificial neural network (ANN). CPPNs can evolve to assume an arbitrary topology wherein each neuron is assigned one of several activation functions. Through IEC, users explore the range of accompaniments with NeuroEvolution of Augmenting Topologies (NEAT), a method for growing and mutating CPPNs (Stanley and Miikkulainen, 2002). Unlike traditional ANN learning, NEAT is a policy search method, i.e. it explores accompaniment possibilities rather than optimizing toward a target. While existing songs with generated accompaniments were indistinguishable in a listener study from fully-composed human pieces, the real achievement for this approach would be to help the user generate an entire polyphonic and multi-instrument accompaniment from just a single voice of melody (Hoover, Szerlip, and Stanley, 2011a). This paper realizes this vision.

Figure 1: How CPPNs Compute a Function of the Input Scaffold. The rhythm CPPN in (a) and pitch CPPN in (b) together form the accompaniments of FSMC. The inputs to the CPPNs are the scaffold rhythms and pitches for the respective networks and the outputs indicate the accompaniment rhythms and pitches. Each rhythm network has two outputs: OnOff and NewNote. The OnOff node controls volume and whether or not a note is played. The NewNote node indicates whether a note is re-voiced or sustained at the current tick. If OnOff indicates a rest, the NewNote node is ignored. The pitch CPPN output decides what pitch the accompaniment should play at that particular tick. The internal topologies of these networks, which encode the functions they perform, change over evolution. The functions within each node depict that a CPPN can include more than one activation function, such as Gaussian and sigmoid functions. Two monophonic accompaniment outputs are depicted, but the number of instruments a CPPN can output is unlimited. The number of input instruments also can vary.

Approach: Extending Functional Scaffolding for Music Composition

This section extends the original FSMC approach, which only evolved a single monophonic accompaniment (Hoover, Szerlip, and Stanley, 2011a,b). It explains the core principles of the approach and how they are applied to producing multipart accompaniments.

Defining the Musical Space

A crucial aspect of any creativity-enhancing approach for music composition is first to define the musical space. Users can help define this space in FSMC by first selecting a musical starting point, i.e. the monophonic melody or scaffold. Initial scaffolds can be composed in any style, and if they are only single monophonic parts, as in this paper, they can be composed by users within a wide range of musical skill and expertise.
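Before describing how the scaffold is translated into accompaniment, it may help to make the CPPN representation of figure 1 concrete. The following minimal Python sketch is an illustrative stand-in, not the authors' implementation: it shows a tiny fixed-topology network whose nodes can use different activation functions (e.g. Gaussian and sigmoid), whereas the actual FSMC networks grow arbitrary topologies through NEAT. All names and weights here are hypothetical.

```python
import math

# Illustrative CPPN-like network: nodes may use different activation
# functions and the topology is a small weighted graph. In FSMC the
# topology and weights are evolved with NEAT rather than hand-built.
ACTIVATIONS = {
    "sigmoid":  lambda x: 1.0 / (1.0 + math.exp(-x)),
    "gaussian": lambda x: math.exp(-x * x),
    "sine":     lambda x: math.sin(x),
}

class Node:
    def __init__(self, activation):
        self.activation = ACTIVATIONS[activation]
        self.incoming = []   # list of (source Node or input index, weight)

class CPPN:
    """Evaluates outputs = f(inputs) for a small feedforward graph."""
    def __init__(self, num_inputs, hidden, outputs):
        self.num_inputs = num_inputs
        self.hidden = hidden      # hidden nodes in topological order
        self.outputs = outputs    # output nodes

    def activate(self, inputs):
        values = {}
        for node in self.hidden + self.outputs:
            total = 0.0
            for src, weight in node.incoming:
                x = inputs[src] if isinstance(src, int) else values[id(src)]
                total += weight * x
            values[id(node)] = node.activation(total)
        return [values[id(n)] for n in self.outputs]

# Example: two scaffold inputs plus a bias feed one Gaussian hidden node
# and two sigmoid outputs (e.g. OnOff and NewNote of a rhythm network).
h = Node("gaussian"); h.incoming = [(0, 0.8), (1, -0.5), (2, 0.3)]
on_off = Node("sigmoid"); on_off.incoming = [(h, 1.2), (2, -0.4)]
new_note = Node("sigmoid"); new_note.incoming = [(h, -0.9)]
net = CPPN(num_inputs=3, hidden=[h], outputs=[on_off, new_note])
print(net.activate([0.7, 0.2, 1.0]))   # last input acts as the bias here
```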
The main insight behind the representation in FSMC is that a robust space of accompaniments can be created with only this initial scaffold. Because each accompaniment part relates to the scaffold, and therefore to the other parts, the space is easily created and explored. Each instrument part in the accompaniment is the result of two separate functions that independently relate rhythmic and pitch information in the scaffold (i.e. the inputs) to the generated accompaniment. Depicted in figure 1, these functions are represented as CPPNs, the special type of ANN described in the background (Stanley, 2007). As figure 1 shows, multiple inputs can be sent to the output and many different instruments can be represented by the same CPPN. CPPNs incrementally grow through the NEAT method, which means they can in principle evolve to represent any function (Stanley and Miikkulainen, 2002; Cybenko, 1989). Together, the rhythmic and pitch CPPNs that will be evolved through NEAT define the musical space that the user can manipulate. In effect, pitch information from the scaffold is fed into the pitch CPPN at the same time as rhythmic information is fed into the rhythm CPPN. Both CPPNs then output how the accompaniment should behave in response. That way, they compute a function of the scaffold.

Accompaniments are divided into a series of discrete time intervals called ticks that are concatenated together to form an entire piece. Each tick typically represents the length of an eighth note, but this division can be altered through an interface. Outputs are gathered from both the rhythmic and pitch CPPNs at each tick and are combined to determine the accompaniment at that tick. As shown in figure 1a, the two outputs of the rhythm network for each line of accompaniment are OnOff, which indicates whether a note or rest is played and its volume, and NewNote, which indicates whether or not to sustain the previous note. The single pitch output for each line of accompaniment in figure 1b determines instrument pitch at the current tick relative to a user-specified key.

Figure 2: Input Representation. The spike-decay representation for rhythmic inputs is shown in (a) and the pitch representation is in (b). Rhythm is encoded as a set of decaying spikes that convey the duration of each note. Because the CPPN sees where within each spike it is at any given tick, in effect it can synchronize its outputs with the timing of the input notes. Pitch, on the other hand, is input simply as the current note at the present tick.

To produce the outputs, rhythmic and pitch information from the scaffold is sent to the CPPNs at each tick. The continuous-time graph in figure 2a illustrates how rhythmic information in the scaffold is sent to the CPPN. When a note strikes, it is represented as a maximum input level that decays linearly over time (i.e. over ticks) until the note ends. At the same tick, pitch information on the current note is input as a pitch class into the pitch CPPN (figure 2b). That is, two C notes in different octaves (e.g. C4 and C5) are not distinguished. The sound of instruments in FSMC can be altered through instrument choice or key. A user can pick any of 128 pitched MIDI instruments and can request any key.
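To make the per-tick computation described above concrete, the sketch below shows one possible way of realizing the scaffold encoding and output interpretation: a linearly decaying spike for rhythm, a pitch-class input for pitch, and OnOff/NewNote/pitch outputs mapped to a note in the user-chosen key. This is a sketch under stated assumptions only; the helper names, 0.5 thresholds, and scale mapping are hypothetical illustrative choices, not the published FSMC implementation.

```python
from typing import Callable, List, Optional, Tuple

# Scaffold notes as (start_tick, duration_ticks, midi_pitch); a tick is
# an eighth note by default, as described in the text.
Note = Tuple[int, int, int]

def rhythm_input(scaffold: List[Note], tick: int) -> float:
    """Spike-decay encoding: 1.0 when a note strikes, decaying linearly to 0."""
    for start, dur, _ in scaffold:
        if start <= tick < start + dur:
            return 1.0 - (tick - start) / float(dur)
    return 0.0                         # rest

def pitch_input(scaffold: List[Note], tick: int) -> float:
    """Pitch-class encoding: octave is discarded (C4 and C5 look identical)."""
    for start, dur, pitch in scaffold:
        if start <= tick < start + dur:
            return (pitch % 12) / 11.0
    return 0.0

def generate_tick(rhythm_cppn: Callable, pitch_cppn: Callable,
                  scaffold: List[Note], tick: int,
                  key_scale: List[int]) -> Optional[dict]:
    """Query both CPPNs for one tick and interpret their outputs."""
    on_off, new_note = rhythm_cppn(rhythm_input(scaffold, tick))
    if on_off < 0.5:                   # OnOff below threshold: rest
        return None
    pitch_out = pitch_cppn(pitch_input(scaffold, tick))
    degree = int(pitch_out * (len(key_scale) - 1))   # map onto the chosen key
    return {
        "pitch": 60 + key_scale[degree],   # scale degree around middle C
        "volume": int(on_off * 127),       # OnOff also sets volume
        "revoice": new_note >= 0.5,        # strike again vs. sustain
    }
```

Decoding OnOff as both gate and volume mirrors the description of figure 1; everything else about the decoding here is an assumption made for illustration.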
Once a user decides from what preexisting piece the scaffold is provided and which output instruments are most appropriate for the piece, candidate CPPNs can be generated, thus establishing the musical space of accompaniments. The theory behind this approach is that by exploring the potential relationships between scaffolds and their accompaniments (as opposed to exploring direct representations of the accompaniment itself), the user is constrained to a space in which almost all candidate accompaniments are coherent with respect to the scaffold. The next section describes how users can combine, explore, and transform this space to harness their own musical creativity.

Navigating the Musical Space

Figure 3: Program Interface. This screenshot of the program (called MaestroGenesis) that implements FSMC shows accompaniments for a melody input by the user. The instrument output is currently set to Grand Piano on the left-hand side, but can be changed through a menu. Accompaniments are represented as horizontal bars and are named by their ID. The user selects his or her favorites and then requests a new generation of candidates.

Exploration of the musical space in FSMC begins with the presentation to the user of the output of ten randomly-generated CPPN pairs, each defining the key musical relationships between the scaffold and output accompaniment. These accompaniments can be viewed in a graphical depiction (as shown in the screenshot in figure 3) or in standard musical notation. They can be played and heard in either MIDI or MP3 format. The user-guided process of exploration that combines and mutates these candidates is called interactive evolutionary computation (IEC) (Takagi, 2001). Because each accompaniment is encoded by two CPPNs, evolution can alter both the pitch and rhythm CPPNs or adjust them individually. The user combines and explores accompaniments in this space by selecting and rating one or more accompaniments from one generation to parent the individuals of the next generation. The idea is that the good musical ideas from both the rhythmic and pitch functions are preserved with slight alterations or combined to create a variety of new but related functions, some of which may be more appealing than their parents. The space can also be explored without combination by selecting only a single accompaniment; the next generation then contains slight mutations of the original functions.

While IEC inherently facilitates these types of creativity, the approach in this paper extends the reach of transformational creativity offered by FSMC. Previously, FSMC generated single-voice accompaniments to be played with a fully-composed, preexisting human piece (Hoover, Szerlip, and Stanley, 2011a,b). This paper introduces a new layering technique whereby generated accompaniment from previous generations can serve as inputs to new CPPNs that then generate more layers of harmony. The result is the ability to spawn an entire multi-layered piece from a single monophonic starting melody. One such layering approach is performed by generating one new monophonic accompaniment at a time. The first layer is the monophonic melody composed by the human user. The second layer is generated through FSMC from the first. The third layer is then generated through FSMC by now inputting into the CPPNs the first and second layers, and so on (a minimal sketch of this incremental scheme appears below).
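The following sketch illustrates the layering idea just described. It is illustrative only: the interactive FSMC step, in which the user breeds rhythm and pitch CPPNs through IEC, is abstracted as a hypothetical `breed` callable, and both function names are assumptions introduced here for clarity.

```python
from typing import Callable, List, Sequence

# A layer is the per-tick note sequence for one monophonic line.
Layer = Sequence[object]
# The interactive FSMC step (user breeding CPPN pairs via IEC) is
# abstracted as a callable mapping the chosen input layers to a new layer.
BreedStep = Callable[[List[Layer]], Layer]

def build_incremental_arrangement(melody: Layer, num_layers: int,
                                  breed: BreedStep) -> List[Layer]:
    """Layer 1 is the user's melody; each subsequent layer is bred from all
    layers generated so far, so it relates functionally both to the original
    melody and to the previous accompaniment lines."""
    layers: List[Layer] = [melody]
    for _ in range(num_layers - 1):
        layers.append(breed(list(layers)))
    return layers

def build_melody_only_arrangement(melody: Layer, num_layers: int,
                                  breed: BreedStep) -> List[Layer]:
    """Alternative scheme: every accompaniment layer is bred from the
    starting melody alone, giving the melody a commanding influence."""
    return [melody] + [breed([melody]) for _ in range(num_layers - 1)]
```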
All of the layers are finally combined to create an entire accompaniment, so that each line is functionally related to both the initial melody and the previous accompaniment lines. In this way, each accompaniment line is slightly more removed from the original melody, and subsequent accompaniment lines are based functionally on both the scaffold and previously-generated lines. To create accompaniments more closely related to the original melody, another layering technique is for users to generate all accompaniment layers from only the single monophonic starting point. For this purpose, the CPPNs are given enough outputs to represent all the instruments in the accompaniment at the same time. Because the melody and the accompaniments are functionally related, any accompaniment will follow the contours of the melodic starting point. However, in this case, the only influence on each accompaniment is this starting point itself, yielding a subtly different feel.

With either of these approaches, or a combination of them, users can further influence their accompaniments by holding constant the rhythm CPPN or pitch CPPN while letting the other evolve. Interestingly, when two accompaniments share the same rhythm network but differ slightly in the pitch network, the two monophonic instruments effectively combine to create the sound of a polyphonic instrument. Similarly, the pitch networks can be shared while the rhythm networks are evolved separately, creating a different sound. Notice that this approach requires no musical expertise to generate multiple lines of accompaniment.

Experiments

The experiments in this paper are designed to show how users can generate multipart pieces from a single monophonic melody with FSMC. They are divided into accompaniment generation and a listener study that establishes the quality of the compositions.

Accompaniment Generation

For this experiment, three members of our team composed a total of three monophonic melodies. From each of these user-composed melodies, a multipart accompaniment was generated through FSMC by the author of the originating melody. Two other multipart accompaniments were generated for the folk song Early One Morning. We chose to include each of these FSMC composers, who were undergraduate independent study students at the University of Central Florida, as authors of this paper to recognize their pioneering efforts with a new medium. The most important point is that no musical expertise need be applied to the final creations beyond that necessary to compose the initial monophonic melody in MIDI format. Thus, although the results may sound consciously arranged, it is important to bear in mind that all the polyphony you hear is entirely the output of FSMC. The original melodies, accompaniments, and CPPNs are available at http://eplex.cs.ucf.edu/fsmc/iccc2012. The program, called MaestroGenesis, is available at http://maestrogenesis.org.

As noted in the approach, FSMC provides significant freedom to the user in how to accumulate the layers of a multipart piece. In general, the user has the ability to decide from which parts to generate other parts. For example, from the original melody, five additional parts could be generated at once. Or, instead, the user might accumulate layers incrementally, feeding each new part into a new CPPN to evolve yet another layer. Some layers might depend on one previous layer, while others might depend on multiple previous layers.
In effect, such decisions shape the subtle structural relationships and hence the aesthetic of the final composition. For example, evolving all of the new parts from just the melody gives the melody a commanding influence over all the accompaniment, while incrementally training each layer from the last induces a more delicate and complex set of harmonious partnerships. As the remainder of this section describes, the student composers took advantage of this latitude in a variety of ways.

Early One Morning (Song 1), versions 1 and 2, with four- and five-part accompaniments, began from an initial monophonic melody transcribed from the traditional, human-composed folk song. The second layer is identical in both versions and was evolved from Early One Morning itself. The third, fourth, and fifth parts of version 1 were all evolved from the second layer. The third, fourth, fifth, and sixth parts of version 2 were evolved from the pitch network of the second layer of version 1, and the rhythm network from the original Early One Morning monophonic melody. This experiment illustrates that the results with FSMC given the same starting melody are not deterministic and in fact do provide creative latitude to the user even without the need for traditional composition techniques.

Song 2 started from an original monophonic melody composed by undergraduate Marie E. Norton. The second layer was added by inputting this melody into the rhythm and pitch networks of the subsequent accompaniment populations. This second layer then served as input to the pitch and rhythm CPPNs for layers 3 and 4. The pitch CPPN for layer 5 took layer 2 as input, but the rhythm network had only a bias input. Finally, the inputs for the pitch network for layer 6 were layers 3, 4, and 5, while the inputs to the rhythm CPPN were layer 4 and a measure timing signal first introduced for FSMC by Hoover and Stanley (2009) that gives the network a sense of where the song is within the measure. All of the layers were finally combined to create a single, multipart piece in which each line is functionally related to the others. Each layer took as few as three to as many as five generations to evolve.

For Song 3, Zachary Merritt first created a layer that influences most of the other layers but is not heard in the final track. The fourth layer was generated from the third, which is influenced by the monophonic melody and the unheard layer. The fifth layer was generated from the population of the fourth layer with the rhythm network held constant to create a chordal feel. The sixth layer was generated from only the initial starting melody and a special timing signal that imparts a sense of the position in the overall piece (Hoover and Stanley, 2009). Similarly, the seventh layer is generated from only the initial starting melody, but adds a separate function input, sin(πx), where x is the time within the measure. Although seven layers are described in this experiment, only six were selected to be heard, meaning that there is a five-part accompaniment.

Trevor A. Brindle created an initial piece and evolved all five accompaniment lines for Song 4 directly from it. Instead of inputting results from previous generations, he started new runs for each voice from the same scaffold, giving a strong influence to the melody. Notice that the key decisions made by the users concern, in general, from which tracks to generate more tracks.
Of course the users also performed the IEC selection operations to breed each new layer. Importantly, none such decisions require musical expertise. Listener Study The contribution of users to the quality of the generated works and accordingly the e↵ectiveness of the creativity enhancement is evaluated through a listener study. The study consists of five surveys, one for each generated arrangement. The surveys present two MP3s to the listener, who is asked to rate the quality of both. The first MP3, called the collaborative accompaniment, is an arrangement resulting from the collaboration of the author with the program (i.e. the two versions from Early One Morning or Songs 2, 3, or 4). The second, called the FSMC-alone accompaniment, is generated by the program alone. That is, a random pitch CPPN and a random rhythm CPPN are provided the same monophonic starting melody as the collaborative accompaniment and their output is taken as the FSMC-alone accompaniment. Thus the factor that is isolated is the involvement of the human user, who is not involved in the FSMC-alone accompaniment. However, it is important to note that the FSMC-alone accompaniments do not actually sound random because even if the CPPNs are generated randomly, they are still functions of the same sca↵old, which tends even in the random case to yield outputs that sound at least coherent (which is the motivation for FSMC in the first place). Thus this study investigates whether the human user is really able to make a creative contribution by leveraging FSMC. A total of 129 students participated in the study. The full survey is available at http://eplex.cs.ucf.edu/fsmc/iccc2012/survey, but note that in the administered surveys, the order of the MP3s was random to avoid any bias. The users were asked to rate each piece with the following question: Rate MIDI i on a scale of one to ten. (1 is the worst and 10 is the best), where i refers to one of the ten generated works. The idea is that if the user-created arrangements are rated higher than those generated by FSMC-alone, the user’s own input likely positively influenced the outcome. While this study focuses on the quality of output, the degree to which FSMC enhances creativity will be addressed in future work. Results The generated accompaniments and original scaffold discussed in this section can be heard at http://eplex.cs.ucf.edu/fsmc/iccc2012. Accompaniments Samples of the scores for the two arrangements created to accompany Early One Morning are shown in figure 4. The layers are shown in order from top to bottom in both versions (layer 1 is the original melody). Layer 2, which is the same in both versions, is heard as violin II in version 1 and viola in version 2. An important observation is that the violoncello part in version 1 follows the rhythm of the initial starting melody very closely while the pitch contour di↵ers only slightly. While the viola and double-bass parts di↵er in both pitch and rhythm over the course of the song, both end phrases and subphrases on the tonic note, F, in many places over the course of the piece, including measure 4 in figure 4a. Version 2, on the other hand, contains many rhythmic similarities (i.e. the eighth note patterns contained in the keyboard I, viola, keyboard II, and the violin II parts), but illustrates distinct pitch contours. 
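Because each listener rates both the collaborative and the FSMC-alone version generated from the same melody, the ratings form paired samples. A minimal sketch of how such paired ratings can be compared with Student's paired t-test (the test reported in the results below) is shown here, assuming SciPy is available; the rating values are placeholders, not the study data.

```python
from scipy import stats

# Placeholder ratings (1-10) for one song; each index is one listener,
# so the two samples are paired.
collaborative = [8, 7, 9, 6, 8, 7, 9, 8]
fsmc_alone    = [6, 7, 7, 5, 8, 6, 7, 6]

# Student's paired t-test comparing the two conditions.
t_stat, p_value = stats.ttest_rel(collaborative, fsmc_alone)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("The difference in mean ratings is significant at the 5% level.")
```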
Results

The generated accompaniments and original scaffolds discussed in this section can be heard at http://eplex.cs.ucf.edu/fsmc/iccc2012.

Accompaniments

Samples of the scores for the two arrangements created to accompany Early One Morning are shown in figure 4. The layers are shown in order from top to bottom in both versions (layer 1 is the original melody). Layer 2, which is the same in both versions, is heard as violin II in version 1 and viola in version 2. An important observation is that the violoncello part in version 1 follows the rhythm of the initial starting melody very closely while the pitch contour differs only slightly. While the viola and double bass parts differ in both pitch and rhythm over the course of the song, both end phrases and subphrases on the tonic note, F, in many places over the course of the piece, including measure 4 in figure 4a. Version 2, on the other hand, contains many rhythmic similarities (i.e. the eighth-note patterns contained in the keyboard I, viola, keyboard II, and violin II parts), but illustrates distinct pitch contours.

Together, the two versions illustrate how a single user can generate different accompaniments from the same initial monophonic starting melody, and how the initial melody exerts its influence both rhythmically and harmonically. Songs 2, 3, and 4 exhibit a similar effect: rhythmic and harmonic influence from the original melody, yet distinctive and original accompaniment nevertheless. The result is that the overall arrangements sound composed even though they are evolved through a breeding process. The next section provides evidence that impartial listeners also appreciate the contribution of the human user.

Figure 4: Early One Morning. The first four measures of versions 1 and 2 of Early One Morning illustrate how a single user with the same monophonic starting melody can direct the accompaniment in two different ways that nevertheless both relate to the initial melody. In version 1 (a) the staves are Violin I (layer 1), Violin II (layer 2), Violoncello (layer 3), Double Bass (layer 4), and Viola (layer 5); in version 2 (b) they are Keyboard I (layer 1), Viola (layer 2), Keyboard II (layer 3), Electric Bass (layer 4), Violin I (layer 5), and Violin II (layer 6). Because the accompaniments share two of their layers, they sound related. However, through timbre selection and the evolution of two and three distinct layers in versions 1 and 2 respectively, the user imparts a different feel.

Listener Study Results

The results of the listener study in figure 5 indicate that all of the collaborative accompaniments are rated higher than those generated with FSMC alone, with three out of five (Song 1 version 2, Song 4, and Song 5) displaying a significant difference (p < 0.05; Student's paired t-test). Taken all together, the collaborative accompaniments sound highly significantly more appealing than those generated with FSMC alone (p < 0.001; Student's paired t-test). These results indicate not only that FSMC provides a structurally plausible search space, but also that it is possible to explore such a space without applying musical expertise. That is, the results suggest that the user input significantly improves the perceived quality of the generated compositions.

Discussion

A key feature of figure 4 is that the collaborative accompaniments generated by users with the assistance of FSMC follow the melodic and rhythmic contours of the original scaffold. Furthermore, the listener study suggests that FSMC helps the user establish and explore musical search spaces that may otherwise have been inaccessible. While the users search this space through IEC, which facilitates the combination of musical ideas and the exploration of the space itself, an interesting property of this search space is its robustness; even FSMC-alone accompaniments, which are created without the benefit of human, subjective evaluation, can sound plausible. However, when coupled with the human user, this approach in effect transforms the user's own internal search space of possible accompaniments into one constrained by functional scaffolding.

While the quantitative data suggest the merit of collaborative accompaniments, music is inherently subjective. Therefore, it is important for readers to judge the results for themselves at http://eplex.cs.ucf.edu/fsmc/iccc2012 to fully appreciate the potential of the FSMC method. One interesting direction for future work is to explore new interpretations for the output of the pitch functions.
Currently, accompaniment pitches are interpreted as discrete note values, a process that limits the instrument to playing the same note each time a given combination of notes occurs in the scaffold. However, by interpreting the output as a change in pitch (i.e. a horizontal interval) rather than as an absolute pitch, instruments could select any note to correspond to a particular combination depending on where in the piece it occurs. In this way, an even larger space of musical possibilities could be created. Perhaps most importantly, with only a single, monophonic melody, users could compose entire multipart pieces without the need for musical expertise. Even if not at the master level, such a capability opens to the novice an entirely new realm of exploration.

Conclusion

This paper presented an extension to functional scaffolding for musical composition (FSMC) that facilitates a human user's creativity by generating polyphonic compositions from a single, human-composed monophonic starting track. The technique enables creative exploration by helping the user construct and then navigate a search space of candidate accompaniments through a process akin to animal breeding called interactive evolutionary computation (IEC). These collaborative accompaniments bred by users were judged by listeners against those composed through FSMC alone. Overall, listeners liked the collaborative accompaniments more than the FSMC-alone accompaniments. Most importantly, a promising potential for creativity enhancement in AI is to open up to the amateur a domain once accessible only to the expert. The approach in this paper is a step in this direction.

Figure 5: Listener Study Results. The average rating (by 129 participants) from one to ten of both the collaborative and FSMC-alone accompaniments is shown side-by-side for each song and overall, with the lines indicating a 5% error bound. The overall results for the listener study indicate that on average the collaborative accompaniments are of significantly higher perceived quality than the FSMC-alone accompaniments.

Acknowledgements

This work was supported in part by the National Science Foundation under grant no. IIS-1002507 and also by an NSF Graduate Research Fellowship.

References

Ando, D., and Iba, H. 2007. Interactive composition aid system by means of tree representation of musical phrase. In IEEE Congress on Evolutionary Computation (CEC), 4258–4265. IEEE.

Biles, J. 1998. Interactive GenJam: Integrating real-time performance with a genetic algorithm. In Int. Computer Music Conf. (ICMC 98), 232–235.

Boden, M. A. 2004. The Creative Mind: Myths and Mechanisms. Routledge, second edition.

Boden, M. A. 2007. Creativity and Conceptual Art. Oxford: Oxford University Press.

Chuan, C.-H. 2009. Supporting compositional creativity using automatic style-specific accompaniment. In Proc. of the CHI Computational Creativity Support Workshop.

Cope, D. 1987. An expert system for computer-assisted composition. Computer Music Journal 11(4):30–46.

Cybenko, G. 1989. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems (MCSS) 2(4):303–314.

Holtzman, S. R. 1980. A generative grammar definition language for music. Interface 9(1):1–48.

Hoover, A. K., and Stanley, K. O. 2009. Exploiting functional relationships in musical composition.
Connection Science Special Issue on Music, Brain, & Cognition 21(2):227–251.

Hoover, A. K.; Szerlip, P. A.; and Stanley, K. O. 2011a. Generating musical accompaniment through functional scaffolding. In Proceedings of the Eighth Sound and Music Computing Conference (SMC 2011).

Hoover, A. K.; Szerlip, P. A.; and Stanley, K. O. 2011b. Interactively evolving harmonies through functional scaffolding. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2011). New York, NY: The Association for Computing Machinery.

Jacob, B. L. 1995. Composing with genetic algorithms. In Proc. of the 1995 International Computer Music Conference, 425–455. Intl. Computer Music Association.

Keller, R. M.; Morrison, D.; Jones, S.; Thom, B.; and Wolin, A. 2006. A computational framework for enhancing jazz creativity. In Proceedings of the Third Workshop on Computational Creativity, ECAI 2006.

Kippen, J., and Bel, B. 1992. Modeling Music with Grammars: Formal Language Representation in the Bol Processor. Academic Press London. 207–238.

Marsden, A. 2000. Music, intelligence, and artificiality. In Readings in Music and Artificial Intelligence, chapter 18. Harwood Academic Publishers.

McCormack, J. 1996. Grammar based music composition. Complex Systems 96:321–336.

Ralley, D. 1995. Genetic algorithms as a tool for melodic development. In Proc. of the 1995 Intl. Computer Music Conf., 501–502. Intl. Computer Music Assoc.

Roads, C. 1979. Grammars as representations for music. Computer Music Journal 3(1):48–55.

Simon, I.; Morris, D.; and Basu, S. 2008. MySong: Automatic accompaniment generation for vocal melodies. In Proc. of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems, 725–734. ACM.

Stanley, K. O., and Miikkulainen, R. 2002. Evolving neural networks through augmenting topologies. Evolutionary Computation 10:99–127.

Stanley, K. O. 2007. Compositional pattern producing networks: A novel abstraction of development. Genetic Programming and Evolvable Machines Special Issue on Developmental Systems 8(2):131–162.

Takagi, H. 2001. Interactive evolutionary computation: Fusion of the capabilities of EC optimization and human evaluation. Proceedings of the IEEE 89(9):1275–1296.

Todd, P. M., and Werner, G. M. 1999. Frankensteinian methods for evolutionary music. In Musical Networks: Parallel Distributed Perception and Performance, 313–340.

Zicarelli, D. 1987. M and Jam Factory. Computer Music Journal 11(4):13–29.

Zicarelli, D. 2002. How I learned to love a program that does nothing. Computer Music Journal 26(4):44–51.