Real-Time Emotion-Driven Music Engine

Alex Rodríguez Lopez, Antonio Pedro Oliveira, and Amílcar Cardoso

Centre for Informatics and Systems, University of Coimbra, Portugal
lopez@student.dei.uc.pt, apsimoes@student.dei.uc.pt, amilcar@dei.uc.pt

Abstract. Emotion-Driven Music Engine (EDME) is a computer system that intends to produce music expressing a desired emotion. This paper presents a real-time version of EDME, which turns it into a stand-alone application. A real-time music production engine, governed by a multi-agent system, responds to changes of emotion and selects the most suitable pieces from an existing music base to form song-like structures, through transformations and sequencing of music fragments. The music base is composed of fragments classified in two emotional dimensions: valence and arousal. The system has a graphic interface that provides a front-end that makes it usable in experimental contexts of different scientific disciplines. Alternatively, it can be used as an autonomous source of music for emotion-aware systems.

1 Introduction

Adequate expression of emotions is a key factor in the efficacy of creative activities [16]. A system capable of producing music expressing a desired emotion can be used to influence the emotional experience of the target audience. The Emotion-Driven Music Engine (EDME) was developed with the objective of having such a capability. The high modularity and parameterization of EDME allow it to be customized for different scenarios and integrated into other systems.

EDME can be controlled by the user or used in an autonomous way, depending on the origin of the input source (an emotional description). A musician can use our system as a tool to assist the process of composition. Automatic soundtracks can be generated for other systems capable of making an emotional evaluation of the current context (e.g., computer games and interactive media, where the music needs to change quickly to adapt to an ever-changing context). The input can also be fed from ambient intelligence systems: sensing the environment allows use in installations where music reacts to the public. In a healthcare context, self-report measures or physiological sensors can be used to generate music that reacts to the state of the patient.

The next section reviews related work. Section 3 presents our computer system. Section 4 draws some conclusions and highlights directions for further work.

2 Related Work

The developed system is grounded on research made in the areas of computer science and music psychology. Systems that control the emotional impact of musical features usually work through the segmentation, selection, transformation and sequencing of musical pieces. These systems modify emotionally relevant structural and performative aspects of music [4, 11, 22], by using pre-composed musical scores [11] or by making musical compositions [3, 10, 21]. Most of these systems are grounded on empirical data obtained from works of psychology [8, 19]. Scherer and Zentner [18] established parameters of influence for the experienced emotion. Meyer [13] analyzed structural characteristics of music and their relation with emotional meaning in music. Some works have tried to measure emotions expressed by music and to identify the effect of musical features on emotions [8, 19]. From these, relations can be established between emotions and musical features [11].

3 System

EDME works by combining short MIDI segments into a seamless music stream that expresses the emotion given as input.
When the input changes, the system reacts and smoothly fades to music expressing the new emotion.

There are two stages (Fig. 1). At the off-line stage, pre-composed music is segmented and classified to build a music base (Section 3.1); this makes the system ready for the real-time stage, which deals with selection, transformation, sequencing and synthesis (Section 3.2). The user interface lets the user select in different ways the emotion to be expressed by the music. Integration with other systems is possible by using different sources as the input (Section 3.3).

Fig. 1. The system works in two stages: an off-line stage (music segmentation, feature extraction and classification into the music base) and a real-time stage (music selection, transformation, sequencing and synthesis, driven by the desired emotion).

3.1 Off-line Stage

Pre-composed MIDI music (composed on purpose, or compiled as needed) is input to a segmentation module. An adaptation of LBDM [2] is used to attribute weights according to the importance and degree of proximity and change of five features: pitch, rhythm, silence, loudness and instrumentation. Segmentation consists in discovering fragments by looking at the note onsets with the highest weights. The resulting fragments are input to a feature extraction module. These musical features are used by a classification module that grades the fragments in two emotional dimensions: valence and arousal (pleasure and activation). Classification is done with the help of a knowledge base implemented as two regression models that consist of weighted relations between each emotional dimension and music features [14]. The regression models are used to calculate the value of each emotional dimension through a weighted sum of the features obtained by the feature extraction module. The emotionally classified MIDI music is then stored in a music base.

3.2 Real-Time Stage

Real-time operation is handled by a multi-agent system, where agents with different responsibilities cooperate in simultaneous tasks to achieve the goal of generating music expressing desired emotions. Three agents are used: an input agent, which handles commands between the other agents and the user interface; a sequencer agent, which selects and packs fragments to form songs; and a synthesizer agent, which deals with the selection of sounds to convert the MIDI output of the sequencer agent into audio.

In this stage, the sequencer agent has important responsibilities. This agent selects the music fragments whose emotional content is closest to the desired emotion. It uses a pattern-based approach to construct songs with the selected fragments. Each pattern defines a song structure and the harmonic relations between the parts of this structure (e.g., popular song patterns like AABA). Selected fragments are arranged to match the tempo and pitch of a selected musical pattern, through transformations and sequencing. The fragments are scheduled so that they are perceived as one continuous song during each complete pattern. This agent also crossfades between patterns and when there is a change in the emotional input, in order to allow a smooth listening experience.
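To illustrate the selection step performed by the sequencer agent, the following Python sketch picks, for each slot of a song pattern such as AABA, a fragment whose valence-arousal classification lies close to the desired emotion. It is a minimal illustration under our own assumptions, not the authors' implementation: the Fragment class, the Euclidean distance measure and the pool-based choice are all hypothetical.

import math
import random

class Fragment:
    """Hypothetical record for a MIDI fragment and its emotional classification."""
    def __init__(self, midi_file, valence, arousal):
        self.midi_file = midi_file
        self.valence = valence
        self.arousal = arousal

def emotional_distance(fragment, target_valence, target_arousal):
    # Plain Euclidean distance in the valence-arousal plane (an assumption;
    # the paper only states that the closest fragments are selected).
    return math.hypot(fragment.valence - target_valence,
                      fragment.arousal - target_arousal)

def build_song(music_base, pattern, target_valence, target_arousal, pool_size=5):
    """Fill a song pattern (e.g. 'AABA') with emotionally suitable fragments.

    Each distinct pattern letter is bound to one fragment, so repeated
    sections reuse the same material, as in a conventional song form.
    """
    ranked = sorted(music_base,
                    key=lambda f: emotional_distance(f, target_valence, target_arousal))
    pool = ranked[:pool_size]          # the most suitable fragments
    assignment = {}
    for part in pattern:
        if part not in assignment:
            assignment[part] = random.choice(pool)
    return [assignment[part] for part in pattern]

# Example: request music expressing high valence and moderate arousal.
# song = build_song(music_base, "AABA", target_valence=0.8, target_arousal=0.4)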
3.3 Emotional Input

The system can be used under user control through an interface, or act autonomously with other input. The input specifies values of valence and arousal.

User Interface. The user interface serves the purpose of letting the user choose, in different ways, the desired emotion for the generated music. The user can directly type the values of valence and arousal the music should have. Another way is through a list of discrete emotions from which the user can choose. Several lists of words denoting emotions can be loaded to fit different uses of the system. For example, Ekman [6] has a list of generally accepted basic emotions. Russell [17] and Mehrabian [12] both have lists which map specific emotions to dimensional values (using two or three dimensions). Juslin and Laukka [9] propose a specific list for emotions expressed by music. Another way to choose the affective state of the music is through a graphical representation of the valence-arousal affective space, based on FeelTrace [5]: a circular space with the valence dimension on the horizontal axis and the arousal dimension on the vertical axis. The coloring follows that of Plutchik's circumplex model [15].

Other Input. EDME can stand as an autonomous source of music for other systems by taking their output as emotional input. With the growing interest in computational models of emotions and affective systems, and a demand for interfaces and systems that behave in an affective way, it is becoming frequent to adapt systems to show or perceive emotions. EmoTag [7] is an approach to automatically mark up affective information in texts, marking sentences with emotional values. Our system can serve the musical needs of such systems by taking their emotional output as the input for real-time soundtrack generation. Sensors can serve as input too: EDME is used in an interactive installation [20] that allows people to experience and influence the emotional behavior of the system, providing music according to values of valence and arousal.

4 Conclusion

Real-time EDME is a tool that produces music expressing desired emotions, with applications in theatre, film, video-game and healthcare contexts. We have already applied our system in an affective installation [20]. The real-time use of the system by music therapy professionals and the integration of EDME with EmoTag [7] for emotional soundtrack generation are also being analysed. The extension of EDME to an agent-based system increased its scalability, which makes its expansion and integration with external systems easier. Listening tests are needed to assess the fluency of the obtained songs.

References

1. Bresin, R., Friberg, A.: Emotional Coloring of Computer Controlled Music Performance. Computer Music Journal, 24(4), pp. 44–62 (2000)
2. Cambouropoulos, E.: The Local Boundary Detection Model (LBDM) and its Application in the Study of Expressive Timing. International Computer Music Conference (2001)
3. Casella, P., Paiva, A.: Magenta: An Architecture for Real Time Automatic Composition of Background Music. International Workshop on Intelligent Virtual Agents, pp. 224–232. Springer (2001)
4. Chung, J., Vercoe, G.: The Affective Remixer: Personalized Music Arranging. Conference on Human Factors in Computing Systems, pp. 393–398. ACM Press, New York (2006)
5. Cowie, R.: Feeltrace: An Instrument for Recording Perceived Emotion in Real Time. Speech and Emotion, pp. 19–24 (2000)
6. Ekman, P.: Basic Emotions. In: Dalgleish, T., Power, M. (eds.) Handbook of Cognition and Emotion. Wiley, New York (1999)
7. Francisco, V., Hervas, R.: EmoTag: Automated Mark Up of Affective Information in Texts. EUROLAN 2007 Summer School Doctoral Consortium, pp. 5–12 (2007)
8. Gabrielsson, A., Lindstrom, E.: The Influence of Musical Structure on Emotional Expression. Music and Emotion: Theory and Research, pp. 223–248. Oxford University Press (2001)
9. Juslin, P., Laukka, P.: Expression, Perception, and Induction of Musical Emotions: A Review and a Questionnaire Study of Everyday Listening. Journal of New Music Research, 33(3), pp. 217–238 (2004)
10. Kim, S., Andre, E.: Composing Affective Music with a Generate and Sense Approach. FLAIRS Special Track on AI and Music. AAAI Press (2004)
11. Livingstone, S.R., Muhlberger, R., Brown, A.R., Loch, A.: Controlling Musical Emotionality: An Affective Computational Architecture for Influencing Musical Emotions. Digital Creativity, 18. Taylor and Francis (2007)
12. Mehrabian, A.: Basic Dimensions for a General Psychological Theory. OG&H Publishers, Cambridge (1980)
13. Meyer, L.: Emotion and Meaning in Music. University of Chicago Press (1956)
14. Oliveira, A., Cardoso, A.: Modeling Affective Content of Music: A Knowledge Base Approach. Sound and Music Computing Conference (2008)
15. Plutchik, R.: A General Psychoevolutionary Theory of Emotion. Emotion: Theory, Research, and Experience, Vol. 1, Theories of Emotion, pp. 3–33. Academic Press, New York (1980)
16. Russ, S.: Affect and Creativity: The Role of Affect and Play in the Creative Process. Lawrence Erlbaum Associates, US (1993)
17. Russell, J.: Measures of Emotion. Emotion: Theory, Research, and Experience, 4, pp. 83–111 (1989)
18. Scherer, K., Zentner, M.: Emotional Effects of Music: Production Rules. Music and Emotion: Theory and Research, pp. 361–392 (2001)
19. Schubert, E.: Measurement and Time Series Analysis of Emotion in Music. PhD thesis, University of New South Wales (1999)
20. Ventura, F., Oliveira, A., Cardoso, A.: An Emotion-Driven Interactive System. 14th Portuguese Conference on Artificial Intelligence, pp. 167–178 (2009)
21. Wassermann, K., Eng, K., Verschure, P., Manzolli, J.: Live Soundscape Composition Based on Synthetic Emotions. IEEE Multimedia, pp. 82–90 (2003)
22. Winter, R.: Interactive Music: Compositional Techniques for Communicating Different Emotional Qualities. Masters Thesis, University of York (2005)

Tabla Gyan: An Artificial Tabla Improviser

Parag Chordia and Alex Rae

Georgia Institute of Technology, Atlanta GA, 30332 USA
ppc@gatech.edu, arae3@gatech.edu
http://paragchordia.com/

Abstract. We describe Tabla Gyan, a system capable of improvising tabla solo music, a sophisticated percussion tradition from North India. The system is based on a generative model of the qaida, a central form in tabla solo based on thematic variation. The system uses a recombinative process of variation generation, and filters the results according to rhythmic and timbral characteristics of each phrase. The sequences are used to generate audio in real time using pre-recorded tabla samples. An evaluation of the system was conducted with seventy users, primarily experienced tabla performers and listeners. With respect to qualities such as musicality, novelty, adherence to stylistic norms, and technical ability, the computer-generated performances compared favorably with performances by a world-class tabla player.

1 Introduction

This work aims to explore computational models of creativity, realizing them in a system designed for real-time generation of improvised music. This is envisioned as an attempt to develop musical intelligence in the context of structured improvisation, and by doing so to enable and encourage new forms of musical control and performance.
A model of qaida, a traditional North Indian solo tabla form, is presented, along with the results of an online survey comparing it to recordings of a professional tabla player on dimensions of musicality, creativity, and novelty. The model is based on generating a bank of variations and filtering them according to musical qualities.

2 Background

2.1 Theories of Creativity

This work is fundamentally motivated by an interest in exploring computational models of creativity. There have been many attempts to characterize the basic nature of creativity, and here we identify some key insights.

Mihaly Csikszentmihalyi [4] outlined a theory formulating creativity as a concept arising from the interaction of a domain, such as music or a particular musical genre, the individual who produces some possibly creative work, and the field within which the work is judged. One significance of this is that it moves creativity from being a purely individual characteristic to one that is largely the product of external interactions; notably, the final determination of whether the individual has been creative rests on the judgement of peers.

Many theories are based on the idea of multiple creativities. Geneplore [6], for example, models creativity as comprising a generative phase, in which a large set of potential materials is amassed, and an exploratory phase, in which this set is explored and interpreted. There is notable similarity between this and elements of our system described in Sections 4.1 and 4.2. Sternberg presents a theory [13] that represents creativity in terms of three processes for finding insights in large quantities of information: selective encoding, selective combination, and selective comparison. Insights found by filtering information are then combined to generate new insights, which in turn are compared to previous or distant insights to create yet another insight. Gardner [7] also addresses creativity, characterizing it as the production of novelty within a domain, similarly to Csikszentmihalyi's approach.

More practical but equally valid definitions have focused on the concept of novelty. A common formulation defines creativity as an action or process which produces novel output that satisfies the constraints of context [3]. Addressing the basis for judging whether an artificial system could be considered creative, Pereira [11] identifies the requirements that, when given a problem, answers produced by the system should not replicate previous solutions of which it has knowledge, and should apply acceptably to the problem. These are notably similar conceptualizations of creativity, and share the idea that the existence of creativity can, and should, be evaluated on the basis of the product.

2.2 Machine Musicianship

Many systems have been developed which can claim to involve computational creativity. We mention a few here in order to indicate the range of approaches and goals which others have undertaken.

The Continuator [9], developed by François Pachet, generates improvisatory responses to a human pianist's playing, using weighted random draws from a prefix tree built from phrases detected in the audio input. Arne Eigenfeldt's multi-agent "Kinetic Engine" [5] models the interactions between networked improvising agents in terms of both musical features and social dynamics, allowing shared parameters such as tempo and overall contour to be controlled by a "conductor" agent.
David Cope's long-running project Experiments in Musical Intelligence (EMI) focuses on faithful emulations of styles in the Western classical canon [1]. His approach centers on analyzing a large corpus of works to extract patterns which encode the main elements of the style, recombining them to create derivative works [2]. Cope has written and worked extensively in this field, and identifies a number of basic elements which he determines to be central to computational creativity, specifically calling out pattern matching and recombinance [3].

3 Introduction to Tabla

Tabla is the predominant percussion instrument of North India. Physically, tabla is actually a pair of drums, as seen in Figure 1. It is played with the hands and fingers, and each drum is associated with one hand. The right-hand drum, called the tabla or dayan, is higher in pitch than the left-hand drum, or bayan. Both drums are capable of producing a variety of distinct timbres, ranging from ringing sounds with a clear pitch to short, sharp sounds with a high noise content. There are specific striking techniques for producing each of the different timbres, known generally as strokes, and each is named. There are three broad classes of strokes: resonant strokes with a clear pitch and ringing tone, shorter non-resonant noisy strokes, and bass strokes produced on the bayan. Individual strokes and common short phrases are known as bols, and form the building blocks of larger phrases.

Improvisation in tabla music takes place within a rhythmic cycle which defines a large-scale periodicity, consisting of a set number of beats. The most common cycle is Teental, consisting of sixteen beats. To make the cycle easier to perceive, bayan strokes on certain beats are damped, and are referred to as "closed". Strokes in which the bass is allowed to sound are referred to as "open".

Fig. 1. A tabla. The drum on the left is the bayan; the drum on the right is the dayan.

There is a rich tradition of solo tabla performance. In solo performance, the tabla is usually accompanied by a melodic instrument that plays a repeated figure known as nagma, which occupies the role of a timekeeper. One of the most prominent compositional forms presented in a solo tabla performance is qaida, a structured improvisation consisting of a theme-and-variations form [14]. The theme upon which a given qaida performance is built is composed of a series of subphrases, and is taken as a fixed composition. The macroscopic form of qaida follows a fairly simple structure: introduction of the theme, development of variations at an increased tempo, and conclusion. Within the main body, variations are presented in a structured manner: a variation is introduced, the theme is reiterated, the same variation is repeated with closed bayan, and finally the theme is played again with closed bayan, often re-opening it shortly before the end of the cycle.

While qaida themes are part of the shared repertoire of solo tabla, variations are improvised according to some basic principles. The most important guiding principle of qaida variation is a restriction: only bols which appear in the qaida theme may be used in the variations. This is intended to preserve the essential character of the given qaida. Given this limitation, one common and effective variation technique is to rearrange subsections of the theme.

4 Methods

Fig. 2. Overview of the qaida variation architecture (theme bank, stochastic shuffler, feature extractor, phrase chooser and audio output). The theme bank is greyed out because the choice of theme is made only once, initially. Domain knowledge, i.e., specific knowledge about qaida and tabla, is shown being incorporated at specific points in the process.
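To make the later description concrete, a qaida theme can be thought of as a sequence of partitions (subphrases), each a list of bols with metrical durations. The Python sketch below is a purely illustrative representation of our own, with hypothetical names and toy data; the actual system stores manually transcribed themes, annotated with partition bounds, in XML, as described below.

from dataclasses import dataclass
from fractions import Fraction
from typing import List

@dataclass
class Bol:
    name: str            # stroke or short phrase, e.g. "dha" or "tirakita"
    duration: Fraction   # metrical length, as a fraction of a beat

# A partition is a subphrase of the theme; a theme is a list of partitions.
Partition = List[Bol]
Theme = List[Partition]

def metrical_length(phrase: List[Partition]) -> Fraction:
    """Total metrical duration, used to check that a variation spans the
    same number of beats as the theme."""
    return sum((bol.duration for part in phrase for bol in part), Fraction(0))

# A toy two-beat theme fragment (illustrative bols only, not a transcribed qaida):
theme: Theme = [
    [Bol("dha", Fraction(1, 2)), Bol("ti", Fraction(1, 4)), Bol("ta", Fraction(1, 4))],
    [Bol("dha", Fraction(1, 2)), Bol("tirakita", Fraction(1, 2))],
]
assert metrical_length(theme) == 2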
The design of the system centers around complementary processes of variation generation and variation selection. A database of potential variations is built through a stochastic process, and phrases are selected from that set based on certain criteria. This bears some resemblance to the technique known in algorithmic composition as "generate-and-test" [12]; in our case, however, the criteria are treated more probabilistically, as a basis for the system to make a choice with some indeterminacy but weighted heavily towards a desired outcome.

Consistent with the fact that qaida themes are not themselves improvised, and rarely even composed by the performer, no attempt was made to generate new thematic material. Instead, a number of traditional themes were transcribed manually and annotated with partition bounds. A bank of these themes is stored in XML format, and one theme is chosen at start-up; it remains the only source material for the duration of the qaida improvisation. The core of the system was coded in Python, relying on the NumPy and SciPy [8] packages for performance-intensive computation. Audio output was generated using Pure Data (Pd) [10]. An overview of the system is shown in Figure 2.

4.1 Variation Generation

A bank of phrases is generated from the chosen theme by applying transformations consistent with qaida theory, and then stochastically applying another set of operations to bias the population towards more stylistically appropriate content. An overview of these operations is shown in Figure 3.

The size of the phrase database is set in advance, and is far smaller than the set of all possible variations given the transforms. Clearly, a larger database is preferable in that it will contain a greater diversity of material; however, the feature extraction and phrase selection processes described in Section 4.2 scale with the size of the database, and computational efficiency is critical within a real-time architecture. A bank of two thousand phrases was used during much of the development process, and it was qualitatively found that this size contained sufficient phrase diversity to support varied and novel output.

A given variation is constructed by applying the transforms and accepting or rejecting the result based on the constraint that the variation have the same metrical duration as the original. This process is repeated until a bank of the specified size has been constructed. There are two main transforms: re-ordering of the theme partitions, and repetition at doubled tempo. The first assembles a variation by sampling with replacement from the set of partitions. For efficiency, the number of possible partitions in the new phrase is limited to the range within which generated phrases of the required length are possible. The second transform simply selects a partition at random and repeats it twice at double the speed. A parameter controls the relative likelihood of applying one or the other of these operations; a sketch of this generation loop is given after Figure 3.

Fig. 3. Detail of the qaida variation generating architecture (random shuffling, sub-phrase repetition, cadence preservation, and introduction of rests and fills). The reordering routine is depicted here, constrained by metrical length and incorporating domain knowledge in the form of added transformations.
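A minimal sketch of this generation loop, reusing the Bol, Partition, Theme and metrical_length definitions from the earlier sketch, is given below. The acceptance test and the two transforms follow the description above, but the function names, the p_double parameter and the plain rejection loop are simplifications of our own, not the system's code.

import random
from typing import List

def double_tempo(partition: Partition) -> Partition:
    """Repeat a partition twice at double speed; halving the durations keeps
    the overall metrical length unchanged."""
    halved = [Bol(b.name, b.duration / 2) for b in partition]
    return halved + halved

def generate_variation(theme: Theme, p_double: float = 0.3) -> Theme:
    """Generate one candidate variation and accept it only if it matches the
    theme's metrical length (a plain rejection loop; the real system bounds
    the number of partitions so that admissible lengths are reachable)."""
    target = metrical_length(theme)
    while True:
        if random.random() < p_double:
            # Double-tempo transform: repeat one partition twice at double speed.
            candidate = list(theme)
            i = random.randrange(len(candidate))
            candidate[i] = double_tempo(candidate[i])
        else:
            # Re-ordering transform: sample partitions with replacement.
            n = random.randint(1, 2 * len(theme))
            candidate = [random.choice(theme) for _ in range(n)]
        if metrical_length(candidate) == target:
            return candidate

def build_bank(theme: Theme, size: int = 2000) -> List[Theme]:
    """Accumulate a bank of admissible variations of the requested size."""
    return [generate_variation(theme) for _ in range(size)]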
An additional set of three transformations may then be applied, each with an independent probability. They function to bias the phrase bank toward more style-specific characteristics: they favor multiple occurrences of the same partition (non-consecutive repetition), consecutive repetitions of a partition, and preservation of the final partition (the cadence).

Lastly, a final transform that introduces short rests into the phrases may be applied at any time. This operation is essential to break the homogeneity which tends to emerge over time, but it can also disturb the coherence of a phrase. For this reason it is reserved for use in the more "complicated" sections of qaida development, and may be applied to an existing phrase bank.

4.2 Variation Selection

Selection of a phrase from the bank is initiated by a request for a phrase with a desired set of features. In response, phrases in memory are compared against the request, a close match is selected, and the system returns a single phrase for playback.

Immediately after the phrase bank is first built, features are calculated over each phrase in the set. It was found that a relatively small set of features could provide a surprisingly flexible handle on the character of the returned phrases, though a larger set would no doubt improve the range of performance. The currently calculated features are: the distribution over each stroke type, by frequency of occurrence and by time; the ratio of open to closed strokes, by frequency of occurrence and by time; rhythmic density; spectral centroid; and spectral spread. Note that these are not all of equivalent dimensionality: rhythmic density, the open/closed ratios, and spectral centroid are scalar values, while the distributions over stroke types are vectors. For the most part these are in effect timbral features, due to the correspondence between stroke types and timbre. The spectral centroid and spread require more explanation. The features themselves are uncomplicated, but up to this point we have been dealing with symbolic data only. However, the sequences are destined for playback on a known set of sounds, so in this step we calculate average values over the same audio database of segmented tabla strokes which is used in playback. This gives a quantitative estimate of the timbre we expect when a phrase is synthesized.

The feature preferences defined in the request for a variation can describe any subset of the above features, and specify three values for each: the target value, a relative weighting for this feature, and a "flexibility" measure. The target value, expressed in the range 0 to 1, is normalized to the range present in the current bank of variations. The flexibility parameter functions as a sort of distance metric, an alternative to simple linear distance. It defines the width of a Gaussian centered on the target value, which is then used as a look-up table to get the unweighted score for that phrase and feature.

A score is calculated for each phrase in the bank of variations. Rather than always choosing the best match, which would lead to deterministic output, the choice is made probabilistically. The two most successful algorithms are to rescale the scores to emphasize the higher-scoring phrases and choose randomly from the full bank using the scores as probability weightings, or to take the set of top scorers and make a choice among those based on their normalized probabilities. This procedure serves as a way to balance the creativity and novelty of the system's output with its responsiveness to the demands of context; a sketch of the scoring and sampling step follows.
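A minimal sketch of this scoring and sampling step is shown below, assuming scalar features only. The Gaussian "flexibility" look-up table is replaced by a direct evaluation of a Gaussian, and only the first of the two choice algorithms (rescale all scores, then sample the full bank) is shown; all names are assumptions rather than the system's API.

import math
import random

def feature_score(value, target, flexibility):
    """Unweighted score for one scalar feature: a Gaussian centred on the
    target value (both normalised to 0..1); 'flexibility' is its width."""
    return math.exp(-((value - target) ** 2) / (2 * flexibility ** 2))

def phrase_score(features, request):
    """Weighted sum of per-feature scores.

    'features' maps feature names to normalised values for one phrase;
    'request' maps feature names to (target, weight, flexibility) triples."""
    return sum(weight * feature_score(features[name], target, flexibility)
               for name, (target, weight, flexibility) in request.items())

def choose_phrase(bank, request, emphasis=3.0):
    """Probabilistic choice: raise scores to a power to favour high scorers,
    then sample the whole bank with the rescaled scores as weights.
    'bank' is a list of (phrase, feature_dict) pairs."""
    weights = [phrase_score(features, request) ** emphasis for _, features in bank]
    return random.choices([phrase for phrase, _ in bank], weights=weights, k=1)[0]

# Example request: fairly dense, timbrally bright phrases.
# request = {"rhythmic_density": (0.8, 2.0, 0.2), "spectral_centroid": (0.7, 1.0, 0.3)}
# phrase = choose_phrase(bank, request)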
4.3 Macroscopic Structure

The macroscopic structure is simpler and largely deterministic, following the basic qaida form outlined above. Playback is implemented in Pd, and is described further in Section 4.4. The patch controls the alternation between theme and variation, requests variations from the Python generator, controls the periodic opening and closing of the bayan strokes, and generates the audio. An accompanying nagma marks the cycle. Feature preferences for the variation requests are specified manually with a set of sliders. Modeling of longer-term structure is minimal; the manual controls provided allow a user to take the place of a fuller model. It should be noted, however, that the user need not be highly skilled, or even particularly knowledgeable with respect to tabla or qaida.

4.4 Audio Output

Synthesis of the generated qaida was accomplished using high-quality isolated samples of tabla strokes, played by a professional tabla player and recorded specifically for this project. Several timbrally consistent samples were recorded for each stroke type, one of which was selected at random on each playback command. Amplitudes were scaled by durations, to mimic the lighter touch that is generally used when playing fast sequences. The quality and consistency of the recordings was reflected in the audio output; the only significant shortcoming remains a lack of bayan modulation.

5 Evaluation

Fig. 4. Plot showing mean values and confidence intervals for responses to Question 1: "To what extent would you say that this recording demonstrates a feeling of musicality?"

An online survey was conducted in which three recordings of generated output were presented alongside two recordings by a world-class tabla player, without indication of the origin of the recordings; participants were simply asked to make a series of judgements, unaware that the survey involved comparison of human playing and computer modeling. The survey can be found at http://paragchordia.com/survey/tablasurvey/, and the audio clips of both computer-generated output and professional tabla performance can be heard separately at http://www.alexrae.net/thesis/sound/, the first three clips being the qaida model's output, as in the results presented here. The recordings of model output were "played" via the user interface implemented in Pd, and were recorded without subsequent editing.

A total of 70 participants responded to the survey. A majority claimed moderate to high familiarity with tabla music, and many reported themselves to be practicing tabla players. The mean age was 35.2, with a standard deviation of 12.2. The order of presentation of audio segments was randomized, and participants were asked to rate the examples along several dimensions. Judgements were on a scale of 1 to 7, reflecting answers ranging from "very little" to "a lot", except in the case of the last two questions, where the scale ranged from "poor" to "excellent". A higher value corresponded to a more favorable judgement. Respondents were invited to supplement their quantitative judgements with further comments.
Fig. 5. Plot showing mean values and confidence intervals for responses to Question 2: "To what extent would you say that this recording demonstrates musical creativity?"

Participants were asked the following questions:

1. To what extent would you say that this recording demonstrates a feeling of musicality?
2. To what extent would you say that this recording demonstrates musical creativity?
3. To what extent would you say that this recording adheres to qaida form?
4. To what extent would you say that this recording is novel or surprising, given the qaida theme?
5. To what extent would you say that the improvisations in this recording are appropriate to the style and the theme?
6. If told that this recording were of a tabla student, how would you rate his/her overall TECHNICAL abilities?
7. If told that this recording were of a tabla student, how would you rate his/her overall MUSICAL abilities?

Figures 4–6 show mean values and confidence intervals of the judgement scores for each audio segment, adjusted for multiple comparisons using the Dunn-Sidak correction (p < 0.05). A trend is visible in the average values of the data across the examples, showing the computer-generated output to be rated slightly lower than the human-generated excerpts. However, the differences do not reach statistical significance given the sample size, except in the case of the third generated qaida, which in many cases is rated somewhat lower than the other model outputs. Judgements of musical creativity (Question 2) are notable, as two of the qaida model's outputs were ranked on par with the human performer. The model was rated similarly highly on judgements of novelty. These results are encouraging: the computer-generated qaida performed quite well in comparison to very high-quality human-played examples.

Fig. 6. Plot showing mean values and confidence intervals for responses to Question 4: "To what extent would you say that this recording is novel or surprising, given the qaida theme?"

It is also interesting to note from the comments that many respondents remained unaware that three of the examples were computer-generated. One, for example, wrote in response to example 3: "Again this recording demonstrates that the Tabla player has excellent abilities in playing the right drum with crisp tonal quality. Left drum (Baya) needs some improvement as I stated in the first two Qaidas." Some comments focused more directly on the style or quality, for example "Good presentation of Purab / Benaras style kayda. Great speed. Nice overall sound" (excerpt 2), and "Very nicely done" (excerpt 3). Only one respondent clearly deduced the origin of the model's output, writing simply "The synthesized nature of this piece limits its ability to be musical." Criticism was not reserved for the generated recordings: one respondent commented that excerpt 4 "sounded too mechanical and devoid of emotion," and another that "The Tirakitas at the start [of example 5] sound very odd and clumsy!" Most comments for examples 4 and 5, however, were clearly positive.

6 Conclusion

The results of our survey suggest that the qaida model has been successful in producing improvisatory music which is heard as creative. There is, of course, much work to be done, ranging from addressing deficiencies in playback cited by a number of respondents, such as the lack of bayan modulation, to incorporating a more robust model of sculpting a larger contour.
However, it is encouraging and quite interesting to see how effective the methods employed in this model have been.

References

1. David Cope. Experiments in Musical Intelligence. A-R Editions, Madison, WI, 1996.
2. David Cope. Virtual Music: Computer Synthesis of Musical Style. MIT Press, Cambridge, MA, 2001.
3. David Cope. Computer Models of Musical Creativity. MIT Press, Cambridge, MA, 2005.
4. Mihaly Csikszentmihalyi. Creativity: Flow and the Psychology of Discovery and Invention. Harper Collins, New York, 1996.
5. Arne Eigenfeldt. The creation of evolutionary rhythms within a multi-agent networked drum ensemble. In Proceedings of the International Computer Music Conference, pages 267–270, Copenhagen, Denmark, 2007.
6. Ronald A. Finke, Thomas B. Ward, and Steven M. Smith. Creative Cognition: Theory, Research, and Applications. MIT Press, Cambridge, MA, 1992.
7. Howard Gardner. Intelligence Reframed: Multiple Intelligences for the 21st Century. Basic Books, New York, 1999.
8. Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open source scientific tools for Python, 2001–present.
9. François Pachet. The Continuator: Musical interaction with style. Journal of New Music Research, 32(3):333–341, 2003.
10. Pure Data (Pd) community site. http://puredata.info (accessed March 2009).
11. Francisco Câmara Pereira. Creativity and Artificial Intelligence: A Conceptual Blending Approach. Walter de Gruyter, 2007.
12. Curtis Roads. The Computer Music Tutorial. MIT Press, Cambridge, MA, 1998.
13. Robert J. Sternberg and Janet E. Davidson. The mind of the puzzler. Psychology Today, 16:37–44, October 1982.
14. Gert-Matthias Wegner. Vintage Tabla Repertory. Munshiram Manoharlal Publishers Pvt. Ltd., New Delhi, India, 2004.