Real-Time Emotion-Driven Music Engine

Alex Rodríguez Lopez, Antonio Pedro Oliveira, and Amílcar Cardoso

Centre for Informatics and Systems, University of Coimbra, Portugal
lopez@student.dei.uc.pt, apsimoes@student.dei.uc.pt, amilcar@dei.uc.pt

Abstract. Emotion-Driven Music Engine (EDME) is a computer system that intends to produce music expressing a desired emotion. This paper presents a real-time version of EDME, which turns it into a stand-alone application. A real-time music production engine, governed by a multi-agent system, responds to changes of emotion and selects the most suitable pieces from an existing music base to form song-like structures, through transformations and sequencing of music fragments. The music base is composed of fragments classified in two emotional dimensions: valence and arousal. The system has a graphic interface that provides a front-end that makes it usable in experimental contexts of different scientific disciplines. Alternatively, it can be used as an autonomous source of music for emotion-aware systems.

1 Introduction

Adequate expression of emotions is a key factor in the efficacy of creative activities [16]. A system capable of producing music expressing a desired emotion can be used to influence the emotional experience of the target audience. The Emotion-Driven Music Engine (EDME) was developed with the objective of having such a capability. The high modularity and parameterization of EDME allow it to be customized for different scenarios and integrated into other systems.

EDME can be controlled by the user or used in an autonomous way, depending on the origin of the input source (an emotional description). A musician can use our system as a tool to assist the process of composition. Automatic soundtracks can be generated for other systems capable of making an emotional evaluation of the current context (e.g., computer games and interactive media, where the music needs to change quickly to adapt to an ever-changing context). The input can also be fed from ambient intelligence systems: sensing the environment allows use in installations where music reacts to the public. In a healthcare context, self-report measures or physiological sensors can be used to generate music that reacts to the state of the patient.

The next section reviews related work. Section 3 presents our computer system. Section 4 draws some conclusions and highlights directions for further work.

2 Related Work

The developed system is grounded on research made in the areas of computer science and music psychology. Systems that control the emotional impact of musical features usually work through the segmentation, selection, transformation and sequencing of musical pieces. These systems modify emotionally relevant structural and performative aspects of music [4, 11, 22], by using pre-composed musical scores [11] or by making musical compositions [3, 10, 21]. Most of these systems are grounded on empirical data obtained from works of psychology [8, 19]. Scherer and Zentner [18] established parameters of influence for the experienced emotion. Meyer [13] analyzed structural characteristics of music and their relation with emotional meaning in music. Some works have tried to measure emotions expressed by music and to identify the effect of musical features on emotions [8, 19]. From these, relations can be established between emotions and musical features [11].

3 System

EDME works by combining short MIDI segments into a seamless music stream that expresses the emotion given as input.
When the input changes, the system reacts and smoothly fades to music expressing the new emotion.

There are two stages (Fig. 1). At the off-line stage, pre-composed music is segmented and classified to build a music base (Section 3.1); this makes the system ready for the real-time stage, which deals with selection, transformation, sequencing and synthesis (Section 3.2). The user interface lets the user select in different ways the emotion to be expressed by the music. Integration with other systems is possible by using different sources as the input (Section 3.3).

Fig. 1. The system works in two stages: an off-line stage (music segmentation, feature extraction and classification into the music base) and a real-time stage (music selection, transformation, sequencing and synthesis, driven by the desired emotion).

3.1 Off-line Stage

Pre-composed MIDI music (composed on purpose, or compiled as needed) is input to a segmentation module. An adaptation of LBDM [2] is used to attribute weights according to the importance and degree of proximity and change of five features: pitch, rhythm, silence, loudness and instrumentation. Segmentation consists in discovering fragments by looking at the note onsets with the highest weights. The resulting fragments are input to a feature extraction module. These musical features are used by a classification module that grades the fragments in two emotional dimensions: valence and arousal (pleasure and activation). Classification is done with the help of a knowledge base implemented as two regression models that consist of weighted relations between each emotional dimension and music features [14]. The regression models are used to calculate the value of each emotional dimension through a weighted sum of the features obtained by the feature extraction module. The emotionally classified MIDI music is then stored in a music base.

3.2 Real-Time Stage

Real-time operation is handled by a multi-agent system, where agents with different responsibilities cooperate in simultaneous tasks to achieve the goal of generating music expressing desired emotions. Three agents are used: an input agent, which handles commands between the other agents and the user interface; a sequencer agent, which selects and packs fragments to form songs; and a synthesizer agent, which deals with the selection of sounds to convert the MIDI output of the sequencer agent into audio.

In this stage, the sequencer agent has important responsibilities. This agent selects the music fragments whose emotional content is closest to the desired emotion. It uses a pattern-based approach to construct songs with the selected fragments. Each pattern defines a song structure and the harmonic relations between the parts of this structure (e.g., popular song patterns like AABA). Selected fragments are arranged to match the tempo and pitch of a selected musical pattern, through transformations and sequencing. The fragments are scheduled so that they are perceived as one continuous song during each complete pattern. This agent also crossfades between patterns and when there is a change in the emotional input, in order to allow a smooth listening experience.
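To illustrate the selection step performed by the sequencer agent, the following Python sketch picks, for each slot of a song pattern such as AABA, a fragment whose valence-arousal classification lies close to the desired emotion. It is a minimal illustration under our own assumptions, not the authors' implementation: the Fragment class, the Euclidean distance measure and the pool-based choice are all hypothetical.

import math
import random

class Fragment:
    """Hypothetical record for a MIDI fragment and its emotional classification."""
    def __init__(self, midi_file, valence, arousal):
        self.midi_file = midi_file
        self.valence = valence
        self.arousal = arousal

def emotional_distance(fragment, target_valence, target_arousal):
    # Plain Euclidean distance in the valence-arousal plane (an assumption;
    # the paper only states that the closest fragments are selected).
    return math.hypot(fragment.valence - target_valence,
                      fragment.arousal - target_arousal)

def build_song(music_base, pattern, target_valence, target_arousal, pool_size=5):
    """Fill a song pattern (e.g. 'AABA') with emotionally suitable fragments.

    Each distinct pattern letter is bound to one fragment, so repeated
    sections reuse the same material, as in a conventional song form.
    """
    ranked = sorted(music_base,
                    key=lambda f: emotional_distance(f, target_valence, target_arousal))
    pool = ranked[:pool_size]          # the most suitable fragments
    assignment = {}
    for part in pattern:
        if part not in assignment:
            assignment[part] = random.choice(pool)
    return [assignment[part] for part in pattern]

# Example: request music expressing high valence and moderate arousal.
# song = build_song(music_base, "AABA", target_valence=0.8, target_arousal=0.4)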
3.3 Emotional Input

The system can be used under user control through an interface, or act autonomously with other input. The input specifies values of valence and arousal.

User Interface. The user interface serves the purpose of letting the user choose, in different ways, the desired emotion for the generated music. The user can directly type the values of valence and arousal the music should have. Another way is through a list of discrete emotions from which the user can choose. Several lists of words denoting emotions can be loaded to fit different uses of the system. For example, Ekman [6] has a list of generally accepted basic emotions. Russell [17] and Mehrabian [12] both have lists which map specific emotions to dimensional values (using two or three dimensions). Juslin and Laukka [9] propose a specific list for emotions expressed by music. Another way to choose the affective state of the music is through a graphical representation of the valence-arousal affective space, based on FeelTrace [5]: a circular space with the valence dimension on the horizontal axis and the arousal dimension on the vertical axis. The coloring follows that of Plutchik's circumplex model [15].

Other Input. EDME can stand as an autonomous source of music for other systems by taking their output as emotional input. With the growing interest in computational models of emotions and affective systems, and a demand for interfaces and systems that behave in an affective way, it is becoming frequent to adapt systems to show or perceive emotions. EmoTag [7] is an approach to automatically mark up affective information in texts, marking sentences with emotional values. Our system can serve the musical needs of such systems by taking their emotional output as the input for real-time soundtrack generation. Sensors can serve as input too: EDME is used in an interactive installation [20] that allows people to experience and influence the emotional behavior of the system, providing music according to values of valence and arousal.

4 Conclusion

Real-time EDME is a tool that produces music expressing desired emotions, with applications in theatre, film, video-game and healthcare contexts. We have already applied our system in an affective installation [20]. The real-time use of the system by music therapy professionals and the integration of EDME with EmoTag [7] for emotional soundtrack generation are also being analysed. The extension of EDME to an agent-based system increased its scalability, which makes its expansion and integration with external systems easier. Listening tests are needed to assess the fluency of the obtained songs.

References

1. Bresin, R., Friberg, A.: Emotional Coloring of Computer Controlled Music Performance. Computer Music Journal, 24(4), pp. 44–62 (2000)
2. Cambouropoulos, E.: The Local Boundary Detection Model (LBDM) and its Application in the Study of Expressive Timing. International Computer Music Conference (2001)
3. Casella, P., Paiva, A.: Magenta: An Architecture for Real Time Automatic Composition of Background Music. International Workshop on Intelligent Virtual Agents, pp. 224–232. Springer (2001)
4. Chung, J., Vercoe, G.: The Affective Remixer: Personalized Music Arranging. Conference on Human Factors in Computing Systems, pp. 393–398. ACM Press, New York (2006)
5. Cowie, R.: Feeltrace: An Instrument for Recording Perceived Emotion in Real Time. Speech and Emotion, pp. 19–24 (2000)
6. Ekman, P.: Basic Emotions. In: Dalgleish, T., Power, M. (eds.) Handbook of Cognition and Emotion. Wiley, New York (1999)
7. Francisco, V., Hervas, R.: EmoTag: Automated Mark Up of Affective Information in Texts. EUROLAN 2007 Summer School Doctoral Consortium, pp. 5–12 (2007)
8. Gabrielsson, A., Lindstrom, E.: The Influence of Musical Structure on Emotional Expression. Music and Emotion: Theory and Research, pp. 223–248. Oxford University Press (2001)
9. Juslin, P., Laukka, P.: Expression, Perception, and Induction of Musical Emotions: A Review and a Questionnaire Study of Everyday Listening. Journal of New Music Research, 33(3), pp. 217–238 (2004)
10. Kim, S., Andre, E.: Composing Affective Music with a Generate and Sense Approach. FLAIRS Special Track on AI and Music. AAAI Press (2004)
11. Livingstone, S.R., Muhlberger, R., Brown, A.R., Loch, A.: Controlling Musical Emotionality: An Affective Computational Architecture for Influencing Musical Emotions. Digital Creativity, 18. Taylor and Francis (2007)
12. Mehrabian, A.: Basic Dimensions for a General Psychological Theory. OG&H Publishers, Cambridge (1980)
13. Meyer, L.: Emotion and Meaning in Music. University of Chicago Press (1956)
14. Oliveira, A., Cardoso, A.: Modeling Affective Content of Music: A Knowledge Base Approach. Sound and Music Computing Conference (2008)
15. Plutchik, R.: A General Psychoevolutionary Theory of Emotion. Emotion: Theory, Research, and Experience, Vol. 1, Theories of Emotion, pp. 3–33. Academic Press, New York (1980)
16. Russ, S.: Affect and Creativity: The Role of Affect and Play in the Creative Process. Lawrence Erlbaum Associates, US (1993)
17. Russell, J.: Measures of Emotion. Emotion: Theory, Research, and Experience, 4, pp. 83–111 (1989)
18. Scherer, K., Zentner, M.: Emotional Effects of Music: Production Rules. Music and Emotion: Theory and Research, pp. 361–392 (2001)
19. Schubert, E.: Measurement and Time Series Analysis of Emotion in Music. PhD thesis, University of New South Wales (1999)
20. Ventura, F., Oliveira, A., Cardoso, A.: An Emotion-Driven Interactive System. 14th Portuguese Conference on Artificial Intelligence, pp. 167–178 (2009)
21. Wassermann, K., Eng, K., Verschure, P., Manzolli, J.: Live Soundscape Composition Based on Synthetic Emotions. IEEE Multimedia, pp. 82–90 (2003)
22. Winter, R.: Interactive Music: Compositional Techniques for Communicating Different Emotional Qualities. Masters Thesis, University of York (2005)

Tabla Gyan: An Artificial Tabla Improviser

Parag Chordia and Alex Rae

Georgia Institute of Technology, Atlanta GA, 30332 USA
ppc@gatech.edu, arae3@gatech.edu
http://paragchordia.com/

Abstract. We describe Tabla Gyan, a system capable of improvising tabla solo music, a sophisticated percussion tradition from North India. The system is based on a generative model of the qaida, a central form in tabla solo based on thematic variation. The system uses a recombinative process of variation generation, and filters the results according to rhythmic and timbral characteristics of each phrase. The sequences are used to generate audio in real time using pre-recorded tabla samples. An evaluation of the system was conducted with seventy users, primarily experienced tabla performers and listeners. With respect to qualities such as musicality, novelty, adherence to stylistic norms, and technical ability, the computer-generated performances compared favorably with performances by a world-class tabla player.

1 Introduction

This work aims to explore computational models of creativity, realizing them in a system designed for real-time generation of improvised music. This is envisioned as an attempt to develop musical intelligence in the context of structured improvisation, and by doing so to enable and encourage new forms of musical control and performance.
A model of qaida, a traditional North Indian solo tabla form, is presented, along with the results of an online survey comparing it to recordings of a professional tabla player on dimensions of musicality, creativity, and novelty. The model is based on generating a bank of variations and filtering them according to musical qualities.

2 Background

2.1 Theories of Creativity

This work is fundamentally motivated by an interest in exploring computational models of creativity. There have been many attempts to characterize the basic nature of creativity, and here we identify some key insights.

Mihaly Csikszentmihalyi [4] outlined a theory formulating creativity as a concept arising from the interaction of a domain, such as music or a particular musical genre, the individual who produces some possibly creative work, and the field within which the work is judged. One significance of this is that it moves creativity from being a purely individual characteristic to one that is largely the product of external interactions; notably, the final determination of whether the individual has been creative rests on the judgement of peers.

Many theories are based on the idea of multiple creativities. Geneplore [6], for example, models creativity as comprising a generative phase, in which a large set of potential materials is amassed, and an exploratory phase, in which this set is explored and interpreted. There is notable similarity between this and elements of our system described in Sections 4.1 and 4.2. Sternberg presents a theory [13] that represents creativity in terms of three processes for finding insights in large quantities of information: selective encoding, selective combination, and selective comparison. Insights found by filtering information are then combined to generate new insights, which in turn are compared to previous or distant insights to create yet another insight. Gardner [7] also addresses creativity, characterizing it as the production of novelty within a domain, similarly to Csikszentmihalyi's approach.

More practical but equally valid definitions have focused on the concept of novelty. A common formulation defines creativity as an action or process which produces novel output that satisfies the constraints of context [3]. Addressing the basis for judging whether an artificial system could be considered creative, Pereira [11] identifies the requirements that, when given a problem, answers produced by the system should not replicate previous solutions of which it has knowledge, and should apply acceptably to the problem. These are notably similar conceptualizations of creativity, and share the idea that the existence of creativity can, and should, be evaluated on the basis of the product.

2.2 Machine Musicianship

Many systems have been developed which can claim to involve computational creativity. We mention a few here in order to indicate the range of approaches and goals which others have undertaken.

The Continuator [9], developed by François Pachet, generates improvisatory responses to a human pianist's playing, using weighted random draws from a prefix tree built from phrases detected in the audio input. Arne Eigenfeldt's multi-agent "Kinetic Engine" [5] models the interactions between networked improvising agents in terms of both musical features and social dynamics, allowing shared parameters such as tempo and overall contour to be controlled by a "conductor" agent.
David Cope's long-running project Experiments in Musical Intelligence (EMI) focuses on faithful emulations of styles in the Western classical canon [1]. His approach centers on analyzing a large corpus of works to extract patterns which encode the main elements of the style, recombining them to create derivative works [2]. Cope has written and worked extensively in this field, and identifies a number of basic elements which he determines to be central to computational creativity, specifically calling out pattern matching and recombinance [3].

3 Introduction to Tabla

Tabla is the predominant percussion instrument of North India. Physically, tabla is actually a pair of drums, as seen in Figure 1. It is played with the hands and fingers, and each drum is associated with one hand. The right-hand drum, called the tabla or dayan, is higher in pitch than the left-hand drum, or bayan. Both drums are capable of producing a variety of distinct timbres, ranging from ringing sounds with a clear pitch to short, sharp sounds with a high noise content. There are specific striking techniques for producing each of the different timbres, known generally as strokes, and each is named. There are three broad classes of strokes: resonant strokes with a clear pitch and ringing tone, shorter non-resonant noisy strokes, and bass strokes produced on the bayan. Individual strokes and common short phrases are known as bols, and form the building blocks of larger phrases.

Improvisation in tabla music takes place within a rhythmic cycle which defines a large-scale periodicity, consisting of a set number of beats. The most common cycle is Teental, consisting of sixteen beats. To make the cycle easier to perceive, bayan strokes on certain beats are damped, and are referred to as "closed". Strokes in which the bass is allowed to sound are referred to as "open".

Fig. 1. A tabla. The drum on the left is the bayan; the drum on the right is the dayan.

There is a rich tradition of solo tabla performance. In solo performance, the tabla is usually accompanied by a melodic instrument that plays a repeated figure known as nagma, which occupies the role of a timekeeper. One of the most prominent compositional forms presented in a solo tabla performance is qaida, a structured improvisation consisting of a theme-and-variations form [14]. The theme upon which a given qaida performance is built is composed of a series of subphrases, and is taken as a fixed composition. The macroscopic form of qaida follows a fairly simple structure: introduction of the theme, development of variations at an increased tempo, and conclusion. Within the main body, variations are presented in a structured manner: a variation is introduced, the theme is reiterated, the same variation is repeated with closed bayan, and finally the theme is played again with closed bayan, often re-opening it shortly before the end of the cycle.

While qaida themes are part of the shared repertoire of solo tabla, variations are improvised according to some basic principles. The most important guiding principle of qaida variation is a restriction: only bols which appear in the qaida theme may be used in the variations. This is intended to preserve the essential character of the given qaida. Given this limitation, one common and effective variation technique is to rearrange subsections of the theme.

4 Methods

Fig. 2. Overview of the qaida variation architecture (theme bank, stochastic shuffler, feature extractor, phrase chooser and audio output). The theme bank is greyed out because the choice of theme is made only once, initially. Domain knowledge, i.e., specific knowledge about qaida and tabla, is shown being incorporated at specific points in the process.
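To make the later description concrete, a qaida theme can be thought of as a sequence of partitions (subphrases), each a list of bols with metrical durations. The Python sketch below is a purely illustrative representation of our own, with hypothetical names and toy data; the actual system stores manually transcribed themes, annotated with partition bounds, in XML, as described below.

from dataclasses import dataclass
from fractions import Fraction
from typing import List

@dataclass
class Bol:
    name: str            # stroke or short phrase, e.g. "dha" or "tirakita"
    duration: Fraction   # metrical length, as a fraction of a beat

# A partition is a subphrase of the theme; a theme is a list of partitions.
Partition = List[Bol]
Theme = List[Partition]

def metrical_length(phrase: List[Partition]) -> Fraction:
    """Total metrical duration, used to check that a variation spans the
    same number of beats as the theme."""
    return sum((bol.duration for part in phrase for bol in part), Fraction(0))

# A toy two-beat theme fragment (illustrative bols only, not a transcribed qaida):
theme: Theme = [
    [Bol("dha", Fraction(1, 2)), Bol("ti", Fraction(1, 4)), Bol("ta", Fraction(1, 4))],
    [Bol("dha", Fraction(1, 2)), Bol("tirakita", Fraction(1, 2))],
]
assert metrical_length(theme) == 2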
The design of the system centers around complementary processes of variation generation and variation selection. A database of potential variations is built through a stochastic process, and phrases are selected from that set based on certain criteria. This bears some resemblance to the technique known in algorithmic composition as "generate-and-test" [12]; in our case, however, the criteria are treated more probabilistically, as a basis for the system to make a choice with some indeterminacy but weighted heavily towards a desired outcome.

Consistent with the fact that qaida themes are not themselves improvised, and rarely even composed by the performer, no attempt was made to generate new thematic material. Instead, a number of traditional themes were transcribed manually and annotated with partition bounds. A bank of these themes is stored in XML format, and one theme is chosen at start-up; it remains the only source material for the duration of the qaida improvisation. The core of the system was coded in Python, relying on the NumPy and SciPy [8] packages for performance-intensive computation. Audio output was generated using Pure Data (Pd) [10]. An overview of the system is shown in Figure 2.

4.1 Variation Generation

A bank of phrases is generated from the chosen theme by applying transformations consistent with qaida theory, and then stochastically applying another set of operations to bias the population towards more stylistically appropriate content. An overview of these operations is shown in Figure 3.

The size of the phrase database is set in advance, and is far smaller than the set of all possible variations given the transforms. Clearly, a larger database is preferable in that it will contain a greater diversity of material; however, the feature extraction and phrase selection processes described in Section 4.2 scale with the size of the database, and computational efficiency is critical within a real-time architecture. A bank of two thousand phrases was used during much of the development process, and it was qualitatively found that this size contained sufficient phrase diversity to support varied and novel output.

A given variation is constructed by applying the transforms and accepting or rejecting the result based on the constraint that the variation have the same metrical duration as the original. This process is repeated until a bank of the specified size has been constructed. There are two main transforms: re-ordering of the theme partitions, and repetition at doubled tempo. The first assembles a variation by sampling with replacement from the set of partitions. For efficiency, the number of possible partitions in the new phrase is limited to the range within which generated phrases of the required length are possible. The second transform simply selects a partition at random and repeats it twice at double the speed. A parameter controls the relative likelihood of applying one or the other of these operations; a sketch of this generation loop is given after Figure 3.

Fig. 3. Detail of the qaida variation generating architecture (random shuffling, sub-phrase repetition, cadence preservation, and introduction of rests and fills). The reordering routine is depicted here, constrained by metrical length and incorporating domain knowledge in the form of added transformations.
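A minimal sketch of this generation loop, reusing the Bol, Partition, Theme and metrical_length definitions from the earlier sketch, is given below. The acceptance test and the two transforms follow the description above, but the function names, the p_double parameter and the plain rejection loop are simplifications of our own, not the system's code.

import random
from typing import List

def double_tempo(partition: Partition) -> Partition:
    """Repeat a partition twice at double speed; halving the durations keeps
    the overall metrical length unchanged."""
    halved = [Bol(b.name, b.duration / 2) for b in partition]
    return halved + halved

def generate_variation(theme: Theme, p_double: float = 0.3) -> Theme:
    """Generate one candidate variation and accept it only if it matches the
    theme's metrical length (a plain rejection loop; the real system bounds
    the number of partitions so that admissible lengths are reachable)."""
    target = metrical_length(theme)
    while True:
        if random.random() < p_double:
            # Double-tempo transform: repeat one partition twice at double speed.
            candidate = list(theme)
            i = random.randrange(len(candidate))
            candidate[i] = double_tempo(candidate[i])
        else:
            # Re-ordering transform: sample partitions with replacement.
            n = random.randint(1, 2 * len(theme))
            candidate = [random.choice(theme) for _ in range(n)]
        if metrical_length(candidate) == target:
            return candidate

def build_bank(theme: Theme, size: int = 2000) -> List[Theme]:
    """Accumulate a bank of admissible variations of the requested size."""
    return [generate_variation(theme) for _ in range(size)]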
An additional set of three transformations may then be applied, each with an independent probability. They function to bias the phrase bank toward more style-specific characteristics: they favor multiple occurrences of the same partition (non-consecutive repetition), consecutive repetitions of a partition, and preservation of the final partition (the cadence).

Lastly, a final transform that introduces short rests into the phrases may be applied at any time. This operation is essential to break the homogeneity which tends to emerge over time, but it can also disturb the coherence of a phrase. For this reason it is reserved for use in the more "complicated" sections of qaida development, and may be applied to an existing phrase bank.

4.2 Variation Selection

Selection of a phrase from the bank is initiated by a request for a phrase with a desired set of features. In response, phrases in memory are compared against the request, a close match is selected, and the system returns a single phrase for playback.

Immediately after the phrase bank is first built, features are calculated over each phrase in the set. It was found that a relatively small set of features could provide a surprisingly flexible handle on the character of the returned phrases, though a larger set would no doubt improve the range of performance. The currently calculated features are: the distribution over each stroke type, by frequency of occurrence and by time; the ratio of open to closed strokes, by frequency of occurrence and by time; rhythmic density; spectral centroid; and spectral spread. Note that these are not all of equivalent dimensionality: rhythmic density, the open/closed ratios, and spectral centroid are scalar values, while the distributions over stroke types are vectors. For the most part these are in effect timbral features, due to the correspondence between stroke types and timbre. The spectral centroid and spread require more explanation. The features themselves are uncomplicated, but up to this point we have been dealing with symbolic data only. However, the sequences are destined for playback on a known set of sounds, so in this step we calculate average values over the same audio database of segmented tabla strokes which is used in playback. This gives a quantitative estimate of the timbre we expect when a phrase is synthesized.

The feature preferences defined in the request for a variation can describe any subset of the above features, and specify three values for each: the target value, a relative weighting for this feature, and a "flexibility" measure. The target value, expressed in the range 0 to 1, is normalized to the range present in the current bank of variations. The flexibility parameter functions as a sort of distance metric, an alternative to simple linear distance. It defines the width of a Gaussian centered on the target value, which is then used as a look-up table to get the unweighted score for that phrase and feature.

A score is calculated for each phrase in the bank of variations. Rather than always choosing the best match, which would lead to deterministic output, the choice is made probabilistically. The two most successful algorithms are to rescale the scores to emphasize the higher-scoring phrases and choose randomly from the full bank using the scores as probability weightings, or to take the set of top scorers and make a choice among those based on their normalized probabilities. This procedure serves as a way to balance the creativity and novelty of the system's output with its responsiveness to the demands of context; a sketch of the scoring and sampling step follows.
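A minimal sketch of this scoring and sampling step is shown below, assuming scalar features only. The Gaussian "flexibility" look-up table is replaced by a direct evaluation of a Gaussian, and only the first of the two choice algorithms (rescale all scores, then sample the full bank) is shown; all names are assumptions rather than the system's API.

import math
import random

def feature_score(value, target, flexibility):
    """Unweighted score for one scalar feature: a Gaussian centred on the
    target value (both normalised to 0..1); 'flexibility' is its width."""
    return math.exp(-((value - target) ** 2) / (2 * flexibility ** 2))

def phrase_score(features, request):
    """Weighted sum of per-feature scores.

    'features' maps feature names to normalised values for one phrase;
    'request' maps feature names to (target, weight, flexibility) triples."""
    return sum(weight * feature_score(features[name], target, flexibility)
               for name, (target, weight, flexibility) in request.items())

def choose_phrase(bank, request, emphasis=3.0):
    """Probabilistic choice: raise scores to a power to favour high scorers,
    then sample the whole bank with the rescaled scores as weights.
    'bank' is a list of (phrase, feature_dict) pairs."""
    weights = [phrase_score(features, request) ** emphasis for _, features in bank]
    return random.choices([phrase for phrase, _ in bank], weights=weights, k=1)[0]

# Example request: fairly dense, timbrally bright phrases.
# request = {"rhythmic_density": (0.8, 2.0, 0.2), "spectral_centroid": (0.7, 1.0, 0.3)}
# phrase = choose_phrase(bank, request)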
4.3 Macroscopic Structure

The macroscopic structure is simpler and largely deterministic, following the basic qaida form outlined above. Playback is implemented in Pd, and is described further in Section 4.4. The patch controls the alternation between theme and variation, requests variations from the Python generator, controls the periodic opening and closing of the bayan strokes, and generates the audio. An accompanying nagma marks the cycle. Feature preferences for the variation requests are specified manually with a set of sliders. Modeling of longer-term structure is minimal; the manual controls provided allow a user to take the place of a fuller model. It should be noted, however, that the user need not be highly skilled, or even particularly knowledgeable with respect to tabla or qaida.

4.4 Audio Output

Synthesis of the generated qaida was accomplished using high-quality isolated samples of tabla strokes, played by a professional tabla player and recorded specifically for this project. Several timbrally consistent samples were recorded for each stroke type, one of which was selected at random on each playback command. Amplitudes were scaled by durations, to mimic the lighter touch that is generally used when playing fast sequences. The quality and consistency of the recordings was reflected in the audio output; the only significant shortcoming remains a lack of bayan modulation.

5 Evaluation

Fig. 4. Plot showing mean values and confidence intervals for responses to Question 1: "To what extent would you say that this recording demonstrates a feeling of musicality?"

An online survey was conducted in which three recordings of generated output were presented alongside two recordings by a world-class tabla player, without indication of the origin of the recordings; participants were simply asked to make a series of judgements, unaware that the survey involved comparison of human playing and computer modeling. The survey can be found at http://paragchordia.com/survey/tablasurvey/, and the audio clips of both computer-generated output and professional tabla performance can be heard separately at http://www.alexrae.net/thesis/sound/, the first three clips being the qaida model's output, as in the results presented here. The recordings of model output were "played" via the user interface implemented in Pd, and were recorded without subsequent editing.

A total of 70 participants responded to the survey. A majority claimed moderate to high familiarity with tabla music, and many reported themselves to be practicing tabla players. The mean age was 35.2, with a standard deviation of 12.2. The order of presentation of audio segments was randomized, and participants were asked to rate the examples along several dimensions. Judgements were on a scale of 1 to 7, reflecting answers ranging from "very little" to "a lot", except in the case of the last two questions, where the scale ranged from "poor" to "excellent". A higher value corresponded to a more favorable judgement. Respondents were invited to supplement their quantitative judgements with further comments.
Fig. 5. Plot showing mean values and confidence intervals for responses to Question 2: "To what extent would you say that this recording demonstrates musical creativity?"

Participants were asked the following questions:

1. To what extent would you say that this recording demonstrates a feeling of musicality?
2. To what extent would you say that this recording demonstrates musical creativity?
3. To what extent would you say that this recording adheres to qaida form?
4. To what extent would you say that this recording is novel or surprising, given the qaida theme?
5. To what extent would you say that the improvisations in this recording are appropriate to the style and the theme?
6. If told that this recording were of a tabla student, how would you rate his/her overall TECHNICAL abilities?
7. If told that this recording were of a tabla student, how would you rate his/her overall MUSICAL abilities?

Figures 4–6 show mean values and confidence intervals of the judgement scores for each audio segment, adjusted for multiple comparisons using the Dunn-Sidak correction (p < 0.05). A trend is visible in the average values of the data across the examples, showing the computer-generated output to be rated slightly lower than the human-generated excerpts. However, the differences do not reach statistical significance given the sample size, except in the case of the third generated qaida, which in many cases is rated somewhat lower than the other model outputs. Judgements of musical creativity (Question 2) are notable, as two of the qaida model's outputs were ranked on par with the human performer. The model was rated similarly highly on judgements of novelty. These results are encouraging: the computer-generated qaida performed quite well in comparison to very high-quality human-played examples.

Fig. 6. Plot showing mean values and confidence intervals for responses to Question 4: "To what extent would you say that this recording is novel or surprising, given the qaida theme?"

It is also interesting to note from the comments that many respondents remained unaware that three of the examples were computer-generated. One, for example, wrote in response to example 3: "Again this recording demonstrates that the Tabla player has excellent abilities in playing the right drum with crisp tonal quality. Left drum (Baya) needs some improvement as I stated in the first two Qaidas." Some comments focused more directly on the style or quality, for example "Good presentation of Purab / Benaras style kayda. Great speed. Nice overall sound" (excerpt 2), and "Very nicely done" (excerpt 3). Only one respondent clearly deduced the origin of the model's output, writing simply "The synthesized nature of this piece limits its ability to be musical." Criticism was not reserved for the generated recordings: one respondent commented that excerpt 4 "sounded too mechanical and devoid of emotion," and another that "The Tirakitas at the start [of example 5] sound very odd and clumsy!" Most comments for examples 4 and 5, however, were clearly positive.

6 Conclusion

The results of our survey suggest that the qaida model has been successful in producing improvisatory music which is heard as creative. There is, of course, much work to be done, ranging from addressing deficiencies in playback cited by a number of respondents, such as the lack of bayan modulation, to incorporating a more robust model of sculpting a larger contour.
However, it is encouraging and quite interesting to see how effective the methods employed in this model have been.

References

1. David Cope. Experiments in Musical Intelligence. A-R Editions, Madison, WI, 1996.
2. David Cope. Virtual Music: Computer Synthesis of Musical Style. MIT Press, Cambridge, MA, 2001.
3. David Cope. Computer Models of Musical Creativity. MIT Press, Cambridge, MA, 2005.
4. Mihaly Csikszentmihalyi. Creativity: Flow and the Psychology of Discovery and Invention. Harper Collins, New York, 1996.
5. Arne Eigenfeldt. The creation of evolutionary rhythms within a multi-agent networked drum ensemble. In Proceedings of the International Computer Music Conference, pages 267–270, Copenhagen, Denmark, 2007.
6. Ronald A. Finke, Thomas B. Ward, and Steven M. Smith. Creative Cognition: Theory, Research, and Applications. MIT Press, Cambridge, MA, 1992.
7. Howard Gardner. Intelligence Reframed: Multiple Intelligences for the 21st Century. Basic Books, New York, 1999.
8. Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open source scientific tools for Python, 2001–present.
9. François Pachet. The Continuator: Musical interaction with style. Journal of New Music Research, 32(3):333–341, 2003.
10. Pure Data (Pd) community site. http://puredata.info (accessed March 2009).
11. Francisco Câmara Pereira. Creativity and Artificial Intelligence: A Conceptual Blending Approach. Walter de Gruyter, 2007.
12. Curtis Roads. The Computer Music Tutorial. MIT Press, Cambridge, MA, 1998.
13. Robert J. Sternberg and Janet E. Davidson. The mind of the puzzler. Psychology Today, 16:37–44, October 1982.
14. Gert-Matthias Wegner. Vintage Tabla Repertory. Munshiram Manoharlal Publishers Pvt. Ltd., New Delhi, India, 2004.