Clap-along: A Negotiation Strategy for Creative
Musical Interaction with Computational Systems
Michael Young and Oliver Bown
1 Music Department, Goldsmiths, University of London, New Cross, London SE14
6NW, UK
m.young@gold.ac.uk
2 Centre for Electronic Media Art, Monash University, Clayton 3800, Australia
oliver.bown@infotech.monash.edu.au
Abstract. This paper describes Clap-along, an interactive system for
theorising about creativity in improvised musical performance. It ex-
plores the potential for negotiation between human and computer par-
ticipants in a cyclical rhythmic duet. Negotiation is seen as one of a set of
potential interactive strategies, but one that ensures the most equitable
correspondence between human and machine. Through mutual negotia-
tion (involving listening/feature extraction and adaptation) the two par-
ticipants attempt to satisfy their own and each other's target outcome,
without knowing the other's goal. Each iteration is evaluated by both
participants and compared to their target. In this model of negotiation,
we query the notion of 'flow' as an objective of creative human-computer
collaboration. This investigation suggests the potential for sophisticated
applications for real-time creative computational systems.
1 Introduction
Music performance is a creative, 'real-time' process that often entails
collaborative activity. Creative potential in a performance (i.e. the
opportunity for individual or collective actions that appear innovative) is
contingent on context and style, the presence of a priori agreements (whether
explicit or tacit), and other cultural and procedural elements [1]. Various
aspects of performance practice constrain or afford opportunities for immediate
creative input, such as recourse to pre-existing materials, the relative
emphasis on individual responsibility and the means by which information is
exchanged while performing. In human-computer performance the capacities of
software to exchange information with other participants and to take
responsibility (i.e. act autonomously) are highly significant factors. Any
given approach to these factors impacts greatly on the performance practice as
a whole.
Free collective improvisation is an effective testing ground for
computer-based creativity. The computer must be able to produce sonic events
that appear intrinsically valid, and it must be able to collaborate
appropriately (responsively and proactively) with one or more human musicians.
Both properties satisfy a working definition of computational creativity, such
as the ability to exhibit
behaviour "which would be deemed creative if exhibited by humans" [2] as well
as being "skillful, appreciative and imaginative" [3]. These criteria are only
truly satisfied if the system is not directly reliant on a human 'user'; there
must be an equitable correspondence of ideas between collaborators. Such
challenges are not easily met, but at least apply equally well to human-only
improvisation, where the behaviour of one performer would never be expected to
depend entirely on another's contribution, or depend on rules agreed in
advance. Ideally at least, group improvisation avoids organisational procedures
that determine or influence musical content, structure or the interactions and
mutual dependencies between performers. Implicit procedures may develop through
a process of negotiation in performance. As in other forms of
process-orientated art, this process may not be directed towards a known
outcome. Rather, the process itself forms both the central problem and focus of
interest that both enables and constitutes the performance.
We investigate negotiation as a musical process in a wider context of
interaction strategies that can demonstrate performance-based computational
creativity. We devise a simple system, Clap-along, defined by a number of
constraints, which we believe demonstrates the challenges of performative,
computational creativity, and offers a promising model for future, more
elaborate, computational systems for interactive musical performance.
2 Interaction strategies
We regard negotiation as a specific strategy for human-machine interaction, one
member of a larger set of strategies listed below. (The terms 'source' and
'result' are used to refer to actions in an asymmetric relationship, and may
apply equally to human and machine depending on context.)
Shadowing: The source and result move together. There is a clear temporal
simultaneity that produces layering within a coordinated motion. The
coordination of motion between a body and its shadow is simultaneous but also
distorted, because the shadow is projected into a different 'geometrical'
space. Various musical strategies for textural organisation (homophony,
micro-polyphony) entail shadowing. Real-time digital effects and simple
interaction systems commonly exhibit shadowing methods. Timbral matching
techniques can be thought to employ this strategy, as found in the Cata-RT [4]
and Soundspotter [5] systems. The system as a whole may be weakly or strongly
integrated, but in general interactivity is likely to be readily verifiable to
participants and listeners.
Mirroring: The source is reflected in the result. Synchronicity is not
required. There is a more elaborate re-interpretation of information received
from the source than in shadowing, and this is more telling in the absence of
temporal synchrony. Innumerable compositional and improvisational approaches
are analogous to mirroring, to be found in structural repetition, motivic
development and call-and-response strategies. Delay effects are elementary
mirrors, projecting back an image of the musical source. Systems that seek to
analyse an existing style to generate music (e.g. by Markov modelling) are
mirroring at a high level
of musical organisation, such as in the Continuator system [6], even though the
method may be sub-symbolic. Mirroring can establish a cohesive musical identity
between source and result, and may also be readily verifiable to participants
as an interactive process.
Coupling: Two sources are distinct but connected. There is (or appears to be)
mutual influence, but the roles of 'source' and 'result' may be unstable and
unequally balanced. In live performance, coupling can constitute "procedural
liveness" and/or "aesthetic liveness" [7]. For instance, coupling can be
trivial and procedural, when one system controls another and receives
appropriate feedback, as in the laptop-as-instrument paradigm. Or it can be
illusory and "aesthetic", as in the (increasingly rare) genre of music for live
instrument with tape, where binding and apparently causal links between the
performer and tape are in reality entirely controlled and pre-determined. More
abstract couplings that use virtual modelling and dynamical systems offer a
more open-ended and genuine relationship between sources (agents), potentially
integrating the procedural and the aesthetic. Examples include music systems
where sources share a virtual environment, as in the Kinetic Engine [8] and
Swarm Granulator [9]. In coupling, equal relationships are possible, but
verification of the degree of true interaction is problematic.
Negotiating: The roles of 'source' and 'result' are conflated. Participating
elements have equal status and are engaged in a series of transactions.
Negotiation treats performer and computational system as equal in status and
overall approach to the interaction. It can be seen as a unified system based
on equivalence, and this contrasts with the categories above that suggest an
unequal architecture of source and result, in which the most likely scenario is
a musician (source) acting upon a computational system (result).
According to OED definitions (see footnote 3), 'negotiation' refers to
transactions directed towards an objective:
1. To communicate or confer (with another or others) for the purpose of ar-
ranging some matter by mutual agreement; to discuss a matter with a view
to some compromise or settlement
2. To arrange for, achieve, obtain, or bring about (something) by negotiation
3. To find a way through, round, or over (an obstacle, a difficult path, etc.)
To negotiate, participants engage in a series of transactions that are guided by
local goals directed towards an individual desired outcome (expectation). The
transactions can be understood to involve two mutually informing strategies,
one externalised, the other internal: "action" and "description" [10]. Actions
executed within a system may instigate further changes of state in the system. An
assessment of these changes (especially in relationship to anticipated or desired
changes) forms a description of action-outcome in the system. This empirical
description may inform the next action. If so, a cyclical process of experiential
accumulation develops, as the total description becomes more detailed and com-
plex. In a pre-determined and constrained context, this accumulation might be
[Footnote 3: Accessed online, 18th September 2009]
understood as a straightforward 'making sense of it'. But in a more
process-orientated context there may be an emergent formulation of knowledge
that is not external to the system (i.e. the system is not pre-determined).
Hamman [10] uses Foucault's term 'episteme' to describe interactive processes
that are "immanent in the very particularity of the thoughts, actions, and
descriptions made with respect to a hypothesised object of interaction"
(p. 95). This is an open-ended process of negotiation, orientated by variable
or uncertain expectations. Local goals (intentions) might change as a product
of the interactive transactions underway. Desired outcomes (expectations) need
not be static either, so the OED characterisation of a "compromise or
settlement" may remain theoretical and notional.
A reciprocal negotiation entails a degree of equivalence between human and
machine capacities to act and formulate descriptions. Both participants must
form their own descriptions of the system that incorporates the other participant.
Both must be able to modify their actions, short-term goals and intentions, given
new information. In other words, they should also be able to formulate an expec-
tation about the overall musical output and modify their contribution, given the
other's, in order to best satisfy the expectation. Verification that
transactions are underway is, in itself, a part of this process, but the
accuracy or efficacy of any "description" is not relevant to the fact that
negotiation occurs.
We regard "optimal flow" [11] as a directly relevant but problematic concept.
Flow is the human enjoyment derived in undertaking a task that becomes
autotelic, achieved when there is an optimal level of difficulty relative to
the skills of the subject. This balance requires the subject to form an
internal description of the task's demands and an assessment of his/her skills
in meeting them. Flow has been explored in human-computer interaction [12] and
in the creative process [13], including group-based creativity [1].
Particularly in the case of creativity, Csikszentmihalyi notes a number of
factors contributing to flow, some of which could be modelled (clarity of
goals, availability of immediate feedback) and others perhaps not (no worries
of failure, no distractions etc.) [13]. Whereas this might describe a positive
and productive psychological state, flow perhaps does not take fully into
account other facets of creativity, or for that matter the experience of
negotiating: the role of randomness, unpredictability, happenstance [14], the
use of haphazard trial and error, periods of incubation and, subsequently,
innovative "behavioural mutations" [15]. More emotively, consider the pursuit
of the impossible, the thin borderline between absorption and obsessive
compulsion [15], and perhaps resultant periods of boredom or frustration. Flow
describes a settled state that may be too effortless in itself to be central in
establishing creativity. So we attempt to avoid an easily achievable sense of
'flow' in designing the Clap-along system.
3 Implementation of Clap-along
Our aims in Clap-along are to explore process, expectation and verification in
a negotiation-based system, and to consider how these elements might ultimately
be extended to produce more aesthetically complex and musically valid results.
Negotiation occurs both in a feature space and in the foreground surface of actual
rhythms.
Clap-along is a duet system for human-computer interaction. Both partic-
ipants produce continuous, synchronised 4-bar clapping patterns in 4/4. The
musical context is as minimal as we can conceive: a fixed tempo, a fixed metrical
structure, and single sound events quantised to beats.
In any loop instance n there is a human clapping pattern, Hn, a computer
pattern, Cn, and a composite of the two patterns, Rn. A feature set, Fn, is
extracted from Rn and compared by the system to a target feature set Tn, and
this comparison is the basis for the next iteration. Rn is the reality of the current
state, and Tn represents an expectation that is unknown to the human performer.
The computer maintains patterns as a sequence of 0s and 1s, representing
either a rest or a clap on each beat. The initial pattern C0 is generated randomly.
Machine claps are generated from a sample bank of human clap recordings that
have some natural variation in sound; this offers some semblance of human
clapping. The human performer claps into a microphone, allowing the system to
build a second binary sequence that represents the human's clapping rhythm,
Hn. Human claps are quantised by rounding down to the previous beat, unless
within 200ms of the following beat, in which case they are rounded up. The
initial expectation, T0, is obtained from a randomly created composite rhythm
Rtarget that is immediately discarded.
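The quantisation rule described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name quantise_claps, the representation of claps as onset times in seconds from the start of the loop, the beat duration (the tempo is not stated in the text) and the decision to wrap a clap that rounds up past the final beat onto beat 0 of the cycle are all assumptions.

```python
import math


def quantise_claps(onsets, beat_duration, n_beats=16, round_up_window=0.2):
    """Map clap onset times (seconds from loop start) to a binary beat pattern.

    Each clap is rounded down to the previous beat, unless it falls within
    round_up_window seconds of the following beat, in which case it is
    rounded up. n_beats=16 corresponds to four bars of 4/4.
    """
    pattern = [0] * n_beats
    for t in onsets:
        beat = math.floor(t / beat_duration)            # previous beat
        time_to_next = (beat + 1) * beat_duration - t   # gap to following beat
        if time_to_next < round_up_window:
            beat += 1                                    # close enough: round up
        # Assumption: a clap rounded past the last beat wraps to beat 0.
        pattern[beat % n_beats] = 1
    return pattern


# Hypothetical example at 120 bpm (0.5 s per beat).
print(quantise_claps([0.05, 0.9, 2.1], beat_duration=0.5))
```

In this example the clap at 0.9 s lies 100 ms before beat 2 and is rounded up, while the clap at 2.1 s lies 400 ms before beat 5 and is rounded down to beat 4.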
At the end of each 4-bar pattern, the system takes the composite of the
two rhythms, Rn, and calculates a feature set Fn that forms a minimal internal
representation of the musical output. The four features used in this version are:
- density: the total number of claps as a fraction of the maximum possible
  number.
- homophony: the number of coincident claps as a fraction of the maximum
  possible value.
- position weighting: the normalised average position in the cycle over all
  claps.
- clumping: the average size of continuous clap streams as a fraction of the
  maximum possible value.
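One possible reading of these four features is sketched below. The text does not give exact formulas, so every normalisation here is an assumption: the composite is taken as a per-beat clap count (0, 1 or 2), density is divided by twice the number of beats, position is the mean beat index divided by the last index, and a "clap stream" is a run of consecutive beats carrying at least one clap, measured without wrapping around the loop. The names composite and extract_features are hypothetical.

```python
def composite(h, c):
    """Combine human and computer binary patterns into per-beat clap counts."""
    return [a + b for a, b in zip(h, c)]


def extract_features(h, c):
    """Return (density, homophony, position, clumping), each in [0, 1]."""
    n = len(h)
    r = composite(h, c)
    total = sum(r)

    # density: claps sounded out of the 2 * n that two players could produce
    density = total / (2 * n)

    # homophony: beats where both players clap, out of n beats
    homophony = sum(1 for count in r if count == 2) / n

    # position weighting: mean beat index over all claps, scaled by n - 1
    position = sum(i * count for i, count in enumerate(r)) / (total * (n - 1)) if total else 0.0

    # clumping: mean length of runs of consecutive non-empty beats, out of n
    runs, current = [], 0
    for count in r:
        if count:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    clumping = (sum(runs) / len(runs)) / n if runs else 0.0

    return (density, homophony, position, clumping)


# Hypothetical patterns: human claps on each downbeat, computer on beats 1 and 2.
h = [1, 0, 0, 0] * 4
c = [1, 1, 0, 0] * 4
print(extract_features(h, c))
```

Because the features are all computed from the same composite, they are not independent of one another, which is consistent with the remark in Section 4 that some targets may be unreachable.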
In this multi-dimensional feature space, the system calculates the Euclidean
distance between Fn and the target feature set Tn. If the distance exceeds a
pre-defined threshold, this is deemed to indicate a significant musical
difference between reality and the expectation. We use a threshold (s) of 0.001
for satisfaction of expectation, measured in the feature space where each
feature is normalised to the range [0,1].
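The comparison against the threshold can be illustrated with a short sketch. The function names and the example feature vectors are hypothetical, and the ordering (density, homophony, position, clumping) is an assumption. The final example also assumes that density is the clap count divided by 32, in which case a single extra clap moves the density by 1/32 and already exceeds s.

```python
import math

THRESHOLD_S = 0.001  # satisfaction threshold s quoted in the text


def feature_distance(f, t):
    """Euclidean distance between two equal-length feature vectors."""
    return math.dist(f, t)


def expectation_satisfied(f, t, s=THRESHOLD_S):
    """True when the composite's features lie within s of the target."""
    return feature_distance(f, t) < s


# Hypothetical feature vectors: (density, homophony, position, clumping)
target = (0.375, 0.25, 0.40, 0.125)
close = (0.375, 0.25, 0.4005, 0.125)
one_clap_off = (0.375 + 1 / 32, 0.25, 0.40, 0.125)  # assumes density = claps / 32

print(expectation_satisfied(close, target))
print(expectation_satisfied(one_clap_off, target))
```

Under these assumptions the threshold is strict: the discrete features must match the target almost exactly before the expectation counts as satisfied.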
To create Cn+1 the system runs a generate-and-test loop, producing 20
variations of Cn. Variations are generated by flipping each bit in the rhythmic
representation with a probability of 0.1. Each variation is combined with the
previous human pattern Hn to produce a candidate composite rhythm R'n, with
features F'n. The pattern with the nearest features to the target is chosen as
Cn+1.
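The generate-and-test step can be sketched as below, under stated assumptions. The names mutate and next_computer_pattern are hypothetical, the feature function is passed in as a parameter, and simple_features is a two-feature stand-in used only to keep the example short (it is not the four-feature set of the paper). The text does not say whether the unmodified Cn is itself among the candidates; here it is not.

```python
import math
import random


def mutate(pattern, rng, p_flip=0.1):
    """Flip each bit of a binary pattern independently with probability p_flip."""
    return [1 - bit if rng.random() < p_flip else bit for bit in pattern]


def next_computer_pattern(c, h, target, feature_fn, rng, n_variations=20, p_flip=0.1):
    """Generate variations of c and keep the one whose composite with h
    has features nearest (Euclidean distance) to the target."""
    candidates = [mutate(c, rng, p_flip) for _ in range(n_variations)]
    return min(candidates, key=lambda cand: math.dist(feature_fn(h, cand), target))


def simple_features(h, c):
    """Stand-in feature set (density, homophony) for illustration only."""
    n = len(h)
    density = (sum(h) + sum(c)) / (2 * n)
    homophony = sum(a & b for a, b in zip(h, c)) / n
    return (density, homophony)


rng = random.Random(0)          # seeded for repeatability
c_n = [0] * 16                  # current computer pattern
h_n = [1, 0, 0, 0] * 4          # previous human pattern
c_next = next_computer_pattern(c_n, h_n, (0.25, 0.25), simple_features, rng)
print(c_next)
```

With a flip probability of 0.1 over 16 beats, each variation changes 1.6 beats on average, so the computer's pattern drifts towards the target gradually rather than jumping to it.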
The human performer is invited to negotiate with the system in a comparable
way. As each loop occurs, the performer contributes Hn to the total pattern Rn
and assesses the machine contribution. He/she may introduce a modification to
the next contribution Hn+1. Any modification might be experimental and
pseudo-random. Alternatively, it may constitute an intentional action based
upon his/her internal description of how the two contributions are
co-dependent, and so contribute to developing a better understanding of the
expectation, the target point Tn.
This implementation could in theory allow the performer and machine to quickly
settle upon a rhythm that satisfies the target Tn, so any further changes would
need to be entirely elective; this might be likened to a state of 'optimal
flow'. To avoid this settled state, and ensure a continuing creative
negotiation, we introduce an additional device that keeps the negotiation
open-ended. If the distance between Fn and Tn is under the threshold (s), i.e.
if reality is sufficiently close to the expectation, the system introduces
random variation to its expectation to produce Tn+1, with mutation along each
feature dimension drawn from a Gaussian distribution. In other words, as the
contributing rhythms approach the target, and as the human performer has
developed a relatively accurate set of descriptions about the system (based
only on the musical surface), the expectation changes. As the feature
description Fn is obtained from the composite of both rhythms, both the machine
and human performers have a potential role in initiating this change of
expectation, whether deliberately or unwittingly, requiring a change of
descriptions and resultant actions. A continually diverging system is possible,
fostering a mutual creative negotiation that avoids 'optimal flow'.
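The target-mutation device can be sketched as follows. The function name update_target is hypothetical. The standard deviation of the Gaussian (sigma) is not given in the text, so the value 0.1 is an assumption, as is clipping the mutated target back into the normalised range [0, 1].

```python
import math
import random


def update_target(f, t, rng, s=0.001, sigma=0.1):
    """Return the next target T(n+1).

    If the current features f lie within s of the target t, perturb each
    target dimension with Gaussian noise; otherwise keep the target as is.
    """
    if math.dist(f, t) < s:
        # Assumption: sigma = 0.1 and clipping to [0, 1] are illustrative choices.
        return tuple(min(1.0, max(0.0, x + rng.gauss(0.0, sigma))) for x in t)
    return tuple(t)


rng = random.Random(1)                      # seeded for repeatability
target = (0.375, 0.25, 0.40, 0.125)

# Far from the target: the expectation stays put.
print(update_target((0.5, 0.5, 0.5, 0.5), target, rng))

# On the target: the expectation moves, reopening the negotiation.
print(update_target(target, target, rng))
```

The size of sigma controls how far the expectation jumps once it has been met; a small value would keep successive targets musically related, whereas a large one would make each new target effectively unrelated to the last.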
4 Evaluation in performance
The system awaits testing with a number of human collaborators, to build upon
informal, proof-of-concept tests undertaken by the designers. It is evident that
with care, a performer can induce a stable and sustainable behaviour in the
computer. For the human performer, there are a number of common sense actions
to be attempted. For any rhythmic cycle Rn, possible actions include:
1. varying: an intuitive rhythmic variation of a previous pattern.
2. matching: attempting to follow Cn homo-rhythmically.
3. repeating: so Hn+1 = Hn.
4. complementing: an attempt to insert events or gaps that mirror the system
pattern in Cn or remembered from Cn-1.
5. parsing: rhythmic patterns c, where c is a substring of C, that are either
complementary or matching parts of Rn-1.
For each of these possible actions, outcomes are heard in the next outputted
rhythm. So the performer attempts to form a description of the system based on
how this new pattern deviates from the last. Upsetting the system with a marked
variation of pattern (action 1) can cause unstable changes in the output. In this
event the system's attempt to update its behaviour in any progressive way is
frustrated. Consequently, the performer struggles to verify logical interaction.
Clapping the exact same pattern repeatedly (including not clapping at all) and
matching patterns (action 2 or 3) can cause the system to slowly evolve its
output towards the expectation, allowing a more accurate description of the
system to develop, ultimately initiating a change in expectation (feature target
point). However, since the features specified are not independent, it cannot be
guaranteed that a given expectation can actually be achieved in the musical
foreground. In this case the system is stuck, and, to move on, requires a more
radical intervention from the performer.
These scenarios are thought experiments as much as real tests. We intend to
look at how different performers go about negotiating with these kinds of
simple 'black-boxes' under different scenarios, and how strategies to deal with
the computer's behaviour are developed. Further development of the system could
involve additional, or more effective, feature extraction, to provide a richer
feature space. Use of alternative methods, such as autocorrelation, would allow
an open definition of the length of any given loop, currently defined as 16
units, and the system could be expanded to involve expressive timing and other
more complex rhythmic features. Future systems could offer a more sophisticated
and rich musical environment that incorporates many other elements of musical
organisation beyond metrical rhythm. In all cases, however frustrating or
rewarding, these procedures manipulate the expectations and actions of human
and machine performers alike. An unresolvable negotiation is fostered. This
points to more complex, and possibly creatively valid, negotiation processes
that could produce real-time, computational performances of manifest interest
and integrity.
5 Conclusion
We have outlined a minimal musical context for investigating computational
creativity in improvised performance. We have offered a framework for
interaction between human and machine that comprises four categories:
shadowing, mirroring, coupling and negotiating. We adopt a critical approach to
the notion of 'optimal flow' in creative interaction. We have developed a test
system that explores a process of negotiation in practice, which uses an
adaptive system with hidden expectations for varying rhythmic cycles. This
creates a demanding context for interactive negotiation. This simple study
suggests that the negotiation paradigm could be used to test the dimensions of
musical interaction in greater detail (this includes comparing human-human,
human-computer and computer-computer interactions using the same paradigm), and
could be built up from this minimal form to a critical level of complexity
where meaningful and verifiable interaction does occur.
Acknowledgements
This work was developed at the July 2009 Dagstuhl Seminar, 'Computational
Creativity: An Interdisciplinary Approach'. We would like to thank the
organisers, Margaret Boden, Jon McCormack and Mark d'Inverno, and the other
members of our 'interactivity' discussion group: Iris Asaf, Rodney Berry,
Daniel Jones, François Pachet and Benjamin Porter.
Oliver Bown's contribution to this research was funded by the Australian
Research Council under Discovery Project grant DP0877320.
References
1. Sawyer, R.K.: Group creativity: Music, Theater, Collaboration. Lawrence (2003)
2. Wiggins, G.A.: Searching for computational creativity. New Generation
Computing 24(3) (2006) 209-222
3. Colton, S.: Creativity versus the perception of creativity in computational
systems. In: Papers of the AAAI Spring Symposium on Creative Systems. (2008)
4. Schwarz, D., Beller, G., Verbrugghe, B., Britton, S.: Real-time corpus-based
concatenative synthesis with catart. In: Proceedings of 9th International
Conference on Digital Audio Effects. (2006)
5. Casey, M.: Soundspotting: A new kind of process? In Dean, R., ed.: The Oxford
Handbook of Computer Music. Oxford University Press (2009)
6. Pachet, F.: Beyond the cybernetic jam fantasy: The continuator. IEEE Computer
Graphics and Applications 24(1) (2004) 31-35
7. Croft, J.: Theses on liveness. Organised Sound 12(1) (2007) 59-66
8. Eigenfeldt, A.: Emergent rhythms through multi-agency in max/msp. In
Kronland-Martinet, R., Ystad, S., Jensen, K., eds.: Proceedings of the 4th
International Computer Music Modeling and Retrieval Symposium, CMMR 2007.
(2008) 368-379
9. Blackwell, T., Young, M.: Self-organised music. Organised Sound 9(2) (2004)
137-150
10. Hamman, M.: From symbol to semiotic: Representation, signification, and the
composition of music interaction. Journal of New Music Research 28(2) (1999)
90-104
11. Csikszentmihalyi, M.: Flow: The Psychology of Optimal Experience. Harper and
Row (1990)
12. Ghani, J.A., Deshpande, S.P.: Task characteristics and the experience of
optimal flow in human-computer interaction. Journal of Psychology 128(4) (1994)
381-391
13. Csikszentmihalyi, M.: Creativity: Flow and the Psychology of Discovery and
Invention. Harper Collins, New York (1996)
14. Boden, M.: The Creative Mind. George Weidenfeld and Nicholson Ltd (1990)
15. Abra, J.: Skinner on creativity: A critical commentary. Leonardo 21(4) (1988)
407-412