Automated Collage Generation – With More Intent
Michael Cook and Simon Colton
Computational Creativity Group
Department of Computing, Imperial College, London, UK.
ccg.doc.ic.ac.uk
Abstract
The majority of software has no meta-level perception
of what it is doing, or what it intends to achieve. Without
such higher cognitive functions, we might be disinclined
to bestow creativity onto such software. We
generalise previous work on collage generation, which
attempted to blur the line between the intentionality of
the programmer and that of the software in the visual
arts. Firstly, we embed the collage generation process
into a computational creativity collective, which contains
processes and mashups of processes, designed so
that the output of one generative system becomes the
input of another. Secondly, we analyse the previous
approach to collage generation to determine where intentionality
arose, leading to experimentation where we
test whether augmented keyword searches can enable
the software to exert more intentional control.
Introduction
We are building The Painting Fool software
(www.thepaintingfool.com) to one day be taken seriously
as a creative artist in its own right. To guide this
process, we have engaged in much informal discussion
with artists, art students and teachers. One clear opinion
almost universally expressed has been that art generation
software such as Photoshop is not seen as creative, because
it exhibits no intentionality. That is, it does not conceive the
artwork it wishes to produce, and hence does not drive the
process through aesthetic or production decisions. Rather,
the person using Photoshop is seen as providing all the
intentionality, and hence is the sole creative entity, with the
software acting as a mere tool.
To address this lack of intentionality in The Painting Fool,
as described in (Krzeczkowska et al. 2010), we built a
collage-generation module able to construct artworks to illustrate
newspaper articles. The system worked via internet
retrieval of a news article, text extraction of important keywords,
internet image retrieval using the keywords, and nonphotorealistic
rendering of the images. We removed human
decision making, so that we had no control over (a) what
news story would be chosen for a collage (b) what keywords
would be extracted (c) what images would be retrieved or
(d) how the collage would be rendered. We used the system
to raise the issue of whether it was appropriate to say the
software exhibited intentionality in producing the collages.
We describe here two extensions to this previous work. In
the next section, we show how the collage generator can be
seen as a mashup of generative processes, and we give details
of a computational creativity collective designed with
the hope that by allowing the output of such processes to
become the input of others, more culturally interesting artefacts
will be produced. Following this, we return to the question
of intentionality, and analyse the results of the existing
collage generation system. This highlights the roles that five
parties have in adding intentionality to the generation process,
and suggests simple improvements which could wrestle
back some control for the software. We present the
results from some preliminary experimentation with augmented
keyword image retrieval, which leads to further discussion
about intentionality in software. We conclude by
describing avenues for future work.
A Computational Creativity Collective
The software discussed in (Krzeczkowska et al. 2010) is
best described in modern computing parlance as a mashup.
As described in (Abiteboul, Greenshpan, and Milo 2008), in
general, mashups take data – usually downloaded from internet
sites – and pass them through various linked processes
in order to turn raw data into more consumable (amusing/entertaining/informative/interactive)
presentations. In
contrast, in general, the kinds of creative systems that produce
artefacts like musical and literary compositions, works
of visual art, scientific hypotheses, etc., are run as standalone,
offline, systems. We hypothesise that the value of
the artefacts produced by creative systems, and hence the
creativity attributed to them, would be increased if (a) they
could place their work into current cultural contexts and (b)
they could work in concert, whereby the output from one becomes
the input to another, so that there is an increase in the
sophistication of the overall system.
To test this hypothesis in the long term, we have
compiled a collective of processes and mashups
employing these processes, which is available at
(www.doc.ic.ac.uk/ccg/collective), along with a simple
Java API which guides the construction of additional
material for the collective. Currently, the collective contains
119 processes mashed-up into 57 mashups. The processes
follow an interface which means that the output from any
process can become the input to any other (whether or not
Proceedings of the Second International Conference on Computational Creativity 1
the receiving process can usefully work with the type of
input it is given). In general, the current processes perform
simple text or graphics routines, or they employ a web 2.0
API in order to retrieve data from the internet. Collage
generation is represented with a mashup that contains processes
which link with the Guardian newspaper API and the
Google image retrieval API, in addition to processes which
perform text extraction and graphics routines. The other
mashups perform similarly. For instance, one mashup uses
image retrieval, graphics, and natural language generation
techniques to parody business speak. Another mashup uses
processes based on the LastFM and RunKeeper APIs, so
that a mosaic of album covers can be constructed depending
on what music people run the fastest to. Such mosaics
are not necessarily the most culturally important artefacts
produced by creative systems. However, the ease by which
such mashups can be written highlights the potential for
creativity when story generation systems feed into poetry
generators, the output of which inspire paintings that influence
musical compositions, and so on, with each process
benefitting from internet downloads to add timely cultural
relevance. We discuss our aims and future directions for the
collective in the final section below.
Intentionality Analysis and Experimentation
One of the most impressive collages presented in
(Krzeczkowska et al. 2010) is given in figure 1. This was
produced in response to a news article which covered the war
in Afghanistan, published in 2009 in the Guardian newspaper.
While not rendered to a particularly high aesthetic standard,
the contents are striking, as it contains a bomber plane,
an explosion, a family with a baby, a girl in ethnic headgear,
and – most poignantly of all – a field of war graves. By mixing
images of death and destruction with those of children,
it is fair to say that these constituents have helped to produce
a collage with a moderate but definite negative bias, which
reflected the tone of the original news article. We previously
thought that the intentionality driving the collage generation
therefore came from four parties: (i) the programmer,
by enabling the software to access the left-leaning, largely
anti-war Guardian newspaper (ii) the software, through its
processing, the most intelligent aspect of which was to extract
keywords using an algorithm based on (Brin and Page
1998), (iii) the writer of the original article, through the expression
of his/her opinions in print, and (iv) individual audience
members who have their own opinions forming a context
within which the collages are judged.
However, upon further inspection, we can add a fifth party
to this ensemble. To see this, note first that the keywords
extracted from the article were:
afghanistan, brown, forces, troops, nato,
british, speech, country, more and afghan
We further note that none of these words have a particular
negative emotive bias. We therefore must reduce the intentional
role of the article writer in the construction of the
collage, as it would be possible to write a very upbeat appraisal
of the war which contained the same keywords. The
images for the collage were retrieved from Flickr using the
Figure 1: Collage illustrating the war in Afghanistan
keywords above. Hence, it is clear that the negative connotations
in the collage derive from the way in which Flickr users
have tagged the images they have uploaded, presumably by
tagging images of cemeteries and explosions with words like
‘afghanistan’, ‘nato’ and ‘troops’. Hence, we must attribute
some of the intentionality behind the collage construction to
the crowd.
It is interesting that intent can be dissipated amongst five
parties, one of which is a faceless crowd. However, the software
is currently the junior partner in this arrangement, and
our original motivation was for it to become the major driving
force, so that it might be perceived as being more creative.
We are currently working on an opinion-forming module
for The Painting Fool which will enable it to start with
some text, such as a news article or a set of search terms,
then download and analyse multiple texts on related subjects,
in order to form either a positive or negative opinion
about the original text. Once it has formed such an opinion,
this will be used to bias the search for images in collage
generation, i.e., if the opinion is positive, The Painting Fool
will attempt to retrieve appropriately up-beat images, and
vice-versa if not.
A straightforward method for attempting to influence the
emotional nature of retrieved images is to augment the keywords
with appropriately positive/negative adjectives. To
test the effectiveness of this method, we performed some
preliminary experiments. We used augmented keyword
searches with Flickr to retrieve images, and then asked people
to tag the images with respect to the broad emotion being
portrayed. In particular, in the first experiment, 17 participants
were each asked to tag 21 images, 7 of which were
retrieved from Flickr with the keyword soldier, 7 of which
were retrieved with the keywords: soldier and sad and 7 of
which were retrieved with the keywords: soldier and happy.
For each set of keywords, 80 images were retrieved and
cached, and these were selected from randomly during the
experiment. Subjects were asked to tag each image as either
‘happy’, ‘sad’ or ‘unsure’ (if they were not prepared to assign
an emotion to the image). The same experiment was
repeated, but with the keyword soldier replaced by dog, and
repeated again with the keyword baby.
A sample of the images from the dog experiment is given
in figure 3, and the distribution of tags for the sets of images
Proceedings of the Second International Conference on Computational Creativity 2
Figure 2: Results from the image tagging experiments. Neutral refers to the images retrieved via unaugmented search.
Search terms: dog and happy
Search term: dog [neutral]
Search terms: dog and sad
Figure 3: Sample images from the dog image tagging exp.
in the three experiments is given in figure 2. We see that
the participants’ reactions to images were image-type dependent.
In particular, people were less sure about the emotions
being portrayed in the soldier images than in the other two
types of image. Also, the results show a trend for neutral images
to be tagged with ‘happy’ as often as ‘unsure’ and four
times as often as ‘sad’. It’s likely that this is because people
tend to upload pictures of happy babies/dogs/soldiers to
Flickr, hence an unaugmented search will result in largely
positive images being retrieved. Importantly, it is clear that
people were more likely to tag images retrieved with the keyword
happy as ‘happy’ and with the keyword sad as ‘sad’.
Of the 119 happy images, 65% were tagged with ‘happy’
and only 10% with ‘sad’. Similarly, of the 119 sad images,
47% were tagged with ‘sad’ and only 22% with ‘happy’.
Conclusions and Future Work
The Painting Fool already has access to the methods and
mashups in the computational creativity collective, including
the collage generator mashup, and we plan to explore
the creative potential of this. In particular, given that any
process in the collective can provide input to any other process,
we plan to automate the construction of chains of processes
to see what the resulting mashups produce. We also
plan to explore more sophisticated ways of using the processes,
for instance using global workspace architectures, as
explored in (Charnley 2009). Our hope is that the collective
will become a resource to which computational creativity
researchers can contribute creative systems, in addition
to experimenting with those of others. Moreover, we envisage
the collective becoming a creative entity in its own right,
producing artefacts of real cultural value.
The results from the preliminary image tagging experiments
described above were encouraging. Hence, in future
versions of the software, when The Painting Fool forms an
opinion about a subject, then attempts to express that opinion
via collages built using augmented keyword searches,
we can be fairly confident that the attempt will be successful.
However we will need to undertake further experimentation
to determine under which conditions this will be the
case. The opinion forming process will rely on sentiment
analysis routines, such as those described in (Pang, Lee, and
Vaithyanathan 2002), but The Painting Fool will be trained
to subvert popular sentiment on particular topics, or attempt
to portray both sides of an argument, if either of these methods
might produce artworks with greater impact. Currently,
the collages it produces are too literal. Hence, we plan to
implement some obfuscation techniques, which will be used
to produce pieces which require a level of interpretation by
audiences. By implementing such higher level abilities, we
hope to endow The Painting Fool with behaviours that will
one day be regarded as creative.
Acknowledgements
We would like to thank Anna Krzeczkowska for her original
work on the collage generation system, and John Charnley
for foundational work on the computational creativity collective.
We wish to thank Fai Greeve for comments which
led to a reappraisal of the dissipation of intent in the collage
generation process, in addition to Sanjay Bilakhia, Michal
Cudrnak, Daniel Waddington and Douglas Willcocks for
their contributions to the collective. We are grateful to the
anonymous reviewers for their useful comments.
<references_biblio/>
References
Abiteboul, S.; Greenshpan, O.; and Milo, T. 2008. Modeling the
mashup space. In Proceedings of tht 10th ACM International Workshop
on Web Information and Data Management.
Brin, S., and Page, L. 1998. The anatomy of a large-scale hypertextual
web search engine. Comp. Networks & ISDN Syst. 30.
Charnley, J. 2009. A Global Workspace Framework for Combined
Reasoning. Ph.D. Dissertation, Department of Computing, Imperial
College, London.
Krzeczkowska, A.; El-Hage, J.; Colton, S.; and Clark, S. 2010.
Automated collage generation - with intent. In Proceedings of the
1st International Conference on Computational Creativity.
Pang, B.; Lee, L.; and Vaithyanathan, S. 2002. Thumbs up? Sentiment
classification using machine learning techniques. In Proc. of
the Conf. on Empirical Methods in Natural Language Processing.
Proceedings of the Second International Conference on Computational Creativity 3