Automated Collage Generation { With Intent
Anna Krzeczkowska, Jad El-Hage,y Simon Colton and Stephen Clarky
 Department of Computing, Imperial College, London, UK.
y University of Cambridge Computer Laboratory, Cambridge, UK.
1 Introduction
One reason why software undertaking creative tasks might be perceived of as
uncreative is its lack of overall purpose or intent. In computational creativity
projects, while some skillful aspects may be undertaken by the computer, there
is usually a person driving the process, by making decisions with regard to
the intent of the artefact being produced. Such human intervention can take
dierent forms, for instance via the supplying of background information in
scientic discovery tasks, or making choices during an evolutionary art session.
We are interested in whether a notion of purpose could be projected onto The
Painting Fool, an automated painter we hope will one day be taken seriously
as a creative artist in its own right [3, 5]. Starting with the maxim that good
art makes you think, we have enabled The Painting Fool to produce visual art
specically to invite the viewer to interpret the pieces in the context of the world
around them, i.e., to make a point about a current aspect of society. However,
we do not prescribe what aspect of modern life it should depict, nor do we
describe the art materials { in this case digital images { it should use. Hence,
we eectively opt out of specifying a purpose for any individual piece of art. As
described in section 2, the software starts with an instruction to access sources
of news articles at regular intervals. Then, via text manipulation, text analysis,
image retrieval, image manipulation, scene construction and non-photorealistic
rendering techniques, the system produces a collage which depicts a particular
news story. This is initial work, and more eort is needed to produce collages
of more aesthetic and semantic value. We present some preliminary results in
section 3, including some illustrative examples and feedback from viewers of the
collages produced. In section 4, we describe the next stages for this project.
2 Automated Collage Generation
A schematic for the collage generation system is provided in gure 1. At regular
intervals, scheduled processes begin a ow of information involving the retrieval
of news articles from internet sources (with the news source specied by the
scheduled job); the extraction of keywords from the news articles; the retrieval
of images using the keywords; and the construction of input les for The Painting
Fool. The input les specify which images to annotate and extract colour segments
from, how to arrange the segments in an overall collage, and what natural
media to simulate when painting the segments to produce the nal piece. We
provide further details of these processes below, with full details of the overall
system available in [7].
36
Fig. 1. Automated collage generation system overview.
 Text Retrieval and Analysis
We enabled the system to access the Guardian News website and Google News
search, via their APIs. For the Guardian, the API provides access to headlines,
for which there are a number of associated articles, multimedia les, blogs, forums,
etc., and our system extracts the rst text-based article from this list.
The Google News API produces similar output from multiple news sources,
from which we extract text-based news stories from the BBC, The Independent
newspaper and Reuters news service. The system retrieves only English language
headline articles, and we can specify whether the articles should be about World
or country-specic issues only. The retrieved articles are cleaned of database
information and HTML appendages to produce plain text.
Following this, we use a text analysis technique to extract a specied number
of keywords from the plain text. The technique is an implementation of the TextRank
algorithm [8], which is designed to extract those keywords most indicative
of the content of the document. It is based on PageRank [1], the algorithm used
by Google to determine the importance of a web page based on the hyperlink
structure of the web. The intuition behind PageRank is that important pages
will be pointed at by other important pages, a recursive notion of importance
which can be assigned numerical values using a simple iterative algorithm. The
intuition behind TextRank is similar: important words in the document will be
pointed at by other important words. In the context of a document, pointed at
is dened in [8] as being in the same context. Hence a graph representing the
document can be created, where an edge exists between two words if those words
appear in each other's contexts, where context is just a xed-size window either
side of the target word. PageRank uses the graph to assign numerical values
to each word and extracts the most important ones. For our experiments, only
nouns were extracted, as these were considered likely to be the most informative,
and also the most useful keywords to use for image retrieval. Full details of the
keyword extraction implementation are given in [6].
 Image Retrieval and Manipulation
The keywords extracted from the news stories are used to retrieve art materials
(i.e., digital images) from the internet and local sources. The system
has access to the 32,000 images from the Corel library which have been hand
tagged and can be relied on for images which match the given keywords well.
37
Fig. 2. Example collages
produced by the system.
We also wanted to include images retrieved from the
Internet, as these add a level of surprise,1 and a more
contemporary nature to the retrieved images. We interfaced
the collage generation system with both the
Google images and the Flickr APIs. In the former case,
the interface is fairly lightweight, given that Google
supplies a set of URLs for each keyword, which point
to the relevant images. In the latter case, however, a
URL must be built from information retrieved from
a photo-list, which is a non-trivial process. The three
image sources (Corel, Google, Flickr) are queried in a
random order, but when either Corel or Flickr return
empty results, Google is queried, as this always supplies
images. Note that we discuss experiments comparing
Flickr and Google images in section 3.
 Scene Construction and Rendering
In the nal stage of processing, the retrieved images
are assembled as a collage in one of a number of gridbased
templates. Then the system employs The Painting
Fool's non-photorealistic rendering capabilities [5]
to draw/paint the collages with pencils, pastels and
paints. In future, we will use the more sophisticated
scene generation techniques described in [4].
3 Initial Results
The rst image in gure 2 portrays a typical collage produced by the system.
Here, a scheduled process happened to retrieve a Guardian news story about
the war in Afghanistan, with the headline `Brown may send more troops to
Afghanistan'. From the text, the words afghanistan, brown, forces, troops,
nato, british, speech, country, more and afghan were extracted. Images
were retrieved from Flickr accordingly, including a picture of a ghter plane,
a eld of graves, a young woman in an ethnic headdress and an explosion. The
rendering style for this was simply to segment each image into 1000 regions and
present the images in an overlapping grid with 10 slots. This example hints at the
ability of the system to construct a collage which can semantically complement
a news story, or even add poignancy. The second collage in gure 2 provides a
hint of the possibilities for more interesting and perhaps playful juxtapositions.
This was produced in response to a news story on the England versus Australia
Ashes test cricket series, which had the headline: `England versus Australia {
as it happened!' The images of the Houses of Parliament and a kangaroo in the
collage are fairly obvious additions. However, the Collage also contains a picture
1 For instance, at the time of writing, querying Flickr for images tagged with the word
\Obama" returns an image of a woman body-builder as the rst result.
38
of the famous Falling Water building built by Frank Lloyd-Wright. Upon investigation,
we found that the name wright was extracted from the news story (as
a member of the England cricket team) and the rst Google image returned for
that keyword is, of course, the image of Falling Water.
In order to informally assess the power of the collages to represent the text
upon which they are based, we showed a collection of the collages to 11 subjects.
We asked them to complete a survey where they were shown 12 news stories
and 5 collages per news story, only one of which, called the master collage was
generated from the news story (with the others generated from similar stories).
To generate the 12 master collages, we varied both the number of keywords to
be extracted (5 and 10) and the image source (Google and Flickr). Subjects
were asked to rank the 5 collages for each news story in terms of relevance,
with rank 1 indicating the most relevant and 5 the least relevant. Taking the
average of the ranks, we found that the master collage was ranked as (a) the most
relevant in 8 of the 12 tests and (b) the second most relevant in the other 4 tests.
The most marked dierence we noticed was between the collages produced with
Google and with Flickr. In particular, the Google collages had an overall rank of
1.82, while the Flickr collages had an overall rank of 2.14. This highlights that
image tagging in Flickr is not particularly reliable, with Google returning more
relevant images. These results are encouraging, as they demonstrate that even via
the abstractions of keyword extraction, image retrieval and non-photorealistic
rendering, it is still possible for the collages to have semantic value with respect
to the news stories from which they were derived.
4 Discussion and Further Work
The system described above is a prototype, and is presented largely as a proof
of principle, rather than a nished system. We have presented an illustrative
example of how the pipeline of processes can produce a collage with the potential
to make viewers engage their mental faculties { in this case about warfare. The
value here is not necessarily in the quality of the nal artefacts { which are
currently a little nave { but rather in the fact that we had little idea of what
would be produced. We argue that, as we did not know what the content of the
collages would be, it cannot have been us who provided the intention for them.
Given that these collages are based on current events and do have the potential
to engage audiences, we can argue that the software provided the intent in this
case (perhaps subject to further discussion of intent and purpose in art).
In [3], we argue that the perception of how a computational system produces
artefacts is as important as the nal product. Hence, the fact that the system
supplies its own purpose in automated art generation may add extra value to
the artworks produced. Having said that, there are a number of improvements
to the process we intend to make in order to increase the visual and semantic
appeal of the collages. In particular, we plan to make better use of The Painting
Fool's scene construction abilities, and to implement scene construction techniques
which are aware of the context of the news story being portrayed. For
instance, if the text of the news article has a distinctive plot-line, then a linear
39
collage might best portray the narrative of the story, with images juxtaposed in
an appropriate order. However, if the major aspect of an article is the mood of
the piece, then possibly a more abstract collage might best portray this. We also
plan to involve text summarisation software to provide titles, wall text and other
written materials for the collages. We hope to show that by stepping back from
certain creative responsibilities (described as \climbing the meta-mountain" in
[4]), such as specifying the intent for a piece of art, we make it possible to project
more creativity onto the collage generation system than if there was a person
guiding the process. Our long term goal for The Painting Fool is for it to be accepted
as a creative artist in its own right. Being able to operate on a conceptual
level is essential for the development of The Painting Fool, hence we will pursue
further interactions with text analysis and generation systems in the future.
We would like to thank the anonymous reviewers for their useful advice. One
reviewer stated that in some scientic theory formation systems, the software
is not perceived as uncreative because of a lack of intent. As the engineers of
scientic discovery software [2], when running sessions, we always provide intent
through our choice of background material and our choices for evaluating the theory
constituents. Hence, a critic could potentially argue that the software is not
being creative, as it has no purpose of its own.We believe that this is true of most
other scientic discovery systems, especially machine learning based approaches
such as [9], where nding a classier is the explicit user-supplied intention. The
reviewer also compared the collage generation system with Feigenbaum's famous
Eliza program. We nd it dicult to see the comparison, given that the collage
generation system is given no stimulus from a user, whereas Eliza reacts repeatedly
and explicitly to user input. A more accurate analogy in the visual arts
would be image ltering, where an altered version of the user's stimulus is presented
back to them for consideration. It is clear that the notion of intent in
software causes healthy disagreements, and perhaps our main contribution here
is to have started a fruitful discussion on this topic.
<references_biblio/>
References
1. S Brin and L Page. The anatomy of a large-scale hypertextual web search engine.
Computer Networks and ISDN Systems, 30:1{7, 1998.
2. S Colton. Automated Theory Formation in Pure Mathematics. Springer, 2001.
3. S Colton. Creativity versus the perception of creativity in computational systems.
In Proceedings of the AAAI Spring Symposium on Creative Systems, 2008.
4. S Colton. Experiments in constraint-based automated scene generation. In Proceed-
ings of the 5th International Joint Workshop on Computational Creativity, 2008.
5. S Colton, M Valstar, and M Pantic. Emotionally aware automated portrait painting.
In Proc. of the 3rd Int. Conf. on Digital Interactive Media in Ent. & Arts, 2008.
6. J El-Hage. Linguistic analysis for The Painting Fool. Master's thesis, The Computer
Laboratory, University of Cambridge, UK, 2009.
7. A Krzeczkowska. Automated collage generation from text. Master's thesis, Depart-
ment of Computing, Imperial College, London, UK, 2009.
8. R Mihalcea and P Tarau. TextRank: Bringing order into texts. In Proceedings of
the Conference on Empirical Methods in Natural Language Processing, 2004.
9. S Muggleton. Inverse Entailment and Progol. New Generation Computing 13, 2005.
40