Game of Tropes: Exploring the Placebo Effect in Computational Creativity Tony Veale School of Computer Science and Informatics University College Dublin, Belfield D4, Ireland. Tony.Veale@UCD.ie Abstract Twitter has proven itself a rich and varied source oflanguage data for linguistic analysis. For Twitter is more than a popular new channel for social interactionin language; in many ways it constitutes a whole newgenre of text, as users adapt to its new limitations (140 character messages) and to its novel conventions such as retweeting and hash-tagging. But Twitter presents anopportunity of another kind to computationally-mindedresearchers of language, a generative opportunity to study how algorithmic systems might exploit linguistictropes to compose novel, concise and re-tweetable textsof their own. This paper evaluates one such system, aTwitterbot named @MetaphorMagnet that packages itsown metaphors and ironic observations as pithy tweets. Moreover, we use @MetaphorMagnet, and the idea of Twitterbots more generally, to explore the relationship of linguistic containers to their contents, to understand the extent to which human readers fill these containers with their own meanings, to see meaning in the outputs of generative systems where none was ever intended. We evaluate this placebo effect by asking human raters to judge the comprehensibility, novelty and aptness of texts tweeted by simple and sophisticated Twitterbots. Tropes: Containers of Meaning A mismatch between a container and its contents can often tell us much more than the content itself, as when a person places the ashes of a deceased relative in a coffee can, or sends a brutal death threat in a Hallmark greeting card. The communicative effectiveness of mismatched containers is just one more reason to be skeptical of the Conduit metaphor (Reddy, 1979) – which views linguistic constructs as containers of propositional content to be faithfully shuttled between speaker and hearer – as a realistic model of human communication. Language involves more than the faithful transmission of logical propositions between information-hungry agents, and more effective communication – of attitude, expectation and creative intent – can often be achieved by abusing our linguistic containers of meaning than by treating them with the sincerity that the Conduit metaphor assumes. Consider the case of verbal irony, in which a speaker deliberately chooses containers that are pragmatically ill- suited to the conveyance of their contents. For instance, the advertising container “If you only see one [X] this year, make it this one” assumes that [X] denotes a category of event – such as “romantic comedy” or “movie about superheroes” – with a surfeit of available members for a listener to choose from. When [X] is bound to the phrase “comedy about Anne Frank” or “musical about Nazis”, the container proves too hollow for its content, and the reader is signaled to the presence of playful irony. Though such a film may well be one-of-a-kind, the ill- fitting container suggests there are good reasons for this singularity that do not speak to X’s quality as an artistic event. Yet if carefully chosen, an apparently inappropriate container can communicate a great deal about a speaker’s relationship to the content conveyed within, and as much again about the speaker’s relationship to their audience. As more practical limitations are placed on the form oflinguistic containers, the more incentive one has to exploitor abuse containers for creative ends. Consider the use of Twitter as a communicative medium: writers are limited to micro-texts of no more than 140 characters to conveyboth their meaning and their attitude to this meaning. So each micro-text, or tweet, becomes more than a containerof propositional content: each is a brick in a larger edifice that comprises the writer’s online personae and textual aesthetic. Many Twitter users employ irony and metaphorto build this aesthetic and thus build up a loyal audienceof followers for their world view. Yet Twitter challengesmany of our assumptions about irony and metaphor. Such devices must be carefully modulated if an audience is to perceive a speaker’s meaning in the playful (mis)match ofa linguistic container to its contents. Failure to do so canhave serious repercussions when one is communicating tothousands of followers at once, with tweets that demandconcision and leave little room for nuance. It is thus not unusual for even creative tweets to come packaged withan explicit tag such as #irony, #sarcasm or #metaphor. Metaphor and irony are much-analysed phenomena in social media, but this paper takes a generative approach, to consider the production rather than the analysis of creative linguistic phenomena in the context of a fully- Proceedings of the Sixth International Conference on Computational Creativity June 2015 autonomous computational agent – a Twitterbot – that crafts its own metaphorical and ironical tweets from itsown knowledge-base of common-sense facts and beliefs. How might such a system exhibit a sense of irony thathuman users will find worthy of attention, and how mightthis system craft interesting metaphoric insights from aknowledge-base of everyday facts that are as banal asthey are uncontentious? We shall explore the variety oflinguistic containers at the disposal of this agent – a realcomputational system named @MetaphorMagnet – to better understand how such containers can be playfully exploited to convey ironic, witty or thought-provoking views on the world. With @MetaphorMagnet we aim to show that interesting messages are not crafted from interesting contents, or at least not necessarily so. Rather, effective tweets emerge from an appropriate if non- obvious combination of familiar linguistic containers with unsurprising factual fillers. In support of this view, weshall present an empirical analysis of the assessment of@MetaphorMagnet’s uncurated outputs by human judges. Just as one can often guess the contents of a physicalcontainer by its shape, one can often guess the meaning ofa linguistic container by its form. We become habituatedto familiar containers, and just as we might imagine ourown uses for a physical container, we often pour our own meanings into suggestive textual forms. For in language, meaning follows form, and readers will generously inferthe presence of meaning in texts that are well-formed andseemingly the product of an intelligent entity, even if thisentity is not intelligent and any meaning is not intentional. Remarkably, Twitter shows that we willingly extend thisgenerosity of interpretation to the outputs of bots that weknow to be unthinking users of wholly aleatoric methods. Twitterbots exploit this placebo effect – wherein a well- formed linguistic container is presumed to convey a well- founded semantic content – by serving up linguistic formsthat readers tacitly fill with their own meanings. We aimto empirically demonstrate here that readers do more thanwillingly suspend their disbelief, and that a well-packaged linguistic form can seduce readers into seeing what is notthere: a comprehensible meaning, or at least an intent tobe meaningful. We do this by evaluating two metaphor- generating bots side-by-side: a rational, knowledge-based Twitterbot named @MetaphorMagnet vs. an aleatoric and largely knowledge-free bot named @MetaphorMinute. Digital Surrealists: La Regle Du Jeu Most Twitterbots are simple, rule-based systems that use stochastic methods to explore a loosely-defined space of texual forms. Such bots are high-concept, low-complexity text-production mechanisms that transplant the aleatoric techniques of surrealist writers – from André Breton to William Burroughs and Brion Gysin – into the realms of digital content, social networking and online publishing. Each embodies a language game with its own generative rules, or what Breton called “la regle du jeu.” Yet Breton, Burroughs and Gysin viewed the use of aleatorical rules as merely the first stage of a two-stage creation process: at this first stage, random recombinant methods are used to confect candidate texts in ways that, though unguided by meaning, are also free of the baleful influence of cliché; at the second stage, these candidates are carefully filtered by a human, to select those that are novel and interesting. Most bots implement the first stage and ignore the second, pushing the task of critiquing and filtering candidate texts onto the humans who read and selectively re-tweet them. Nonetheless, some bots achieve surprising effects with the simplest language tools. Consider @Pentametron, a bot that generates accidental poetry by re-tweeting pairs of random tweets of ten syllables apiece (for an iambic pentameter reading) if each ends on a rhyming syllable. When the meaning of each tweet in a couplet coheres with the other, as in “Pathetic people are everywhere” |“Your web-site sucks, @RyanAir”, the sum of tweets produces an emergent meaning that is richer and more resonant than that of either tweet alone. Trending social events such as the Oscars or the Super Bowl are especially conducive to just this kind of synchronicity, as in this fortuitous pairing: “So far the @SuperBowl commercials blow.” | “Not even gonna watch the halftime show.” In contrast, a bot named @MetaphorMinute wears its aleatoric methods on its sleeve, for its tweets – such as “a haiku is a tonsil: peachblow yet snail-paced” – are not so much random metaphors as random metaphor-shaped texts. Using a strategy that stresses quantity over quality, this bot instantiates that standard linguistic container for metaphors – the copula frame “X is a Y” – with mostly random word choices every two minutes. Interestingly, its tweets are as likely to provoke a sense of mystification and ersatz profundity as they are total incomprehension. Yet bots such as @Pentametron and @MetaphorMinute do not generate their texts from the semantic-level up; rather, they manipulate texts at the word-level, and thus lack any sense of the meaning of a tweet, or any rationale for why one tweet might be better – which is to say, more interesting, more apt or more re-tweetable – than others. The Full-FACE poetry generator of Colton et al. (2012) also uses a template-guided version of the cut-up method to mash together semantically-coherent text fragments in a way that – much like @Pentametron – obeys certain over-arching constraints on metre and rhyme. These text fragments come from a variety of online sources, ranging from short tweets to long news articles. News stories are a rich source of readymade phrases that convey resonant images, and these can be clipped from a news text using standard NLP techniques, while tweets that use affect-rich language can also be extracted automatically via standard sentiment analysis lexica and tools. Thus, a large stock of resonant similes, such as “blue as a blueberry” or “hot as a sauna” can be extracted from the Web using a search engine (Veale, 2014), since the simile frame “as X as Y” is specific enough to query for, and promiscuous enough to match, a rich diversity of typical X:Y associations. These associations can then be recast in a variety of poetic forms to make their clichéd offerings seem fresh again, as in “Blueberry-blue overalls” or “sauna-hot jungle.” Proceedings of the Sixth International Conference on Computational Creativity June 2015 Indeed, the very act of juxtaposing clichés can itself be a creative act, as evidenced both by the success of the cutup method in general and that of specific cut-ups in particular. Consider William Empson’s withering analysis of the persnickety, cliché-hating George Orwell, whom Empson called “the eagle eye with the flat feet” (quoted in Ricks [1995:356], who admires Empson’s “audacious compacting of clichés”). The Full-FACE system is just one of many CC systems that use an autonomous variant of Burroughs and Gysin’s cut-up method to integrate tight constraints on form with loose constraints on meaning. Breton famously stated that “Je ne veux pas changer la regle du jeu, je veux changer de jeu.” Twitterbots do not change or transcend their own rules, but different bots do represent different language games with their own rules. So to change the game, a CC developer can simply build a new bot, to exploit a different set of tropes and linguistic containers. It is rare for any one Twitterbot to incorporate a diverse set of tropes and production mechanisms; each typically follows Breton’s experimentalist approach to art in its random sampling of a specific space of possibilities. Each bot thus forms its own art installation, to showcase a single generative idea. @MetaphorMagnet, the bot at the heart of this paper, represents a departure from this norm, insofar as it exploits a wide range of tropes and rendering strategies, it employs diverse sources of knowledge, and it applies a variety of reasoning styles to generate surprising conclusions from what is otherwise a stock of banal facts. But does this added sophistication – bought at the cost of increased system complexity and knowledge-engineering effort – result in tweets that are seen as more meaningful, novel, apt or retweetable by human users? It is this point that exercises us most in the coming sections. The Placebo Effect : Trope-A-Dope We humans obtain more mileage than we care to admitfrom templates, tropes and other “bot” tricks for linguisticcreativity. Consider what Matthew McGlone and Jessica Tofighbakhsh (1999) call the Keats heuristic, an insight into creative language use that owes as much to Nietzsche(“we sometimes consider an idea truer simply because ithas a metrical form and presents itself with a divine skip and jump”) as to the poet John Keats (“Beauty is truth, truth beauty”). McGlone and Tofighbakhsh (2000) showthat when presented with uncommon maxims or proverbswith internal rhyme (e.g. “woes unite foes”), subjects tendto view these as more insightful about the world than theequivalent paraphrases with no internal rhyme at all (e.g. “troubles unite enemies”). While the Keats heuristic is not exactly a license to pun, it is an incentive to rhyme, and togive as much weight (or more still) to superficial aspectsof poetry generation as to deep semantics and pragmatics. Indeed, the heuristic is tacitly central to the operation ofvirtually every computational creativity (CC) approach topoetry generation (e.g. Milic, 1970; Chamberlain & Etter, 1983; Gervás, 2000; Manurung et al. 2012; Veale, 2013). If human poets ask questions first and rhyme later, CCsystems typically rhyme first and ask questions later, if at all. For if the human jury in the O.J. Simpson trial couldbe turned against bald facts with the Keatsian “If the glove don’t fit you must acquit”, readers of computer-generated poetry can be persuaded to see deliberate meaning and resonance in any output that has a “divine skip and jump.” There is something undeniably special about poetry, whether it is the gentle poetry of William Shakespeare’s “Shall I compare thee to a summer’s day” or the rough poetry of Johnnie Cochrane’s “If the glove don’t fit you must acquit”. Milic (1970), an early CC pioneer, argues that while poetry “is more difficult to write than prose” it offers other freedoms to writers due to the willingness of readers to “interpret a poem, no matter how obscure, until he has achieved a satisfactory understanding.” What then of the enigmatic tweets of bots like @MetaphorMinute, whose obscurity is a function of random word choice and whose surface forms are not designed to make any sense at all? Milic argues that computer poetry serves a useful role other than its obviously generative one, by alerting us to “the curious behavior of familiar words in unfamiliar combinations.” Behaviour that makes perfect sense when dealing with the writings of a gifted human poet, such as our tendency to “interpret an utterance by making what concessions are necessary on the assumption that a writer has something in mind of which the utterance is the sign”, is, argues Milic, “inappropriate when the speaker is a computer.” Yet Twitterbots benefit from such concessions and assumptions whether or not followers know them to be bots. This placebo effect is especially pronounced in the coining of would-be metaphors, leading Milic to note “how readily we accept metaphor as an alternative to calling a sentence nonsensical.” @MetaphorMinute and other aleatoric bots wring maximal value from this insight by devising texts that they themselves cannot distinguish from nonsense. So this begs an important question: are the meanings imposed on a random text by a creative human of comparable value to those conveyed by a bot with its own model of the world and its own insights to tweet? Building Metaphors : Theory and Practice What might it mean for a bot to have “something in mind of which [its] utterance is the sign”? When it comes to metaphor generation, we might expect that our bot would generate its figurative tweets from a conceptual model of the world as it sees it, in a way that accords with a sound theory of how and why humans actually use metaphor. For the latter, AI offers us a range of models to choose from. Computational approaches to metaphor divide into four broad classes: the categorial, the corrective. the analogical and the schematic. Categorial approaches view metaphor as a means to reconceptualize one idea by placing it into a taxonomic category strongly associated with another (see Hutton, 1982; Way, 1991; Glucksberg, 1998). Corrective approaches view metaphor as an inherently anomalous deviation from literal language, and strive to recover the corresponding literal meaning of any figurative statement that violates its lexico-semantic norms (see Wilks, 1978; Proceedings of the Sixth International Conference on Computational Creativity June 2015 Fass, 1991). The analogical approaches aim to capture the relational parallels that allow our representation of an idea in one domain, the source, to be systematically projected onto our mental representation of an idea in another, the target (see Gentner et al., 1989; Veale and Keane, 1997). Finally, schematic approaches aim to explain how related linguistic metaphors arise as surface manifestions of deep seated cognitive structures called Conceptual Metaphors (Lakoff & Johnson, 1980; Carbonell, 1981; Martin, 1990; Veale & Keane, 1992). Each approach has its own merits, but none offers a complete computational solution. Bots that aim for a general competence in metaphor must thus implement a selective hybrid of multiple approaches. Yet each approach also requires its own source of knowledge. Categorial approaches require a comprehensive taxonomy of flexible categories that can embrace atypical members on demand. Corrective approaches are built on a substrate of literal case-frames onto which deviant usages can be correctively projected. Analogical approaches assume an inventory of graph-theoretic representations of concepts, from which a structure-mapping engine can eke out its sub-graph isomorphisms. Schematic approaches rely on a stock of Conceptual Metaphors (CMs) – such as Life is a Journey or Theories are Buildings – to unearth the deep structures beneath the surface of diverse linguistic forms. Though hybrid approaches demand multiple sources of knowledge, there exist public Web services that integrate this knowledge with the appropriate means of using it for metaphor. The Thesaurus Rex Web service of Veale & Li (2013) provides a highly divergent system of fine-grained categorizations that allows a 3rd-party client system to e.g. determine that War and Divorce have each been viewed as kinds of destructive thing, traumatic event and severe conflict in the texts of the Web. The Metaphor Eyes Web service of Veale & Li (2011) is a rich source of relational norms – also harvested at scale from Web texts – such as that businesses earn profits and pay taxes, or that religions ban alcohol and believe in reincarnation. The Metaphor Magnet service of Veale (2014) offers a rich source of the stereotypical properties and behaviors of familiar ideas, and provides the means to retrieve salient CMs from the Google n-grams (Brants & Franz, 2006) which can then be further elaborated to create novel linguistic metaphors. @MetaphorMagnet relies on each of these public Web services to generate the conceptual conceits that underpin its figurative tweets. For instance, it uses Thesaurus Rex to provide the categorization insights that it then packages as odd-one-out lists or as faux-dictionary definitions. It uses the Metaphor Eyes service to provide the relational structures it needs to perform structure mapping and thus concoct original analogies and dis-analogies. And it uses the Metaphor Magnet service to access the stereotypical properties and behaviors of ideas, and to juxtapose these properties via resonant contrasts and norm contraventions. Once the conceptual chassis of a metaphor is constructed in this way, it is then packaged in an apt linguistic form. Building Strings: Trope-On-A-Rope CMs such as Life Is A Journey and Politics Is A Game are more than productive deep-structures for the generation of whole families of linguistic metaphors; they also provide the conceptual mappings that shape our habitual thinking about such familiar ideas as Life, Love, Politics and War. Politicians and philosophers exploit conceptual metaphorsto frame an issue and shape our expectations; when a CMfails to match our own experience, we reject it and switchto a more apt metaphor. So a metaphor-generating bot canthus create a thought-provoking opposition by pitting oneCM against another that advocates a conflicting view ofthe world. The following tweet from @MetaphorMagnetuses this approach to contrast two views on #Democracy: To some voters, democracy is an important cornerstone. To others, it is a worthless failure. #Democracy=#Cornerstone #Democracy=#Failure The CM Democracy Is A Cornerstone (of society) is oftenused to frame political discussions, and can be seen as an specialization of the CM Society Is A Building, itself an elaboration of the CM Organization Is Physical Structure(see Grady, 1997). Yet the importance of cornerstones tothe buildings they anchor finds a sharp contrast in theassertion that Democracy Is A Failure. Each of these affective claims is so commonly asserted that they can be found in the Google n-grams, a large database of shortfragments of frequent Web texts. The 4-gram “democracy is a cornerstone” has a frequency of 91 in the Google n- grams, while the 4-gram “democracy is a failure” has a frequency of 165. These n-grams, which suggest potentialCMs for @MetaphorMagnet, are elaborated with added detail via the Metaphor Magnet Web service, which tells the bot that the stereotypical cornerstone is important and the stereotypical failure is worthless. The following tweetmakes similar use of CMs found in the Google n-grams, but renders the conflict in a different linguistic container: Remember when tolerance was promoted by crusadingliberals? Now, tolerance is violence that only fearful appeasers can avoid. The bot is guided here by the suggestive Google 3-gram“Tolerance for Violence” (frequency=1353), but it doesnot directly contrast the ideas #Tolerance and #Violence. Instead, it finds a potential analogy in this juxtaposition, between the promoters of #Tolerance (which it renders ascrusading liberals) and the opponents of #Violence (which it renders as fearful appeasers). The choice of stereotypical properties (crusading and fearful) is drivenby the bot’s need to create a resonant semantic opposition. The bot omits the hashtags #Tolerance=#Violence fromthis tweet due to the confines of Twitter’s 140-character limit. But it can also choose to render a complex conceitacross two successive tweets, as in the following pair: Proceedings of the Sixth International Conference on Computational Creativity June 2015 Remember when research was conducted by prestigious philosophers? #Research=#Fruit #Philosopher=#Insect Now, research is a fruit eaten only by lowly insects. #Research=#Fruit #Philosopher=#Insect @MetaphorMagnet uses a number of packaging strategiesto turn a figurative comparison into an ironic observation, ranging from the use of an explicit #irony hashtag (which is commonplace on Twitter) to the use of “scare” quotesto focus on the part of a tweet most deserving of disbelief. The following tweet showcases both of these strategies: #Irony: When some chefs prepare "fresh" salads the way apothecaries prepare noxious poisons. #Chef=#Apothecary #Salad=#Poison Irony offers a concise means of contrasting two points ofview: that which is expected and the disappointing reality. By comparing the preparation of salads – the “healthy” option on most menus – to the preparion of poisons, thisanalogy undermines the expectation of healthfulness and suggests that some salads are noxious and chemical-filled. The real world is filled with situations in which naturally antagonistic properties are found in surprising proximity. These situations, if expressed in the right linguistic form, can be elevated to the level of situational irony. Consider, for instance, the following @MetaphorMagnet tweet: #Irony: When the timers that are found in enjoyablegames activate gruesome bombs. #Enjoyable=#Gruesome It is important to stress that @MetaphorMagnet does not simply fill linguistic templates with related words. Rather, the above tweet is constructed at the knowledge-level, bya bot that intentionally seeks out stereotypical norms thatare related (e.g. by a pivotal idea timer) yet which can beplaced into antagonistic juxtapositions around this pivot. In effect, the goal of the linguistic rendering is to packagea knowledge-level conceit – typically a conflict of ideasand properties – in a tweet-sized narrative. For example, the following tweet is rendered as a narrative of change: To join and travel in a pack: This can turn pretty girlsinto ugly coyotes. #Girl=#Coyote Twitter offers unique social affordances that allow a bot to elevate almost any contrast of ideas into a dramaticnarrative. Rather than talk of generic liberals or appeasers, a bot can give these straw men real names, or at leastinvent fake names that look like the real thing and which, as Twitter handles, seem wittily apropos to the views that are espoused. In this way, by imagining its central conceitas a topic of a vigorous debate by real people, a bot canturn an abstract metaphor into a concrete situation with itsown colorful participants. Consider the social debate that is made personal in this tweet from @MetaphorMagnet: .@war_poet says history is a straight line .@war_prisoner says it is a coiled chain #History=#Line #History=#Chain The handles @war_poet and @war_prisoner are invented by @MetaphorMagnet to suit, and amplify, the figurativeviews that they are advanced in the tweet, by using a mix of relational knowledge (from the Metaphor Eyes service) and language data (via the Google n-grams). Since poetswrite poems about the wars that punctuate history, andpoems contain lines, the 2-gram “war poet” is recognizedas an apt handle for an imaginary Twitter user who mightadvance a view of history as a line. In this case the handle @war_poet really does name a real Twitter user, but thisonly adds to the sense that Twitterbot confections are anew kind of interactive theatre and performance art (seeDewey, 2014). Note that the more profound aspects ofthis contrast are not appreciated by @MetaphorMagnetitself, or at least not yet. For example, the bot does not yet appreciate what it means for history to be a straight line, and while it knows enough to invent the intriguing handle@war_prisoner, neither does it appreciate what it might mean to be a prisoner of history, enslaved in a repeatingcycle of war. The placebo effect is not a binary effect: it benefits by degrees, and can benefit knowledge-rich botsjust as much as knowledge-free bots. Our bots will alwaysevoke in we humans more than they themselves can everappreciate, yet this may itself be a key part of CC’s allure. Bot Vs. Bot : The Metaphor Challenge @MetaphorMagnet differs from @MetaphorMinute in a number of key ways. For one, its mechanics are informed by Lakoff and Johnson’s Conceptual Metaphor Theoryand a range of computational approaches. For another, itdraws on considerable semantic and linguistic resources, from a large knowledge-base of conceptual relations and stereotypical beliefs to the linguistic diversity of the Google n-grams. All of @MetaphorMagnet’s tweets – all its hits and all its misses – are open to public scrutiny onTwitter. But to empirically evaluate the success of the botas a knowledge-based, theory-driven producer of novel, meaningful and retweet-worthy metaphors, we turn to thecrowdsourcing platform CrowdFlower, where we conduct a comparative evaluation of @MetaphorMagnet and its closest knowledge-free counterpart, @MetaphorMinute. The latter, designed by noted bot-maker Darius Kazemi, uses a wholly aleatoric approach to metaphor generationyet has over 500 followers on Twitter that do not mind itsone-every-two-minutes scattergun approach to generation. @MetaphorMinute crafts metaphors by filling a template with nouns and adjectives that are chosen more-or-less at random, to produce inscrutable tweets such as “a cubit is a headboard: stational yet tongue-obsessed.” We chose 60 tweets at random from the past outputs of each Twitterbot. CrowdFlower annotators, who were each paid a small sum per judgment, were not informed of the origin of any tweet, but simply told that each was selected from Twitter because of its metaphorical content. We did not want annotators to actively suspend their disbelief by knowingly dealing with bot outputs. Annotators were paid to rate the content of each tweet along three dimensions, Proceedings of the Sixth International Conference on Computational Creativity June 2015 Comprehensibility, Novelty and likely Retweetability, and to rate all three dimensions on the same scale: Very Low to Medium Low to Medium High to Very High. Ten annotations were solicited for each dimension of each tweet, though the responses of likely scammers (non- engaged annotators) were later removed from the dataset. Tables 1 through 3 present the distributions of mean ratings per tweet, for each dimension and each Twitterbot. Comprehensibility @Metaphor Magnet @Metaphor Minute Very Low Med. Low 11.6% 13.2% 23.9% 22.2% Med High 23.7% 22.4% Very High 51.5% 31.6% Table 1. Relative Comprehensibility of each bot So more than half of @MetaphorMagnet’s tweets were ranked as having very high comprehensibility, while less than one third of @MetaphorMinute’s tweets are so ranked. More surprising, perhaps. is the result that annotators found more than half of @MetaphorMinute’s wholly random metaphors to have medium-high to very- high comprehensibility. This Twitterbot’s use of abstruse terminology, such as stational and peachblow, may be a factor here, as might the bot’s use of the familiar copula container X is Y for its metaphors, which may well seduce annotators into believing that an apparent metaphor really does have a comprehensible meaning, if only one were to expend enough mental energy to actually discern it. Tweet Novelty @Metaphor @Metaphor Magnet Minute Very Low 11.9% 9.5% Med. Low 17.3% 12.4% Med High 21% 14.9% Very High 49.8% 63.2% Table 2. Comparative Novelty of each bot’s tweets The dimension Novelty yields results that are equally surprising. While half of @MetaphorMagnet’s metaphors are rated as having very-high novelty in Table 2, almost two-thirds of @MetaphorMinute’s tweets are just as highly rated. However, we should not be overly surprised that @MetaphorMinute’s bizarre juxtapositions of rare or unusual words, as yielded by its unconstrained use of aleatoric techniques, are seen as more unusual than those word juxtapositions arising from @MetaphorMagnet’s controlled use of attested Web n-grams and stereotypical knowledge. As shown by Giora et al. (2004), novelty is neither a source of pleasure in itself nor is it a reliable benchmark of creativity. Rather, pleasurability derives from the recognition of useful novelty, that is, novelty that can be understood and appreciated relative to the familiar. Re-Tweetability @Metaphor Magnet @Metaphor Minute Very Low Med. Low 15.5% 41.9% 41% 34.1% Med High 27.4% 15% Very High 15.3% 9.9% Table 3. Relative Retweetability of each bot’s tweets On Twitter, useful exploitation is frequently a matter of social reach. A tweet is novel and useful to the extent that it attracts the attention of Twitter users and is deemed worthy of re-tweeting to others in one’s social circle. Our third dimension, Re-Tweetability, reflects the likelihood that an annotator would ever consider re-tweeting a given metaphorical tweet to others. Though we ask annotators to speculate here – neither bot has enough followers to perform a robust statistical analysis of actual retweet rates – the results largely conform to our expectations. The results of Table 3 show retweetability to be a matter of novelty and comprehensibility, and not just novelty alone. Though annotators are not generous with their Very-High ratings for either bot, @MetaphorMagnet’s tweets are judged to be considerably more re-tweetable than the largely random offerings of @MetaphorMinute. Comprehensibility and comprehension are two different things: while a Computational Creativity (CC) version of the placebo effect may well foster a belief that a given tweet has a coherent meaning, it cannot actually provide this meaning. Meaning is the product of interpretation, and interpretation is often hard. Milic (1970) notes that in a context that licences a poetic interpretation, such as one in which a reader is told that a particular text is a metaphor, readers are more likely to accept that the text – as inscrutable as it may be – has a metaphorical meaning rather than dismiss it as nonsense. Recall that over 75% of @MetaphorMagnet’s tweets and over 50% of @MetaphorMinute’s tweets are judged as having medium-high to very-high comprehensibility. We thus need to look deeper, to determine whether raters can actually back up these judgments with actual meanings. In a second CrowdFlower experiment, we make raters work harder, to reconstruct a partial tweet by adding the missing information that will make it whole and apt again. That is, we employ a cloze test format for this experiment, by removing from each tweet the pair of key qualities that anchor the tweet and make its comparison of ideas seem meaningful and apt. For @MetaphorMagnet, for example, we remove the properties detailed and vague in this tweet: Proceedings of the Sixth International Conference on Computational Creativity June 2015 To some freedom fighters, freedom is a detailed recipe. To others, it is a vague dream. #Freedom=#Recipe #Freedom=#Dream For @MetaphorMinute, we excise the pair of qualities hippy and revisional from the following tweet: a flatfoot is a houseboat: hippy and revisional For each tweet from each bot, we blank out a pair of original qualities as above; this pairing is the answer that is sought from human judges. We also choose 4 distractor pairs for each original pair, by selecting pairs from other tweets from the same bot. As in our first experiment, we chose 60 tweets at random from the past outputs of each bot, and 10 ratings were solicited for each. Annotators were presented with a tweet in which the key properties were blanked out (as above), and given five randomly ordered pairs of possible fillers to choose from. To make the results of the experiment comparable to those of the 1st experiment (Tables 1,2,3), we obtain the mean aptness of each tweet, so that e.g. if 7 out of 10 raters correctly choose the original pairing, then that tweet is deemed to have an aptness of 0.7. We then place these aptness scores into bands, where the Very Low band = 0 to 0.25, Medium Low = 0.26 to 0.5, Medium High = 0.51 to 0.75, and Very High = 0.76 to 1. By calculating the distribution of tweets to each band, we can determine e.g. the percentage of tweets from each bot that are put into the Very High band. Our hypothesis is rather straightforward: if tweets are linguistic containers that are carefully crafted to convey a particular meaning, then it should be easier to select the missing pair of qualities that make this meaning whole; if, on the other hand, the tweet is all there is, and its content is chosen mostly at random, then raters will choose the right pairing with no more success than random selection. The results reported in Table 4 bear out our hypothesis. Metaphor Aptness @Metaphor @Metaphor Magnet Minute Very Low 0% 84% Med. Low 22% 16% Med High 58% 0% Very High 20% 0% Table 4. Relative Aptness of each bot’s metaphors The placebo effect in CC can lead us to appreciate a bot’s tweets as meaningful but cannot tell us what this meaning should be. Though the results above may seem a foregone conclusion, as @MetaphorMagnet’s tweets are designed to communicate a fully recoverable meaning while those of @MetaphorMinute are not, this is surely what it means to engage in real communication: to design an utterance so that an intended meaning is re-created, in whole or in part, in the mind of an intelligent, receptive audience. Fake It ‘Til You Make It The Placebo Effect benefits all Computational Creativity systems, from superficial users of surealistic techniques to sophisticated knowledge-based AI systems. That this is so should come as no surprise, for we humans also benefit from the effects of an active and receptive mind when dealing with other people. Just as a prior belief in the efficacy of a medical intervention can lead us to perceive (and experience) a post-hoc benefit from an otherwise empty treatment, a prior belief in the meaningfulness of a verbal intervention can lead us to perceive (and enjoy) a creative meaning where none was ever intended. When a CC system uses superficial techniques to convey a sense of understanding and profundity with otherwise shallow linguistic forms, as in Weizenbaum’s (1965) infamous ELIZA system, the label “ELIZA Effect” proves to be an apt one (Hofstadter, 1995). However, we humans are also subject to an ELIZA effect of our own, insofar as we often do others the courtesy of assuming their utterances to be freighted with real meaning and creative intent, and will often work hard to uncover that meaning for them. At one time or another, we have all relied on catchphrases, clichés, slogans, idioms, canned jokes and other half-empty linguistic containers to suggest to others that we have deeper meanings in mind, or have something more profound to offer, than we actually do. In a famous polemical essay from 1946, George Orwell excoriates speakers of English for their reliance on jargon, foreign words and empty phraseology as a substitute for thoughts of real substance, while Geoff Pullum (2003) upbraids modern speakers for a grating over-reliance on “multi-use, customizable, instantly recognizable, time-worn, quoted or misquoted phrases or sentences that can be used in an entirely open array of different jokey variants by lazy journalists and writers.” These “phrases for lazy writers in kit form” are not that different from the template-based language games played by superficial Twitterbots, and though we humans fill our templates – such as “X is the new black”, “In X no one can hear you scream” or “if the Eskimos have N words for snow then Xs surely have as many for Y” – with lexical fillers that are contextually apt, we employ our templates to be just as provocative, and to imply or to suggest more than we actually mean. To see machines work with humans in the construction of real figurative meanings, readers are directed to a variant of @MetaphorMagnet – a related bot named @MetaphorMirror – that tweets its own novel metaphors in response to breaking news events. This bot’s metaphors are not offered as informative summaries of the news, but as figurative lenses through which followers can view the news and adopt a novel perspective on human affairs. Acknowledgements This research was supported by the EC project WHIM: Proceedings of the Sixth International Conference on Computational Creativity June 2015 The What-If Machine. See http://www.whim-project.eu/ References Thorsten Brants and Alex Franz. (2006). Web 1T 5-gram database, Version 1. Linguistic Data Consortium. Jaime G. Carbonell. (1981). Metaphor: An inescapablephenomenon in natural language comprehension. Report 2404. Carnegie Mellon Computer Science Dept. William Chamberlain and Thomas Etter. (1983). The Police-man’s Beard is Half-Constructed: Computer Proseand Poetry. Warner Books. Simon Colton, Jacob Goodwin and Tony Veale. (2012). 3rd Full-FACE Poetry Generation. In Proc. of the International Conference on Computational Creativity, Dublin, Ireland. Caitlin Dewey. (2014). What happens when @everyword ends? Intersect, Washington Post, May 23rd edition. Dan Fass. (1991). Met*: a method for discriminatingmetonymy and metaphor by computer. Computational Linguistics, 17(1):49-90. Dedre Gentner, Brian Falkenhainer and Janice Skorstad. (1989). Metaphor: The Good, The Bad and the Ugly. InTheoretical Issues in NLP, Yorick Wilks (Ed.) Hillsdale, NJ: Lawrence Erlbaum Associates. Pablo Gervás. (2000). Wasp: Evaluation of different strategies for automatic generation of Spanish verse. In Proc. of the AISB-2000 Symposium on Creative & Cultural Aspects of AI, 93-100. Rachel Giora, Ofer Fein, Jonathan Ganzi, Natalie Alkeslassy Levi and Hadas Sabah. (2004).Weapons of Mass Distraction: Optimal Innovation and Pleasure Ratings. Metaphor and Symbol 19(2):115-141. Sam Glucksberg. (1998). Understanding metaphors. Current Directions in Psychological Science, 7:39-43. Joseph Grady. (1997). Foundations of Meaning: PrimaryMetaphors and Primary Scenes. University of California. Douglas Hofstadter. (1995). The Ineradicable Eliza Effectand Its Dangers. Fluid Concepts and Creative Analogies: Computer Models of the Fundamental Mechanisms ofThought (Preface 4), Basic Books: New York. James Hutton (translator) (1982). Aristotle’s Poetics. New York, NY: Norton. George Lakoff and Mark Johnson. (1980). Metaphors We Live By. Chicago, Illinois: Chicago University Press. James H. Martin. (1990). A Computational Model of Metaphor Interpretation. Academic Press. Ruli Manurung, Graeme Ritchie and Henry Thompson. (2012). Using genetic algorithms to create meaningful poetic text. JETAI 24(1):43–64. Matthew S. McGlone and Jessica Tofighbakhsh. (1999). The Keats heuristic: Rhyme as reason in aphorism interpretation, Poetics 26(4):235-44. Matthew S. McGlone and Jessica Tofighbakhsh. (2000). Birds of a feather flock conjointly (?): rhyme as reason in. aphorisms. Psychological Science 11 (5): 424–428. Louis T. Milic. (1971). The possible usefulness of computer poetry. The Computer in Literary and Linguistic Research, R.A. Wisbey (Ed.), Cambridge, MA. Geoffrey Pullum. (2003). Phrases For Lazy Writers in Kit Form. Language Log post, October 27, 2003. George Orwell. (1946). Politics and the English language. Horizon, 13(76), April issue. Michael J. Reddy. (1979). The conduit metaphor: A case of frame conflict in our language about language. In A. Ortony (Ed.), Metaphor and Thought, 284–310. Cambridge University Press. Christopher B. Ricks, (1980). Clichés. In: L. Michaels and C. Ricks (Eds), The State of the Language. University of California Press, Berkeley. Tony Veale and Mark T. Keane. (1992). Conceptual Scaffolding: A spatially founded meaning representation for metaphor comprehension. Computational Intelligence 8(3):494-519. Tony Veale and Mark T. Keane. (1997). The Competence of Sub-Optimal Structure Mapping on ‘Hard’ Analogies. In Proceedings of IJCAI’97, the 15th International Joint Conference on Artificial Intelligence. Nagoya, Japan. Morgan Kaufmann. Tony Veale and Guofu Li. (2011). Creative Introspection and Knowledge Acquisition. In Proc. of AAAI-2011, The 25th Conference of the Association for the Advancement of Artificial Intelligence. San Francisco: AAAI Press. Tony Veale and Guofu Li. (2013). Creating Similarity: Lateral Thinking for Vertical Similarity Judgments. In Proceedings of ACL 2013, the 51st Annual Meeting of the Assoc. for Computational Linguistics, Sofia, Bulgaria, Tony Veale. (2013). Less Rhyme, More Reason: Knowledge-based Poetry Generation with Feeling, Insight and Wit. In Proc. of ICCC 2013, the 4th Int. Conference on Computational Creativity. Sydney, Australia. Tony Veale. (2014). Running With Scissors: Cut-Ups, Boundary Friction and Creative Reuse. In Proceedings of ICCBR-2014, the 22nd International Conference on Case- Based Reasoning. Eileen Cornell Way. (1991). Knowledge Representation and Metaphor: Studies in Cognitive systems. Kluwer. Joseph Weizenbaum. (1966). ELIZA – A Computer Program For the Study of Natural Language Communication Between Man And Machine. Communications of the ACM 9 (1): 36–45. Yorick Wilks. (1978). Making Preferences More Active. Artificial Intelligence 11(3):197-223. Proceedings of the Sixth International Conference on Computational Creativity June 2015