Poetic Machine: Computational Creativity for Automatic Poetry Generation in Bengali

Amitava Das
Department of Computer Science and Engineering
University of North Texas
Denton, Texas, USA
amitava.santu@gmail.com

Björn Gambäck
Department of Computer and Information Science
Norwegian University of Science and Technology
Trondheim, Norway
gamback@idi.ntnu.no

Abstract

The paper reports an initial study on computational poetry generation for Bengali. Bengali is a morpho-syntactically rich language and partially phonemic. The poetry generation task has been defined as follow-up rhythmic sequence generation based on user input. The design process involves rhythm understanding from the given input and follow-up rhyme generation by leveraging syllable/phonetic mapping and natural language generation techniques.

A syllabification engine based on grapheme-to-phoneme mapping has been developed in order to understand the given input rhyme. A Support Vector Machine-based classifier then predicts the follow-up syllable/phonetic pattern for the generation, and candidate words are chosen automatically based on the syllable pattern. The final rhythmic poetical follow-up sentence is generated through n-gram matching with weight-based aggregation. The quality of the automatically generated rhymes has been evaluated according to three criteria: poeticness, grammaticality, and meaningfulness.

Introduction

Cognitive abilities can be divided into three broad categories: intelligence, aesthetics, and creativity. Suppose someone has read a sonnet by Shakespeare and is asked the following questions:

• Do you understand the meaning of this sonnet? If the reader says yes, s/he has used her/his intelligence together with knowledge of the English language and world knowledge to understand it.
• Do you like this sonnet? Whatever the answer, the reader is using a subjective model of liking, and this is what is called aesthetic appreciation or sentiment.
• Can you add two more lines to this sonnet? The reader then has to write some poetry, and has to use her/his creative ability to do it.

Artificial Intelligence is by now a mature research field, six to seven decades old. The majority of the research efforts until now have concentrated on the understanding of natural phenomena. During the last two decades, we have witnessed a huge rise of research attention towards affect understanding, that is, the second level of cognition. However, there have so far been rather few attempts towards making machines truly creative. The paradigm of computational creativity is actually still in its infancy, and most of the efforts that have been carried out have concentrated on music or art. Still, computer systems have already made some novel and creative contributions in the fields of mathematical number theory (Colton 2005; Colton, Bundy, and Walsh 2000) and chess opening theory (Kaufman 2012).

In this paper, in contrast, we look at computational linguistic creativity, and in particular poetry generation. Computational linguistic creativity has only in the last few years received more widespread interest from language technology researchers. A book on linguistic creativity was recently written by Veale (2012), and in particular the research group at Helsinki University is very active in this domain (Toivanen et al. 2012; Gross et al. 2012; Toivanen, Toivonen, and Valitutti 2013; Toivanen, Järvisalo, and Toivonen 2013). Some other interesting research attempts have also been made (e.g., Levy 2001; Colton, Goodwin, and Veale 2012), but the approaches still vary widely.

The field of automatic poetry generation was pioneered by Bailey (1974), although Funkhouser (2009) quotes work going back to the 1950s. These systems were written by actual poets who were keen to explore the potential of using computers in writing poetry and were not fully autonomous.
Thereafter, Gervás and his colleagues were the first to discuss sophisticated approaches to automatic poetry generation (Gervás 2000; 2001a; 2001b; 2002a; 2002b; Díaz-Agudo, Gervás, and González-Calero 2002; Gervás et al. 2007). Gervás’ work established the possibility of automatic poetry generation and has in the last decade been followed by a moderate number of attempts at linguistic creativity, and in particular at automatic poetry generation.

The system developed by Manurung (2004) uses a grammar-driven formulation to generate metrically constrained poetry out of a given topic. In addition to the scientific novelty, the work defined the fundamental evaluation criteria of automatic poetry generation: meaningfulness, grammaticality, and poeticness. A complete poetry generation system must generate texts that adhere to all three of these properties. An alternative approach to evaluation would be to adopt the criteria specified by Ritchie (2001; 2007) for assessing the novelty and quality of creative systems in general based on their output.

All these previous efforts were inspiration points for the present work, but as we are unable to conclude which method performs best, we decided to propose a new architecture that follows the rules and practices of Bengali poems and writings. There is no previous similar work on Bengali, nor on other Indian languages, except attempts at automatic analysis and generation of Sanskrit Vedas (Mishra 2010) and at automatic Tamil lyric generation (Ramakrishnan A, Kuppan, and Devi 2009; Ramakrishnan A and Devi 2010).

The basic strategy adopted here is not to try to make the system create poetry on its own, but rather in collaboration with the user, and not to create a complete poem, but rather one line of poetry at a time. The user enters a line of poetry and the system generates a matching, rhyming line. This task in turn involves two subtasks: rhyme understanding and rhyme generation.
Rhyme understanding entails parsing the input line to understand its poetic structure. Rhyme generation is based on the usage of a Bengali syllabification engine and a Support Vector Machine (SVM) based classifier for predicting the structure of the output sentence and candidate word generation, combined with bigram pruning and weighted aggregation for the selection of the actual words to be used in the generated rhyming line.

The rest of the paper is laid out as follows: To give an understanding of the background, we first discuss the Bengali language as such and the different rhythms and metres that are used in Bengali poems. Thereafter the discussion turns to the chosen methods for poetry line understanding and generation, starting by giving details of a corpus of poems collected for rhyme understanding, and then in turn describing the rhyme understanding and the rhyme generation tasks, and their respective subparts. Finally, an evaluation of the poetry generation model is given, in terms of the three dimensions poeticness, grammaticality, and meaningfulness.

Bengali and Bengali Poetry

Bengali (endonym: Bangla) is the seventh largest language worldwide in terms of speakers. It originates from Sanskrit and belongs to the modern Indo-Aryan language family. Bengali is the second largest language in India and the national language of Bangladesh.

Bengali poetry has had a vibrant history since the 10th century, and modern Bengali poetry inherited its basic foundations from Sanskrit. As the first non-European Nobel Literature Laureate and known mainly for his poems, Rabindranath Tagore (1861–1941) was the pioneer who founded the firm basis of modern Bengali poetry.

Bengali Orthography and Syllable Patterns

Bengali, like all modern Indo-Aryan languages derived from Sanskrit, is partially phonemic. That is, its pronunciation depends not only on orthographic information, but also on Part-of-Speech (POS) information and semantics.
Partially phonemic languages use writing systems that are in between strictly phonemic and non-phonemic. Bengali, like many other modern Indo-Aryan languages, still uses Sanskrit orthography, although the sounds and the pronunciation rules have changed to varying degrees.

The modern Bengali script contains the characters (known as akṣara) for seven vowels (/i/, /u/, /e/, /o/, /æ/, /ɔ/, /a/), four semi-vowels (/j/, /w/, /e̯/, /o̯/), and thirty consonants. Many diphthongs are possible, although they must always contain one semi-vowel, but only two of the diphthongs are represented directly in the script (i.e., have their own akṣara: /oi/ and /ou/). All vowels can be nasalized (written as /ã/, etc.) and vowel deletion (e.g., schwa deletion) is common, particularly in word medial and final positions.

A phonetic group of Bengali consonants is called a borgo. As we shall see below, these groups are particularly important in poetic rhymes. There are five basic borgos in Bengali and four separate pronunciation groups, as shown in Table 1, where each consonant is displayed together with its pronunciation in the International Phonetic Alphabet (IPA). Many consonant sounds can be either unaspirated or aspirated (e.g., /ʈ/ vs /ʈʰ/). The first five borgos are named according to their first character. In each borgo, the first consonant takes the least stress when pronounced and the last takes the highest stress. The first member is thus called less-stressed (alpo-praṇ), the second to fourth members are called high-stressed (maha-praṇ), and the fifth and last is a nasal (nasik).

Following the classification of Sarkar (1986), Bengali has 16 canonical syllable patterns, but CV (consonant-vowel) syllables constitute 54% of the whole language (Dan 1992). Patterns such as CVC, V, VC, VV, CVV, CCV, and CCVC are also reasonably frequent.
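As a rough illustration of what a syllable pattern such as CV or CCVC encodes, the sketch below maps a sequence of phoneme symbols to its pattern string. The vowel inventory, the pre-segmented phoneme lists, and the function name `cv_pattern` are assumptions made for this example; semi-vowels are not treated specially, and this is not the classification procedure of Sarkar (1986).

```python
# Assumed vowel inventory for the illustration (seven Bengali vowels).
VOWELS = {"i", "u", "e", "o", "a", "ɔ", "æ"}


def cv_pattern(phonemes):
    """Map a list of phoneme symbols to a C/V pattern string.

    Every symbol found in VOWELS becomes "V"; anything else is treated
    as a consonant and becomes "C". Aspirated consonants such as "gʰ"
    are expected to arrive as a single symbol.
    """
    return "".join("V" if p in VOWELS else "C" for p in phonemes)
```

For instance, a syllable given as `["k", "a"]` yields the pattern CV, and `["m", "e", "gʰ"]` yields CVC.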
For more detailed recent overviews of Bengali phonetics, we refer the reader to, for example, Sircar and Nag (2014), Barman (2011) or Kar (2009), and just take the examples below of Bengali orthography, originally devised by Chatterji (1926), to illustrate how it has deviated from the strictly phonemic orthography of Sanskrit.

• Consonant clusters are often pronounced as geminates irrespective of the second consonant. Thus: bAkya /bakko/, bakSha /bɔkkho/, bismaYa /biʃʃɔe/.
• Single grapheme for multiple phonemes: The vowel [e] is pronounced as either /e/ or /æ/. The ambiguity cannot be resolved by the phonological context alone, as the etymology is often the underlying reason. For example: eka /æk/, but megha /megh/. [a] is pronounced as /o/ word medially or word finally in specific contexts: nagara /nɔgor/, bakra /bɔkro/.
• Vowel harmony or vowel height assimilation: [a] and [e] are pronounced as /o/ and /e/, respectively, if followed by a high vowel (/u/ or /i/): patha /pɔth/, but pathika /pothik/; ekaTA /ækʈa/, but ekaTu /ekʈu/.
• Schwa deletion: [a] is deleted from word final or medial open syllables under specific conditions dependent on phonotactic constraints and etymology. For example: AmarA /amra/, darbAra /dɔrbar/.

Table 1: Bengali borgo-phonetic groups

Borgo name            Consonant members (IPA)
/k/-borgo             k, kʰ, g, gʰ, ŋ
/tʃ/-borgo            tʃ, tʃʰ, dʒ, dʒʰ, n
/ʈ/-borgo             ʈ, ʈʰ, ɖ, ɖʰ, n
/t̪/-borgo             t̪, t̪ʰ, d̪, d̪ʰ, n
/p/-borgo             p, pʰ, b, bʰ, m
(internal)-sound      dʒ, e̯, ɾ, l
(warm)-sound          ʃ, ʃ, s, h
(scolding)-sound      ɽ, ɽ
(parasitic)-sound     h, ŋ

Metres and Rhythms in Bengali

Bengali poetry has three basic and common metres: akṣara-vṛtta, matra-vṛtta, and svara-vṛtta. The first two were inherited from Sanskrit, while the third is more genuinely Bengali.
However, before Tagore popularized it, the svara-vṛtta was used mainly for nursery rhymes and not really recognised as a serious poetic metre.

The matra-vṛtta and svara-vṛtta metres are based on the length of the vowels. In Sanskrit, the akṣara-vṛtta metre is in contrast based on the number of letters in a line (akṣara is the Sanskrit word for letter); in Bengali poetry, however, the number of syllables is counted rather than the number of letters. The letters a, i and u are counted as one unit (matra) each, that is, as a short vowel (mora), while e, ai, o, and au are counted as two units each, that is, as a long vowel (macron). Furthermore, at the end of a line a short vowel may be counted as a long one.

The concepts of open and closed syllables are also central to Sanskrit prosody and poetry: closed syllables are those ending with a vowel sound, while those ending without vowels are called open. In Bengali, a syllable is considered as being one or two units long depending on its position in a line, rather than on whether it is open or closed. If a line begins with a closed syllable, the syllable is counted as one unit, but if it occurs at the end of a line it is counted as two units. In the matra-vṛtta metre, the position of closed syllables does not matter; they are always counted as two units. In a similar fashion, in the svara-vṛtta each vowel (svara) is counted as one unit, regardless of whether the syllables are open or closed.

There are three types of rhymes in Sanskrit poetry, depending on whether the rhyme is on the first syllable of each line (adiprasa), on the second syllable (dviteeyakshara prasa), or on the final syllable of the line (antyaprasa). The most important rhyme for our purposes is antyaprasa, which is known as tail-rhyme or end-alliteration in English, and as anto-mil in Bengali poetry.
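The three rhyme positions can be made concrete with a small sketch. It assumes that both lines have already been split into syllables and, as a deliberate simplification, that two syllables rhyme only when they are identical strings; the function name `rhyme_positions` and the hand-made syllable lists in the usage note are hypothetical.

```python
def rhyme_positions(line_a, line_b):
    """Name the rhyme types shared by two lines given as syllable lists.

    Simplifying assumption: syllables rhyme only if they are equal.
    """
    found = []
    if not line_a or not line_b:
        return found
    # Rhyme on the first syllable of each line.
    if line_a[0] == line_b[0]:
        found.append("adiprasa")
    # Rhyme on the second syllable, if both lines have one.
    if len(line_a) > 1 and len(line_b) > 1 and line_a[1] == line_b[1]:
        found.append("dviteeyakshara prasa")
    # Rhyme on the final syllable (tail-rhyme, anto-mil).
    if line_a[-1] == line_b[-1]:
        found.append("antyaprasa")
    return found
```

Called with the hand-segmented lists `["ta", "ta", "be", "shi", "ja", "ne"]` and `["ta", "ta", "ka", "ma", "ma", "ne"]`, the function reports all three rhyme types, since the first, second, and final syllables coincide.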
There are many overviews and in-depth analyses of the metres and rhythms of Bengali poetry written in Bengali, but fairly few available in English. The reader is referred to Arif (2012), or to the writings of Aurobindo (2004), which give a more poetic angle. Here, we will concentrate on poems written in matra-vṛtta metre with anto-mil rhyme, as these poems are relatively easy to understand and generate.

The Poetry Generation Model

The previous efforts on investigating computer poetic creativity vary widely in terms of the poetry generation approaches. Some have used document corpus-based models (Manurung 2004; Toivanen et al. 2012), while others have used constraint-programming based models (Toivanen, Järvisalo, and Toivonen 2013) or genetic programming based models (Manurung, Ritchie, and Thompson 2012).

In contrast, we chose a conversation follow-up model strongly inspired by the Bengali movie ‘Hirak Rajar Deshe’ (‘Kingdom of Diamonds’, 1980) by the Oscar-winning director Satyajit Ray (the son of Sukumar Ray, the poet whose writings form the basis of our rhyme understanding corpus, as further discussed below). In Satyajit Ray’s movie, the entire conversation was in rhythm. For example:

(1) Era yata beśi paṛe
    they as-much more read
    ‘The more they read’

(2) Tata beśi jane
    that more know
    ‘The more they learn’

(3) Tata kama mane
    that less obey
    ‘The less they obey’

For the present task, the follow-up model means that the system automatically generates a follow-up rhythmic line based on the user’s one-line poetry input. For example, if the given sentence is:

(4) E’i duniyyara sakala bhala
    this world everything good
    ‘All is well in the world’

the machine could generate a follow-up line such as:
(5) Asala bhala nakala bhala
    best good fake good
    ‘Real is good, even fake is also good’

There are two essential modules for effective follow-up poetry generation in Bengali: rhyme structure understanding of the given user input and matching rhyme generation. The development of those modules is discussed in turn in the next two sections.

Rhyme Understanding

The initial step involves understanding the rhyme in an input line given by the user. The actual rhyme understanding module consists of syllable identification followed by borgo identification and open/closed syllable identification. Firstly, however, it is necessary to collect a corpus in order to understand the rhythm and metre structures of Bengali poems.

Corpus Acquisition

To collect the corpus, several dedicated Bengali poem sites (called Kobita in Bengali)¹ were chosen. For the present task, we chose mainly poems written for children, as they are mostly written in matra-vṛtta metre and with anto-mil (tail) rhyme, which is relatively easy to start with for the task of automatic poetry generation. The poems chosen were mainly written by Sukumar Ray (1887–1923), as the rhyme structure of those poems is fairly easy to grasp. A few of Tagore’s poems, in particular those written for children, were also collected. Corpus size statistics are reported in Table 2.

This corpus was used later on to train a classifier to predict follow-up rhyme syllables. Therefore, from the collected poems only those pairs of lines were extracted that had both matra-vṛtta metre and anto-mil rhyme.

¹ http://www.bangla-kobita.com/

Table 2: Bengali poem corpus size statistics

Type of units     Number
Sentences         3567
Words             9336
Unique tokens     7245

Syllabification

Syllabification processes depend directly on the pronunciation patterns of a language. In Bengali poetry, open and closed syllables have been used deliberately to continue or stop rhythmic matras (units), as described in the section above on Bengali poetry.
These are important features for syllabification.

In order to implement a syllabification engine, we developed a grapheme-to-phoneme (G2P) converter following the methods discussed by Basu et al. (2009). The IPA patterns for consonants and vowels were inherited from that work, while the orthographic and contextual rules were rebuilt. An open-source Bengali shallow parser based POS tagger² was used for the task.

With the help of this list, the syllabification engine marks every input word according to its borgo. If a word starts with a vowel, the system marks it as a ‘v’ group. Only the rules mentioned in the paper by Basu et al. have been included, whereas a few things that are not clearly described in that paper remain unattended, for example, some orthographic and exception rules. An example of syllabification output is given in Table 3, where the input is the first line of Sukumar Ray’s poem ‘Cloud Whims’, ‘Meghera kheyyala’.

² http://ltrc.iiit.ac.in/showfile.php?filename=downloads/shallow_parser.php

Table 3: Sample syllabification output (input line in English: ‘In the sky with the air’)

Input                   Syllables     Syllable count   Open/Closed   Borgo
akasher / akaśera       aka-śe-ra     3                o             v
mayyadane               mayya-dane    2                c             p
bataser / batasera      bata-se-ra    3                o             p
vore / bhare            bhare         1                c             p

Borgo Identification

For open syllabic words, identification of the borgo class for the final character is quite important. In case no rhythmic follow-up word is available for the last word in the given sentence, an alternative approach is to choose a word that ends with a consonant belonging to the same borgo. This helps in keeping the rhythm alive.

For example, in the following sequence (also from Sukumar Ray’s poem ‘Cloud Whims’) the first line ends with /ʈʰ/ and the final word of the second line ends with a member of the same borgo, namely /t/.

(6) Buṛo buṛo dhaṛi megha ḍhipi hayye uṭhe
    old old inveterate cloud mound becomes
    ‘The very old inveterate cloud looks like a hill’

(7) Śuyye ba’se sabha kare saradina juṭe
    laid sitting meeting all day fellows
    ‘They were meeting all the day with the gathered friends.’

Rhyme Generation

The automatic rhyme generation engine consists of several parts. First, an SVM-based classifier predicts syllable sequence patterns. Then, a set of candidate output words is selected from preprocessed syllable-marked word lists. In order to preserve the rhythm in the generated sentence, a few other parameters are checked, such as borgo classes, anto-mil, and whether the syllables are open or closed. Finally, bigrams are used to prune the list of candidate words, and weighted sentence aggregation is used to generate the actual system output. These steps are described in detail in turn below.

Syllabic Sequence Prediction

A machine-learning classifier was trained for the syllabic rhyme sequence prediction. The Weka-based Support Vector Machine (SVM) implementation (Hall et al. 2009) was chosen as the basis for the classifier. The collected poetry corpus described above was used here for training and testing. The training corpus was split into rhythmic pairs of sentences, where the first line represents the user-provided input and the second line is the one that has to be generated by the system. The input features for the syllabic sequence prediction are: the syllable count sequence of the given line, the open/closed syllable pattern sequence of the given line, and the borgo group marking sequence of the given line. The output labels for the training and testing phases are the syllable counts of each word. For simplicity, only those pairs of sentences were chosen where the number of words is the same in both lines.

The overall task has been designed as a sequence syllable-count prediction, but there are tricky trade-offs for the initial position and the last position.
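To make the feature set concrete, the sketch below assembles training instances from one rhythmic pair of lines. The data layout, the `position` field, and the function name `make_instances` are assumptions made for this illustration; the system described here used Weka's SVM implementation, which this sketch does not reproduce.

```python
def make_instances(first_line, second_line_counts):
    """Build (features, label) pairs from one rhythmic pair of lines.

    first_line: one (syllable_count, open_closed, borgo) tuple per word
                of the input line, e.g. (3, "o", "v").
    second_line_counts: syllable count of each word in the follow-up line.
    Pairs whose lines differ in word count are skipped, as in the text.
    """
    if len(first_line) != len(second_line_counts):
        return []
    features = {
        "syllable_counts": tuple(word[0] for word in first_line),
        "open_closed": tuple(word[1] for word in first_line),
        "borgo": tuple(word[2] for word in first_line),
    }
    instances = []
    for position, label in enumerate(second_line_counts):
        # Assumption: the word position is added so that one instance
        # is produced per output word.
        instances.append(({**features, "position": position}, label))
    return instances
```

With the four words of Table 3 as input, `[(3, "o", "v"), (2, "c", "p"), (3, "o", "p"), (1, "c", "p")]`, and a hypothetical follow-up line with counts `[2, 2, 3, 1]`, the function returns four instances, one per output word.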
The common rhythmic pattern in Bengali poems is anto-mil (tail-rhyme), so it is necessary to take care of the last word’s syllables separately. Therefore three different ML engines have been trained: one for the initial position, one for the final position, and one for the other, intermediate positions. Feature engineering has been kept the same for each design, whereas different settings have been adopted for the intermediate positions.

Word Selection

A relatively large word collection was used for the word selection task. The collection consists of the created poem corpus and an additional news corpus.³ For rhythmic coherence, all words are kept in their inflected forms. In practice, stemming changes the syllable count of a word and may therefore affect the rhythm of the rhythmic sequence.

All word forms are pre-processed and labelled with their syllable counts using the G2P syllabification module. For the word selection, the following strategies have been applied serially, in the order in which they are described here, in order to narrow down the search space.

Syllable-wise: All words with similar syllabic patterns are extracted from the word list.

Closed Syllable / Open Syllable: Depending on the word in the previous line at the corresponding position, either open or closed syllabic words are chosen. The rest of the words are discarded.

Semantic Relevance: Semantic relevance is essential to keep the generated rhyme meaningful. Neither a WordNet nor a relational semantic network like ConceptNet is publicly available for Bengali. Therefore the English ConceptNet (Havasi, Speer, and Alonso 2007) and an English-Bengali dictionary (Biovas 2000) were used to measure the semantic relevance of the automatically chosen words.

Before the semantic relevance judgement, each Bengali word from the given input is stemmed using the morphological analyser packaged with the Bengali shallow parser. After stemming, those words are translated to English by dictionary look-up.
The translated English words are then looked up in ConceptNet and all the semantically related words are extracted. If a selected word co-occurs with the given word in the list extracted from ConceptNet, it is considered relevant; otherwise it is discarded. For the ConceptNet search, only nouns and verbs are considered. For example (same as in Table 3), if the given line is:

(8) Akaśera mayyadane batasera bhare
    sky field air filled
    ‘The sky is filled with the air from the fields’

The words that will be searched in ConceptNet are sky, field, and air. The extracted word list will then contain words such as cloud, which was used by Sukumar Ray in the original poem (again ‘Meghera kheyyala’ or ‘Cloud Whims’):

(9) Choṭa baṛa sada kalo kata megha care
    small large white black many clouds grazing
    ‘Many large and small, black and white clouds are grazing.’

³ http://www.anandabazar.com/

Borgo-wise: Borgo-wise similarity is checked and only words ending in the same borgo classes are kept for the last position word. The other words are checked for first letter borgo-similarity, and the non-matching ones are discarded.

Anto-mil: For anto-mil or tail-rhyme matching, an edit distance (Levenshtein 1966) based measure has been adopted. If the Minimum Edit Distance is ≤ 2, the word is considered homophonic and kept. This strategy only works for the final word position. The remaining members are excluded.

Pruning and Grammaticality

The methods described so far are able to produce word lists for each word member from the input. Appropriate pruning and natural language techniques are required to generate grammatically correct rhythm sequences from these word options.

N-gram (bigram) matching followed by aggregation is used for the final sentence generation. The n-grams have been generated using the same word collection as described above, that is, the poem corpus plus the news corpus.
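The bigram matching and aggregation step of this section can be sketched as follows. The sketch takes the weight of a bigram to be its frequency divided by the number of unique bigrams in the corpus, and picks the candidate sequence whose product of bigram weights is largest. The function names, the brute-force enumeration of all candidate combinations, and the toy corpus in the usage note are assumptions of this illustration, not the system's actual implementation.

```python
from collections import Counter
from itertools import product


def bigram_weights(sentences):
    """Weight each bigram as frequency / number of unique bigrams."""
    counts = Counter()
    for words in sentences:
        counts.update(zip(words, words[1:]))
    unique = len(counts)
    if unique == 0:
        return {}
    return {bigram: freq / unique for bigram, freq in counts.items()}


def best_sequence(candidates, weights):
    """Pick one word per position so that the product of bigram weights
    along the sequence is maximal.

    candidates: one list of candidate words per position.
    Returns (sequence, score); sequence is None if every path scores 0.
    """
    best, best_score = None, 0.0
    for seq in product(*candidates):
        score = 1.0
        for bigram in zip(seq, seq[1:]):
            # Unseen bigrams get weight 0 and so prune the whole path.
            score *= weights.get(bigram, 0.0)
        if score > best_score:
            best, best_score = list(seq), score
    return best, best_score
```

With a toy corpus of `[["asala", "bhala", "nakala", "bhala"], ["asala", "bhala"]]` and the candidate lists `[["asala", "nakala"], ["bhala", "kalo"]]`, the sequence asala bhala is selected, because its bigram is the most frequent one. Enumerating every combination is exponential in the sentence length; a dynamic-programming search over positions would be the natural replacement for longer lines.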
The system computes weights (frequency divided by the total number of unique n-grams in the corpus) for each pair of n-grams. For example, suppose that the total number of generated word candidates for the first position word is $n_1$ and for the second position word it is $n_2$. Then $n_1 \cdot n_2$ valid comparisons have to be carried out. The possible candidates will be:

$$\sum_{i=0}^{n_1} w^1_i \cdot \sum_{i=0}^{n_2} w^2_i \qquad (10)$$

where the sums are intended to represent the relevance of using one term after another to create a meaningful word sequence. Suppose the targeted sentence has $m$ words. The process will then be continued for each successive bigram pair, that is, for

$$w^1\text{--}w^2,\; w^2\text{--}w^3,\; w^3\text{--}w^4,\; w^4\text{--}w^5,\; \ldots,\; w^{m-1}\text{--}w^m$$

Finally, the best possible combination is chosen by maximizing the total weighted path as a multiplication function (that is, by maximizing over the dot product of all the possible n-gram sequences). The process is illustrated in Figure 1.

Figure 1: Word sequence selection by n-gram pruning

Experiments and Performance

The resulting system has been evaluated in two ways: through a set of in-depth studies by three dedicated expert evaluators and in more free-form studies by ten randomly selected evaluators.

As discussed in the introduction, three major criteria for the quality assessment of automatic poetry generation have been used previously: poeticness, grammaticality, and meaningfulness (Manurung 2004). The same evaluation measures have been applied to the present task.
The evaluation process is manual and each of the three dimensions is assessed on a 3-point scale:

• Poeticness: (3) Rhythmic, (2) Partially Rhythmic, (1) Not Rhythmic
• Grammaticality: (3) Grammatically Correct, (2) Partially Grammatically Correct, (1) Not Correct
• Meaningfulness: (3) Meaningful, (2) Partially Meaningful, (1) Not Meaningful

The evaluation results are reported in Table 4, where the scores assigned by the three in-depth evaluators are reported separately, while the randomly selected evaluators have been grouped according to whether they were asked to give short input lines (not more than five words) or could give input of unrestricted length. The whole assessment process is elaborated on below, including explanations for the scores given by the different evaluators.

Table 4: Evaluation of the Bengali poetry generator

                    Dedicated experts      Randomly chosen
Evaluators          #1     #2     #3       ≤ 5 words   unrestricted
Poeticness          2.4    1.2    2.1      2.3         1.9
Grammaticality      1.7    1.0    1.4      1.8         0.6
Meaningfulness      1.5    0.9    1.1      1.6         0.8

In-Depth Evaluation

Three dedicated expert evaluators were chosen for an in-depth evaluation. One of them is a Bengali literature student, the second a Bengali journalist, and the third a technical undergraduate student. Each of them was asked to test the system performance on 100 input sentences, chosen by themselves.

Evaluator 1: Literature Student

The Bengali literature student was instructed to collect 100 simple poem lines from various poets whose poems were not included in our training set. Through discussion with the evaluator, we decided to choose lines from Satyendranath Dutta’s (1882–1922) poems, since he is known for his rhyme sense and renowned as the ‘wizard of rhymes’ in Bengali literature. Also, his works are very easy to understand.

We started with the famous ‘The Song of the Palanquin’, ‘Palkir Gan’. Following are some examples of the output the system produced.
The second lines in the examples were generated by the system, while the first lines were given to the system as input.

(11) ‘Palanquin moves!’
     ‘Trot pace’

(12) ‘Stunned village’
     ‘Cloggy doors’

The output in Example 11 is surprisingly good. Actually, the same line has been used as follow-up to this input line in one of the paragraphs of the original poem. The output in Example 12 is also good in terms of poeticness, but is less meaningful, while the first output is excellent on all the evaluation criteria: poeticness, meaningfulness and grammaticality. However, we obviously also got many bad output sequences.

Evaluator 2: Journalist

The journalist evaluator was requested to judge the system’s performance on news line input and was instructed to choose short sentences with a prior assessment of having a possible poetic sequence. He chose lines from the Bartaman newspaper.⁴ The best system output was the one in Example 13, where the first line again is the input line and the second line has been generated by the system.

(13) ‘Who will be the prime minister?’
     ‘Conspirator for the throne’

⁴ http://bartamanpatrika.com/

However, most of the system output in the news domain was unsatisfactory. From discussions with the evaluator, it was evident that it is also very difficult for humans to generate poetic sequences for any given line, so it is naturally quite difficult for a machine to do this, in particular if the lines come from a non-rhythmic news domain.

Evaluator 3: Technology Student

The technical undergraduate student was asked to choose lines from modern Bengali songs, and was instructed to choose shorter and simpler sentences. In the evaluation, she assigned a high score to poeticness, but lower scores to grammaticality and meaningfulness. Thus the system performed better than in the news domain, but worse than in the poetry domain. The best output produced by the system is shown in Example 14.
(14) ‘Dive into the depth of your heart’
     ‘Rectify yourself’

Evaluation by Random Evaluators

Ten randomly selected evaluators (not connected to the research in any way) were asked to evaluate the system’s performance on sentences given by themselves, with the only restriction being that they should provide simple examples with possible tail-rhymes.

The first five of them were instructed to limit their input to five words only, in order to compare system performance on longer and shorter sentences. As a result, we found that system performance is good on all three aspects for shorter sentences, but that it degrades drastically when longer sentences are given as input. As can be seen in Table 4, this is in particular the case for the dimension of grammaticality, and also true for meaningfulness, while the scores on poeticness are not that bad overall.

Conclusion

This paper has reported some initial experiments on automatic generation of Bengali poems. Bengali is a morpho-syntactically rich language which has inherited the characteristics and fundamentals of its poems from Sanskrit. Automatic rhyme generation for Bengali is therefore a relatively complex problem. The approach taken here is novel and based on interaction with the user, who enters a line of poetry, which the system then aims to understand in order to generate a corresponding text line, adhering to the rules and metres of Bengali poetry and rhyming with the input.

This basic system has many drawbacks and limitations, especially in the understanding of wide varieties of rhythms and in terms of grammaticality. The rhyme generation utilises a Bengali syllabification engine and an SVM-based classifier for predicting the structure of the output sentence and for the candidate word generation, which is based on a notion of semantic relevance in terms of proximity mappings derived from ConceptNet translations. The final selection of the actual poetic words is presently done through bigram pruning and aggregation.
Using the notion of semantic relevance is a computationally cheap way to automatically create meaningful rhymes, although poetry written by humans obviously does not always contain semantically related words. However, this is initial work and using ConceptNet is a straightforward approach; and even though conceptual similarity is hardly the ultimate way to measure word relevance for poems, it is probably one of the easiest ways. In the future, we aim to involve further natural language generation techniques to create more meaningful poetry.

Acknowledgments

Many thanks go to the evaluators for all their efforts, assistance and comments. We would furthermore like to thank the anonymous reviewers for several comments that helped to substantially improve the paper.

We are grateful to the late Satyajit Ray (1921–1992) for directing the movie ‘Kingdom of Diamonds’ (‘Hirak Rajar Deshe’) which originally inspired our approach. A very special token of appreciation goes to the three Bengali poets who in the early years of the previous century wrote the verses that were used in the building, training and evaluation of our system: Sukumar Ray, Rabindranath Tagore and Satyendranath Dutta.

References

Arif, H. 2012. Prosody. In Banglapedia: the National Encyclopedia of Bangladesh. Asiatic Society of Bangladesh, 2 edition.
Aurobindo, S. 2004. Letters on Poetry and Art, volume 27 of The complete works of Sri Aurobindo. Pondicherry, India: Sri Aurobindo Ashram Publication.
Bailey, R. W. 1974. Computer-assisted poetry: The writing machine is for everybody. In Mitchell, J. L., ed., Computers in the Humanities. Edinburgh University Press. 283–295.
Barman, B. 2011. A contrastive analysis of English and Bangla phonemics. Dhaka University Journal of Linguistics 2(4):19–42.
Basu, J.; Basu, T.; Mitra, M.; and Mandal, S. 2009. Grapheme to phoneme (G2P) conversion for Bangla. In Proceedings of the Oriental International Conference on Speech Database and Assessments COCOSDA, 66–71. IEEE.
Biswas, S. 2000.
Samsad Bengali-English dictionary. Calcutta, India: Sahitya Samsad, 3 edition.
Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta University Press.
Colton, S.; Bundy, A.; and Walsh, T. 2000. Automatic invention of integer sequences. In Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence, 558–563. AAAI.
Colton, S.; Goodwin, J.; and Veale, T. 2012. Full-FACE poetry generation. In Proceedings of the Third International Conference on Computational Creativity, 95–102.
Colton, S. 2005. Automated conjecture making in number theory using HR, Otter and Maple. Journal of Symbolic Computation 39(5):593–615.
Dan, M. 1992. Some issues in metrical phonology of Bangla: The indigenous research tradition. PhD Thesis, Deccan College, University of Poona, Pune (Poona), India.
Díaz-Agudo, B.; Gervás, P.; and González-Calero, P. A. 2002. Poetry generation in COLIBRI. In Advances in Case-Based Reasoning. Springer. 73–87.
Funkhouser, C. 2009. Prehistoric Digital Poetry: An Archaeology of Forms, 1959–1995. Modern and Contemporary Poetics. University of Alabama Press.
Gervás, P.; Pérez y Pérez, R.; Sosa, R.; and Lemaitre, C. 2007. On the fly collaborative story-telling: Revising contributions to match a shared partial story line. In Proceedings of the 4th International Joint Workshop in Computational Creativity, 13–20. Goldsmiths, University of London.
Gervás, P. 2000. WASP: Evaluation of different strategies for the automatic generation of Spanish verse. In Proceedings of the AISB-00 Symposium on Creative & Cultural Aspects of AI, 93–100. AISB.
Gervás, P. 2001a. An expert system for the composition of formal Spanish poetry. Knowledge-Based Systems 14(3):181–188.
Gervás, P. 2001b. Generating poetry from a prose text: Creativity versus faithfulness. In Proceedings of the AISB’01 Symposium on Artificial Intelligence and Creativity in Arts and Science, 93–99. AISB.
Gervás, P. 2002a. Exploring quantitative evaluations of the creativity of automatic poets. In Proceedings of the 2nd Workshop on Creative Systems, Approaches to Creativity in Artificial Intelligence and Cognitive Science, the 15th European Conference on Artificial Intelligence.
Gervás, P. 2002b. Linguistic creativity at different levels of decision in sentence production. In Proceedings of the AISB 02 Symposium on AI and Creativity in Arts and Science, 79–88. AISB.
Gross, O.; Toivonen, H.; Toivanen, J. M.; and Valitutti, A. 2012. Lexical creativity from word associations. In Seventh International Conference on Knowledge, Information and Creativity Support Systems, 35–42. IEEE.
Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; and Witten, I. H. 2009. The WEKA data mining software: An update. ACM SIGKDD Explorations Newsletter 11(1):10–18.
Havasi, C.; Speer, R.; and Alonso, J. B. 2007. ConceptNet 3: a flexible, multilingual semantic network for common sense knowledge. In Proceedings of the 6th International Conference on Recent Advances in Natural Language Processing.
Kar, S. 2009. The syllable structure of Bangla in Optimality Theory and its application to the analysis of verbal inflectional paradigms in Distributed Morphology. PhD Thesis, Neuphilologischen Fakultät, Universität Tübingen, Tübingen, Germany.
Kaufman, L. 2012. The Kaufman Repertoire for Black and White: A Complete, Sound and User-friendly Chess Opening Repertoire. Alkmaar, The Netherlands: New In Chess.
Levenshtein, V. I. 1966. Binary codes capable of correcting deletions, insertions and reversals. Soviet Physics Doklady 10(8):707–710.
Levy, R. P. 2001. A computational model of poetic creativity with neural network as measure of adaptive fitness. In Proceedings of the Workshop on Creative Systems, International Conference on Case-Based Reasoning.
Manurung, R.; Ritchie, G.; and Thompson, H. 2012. Using genetic algorithms to create meaningful poetic text.
Journal of Experimental & Theoretical Artificial Intelligence 24(1):43–64.
Manurung, H. M. 2004. An Evolutionary Algorithm Approach to Poetry Generation. PhD Thesis, School of Informatics, University of Edinburgh, Edinburgh, UK.
Mishra, A. 2010. Modelling Aṣṭādhyāyī: An approach based on the methodology of ancillary disciplines (Vedāṅga). In Sanskrit Computational Linguistics. Springer. 239–258.
Ramakrishnan A, A., and Devi, S. L. 2010. An alternate approach towards meaningful lyric generation in Tamil. In Proceedings of the NAACL HLT 2010 Second Workshop on Computational Approaches to Linguistic Creativity, 31–39. ACL.
Ramakrishnan A, A.; Kuppan, S.; and Devi, S. L. 2009. Automatic generation of Tamil lyrics for melodies. In Proceedings of the Workshop on Computational Approaches to Linguistic Creativity, 40–46. ACL.
Ritchie, G. D. 2001. Assessing creativity. In Proceedings of the AISB’01 Symposium on Artificial Intelligence and Creativity in Arts and Science, 3–11. AISB.
Ritchie, G. D. 2007. Some empirical criteria for attributing creativity to a computer program. Minds and Machines 17(1):67–99.
Sarkar, P. 1986. Aspects of Bengali syllables. In National Seminar on the Syllable in Phonetics and Phonology. Hyderabad, India: Osmania University.
Sircar, S., and Nag, S. 2014. Akshara–syllable mappings in Bengali: a language-specific skill for reading. In Winskel, H., and Padakannaya, P., eds., South and Southeast Asian psycholinguistics. Cambridge University Press. 202–211.
Toivanen, J. M.; Toivonen, H.; Valitutti, A.; and Gross, O. 2012. Corpus-based generation of content and form in poetry. In Proceedings of the Third International Conference on Computational Creativity, 175–179.
Toivanen, J. M.; Järvisalo, M.; and Toivonen, H. 2013. Harnessing constraint programming for poetry composition. In Proceedings of the Fourth International Conference on Computational Creativity, 160–167.
Toivanen, J. M.; Toivonen, H.; and Valitutti, A. 2013.
Automatical composition of lyrical songs. In Proceedings of the Fourth International Conference on Computational Creativity, 87–91.
Veale, T. 2012. Exploding the Creativity Myth: The computational foundations of linguistic creativity. Bloomsbury Academic.