A Fractal Approach Towards Visual Analogy Keith McGreggor, Maithilee Kunda & Ashok Goel Design & Intelligence Laboratory, School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA {keith.mcgreggor, mkunda}@gatech.edu, goel@cc.gatech.edu Abstract. We present a preliminary computational model of visual analogy that uses fractal image representations that rely only on the grayscale pixel values of input images, and are mathematical abstractions quite rigorously grounded in the theory of fractal image compression. We have applied this model of visual analogy to problems from the Raven’s Progressive Matrices intelligence test, and we describe in detail the fractal solution strategy as well as some preliminary results. Finally, we discuss the implications of using these fractal representations for memory recall and analogical reasoning. 1 Cognition, Computation, and Creativity We may study creative behavior at many levels of aggregation and abstraction ranging from environmental and genetic to neural and cognitive to social and cultural. Our work on computational creativity focuses on the cognitive level. Although at present there is little agreement in the cognitive sciences about the proper characterization of creativity, nevertheless there is broad and deep consensus that analogy is a core cognitive process of creative behavior, e.g. [1]–[8]. Indeed, some cognitive scientists have argued that analogy is a core process not only of creativity, but all cognition, including perception [6][9]. Thus, understanding the computational processes of analogy seems critical to understanding and developing computational models of creativity. We may classify computational theories of analogy into two broad categories. In one category are theories that propose general-purpose mechanisms for mapping and transfer of relations from one problem to another, e.g. production systems [10], structure mapping [11][12], schema induction [13], and constraint satisfaction [14]. These mechanism theories typically make few commitments about the contents of knowledge; indeed, their mechanisms are general-purpose because they are agnostic towards knowledge contents. In the other category are computational theories that describe contents of knowledge that drive analogies in creative tasks such as story understanding [15], diagram understanding [16][17], scientific problem solving [18][19], innovative design [20][21][22], and calligraphy [23]. These content theories of analogy, too, describe computational processes, but their processes are driven by the contents of knowledge (and corresponding vocabularies for capturing the knowledge contents). The extant content and mechanism computational theories of analogy nevertheless are united in their use of propositional representations: From 65 the perspective of the content theories of analogy, the mechanism theories offer potential substrates for computational implementation. We present a preliminary computational theory of visual analogy that is fundamentally different from both content and mechanism theories because it uses fractal representations instead of propositional ones. Visual analogy is a topic of longstanding interest in computational creativity because of its central role in many creative tasks such as intelligence tests [24]. In particular, in this paper, we focus on geometric analogy problems that are a type of visual analogy problems: geometric analogy problems focus solely on shapes, sizes and locations of the shapes, and spatial relations among the shapes. Fractal image representations rely only on the grayscale pixel values of input images, and are mathematical abstractions quite rigorously grounded in the theory of fractal image compression [25]. Below, first we describe how fractal representations can be used for computing features and similarity among distinct images, and how these operations can be combined to form a technique for creating visual analogies using purely pictorial inputs. Second, we present initial experimental results from using this method on geometric analogy problems that occur on the standardized intelligence test called the Raven’s Progressive Matrices test [26]. Third, we discuss how this fractal model of visual analogy may be considered to be a mechanism for creativity. 2 Fractal Image Representation Consider the general form of an analogy problem as: A : B :: C : ? In the case of a visual analogy, we can present each of these analogy elements to be a single image. Some unknown transformation T can be said to transform image A into image B, and likewise, some unknown transformation T' transforms image C into the unknown answer image. The central analogy in the problem may then be imagined as requiring that T is analogous to T'. In other words, the answer will be whichever image D yields the most analogous transformation. Using fractal representations, we shall define the most analogous transform T' as that which shares the largest number of fractal features with the original transform T. 2.1 Mathematical Basis The mathematical derivation of fractal image representation expressly depends upon the notion of real world images, i.e. images that are two dimensional and continuous [25]. A key observation is that all naturally occurring images we perceive appear to have similar, repeating patterns. Another observation is that no matter how closely you examine the real world, you find instances of similar structures and repeating patterns. These observations suggest that it is possible to describe the real world in 66 terms other than those of shapes or traditional graphical elements—in particular, terms which capture the observed similarity and repetition alone. Computationally, to determine the fractal representation of an image requires the use of the fractal encoding algorithm. The collage theorem [25] at the heart of the fractal encoding algorithm can be stated concisely: For any particular real world image D, there exists a finite set of affine transformations T which, if applied repeatedly and indefinitely to any other real world image S, will result in the convergence of S into D. We now shall present the fractal encoding algorithm in detail. 2.2 The Fractal Encoding Algorithm Given an image D, the fractal encoding algorithm seeks to discover the set of transformations T. The algorithm is considered “fractal” for two reasons: first, the affine transformations chosen are generally contractive, which leads to convergence, and second, the convergence of S into D can be shown to be the mathematical equivalent of considering D to be an attractor [25]. Here are the general steps for encoding an image D in terms of another image S: 1) Decompose D into a set of N smaller images {d1, d2, d3, ..., dn}. These individual images are sets of points. 2) For each image di: a) Examine the entire source image S for an equivalent image si such that an affine transformation of si will result in di. This affine transformation will be a 3x3 matrix, as the points within si and di under consideration can be represented as the 3-D vector where c is the (grayscale) color of the 2-D point . Collect all such transforms into a set of candidates C. b) Select from the set of candidates that transform which most minimally achieves its work, according to some predetermined and consistent metric. c) Let Ti be the representation of the chosen affine transformation of si into di. 3) The set T = {T1, T2, T3, ..., Tn} is the fractal encoding of the image D. The decomposition of D into smaller images can be achieved through a variety of methods. In our present implementation, we merely choose to subdivide D in a regular, gridded fashion, typically choosing a grid size of either 8x8 or 32x32 pixels. Alternate decompositions could include irregular subdivisions, partitioning according to some inherent colorimetric basis, or levels of detail. 2.3 Searching and Encoding The search of the source image S for a matching fragment is exhaustive, in that each possible correspondence si is considered regardless of its prior use in other discovered transforms. Also, for each potential correspondence, each transformation under a 67 restricted set of affine, similitude transformations is considered. Our implementation presently examines each potential under identity (I), horizontal (HF) and vertical (VF) reflections, and 90° (R90), 180° (R180), and 270° (R270) rotational transformations. The metric we employ to evaluate candidate transformations prefers those transformations which induce the least translation, rotation or reflection, and color manipulation. We use this to ensure that areas which are aligned in a Cartesian sense are chosen. We shall revisit this important metric in a later section. Once a transformation has been chosen, we construct a compact representation of it, called a fractal code. A fractal code Ti is a 3-tuple, < , , k, c >, where is the location of the leftmost and topmost pixel in si; is the location of the leftmost and topmost pixel in di; k ∈ { I, HF, VF, R90, R180, R270 } indicates which affine transformation is to be used; and c ∈ [ -255, 255 ] indicates the overall color shift to be added uniformly to all elements in the block. Note that the choice of source image S is arbitrary. Indeed, the image D can be fractally encoded in terms of itself, by substituting D for S in the algorithm. Although one might expect that this substitution would result in a trivial encoding (wherein all of the chosen fractal codes correspond to an identity transform), in practice this is not the case, for we want a fractal encoding of D to converge upon D regardless of chosen initial image. For this reason, the size of source fragments considered is taken to be twice the dimensional size of the destination fragment, resulting in a contractive affine transform. Similarly, color shifts are made to contract. The fractal encoding algorithm, while computationally expensive in its exhaustive search, transforms a real world image into a much smaller set of fractal codes, which form, in essence, an instruction set for reconstituting the image. This resulting fractal representation of an image D vis-à-vis another given image forms the basis of our investigation into solutions for visual analogy problems. 2.4 Determining Fractal Features As we have shown, the fractal representation of an image is a set of specific affine, similitude transformations, a set of fractal codes, which compactly describe the geometric alteration and colorization of fragments of the source image that will collage to form the destination image. While it is tempting to treat contiguous subsets of these fractal codes as features, we note that their derivation does not follow strictly Cartesian notions (e.g. adjacent material in the destination might arise from strongly non-adjacent source material). Accordingly, we consider each of these fractal codes independently, and construct candidate fractal features from individual codes. In our present implementation, each fractal code < , , k, c > yields a small set of features, generally formed by constructing subsets of the tuple: < , , k, c >: a specific feature; < < dx - sx, dy - sy >, k, c >: a position agnostic feature; < , , c >: an affine transform agnostic feature; < , , k >: a color agnostic feature; < k, c >: an affine specific feature; < c >: a color shift specific feature. 68 As shown, the features derived may be categorized into either specific or agnostic aspects. Our implementation of the fractal encoding algorithm employs a metric which minimizes the transformative work (either geometrically or colorimetrically) expressly due to our desire to exploit specific feature matches. With this basis of fractal representation and feature discovery, we now may address the problem of visual analogy. 2.5 A Fractal Process of Visual Analogy To find analogous transforms, our algorithm first visits memory to retrieve a set of candidate solution images D to form candidate solution pairs in the form . For each candidate pair of images, we generate the fractal encoding of the candidate image D in terms of the former image C. As we illustrated earlier, from this encoding we are able to generate a large number of fractal features per transform. We store each transform into a memory system, indexed by and recallable via each associated fractal feature. To determine which of the candidate images has resulted in the most analogous transform to the original problem transform T, we first fractally encode that relationship between the two images A and B. Next, using each fractal feature associated with that encoding, we retrieve from the memory system those transforms previously stored as correlates of that feature (if any). Considering the frequency of the transforms recalled, for all correlated features in the target transform, we then calculate a measure of similarity. This metric reflects similarity as a comparison of the number of fractal features shared between candidate pairs taken in contrast to the joint number of fractal features found in each pair member [27]. In our present implementation, the measure of similarity S between the candidate transform T' and the target transform T is calculated using the following formula, also known as the ratio model: S(T, T') = f(T ∩ T') / [ f(T ∩ T') + α f(T - T') + β f(T' - T) ] where f(X) is the number of features in the set X. For our initial work, we have chosen values of α = β = 1.0, which, according to [27], results in this simplification: S(T, T') = f(T ∩ T') / f(T ∪ T') The final solution is taken as the candidate image from memory that results in the highest measured similarity according to this measure. 3 The Raven’s Progressive Matrices Test In this section, we describe the Raven’s Progressive Matrices test and present some preliminary results for a solution algorithm that uses the fractal algorithm for visual analogy described in the previous section. 69 The Raven’s Progressive Matrices (RPM) test is a standardized intelligence test that consists of visually presented, geometric-analogy-like problems in which a matrix of geometric figures is presented with one entry missing, and the correct missing entry must be selected from a set of answer choices. Figure 1 shows an example of a problem that is similar to one of the problems in the Standard Progressive Matrices (SPM). Fig. 1. Example problem similar to one in the Standard Progressive Matrices (SPM) test. Although the test is supposed to measure only eductive ability, or the ability to extract and understand information from a complex situation [26], the RPM’s high level of correlation with other multi-domain intelligence tests have given it a position of centrality in the space of psychometric measures [28], and it is therefore often used as a test of general intelligence. Despite its widespread use, neither the computational nor the cognitive characteristics of the process of solving the RPM are well understood. Hunt gives a theoretical account of the information processing demands of certain problems from the Advanced Progressive Matrices (APM), in which he proposes two qualitatively different solution algorithms—“Gestalt,” which uses visual representations and perceptually based operations, and “Analytic,” which uses feature-based representations and logical operations [29]. Existing AI systems for problem solving on the RPM, in contrast to Hunt’s early work, use propositional representations [30][31]. Carpenter, Just, and Shell describe a computational model that simulates solving RPM problems using propositional representations [30]. Their model is based on the traditional production system architecture, with a long-term memory containing a set of productions and a working memory containing the current state of problem solving (e.g. current goals). Productions are based on the relations among the entities in a RPM problem, for example, the location of the dark component in a row, which might be the top half in the top row of a problem, bottom-half in the bottom row, and so on. Lovett, Forbus, and Usher describe a model that extracts qualitative spatial representations from 70 visually segmented representations of RPM problem inputs and then uses the technique of structure mapping to find solutions [31]. While, as we mentioned earlier, production systems and structure mapping offer alternative mechanisms for implementing content theories of analogies, from the perspective of our work on fractal representations, the two are similar in that both use propositional representations. 3.1 Preliminary Results from the Fractal Method of Visual Analogy A RPM problem can be viewed as a sequence of images (ordered in rows and columns), where some unknown transformation T can be said to transform one image into a corresponding adjacent image. In a typical 2x2 RPM problem, there are four such transformations, as shown in Figure 2. (RPM problems can also have three-bythree matrices, which we do not address in this paper.) Fig. 2. Illustration of four image transformations implicit in a 2x2 RPM problem matrix. RPM problems are formulated to suggest that these transformations are pairwise analogous (i.e. the two row transformations are analogous to one another). A 2x2 RPM problem generally has six candidate solutions presented. Following the algorithm for visual analogy described above, we seek to solve an RPM problem by determining which of the candidate solutions yields the most analogous transformations. In particular, we perform the recall and similarity calculation described in Section 2.3 independently over all row and column relationships present in the RPM problem. The final solution is taken as the candidate that results in the highest measured similarity for any such relationships. As an example, we shall use the “arrow” problem shown in Figure 3. Also, while the complete fractal method examines each of the problem’s analogies, we shall restrict this detailed discussion to just one of these transformations (T1, as labeled in Figure 2). The initial transformation T1 is the fractal encoding of the transformation from the upper left arrow into the upper right arrow. When encoded using a block size of 32 x 32 pixels, this encoding generates 39 distinct fractal features. Each candidate answer is encoded likewise, from the upper left arrow into the candidate, each resulting in between 27 and 45 distinct features. For the arrow problem, using a 32 x 32 block size, the similarity measures for each answer Ci are: 71 S(T,C1) = 21 / (21+18+24) ≅ 0.333333 S(T,C2) = 15 / (15+24+30) ≅ 0.217391 S(T,C3) = 16 / (16+23+11) ≅ 0.32 S(T,C4) = 14 / (14+25+29) ≅ 0.205882 The answer with the highest calculated similarity is deemed correct. Therefore, the fractal method chooses as its answer #1. Fig. 3. The “arrow” problem, in the format of a 2x2 problem matrix with four answer choices. 4 Implications of Fractal Representations for Analogical Reasoning and Memory Recall Benoit Mandelbrot coined the term “fractal” from the Latin adjective fractus and its corresponding verb (frangere, which means “to break” into irregular fragments), in response to his observation that shapes previously referred to as “grainy, hydralike, in between, pimply, pocky, ramified, seaweedy, strange, tangled, tortuous, wiggly, wispy, wrinkled, and the like” could be described by a set of compact, rigorous rules for their production [32]. The approach we outline in this paper seeks to define a class of visual representations in a similar, fractal manner, where images are taken as encountered and transformed into a fractal representation either with regard to other images or to themselves. The very nature of this fractal representation places an immediate emphasis on similarity discovery. These discoveries are used to advantage in solving problems that place equivalent emphasis on visual analogy, such as the Raven’s Progressive Matrices test. While the use of fractal representation is important, the emphasis upon visual recall in our solution afforded by features derived from those representations bears further discussion. The importance of individual discovered features presents itself strongly in our calculation of the similarity metric from [27]. The weights α and β may be skewed toward or away from equality in order to favor feature inclusion or exclusion. We note that while a 2x2 RPM problem offers little opportunity for determining feature 72 importance, the class of RPM problems which are 3x3 in nature require it, for the additional row and column provide an in-domain manner in which to report visual evidence for the kinds of features to expect within the eventual successful candidate. Using this additional evidence would allow for the weights of feature inclusion (α) and exclusion (β) to become covariant with features. We are actively exploring this as we prosecute solutions for 3x3 RPM problems. We take the position that placing candidate transformations into memory, indexed via those discovered fractal features, affords a new method of discovering image similarity. Indeed, this method of solving the RPM, by being reminded of similar transformations, bears close kinship to various methods and theories of case-based reasoning [33], although we exploit recalled transformations toward a significantly different end. That images, encoded either in terms of themselves or other images, may be indexed and retrieved without regard to shape, geometry, or symbol, suggests that the fractal representation bears further exploration not only as regards solutions to problems akin to the RPM, but also to those of general visual recall and analogy based creativity. Acknowledgments. This research has been supported by an NSF grant (IIS Award #0534266) entitled “Multimodal Case-Based Reasoning in Modeling and Design,” by ONR through an NDSEG fellowship, and by the NSF GRFP fellowship program. References 1. Christensen, B. T., & Schunn, C. D. The relationship of analogical distance to analogical function and pre-inventive structure: The case of engineering design. Memory & Cognition, 35(1), 29-38. (2007) 2. Clement, J. Creative Model Construction in Scientists and Students: The Role of Imagery, Analogy, and Mental Simulation. Dordrecht: Springer. (2008) 3. Darden, L. (ed). Reasoning in Biological Discoveries. Cambridge University Press. (2006) 4. Dunbar, K. The Analogical Paradox. In Gentner, D., Holyoak, K.J., & Kokinov, B.N. (Eds.) The Analogical Mind: Perspectives from Cognitive Science, MIT Press. (2001) 5. Goel, A. K. Design, Analogy, and Creativity. IEEE Expert 12(3): 62-70. (1997) 6. Hofstadter, D. Godel, Escher, Bach: An Eternal Golden Braid. NY: Basic Books. (1979) 7. Holyoak, K., & Thagard, P. Mental Leaps: Analogy in Creative Thought. Cambridge, MA: MIT Press. (1995) 8. Nersessian, N.J. Creating Scientific Concepts. Cambridge, MA: MIT Press. (2008) 9. Hofstadter, D. (editor). Fluid Concepts and Creative Analogies: Computer Models of the Fundamental Mechanisms of Thought. NY: Basic Books. (1995) 10. Anderson, J.R., & Thompson, R. Use of Analogy in a Production System Architecture. In Vosniadou, S. & Ortony, A. (Eds.), Similarity and analogical reasoning, pp. 267-297, London: Cambridge University Press. (1989) 11. Gentner, D. Structure Mapping. Cognitive Science, 7: 155-170. (1983) 12. Falkenhainer, B., Forbus, K., and Gentner, D. The Structure-Mapping Engine: Algorithms and Examples. Artificial Intelligence, 41-1:63. (1989) 13. Gick, M., & Holyoak, K.J. Schema Induction and Analogical Transfer. Cognitive Psychology, 15(1):1-38. (1983) 73 14. Holyoak, K., & Thagard, P. Analogical Mapping by Constraint Satisfaction. Cognitive Science, 13: 295-355. (1989) 15. Winston, P. Learning & Reasoning by Analogy. CACM, 23(12): 689-703. (1979) 16. Gross, M., & Do, E. Drawing on the Back of an Envelope. Computer Graphics and Applications. (2000) 17. Yaner, P., & Goel, A.K. Understanding Drawings by Compositional Analogy. In Proc. International Joint Conference on Artificial Intelligence (IJCAI-2007), Hyderabad, India. January 2007, pp. 1131-1137. (2007) 18. Davies, J., Nersessian, N., & Goel, A.K. Visual Models in Analogical Problem Solving. Foundations of Science, 10(1):133-152, 2005. (2005) 19. Griffith, T., Nersessian, N.. & Goel, A.K. Function-follows-Form: Generative Modeling in Scientific Reasoning. In Proc. 22nd Cognitive Science Conference. (2000) 20. Casakin, H., & Goldschmidt, G. Expertise and the Use of Visual Analogy: Implications for Design Education. Design Studies, 20(2): 153-179, 1999. 21. Goel, A. K., & Bhatta, S. Use of Design Patterns in Analogy-Based Design. Advanced Engineering Informatics, 18(2):85-94. (2004) 22. Davies, J., Goel, A.K. & Nersessian, N. A Computational Model of Visual Analogies in Design. Cognitive Systems Research, 10:205-215. (2009) 23. Hofstadter, D., & MacGraw, G. LetterSpirit. In Fluid Concepts and Creative Analogies, Hofstadter (editor), NY: Basic Books. (1995) 24. Evans, T. G. A program for the solution of a class of geometric-analogy intelligence-test questions. Semantic Information Processing, pp. 271–353, MIT Press. (1968) 25. Barnsley, M. F., & Hurd, L. P. Fractal Image Compression. Boston: A. K. Peters. (1992) 26. Raven, J., Raven, J.C., & Court, J. H. Manual for Raven's Progressive Matrices and Vocabulary Scales, General Overview. San Antonio, TX: Harcourt Assessment. (1998) 27. Tversky, A. Features of similarity. Psychological Review 84 (4): 327-352. (1977) 28. Snow, R., Kyllonen, P., & Marshalek, B. The topography of ability and learning correlations. In Sternberg, R. ed. Advances in the Psychology of Human Intelligence 2: 47-103. Hillsdale, NJ: Erlbaum. (1984) 29. Hunt, E. Quote the raven? Nevermore! In Gregg, L. W. ed. Knowledge and cognition, 129–158. Hillsdale, NJ: Erlbaum. (1974) 30. Carpenter, P. A., Just, M. A., & Shell, P. What one intelligence test measures: A theoretical account of the processing in the Raven Progressive Matrices Test. Psychological Review 97 (3): 404-431. (1990) 31. Lovett, A., Forbus, K., & Usher, J. Analogy with qualitative spatial representations can simulate solving Raven’s Progressive Matrices. In Proc. 29th Annual Conference of the Cognitive Science Society, 449-454. (2007) 32. Mandelbrot, B. The Fractal Geometry of Nature. San Francisco: W.H. Freeman. (1982) 33. Kolodner, J.L. Case-Based Reasoning. CA: Morgan Kaufmann Publishers. (1993)