Steps Toward the AIR Toolkit: An Approach to Modeling
Social Identity Phenomena in Computational Media
D. Fox Harrell, Ph.D., Greg Vargas, Rebecca Perry
Massachusetts Institute of Technology
77 Massachusetts Avenue
Cambridge, MA 02139 USA
{fox.harrell, gvargas, rebperry}@mit.edu
Abstract
The Advanced Identity Representation (AIR) Project is
a new interdisciplinary approach to the problem of designing
identity technologies to enable imaginative self-representations
for users by implementing dynamic social
identity models grounded in computing and cognitive
science. AIR Project research develops models of
social computational identity (e.g., characters, avatars,
and social networking profiles) to enable user representations
that dynamically change in response to context
and use, and to implement an identity modeling toolkit
for constructing cross-application self-representations.
This paper reports on the developing AIR Toolkit’s
support for modeling social identity phenomena in
which single users deploy multiple self-representations
(avatars, characters, or profiles) for different purposes.
Introduction
Computational media have transformed the creation and
representation of human identities. Understanding identity
representation as both a creative and a computational act
can inform development of technologies to enhance how
identities are enacted as social and technical practices, particularly
in videogames and social networks.
Human-centered computing researchers have tended to
focus on issues such as user and task analyses, cooperation,
and usability, e.g. in (Muramatsu and Ackerman 1998;
Suchman 1987). In contrast, humanists and social scientists
have often investigated identity-inflected issues such as
power, class, stigma, racism, sexism, and related themes
(Nakamura 2002, 2008; Nelson and Tu 2001; Waggoner
2009) – exposing identity as a dynamic, creative feat of
self and social construction. Games studies scholar Zach
Waggoner (2009) describes identity creation as an unfolding
process of self-representation that takes place in the
creative liminal space between the user and the videogame
avatar – between the embodied materiality of the player
and the imagination. Social scientist Sherry Turkle’s
(2004) studies of membership in multiple communities
have revealed that users often experience a sense of “cycling
through” different selves. Expression of multiple
selves is intrinsic to everyday human creativity. Indeed, in
his seminal work Erving Goffman (1959) described a negotiation
between the socially constructed, public performance
of the self, and the desired inner self – a complex,
creative social and imaginative act.
Informed by such perspectives, we take the view here
that creation and maintenance of computational identities is,
in part, an active creative feat of imaginative cognition.
Furthermore, social categories are often aspects of identity
that are reified in computational systems. Hence, we focus
on a cognitive science perspective on categorization that
highlights its imaginative nature and basis in cognitive
mechanisms for metaphorical and metonymic mapping.
The Advanced Identity Representation (AIR) Project
consists of developing new technologies informed by categorization
and classification theories from cognitive science
and sociology. (Harrell 2009) We are developing a
toolkit that can take data-structures for characters in games
or profiles in social networks and use them to model social
phenomena such as presenting oneself differently to different
groups, becoming a member of a group, or passing as a
member of another group. This is accomplished through
performing operations such as finding analogically matching
profiles/character data-structures, adapting them to
different social categories, forming new categories based
on analogical relationships between individuals, revealing
or simulating stereotypical categories at the data-structural
level, and more. Hence, we address a computationally reified,
reductive form of identity, but do so: (1) as a critical
technical practice (Agre 1997) aware of the aspects of
identity that are not computational, and (2) recognizing that
this reduction has already taken place “in the wild” as users
have built identities already encoded as data-structures.
Theoretical Framework
Technical Components of a Sociodata Ecology Computational
identity systems, e.g., social networking profiles,
online accounts, and avatars/characters are implemented
using a limited and often overlapping set of components.
Proceedings of the Second International Conference on Computational Creativity 147
Figure 1: Shared technical underpinnings of computational
identity applications
There are two important motivations for describing these
components: (1) identifying an appropriate level of abstraction
for analyzing the technical side of computational
representations comparatively across different types of
applications, and (2) identifying components that can be
analyzed both in terms of how they appear visually and
how they are implemented algorithmically and data-structurally.
Figure 1 describes the six components that
comprise the majority of widely used computational identity
technologies. (Harrell 2009) This paper focuses on
support for components at levels 4 and 5 (statistical/numerical
representation and formal annotation).
These underpinnings exist in a sociodata ecology
(Harrell 2010), wherein technical infrastructure, data-structures
and algorithms, and code are looked at as they
relate to issues such as embodied experiences, subjective
interpretations, power relationships, and cultural values.
Cognitive Model of Computational Identity The AIR
Project approach begins with the basic cognitive building
blocks of identity upon which social identity categories are
built. Cognitive scientists have proposed that human conceptual
categories form “idealized cognitive models”
(ICMs) upon which categories of objects in the world are
built (Lakoff 1987). Social networking sites explicitly
group users into categories called “friends,” while games
may group users into categories called elves or half-orcs.
These categories may also manifest implicitly, for example
Eric Gilbert and Karrie Karahalios’s (2009) metric for “tie
strength” determines “friendliness between” users evidenced
through use of the system. Yet, most computational
user categorizations invoke much less robust models.
Technical infrastructures may implement (often incorrect)
stigmatizing identity classification models (Bowker and
Star 1999; Goguen 1997); indeed, some games feature data-structures
instantiated with values where some
races/genders are less intelligent than others. Cognitive
science theory is presented below to provide models that
can help explain how users project their identities onto
their computational surrogates. (Gee 2003)
Cognitive Categorization The AIR approach is influenced
by the prototype theory of Eleanor Rosch and work in
categorization by George Lakoff (1987). Lakoff describes
a metonymy/metaphor-based account of how imaginative
extensions of “prototype effects” result in several phenomena
of social identity categorization that have proven useful
for the AIR Project:
• Representatives (prototypes): “best example” members
of categories;
• Stereotypes: normal, but often misleading, category expectations;
• Ideals: culturally valued categories even if not typically
encountered; and
• Salient Examples: memorable examples used to understand/create
categories.
Since the AIR Project technology involves techniques to
formalize and implement ICMs as computational data-structures,
identity phenomena become amenable to algorithmic
manipulation and experimentation.
Conceptual Blending and Multiple Selves Learning scientist
James Gee’s concepts of the real, virtual, and projective
identities in games provide a useful starting point for
thinking about how embodied identity experiences and
values in the real world intersect with the affordances and
semiotic values of computational representations. (Gee
2003) For Gee, player representations as projected identities
manifest the ways that real player values are reconciled
with values understood as associated with avatars.
The AIR Project approach emphasizes projected identity.
(Corneliussen and Rettberg 2008) Using cognitive
science terminology, this can be seen as metaphorically
mapping ICMs (mental spaces) that humans have of themselves
onto characters, or to use terminology from Gilles
Fauconnier and Mark Turner’s (2002) conceptual blending
theory as selectively projecting aspects from conceptualizations
of both a real identity and a virtual identity into a
blended identity. Examples of blended identities include
the venerable notion of double-consciousness, the dual
awareness of a person from a marginalized or oppressed
group’s self-conception and the social stigma attributed to
the social group (Du Bois 1903), and identity torque, the
often psychologically painful experience of a person’s self-conception
differing from the stigmatized perceptions reinforced
by classification infrastructure (Bowker and Star
1999). The notion of blended identities is central here because
it informs the idea that a single user can have multiple
identities depending on the elements being projected.
Implementation and Findings
We have developed a model of multiple user identity data-structures
and ways of displaying the contents of those
data-structures via a GUI. For example, a profile on the
social networking site Facebook consists of structured data
indicating friends, items a user likes, personal information
(such as gender or location), etc.
Figure 2: A subgraph of a Facebook profile
This can be represented as a graph in which items and attributes
are nodes that are connected to users by relations
such as ‘like’ or ‘friend.’ Some of these may also include
numerical statistics such as integers for age (see Figure 2).
Figure 3: A subgraph of a role-playing game character
In such a profile the number of friends and pages for a
typical user may reach the hundreds or thousands, resulting
in interesting graph structures to analyze.
Similarly, for a character in a game (especially role-playing
games in which character creation is a primary
focus) a graph can be used to represent stats (numerical
values for gameworld attributes like intelligence or dexterity),
skills, race, class, gender, etc. (see Figure 3).
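The graph representations described above can be sketched as lists of (object, relation, value) triples. This is a minimal illustration and not the AIR Toolkit's actual data model: the sample profile and character, the relation names, and the helpers `build_graph` and `features` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: each edge is an (object, relation, value) triple.
FACEBOOK_PROFILE = [
    ("user_a", "friend", "user_b"),
    ("user_a", "like", "Jazz"),
    ("user_a", "gender", "female"),
    ("user_a", "age", 27),            # numerical statistic stored on a node
]
RPG_CHARACTER = [
    ("char_x", "race", "elf"),
    ("char_x", "class", "ranger"),
    ("char_x", "intelligence", 14),   # gameworld stat
    ("char_x", "skill", "archery"),
]


def build_graph(triples):
    """Index triples as graph[object][relation] -> list of values."""
    graph = defaultdict(lambda: defaultdict(list))
    for obj, relation, value in triples:
        graph[obj][relation].append(value)
    return graph


def features(triples, obj):
    """Flatten one object's outgoing edges into a set of (relation, value) pairs."""
    return {(rel, val) for o, rel, val in triples if o == obj}


graph = build_graph(FACEBOOK_PROFILE + RPG_CHARACTER)
print(dict(graph["user_a"]))
print(features(RPG_CHARACTER, "char_x"))
```

Storing both kinds of object in the same triple form is what allows a profile and a character to be compared with the same operations, despite their differing attributes.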
Despite their differing structures, the similarities in these
representations at the abstract data-structural level have
allowed us to consider how multiple representations (or
views on representations) can reflect identity phenomena
from the real world such as self-presenting differently in
different communities, attempting to “pass” as a member
of another community, or being a central or marginal
member of a community. In games, multiple representations
can be used to implement phenomena such as critically
modeling stereotyping (by making non-player characters
uniformly respond to characters based on some subgraph
of elements rather than the full graph), developing
emergent profession/class models rather than top-down
designations, and decoupling real world racial, ethnic, and
gender categories from game mechanics-oriented numerical
statistics for combat and exploration of game worlds.
Toward this end, our models support implementation of:
• Multiple Identities based upon:
o adding to, subtracting from, or reorganizing the
graphs described above; this can be used to automatically
customize a user’s profile/character, or
view of a profile/character, based upon who the
profile/character is presented to
o users explicitly creating multiple profiles (or views
of a single profile/character) based on privacy settings
or membership in different groups
• Identity Categories emerging from finding clusters of
users with analogous graphs
• Prototypical Members of categories based upon
maximizing analogy with other users
• Critical Attributes are profile/character attributes that
are most telling in revealing analogy with other
users
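Two of the items above can be sketched in a few lines. This is a simplified illustration under loud assumptions: profiles are reduced to sets of (relation, value) features, and Jaccard overlap is swapped in as a crude stand-in for the analogy measures the toolkit uses. The names `view_for`, `overlap`, and `prototypical_member`, and the sample profiles, are hypothetical.

```python
def view_for(profile, hidden_relations=(), extra=()):
    """Return a view of a profile: subtract some relations, add some features."""
    kept = {(rel, val) for rel, val in profile if rel not in hidden_relations}
    return kept | set(extra)


def overlap(a, b):
    """Jaccard overlap of two feature sets (a stand-in for analogy, by assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def prototypical_member(profiles):
    """Name of the profile with the highest mean overlap with all the others.

    Expects at least two profiles.
    """
    def mean_overlap(name):
        others = [p for n, p in profiles.items() if n != name]
        return sum(overlap(profiles[name], p) for p in others) / len(others)
    return max(sorted(profiles), key=mean_overlap)


# Hypothetical sample profiles.
PROFILES = {
    "ana": {("like", "jazz"), ("like", "chess"), ("location", "Boston")},
    "ben": {("like", "jazz"), ("like", "chess")},
    "cy": {("like", "jazz"), ("like", "hiking")},
}

# A view of ana's profile for an audience that should not see her location.
print(view_for(PROFILES["ana"], hidden_relations={"location"}))
print(prototypical_member(PROFILES))
```

The same filtering could be driven by group membership or privacy settings rather than a hand-written list of hidden relations.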
It is not clear that only manipulating these data-structures
provides the necessary affordances for modeling real world
identity experiences adequately. Further development may
require augmenting these structures with metadata indicating
salience of particular attributes or additional attributes.
It will also require study of how users take up and deploy
the data-structures beyond technical affordances of the
systems (e.g., chatting in virtual worlds or flat text descriptions
of characters in games). However, our model does
introduce an extensible set of features to allow system designers
to implement the semantics of social identity phenomena
rather than hardcoding in racism as social critique
(as in the game Dragon Age’s portrayal of racism against
elves) or simplistic models of group membership such as
the opt-in/opt-out model in Facebook. In future AIR Project
development, phenomena such as stereotyping, marginalization,
naturalizing in communities, and stigmatization
will be addressed.
Technology Development There have been two main
thrusts of technology development. These are:
(1) AIR Toolkit Development
(2) Application Development and Deployment (assessing
popular software systems to use the AIR toolkit with
and deploying the toolkit in those systems)
Regarding (1), we are currently developing an interface,
implemented in Python, capable of comparing and adapting
user profiles. This interface is agnostic toward applications
(it can be applied to games and social networking
applications alike) and is agnostic toward algorithms used
for comparing users. Initially, comparison is being done
using a system called AnalogySpace developed by the
Commonsense Reasoning research group led by Henry
Lieberman at the MIT Media Lab. (Speer, Havasi, and
Lieberman 2008) We also have been considering using the
Structure Mapping Engine developed by Ken Forbus, Dedre
Gentner, Ron Ferguson, and others at Northwestern
University. (Ferguson, Forbus, and Gentner 1997; Forbus
2001; Gentner 1983) Finally, we also have considered using
a matching algorithm developed in (Chow and Harrell
2009; Harrell 2010). Aside from potentially varying in
effectiveness, these different approaches require differing
amounts of background knowledge and may be more or
less useful for particular applications.
Regarding (2), we have deployed the toolkit to implement
multiple identity representations, categories, and
comparisons in Facebook. Before selecting Facebook for
our initial deployment, we assessed popular systems used
in both social networking and gaming in order to determine
which would be optimal for initially testing the system.
Toolkit API We are designing an API for the basic functionality
of the toolkit. The current AIR toolkit iteration
uses Facebook's Graph API to download information about
the user and his/her friends including profile information,
friends, and likes. The toolkit then creates a large, sparse n
x n matrix and performs a truncated Singular Value Decomposition
(SVD) using the Divisi library from AnalogySpace.
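The matrix construction and truncation step can be sketched as follows. This is not the toolkit's code: NumPy's dense `numpy.linalg.svd` is swapped in for the Divisi library, a small dense array stands in for the large sparse matrix, and an object-by-feature layout is assumed for readability where the text describes an n x n matrix. The helper names and sample triples are hypothetical.

```python
import numpy as np


def build_matrix(triples):
    """Return (matrix, objects, features) with 1.0 where an object has a feature."""
    objects = sorted({obj for obj, _, _ in triples})
    feats = sorted({(rel, val) for _, rel, val in triples})
    row = {o: i for i, o in enumerate(objects)}
    col = {f: j for j, f in enumerate(feats)}
    matrix = np.zeros((len(objects), len(feats)))
    for obj, rel, val in triples:
        matrix[row[obj], col[(rel, val)]] = 1.0
    return matrix, objects, feats


def truncated_svd(matrix, k):
    """Keep only the k largest singular values and their singular vectors."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]


# Hypothetical sample data.
TRIPLES = [
    ("u1", "like", "jazz"),
    ("u1", "like", "chess"),
    ("u2", "like", "jazz"),
    ("u3", "like", "hiking"),
]

matrix, objects, feats = build_matrix(TRIPLES)
u, s, vt = truncated_svd(matrix, k=2)
approx = (u * s) @ vt   # rank-2 reconstruction of the original matrix
print(objects, feats)
print(approx.round(2))
```

For matrices of realistic size, a sparse representation and a sparse SVD routine would be needed; the dense version is used here only to keep the sketch short.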

The toolkit offers functions for the following purposes (using the
term “object” to refer to a profile or character structure):
• Finding Similar Objects: The truncated
SVD approximates dot products between each pair of objects.
These approximated dot products are used as a
similarity metric and the toolkit can return the objects
most similar to a given object.
• Predicting Features: The truncated SVD has a “smoothing”
effect on the values in the matrix in a way that
makes it useful for making inferences. The toolkit can
use this to calculate the likelihood of a particular feature
belonging to an object, whether or not it was represented
in the original graph, as well as return the top predictions.
• Projecting one object onto another: The toolkit can
return a filtered view of a particular object filtered by the
predictions of another object. We shall discuss more of
the potential uses of such a tool later.
• Creating Categories: The toolkit allows for the manual
creation of a category by choosing initial seed objects,
averaging the objects’ feature vectors and then suggesting
other objects to be included in the categories as well
as predicting important features for the category.
• Creating and Inserting Objects into the Graph: The
toolkit also allows the creation and insertion of new objects
into the graph. This could be useful for creating prototypical
objects and examining their relation to other objects
or experimenting with the graph structure and seeing
the changes it causes.
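Three of the functions listed above (finding similar objects, predicting features, and creating categories from seeds) can be sketched on a toy matrix. This is an illustration under stated assumptions, not the toolkit's API: NumPy replaces Divisi, the object-by-feature matrix and all function names are hypothetical, and rank k=1 is chosen only because the example is tiny.

```python
import numpy as np

# Hypothetical object-by-feature matrix: rows are profiles, columns are likes.
OBJECTS = ["ana", "ben", "cy", "dee"]
FEATURES = ["jazz", "chess", "hiking", "opera"]
MATRIX = np.array([
    [1.0, 1.0, 0.0, 0.0],   # ana
    [1.0, 1.0, 0.0, 1.0],   # ben
    [0.0, 0.0, 1.0, 0.0],   # cy
    [1.0, 0.0, 0.0, 1.0],   # dee
])


def smooth(matrix, k):
    """Rank-k reconstruction of the matrix from its truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k, :]


def most_similar(smoothed, index, top=2):
    """Rank the other objects by approximated dot product with object `index`."""
    scores = smoothed @ smoothed[index]
    order = [int(i) for i in np.argsort(-scores) if i != index]
    return order[:top]


def predict_features(smoothed, original, index, top=1):
    """Highest-scoring features that object `index` lacks in the original matrix."""
    candidates = [j for j in range(original.shape[1]) if original[index, j] == 0]
    return sorted(candidates, key=lambda j: -smoothed[index, j])[:top]


def category_from_seeds(smoothed, seeds, top=1):
    """Average the seed vectors, then suggest the closest non-seed objects."""
    centroid = smoothed[seeds].mean(axis=0)
    scores = smoothed @ centroid
    order = [int(i) for i in np.argsort(-scores) if i not in seeds]
    return centroid, order[:top]


smoothed = smooth(MATRIX, k=1)
print([OBJECTS[i] for i in most_similar(smoothed, 0)])
print([FEATURES[j] for j in predict_features(smoothed, MATRIX, 0)])
centroid, suggested = category_from_seeds(smoothed, [0, 1])
print([OBJECTS[i] for i in suggested])
```

The category centroid can also be read as a feature vector: its largest entries indicate which features are most characteristic of the seed objects.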
The first use of this API is a web interface for exploring a
user’s Facebook graph with the toolkit. We wrote a program
that authenticates a Facebook user and downloads
metadata from the user’s profile as well as their “likes,”
then does the same for each of the user’s friends. The web
interface we created downloads this information and converts
it to the graph structure that the toolkit can read. The
website then provides an interface structured like a read-only
social network site focused on exploring the user’s
network and examining other profiles. One key feature of
this site is that it can allow the user to view other users’
profile data based on their relationship to her/his own. That
is, when a user visits a friend’s profile, the user could see
only the connections that they share or that the system
thinks they should share (see Figure 4).
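The filtered view described above can be sketched as a simple projection: keep only those features of the visited profile whose predicted score for the viewer passes a cutoff. This is a hedged illustration; the function `project`, the 0.5 threshold, and the sample scores (which stand in for values read from the smoothed matrix) are all assumptions.

```python
def project(target_features, viewer_scores, threshold=0.5):
    """Keep the target's features that are shared with or predicted for the viewer.

    `viewer_scores` maps a feature to the viewer's predicted score for it.
    Features the viewer has no score for are treated as unpredicted.
    """
    return {f for f in target_features if viewer_scores.get(f, 0.0) >= threshold}


# Hypothetical data: a friend's features and the viewer's predicted scores.
FRIEND = {("like", "jazz"), ("like", "opera"), ("like", "hiking")}
VIEWER_SCORES = {
    ("like", "jazz"): 0.9,    # already shared
    ("like", "opera"): 0.6,   # not in the viewer's graph, but predicted
    ("like", "hiking"): 0.1,  # unlikely for the viewer, so filtered out
}

print(project(FRIEND, VIEWER_SCORES))
```

Raising the threshold narrows the view toward connections the two users already share; lowering it admits more speculative predictions.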
Figure 4: User 2547 filtered to show only the links predicted to
be present in User 6366’s graph
Figure 5: The interface allows the selection of groups of users
(objects) to create categories based upon analogy between the
users, find key features of those categories, and find other
possible members of the category
The interface enables exploration of basic toolkit functions
such as comparing profiles, calculating predictions, adding
profiles, and creating categories (see Figure 5).
Model and Toolkit Development
The AIR Toolkit is still under development and we hope to
continue to implement mechanisms that allow those using
the toolkit to represent the types of identity phenomena
discussed above. Extensions to the models developed will
consist of refining and extending techniques to implement
a small subset of cognitive and social identity phenomena
in software, initially addressing torque, metonymic category
models, marginalization, markedness, naturalization,
and category gradience.
In addition to that work, we will add support for implementing
modular graphical user-representations for users.
Currently, our toolkit is limited to altering textual and semantic
representations. Adding functionality for examining
and altering graphical representations is potentially a more
difficult problem, but would be helpful in systems that
place an emphasis on avatars or other graphical models.
With the progress made on the toolkit, it is possible to prototype
further applications that take advantage of the models
we have discussed. Examples might include:
• a social networking GUI for changing a user’s self-representation
for different social groups as opposed to cumbersome
alteration of privacy settings,
• integrated networking/gaming applications allowing
social networking information to influence
play style and vice versa,
• a system modeling the phenomena of “passing” as a
member of a different social group to facilitate a learner’s
transition from a novice to an expert member of a group,
• a social networking system supporting the ability to swap
between multiple identities, perhaps based on the user’s perception
of which identities would be empowering, stigmatizing,
or challenging in a given context, and more.
Evaluation
It will be important to assess whether or not users feel that
our AIR Project systems are more empowering than current
systems and if they can be used to minimize stigma
built into identity representation structures. Though this
assessment has not been completed yet, sufficient development
work has been done so as to warrant reporting. We
also have been developing methods to pursue such assessments.
In the spring of 2010, Harrell conducted a pilot
study for the AIR Project with four female participants and
two researchers. The subjects, who were novice computer
users, engaged in identity creation via the manipulation of
character creation systems in three game systems, The
Sims, the Nintendo Mii Channel, and the game Elder
Scrolls IV: Oblivion. As the subjects engaged in character
creation, semi-structured clinical interviews were conducted
regarding the character creation process and the
relationship of the characters created to a range of identity
issues after users were first prompted to describe their
creations “in their own words.” The dialogue was captured
via digital video and the sessions were screen captured
comprising raw data for analyses to be presented elsewhere.
The dialogue captured in these files is being transcribed
and will serve as the basis for crafting an empirical
instrument for evaluating AIR systems as well as assessing
whether users feel that these well-known games are adequately
expressive. Transcripts and videos will be analyzed
using grounded theory techniques (Glaser 1992; Strauss
1987), a well-known method of qualitative analysis.
Open Questions and Concluding Reflections
While we have made a good start with the preliminary
framing and ongoing development of the AIR Toolkit, a
number of interesting open questions remain. In particular,
given our reliance on cognitive accounts of metaphor and
analogy, we have been influenced by the critique of
Chalmers et al. in (Chalmers, French, and Hofstadter
1992) regarding computational approaches to the same as
they assert:
How are these data put into the correct form for the representation?
Even if we have determined precisely which
data are relevant, and we have determined the desired
framework for the representation—a frame-based representation,
for instance—we still face the problem of organizing
the data into the representational form in a useful
way.
The AIR Project takes heed of this concern; however, it
asks a reciprocal question. How can one design idealized
logical forms amenable to our algorithmic techniques useful
for modeling the social phenomena we are interested
in? We see design of such ontologies as a creative problem
requiring human judgment and do not intend the ontologies
to be models of the real world. Rather, they are users’ own
expressive self-representations or subjective ontologies.
Identifying methods to reduce identities to abstract data
types is both a non-trivial problem and a double-edged
sword, potentially both facilitating and hindering analysis
of the data. Can these data types be effectively optimized
for use with analogical reasoning systems like AnalogySpace
or SME? We will need to further develop our
rationale for adopting particular analogy systems, and basis
for our belief in their usefulness and validity.
Another open question considers the relationships between
OS level and application level GUIs. Turkle describes
users toggling between online identities, arguing
that this comprises a type of conversation between different
identities, which enables a fluid, decentered, fragmented
self to be deployed across different domains in
creative and sometimes unexpected ways. (Turkle 1995)
The experience she describes is linked to interactions with
computer graphical user interfaces (GUIs) rather than specific
applications. The AIR Project model will explore analytic
methods and tools to identify and facilitate these
changing presentations of self at either level.
Finally, the core motivating observation for the AIR
Project is that identity is a feat of imaginative cognition.
Social categories, which cognitive science theories have suggested
are not objective but are unconscious and based in metaphorical
thought, are often reified in software systems. Humans have great power in determining and
thought. Humans have great power in determining and
shifting the meanings of our categories – the AIR Project is
a modest step toward doing so in software.
Acknowledgments
We gratefully thank the National Science Foundation’s
support provided by CAREER Award #0952896. We also
thank Henry Lieberman, Catherine Havasi, Jason Alonso
and others from the MIT Commonsense Reasoning Group.
<references_biblio/>
References
Agre, P. E. 1997. Computation and Human Experience.
Cambridge, U.K.: Cambridge University Press.
Bowker, G. C., and Star, S. L. 1999. Sorting Things Out:
Classification and Its Consequences. Cambridge, MA:
MIT Press.
Chalmers, D. J., French, R. M., and Hofstadter, D. R. 1992.
High-Level Perception, Representation, and Analogy: A
Critique of Artificial Intelligence Methodology. Journal of
Experimental and Theoretical Artificial Intelligence, 4(3),
185-211.
Chow, K. K. N., and Harrell, D. F. 2009. Active
Animation: An Approach to Interactive and Generative
Animation for User-Interface Design and Expression.
Paper presented at the 2009 Digital Humanities
Conference.
Corneliussen, H., and Rettberg, J. W. (Eds.). 2008. Digital
Culture, Play and Identity: A World of Warcraft Reader.
Cambridge, MA: MIT Press.
Du Bois, W. E. B. 1903. The Souls of Black Folk.
Chicago, IL: A.C. McClurg and Co.
Fauconnier, G., and Turner, M. 2002. The Way We Think:
Conceptual Blending and the Mind's Hidden Complexities.
New York: Basic Books.
Ferguson, R., Forbus, K. D., and Gentner, D. 1997. On the
Proper Treatment of Noun-Noun Metaphor: A Critique of
the Sapper Model. Paper presented at the Nineteenth
Annual Meeting of the Cognitive Science Society.
Forbus, K. D. 2001. Exploring Analogy in the Large. In
The Analogical Mind: Perspectives from Cognitive
Science. Cambridge, MA: MIT Press.
Gee, J. P. 2003. What Video Games Have to Teach Us
About Learning and Literacy. New York City: Palgrave
Macmillan.
Gentner, D. 1983. Structure-Mapping: A Theoretical
Framework for Analogy. Cognitive Science, 7(2), 155-170.
Gilbert, E., and Karahalios, K. 2009. Predicting Tie
Strength with Social Media. Paper presented at the
Proceedings of the 27th International Conference on
Human Factors in Computing Systems.
Glaser, B. 1992. Basics of Grounded Theory Analysis. Mill
Valley, CA: Sociology Press.
Goffman, E. 1959. The Presentation of Self in Everyday
Life. New York: Anchor Books.
Goguen, J. 1997. Towards a Social, Ethical Theory of
Information. In Geoffrey Bowker, L. G., Leigh Star,
William Turner (Ed.), Social Science Research, Technical
Systems and Cooperative Work (pp. 27-56). Mahwah, N.J.:
Lawrence Erlbaum Associates.
Harrell, D. F. 2009. Computational and Cognitive
Infrastructures of Stigma: Empowering Identity in Social
Computing and Gaming. Proceedings of the Association
for Computing Machinery (ACM) Cognition and
Creativity Conference, Berkeley, CA.
Harrell, D. F. 2010. Toward a Theory of Critical
Computing: The Case of Social Identity Representation in
Digital Media Applications. CTheory.
Lakoff, G. 1987. Women, Fire, and Dangerous Things:
What Categories Reveal About the Mind. Chicago, IL:
University of Chicago Press.
Muramatsu, J., and Ackerman, M. S. 1998. Computing,
Social Activity, and Entertainment: A Field Study of a
Game Mud. Computer-Supported Cooperative Work, 7,
87-122.
Nakamura, L. 2002. Cybertypes. New York: Routledge.
Nakamura, L. 2008. Digitizing Race: Virtual Cultures of
the Internet. Minneapolis, MN: University of Minnesota
Press.
Nelson, A., and Tu, T. L. N. (Eds.). 2001. Technicolor:
Race, Technology and Everyday Life. New York: New
York University Press.
Speer, R., Havasi, C., and Lieberman, H. 2008.
AnalogySpace: Reducing the Dimensionality of Common
Sense Knowledge. Proceedings of the Twenty-Third AAAI
Conference on Artificial Intelligence.
Strauss, A. 1987. Qualitative Analysis for Social Scientists.
Cambridge, U.K.: Cambridge University Press.
Suchman, L. 1987. Plans and Situated Actions: The
Problem of Human-Machine Communication. New York:
Cambridge University Press.
Turkle, S. 1995. Life on the Screen: Identity in the Age of
the Internet. New York: Simon and Schuster.
Turkle, S. 2004. Whither Psychoanalysis in Computer
Culture? Psychoanalytic Psychology, 21(1), 16-30.
Waggoner, Z. 2009. My Avatar, My Self: Identity in Video
Role-Playing Games. Jefferson, NC: McFarland and
Company.