Considering Ontologies For A Thematic Collection

The association between humanities and the computer may be traced back to Father Roberto Busa(note?). His attempt to create an index verborum of all words in works of St. Thomas Aquinas in 1949 may be thought of as the starting point in the field of humanities computing or what is today known as digital humanities. It is generally agreed that the place of the computer in humanities research lies in answering questions that might have been difficult or impossible to consider otherwise. This essay will try to examine the existing role of technology in research in the humanities and will consequently focus on using technology to build thematic archives.

Since 1949, there has been a steady growth in the influence of computing in humanities research(cit.). While quantitative analysis of text dates back to the nineteenth century, technology has made the study of authorship and the quantitative study of vocabulary an easier and more accurate process. In fact, perhaps the greatest strength of the computer is to analyse large sets of data in little time. While questions of authorship and vocabulary have existed for many centuries in the pursuit of scholarly knowledge, the computer allows us to represent and visualise these sets of data in ways through which new questions may be addressed and even posed. Thus, the evolution of technology has allowed us to create new ways to “read” texts(cit.). At this point I’d like to make a distinction between the tool, the machine and technology. The tool is an extension of the hand. The machine performs the task of the tool in a repetitive manner. Technology is the condition within which we reside and not an assemblage of machines. Technology evolves as the machine becomes more efficient or when new machines are built to perform new tasks. This is a continuous process. The advancement in computers (both in terms of software and hardware) over the last ten years has been remarkable. There appears to be a concerted effort to make the machine smarter; as the machine gets smarter the possibilities in the world of humanities computing seem to increase on a daily basis.

The digital library has been of critical importance to humanities scholarship. The primary role of the library is to provide access to original works; the digital library not only provides access to digitized versions of original works (the work itself could be born digital) but also a set of additional services and resources. In a sense, a digital library can be seen as working on the principles of a traditional library, but with a layer of technology over its collection. This layer of technology is geared towards scholarly work and enables the researcher to ask questions that might not have been possible before.

The difference, I feel, between the library and the archive is a subtle one and lies in the accumulation of their resources. While the library has a strong service component in providing access, the archive is built to preserve the material within its collections (Palmer 2004). Though questions of access and preservation are present in both the library and the archive, the difference is perhaps a matter of priorities in the building of their resources. However, this difference is blurred when we consider research libraries and thematic collections.

The purpose of both the library and the archive is to collect, to build their resources. The real advantage of creating a thematic resource is a better integration of the digital tools with the body of texts. There are, however, several questions associated with the building of such a collection. As part of my dissertation I would like to create a thematic, purpose-built collection of photographs depicting India in the early twentieth century. My first and primary concern lies in choosing these photographs. While I could choose a series of photographs representing, for example, a city in a certain time period, the inferences I could draw from such a collection would perhaps be predictable. Thus the layer of technology that I would like to assign to this collection would again reveal only a predictable set of results. On the other hand, a series assembled through a more random approach could perhaps provide more interesting readings, though it would run the risk of being simply an ill-assorted group of photographs. Walter Benjamin writes:

What fundamentally distinguishes the brooder from the thinker is that the former not only meditates a thing but also meditates his meditation of the thing. The case of the brooder is that of a man who has arrived at the solution of a great problem but then has forgotten it. And now he broods – not so much over the matter itself as over his past reflections on it. The brooder’s thinking, therefore, bears the imprint of memory. Brooder and allegorist are cut from the same cloth. (Benjamin 1999: 367)

Benjamin’s distinction between the allegorist and the collector draws attention to a rather significant point – profundity for the allegorist is not a matter of reflection, as it might be for the collector, but rather a sudden spark of illumination which, after the fact, shines profound light and unforeseen meaning on each affected thing. The question that occurs to me concerns the role of the creator in the building of an archive – is it only to provide access to the resources, or is it to build a collection with the outcome in mind, hoping to find that moment of inspiration? I do realise that this is, to an extent, the chicken and egg conundrum. However, this seems to me to be the most pertinent question when considering the use of technology on a certain thematic collection.

One method that I would consider using in such an endeavour is the use of an ontology. An ontology represents knowledge in a formal manner as a set of concepts used within a domain. It expresses the relationships between those concepts and is used to describe that domain. The tradition of equating knowledge with facts has existed, from both a philosophical and a scientific perspective, as far back as Aristotle. This view was augmented through the Renaissance and the Enlightenment in order to systematise knowledge.

Traditionally, the efforts to represent knowledge were largely seen as an attempt to manage collections of facts relating to the physical world. The contemporary interest in ontologies can be seen to originate within this tradition and can be taken as an extension of this monolithic view of knowledge. This view of knowledge has been debated over the centuries: Bacon and Locke can be seen to consider knowledge as a single system of beliefs to which new concepts are added. This view would be challenged by Quine, who would consider knowledge to be like a ‘field of force’ which impinged on experience only along the edges (Quine 1963: 42). However, what can be agreed on is that a body of formally represented knowledge is based on a conceptualization: the objects, concepts, and other entities that are assumed to exist in some area of interest and the relationships that hold among them (Genesereth & Nilsson 1987: 9). A conceptualization is an abstract, simplified view of the world that we wish to represent for some purpose. Every knowledge base, knowledge-based system, or knowledge-level agent is committed to some conceptualization, explicitly or implicitly. We use common ontologies to describe ontological commitments for a set of agents so that they can communicate about a domain of discourse without necessarily operating on a globally shared theory. Pragmatically, a common ontology defines the vocabulary with which queries and assertions are exchanged among agents. Ontological commitments are agreements to use the shared vocabulary in a coherent and consistent manner. The agents sharing a vocabulary need not share a knowledge base; each knows things the other does not, and an agent that commits to an ontology is not required to answer all queries that can be formulated in the shared vocabulary. Thus, we can assume that ‘concepts’ are the key building blocks and that we manipulate these concepts with words.
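The idea of agents that commit to a shared vocabulary without sharing a knowledge base can be made concrete with a small sketch. It is only an illustration under stated assumptions: the relation names, the two agents and the facts about a photograph are all invented for the example, and no particular ontology language or library is implied.

```python
from dataclasses import dataclass, field

# Hypothetical shared vocabulary (the "common ontology"): relation names only.
SHARED_VOCABULARY = {"depicts", "takenIn", "takenBy"}


@dataclass
class Agent:
    name: str
    # Each agent keeps its own private knowledge base of (subject, relation, object) triples.
    facts: set = field(default_factory=set)

    def tell(self, subject, relation, obj):
        """Accept an assertion only if it is phrased in the shared vocabulary."""
        if relation not in SHARED_VOCABULARY:
            raise ValueError(f"{relation!r} is not in the shared vocabulary")
        self.facts.add((subject, relation, obj))

    def ask(self, subject, relation):
        """Answer from local knowledge; None means the agent cannot answer."""
        if relation not in SHARED_VOCABULARY:
            raise ValueError(f"{relation!r} is not in the shared vocabulary")
        answers = sorted(o for s, r, o in self.facts if s == subject and r == relation)
        return answers or None


# Two agents commit to the same vocabulary but hold different (invented) facts.
archive = Agent("archive")
library = Agent("library")
archive.tell("photo_12", "takenIn", "Calcutta")
library.tell("photo_12", "depicts", "river ghat")

print(archive.ask("photo_12", "takenIn"))  # ['Calcutta']
print(library.ask("photo_12", "takenIn"))  # None
```

Both agents understand the query, since it is well formed in the shared vocabulary, but only one of them can answer it. The commitment is to the words, not to a common body of facts.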

Ontologies are dependent on human language to represent the world. It is here that we face the first and perhaps the most significant challenge in achieving a shared understanding of the humanities. Natural languages are subjective. Bakhtin distinguishes between the human sciences (language) and the exact sciences. He writes:

The exact sciences constitute a monologic form of knowledge: the intellect contemplates a thing and expounds upon it. There is only one subject here – cognizing (contemplating) and speaking (expounding). In opposition to the subject there is only a voiceless thing. Any object of knowledge (including man) can be perceived and cognized as a thing. But a subject as such cannot be cognized and studied as a thing, for as a subject it cannot, while remaining a subject, become voiceless, and, consequently, cognition of it can only be dialogic. Dilthey and the problem of understanding. Various ways of being active in cognitive activity. The activity of the one who acknowledges a voiceless thing and the activity of one who acknowledges another subject, and the degrees of this activity. The thing and the personality (subject) as limits of cognition. Degrees of thing-ness and personality-ness. The event-potential of dialogic cognition. Meeting. Evaluation as a necessary aspect of dialogic cognition.

The human sciences – sciences of the spirit – philological sciences (as part of and at the same time common to all of them – the word).

Historicity. Immanence. Enclosure of analysis (cognition and understanding) in one given text. The problem of the boundaries between text and context. Each word (each sign) of the text exceeds its boundaries. Any understanding is a correlation of a given text with other texts. Commentary. The dialogic nature of this correlation.

The place of philosophy. It begins where precise science ends and a different science begins. It can be defined as the metalanguage of all sciences (and of all kinds of cognition and consciousness). (Bakhtin 1986: 161)

To extend his viewpoint on the inter-subjectivity of the human sciences, the pen and the paintbrush require the human mind to process what is seen; they require imagination. As Bakhtin would argue, all language is subjective; the written records that we have can never contain objective history. Thus no act of recording, whether with the pen or with the paintbrush, can ever be said to be a true account of our past or our present. In essence, this is the difference between the human sciences and the exact sciences: while the human sciences deal in approximations, the exact sciences deal in absolutes and have definite answers to every question. Forms of representation vary: the letter-writer contemplates the present and writes into posterity while the painter condenses the duration of the painting into one perfect image; a certain temporality can be sensed in these acts of recording. From the beginning to the end of the enactment of memory, a change occurs in what is being reproduced; it remains an approximation of the real. Each word that we use has contextual meaning and often more than one. Thus, when we consider building a knowledge system, the use of human language is the first consideration. It was to eliminate the vagueness and ambiguity of language that John Wilkins in the 17th century created his ‘Real Character’, assigning each concept a symbol derived from its position in the taxonomic tree. It was with much the same intent that, some 300 years later, Lenat conceived his monumental enterprise to capture all of human common-sense knowledge in Cyc (Lenat and Guha 1990). The question remains how far, if at all, items in the world or experiences that do not lend themselves readily to verbal expression can be modelled.
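The principle of deriving a concept's identifier from its position in a taxonomic tree can be sketched in a few lines. This is not a reconstruction of Wilkins's notation: the toy taxonomy, the dotted codes and the function name `position_code` are assumptions made purely for illustration.

```python
# Toy taxonomy, invented for illustration: each node maps to its ordered children.
TAXONOMY = {
    "World": ["Creature", "Artefact"],
    "Creature": ["Beast", "Bird"],
    "Beast": ["Dog", "Elephant"],
    "Artefact": ["Building"],
}


def position_code(concept, root="World"):
    """Return a dotted code giving the path of child positions from root to concept."""

    def walk(node, path):
        if node == concept:
            return path
        # Number the children from 1 and search each subtree depth-first.
        for i, child in enumerate(TAXONOMY.get(node, []), start=1):
            found = walk(child, path + [i])
            if found is not None:
                return found
        return None

    path = walk(root, [])
    if path is None:
        raise KeyError(concept)
    return ".".join(map(str, path))


print(position_code("Elephant"))  # 1.1.2
print(position_code("Building"))  # 2.1
```

The sketch works only because every concept has exactly one place in the tree. A concept that belongs under two branches, or under none, has no code at all, which is one way of stating the difficulty raised in the paragraph above.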

With reference to ontologies, at the very outset, we are faced with two distinct issues – a problem of metaphysics and a problem of semiotics. The philosophical investigation of ontology seeks to find the necessary building blocks of the world, their properties and their inter-relationships. A starting point could be found in Brentano’s notion of intentionality and ‘objects of consciousness’ (Brentano 1973: 127-128). An ontology must make clear what the nature, necessary conditions and properties of these objects could be. This must also be independent of our ‘knowledge’ of things. Formal ontology combines this goal with a use of logic that is intended to ensure rigor and axiomatizability of postulated results.

The general programme of ontology relies on it being possible to uncover properties that could not fail to be as they are for the world to be as it is. Existing ontologies have been concerned with the organization and the structuring of human knowledge of reality rather than with reality itself. However, to engage with an ontology at a level deeper than this – with specific focus on the ‘conceptual’ framework – it needs to be epistemologically adequate.

Some form of agreed constraint on modeling decisions is required for conceptual ontology construction. The main issue with creating these constraints is, of course, in defining the required ontological level. Since this level has to include accounts of basic objects and basic relations independently of our knowledge of them, it is necessary for the account to define how such objects and relations may be put together in order to reveal an understanding of the world. As argued by Heidegger (Heidegger 1962 [1927]), and later by Schutz (Schutz 1966: 82), Wittgenstein (Wittgenstein 1953), and others, the world of human being is essentially committed and inter-subjective. That is, the world to which human beings have access is already organized ontologically in inter-subjective terms of human interest. Creating a committed view of the world from a neutral, ‘God’s eye-view’ perspective appears, of necessity, to be extremely difficult.

The semiotic problem (Bateman 1993: 5) is derived from a non-theoretical understanding of language that hinders an appropriate construction of ontologies. This underlying conception of language places an emphasis on the world as the source of decisions concerning ontology construction without a prior analysis of what is meant by the ‘world’. It compounds the problem by driving attention away from natural language, which it treats as inadequate and restricted. The relationship between a ‘sign’ and its meaning is only arbitrary for the most trivial of possible sign-types – that between linguistic form and phonetic substance. A semiotically richer view can capture the fact that more complex ‘signs’ are strongly and non-arbitrarily related to their social purpose (Hodge and Kress 1988: 82).

With these considerations we can posit that an ontology, or knowledge representation, is a surrogate standing for the objects and relations outside in the world. The ‘fidelity’ of the representation depends on what the ontology captures from the real thing and what it omits. Perfect fidelity is impossible. A simplistic view would say that an ontology is a model of the world which can be used to reason about it. One of the major claims made in favour of ontologies is that they can facilitate the interchange of knowledge between agents, or its reuse in different systems. However, if each ontology is modeled around an imperfect universe, knowledge sharing would increase or compound errors which were not visible in the initial use of the ontology. Again, an ontology is a set of ontological commitments. The choice of ontology is also a “decision about how and what to see in the world” (Davis et al. 1993). This is unavoidable when we consider that representations are imperfect; however, at the same time, the purpose-built ontology has its advantages as it focuses on what is relevant or interesting within the boundaries of the domain. These choices allow us to cope with the overwhelming complexity and detail of the world. Consequently, the content of the representation provides a particular perspective on the world. The way a knowledge representation is conceived reflects a particular insight or understanding in human reasoning. The selection of any of the available representational technologies commits one to fundamental views on the nature of intelligent reasoning and consequently to very different goals and definitions of success. An ontology must allow for computational processing, and consequently issues of computational efficiency will inevitably arise. Since all ontologies depend on a propositional view of knowledge in order to begin to be computationally tractable, a very restricted view of what it is possible to represent has already arisen. The fact that OWL Full is not guaranteed to be ‘decidable’ unfortunately does not guarantee it to be sufficiently powerful to represent the whole gamut of what we can consider to be knowledge. All forms of knowledge representation, including ontologies, are both media of expression for human beings and ways for us to communicate with machines in order to tell them about the world.

The appearance of new applications of ontologies has made clear that some forms of knowledge, such as taxonomic information, are suited to representation in the form of ontologies. Criticism leveled at ontologies focuses on the fact that they become unsuited to the world of applications once they get beyond a certain level of complexity. While some ontologies are acceptable, there is always a trade-off between expressivity, usability and accuracy. Further arguments can be made (on a more pragmatic level) about the difficulty of maintaining ontologies and about the way they reify a particular point of view of the domain knowledge. Ontologies can be seen to be struggling to keep pace with the dynamic, complex world of knowledge bodies and knowledge-sharing. One of the most basic issues facing the users and developers of ontologies is their degree of complexity. Folksonomies are comparatively easier to use and maintain while offering a flexible and personalized perspective; however, their use is limited for two reasons: (a) the quality of the concepts involved does not match that of ontologies and (b) their reliability cannot be compared to that of an ontology. On the other hand, formal ontologies such as DOLCE (the Descriptive Ontology for Linguistic and Cognitive Engineering) and GFO (the General Formal Ontology), and the languages and frameworks used to express them, such as OWL (the Web Ontology Language) and RDF (the Resource Description Framework), require specialized knowledge to build and use, and are more challenging to maintain. They are also more rigid than the ubiquitous folksonomies and thesauri, and less adaptable to changing applications and user perspectives (Brewster and O’Hara 2007: 563-568).
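The contrast between folksonomies and ontologies drawn above can be shown with a toy example. Everything in it is assumed for the sake of illustration: the photograph identifiers, the tags and the three-level hierarchy of concepts are invented, and the `BROADER` table is a deliberately minimal stand-in for a formal ontology.

```python
# Folksonomy: free-form tags supplied by users (invented for illustration).
tags = {
    "photo_1": {"bazaar", "calcutta"},
    "photo_2": {"market", "street"},
    "photo_3": {"temple"},
}


def tagged_with(tag):
    """Return the photographs carrying exactly this tag string."""
    return sorted(photo for photo, photo_tags in tags.items() if tag in photo_tags)


# Minimal taxonomy: each concept points to its broader concept (hypothetical).
BROADER = {
    "Bazaar": "Marketplace",
    "Marketplace": "PublicSpace",
    "Temple": "ReligiousSite",
}
instances = {"photo_1": "Bazaar", "photo_2": "Marketplace", "photo_3": "Temple"}


def is_a(concept, ancestor):
    """Follow the broader-concept links upwards until the ancestor is found or the chain ends."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = BROADER.get(concept)
    return False


def instances_of(concept):
    """Return the photographs classified under the concept or any narrower concept."""
    return sorted(photo for photo, c in instances.items() if is_a(c, concept))


print(tagged_with("market"))        # ['photo_2']
print(instances_of("Marketplace"))  # ['photo_1', 'photo_2']
```

The tag search is trivial to maintain but misses the photograph tagged "bazaar". The taxonomy finds both, at the cost of someone having decided in advance that a bazaar is a kind of marketplace.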

I feel that many of the problems associated with ontology-building reside at a conceptual level. To imagine a bias-free hierarchical representation of the real world is the biggest challenge. The technological problems associated with ontologies can be largely seen as an extension of the problems of metaphysics and semiotics. The pragmatic answer to the use of ontologies for the purpose of knowledge-sharing would lie in the creation of purpose-built knowledge representations that describe their own particular domain to the best of their ability. A thematic archive would profit if it could build its context from an existing knowledge-domain. I feel that an ontology could be built for texts with photo-illustrations, especially travelogues. This would take the project beyond merely harvesting metadata (from captions) and make for a richer experience. Considering a collection and the building of an ontology for it could be the starting point for my digital project.
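As a rough indication of what going beyond caption metadata might mean, the sketch below links a photograph to the passage of the travelogue it illustrates and then follows that link to find places the caption never names. The vocabulary ("illustrates", "mentionsPlace", "partOf") and all the records are hypothetical, chosen only to show the shape of such a purpose-built representation.

```python
from collections import defaultdict

# Hypothetical records for one photo-illustrated travelogue (all values invented).
triples = [
    ("photo_7", "type", "Photograph"),
    ("photo_7", "caption", "A street scene"),
    ("photo_7", "illustrates", "passage_3"),
    ("passage_3", "type", "Passage"),
    ("passage_3", "partOf", "travelogue_A"),
    ("passage_3", "mentionsPlace", "Benares"),
    ("photo_9", "type", "Photograph"),
    ("photo_9", "caption", "Untitled"),
]

# Index the triples by (subject, predicate) for direct lookup.
index = defaultdict(set)
for subject, predicate, obj in triples:
    index[(subject, predicate)].add(obj)


def objects(subject, predicate):
    """Return every object recorded for this subject and predicate."""
    return index.get((subject, predicate), set())


def places_for_photo(photo):
    """Collect places reached through the passages that the photograph illustrates."""
    places = set()
    for passage in objects(photo, "illustrates"):
        places |= objects(passage, "mentionsPlace")
    return sorted(places)


print(places_for_photo("photo_7"))  # ['Benares']
print(places_for_photo("photo_9"))  # []
```

The caption of the first photograph says nothing about where it was taken; the place is recovered only through the relationship between image and text, which is the kind of context a caption harvest alone would not supply.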

Bibliography

Bakhtin, M. (1986). “The Problem of Speech Genres”, Speech Genres and Other Late Essays. ed. Caryl Emerson and Michael Holquist, tr. Vern W. McGee. University of Texas Press: 161. Print.

Benjamin, W. (1999). The Arcades Project. Ed. Rolf Tiedemann, trans. Howard Eiland and Kevin McLaughlin. Cambridge, Mass.: Belknap Press: 367. Print.

Bateman, J. A. (1993). “Ontology Construction and Natural Language”, In Proceedings of the International Workshop on Formal Ontology: 5. 4 May 2012. PDF file.

Brentano, F. (1973). Psychology from an Empirical Standpoint. Trans. A. C. Rancurello, D. B. Terrell & L. L. McAlister. London: Routledge & Kegan Paul: 127-128. Print.

Brewster, C. and K. O’Hara (2007). “Knowledge representation with ontologies: Present challenges—Future possibilities”, International Journal of Human-Computer Studies 65, Elsevier: 563-568. 4 May 2012. PDF file.

Davis, R., H. Shrobe, and P. Szolovits (1993). “What is a Knowledge Representation?”. AI Magazine 14 (1). 4 May 2012. PDF file.

Genesereth, M. R. and N. Nilsson (1987). Logical Foundations of Artificial Intelligence. Morgan Kaufmann Publishers: San Mateo, CA: 9.

Heidegger, M. (1962 [1927]). Being and Time. Trans. J. Macquarrie and E. Robinson. Basil Blackwell, Oxford. 4 May 2012. PDF file.

Hodge, R. and G. Kress (1988). Social Semiotics. Polity Press, Cambridge, England: 82. Print.

Lenat, D. and R. Guha (1990). Building Large Knowledge Based Systems: Representation and Inference in the Cyc Project. Addison-Wesley Publishing. 4 May 2012. PDF file.

Palmer, C. L. (2004). ‘Thematic Research Collections’, ed. Schreibman, Susan, Ray Siemens and John Unsworth, A Companion to Digital Humanities, Wiley-Blackwell. Web. 4 May, 2012. <http://www.digitalhumanities.org/companion/view?docId=blackwell/9781405103213/9781405103213.xml&chunk.id=ss1-4-5&toc.depth=1&toc.id=ss1-4-5&brand=default>

Schutz, A. (1966). “The problem of intersubjectivity in Husserl”. In Ilse Schutz, editor, Collected Papers III: Studies in phenomenological philosophy. The Hague: Nijhoff: 82. Print.

Quine, W. V. O. (1980). From a Logical Point of View (2nd ed.). New York: Harper Torchbooks: 42. Print.

Wittgenstein, L. (1953). Philosophical Investigations. Trans. G. E. M. Anscombe. Basil Blackwell, Oxford. 4 May 2012. PDF file.
