An Introduction To Topic Maps

Topic Maps is an ISO standard designed to address the problem of Information Access Management. It is a generic, open standard for capturing knowledge in the form of topics (people, places, projects, companies and so on), the connections between these topics (associations) and the relationships these topics have to pieces of information such as web pages and documents.

Topic Maps can be used as a way to better organise and find information across vast information corpora, as well as to capture the details and nuances of subtle knowledge models. From building richer information web sites through to aggregating and providing access to disparate information resources, Topic Maps are a unique and powerful technology.

This article introduces the basic concepts of the Topic Maps paradigm. For a discussion of how topic maps can be applied to a variety of real-world problems, please see the Solutions section of the site.

The Three Basic Elements Of Topic Maps

The topic map paradigm is based on just three basic components: Topics, Associations and Occurrences.

Topics

A Topic is the embodiment within a topic map of a subject of discussion. A topic is used to represent anything that you want to talk about in a topic map. There is no restriction on what can be represented as a topic. Commonly topics are used to represent items of interest in a domain e.g. people, documents, companies; but they can equally be used to represent concepts such as pensions, insurance, politics and can even be used to represent events such as the Battle of Trafalgar, the coronation of Henry VIII, John Smith's birthday and so on. The important thing to note is that what a topic represents is in no way constrained by the Topic Maps standard.

A topic serves as a focal point for information.

A topic serves as a record or a focal point to which a number of different items can be attached. Most importantly for a computer, every topic has its own set of identifiers; these allow topic maps to be modularised while still maintaining references between topics.

For human beings, topics also carry multiple names. Names can be specified in multiple languages (English, French, Spanish...) and in multiple forms (e.g. a name formatted for sorting, a soundex encoding of a name and so on). The standard even allows for non-textual representations of a name, so graphical and audio-visual representations of topic names can also be supported.

Topics can be typed by other topics.

Topics can also be typed. A type specifies the category of "thing" that the topic belongs to. Since a type of thing is itself something that you may want to say something about, in a topic map types are defined by other topics. So to say that "John Smith is a Person", I create a topic "John Smith" and a topic "Person" and then attach "Person" to "John Smith" as a type. The Topic Maps standard places no restrictions on the types that you define, and to all intents and purposes a topic that is used as a type is no different from any other topic in the topic map.
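To make the model concrete, here is a minimal in-memory sketch in Python. The `Topic` class, its fields and the `http://example.org/psi/...` identifiers are hypothetical illustrations, not part of the standard or of any particular toolkit. The point is simply that a topic carries identifiers and names (here tagged with a language), and that its types are references to other `Topic` objects rather than plain strings.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # topics are compared by object identity
class Topic:
    """A hypothetical, minimal in-memory topic."""
    subject_identifiers: set = field(default_factory=set)
    names: list = field(default_factory=list)  # (text, language) pairs
    types: list = field(default_factory=list)  # other Topic objects

    def label(self, language: str = "en"):
        """Return the first name in `language`, else the first name, else None."""
        for text, lang in self.names:
            if lang == language:
                return text
        return self.names[0][0] if self.names else None


person = Topic({"http://example.org/psi/person"},
               [("Person", "en"), ("Personne", "fr")])
john = Topic({"http://example.org/psi/john-smith"}, [("John Smith", "en")])

# "John Smith is a Person": the type is itself a topic.
john.types.append(person)

print(john.label(), "is a", john.types[0].label())  # John Smith is a Person
print(person.label("fr"))                           # Personne
```

Because `person` is an ordinary `Topic`, it can be given names, identifiers and types of its own, just like `john`.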

Associations

Associations describe relationships between subjects by connecting together the topics in a topic map. Each association consists of a number of association roles each of which has a topic attached as a role player. The association role defines how the role player topic participates in an association.

Both the association itself and each of the association roles can be separately assigned a type. As with topics, this type is defined by another topic. So to model the fact that "John Smith" works for the company "ABC Corp.", we can create an association of type "WorksFor" with role types "Employer" and "Employee", in which "ABC Corp." plays the "Employer" role and "John Smith" plays the "Employee" role.

Associations are typed relationships between topics.

At first glance, this might seem like a lot of overhead: three more topics just to create a single relationship! But of course, the topics used to define the association type and the association role types are then reused to type every "WorksFor" association between a person and their employer.
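The sketch below shows the "WorksFor" example with three such associations. All three reuse the same three type topics, so the overhead is paid only once. To keep the example short, topics are referred to by hypothetical string IDs, and the `players` query helper is an illustration rather than a standard API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    role_type: str  # topic that types the role, e.g. "employer"
    player: str     # topic that plays the role


@dataclass(frozen=True)
class Association:
    assoc_type: str  # topic that types the association
    roles: tuple     # tuple of Role objects


# The three type topics, defined once and reused by every association.
WORKS_FOR, EMPLOYER, EMPLOYEE = "works-for", "employer", "employee"

associations = [
    Association(WORKS_FOR, (Role(EMPLOYER, "abc-corp"), Role(EMPLOYEE, "john-smith"))),
    Association(WORKS_FOR, (Role(EMPLOYER, "abc-corp"), Role(EMPLOYEE, "jane-doe"))),
    Association(WORKS_FOR, (Role(EMPLOYER, "xyz-ltd"), Role(EMPLOYEE, "joe-bloggs"))),
]


def players(assocs, assoc_type, given_role, given_player, wanted_role):
    """Return the topics playing `wanted_role` in associations of `assoc_type`
    in which `given_player` plays `given_role`."""
    result = []
    for a in assocs:
        if a.assoc_type == assoc_type and Role(given_role, given_player) in a.roles:
            result.extend(r.player for r in a.roles if r.role_type == wanted_role)
    return result


# Who works for ABC Corp.?
print(players(associations, WORKS_FOR, EMPLOYER, "abc-corp", EMPLOYEE))
# ['john-smith', 'jane-doe']
```

Because the roles are typed, the same data answers the reverse question too: asking for the "employer" of "joe-bloggs" returns `['xyz-ltd']`.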

Occurrences

Occurrences are the mechanism for providing more information about a topic than its name and relationship to other topics. The Topic Maps standard allows an occurrence to be specified either as a text string that is part of the topic map itself, or as a link to a resource maintained outside the topic map.

As with topics and associations, occurrences can be typed, and once again that type is specified by a topic. When an occurrence links to an external resource, the typing of an occurrence should be used to convey some information about the kind of resource that is to be found and its relationship to the topic. When the occurrence is an internal text string, only the nature of the relationship between the occurrence data and the topic is important.
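A minimal sketch of the two kinds of occurrence, assuming a hypothetical `Occurrence` class that holds either an internal text value or the address of an external resource, but never both. The occurrence types ("date-of-birth", "homepage", "cv") stand in for type topics and are made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Occurrence:
    occ_type: str                # topic that types the occurrence
    value: Optional[str] = None  # internal text held in the topic map
    href: Optional[str] = None   # address of an external resource

    def __post_init__(self):
        # Exactly one of value/href must be given.
        if (self.value is None) == (self.href is None):
            raise ValueError("give either an internal value or an external href")

    @property
    def is_external(self) -> bool:
        return self.href is not None


occurrences = [
    Occurrence("date-of-birth", value="1970-01-01"),
    Occurrence("homepage", href="http://example.org/~jsmith/"),
    Occurrence("cv", href="http://example.org/docs/jsmith-cv.pdf"),
]

for occ in occurrences:
    where = "external" if occ.is_external else "internal"
    print(f"{occ.occ_type:14} {where:8} {occ.href or occ.value}")
```

For the internal occurrence the type says only how the text relates to the topic (it is a date of birth); for the external ones it also hints at what kind of resource lies at the address.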

As shown in the diagram below, occurrences cross a boundary between information maintained solely inside the topic map itself and external resources in a separately managed "resource space".

Occurrences connect topics to resources.

Identity And Merging

As already mentioned above, a topic in a topic map stands in place of some subject of discussion, whether it be an object in the real world, a resource on the Web or on a local PC, or an abstract concept that can be neither pointed to nor stored electronically. However, for a computer to be able to deal with these different subjects efficiently, it is necessary to be able to identify important subjects in some way.

Names

For human beings, names are an important way in which we identify things, but names have limitations. The most common problem is that the same name can mean multiple things. Sometimes this is due to incomplete naming (e.g. "Do you mean George Bush Sr. or George W. Bush?"), but more often names that are otherwise the same can be distinguished by context (e.g. "Do you mean Paris, France or Paris, Texas?"). The topic map paradigm has a built-in context mechanism which allows not only names, but also occurrences and the roles played by topics in associations, to be specified with an associated context. More on context later.

Identity By Address

For resources which are found on computer systems, the location of the resource can be used as an identifier. This form of identity is known as a subject address. When two topics have the same subject address, they are considered to be about the same thing.

Of course, using the address of a resource like this is not suitable for all forms of electronic document. For example, some web addresses refer to pages that are generated automatically from real-time data, and some resources have addresses that are very short-lived. However, there is still a large "static" web that can make use of this form of identity. Many other systems, such as content management systems, also provide some form of persistent address that stays with a piece of content for its entire lifetime, and it is a simple matter to map such persistent addresses to subject addresses.
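As a rough illustration, suppose a content management system gives every item a numeric persistent ID. The base URL and the `cms_subject_address` mapping function below are hypothetical; what matters is that identity by address reduces to comparing address strings.

```python
# Hypothetical base URL of a CMS that gives every item a persistent ID.
CMS_BASE = "http://cms.example.com/content/"


def cms_subject_address(content_id: int) -> str:
    """Map a CMS persistent content ID to a subject address (assumed scheme)."""
    return f"{CMS_BASE}{content_id}"


def same_subject_by_address(addresses_a: set, addresses_b: set) -> bool:
    """Two topics are about the same resource if they share a subject address."""
    return not addresses_a.isdisjoint(addresses_b)


# Subject addresses of three topics, possibly held in different topic maps.
report_topic = {cms_subject_address(4711)}
catalogue_topic = {cms_subject_address(4711)}
other_topic = {cms_subject_address(42)}

print(same_subject_by_address(report_topic, catalogue_topic))  # True
print(same_subject_by_address(report_topic, other_topic))      # False
```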

Identity By Subject Description

While Identity By Address is suitable for electronic resources with some form of persistent address, it is not possible to assign such addresses to real-world things, nor is it possible to define an address for an abstract concept. However, it is possible to describe such things in a document and then make that document available at a persistent address. In the Topic Maps paradigm such a document is called a subject indicator, because its purpose is to give a human reader an indication of the nature of a topic. The address of the subject indicator document can then be applied to a topic as a subject identifier. It is important to note that it is always a human being who "understands" the content of a subject indicator. However, a computer can make use of the fact that if two topics have the same subject identifier, they both refer to the same resource describing their nature, and so they must both be about the same thing.
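The following sketch, with made-up subject indicator URLs, returns to the Paris example from earlier. Two topics with different names but the same subject identifier are about the same subject, while two topics that share the name "Paris" but have different subject identifiers are not. The code only compares identifier strings; it never retrieves the indicator documents.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    subject_identifiers: set = field(default_factory=set)


# Hypothetical addresses of subject indicators, i.e. pages that describe
# each city for a human reader.
PARIS_FR = "http://psi.example.org/geography/paris-france"
PARIS_TX = "http://psi.example.org/geography/paris-texas"


def same_subject(a: Topic, b: Topic) -> bool:
    # Only the identifier strings are compared; the indicator
    # documents themselves are never fetched or read.
    return not a.subject_identifiers.isdisjoint(b.subject_identifiers)


t1 = Topic("Paris", {PARIS_FR})
t2 = Topic("Paris (capital of France)", {PARIS_FR})
t3 = Topic("Paris", {PARIS_TX})

print(same_subject(t1, t2))  # True: different names, same indicator
print(same_subject(t1, t3))  # False: same name, different indicators
```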

Identity By Declaration

Another common form of identity on the Web is a URL that does not point to any resource at all, but is simply defined as an identifier in its own right. XML developers will be most familiar with this form of identifier in use for XML Namespaces.

In topic maps it is equally possible to make use of this form of identifier simply by assigning a topic a subject identifier that does not point to any resource at all. This approach can be useful when the goal is to create a set of identifiers for concepts managed as some form of code list: you can simply define a mapping from each code to a URI suitable for use as a subject identifier, without having to create a corresponding web resource.

From a human user's perspective, this form of identifier is less useful than Identity By Subject Description, as there is no authoritative resource that describes the subject. However, for a computer that only cares about comparing addresses, this approach works perfectly well and can often be a practical compromise for very large sets of subjects.
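For a code list, the mapping can be a one-line function. The namespace URI below is hypothetical, and nothing needs to be published at the resulting addresses. Because a computer compares these identifiers as plain strings, the sketch normalises each code (trimming whitespace and lower-casing it) so that the same code always yields exactly the same URI; that normalisation is a design choice for this example, not something the standard requires.

```python
# Hypothetical namespace URI; no resources need to exist under it.
CODE_NAMESPACE = "http://codes.example.org/country/"


def code_to_identifier(code: str) -> str:
    """Turn a code-list entry into a subject identifier by declaration."""
    return CODE_NAMESPACE + code.strip().lower()


codes = ["GB", "FR", "us "]
identifiers = [code_to_identifier(c) for c in codes]
print(identifiers)
# ['http://codes.example.org/country/gb', 'http://codes.example.org/country/fr', 'http://codes.example.org/country/us']
```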

Merging

So far we have seen the different forms of identity that allow a topic map processor to find the cases where two topics represent the same thing. In a fully processed topic map, however, there must only ever be one topic for each represented thing. The process of merging defines how multiple topics are combined into a single topic to achieve this goal.

The merging operation is almost entirely intuitive. When two topics are merged, the result is a single topic that has the combined set of names, identifiers and occurrences of the two original topics, and wherever either of the original topics is referenced elsewhere in the topic map, that reference is replaced with a reference to the merged topic. Finally, the two original topics are removed from the topic map.
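The merging operation can be sketched in a few dozen lines of Python. This is a simplified, hypothetical model: topics hold sets of identifiers, names and occurrences, and the only other place a topic is referenced is as an association role player. A fuller implementation would also redirect references from types and scopes. `merge_all` repeats until no two topics share an identifier, because one merge can produce a topic that must in turn be merged with a third.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # topics are compared by object identity
class Topic:
    identifiers: set = field(default_factory=set)
    names: set = field(default_factory=set)
    occurrences: set = field(default_factory=set)  # (type, value) pairs


@dataclass
class Association:
    assoc_type: str
    players: list  # Topic objects playing roles in this association


@dataclass
class TopicMap:
    topics: list = field(default_factory=list)
    associations: list = field(default_factory=list)

    def merge(self, a: Topic, b: Topic) -> Topic:
        # Combine the names, identifiers and occurrences of both topics.
        merged = Topic(a.identifiers | b.identifiers,
                       a.names | b.names,
                       a.occurrences | b.occurrences)
        # Replace every reference to either original with the merged topic.
        for assoc in self.associations:
            assoc.players = [merged if p is a or p is b else p for p in assoc.players]
        # Remove the two originals and add the merged topic.
        self.topics = [t for t in self.topics if t is not a and t is not b]
        self.topics.append(merged)
        return merged

    def merge_all(self) -> None:
        """Merge pairs of topics that share an identifier until none remain."""
        changed = True
        while changed:
            changed = False
            for i, a in enumerate(self.topics):
                match = next((b for b in self.topics[i + 1:]
                              if a.identifiers & b.identifiers), None)
                if match is not None:
                    self.merge(a, match)
                    changed = True
                    break  # the topic list changed, so start again


psi = "http://psi.example.org/john-smith"
t1 = Topic({psi}, {"John Smith"}, {("homepage", "http://example.org/~js/")})
t2 = Topic({psi, "http://example.org/staff/123"}, {"J. Smith"}, {("email", "js@example.org")})
corp = Topic({"http://psi.example.org/abc-corp"}, {"ABC Corp."})
tm = TopicMap([t1, t2, corp], [Association("works-for", [corp, t2])])

tm.merge_all()
john = tm.associations[0].players[1]
print(len(tm.topics))      # 2
print(sorted(john.names))  # ['J. Smith', 'John Smith']
```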

The diagrams below show an example of the merging operation. The identifiers used to determine that the two topics should be merged have been omitted for clarity.

Two separate topics prior to merging.
The result of merging the two topics.

Context and Scope

Sometimes the statements that we make need to be qualified in some way. In language we say things like "In my opinion...", "For the 1st quarter of 2004...", "In French the movie is called... ". These are all examples of providing a context for a statement about a thing. The topic map paradigm provides a basic level of built-in support for specifying the context for a statement. In topic map terms we make statements by providing names and occurrences for topics and also by making associations between topics, so it is these constructs: topic names, topic occurrences and associations that support the topic map context mechanism known as scope.

The scope of a statement in a topic map is specified as a collection of topics; together, these topics define a context in which the statement is considered to be valid. A topic map application may then use this scope definition to determine whether or not a statement is valid in a given situation, and is then free to present (or not present) that statement to the user as required.
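The standard leaves it to the application to decide how scope is used. The sketch below adopts one simple policy, assumed here purely for illustration: a name is shown if every topic in its scope is part of the user's current context, which means a name with an empty scope is shown everywhere. Scope topics are represented by plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    value: str
    scope: frozenset  # topics (here: plain strings) defining the context


names = [
    Name("The Battle of Trafalgar", frozenset()),  # empty scope
    Name("La bataille de Trafalgar", frozenset({"french"})),
    Name("Trafalgar, 1805", frozenset({"english", "short-form"})),
]


def valid_in(statement_scope: frozenset, context: set) -> bool:
    """Assumed policy: every scope topic must be in the current context."""
    return statement_scope <= context


def visible(names, context):
    return [n.value for n in names if valid_in(n.scope, context)]


print(visible(names, {"french"}))
# ['The Battle of Trafalgar', 'La bataille de Trafalgar']
print(visible(names, {"english", "short-form"}))
# ['The Battle of Trafalgar', 'Trafalgar, 1805']
```

Other policies are equally legitimate, for example preferring the most specific matching name rather than showing all of them.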

XML Interchange Syntax (XTM)

The principal purpose of defining a standard is interoperability between applications. In the preceding sections, we have described the functional level of interoperability between topic map processing tools: all conformant topic map processing tools support the structures and mechanisms described above. The other important form of interoperability is data interchange between systems, and in topic maps the primary format for interchange is the XML Topic Maps (XTM) syntax.

Originally defined separately (the original standard specified an SGML-based interchange syntax), XTM is now a formal part of the ISO standard for Topic Maps. As with any XML-based syntax, XTM has the benefit that it can be created and parsed with off-the-shelf XML tools and also that it provides a human-readable data format that can be easily inspected and modified using a text editing application.
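Because XTM is ordinary XML, the Python standard library's `xml.etree.ElementTree` can read it directly. The small document below uses the XTM 2.0 namespace and element names (`topic`, `subjectIdentifier`, `instanceOf`, `topicRef`, `name`, `value`) as we understand them; treat it as a sketch and check the details against the XTM specification before relying on them.

```python
import xml.etree.ElementTree as ET

# A tiny topic map in (what we believe is) XTM 2.0 syntax.
XTM = """<topicMap xmlns="http://www.topicmaps.org/xtm/" version="2.0">
  <topic id="person">
    <name><value>Person</value></name>
  </topic>
  <topic id="john-smith">
    <subjectIdentifier href="http://psi.example.org/john-smith"/>
    <instanceOf><topicRef href="#person"/></instanceOf>
    <name><value>John Smith</value></name>
  </topic>
</topicMap>"""

NS = {"xtm": "http://www.topicmaps.org/xtm/"}

root = ET.fromstring(XTM)
summary = {}
for topic in root.findall("xtm:topic", NS):
    names = [v.text for v in topic.findall("xtm:name/xtm:value", NS)]
    types = [r.get("href") for r in topic.findall("xtm:instanceOf/xtm:topicRef", NS)]
    summary[topic.get("id")] = (names, types)
    print(topic.get("id"), names, types)
```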

Conclusion

This article has presented all of the basic principles of topic maps and has hopefully given you an insight into the features that the topic map paradigm provides as standard.

We discussed the three basic elements of topic maps: topics (which represent things), associations (which represent relationships between things) and occurrences (which contain or point to information related to topics). We showed the main forms of topic identity that a topic map processor uses to determine when two topics represent the same thing, and we presented the merging process that combines such topics so that there is only one topic in any given topic map representing any given subject. We also presented the topic maps approach to specifying context and to limiting the validity of a statement using scope. Finally, we briefly discussed the XTM interchange syntax.

In future articles we will be talking more about various aspects of the topic map paradigm in detail, sharing best practice and explaining how we approach the design and development of applications based on topic maps.