Making Topic Maps SPARQL
The focus of this article is on the implementation of the SPARQL query language on top of a topic map repository. I'm not going to get down into implementation issues but instead will describe what has motivated this work and the way in which we have addressed the various incompatibilities between the graph-based data model for which SPARQL was designed and the aggregate association model of topic maps.
Motivation and Goals
The principal motivation behind this work has been the need for topic map repositories to be active participants in the Linked Data Web. It is relatively trivial to expose topic map data as RDF, but providing a collection of addressable RDF resources isn't enough on its own. A true Linked Data repository must also support a query interface that allows clients to search and filter the data it provides. It is clear, from even a cursory inspection of the existing linked data repositories, that SPARQL is the preferred query language for Linked Data clients. Therefore, if a topic map repository is to be a full and useful participant in the Linked Data Web, it is important that it can provide a SPARQL query service endpoint.
A second, weaker motivation has been the progress and direction of the ISO TMQL standard. At the time of writing, the TMQL standard is still under development. While it is not yet clear exactly how the query language will look, it is my opinion that the final language will not be palatable to those outside the topic maps community. The hope is that a SPARQL query implementation on top of a topic map repository can offer a simpler query language that covers the common 80% of queries (the 80/20 rule), leaving the remaining 20% of really heavy lifting in topic map querying to TMQL.
The main goal of this project is to enable the implementation of a SPARQL query endpoint, but it is also important that this should be an endpoint that is usable by any Linked Data client. In particular the SPARQL query endpoint should be queryable by a client that does not know (and does not need to care) that the underlying repository is using topic maps as its core data model rather than RDF. Ideally a client should be able to send the same query to both RDF and Topic Maps based repositories and, assuming there is some consistent mapping of identifiers to resources, retrieve equivalent results. For example, if I have a topic map that stores social network data using the FOAF ontology, it should be possible for a client to query that data using exactly the same SPARQL queries it would use to query an RDF store of FOAF data.
We have called this project TMSPARQL.
Approach
A SPARQL query is expressed as a graph pattern that (in its most basic form) consists of a set of triples where the subject, predicate, or object of those triples may be either an RDF node or a variable. The query result returns the collection of bindings of values to the variables that result in a match of the pattern against the data store. TMSPARQL takes a SPARQL query and converts it into a set of matches against a topic map data store. It is important to point out that the data being queried is stored using a conventional topic maps data model - no conversion or intermediate form of data is required nor is any configuration or mapping of identifiers.
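The idea of matching a basic graph pattern can be sketched in a few lines of Python. This is only an illustration of the general SPARQL matching model over plain triples, not the TMSPARQL engine itself: the tuple representation, the `?`-prefixed variable convention, and the function names `match_triple` and `match_pattern` are all assumptions made for this example.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Bindings = Dict[str, str]


def is_var(term: str) -> bool:
    """Variables are written with a leading '?', as in SPARQL."""
    return term.startswith("?")


def match_triple(pattern: Triple, triple: Triple,
                 bindings: Bindings) -> Optional[Bindings]:
    """Extend `bindings` so that `pattern` matches `triple`, or return None."""
    result = dict(bindings)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or check it against an earlier binding.
            if result.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return result


def match_pattern(patterns: List[Triple], data: List[Triple],
                  bindings: Optional[Bindings] = None) -> Iterator[Bindings]:
    """Yield every set of bindings that satisfies all triple patterns."""
    bindings = {} if bindings is None else bindings
    if not patterns:
        yield bindings
        return
    first, rest = patterns[0], patterns[1:]
    for triple in data:
        extended = match_triple(first, triple, bindings)
        if extended is not None:
            yield from match_pattern(rest, data, extended)


# Hypothetical triple view of some employment data, for illustration only.
DATA = [
    ("alf", "ont:employer", "xyz"),
    ("alf", "ont:hourlyWage", "14.50"),
    ("bert", "ont:employer", "xyz"),
    ("bert", "ont:hourlyWage", "25.40"),
]
QUERY = [
    ("?person", "ont:employer", "?employer"),
    ("?person", "ont:hourlyWage", "?wage"),
]
results = list(match_pattern(QUERY, DATA))
```

Each entry in `results` is one row of the query result: a mapping from variable names to the values that make the whole pattern match.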
For the purpose of explanation I'm going to use some very simple sample data. This is the data being queried in CTM notation:
person http://www.networkedplanet.com/ontology/employment/person .
company http://www.networkedplanet.com/ontology/employment/company .
worksFor http://www.networkedplanet.com/ontology/employment/worksFor .
employer http://www.networkedplanet.com/ontology/employment/employer .
employee http://www.networkedplanet.com/ontology/employment/employee .
hourlyWage http://www.networkedplanet.com/ontology/employment/hourlyWage .
alf isa person hourlyWage 14.50 .
bert isa person hourlyWage 25.40 .
xyz isa company .
worksFor(employee: alf, employer: xyz)
worksFor(employee: bert, employer: xyz)
Mapping URIs to Topics
The first stage of the TMSPARQL process is to bind all URIs in the triples of the graph pattern to topics. A URI may be one of a small number of special predefined URIs, which are primarily used as predicates for explicit traversals of the topic map data model. Otherwise, the URI is used to look up topic map items by item identifier or, for topic items, by subject identifier. Under this rule it is possible for a single topic to be referenced by different resource URIs in a SPARQL query, but this has no negative effect on the construction or execution of queries. If no topic map item can be found for a particular URI, then the triple containing that URI cannot be matched. For example, given the query:
PREFIX ont: <http://www.networkedplanet.com/simpleOntology/>
SELECT ?person ?employer ?wage WHERE {
?person ont:employer ?employer .
?person ont:hourlyWage ?wage
}
only the predicates of the triples in the graph pattern are bound to URIs. The TMSPARQL engine must, then, find the topic map items identified by the URIs http://www.networkedplanet.com/simpleOntology/hourlyWage and http://www.networkedplanet.com/simpleOntology/employer.
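The lookup step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the TMSPARQL implementation: the `Topic` and `TopicIndex` classes and the `bind_uris` function are hypothetical names, only topics are modelled (not other topic map items), the item identifier `http://example.org/tm#employer` is invented for the example, and the special predefined URIs are left out because they are not listed here.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Topic:
    """Hypothetical stand-in for a topic item in the repository."""
    name: str
    subject_identifiers: FrozenSet[str] = frozenset()
    item_identifiers: FrozenSet[str] = frozenset()


class TopicIndex:
    """Resolves a URI to a topic via item identifier or subject identifier."""

    def __init__(self, topics: Iterable[Topic]) -> None:
        self._by_uri: Dict[str, Topic] = {}
        for topic in topics:
            for uri in topic.subject_identifiers | topic.item_identifiers:
                self._by_uri[uri] = topic

    def resolve(self, uri: str) -> Optional[Topic]:
        return self._by_uri.get(uri)


def bind_uris(uris: Iterable[str],
              index: TopicIndex) -> Dict[str, Optional[Topic]]:
    """Map each URI in a graph pattern to its topic, or None if unknown.

    A triple containing a URI that maps to None cannot be matched.
    """
    return {uri: index.resolve(uri) for uri in uris}


BASE = "http://www.networkedplanet.com/simpleOntology/"
index = TopicIndex([
    Topic("employer",
          subject_identifiers=frozenset({BASE + "employer"}),
          # Assumed item identifier, to show two URIs reaching one topic.
          item_identifiers=frozenset({"http://example.org/tm#employer"})),
    Topic("hourlyWage",
          subject_identifiers=frozenset({BASE + "hourlyWage"})),
])
bound = bind_uris(
    [BASE + "employer", BASE + "hourlyWage", BASE + "unknown"], index)
```

In this sketch both the subject identifier and the assumed item identifier resolve to the same `employer` topic, which mirrors the point that one topic may be reachable through more than one URI.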
Implicit Data Model Traversals
If the predicate of a triple in the SPARQL graph pattern does not match any of the predefined URIs for the explicit data model traversals, then there are a number of possible traversal options that the TMSPARQL processor must consider. In each case, the predicate specifies a typing topic that acts as a filter on the traversal.
Related Topics Traversal
The related topics traversal binds topic items to both the subject and the object of the triple. The predicate of the triple must be a role type. It specifies the type of the role played by the topic bound to the object, within an association in which the topic bound to the subject plays a role of any other type. In our example query, the triple ?person ont:employer ?employer executes a Related Topics Traversal because the topic with subject identifier http://www.networkedplanet.com/simpleOntology/employer is used as a role type in two associations in our sample data set.
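A rough sketch of this traversal, applied to the two worksFor associations from the sample data, might look like the following. The `Role` and `Association` types and the `related_topics` function are hypothetical names chosen for this illustration, and topics are represented by their short names rather than by real topic items.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class Role(NamedTuple):
    type: str    # role type, e.g. "employer"
    player: str  # topic playing the role


class Association(NamedTuple):
    type: str
    roles: Tuple[Role, ...]


def related_topics(associations: Iterable[Association],
                   role_type: str) -> Iterator[Tuple[str, str]]:
    """Yield (subject, object) pairs for a related topics traversal.

    The object plays a role of `role_type`; the subject plays a role of
    any other type in the same association.
    """
    for assoc in associations:
        for obj_role in assoc.roles:
            if obj_role.type != role_type:
                continue
            for subj_role in assoc.roles:
                if subj_role.type != role_type:
                    yield (subj_role.player, obj_role.player)


ASSOCIATIONS = [
    Association("worksFor", (Role("employee", "alf"), Role("employer", "xyz"))),
    Association("worksFor", (Role("employee", "bert"), Role("employer", "xyz"))),
]
pairs = list(related_topics(ASSOCIATIONS, "employer"))
```

Here `pairs` holds the candidate bindings for (?person, ?employer). Filtering on the role type gives each pair a direction: xyz only ever appears as the object, because it is the only player of the employer role.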
At first glance it may be unclear why the predicate is a role type and not an association type. The reason is that a topic map association can group more than two topics together. Using the association type as a filter implies no specific "direction" for the triple: any role player could be bound to either the subject or the object, and every possible combination would have to be returned by the match. In our example, using the association type as the predicate, as in ?person ont:worksFor ?employer, would result in the following matches: