Querying The DBpedia Open Knowledge Graph With Standard SQL
DBpedia is a crowd-sourced community project that extracts structured content from mainly Wikipedia pages — learn how to use it!
Join the DZone community and get the full member experience.
Join For FreeIntroduction to DBpedia
DBpedia is a crowd-sourced community project that extracts structured content from mainly partially unstructured and semi-structured parts of Wikipedia pages and other structured sources such a Wikimedia. The project maintains links to external sources and reciprocally many sources point to the latest DBpedia ontology/open knowledge graph (OKG).
You may also like: The Best Way to Write a SQL Query
The extracted/consolidated information from a holistic viewpoint represents an ontology/open knowledge graph (OKG), which in turn, also represents a knowledge-base containing at the same time class/property metadata and class/property instance data. DBpedia covers a multitude of domains as compared to typical limited domain-specific knowledge-bases and also, it includes a multitude of different international chapters/language communities.
On the other hand, timbr DBpedia represents a synergy between DBpedia + SQL. Permits the querying of the DBpedia ontology/Open Knowledge Graph (OKG) via standard SQL including also path expressions, avoiding the definition of complex joins and complex SPARQL queries, via a SPARQL End-point.
Anybody that feels comfortable with SQL can reuse its knowledge for querying/accessing the DBpedia OKG. Also, timbr supports creating ontologies in SQL and also supports loading existing ones like in this case, the DBpedia Ontology/OKG, which is written in OWL.
Timbr DBpedia Semantic Web Stack
timbr DBpedia supports RDF/RDFS/OWL ontologies via its virtual SQL compiler which fulfills the Semantic Web stack:
- Modeling, managing, querying ontologies via standard SQL
- Enabling graph traversals in standard SQL via path expressions as typically included in graph databases
- Permitting semantic inference at run-time via ontology rules
- Avoiding complex joins and complex SPARQL queries
- Supporting RDF/RDFS/OWL ontologies mapping them to backend SQL DBMSs.
- Extending SQL with inheritance (IS-A relationships allowing the definition of large subsumption hierarchies) and inference capabilities.
Timbr DBpedia Architecture
SQL clients and BI tools access via a Virtual SQL engine/query service (timbr supports JDBC/ODBC), the timbr DBpedia ontology that maps ontology artifacts to backend SQL DBMS’s. OKG Classes/Concepts/Properties are mapped to relational tables/columns via its “three-in-one”, timbr, etimbr, and dtimbr schemas (see section 5).
Properties may include direct properties, inherited properties (within the subsumption hierarchy) and references that are special attributes representing relationships to other classes/concepts (typically represented as URI’s/foreign keys in the virtual SQL engine).
The DBpedia OKG no longer is required to be accessed by complex SPARQL queries/SPARQL end-points. The DBpedia OKG can be queried with standard SQL. timbr DBpedia also includes an Ontology Viewer/Explorer so end-users can intuitively review the scope of a given sub-part of the OKG.
In this technical report, we focus on accessing the DBpedia OKG via SQL clients (in this case MySQL and Amazon Redshift). At the end of this report, we briefly mention the timbr DBpedia Ontology/OKG Explorer and the timbr DBpedia Data Exploration Beta component.
Timbr DBpedia SYSTEM TABLES
We briefly mention timbr-DBpedia system tables as the set of tables supporting the definition and querying of the DBpedia via standard SQL. It can be considered as the underlying "system catalog" table. Among them we can cite:
Timbr DBpedia System Tables:
- Timbr.SYS_CONCEPTS: contains unique timbr DBpedia ontology class/concepts.
- Timbr.SYS_INHERITANCE: contains base/derived class/concepts.
- Timbr.SYS_ONTOLOGY: contains the set of classes/concepts with the respective properties, primary keys, inherited primary keys, parent class/concepts, inheritance level and so on.
- Timbr.SYS_RELATIONSHIPS: includes a total of 8 columns, among them class/concept, the related class/concept, a foreign key name associated, etc.
Three-In-One: Timbr, Etimbr, Dtimbr Ontology Schemas
In timbr DBpedia there are three different viewpoints/perspectives of the ontology representation:
- Timbr schema: Implicit representation of the ontology. Includes explicit ontology/relational mapping.
- Etimbr schema: Exhaustive representation. Includes the timbr schema ontology artifacts plus derived class/concept properties. In other words, parent class/concepts include derived properties from derived classes/concepts within the subsumption hierarchy (for instance the “thing” class/concept will include all the DBpedia ontology/knowledge graph properties from where all class/concepts inherit entity_id and entity_type properties).
- Dtimbr schema: Dereferenced representation that includes the etimbr schema plus graph/hop traversals, avoiding unnecessary joins.
Timbr DBpedia Schema Queries
This particular DBpedia schema representation treats classes/concepts/properties as tables/columns where every class/concept have as a minimum, the following attributes (inherited from the parent ancestor ontology artifact “thing”):
- Entity_id: unique URI/primary key.
- Entity_type: self-explanatory, it is the type of ontology artifact.
Please note that timbr DBpedia permits multiple inheritances, therefore, entity_type may contain several entries. Also please note that the DBpedia ontology/Knowledge Graph may contain many class/concept property null values.
SELECT entity_label as `Book Title`, publisher as `Publisher URI`, publicationdate as `Publication Date`, numberofpages `Num. Pages`, isbn `ISBN`
FROM timbr.book
WHERE entity_label IS NOT NULL AND publisher IS NOT NULL AND publicationdate IS NOT NULL AND numberofpages IS NOT NULL AND isbn IS NOT NULL
AND entity_type='book'
ORDER BY publicationdate DESC
One can note that the first query row refers to invalid publication date. The rest seems to be valid date entries.
Etimbr DBpedia Schema Queries
Etimbr DBpedia Schema queries include timbr properties plus derived class/concept properties. Parent/ancestors' class/concepts include derived properties from derived classes/concepts within the subsumption hierarchy. Please note that the parent class/concept of all classes/concepts is "thing", therefore the "thing" class/concept will contain all the set of properties in the e-timbr schema.
Query Example:
The following query executes a UNION of “literarygenre” in “thing” class/concept and “genre” in “genre” concept:
xxxxxxxxxx
-- Retrieve distinct book ”genres” First Part Retrieves 9 distinct rows
SELECT distinct(thing.entity_label) as `Literary Genre`
FROM etimbr.book `b`
JOIN (SELECT entity_id, entity_label FROM etimbr.thing) as thing
ON `b`.genre = thing.entity_id
WHERE `Literary Genre` like '%fiction%'
UNION -- Retrieves a total of 11 distinct rows
-- Retrieve distinct book ”literarygenres” Second Part Retrieves 10 distinct rows
SELECT distinct(thing.entity_label) as `Literary Genre`
FROM etimbr.book `b`
JOIN (SELECT entity_id, entity_label FROM etimbr.thing) as thing
ON `b`.literarygenre = thing.entity_id
WHERE `Literary Genre` like '%fiction%'
Query Example:
All Ontology/OKG classes/concepts have as an ancestor “thing” concept, therefore, we can get the whole set of derived attributes via etimbr.thing schema. In this case, we obtain simple statistics returning the following:
- The number of classes/concept instances in timbr DBpedia ontology/OKG.
- The number of distinct class/concept types.
- The number of distinct entity_labels (property only accessible via etimbr schema (derived property)).
xxxxxxxxxx
SELECT count(entity_id) as `Num Concepts`, count(distinct entity_type) as `Num Entity Types` , count(distinct entity_label) as `Num Distinct Entity Labels`
FROM etimbr.thing
Query Example:
Now we obtain simple statistics by querying etimbr.thing (includes all instance types given that all classes/concepts have as ancestor concept thing):
- Grouping by class/concept type.
- The number of class/concept instances (person, place, company).
xxxxxxxxxx
SELECT entity_type as `Concept`, count(resource_id) as `Number of Concept Instances`
FROM etimbr.`thing`
WHERE entity_type in ('person','place','company')
GROUP BY entity_type
ORDER BY `Number of Concept Instances` DESC
This query presents an interesting viewpoint that needs to be further explained. If we define the following query:
"SELECT count(1) from timbr.person"
We retrieve 4339681 “person” concept instances because the timbr inference rules include instances that are mapped to derived classes/concepts of a person as well, as compared to the query depicted above specifically filtered person instances mapped directly to the “person” concept (2829415).
Dtimbr DBpedia Schema Queries
Dtimbr DBpedia schema: includes the etimbr schema plus graph traversals. Relationships to other class/concepts are modeled as properties, therefore avoiding complex joins and enabling path query expressions (typical in graph databases).
Query:
This query retrieves the book URI/ID, book title, author name, and thing literary genre where book title is about science fiction, or literary genre is about science fiction or the author is in a selected list of science fiction authors:
xxxxxxxxxx
SELECT `b`.entity_id as `Book URI`, `b`.entity_label as `Book Title`, `b`.`author.entity_label` as `author_name`, thing.entity_label as `Literary Genre`, `b`.entity_type as `entity_type`
FROM dtimbr.book `b`
LEFT JOIN (SELECT entity_id, entity_label FROM etimbr.thing) as thing
ON `b`.literarygenre = thing.entity_id
WHERE entity_type LIKE '%book%' AND
lower(`author_name`) IN ('isaac asimov','arthur c. clarke','orson scott card', 'robert a. heinlein') OR
lower(`Book Title`) LIKE '%science fiction%' OR
lower(`Literary Genre`) LIKE '%science fiction%'
ORDER BY `Literary Genre` ASC
Query:
Dtimbr query retrieves the number of person’s types grouped by continent and country (a total of three hops) and entity_type:
xxxxxxxxxx
SELECT `birthplace.country.continent.entity_label` as `Birth Continent`, `birthplace.country.entity_label` as `Birth Country`, entity_type as `Person Type`, count(entity_type) as `Number of Persons Type`
FROM dtimbr.person
WHERE `Birth Country` in ('Australia','Spain','United States of America','Peru')
-- OR `birthplace.country.continent.entity_label` in ('Europe')
GROUP BY `Birth Continent`, `Birth Country`, entity_type
HAVING `Number of Persons Type` > 10
ORDER BY `Birth Country` ASC, `Number of Persons Type` DESC
Timbr DBpedia Ontology Explorer
Timbr DBpedia models the ontology/OKG with concepts, properties and references mapped to:
- Concepts: ontology concepts are mapped to relational tables.
- Properties: concept properties are mapped to columns in relational tables and may be of type typical SQL data types such as varchar, integer, timestamp, etc. depending on the underlying back-end DBMS (i.e. Amazon Redshift, Apache Spark, etc.). Properties include inherited properties (from parent/ancestor concepts), direct properties (defined for a given concept).
- References: represent relationships between ontology concepts and are mapped to SQL foreign keys in relational tables. The SQL data type of reference is varchar representing an ontology URI.
Timbr DBpedia supports complex subsumption hierarchies via IS-A relationships (up-to 8 levels allowing multiple inheritances), properties and references to other concepts. The Ontology Explorer contains the following parts:
- Ontology Tree: situated on the left-hand side displays the subsumption hierarchy of a given concept.
- Ontology Graph Viewer/Explorer: on the center of the screen, displays the entire ontology sub-part/view.
- Ontology Concept Details: displays a summary of information for the selected concept, including the direct ancestor, the hierarchical level, number of total properties, direct and inherited references, inherited and direct properties.
- Ontology Main Graph Control: allows different options for controlling the output of the ontology explorer such as including references/relationships, properties and so on.
In the ontology tree, we can see the “book” concept hierarchy, which is in level 4. In the Ontology graph Viewer/Explorer (center of the screen) we have included property/references displaying author, writer, genre, and literarygenre. On the right-hand side, in the ontology concept details, we can see that the concept book has a total of 126 properties, 6 direct references, 69 inherited references, 15 direct properties, and 111 inherited properties.
Timbr DBpedia Data Exploration Beta
We can access the new Beta Ontology/Data exploration menu item by selecting Exploration/Data Exploration beta, which firstly displays a dialog box that demands us to enter the required back-end DBMS. In this case, we can select either DBpedia Amazon Redshift or DBpedia Spark. After selecting DBpedia Spark we have to choose the desired Ontology/OKG class/concept.
A set of concept properties is required to fill-in and the required filters and results to be retrieved. For the time being, this Beta does not allow to define a filter that allows path expressions. Nevertheless, we can do so via the Concept details Window section, step-by-step incrementally. According to timbr.ai, this feature will be incorporated later in the Beta program.
In Figure 5 we can identify the Data Exploration Beta parts:
- Data Exploration Concept Viewer: Situated on the left-hand side of the screen, displays concepts involved in the OKG.
- Data Exploration Ontology/OKG Viewer: Situated in the middle of the screen, we can identify several books with some relationships exploded such as book author and author details such as birthplace and death place.
- Data Exploration Concept Details Viewer: On the right-hand side of the screen, displays concept/properties details, giving the possibility of graph traversal with the related/references to other concepts.
On the captured screen below of the Data Exploration Beta timbr DBpedia component, we have “exploded” different “book”, ”author” (person) concepts, which have been added incrementally by clicking on the “path traverse” button. A total of 11 related concepts are involved, including book, person, place, country, continent, language, agent, ethnicgroup, populatedplace, award and so on.
Conclusion
With timbr-DBpedia, we can analyze/traverse the DBpedia Ontology/OKG with standard SQL that includes the set-up of path expressions avoiding cumbersome joins and avoiding complex SPARQL queries/SPARQL end-points. Anybody that has experience working with standard SQL will feel comfortable issuing timbr DBpedia queries.
Timbr DBpedia is scalable as much as the used back-end. In this technical report, we have been working with timbr DBpedia Amazon Redshift for the OKG, for system tables we have been querying a MySQL backend and for the timbr DBpedia Data Exploration Beta we have been working with DBpedia Spark.
Further Reading
Opinions expressed by DZone contributors are their own.
Comments