Exploring DBpedia Through SPARQL
A Practical Guide to Understanding Ontology Structure and Data Patterns in the Wild
DBpedia remains one of the richest publicly available Knowledge Graphs derived from Wikipedia content. Its structure gives a unique window into the shape of real-world data on the Web: entity types, properties, hierarchies, and semantic relationships.
This article explores DBpedia using a sequence of SPARQL queries, each designed to highlight a specific pattern or semantic capability. Every query comes with a short description, the query text, a link to run it, and a note on why it is useful.
1. Entity Types and Representative Instances
This query lists classes (types) in DBpedia, shows a sample instance for each, and counts how many entities belong to that type.
Query
SELECT ?entityType (SAMPLE(?entity) AS ?sampleEntity) (COUNT(*) AS ?count)
WHERE {
?entity a ?entityType .
}
GROUP BY ?entityType
ORDER BY DESC(?count)
Run it
Why It’s Useful
This is a quick way to see which types DBpedia actually contains and how many instances each type has.
It highlights:
It also provides a compact sanity check before doing deeper ontology or property exploration.
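The queries in this article can also be sent from a script instead of the browser. The sketch below is one way to do that with Python's standard library, following the SPARQL 1.1 Protocol conventions (a query parameter on a GET request and an Accept header asking for JSON results). It assumes the public endpoint at https://dbpedia.org/sparql; the helper names build_query_url and run_query and the added LIMIT 20 are illustrative choices, not part of the original query.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public DBpedia SPARQL endpoint.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

# The type-count query from this section, with a LIMIT added for this sketch.
TYPE_COUNT_QUERY = """
SELECT ?entityType (SAMPLE(?entity) AS ?sampleEntity) (COUNT(*) AS ?count)
WHERE { ?entity a ?entityType . }
GROUP BY ?entityType
ORDER BY DESC(?count)
LIMIT 20
"""


def build_query_url(query, endpoint=DBPEDIA_ENDPOINT):
    """Encode the query as the `query` parameter of a GET request."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_query(query, endpoint=DBPEDIA_ENDPOINT, timeout=60):
    """Send the query and return the list of result rows (JSON bindings)."""
    request = urllib.request.Request(
        build_query_url(query, endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]


# Example call (needs network access, so it is left commented out):
# for row in run_query(TYPE_COUNT_QUERY):
#     print(row["entityType"]["value"], row["count"]["value"])
```

Each returned row is a dictionary keyed by variable name, so row["entityType"]["value"] holds the class IRI. The call is subject to whatever time and row limits the endpoint applies.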
2. SubProperty/SuperProperty Exploration (Random Representative Start Points)
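This query ranks super-properties by how many typed sub-properties point at them, keeps the top five, picks one representative sub-property for each with SAMPLE, and then uses the rdfs:subPropertyOf* path to confirm that each pair is connected in the transitive hierarchy. Note that ?usageCount counts matching sub-property rows in the ontology, not how often a property appears in instance data.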
Query
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT *
WHERE {
  {
    SELECT ?superProperty ?subProperty
    WHERE {
      {
        SELECT (?c AS ?superProperty) (SAMPLE(?a) AS ?subProperty) (COUNT(*) AS ?usageCount)
        WHERE {
          ?a rdfs:subPropertyOf ?c .
          ?a a ?type .
          FILTER (?type IN (owl:ObjectProperty, rdf:Property))
        }
        GROUP BY ?c
        ORDER BY DESC(?usageCount)
        LIMIT 5
      }
    }
  }
  ?subProperty rdfs:subPropertyOf* ?superProperty .
}
LIMIT 100
Run it
Why It’s Useful
This demonstrates:
This is particularly useful when mapping DBpedia’s ontology to external ontologies or evaluating property alignment for integration tasks.
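The innermost subquery is a group, count, sample, and top-N step. As a rough illustration of what that step computes, here is a minimal Python sketch over a handful of made-up (sub-property, super-property) pairs. The ex: names and the function top_super_properties are hypothetical, and SAMPLE is modelled as "take the first member", whereas a real engine may return any member of the group.

```python
from collections import defaultdict

# Hypothetical (sub-property, super-property) pairs; not real DBpedia data.
SUB_PROPERTY_OF = [
    ("ex:mother", "ex:parent"),
    ("ex:father", "ex:parent"),
    ("ex:parent", "ex:relative"),
    ("ex:spouse", "ex:relative"),
    ("ex:sibling", "ex:relative"),
    ("ex:birthPlace", "ex:location"),
]


def top_super_properties(pairs, limit):
    """Mimic GROUP BY ?c, COUNT(*), SAMPLE(?a), ORDER BY DESC(count), LIMIT."""
    groups = defaultdict(list)
    for sub, sup in pairs:
        groups[sup].append(sub)
    ranked = sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)
    # SAMPLE may return any group member; this sketch simply takes the first.
    return [(sup, subs[0], len(subs)) for sup, subs in ranked[:limit]]


top = top_super_properties(SUB_PROPERTY_OF, limit=2)
print(top)
```

With this toy data, ex:relative ranks first with three direct sub-properties and ex:parent second with two.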
3. SubProperties Using the + Property Path Operator (Strict Descendants Only)
This query selects the super-property with the most direct sub-properties and retrieves all of its sub-properties at any depth, keeping only those reachable through at least one rdfs:subPropertyOf step.
Query
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT (?c AS ?superProperty) (?a AS ?subProperty)
WHERE {
  {
    SELECT ?c
    WHERE {
      ?a rdfs:subPropertyOf ?c .
      ?a a ?type .
      FILTER (?type IN (owl:ObjectProperty, rdf:Property))
    }
    GROUP BY ?c
    ORDER BY DESC(COUNT(?a))
    LIMIT 1
  }
  ?a rdfs:subPropertyOf+ ?c .
}
LIMIT 500
Run it
Why It’s Useful
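The + operator matches paths of one or more steps, so only proper descendants are returned and the starting property itself is excluded (unless the hierarchy contains a cycle). The * operator, by contrast, also matches the zero-length path and therefore includes the starting property.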
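The difference between one-or-more (+) and zero-or-more (*) steps can be checked on a toy hierarchy. The following Python sketch is only an illustration of the path semantics, not of how a SPARQL engine evaluates them; the ex: property names and both function names are hypothetical.

```python
# Hypothetical rdfs:subPropertyOf edges (child -> parents); not real DBpedia data.
PARENTS = {
    "ex:mother": ["ex:parent"],
    "ex:father": ["ex:parent"],
    "ex:parent": ["ex:relative"],
}


def ancestors(prop, parents):
    """Everything reachable from prop via one or more subPropertyOf steps."""
    seen = set()
    stack = list(parents.get(prop, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def sub_properties(target, parents, include_self):
    """Model `?a rdfs:subPropertyOf+ target` (False) or `*` (True)."""
    nodes = set(parents) | {p for ps in parents.values() for p in ps}
    found = {n for n in nodes if target in ancestors(n, parents)}
    return found | {target} if include_self else found


plus_result = sub_properties("ex:relative", PARENTS, include_self=False)
star_result = sub_properties("ex:relative", PARENTS, include_self=True)
print(sorted(star_result - plus_result))  # ['ex:relative']
```

The only element that * adds over + in this acyclic example is the starting property itself.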
This is ideal for:
4. SubProperties Using the {2} Property Path Operator
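This query returns properties that sit exactly two rdfs:subPropertyOf steps below the super-property with the most direct sub-properties. The {2} quantifier is not part of the SPARQL 1.1 Recommendation, although DBpedia's Virtuoso endpoint accepts it; the portable equivalent is the sequence path rdfs:subPropertyOf/rdfs:subPropertyOf.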
Query
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT (?c AS ?superProperty) (?a AS ?subProperty)
WHERE {
  {
    SELECT ?c
    WHERE {
      ?a rdfs:subPropertyOf ?c .
      ?a a ?type .
      FILTER (?type IN (owl:ObjectProperty, rdf:Property))
    }
    GROUP BY ?c
    ORDER BY DESC(COUNT(?a))
    LIMIT 1
  }
  ?a rdfs:subPropertyOf{2} ?c .
}
LIMIT 500
Run it
Why It’s Useful
The {2} operator gives you a controlled look at mid-depth ontology structure.
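An exact-depth path can be pictured as walking down the hierarchy a fixed number of levels. The Python sketch below illustrates that idea on made-up edges; the ex: names and the function exactly_n_steps_below are hypothetical and only model the semantics of a fixed-length path.

```python
# Hypothetical rdfs:subPropertyOf edges (child, parent); not real DBpedia data.
EDGES = [
    ("ex:mother", "ex:parent"),
    ("ex:father", "ex:parent"),
    ("ex:parent", "ex:relative"),
    ("ex:spouse", "ex:relative"),
]


def exactly_n_steps_below(target, edges, n):
    """Properties linked to target by a chain of exactly n subPropertyOf steps."""
    frontier = {target}
    for _ in range(n):
        # Move one level down: keep children whose parent is in the frontier.
        frontier = {child for child, parent in edges if parent in frontier}
    return frontier


two_below = exactly_n_steps_below("ex:relative", EDGES, 2)
print(sorted(two_below))  # ['ex:father', 'ex:mother']
```

Direct children such as ex:parent and ex:spouse are left out because they are one step away, not two.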
This is helpful for:
5. Two-Hop SubProperty Exploration for the Top 10 Super-Properties
This expands the previous pattern to explore multiple major super-properties simultaneously.
Query
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT (?c AS ?superProperty) (?a AS ?subProperty)
WHERE {
  {
    SELECT ?c
    WHERE {
      ?a rdfs:subPropertyOf ?c .
      ?a a ?type .
      FILTER (?type IN (owl:ObjectProperty, rdf:Property))
    }
    GROUP BY ?c
    ORDER BY DESC(COUNT(?a))
    LIMIT 10
  }
  ?a rdfs:subPropertyOf{2} ?c .
}
LIMIT 100
Run it
Why It’s Useful
This gives a broader view of the DBpedia Knowledge Graph by applying the two-hop pattern to ten super-properties in a single query.
You can quickly spot:
6. Property Usage and Dominance
This query counts the usage of every property (predicate) in the entire knowledge graph. It is the most direct way to discover which relationships form the backbone of DBpedia.
Query
SELECT ?p (COUNT(*) AS ?usageCount)
WHERE { ?s ?p ?o }
GROUP BY ?p
ORDER BY DESC (?usageCount)
Run it
Why It’s Useful
This query provides a high-level statistical overview of the graph's structure. It answers the question: "What are the most common facts stored in DBpedia?"
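The counting step is a plain group-by on the predicate position. As a small illustration, the Python sketch below does the same over a few invented triples; the data is hypothetical and exists only to show the shape of the result.

```python
from collections import Counter

# Hypothetical triples (subject, predicate, object); not real DBpedia data.
TRIPLES = [
    ("ex:Alice", "rdf:type", "ex:Person"),
    ("ex:Alice", "rdfs:label", "Alice"),
    ("ex:Bob", "rdf:type", "ex:Person"),
    ("ex:Bob", "ex:knows", "ex:Alice"),
    ("ex:Bob", "rdfs:label", "Bob"),
    ("ex:Carol", "rdf:type", "ex:Person"),
]

# GROUP BY ?p with COUNT(*), then ORDER BY DESC(?usageCount).
usage = Counter(predicate for _, predicate, _ in TRIPLES)
ranking = usage.most_common()
print(ranking)  # [('rdf:type', 3), ('rdfs:label', 2), ('ex:knows', 1)]
```

On real data the top of such a ranking tends to be dominated by typing and labelling predicates, which is exactly what the query makes visible.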
It helps you immediately identify:
7. Top-5 Property Hierarchies by Usage and Transitive Closure
This advanced query combines statistical analysis with ontology traversal. It first identifies the five most-used properties in the entire graph, calculates each one's usage count and its percentage share of all triples, and then finds their sub-properties at any depth. Because the path uses the * operator, each starting property is also returned as its own sub-property.
Query
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?startProperty ?usageCount ?usagePercent ?subProperty ?superProperty
WHERE {
  #####################################################################
  # 1. Determine the 5 most-used properties (global property ranking)
  #####################################################################
  {
    SELECT ?startProperty ?usageCount ?usagePercent
    WHERE {
      # Compute usage count per property
      {
        SELECT ?p (COUNT(*) AS ?usageCount)
        WHERE { ?s ?p ?o }
        GROUP BY ?p
      }
      # Count all triples once; this is the denominator for the percentage
      {
        SELECT (COUNT(*) AS ?totalCount)
        WHERE { ?s ?p ?o }
      }
      BIND(?p AS ?startProperty)
      # 100.0 keeps the result fractional on engines that truncate integer division
      BIND((100.0 * ?usageCount / ?totalCount) AS ?usagePercent)
    }
    ORDER BY DESC(?usageCount)
    LIMIT 5
  }
  #####################################################################
  # 2. Use the ranked properties as starting points of closure
  #####################################################################
  ?subProperty rdfs:subPropertyOf* ?startProperty .
  BIND(?startProperty AS ?superProperty)
}
LIMIT 200
Run it
Why It’s Useful
This query connects the statistical backbone of the knowledge graph (the most used properties) with its semantic structure (the property hierarchies).
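The percentage step is simple arithmetic, but it is sensitive to integer division. The Python sketch below works through it on invented counts; the figures and the function usage_share are hypothetical, and the truncated column is included only to show what an engine that divides integers as integers would report.

```python
from collections import Counter

# Hypothetical predicate usage counts; not real DBpedia figures.
USAGE = Counter({"rdf:type": 5, "rdfs:label": 2, "ex:knows": 1})


def usage_share(usage, top_n):
    """Return (property, count, percent, truncated_percent) for the top_n properties."""
    total = sum(usage.values())
    rows = []
    for prop, count in usage.most_common(top_n):
        percent = 100.0 * count / total   # fractional share of all triples
        truncated = 100 * count // total  # what integer-only division would give
        rows.append((prop, count, percent, truncated))
    return rows


shares = usage_share(USAGE, top_n=2)
print(shares)  # [('rdf:type', 5, 62.5, 62), ('rdfs:label', 2, 25.0, 25)]
```

Multiplying by a decimal literal such as 100.0 in the query serves the same purpose as the floating-point division here.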
This allows you to:
Related Links
[1] About the DBpedia 2025-06 release: https://www.garudax.id/posts/openlink-software_dbpedia-knowledgegraphs-ai-activity-7395189553558552576-EBbs
[2] DBpedia AMI for the AWS Cloud: https://www.garudax.id/posts/openlink-software_dbpedia-knowledgegraph-linkeddata-activity-7395241299295043586-a7EC