Using KnowledgeGraphs to create a basis for an Ontology / Object Type Library
You know those conversations that spark all kinds of ideas? Had one of those with a potential client on Friday. Since it was our first get-to-know-each-other, I'll forgo the usual namedropping, but still wanted to share this with you.
This specific party has extensive experience with BIM, digital information management and that sort of thing. They feel they are ready to start exploring the world of #LinkedData, #KnowledgeGraphs and other #AI-ready #DataDriven workflows.
However, the question was: where to start? They had been talking to several parties about utilizing this type of technology. And something weird happened. All these extremely advanced solutions required a very long, very manual and thus very expensive process of defining a data structure for their Ontology and Object Type Library as a basis to start with. After creating this, it would be possible to use #LinkedData principles to link the OTL to 3D models, thus creating a bridge between the 3D model and other data sources.
What's an Ontology?
For those of you who don't know: an Ontology is basically a lexicon of all definitions you would find in a specific domain (in our case a building model), along with their relationships and properties. For efficiency purposes an Ontology is usually organized as a tree, where children can inherit properties from their parents. In a sense you could call (part of) the IFC schema an Ontology: it describes the types of objects you can have and their relationships. The whole geometry part usually isn't defined in an ontology, but for the rest: pretty solid example.
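If you like to think in code: a toy sketch of such a tree with property inheritance could look like this. The hierarchy and properties below are made up for illustration, they're not the real IFC schema:

```python
# Minimal sketch of an ontology tree with property inheritance.
# The class hierarchy and properties are illustrative, NOT real IFC.
ontology = {
    "IfcElement":  {"parent": None,         "properties": {"GlobalId": "string"}},
    "IfcWall":     {"parent": "IfcElement", "properties": {"IsExternal": "boolean"}},
    "IfcWallType": {"parent": "IfcWall",    "properties": {"FireRating": "label"}},
}

def all_properties(name):
    """Walk up the tree, collecting inherited properties along the way."""
    props = {}
    while name is not None:
        node = ontology[name]
        props = {**node["properties"], **props}  # children override parents
        name = node["parent"]
    return props

print(all_properties("IfcWallType"))
# The wall type ends up with its own FireRating plus the inherited
# IsExternal and GlobalId from its ancestors.
```

That inheritance is exactly why the tree shape pays off: you define GlobalId once at the top instead of on every single leaf.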
The IFC Ontology describes an IfcWallType as shown in the image below.
And this WallType also has another level of subtypes in the form of Enumerations:
It has a bunch of Properties defined:
And its relations to other definitions within the Ontology:
And what's an Object Type Library?
When you describe an Ontology as the overall "data structure", an Object Type Library (OTL) would be the step where you populate that data structure. So in the case of IfcWallType.SOLIDWALL, there could be a set of Object Types like:
For each of these types you define the type-specific Property values. Does that sound an awful lot like a library of 3D components? Well, that's because it is. Sort of. There's a lot of stuff in a 3D object like a Revit Family that's not supposed to be in your OTL. Like all the parametrics. Or the geometry. And there are probably Properties in an OTL that don't belong in your 3D model because they cater to the information needs of other processes, such as environmental impact or replacement costs. But there most definitely is an overlap. If you model it, it should also be in your OTL.
However: not everything in your OTL has to be in your model. It's a data structure. So especially the aggregations closer to the top level often are more conceptual.
Similarly, not everything in your Ontology has to be mirrored in your OTL. An Ontology can hold broader concepts than just stuff that you can touch. For instance Space definitions, Building Systems, etc. But also Actors or Roles.
Why do people want this?
The promise of Linked Data is that you can freely exchange information between linked applications and databases. This allows all kinds of automated data flows and could potentially eliminate the need to manually update several applications when something changes. So if your maintenance crew replaces a faulty pump, this gets automatically updated in your FMIS, BMS, ERP, PMP, DMS and whatever other acronym you're running to keep your digital building information in.
However: in order to do this, each of those acronyms would have to know what the other acronyms call a Pump. And which Property needs to be updated with the new Date Of Installation, Manufacturer and Model information.
That's what Ontologies and OTLs do. They provide the software with a manual on how to interpret data coming from another application and correctly process (or ignore) it.
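To make that "manual" a bit more concrete: you can picture it as a mapping with the shared Ontology term as the pivot, and each application's own field names hanging off it. All system and field names below are invented, purely for illustration:

```python
# Hypothetical field mapping: the Ontology term is the pivot,
# each application keeps its own vocabulary. All names are invented.
PUMP_MAPPING = {
    "InstallationDate": {"FMIS": "install_date", "ERP": "DateInstalled"},
    "Manufacturer":     {"FMIS": "vendor",       "ERP": "MfgName"},
}

def translate(record, source, target):
    """Rename a record's fields from one application's vocabulary to another's."""
    out = {}
    for onto_term, names in PUMP_MAPPING.items():
        if names[source] in record:
            out[names[target]] = record[names[source]]
    return out

print(translate({"install_date": "2024-05-01", "vendor": "Acme"}, "FMIS", "ERP"))
# {'DateInstalled': '2024-05-01', 'MfgName': 'Acme'}
```

Real Linked Data implementations do this with URIs and shared vocabularies rather than a hard-coded dict, but the principle is the same: one agreed-upon pivot, many local dialects.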
How would we approach it?
Coming up with an Ontology and/or OTL to accommodate every process is extremely difficult and time-consuming. It can take many months, even years, and be very, very expensive. So when the question came "how would bimforce approach this task?", it got me thinking.
Why would an "Intelligent" solution require you to manually define your data structure from scratch? Especially when you have a history of dozens of (BIM) projects where you already produced 3D models that have been used for building (and sometimes even maintaining) a building. Shouldn't you be able to simply ask your existing data what you (apparently) need and take it from there? Isn't this type of insight exactly the type of thing we, as well as our esteemed colleagues, are selling?
In order to test this, I started asking our test project some questions...
Q&A with our IfcGraph
First thing I wanted to do: get a sense of our sample size. So I asked if I could get an overview of:
The result is promising. 1109 lines that define and rank the most commonly used properties per IfcClass. The top section (for IfcBeams) looks like this:
This tells me a couple of things:
*Would be lovely to show you how you can use your KnowledgeGraphs to quickly identify these objects, allowing you to fix it. But that's for another time.
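For those curious what such a ranking boils down to: stripped of all the graph machinery, it's essentially counting (IfcClass, Property) pairs. A toy version in Python, on invented data, not the actual graph query:

```python
from collections import Counter

# Toy stand-in for the query result: (IfcClass, PropertyName) pairs
# as they might come back from the KnowledgeGraph. Data is invented.
rows = [
    ("IfcBeam", "LoadBearing"), ("IfcBeam", "LoadBearing"),
    ("IfcBeam", "Reference"),
    ("IfcWall", "IsExternal"),
]

property_counts = Counter(rows)
# Rank per class: most frequently used properties first
ranking = sorted(property_counts.items(), key=lambda kv: (kv[0][0], -kv[1]))
for (ifc_class, prop), count in ranking:
    print(f"{ifc_class}\t{prop}\t{count}")
```

In the real setup this counting happens inside the graph query itself; the point is just that the "insight" is an aggregation over data you already have.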
Diving deeper
So for the first iteration of getting a more detailed view, I tried to just look at the IfcName of all objects. This returns the following overview, which shows us something: there's a distinct difference between the information added to wooden beams (the first 6 rows) and steel beams (the rows that follow). Second thing I noticed: this returns 6782 lines, which is way too much. This is caused by Revit being Revity (and the person exporting not knowing which checkboxes to tick): a lot of Beams have the Revit ID added to their IfcName, making them unique.
So we need a subtype of Beams to differentiate. But, as most people do, the information structure differs somewhat across projects. So we really can only use the name. Luckily for us, graph databases support all kinds of string operations when querying. So for the next iteration, I added some grouping based on Material names that might be part of the IfcName Attribute. Besides this, I've added an "Other" option to catch all other beam subtypes:
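The grouping itself is nothing magical. Stripped of the query syntax, it's substring matching with a fallback, something like this sketch (the keywords and beam names are made up):

```python
# Sketch of the material grouping: bucket beams by substrings in
# their IfcName, with "Other" as the catch-all.
# Keyword list and example names are illustrative only.
MATERIALS = ["wood", "timber", "steel", "concrete"]

def beam_subtype(ifc_name):
    lowered = ifc_name.lower()
    for material in MATERIALS:
        if material in lowered:
            return material
    return "Other"

names = ["Timber Beam 90x200", "Steel HEA200 [123456]", "Mystery Beam"]
print([beam_subtype(n) for n in names])
# ['timber', 'steel', 'Other']
```

Note that even the Revit-ID-polluted names still land in the right bucket, because we only look for the material substring and ignore the rest.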
Cleaning up some more by removing properties we don't care about (more IFC export stuff) and the Properties we already defined on the Beam toplevel provides us with this list:
As you can see, this still leaves some "noise": Properties that are either flat out wrong (somebody used a different naming for the IfcGuid parameter), not appropriate (test parameters and such) or just not "standard" properties. But there's also missing data. Some of these properties should have been in all Beams. This is a limited dataset, but with larger sets, more projects and more repetition we could start seeing patterns.
First off, let's look at the values that are applied for these parameters. This would help us create an enumerated list of allowed values (when desired). It also might bring some more insight:
First thing to notice: the Revit geometry engine is kind of crappy. What you see with the numerical values is Revit being Revity. Let's fix that with some rounding, just because it's annoying. Also, we're going to order the values a bit.
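That rounding step is trivial to sketch outside the graph as well. The values below are invented, but they show how near-identical floats collapse into one entry:

```python
from collections import Counter

# Cleanup sketched in Python: round the near-identical floating point
# values Revit produces, then count and order what's left.
# The raw values are invented examples of "Revit being Revity".
raw_lengths = [199.99999999, 200.0000001, 200.0, 240.0]
rounded = [round(v, 3) for v in raw_lengths]
length_counts = Counter(rounded)
for value, count in sorted(length_counts.items()):
    print(value, count)
# The three almost-200 values collapse into a single 200.0 entry.
```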
What's next?
Now we have defined a basic Ontology for Beams for review:
Will this need adjusting? Heck yeah. For starters, it's a huge simplification of an Ontology. But it's not meant to be perfect. I'm sorry, but especially cross-project you will find it is very difficult to be consistent. For instance: it's easy to spot that the Assembly Code (main classification parameter) should have been on the top level. Every beam needs to be classified. The reason it's not there is quite simple: I'm looking at multiple models and they all have the same "weight", so crappy models cancel out good models. Of course, this too could be taken into account. This example doesn't, but we could start weighing the results based on the project they come from. Point is: you will most definitely need to vet this. But you have a starting point. You can reuse the (building) data you've accumulated and turn it into Knowledge. Hence the term KnowledgeGraph.
We can expand this by querying the IFC models for other relationships. That would lead us to discover that Beams are attached to a BuildingStorey. That they have a Material definition. That they can belong to structural assemblies.
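In graph terms, those relationships are just more edges to traverse. A toy version with an in-memory edge list (the names and relationship types are illustrative, not actual IFC relationship entities):

```python
# Toy graph of IFC-style relationships: which storey a beam sits on,
# which material it has. Structure and names are illustrative only.
edges = [
    ("Beam_01", "CONTAINED_IN", "Storey_02"),
    ("Beam_01", "HAS_MATERIAL", "Steel S235"),
    ("Beam_02", "CONTAINED_IN", "Storey_02"),
]

def related(subject, predicate):
    """Follow one relationship type outward from a node."""
    return [o for s, p, o in edges if s == subject and p == predicate]

print(related("Beam_01", "HAS_MATERIAL"))
# ['Steel S235']
```

A real graph database indexes this so you never scan the full edge list, but the mental model is the same: the relationships you discover become candidate structure for the Ontology.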
Going back to the different ObjectTypes we can also define an OTL for Beams based on the output we have. We could look at different Object Types defined across all buildings and apply them to this ontology. We can analyse which data is missing and add that.
But it doesn't end with 3D models. That's just one source of building information. Feed in a couple of years' worth of maintenance ticketing data and the Graph will tell you what information you want to have when something breaks. Or your predictive maintenance data, and find out which properties you need when conducting a maintenance survey. Cross-reference those with the Knowledge you gained from your 3D model and you can not only update your Ontology and OTLs but also improve the information quality of your 3D models.
So what is our approach for setting up an Ontology and/or OTL?
We look at your data. Because if you did it somewhat right, your digital footprint should tell us what you want to know. Once we've established that, we can still discuss the overall data structure and hierarchy, add aggregation levels, conceptualize and whatnot.
But it's always easier to solve a puzzle that you know the answer to. And honestly: the first version of something as complex as an Ontology is always flawed. As a great philosopher once said "Everybody has a plan until they get punched in the face". Truth is you will get punched in the face by reality. Your data won't be flawless. There will be things you don't consider. So the trick is to create a workflow that allows you to incrementally improve and expand your Ontology.
So instead of locking ourselves in a meeting room for 6-18 months with a truckload of coffee, trying to capture reality in a data model out of thin air, we spend a couple of weeks looking at what you already have. And build a logical structure on top of that. After all: we have the tools to do that. We sell them. Why not use them from the start?
I would generalize the approach described here, feed in a bunch of models and see what pops out as a basis. If you have it, I would do the same with any other data source in your organization. I would try to cross-reference them and compile an aggregated view of all those data sources.
Still doesn't constitute an Ontology though
That would be correct. But that's where our great friends from buildingSMART International come in. They developed this thing called the buildingSMART Data Dictionary (bSDD). This is an Open Standard that allows you to define Ontologies and OTLs. Now, strictly speaking, the bSDD is not an Ontology by itself, because its primary goal is to provide Data Dictionaries, which are more like Type Libraries. Basically: it will allow you to define all types of Properties, but doesn't really support conceptual super-types such as Assemblies (for instance).
However: it's an Open Standard. You can do whatever you want. If you want to define conceptual super-types and relationships, nobody is stopping you. In fact: the bSDD also holds IFC which, for all intents and purposes, is an Ontology. So it most definitely can be done.
And the nice thing is: it's a JSON format. So I would probably take my basic Ontology and export it to a .bsdd file. Which I can then import in a free editor (such as the one ACCA software EN has) to further expand it as needed. And yes, this would be the part where we still need a truckload of coffee to discuss and define a data structure for your Ontology. Especially the conceptual parts. But it's going to be a lot less coffee.
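A very stripped-down sketch of what that export could look like. Fair warning: the field names below are illustrative only; the actual bSDD import model defines its own required fields and codes, so check the official documentation before building anything real:

```python
import json

# Very simplified sketch of serializing a basic Beam ontology to a
# bSDD-style JSON payload. Field names are illustrative; the real
# bSDD import model has its own required fields.
ontology = {
    "DictionaryName": "Example OTL",
    "Classes": [
        {
            "Code": "beam",
            "Name": "Beam",
            "Properties": [
                {"Code": "loadBearing", "DataType": "Boolean"},
                {"Code": "material",    "DataType": "String"},
            ],
        }
    ],
}

payload = json.dumps(ontology, indent=2)
print(payload)  # in practice you'd write this to a .bsdd.json file
```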
Once done, we can re-import it as a separate KnowledgeGraph to analyse future projects with. Similar to the approach described here, you can analyse new building data to spot the differences between your OTL and the model components.
By the way, future projects can use your bSDD Ontology while doing their modelling. A bSDD can be made publicly available through the bSDD website.
There is a free and Open Source plugin available for Revit on GitHub to connect to the bSDD and link model components to object types. When exported to IFC, this link can be preserved. There are also several tools available to link IFC files to the bSDD. Again, ACCA software EN has a nice tool.
Importing this IFC back into your KnowledgeGraph will allow you to link directly back to your Ontology. And see which components have not yet been defined, creating a self-improving feedback loop.
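That "which components are not yet defined" check is, at its core, a set difference between what the models contain and what the OTL covers. A toy sketch (all type names invented):

```python
# Feedback loop sketch: compare the object types found in incoming
# models against what the OTL already defines. Names are invented.
otl_types = {"Beam.Timber", "Beam.Steel", "Wall.Solid"}
model_types = {"Beam.Steel", "Beam.Concrete", "Wall.Solid"}

undefined = sorted(model_types - otl_types)
print(undefined)
# ['Beam.Concrete'] -> a candidate to add to the OTL
```

Each run either confirms the OTL covers the model, or hands you a concrete to-do list, which is exactly the self-improving loop described above.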
But yeah, let's start by looking at what you already Know.
Sound interesting?
Are you sitting on a pile of data, trying to figure out what to do with it? Trying to get insights on ways to improve your digital information management? Trying to connect and integrate different data sources through Linked Data concepts? Get in touch and we'll be happy to help you out.
Or just want to have a (digital) cup of coffee and see what else we're up to? Just drop a line and we'll set something up.