Graphs are to Multi-Cellular Organisms as Databases are to Bacteria
I’ve been reading The Song of the Cell and, like all other books by Siddhartha Mukherjee, it is wonderful!
At the same time, I’ve been thinking about the slides that I’m supposed to prepare on how to build a biomedical knowledge graph (kgraph). As I carried these two ideas in my head, one clicked with the other and made me wonder: “Are graphs like multi-cellular organisms, whose cells support and collaborate with each other to create a larger self?”
In other words, each kgraph is an individual unit, such as the kgraph at UniProt or the BioPAX kgraph at Pathway Commons, and it is only consistent and sustainable when it passes the quality and completeness checks set by its designers. Even so, it has inter-dependencies with other kgraphs (e.g., ChEBI, Ensembl, HGNC). Without them, it cannot thrive. Similarly, other graphs depend on it in turn.
The medium that these graphs co-inhabit, or the “inter-cellular matrix,” is the Web. As long as it’s possible to do a secure transaction of data over the Web, these graphs can exchange “data” via the SPARQL SERVICE keyword (i.e., federated querying).
Note: Yes, I know. To those among you who have tried SPARQL federation only to find, in frustration, that the remote endpoint was down and your query timed out: I hear you. But hey, it’s just like in real life! If you were a cell and needed some nutrients from a neighboring cell in order to survive, you would depend on that cell’s willingness to help you and share nutrients with you. And she’d only do that if sharing nutrients were mutually beneficial, as it is in a multi-cellular organism.
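To make the SERVICE idea a bit more concrete, here is a minimal sketch in Python that only builds the text of a federated query; it does not send anything over the network. The endpoint URL, the `up:Protein` pattern, and the helper name `build_federated_query` are my own illustrative assumptions, not part of any particular kgraph's setup. The optional SILENT keyword is standard SPARQL 1.1 and is one way to soften the "remote endpoint is down" problem: the local engine ignores the remote failure instead of failing the whole query.

```python
from textwrap import dedent

# Assumption: the public UniProt SPARQL endpoint; swap in whichever remote kgraph you depend on.
UNIPROT_ENDPOINT = "https://sparql.uniprot.org/sparql"


def build_federated_query(remote_endpoint: str, silent: bool = False, limit: int = 5) -> str:
    """Return a SPARQL 1.1 query that delegates one graph pattern to a remote endpoint.

    Hypothetical helper for illustration only: it builds the query string, nothing more.
    """
    # SERVICE SILENT asks the local engine to ignore errors from the remote endpoint
    # rather than failing the whole query.
    service = "SERVICE SILENT" if silent else "SERVICE"
    return dedent(f"""\
        PREFIX up: <http://purl.uniprot.org/core/>
        SELECT ?protein
        WHERE {{
          {service} <{remote_endpoint}> {{
            ?protein a up:Protein .
          }}
        }}
        LIMIT {limit}
        """)


# Print the query so it can be pasted into any SPARQL 1.1 endpoint that allows federation.
print(build_federated_query(UNIPROT_ENDPOINT, silent=True))
```

Whether the neighboring "cell" actually answers still depends on the remote endpoint being up and allowing the request; SILENT only keeps your own query alive when it doesn't.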
A relational database
Like a bacterium, a relational database is a self-contained, self-sufficient unit: we engineers build it for a purpose, it is meant to perform some very important tasks for a set of business users, and that purpose is what gives it energy (i.e., funding) to survive (i.e., hire other engineers to maintain and grow it) and thrive (i.e., create backup copies and evolve into something new and better adapted to future needs).
All of this to say: To decide whether to collect our data in a relational database or a knowledge graph, we can ask ourselves, “Is what I am trying to accomplish like a small, individual, self-supporting bacterium, or is it a larger multi-cellular organism where the individual cells have inter-dependencies such that they can grow into a much larger, much more robust and future-proof organism?” If the answer is “the latter,” then a knowledge graph is the way to go.
Just a passing thought I had. Would love to hear what you think about this idea!
What if we compare knowledge graphs with NoSQL databases? Maybe a nested JSON document, which could itself be nested under other JSON. FHIR is, in a way, designed with that idea in mind. On the other hand, you may look at it as protein molecules which are used by cells to collaborate...
Interesting, Lena. I like the growing and building inter-relationships part of the analogy. Hope you are well.
Interesting analogy, Helena Deus, that I will use (with attribution, of course) to explain the difference to colleagues!! Really great.
Interesting analogy. A key component of building a multicellular system is cooperation (positive and negative). So the question is how to quantify whether two graphs interact positively or negatively. In evolutionary terms, when two cells have co-dependencies they optimize individual function and support each other to survive, possibly giving rise to multicellularity (a network). But cheaters will always occur and get a "free ride". If cheaters grow too much in a population, the whole network collapses.