Paradox in modern software systems or "This statement is false"
In my interviews these days as I look for a new home, I get asked a number of hypothetical questions about REST API design. The response I really want to give is not the one they want to hear, and I am a practical cat. That said, I am now suffering from an overwhelming desire to give the response I want to, which pretty much boils down to 'are you crazy?'.
Consider a simple REST API implementing basic social messaging functionality. Let's dispense with the easier elements of the exercise:
1) There exists a URI messages/<messageId> which represents the resource for a single distinct message, and the common HTTP verbs are realized as:
- GET will net you the pre-existing message or an error, depending on any needed credentials, content type, etc. in the header
- PUT is not allowed - messages are considered immutable
- POST will create a new message ID for the posted message and persist it - of course XML and JSON are supported via content type negotiation. The new message ID is available as a simple scalar result given a non-error return.
- DELETE is not allowed - messages never disappear - think before you send :-)
2) Any message can have comments, shares, and ratings. There exists the set of URIs messages/<messageId>/<attribute>/<attributeId>, e.g. messages/1/comments/1 is a request for comment 1 attached to message 1. The <attribute> value is taken from a constant set of values in this example, specifically [comments, shares, ratings].
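To make the verb semantics above concrete, here is a minimal in-memory sketch. The class name MessageStore, the handle method, and the specific status codes (201, 404, 405) are my own assumptions for illustration; credentials and content type negotiation are left out entirely.

```python
from itertools import count


class MessageStore:
    """Hypothetical in-memory stand-in for the messages/<messageId> resource."""

    def __init__(self):
        self._messages = {}
        self._next_id = count(1)  # message ids are global, starting at 1

    def handle(self, verb, message_id=None, body=None):
        """Return an (HTTP status, payload) pair for a single request."""
        if verb == "POST":
            new_id = next(self._next_id)
            self._messages[new_id] = body
            return 201, new_id  # the new message id as a simple scalar
        if verb == "GET":
            if message_id in self._messages:
                return 200, self._messages[message_id]
            return 404, None
        # PUT and DELETE: messages are immutable and never disappear
        return 405, None
```

A POST followed by a GET of the returned id yields the stored body, while PUT and DELETE are refused regardless of the id supplied.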
One issue I have is that many constructed examples assume attributeId values are globally unique, i.e. the sets of attributeId values belonging to different messageId values never intersect, so you can effectively derive a messageId from an attributeId. Even if messageId and attributeId values start at a common origin, once a single id sequence is shared across every attribute of a given kind, the messageId is derivable as a function of the attribute id. If every attribute id is globally unique, then by definition it belongs to exactly one message id, so the fundamental system model posits that messageId = functionOf(attributeId). The messageId in the URI is then redundant: it restates a fact the attribute id already determines.
Consider a bug where some attribute was accidentally copied from one message to another. Of course it was reinstantiated. Of course there's a new attribute id. You promise you never made a mistake anywhere in any implementation. But what if you had? You're in a situation where the system is in fundamental disagreement with itself. Say that when everything was good, /messages/42/comments/1600 was legal. A bad thing happens and /messages/49/comments/1600 gets materialized. What do the deeper systems do?
Obviously, if the ids are primary keys, this will fail out of the gate. But as the complexity of the model grows, not everything is a primary key. You will end up in this situation because your system is fundamentally capable of representing paradoxical information, which cannot be resolved. Preventing this is the fundamental goal of simple normalization of information by the removal of redundant data. Efficiency is not the only goal. The real goal is to have a system that is incapable of holding multiple opinions about the same atomic fact. This is the heart of transactional database design. It is also the heart of any software design once it reaches a certain level of complexity.
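Here is a small sketch of how the disagreement can arise once something other than the primary key also records ownership. The names comments, comments_by_message, and owners are hypothetical, and the secondary listing stands in for whatever cache, index, or denormalized table a real system might grow.

```python
# Global-id model: the comment id alone is the primary key,
# and the row records which message it belongs to.
comments = {1600: {"message_id": 42, "text": "first!"}}

# A secondary structure that the primary key does NOT protect,
# e.g. a per-message listing or a cache.
comments_by_message = {42: [1600]}

# The bug: comment 1600 is accidentally attached to message 49 as well.
comments_by_message.setdefault(49, []).append(1600)


def owners(comment_id):
    """Return every message that claims to own the given comment."""
    claimed = {m for m, ids in comments_by_message.items() if comment_id in ids}
    claimed.add(comments[comment_id]["message_id"])
    return sorted(claimed)


print(owners(1600))  # [42, 49]
```

Both /messages/42/comments/1600 and /messages/49/comments/1600 now resolve, and nothing in the data says which one is the mistake.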
My concern with this lies in what I consider to be a fundamental 'ility' of a system, just like scalability or extensibility. This is the measure of the 'durability', or conversely the 'fragility', of a system. In the example given above, the system can end up in a state that is both wrong and, more importantly, illegal with respect to the metadata of the system. In simplest terms, you've created a paradox, and there are only two ways out of a paradox: apply a random function to the set of paradoxical discriminants, or pick the one you like, look everything else up again, and deal somehow with the fact that a piece of information magically disappeared and your system has effectively broken a fundamental rule whose effects will return to bite you. Oh, and you may have changed contexts in this process through a fundamentally unobservable change in metadata relations.
What you need to do is to define key information with respect to the context within which it is scoped. A messageId can be seen as evidently global, defining a specific message within the entire set of known messages. An attribute id, however, defines an attribute with respect to the message it is an attribute of. It has no independent meaning beyond that fact. If the message did not exist, neither would its attribute. Therefore, it is better to treat the ids of each kind of attribute as a distinct monotonically increasing sequence (1, 2, 3, ...) within each message.
Parenthetically, discussions of whether or not messages are better defined within the context of a user are far more complex, and generate the most common answer of any practiced architect: "It depends."
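A minimal sketch of identity scoped to its context, assuming attributes are append-only in the same way messages are. The class ScopedAttributes and its methods are hypothetical names, and the id is simply the 1-based position within the (message, attribute) scope.

```python
from collections import defaultdict


class ScopedAttributes:
    """Hypothetical store where an attribute id only has meaning inside
    its (message_id, attribute) scope."""

    def __init__(self):
        # (message_id, attribute) -> list of values; position + 1 is the id
        self._items = defaultdict(list)

    def add(self, message_id, attribute, value):
        items = self._items[(message_id, attribute)]
        items.append(value)
        return len(items)  # ids run 1, 2, 3, ... within this scope

    def get(self, message_id, attribute, attribute_id):
        # .get avoids creating an empty scope for an unknown message
        items = self._items.get((message_id, attribute), [])
        if 1 <= attribute_id <= len(items):
            return items[attribute_id - 1]
        return None  # not in the system, but not a paradox either
```

With this layout, comment 1 of message 42 and comment 1 of message 49 are simply two different facts. There is no second place that records which message a comment belongs to, so there is nothing for the path to disagree with.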
The foremost result of this scoping of identity to context is that it is no longer possible to create constructs that define paradoxical information with respect to the system's information model. This isn't to say you can't define illegal constructs, e.g. indexing an attribute that does not exist; rather, it's that you can't have proper syntax and base semantics (the /messages/*/comments/1600 example) that allow you to create a paradox in the higher levels of the system. You can define something not within the system, but you can no longer define something paradoxical.
Other benefits accrue as well. If you consider pagination of an attribute set such as comments, this process moves from an effectively random set of keys to an ordered set of keys, vastly simplifying any sequential database operations. It effectively enriches the system vocabulary: /messages/<messageId>/comments/1 becomes a mechanism to access the very first comment, while the query parameter extension /messages/<messageId>/comments/1?orderBy=votes would cause the REST service to return the most upvoted comment. At the end of the day, the statement that /messages/1/comments/1 is a meaningful statement incapable of paradox is the fundamental benefit. From that accrues the ability to infer (admittedly with code enforcement) that any /messages/<messageId>/comments/1 is a reference to the first comment with respect to a global default ordering of comments.
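The pagination and ordering ideas can be sketched as follows. I am assuming each comment is a dict with a votes field and that the default ordering is creation order; comment_at and page are hypothetical helper names, not part of any real framework.

```python
def comment_at(comments, position, order_by=None):
    """Return the comment at a 1-based position under the chosen ordering."""
    if order_by == "votes":
        # sorted() is stable, so tied comments keep their creation order
        ordered = sorted(comments, key=lambda c: c["votes"], reverse=True)
    else:
        ordered = comments  # assumed default ordering: creation order
    if 1 <= position <= len(ordered):
        return ordered[position - 1]
    return None


def page(comments, first, size):
    """Return the comments with ids first .. first + size - 1 as a slice."""
    start = max(first, 1) - 1
    return comments[start:start + size]
```

Because the ids within a message are dense and ordered, a page is just a contiguous range, and position 1 under orderBy=votes is the most upvoted comment.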
Summarizing, this is one thing that is missed too often in system architecture: the understanding that as models become more complex, it becomes easier and easier to introduce paradox into those models. It is critical to consider identity with respect to the context within which it serves as an identity. I cannot say that in an interview when handed the scenario and given an obvious expectation; however, it rankles enough that I finally had to put fingers to keyboard. Nesting of identities that are not truly contained means that the system is fundamentally capable of expressing paradoxical data. Any system so defined is just waiting for something really bad to happen.
In conclusion, I'd like to once again thank Douglas Hofstadter for GEB (https://en.wikipedia.org/wiki/G%C3%B6del,_Escher,_Bach). I read GEB early on in my career, and many times over since then. Over my entire career, with every day, I agree with it more and more. The interest, risk, and power of systems are directly related to their relation to the metadata that defines them at any given level and to their capacity for handling legally expressible paradoxes. My experience with modern software technology says that a paradox in a production system is a disaster that is either happening or will happen.