Creating the future
Even though it's still a heavy lift, generating some types of code from specifications can yield good results. I've been helping retailers and other industry verticals access and adopt data models for years. As a proponent of Model Driven Development, I always find it fun to work through adoption challenges with those customers, since we all tend to model reality a little differently. Several domains have data models and practices that include highly optimized and even dynamic models. One of these is schema.org, which defines over 1400 entities and data types with highly customizable and adaptable relationships. So I put my AI hat on and, along with some web scrapers and handlers, set out to see what kind of code I could produce.
Challenge
Using the schema.org website, locate all of the defined types, datatypes, and their hierarchy and create a set of Pydantic transfer objects that can represent the domain and serialize properly to JSON. In addition, for every class created, ensure there's a test case, and lastly ensure the documentation is good enough without being overly obvious.
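One part of this challenge is that a subclass can only be generated once its parent exists, so the scraped hierarchy needs an order. Below is a minimal sketch of one way to get that order, using a topological sort from the Python standard library. The PARENTS mapping is a small hand-written sample and generation_order is a hypothetical helper name; neither is taken from the actual scraper.

```python
from graphlib import TopologicalSorter

# Hypothetical, hand-written sample of the hierarchy: type -> parent types.
# A real run would fill this mapping from the scraped schema.org pages.
PARENTS = {
    "Thing": [],
    "Action": ["Thing"],
    "CreativeWork": ["Thing"],
    "MediaObject": ["CreativeWork"],
    "ImageObject": ["MediaObject"],
}


def generation_order(parents):
    """Return type names ordered so that every parent precedes its children."""
    # TopologicalSorter expects node -> predecessors, which matches
    # the type -> parents shape of the mapping above.
    return list(TopologicalSorter(parents).static_order())


order = generation_order(PARENTS)
print(order)
```

Feeding the types to the model in this order means each prompt can refer to a parent class that has already been generated.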
Components for success
The Process
The outcome
Schema.org specifies a top-level class called "Thing" whose common attributes are inherited by all types. All class files below were generated using Google's Gemini. A total of 1864 classes were created; this included enumerations, which will require either a better prompt or human intervention. Of those 1864 classes, every second-level hierarchy class required at least one change, and fewer than 50 required additional changes to be valid. The generated test cases show a similar outcome, as their input data was the generated class.
# Imports needed by the generated class
from typing import Optional, Union

from pydantic import AnyUrl, BaseModel, Field


class Thing(BaseModel):
    """
    The most general type of item.
    https://schema.org/Thing
    """
    context: Optional[str] = Field(default="https://schema.org", alias='@context', description='Defines the context for the schema.')
    type: Optional[str] = Field(default="Thing", alias='@type', description='Defines the schema type.')
    name: Optional[str] = Field(default=None, alias='name', description='The name of the item.')
    description: Optional[str] = Field(default=None, alias='description', description='A description of the item.')
    identifier: Optional[Union[str, float, AnyUrl]] = Field(default=None, alias='identifier', description='The identifier property represents any kind of identifier for any kind of Thing, such as ISBNs, GTIN codes, UUIDs etc. Schema.org provides dedicated properties for representing many of these, of course. In schema.org usage, generally only one identifier is allowed per Thing, but identifying code systems can be represented using multiple identifier properties. ')
    image: Optional[Union[AnyUrl, "ImageObject"]] = Field(default=None, alias='image', description='An image of the item. This can be a URL or a fully described ImageObject.')
    url: Optional[AnyUrl] = Field(default=None, alias='url', description='URL of the item.')
    sameAs: Optional[AnyUrl] = Field(default=None, alias='sameAs', description='URL of a reference Web page that unambiguously indicates the item\'s identity. E.g. the Wikipedia page, Wikidata entry, or official website.')
    subjectOf: Optional[Union["CreativeWork", "Event"]] = Field(default=None, alias='subjectOf', description='A subject of the item. Inverse property: subject')
    potentialAction: Optional["Action"] = Field(default=None, alias='potentialAction', description='Indicates a potential Action, which describes an idealized action in which this thing would play an \'object\' role.')
    mainEntityOfPage: Optional[Union["CreativeWork", AnyUrl]] = Field(default=None, alias='mainEntityOfPage', description='Indicates a page (or other CreativeWork) for which this thing is the main entity being described. See background notes for details. Inverse property: mainEntity')
    # The quotes around 'type' are escaped so the string literal stays valid.
    additionalType: Optional[AnyUrl] = Field(default=None, alias='additionalType', description='An additional type for the item, typically used for adding more specific types from external vocabularies in microdata syntax. This is a more general alternative to \'type\' property, which requires full agreement on the type\'s definition. In a schema.org context, the type property always refers to the schema.org type, whereas additionalType allows for more general use.')
    alternateName: Optional[str] = Field(default=None, alias='alternateName', description='An alias for the item.')
Here, you can see that, based on the rules given, it created the class and all of the forward references to the other schema.org types.
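To make the forward references concrete, here is a minimal sketch of how a quoted type name gets resolved and how the aliases show up in the serialized output. It assumes Pydantic v2 (model_rebuild, model_validate, model_dump). Both classes are trimmed to a few fields for illustration, and ImageObject is simplified to inherit directly from Thing, so this is not the generated code itself.

```python
from typing import Optional, Union

from pydantic import AnyUrl, BaseModel, Field


class Thing(BaseModel):
    # Trimmed to three fields for illustration.
    type: Optional[str] = Field(default="Thing", alias="@type")
    name: Optional[str] = Field(default=None, alias="name")
    # "ImageObject" is a forward reference: the class is defined further down.
    image: Optional[Union[AnyUrl, "ImageObject"]] = Field(default=None, alias="image")


class ImageObject(Thing):
    # Simplified: inherits from Thing directly for this sketch.
    type: Optional[str] = Field(default="ImageObject", alias="@type")
    caption: Optional[str] = Field(default=None, alias="caption")


# Resolve the quoted annotations now that every referenced class exists.
Thing.model_rebuild()

item = Thing.model_validate(
    {"name": "Widget", "image": {"@type": "ImageObject", "caption": "Front view"}}
)

# by_alias=True emits "@type" instead of "type"; exclude_none drops unset fields.
payload = item.model_dump(by_alias=True, exclude_none=True)
print(item.model_dump_json(by_alias=True, exclude_none=True))
```

With more than 1800 classes referencing each other, some form of rebuild step after all classes are imported is likely needed, which is one reason forward references versus direct references matter so much in the generated output.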
Next, we see Action. An Action is-a Thing and adds its own attributes.
class Action(Thing):
    """
    An action performed by a direct agent and indirect participants upon a direct object. Optionally happens at a location with the help of an inanimate instrument.
    Agents and objects can be Organizations as well as people.
    The motivation behind the action is captured by the.
    https://schema.org/Action
    """
    actionStatus: Optional[Union[AnyUrl, "ActionStatusType"]] = Field(default=None, alias="actionStatus", description="Indicates the current disposition of the Action.")
    agent: Optional[Union["Person", "Organization"]] = Field(default=None, alias="agent", description="The agent causing the action.")
    endTime: Optional[str] = Field(default=None, alias="endTime", description="The endTime of something. For a reserved event or service (e.g. FoodEstablishmentReservation), the time that it is expected to end. For actions that span a period of time, when the action was performed. e.g. John moved to Seattle on 2015-03-01.")
    error: Optional["Thing"] = Field(default=None, alias="error", description="For failed actions, more information on the cause of the failure.")
    instrument: Optional["Thing"] = Field(default=None, alias="instrument", description="The object that helped the agent perform the action. e.g. John wrote a book with a pen.")
    location: Optional[Union["Place", "PostalAddress", str]] = Field(default=None, alias="location", description="The location of for example where an event is happening, an organization is located, or an action takes place.")
    object: Optional["Thing"] = Field(default=None, alias="object", description="The object upon which the action is carried out, whose state is changed or where it is being directed. The object is the thing the action acts upon, not necessarily when the action is performed on (multiple) other objects.")
    participant: Optional[Union["Organization", "Person"]] = Field(default=None, alias="participant", description="Other participants that may be involved in the action; e.g. when a supporting role was played in the action.")
    result: Optional["Thing"] = Field(default=None, alias="result", description="The result produced in the action. e.g. John wrote a book.")
    startTime: Optional[str] = Field(default=None, alias="startTime", description="The startTime of something. For a reserved event or service (e.g. FoodEstablishmentReservation), the time that it is expected to start. For actions that span a period of time, when the action was performed. e.g. John moved to Seattle on 2015-03-01.")
    target: Optional["EntryPoint"] = Field(default=None, alias="target", description="Indicates a target EntryPoint for an Action.")
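The challenge also asked for a test case per class. Below is a minimal sketch of what such a test might look like for Action, assuming Pydantic v2. The classes are cut down to a few fields, and the test name and fixture data are made up for illustration rather than taken from the generated tests.

```python
from typing import Optional

from pydantic import BaseModel, Field


class Thing(BaseModel):
    # Cut-down stand-in for the generated Thing class.
    type: Optional[str] = Field(default="Thing", alias="@type")
    name: Optional[str] = Field(default=None, alias="name")


class Action(Thing):
    # Cut-down stand-in for the generated Action class.
    type: Optional[str] = Field(default="Action", alias="@type")
    startTime: Optional[str] = Field(default=None, alias="startTime")
    result: Optional["Thing"] = Field(default=None, alias="result")


def test_action_round_trip():
    # Hand-written fixture (an assumption), independent of the generated class.
    source = {
        "@type": "Action",
        "name": "Write",
        "startTime": "2015-03-01",
        "result": {"@type": "Thing", "name": "Book"},
    }
    action = Action.model_validate(source)
    assert isinstance(action, Thing)  # Action is-a Thing
    assert action.result.name == "Book"
    # Serializing by alias should give back exactly what went in.
    assert action.model_dump(by_alias=True, exclude_none=True) == source


test_action_round_trip()
```

A fixture written by hand, or taken from a published schema.org example, checks the class against something other than itself, which a test derived from the generated class cannot do.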
Observations
Even though the above output compiles and is representative of good output, not all output is equally good, or consistent.
@Field(alias='@type', default="BioChemEntity")
type: str
Here, for example, even though several other classes declare the "type" field correctly, we still see errors like this, where the LLM injects incorrect code: Field is applied as a decorator instead of being assigned as the field's default.
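For comparison, a hand-corrected version of that field might look like the sketch below, which follows the same pattern the Thing class uses. It assumes Pydantic v2, and the base class is simplified to BaseModel so the snippet stands alone.

```python
from pydantic import BaseModel, Field


class BioChemEntity(BaseModel):  # simplified: BaseModel used so the sketch runs alone
    # Field(...) is assigned as the field's default; it is not a decorator.
    type: str = Field(default="BioChemEntity", alias="@type")


entity = BioChemEntity()
print(entity.model_dump(by_alias=True))  # → {'@type': 'BioChemEntity'}
```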
Other such examples are forward refs used where direct refs belong, or vice versa. Overall, the process became increasingly difficult as I realized how much had to be done outside of the LLM due to its logic limitations. As these models continue to improve, with thinking models and higher-performance non-cached models, I hope to see this complexity reduce. Until then, I'm continuing to build out such examples to help developers adapt to the change in "what is a developer."
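One example of work that fits outside the LLM is a cheap static check on every generated file before it is accepted. The sketch below uses only the standard library ast module. The function name find_problems and the docstring rule are assumptions made for illustration, not the checks actually used in this project.

```python
import ast


def find_problems(source: str) -> list[str]:
    """Return human-readable problems found in one generated class file."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        # A decorator placed on a field annotation ends up here.
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]
    problems = []
    for node in ast.walk(tree):
        # Example rule (an assumption): every generated class needs a docstring.
        if isinstance(node, ast.ClassDef) and ast.get_docstring(node) is None:
            problems.append(f"class {node.name} has no docstring")
    return problems


BAD = '''
class BioChemEntity(Thing):
    @Field(alias='@type', default="BioChemEntity")
    type: str
'''

GOOD = '''
class BioChemEntity(Thing):
    """Any biological, chemical, or biochemical thing."""
    type: str = Field(default="BioChemEntity", alias='@type')
'''

print(find_problems(BAD))
print(find_problems(GOOD))
```

Because ast.parse only parses and never imports or runs the file, this can be applied to each file in isolation, before the classes it references exist.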
My last observation is that getting our models to perform still takes an incredible amount of domain knowledge. Building prompts and instruction sets defined well enough to create reproducible results isn't a small feat and can often be just as time-consuming as writing the code by hand.
Conclusion
Models are continuing to improve but have several generations to go before we'll be creating fully functional apps or frameworks. They keep getting better, but the complexity may mean this isn't always the first choice of action. 2025 is going to continue rocking the world of companies and developers, and as long as we continue to grow with, test, and learn through these changes, the possibilities will grant more capabilities to those who learn the toolsets than to those who deny them.
#Google #GoogleGemini #VertexAI #Python #GenAI #GenerativeAI #CodeGen