Demystifying Generative AI with Databricks Data Intelligence Platform (Part 2)

Demystifying Generative AI with Databricks Data Intelligence Platform (Part 2)

In Continuation with "Demystifying Generative AI with Databricks Data Intelligence Platform" Part-1 which can be accessed here (https://lnkd.in/g49wPrjb), here is the second and the final part.


AI Systems

  • Compound AI Systems - An AI system that has multiple interacting components usually independent of the framework or language model to be used
  • LangChain - A composition software framework that helps to build and manage multi-stage reasoning AI systems using large language models, consists of components for building chains and agents, integrations with other tools and off-the-shelf implementations for common tasks

Components of LangChain

  • Prompt - A structured text input designed to communicate a specific task ort query to a language model, guiding it to produce the desired output
  • Chain - A sequence of automated actions or components that process a user's query and produce a model's output
  • Retriever - An interface thatreturns relevant documents or information based on an unstructured query, often used in conjunction with indexed data to enhance search and retrieval capabilities
  • Tool - A functionality or resource that an agent can activate, such as APIs, databases or custom functions, to perform specific tasks

LLamaIndex - A data framework that enhances the capabilities of LLMs by structuring and indexing data to make it easily consumable. Components include - Models, Prompts, Indexing & storing, Querying and Agents

Haystack - An open-source Python framework for building custom applications with LLMs, focusing on document retrieval, text generation, and summarization. Components include - Generators, Retrievers, Document stores and Pipelines

Databricks Foundation Model API - An API that supports accessing and querying state-of-the-art open generative AI models, it has a pay-per-token pricing model for low-throughput applications and provisioned throughput for high-throughput. Some of the supported models include DBRX Instruct, Meta Llama 3 8/70B, Mixtral-8x7B Instruct, BGE Large (English) etc.

  • DBRX - A new source open LLM by Databricks, it has 2 versions - DBRX Base and DBRX Instruct
  • DBRX Base - A pre-trained model which functions like a smart auto-complete
  • DBRX Instruct - A fine-tuned model designed to answer questions and follow instructions, built on top of DBRX by performing further training on domain-specific data and fine-tuning for instruction-following

Agents

  • Agent - An application that can execute complex tasks by using a language model to define a sequence of actions to take, the sequences are query-dependent chosen dynamically by the LLMs

Components of an Agent

  • Task - The user request through prompt to be solved
  • LLM (Brain) -The central coordination module that manages the core logic and behavioral characteristics of an agent, it is the brain of the agent.
  • Tools - External resources that the agent uses to accomplish the tasks at hand
  • Memory & Planning - Components for planning and executing the future actions

Agent Reasoning - Cognitive process by which agents draw logical conclusions and make decisions autonomously mirroring aspects of human cognitive abilities

Agent reasoning Design patterns

  • ReAct - Reason + Act - Enable models to generate verbal reasoning traces and actions. Main states used in ReAct agents are Thought (reflect on the given problem and previous actions taken), Act (choosing correct tool and input format) and Observe (evaluate the result of the action and generate next thought)
  • Tool use & Function Calling - Agents interact with external tools and APIs to perform specific tasks, decides which tools to use and when/how to use them
  • Planning - Able to dynamically adjust their goals and plans based on changing conditions
  • Multi-Agent Collaboration - Involves several agents working collaboratively, each handling different aspects of the task, allows modularization, specialized in solving specific business problems

LangChain Agents - Provides a structure for building agents that can use tools to interact with the world

AutoGPT - Provides tools to build AI agents

AutoGen - A framework that enables applications with multiple agents that can communicate with each other

Transformers Agents - Provides a natural language API for interacting with transformers

Multi-Modal AI - Models with inputs or outputs that include data types beyond text, it can include images, audios and videos

  • Multi-Modal Retrieval - Embeds all modalities (data types) in the same vector space (e.g. CLIP) OR into different vector spaces
  • Multi-Modal Generator - Enables generating responses in multiple formats (e.g. generating a story with images, GPT-4V)

LLM Security

  • Guardrails - A prompt injection risk mitigation technique where additional guidance is provided to LLMs to control it's responses
  • DASF (Data and AI Security Framework) - A security approach by organizing AI security problem with a component based approach, 12 AI system components and 55 associated risks have been identified as of now (e.g. Catalog, Algorithm, Evaluation, Model management, Operations, Platform)
  • Databricks as Security - Databricks meets the AI security needs by having a in-built tight security framework involving components such as Unity Catalog and Mosaic AI

LLM Evaluation

  • Loss (Evaluation Metrics) - Measures the difference between predictions and the truth, measures when training LLMs by how well they predict the next token
  • Perplexity - It's the model's confidence in it's predictions i.e. measuring if the model was surprised that its prediction was correct? Low perlexity = High confidence and High perlexity = Low confidence. A sharp peek in the LLMs probability distribution reflects a low perplexity
  • Toxicity - Measures the harmfulness of the responses generated by the LLMs, it identifies and flags harmful offensive or inappropriate language. Low toxicity = Low harm. It uses a pre-trained hate speech classification model
  • BLEU (BiLingual Evaluation Understudy) - Compares translated output to a reference, comparing n-gram similarities between the output and reference.
  • ROUGE (Recall-Oriented Understudy for Gisting Evaluation) - Compares summarized output to a reference, comparing n-gram similarities between the output and reference.
  • Benchmarking - Comparing models against standard evaluation datasets, LLMs are evaluated on large reference datasets (e.g. Stanford Q&A for Q&A evaluation) and your own data too.
  • LLM-as-a-Judge - Ask an existing LLM to do the evaluation for you. Uses few-shot examples with human-provided scores for more guidance, it also provides more specific instructions of what good looks like. The evaluation scale is more specific and component-based rubric. Often it utilizes prompt engineering templating.
  • Offline Evaluation - Evaluating LLMs and it's components in static and non-prod environments i.e. before deployment to prod.
  • Online Evaluation - Real-time evaluation of LLMs after it has been deployed in prod.

LLM Deployment

  • Model Flavor - A standard format for packaging machine learning models with additional metadata such as signature, input example etc. e.g. MLflow LangChain flavor, Open AI flavor, HuggingFace flavor, PyTorch, Python Function etc.
  • Unity Catalog Model Registry - A centralized model store with full-fledged model lifecycle management with versioning, it can deploy and organize models, manage ACLs, stores full model lineage, taggings and annotations.
  • Gen AI Model deployment - Process of integrating an AI model into a production environment, making it accessible for end-users or other systems to generate predictions or completions. Deployment strategies can be either batch, streaming, real-time or embedded.
  • TensorRT - A Tensorflow-friendly SDK from NVIDIA for high performing batch interface on GPUs
  • vLLM - A Transformer-friendly library for memory-efficient inference on GPUs
  • Ray on Spark - A pythonic distributed computing primitive for parallelizing and scaling python applications, can be used on AWS, Azure or GCP
  • Databricks Model Serving - A databricks native production grade model serving framework with high availability and low latency, accelerated deployments with Lakehouse-Unified serving and simplified deployment through UI or API
  • Inference Tables - Databricks delta tables used for monitoring and debugging deployed models, each request-response is appended to the table in UC. It can perform diagnostics and debugging of suspicious inferences, it can also create a dataset of mislabeled data to be re-labeled.
  • Databricks Lakehouse Monitoring - A monitoring tool for automated insights and out-of-the-box metrics on data and ML pipelines. It is a fully managed infrastructure, frictionless with easy setup and provides a unified solution for data and models for holistic understanding.
  • MLOps - Set of processes and automation for managing fata, code and models to improve peroformance, stability and long-term efficiency of ML systems. MLOps = DataOps + DevOps + ModelOps
  • LLMOps - MLOps for GenAI applications and environments with code management across different environments as well as data/system component management. Areas under LLMOps (as well as MLOps) include Dev patterns, Packaging, Serving, API Governance, Cost & Performance and Human Feedback.


#GenerativeAI #Databricks #MosaicAI #MLOps #MachineLearning #DataEngineering #TechInnovation

To view or add a comment, sign in

More articles by Mohammed Arif

Others also viewed

Explore content categories