Databricks meets Salesforce Data Cloud — and the integration is more powerful than most people realize
Most enterprises store their richest data — transactions, behavior, IoT, ML models — in a data lake. But their service agents, marketers, and AI tools only see what's inside Salesforce.
That gap is now closed. Three native ingestion patterns connect Databricks directly to Salesforce Data Cloud — no MuleSoft, no custom pipelines.
⚡ Pattern 01 — Ingestion API: Streaming
Setup path: Data Cloud Setup → Ingestion API → Streaming
Your systems push events into Data Cloud as JSON micro-batches over a REST API, authenticated through OAuth with a Connected App. Data Cloud processes the micro-batches roughly every 3 minutes, and a Data Lake Object (DLO) is auto-created on the first run. (A minimal code sketch follows the quick facts below.)
📍 Real example: A customer completes a purchase on your website. Your checkout system pushes that event to Salesforce immediately. A few minutes later, a service agent opens the customer record and already sees the purchase, before the customer even explains why they're calling.
Best for: Live events, clickstream, CDC, webhooks
Latency: ~3 minutes
Copies data: Yes
Complexity: Your system needs to call Salesforce APIs
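To make the streaming call concrete, here is a hedged producer-side sketch in Python. The tenant URL, source and object API names, and event fields are all placeholders; the real values come from your own Ingestion API connector and schema.

```python
# Hedged sketch: pushing one purchase event to the Ingestion API in
# streaming mode. All names below are placeholders taken from an
# assumed connector setup, not from a real org.
import requests

TENANT_URL = "https://YOUR_TENANT.c360a.salesforce.com"  # your Data Cloud tenant
SOURCE_API_NAME = "checkout_events"   # source API name from your connector
OBJECT_NAME = "purchase"              # object defined in your schema file
ACCESS_TOKEN = "..."                  # from your Connected App OAuth flow

payload = {
    "data": [
        {
            "event_id": "evt-001",
            "customer_id": "12345",
            "amount": 79.99,
            "event_time": "2024-01-15T10:30:00Z",
        }
    ]
}

resp = requests.post(
    f"{TENANT_URL}/api/v1/ingest/sources/{SOURCE_API_NAME}/{OBJECT_NAME}",
    json=payload,
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    timeout=10,
)
resp.raise_for_status()  # an accepted response means the event is queued for the next micro-batch
```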
🔄 Pattern 02 — Ingestion API: Bulk
Setup path: Data Cloud Setup → Ingestion API → Bulk
CSV-based batch ingestion via the same Ingestion API connector. Runs on a schedule you define: daily, weekly, or monthly. Handles large historical datasets, with incremental updates after the first full load. (See the job-lifecycle sketch after the quick facts below.)
📍 Real example: You have 500,000 customer records sitting in Databricks: purchase history, loyalty scores, product preferences. Every night at 2 AM, a scheduled bulk job pushes them all into Data Cloud. By morning, your marketing team has a fully updated customer view ready to segment and activate.
Best for: Historical data, backfills, master data, legacy exports
Latency: Hours
Copies data: Yes
Complexity: Set it up once, then it runs automatically
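Here is a hedged sketch of the bulk flow as it might run from a nightly Databricks job. The create-job → upload-CSV → close-job lifecycle mirrors Salesforce's Bulk API pattern; treat the exact paths and payloads as assumptions and verify them against the Ingestion API docs.

```python
# Hedged sketch of the bulk job lifecycle. Tenant URL, token, object,
# source name, and file path are placeholders for illustration only.
import requests

TENANT_URL = "https://YOUR_TENANT.c360a.salesforce.com"  # placeholder tenant
HEADERS = {"Authorization": "Bearer ..."}                # Connected App token

# 1. Create a bulk ingestion job for the target object
job = requests.post(
    f"{TENANT_URL}/api/v1/ingest/jobs",
    json={
        "object": "customer",              # object from your schema (assumed)
        "sourceName": "databricks_export", # source API name (assumed)
        "operation": "upsert",
    },
    headers=HEADERS,
    timeout=30,
).json()

# 2. Upload the CSV exported from Databricks (header row must match the schema)
with open("/dbfs/tmp/customers.csv", "rb") as f:  # placeholder export path
    requests.put(
        f"{TENANT_URL}/api/v1/ingest/jobs/{job['id']}/batches",
        data=f,
        headers={**HEADERS, "Content-Type": "text/csv"},
        timeout=300,
    ).raise_for_status()

# 3. Close the job so Data Cloud starts processing the batch
requests.patch(
    f"{TENANT_URL}/api/v1/ingest/jobs/{job['id']}",
    json={"state": "UploadComplete"},
    headers=HEADERS,
    timeout=30,
).raise_for_status()
```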
Note: Patterns 01 and 02 use the same Ingestion API connector in Data Cloud Setup — Salesforce designed them as two modes of one unified API.
🔍 Pattern 03 — Zero Copy Data Federation
Setup path: Data Cloud Setup → Other Connectors → Databricks Connector
Data Cloud sends a live SQL query directly to Databricks; nothing is copied or stored in Data Cloud. Two variants are supported: File Federation (reads Iceberg files directly at the storage layer, ideal for massive volumes) and Query Federation (SQL push-down to a Databricks SQL Warehouse). (A sketch of the kind of push-down query involved follows the quick facts below.)
📍 Real example: A service agent opens a customer record. They click "Show transaction history." Instead of a stale nightly copy, Data Cloud asks Databricks live: "Give me the last 10 transactions for customer ID 12345." Databricks queries its billion-row table and returns the results in under a second. The agent sees live data. No pipeline ran. No data was duplicated. No storage cost in Data Cloud.
Best for: Massive datasets, live lookups, cost optimization, single source of truth
Latency: Under 1 second
Copies data: No
Complexity: Set it up once, then queries run automatically whenever agents need data
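There is nothing to code on the Data Cloud side: it generates and sends the federated SQL itself. This sketch only mimics the kind of push-down query that reaches your SQL Warehouse, using the databricks-sql-connector; the hostname, HTTP path, and table and column names are illustrative.

```python
# Hedged sketch: the sort of lookup Query Federation pushes down to a
# Databricks SQL Warehouse. Connection details and the sales.transactions
# table are placeholders, not real objects.
from databricks import sql  # pip install "databricks-sql-connector>=3.0"

with sql.connect(
    server_hostname="YOUR_WORKSPACE.cloud.databricks.com",  # placeholder
    http_path="/sql/1.0/warehouses/abc123",                 # placeholder warehouse path
    access_token="...",                                     # personal access token
) as conn:
    with conn.cursor() as cur:
        # Named parameters (:cid) require connector >= 3.0
        cur.execute(
            """
            SELECT order_id, amount, order_ts
            FROM sales.transactions
            WHERE customer_id = :cid
            ORDER BY order_ts DESC
            LIMIT 10
            """,
            {"cid": "12345"},
        )
        for row in cur.fetchall():
            print(row)
```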
The bigger picture
These three patterns aren't mutually exclusive. Most enterprises use all three together: stream live events as they happen, bulk-load history on a schedule, and federate the biggest tables with zero copy.
The result is a unified customer profile in Data Cloud that draws from your entire data lake — in real time, at scale, without the ETL.
Your Agentforce agents become smarter. Your marketers segment on richer data. Your service agents see the complete customer picture. And your data team stops maintaining fragile pipelines.
Full architecture diagram and technical deep-dive in the article below 👇
#SalesforceDataCloud #Databricks #DataArchitecture #Agentforce #ZeroCopy #DataEngineering #DigitalTransformation