I've witnessed incredible Edge AI projects stumble, not because of a lack of groundbreaking models, but because they neglected the gritty reality of real-world deployment. It's tempting to focus solely on the intelligence, but the true differentiator lies in building systems that are reliably efficient and adapt seamlessly under pressure.

After years of shipping IoT solutions, I've distilled these 15 non-negotiable design principles. These aren't just technical considerations; they're lessons learned from countless hours of debugging, optimizing, and ensuring devices perform flawlessly where it matters most:

1. Latency-First Design: Prioritize real-time decision-making. Edge AI exists to respond fast, not wait for the cloud.
2. Offline-First Reliability: Design assuming the internet will fail. Systems must continue to work locally.
3. Compute-Aware Models: Choose models that fit device limits (CPU/GPU/NPU), not the latest hype.
4. Memory Efficiency: Optimize RAM and storage usage to avoid crashes and ensure stable performance.
5. Power-Aware Inference: Respect battery and energy constraints at all times, especially in mobile or remote environments.
6. Thermal Stability: Heat reduces performance, so design for throttling and harsh environmental conditions.
7. Data Filtering at Source: Don't send raw streams. Filter, compress, and extract features locally before transmitting.
8. Event-Driven Processing: Trigger AI actions only when needed (state or threshold changes) to save compute cost.
9. Model Compression: Use quantization, pruning, and distillation to shrink models for edge devices (a minimal sketch follows this post).
10. Edge-to-Cloud Sync Strategy: Sync only what matters: summaries, learned insights, or anomalies.
11. Human Override Safety: For critical systems, always include manual control and an emergency kill switch.
12. Secure Device Identity: Each device must have strong authentication, certificates, and trust verification.
13. OTA Update Discipline: Enable safe over-the-air model updates with rollback and version control.
14. Fleet Observability: Monitor latency, drift, and device performance across the entire fleet in real time.
15. Continuous Drift Monitoring: Edge environments evolve, so detect data drift and retrain models proactively.

The projects that truly win in Edge AI aren't just powerful; they're fast, resilient, and adaptive. These principles are how you build systems that think locally, act instantly, and scale globally with confidence.

🔁 Repost if you're building for the real world, not just connected demos.
➕ Follow Nick Tudor for more insights on AI + IoT that actually ship.
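Principle 9 is often the fastest win in practice. Here is a minimal sketch of what post-training quantization can look like, assuming a PyTorch workflow; the toy model and layer sizes are illustrative, not from the post:

```python
import torch
import torch.nn as nn

# A stand-in model; substitute your own trained network.
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)
model.eval()

# Dynamic quantization converts Linear weights to int8 ahead of time;
# activations are quantized on the fly during inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# The quantized model is a drop-in replacement for inference.
with torch.no_grad():
    out = quantized(torch.randn(1, 128))
```

Dynamic quantization is the lowest-effort entry point; static quantization and pruning usually require calibration data or a retraining loop.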
Edge Computing Deployment
Explore top LinkedIn content from expert professionals.
Summary
Edge computing deployment refers to installing and running computing resources—such as AI models and processing systems—closer to where data is created, like sensors, cell towers, or devices, rather than relying solely on distant cloud servers. This shift allows for faster data processing, greater privacy, and improved efficiency for real-world applications across industries.
- Prioritize real-time response: Deploy systems that process and analyze data locally to enable instant decision-making and minimize delays from cloud communication.
- Design for resilience: Build edge solutions that keep working even when internet connectivity fails or power is limited, ensuring consistent performance in challenging environments.
- Integrate smart transmission: Configure devices to send data only when necessary, saving battery life and reducing network load, especially in industrial and remote monitoring scenarios.
Edge capability and conditional transmission: how edge computing on LPWAN devices extends battery life by a factor of 4.

As industrial IoT systems continue to scale across critical infrastructure (pipelines, reservoirs, remote assets, and urban utilities), one question persists across all engineering teams: "How do we make the device smarter without draining the battery faster or making the firmware more complex?" The answer is not more power; it's more intelligence at the edge.

> What Is #EdgeCapability in #LPWAN Devices?

Edge capability refers to the ability of the device to process and analyze data locally before deciding whether to transmit it over the network. This is a critical advancement in the design of battery-powered LPWAN devices, whether #LoRaWAN, #NB-IoT, or #LTE-M. Instead of blindly transmitting data at fixed intervals, smart edge devices evaluate conditions such as:
- Threshold violations (e.g., pressure above X bar)
- Anomalous patterns (e.g., a sudden temperature spike)
- Predictive failure signals (via trend detection)
Only when action is needed do they transmit.

> Why Conditional Transmission Changes the Game

Let's take a real-world example from our deployments at Ellenex:
- Scenario A: Traditional Mode. Transmit every 15 minutes on a fixed schedule. 96 transmissions/day. Average battery life: under 1 year.
- Scenario B: Edge Mode with Conditional Transmission. Sample every 5 minutes; transmit only when threshold conditions are met, or at most once per day. 1–5 transmissions/day depending on conditions. Average battery life: 3.5–4 years.
By eliminating unnecessary network sessions, power-hungry radio activations, and overhead from MAC-layer interactions, energy usage drops dramatically.

> Implications for Industrial Use Cases

Water utilities can detect leaks without flooding the network with data. Smart agriculture devices react only to critical soil moisture levels, not morning dew. Asset monitoring for pressure, level, vibration, or flow becomes cost-effective in remote areas. And most importantly: maintenance intervals are extended dramatically. Battery replacements become rare events, not monthly line items.

> What This Means for Product Designers

When we design LPWAN devices at Ellenex, edge intelligence is not optional; it's a core requirement. Every mA-hour counts. We, at Ellenex Industrial IoT, design products with:
- Smart wakeup logic
- Configurable edge thresholds
- Modular firmware to enable OTA updates of local logic
Because the edge is not just about faster insights; it's about operational viability.

Final thought: nowadays, data is only valuable when it's actionable, and battery life is only long when data knows when not to leave the device. Edge capability plus conditional transmission provides longer life, smarter systems, and scalable deployments. If you're still pushing data every 15 minutes, it is time to re-think 🤔.

#monitoring #IoT #ellenex #EdgeComputing #LPWAN #batterylife
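The conditional-transmission pattern described above reduces to a small decision loop. A minimal sketch in plain Python, with hypothetical sensor and radio hooks and illustrative thresholds (not Ellenex firmware):

```python
import random
import time

PRESSURE_LIMIT_BAR = 8.0        # illustrative threshold
HEARTBEAT_SECONDS = 24 * 3600   # still report at least once per day
SAMPLE_SECONDS = 5 * 60         # sample every 5 minutes, as in Scenario B

def read_pressure() -> float:
    # Hypothetical placeholder for the real sensor driver.
    return random.uniform(6.0, 9.0)

def transmit(payload: dict) -> None:
    # Hypothetical placeholder for the LPWAN radio stack.
    print("TX:", payload)

last_tx = 0.0
while True:
    value = read_pressure()
    alarm = value > PRESSURE_LIMIT_BAR
    heartbeat_due = time.time() - last_tx > HEARTBEAT_SECONDS
    # The radio only powers up when action is needed or the daily
    # heartbeat is due; every skipped session is battery saved.
    if alarm or heartbeat_due:
        transmit({"pressure_bar": round(value, 2), "alarm": alarm})
        last_tx = time.time()
    time.sleep(SAMPLE_SECONDS)
```

The heartbeat guard is what keeps a silent device distinguishable from a dead one, which matters as much operationally as the battery savings.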
AI is shifting from the cloud to the edge, and Google just made that real.

With the release of EmbeddingGemma, Google introduced a 308M-parameter multilingual embedding model that runs in under 200MB of RAM and delivers state-of-the-art results. It is compact, fast, and designed to live directly on your device.

This is not just about another benchmark win. It signals a bigger change:
• AI that runs offline, privately, and instantly
• Models that no longer need to send sensitive data to external servers
• Applications that adapt to the user, rather than relying on cloud calls

For enterprises, this means RAG systems that can analyze contracts, financial records, or patient notes without ever leaving secure environments. For individuals, it means assistants that search your personal files and knowledge locally, without leaking data. For devices, it means IoT and industrial sensors that interpret events on-site, in real time.

And Google didn't just release a model. They built a seamless integration ecosystem around it. EmbeddingGemma already plugs into the tools developers actually use:
• Sentence Transformers for direct embeddings
• LangChain and LlamaIndex for building RAG pipelines
• Ollama, LMStudio, llama.cpp for local inference
• Transformers.js for browser-based apps
• MLX for optimized performance on Apple Silicon

This is not about showing off new benchmarks. It is about making AI systems easier to build, scale, and deploy. That matters because it lowers the friction for adoption: developers can pull EmbeddingGemma into existing workflows with minimal change, which accelerates experimentation and real-world deployment.

The strategic takeaway: edge-first AI is here. Instead of shipping your data to the model, the model comes to your data. That shift unlocks privacy, speed, and regulatory control while making AI more practical for everyday use.

For leaders, the signal is clear: AI infrastructure is shifting from closed experiments to open, modular building blocks. That means lower lock-in, faster adoption, and a faster path from proof-of-concept to value. Ignore this, and you'll still be waiting for vendors to catch up. Act on this, and you can build systems that scale ahead of the market.

Because in the next wave, advantage won't come from models. It will come from how fast you integrate them into real workflows.

🔔 Follow for commentary at the intersection of AI, technology leadership, and business outcomes.
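To give a sense of how little code the Sentence Transformers path takes, here is a minimal local-retrieval sketch. The model id is an assumption based on the release announcement, so verify it against the official model card; the documents and query are made up:

```python
from sentence_transformers import SentenceTransformer, util

# Model id is an assumption; check the official Hugging Face model card.
model = SentenceTransformer("google/embeddinggemma-300m")

docs = [
    "Quarterly revenue grew 12% on strong device sales.",
    "Patient reports mild headaches after the dosage change.",
]
query = "financial performance last quarter"

# Everything below runs locally: no document or query leaves the machine.
doc_emb = model.encode(docs)
query_emb = model.encode(query)
print(util.cos_sim(query_emb, doc_emb))  # similarity of query to each doc
```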
Your cell tower is no longer just a signal relay. It is now an AI inference engine.

NVIDIA and T-Mobile announced this week that they are deploying physical AI workloads directly on AI-RAN infrastructure, with Nokia's anyRAN software running on NVIDIA compute at cell sites and mobile switching offices.

The headline framing calls this edge compute. The strategic reality is different. What is actually happening is a fundamental restructuring of where compute lives. For the past decade, the cloud hyperscalers owned the AI inference stack: every smart application phoned home to a data center. AI-RAN flips that model. The radio access network becomes a distributed compute layer, and the cell site becomes a node in a national AI fabric.

T-Mobile is the first US carrier to operationalize this. The pilot use cases include computer vision for traffic management in San Jose and autonomous drone inspection of power lines, both achieving roughly five times faster response than cloud-routed alternatives.

For operators, this changes the unit economics of the cell site investment. A tower that generates only connectivity revenue has one ROI model. A tower that also monetizes compute workloads for smart cities, utilities, and logistics has a different one entirely. For investors, the question shifts from "which carrier wins on 5G coverage" to "which carrier builds the better edge AI platform." Those are not the same race.

Where do you see edge compute creating the most durable operator revenue over the next five years?

#AI #5G #ORAN #Telecom #NetworkStrategy #TechInvesting
Paper Title: "Multiscale echo self-attention memory network for multivariate time series (TS) classification" Whilst consulting for ThousandEyes (Cisco), our team explored TS techniques for anomaly detection, especially under statistically-contaminated constraints (with multiple modes, common with network measurement stratification). A basis for featurization was the t-digest (robust quantile estimation, as used in elasticsearch, for example). Now that I have turned my attention to edge ("far edge", or "sensor edge" -- ultra-low power), the constraints are more severe. Here my interest turned to "reservoir computing" (in the form of Echo-State Networks [ESN]). This paper addresses a critical challenge in edge computing - how to efficiently process time-series sensor data with limited computational resources while maintaining high accuracy. The authors' combine ESNs with self-attention mechanisms, offering an architecture valuable for resource-constrained edge devices that need to classify complex sensor inputs. ESNs' fixed-weight training enables minimal parameter updates - crucial for edge deployment. Multi-head attention mechanism shows potential for edge optimization through pruning/quantization. Strong performance on multimodal fusion (96.79% on UTD-MHAD combining depth and inertial data) suggests viability for edge NPU deployment. Sensor Fusion Perspective: The architecture naturally handles varying sensor sampling rates and missing data while capturing temporal dependencies across modalities. The key innovation demonstrates that attention mechanisms, typically computationally expensive, can be efficiently combined with ESNs for high-accuracy sensor fusion at the edge. This represents a practical advance for deploying sophisticated sensor fusion algorithms on power-constrained edge devices.
Inside the Industrial Edge Stack

TL;DR: A performant edge stack is a 3-layer engine. Sensors feed data into hardened micro-compute nodes. Local inference triggers optimisation apps in milliseconds. The result is prescriptive control without the cloud penalty. Edge-AI silicon is growing at more than 21% a year and will reach 66 billion USD by 2030 (Research and Markets, 2024; Omdia Edge-AI Tracker, 2024).

Think of it as a MECE "3-Layer Edge Stack": Sense and Pre-process, Local Inference, Optimise and Act. In most plants the production line, the network, and the analytics platform grew up separately. That leads to chokepoints. An industrial edge stack addresses this mismatch by aligning hardware, software, and decision logic around a single, local loop.

Layer 1. Sense and pre-process. Industrial sensors and PLC signals are normalised at source. Noise filtering and lightweight compression keep payloads small. This reduces bandwidth by up to 40% before data even leaves the cabinet.

Layer 2. Inference at the node. Ruggedised GPUs and NPUs run containerised models within 3 feet of the asset. New edge-AI chips now deliver 25 tera-operations per second at 15 watts, turning vibration or vision streams into anomaly scores in under 50 milliseconds.

Layer 3. Optimise and act. A container orchestration layer hosts prescriptive apps that translate scores into set-point adjustments, work-order triggers, or safety interlocks. Because the control logic sits on the same node, action latency remains below 100 milliseconds, even during network congestion.

FirstStep.ai deploys this architecture in packaging, refining, and discrete-assembly sites. In a recent fast-moving consumer-goods plant, local inference reduced micro-stoppages by 9% and pushed OEE above 90 (FirstStepAI field deployment data, 2025). The stack paid for itself in 10 months at an electricity price of 0.08 USD per kilowatt-hour.

A modular edge stack also simplifies governance. Data-sovereignty rules are met because raw streams never leave the site. Security is reinforced by a zero-trust mesh across every node, and firmware updates travel through signed containers only.

If your current setup still routes high-frequency data to a distant data centre, you are bearing an unseen latency cost. How modular and close-coupled is your own stack? In the next post, I will outline how the numbers translate into hard return on investment when edge meets OEE.

#EdgeComputing #IndustrialAI #SmartManufacturing #IIoT #FirstStepAI
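To make Layers 2 and 3 concrete, here is a minimal sketch of a local score-then-act loop. The sensor feed, z-score "model", threshold, window size, and actuation hook are all illustrative stand-ins (not FirstStep.ai's implementation); a real node would call a containerised model and write to a PLC:

```python
import random
import time

SCORE_LIMIT = 3.0          # illustrative threshold (~3-sigma)
WINDOW = 64                # small rolling window of readings

def anomaly_score(window: list[float]) -> float:
    # Stand-in for the containerised model in Layer 2: a crude
    # z-score of the latest reading against the window statistics.
    mean = sum(window) / len(window)
    var = sum((v - mean) ** 2 for v in window) / len(window)
    return abs(window[-1] - mean) / (var ** 0.5 + 1e-9)

def adjust_setpoint(delta: float) -> None:
    # Hypothetical placeholder for the Layer 3 actuation path
    # (e.g., writing a PLC register over OPC UA).
    print(f"set-point adjusted by {delta:+.2f}")

readings: list[float] = []
while True:
    readings.append(random.gauss(50.0, 2.0))   # simulated vibration feature
    readings = readings[-WINDOW:]
    if len(readings) == WINDOW and anomaly_score(readings) > SCORE_LIMIT:
        adjust_setpoint(-0.5)
    time.sleep(0.05)                           # ~50 ms loop, per the post
```

The point of the sketch is the locality: score and action share one process on one node, so no network hop sits inside the control loop.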
Most AI projects fail not because the model is bad, but because they never make it to production.

You can build the smartest model in the world, but if you do not know how to deploy it efficiently, it will never deliver business value. And here is the thing most engineers do not realize: there is no "one-size-fits-all" deployment. The right choice depends on latency, scale, and where your users are.

Here are the 4 major ways to deploy AI models and when to use each:

1. Batch Deployment: best for non-time-critical tasks
* Predictions are generated in bulk at scheduled intervals (like once a day).
* The model processes large amounts of data offline and stores results in a database.
* The backend then pulls predictions when needed.
* Use case: fraud detection summaries, daily demand forecasting, or churn predictions.
* Why it matters: simple, cost-effective, and highly scalable, but not ideal for real-time use.

2. Real-Time Deployment: when speed is everything
* The model serves predictions instantly as requests come in.
* The backend calls the ML service directly and delivers results in milliseconds.
* Use case: recommendation systems, credit scoring, chatbots, or personalized search.
* Why it matters: it powers real-time decision-making but requires robust infrastructure and low-latency networks.

3. Streaming Deployment: continuous data, continuous predictions (see the sketch after this post)
* Ideal for scenarios where data arrives in streams (like sensors or live events).
* A prediction queue handles requests asynchronously, while the model processes data on the fly.
* Use case: stock price prediction, IoT analytics, fraud detection on live transactions.
* Why it matters: delivers near-real-time predictions while managing fluctuating data flow efficiently.

4. Edge Deployment: AI without the cloud
* The model runs directly on the device (like a phone or IoT sensor), close to the data source.
* Reduces latency, improves privacy, and works even without internet.
* Use case: smart home devices, autonomous vehicles, healthcare wearables.
* Why it matters: critical for scenarios where cloud connectivity is limited or latency must be near-zero.

The real skill for an AI engineer is not just training models; it is knowing how and where to deploy them so they solve real-world problems.

Which of these deployment types do you use most often in your projects?

♻️ Repost this to help your network learn AI deployment
➕ Follow Jaswindder Kummar for more practical AI engineering insights

#MachineLearning #AIDeployment #MLOps #AIEngineering #CloudComputing #EdgeAI
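Pattern 3 is the least familiar to most teams, so here is a minimal standard-library sketch of the queue-plus-worker shape it describes; the fraud-scoring "model" is a trivial placeholder:

```python
import queue
import threading

events: queue.Queue = queue.Queue(maxsize=1000)

def predict(event: dict) -> float:
    # Stand-in for the real model; any callable works here.
    return 1.0 if event["amount"] > 900 else 0.0

def worker() -> None:
    # The worker drains the queue asynchronously, decoupling bursty
    # event arrival from model throughput.
    while True:
        event = events.get()
        print(f"txn {event['id']}: fraud score {predict(event)}")
        events.task_done()

threading.Thread(target=worker, daemon=True).start()

# Producer side: events arrive as a stream and are enqueued.
for i, amount in enumerate([120, 950, 40]):
    events.put({"id": i, "amount": amount})
events.join()   # block until in-flight predictions finish
```

In production the in-process queue is usually replaced by a broker such as Kafka, but the decoupling is the same idea.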
The AI Deployment Decision Tree Teams Skip

AI deployment architecture is not about picking the "best" option; it is about matching your requirements to the right pattern.

Here are the 4 AI model deployment modes:

1. Batch Deployment
• Latency: minutes to hours acceptable
• Scalability: high throughput, schedule-based
• Best fit: offline decisions, reports, scoring at scale
• Architecture: user requests predictions → backend pulls from DB → ML service runs daily batch → serves pre-computed results
• When to use: your predictions don't need to be real-time, and you can afford to compute them in advance

2. Real-Time Deployment
• Latency: sub-100ms to a few seconds
• Scalability: auto-scaling required for spikes
• Best fit: user-facing and decision-time predictions
• Architecture: user request → backend → ML service computes on-demand → pulls features from DB → returns prediction
• When to use: predictions must happen at request time and can't be pre-computed

3. Streaming Deployment
• Latency: near real-time, asynchronous
• Scalability: handles bursty, event-driven loads
• Best fit: event-driven and continuous prediction flows
• Architecture: events trigger prediction requests → queues manage load → ML service processes asynchronously → results stored and served when ready
• When to use: you have continuous data streams and can tolerate slight delays for predictions

4. Edge Deployment
• Latency: ultra-low, local execution
• Scalability: scales by device distribution
• Best fit: mobile, IoT, privacy-sensitive use cases
• Architecture: local ML model on device → processes data locally → optional backend sync for updates
• When to use: network latency is unacceptable or data privacy requires local processing

The decision tree most teams skip (sketched in code after this post):
• Can predictions be pre-computed? → Batch
• Need instant results at request time? → Real-Time
• Processing continuous event streams? → Streaming
• Network/privacy constraints? → Edge

What I see teams get wrong: they build real-time infrastructure for batch workloads because "we might need real-time later." The operational complexity and cost difference is 10x. Or they deploy everything to the cloud when edge deployment would solve latency and privacy issues simultaneously.

My advice: start with the simplest architecture that meets your SLA. Batch is simpler than real-time. Real-time is simpler than streaming. Cloud is simpler than edge. Complexity is easy to add and painful to remove.

Which deployment pattern are you currently using? Is it the right one for your actual requirements?

♻️ Repost this to help your network get started
➕ Follow Prashant Rathi for more

PS. Opinions expressed are my own in a personal capacity and do not represent the views, policies, or positions of my employer (currently McKinsey & Company) or affiliates.

#GenAI #EnterpriseAI #AgenticAI
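That decision tree is compact enough to write down directly. A minimal sketch encoding the four questions in the order the post asks them; the function name and flag names are illustrative:

```python
def pick_deployment(precomputable: bool, instant_at_request: bool,
                    continuous_stream: bool, network_or_privacy: bool) -> str:
    """Encodes the post's decision tree, asked in the same order."""
    if precomputable:
        return "batch"
    if instant_at_request:
        return "real-time"
    if continuous_stream:
        return "streaming"
    if network_or_privacy:
        return "edge"
    return "batch"   # default to the simplest pattern that meets the SLA

# Example: live transactions, slight delay tolerable -> "streaming"
print(pick_deployment(precomputable=False, instant_at_request=False,
                      continuous_stream=True, network_or_privacy=False))
```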
We're spending $$$$$$/year on cloud egress fees to move ML models and data that could fit in a text message. I think the folks in the WebAssembly 3.0 community just shipped part of the solution.

A 1.1GB transformer model becomes 7GB once you add Python, PyTorch, and CUDA. The actual inference code? 2MB. We solved "works on my machine" by shipping the entire machine with every deployment. For edge AI, this is backwards.

WebAssembly 3.0 just changed the game:
→ 64-bit memory (no more 4GB limit)
→ Garbage collection (Python/Java/Go all compile to WASM now)
→ Sub-millisecond cold starts vs. 30-second container spin-ups
→ Runs in browsers, edge devices, serverless: everywhere containers can't

Real numbers:
- TinyGo service: 1.1MB → 377KB
- Cold starts: seconds → sub-millisecond
- Model updates: 4GB container rebuild → 500KB push

No hate on containers; use them for what they're great at! Containers for environments. WASM for computation. The infrastructure we need isn't always what we expect.

#MachineLearning #MLOps #EdgeComputing #WebAssembly #DataEngineering
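One way to feel the "WASM for computation" split without leaving Python is a WASM embedding such as the wasmtime package. A minimal sketch, assuming wasmtime is installed; the doubling "model" is a deliberately trivial stand-in for compiled inference code (e.g., a TinyGo build):

```python
from wasmtime import Engine, Instance, Module, Store

# A tiny stand-in "model": a WASM module (written as WAT text) that
# doubles its input. In practice this would be your compiled
# inference code shipped as a small .wasm artifact.
wat = """
(module
  (func (export "predict") (param f32) (result f32)
    local.get 0
    f32.const 2.0
    f32.mul))
"""

engine = Engine()
store = Store(engine)
module = Module(engine, wat)                 # accepts WAT text or wasm bytes
instance = Instance(store, module, [])
predict = instance.exports(store)["predict"]
print(predict(store, 3.5))                   # -> 7.0
```

The module is a few hundred bytes and starts in microseconds, which is the cold-start and update-size argument the post is making.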