You're a #CTO. Your board asks: "What's our ROI on AI coding tools?" Your answer: "40% of our code is AI-generated!" They respond: "So what? Are we shipping faster? Are customers happier?"

Most CTOs are measuring AI impact completely wrong. Here's what some are tracking:
- Percentage of AI-generated code
- Developer hours saved per week
- Lines of code produced
- AI tool adoption rates

These metrics are like measuring how fast your assembly line workers attach parts while ignoring whether your cars actually start.

Here's what you SHOULD measure instead:
1. Delivered business value
2. Customer cycle time
3. Development throughput
4. Quality and reliability
5. Total cost of delivery (not just development)
6. Team satisfaction

Software development isn't a typing competition; it's a complex system. If AI makes your developers 30% faster but your deployment takes 2 weeks and QA adds another week, your customer delivery improves by maybe 7%. You've sped up the wrong part.

The solution: A/B test your teams. Give half your teams AI tools and measure business outcomes over 2-3 release cycles. Track what customers actually experience, not how much developers produce.

Companies that measure business impact from AI will pull ahead. Those measuring vanity metrics will wonder why their expensive tools aren't moving the needle.

Stop measuring how much code AI generates. Start measuring how much faster you deliver value to customers.

What are you actually measuring? And is it moving your business forward?

-> Follow me for more about building great tech organizations at scale. More insights in my book "All Hands on Tech"
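The "maybe 7%" figure above follows from simple pipeline arithmetic. Here's a back-of-envelope sketch; the stage durations (10 days coding, 14 days deployment, 7 days QA) are illustrative assumptions, not numbers from the post.

```python
# Back-of-envelope check of the bottleneck claim: if coding is only one
# stage of delivery, speeding it up 30% barely moves end-to-end cycle time.
# Stage durations are illustrative assumptions.

def delivery_improvement(dev_days: float, deploy_days: float,
                         qa_days: float, dev_speedup: float = 0.30) -> float:
    """Fractional reduction in total cycle time when only dev gets faster."""
    before = dev_days + deploy_days + qa_days
    after = dev_days / (1 + dev_speedup) + deploy_days + qa_days
    return (before - after) / before

# 10 days coding, 14 days deployment, 7 days QA:
gain = delivery_improvement(10, 14, 7)
print(f"{gain:.0%}")  # -> 7% faster delivery, despite 30% faster coding
```

The lesson matches the post: the speedup is diluted by every stage you didn't touch.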
Optimizing Technology Spending
Explore top LinkedIn content from expert professionals.
-
You don't need a 2 trillion parameter model to tell you the capital of France is Paris. Be smart and route between a panel of models according to query difficulty and model specialty!

A new paper proposes a framework to train a router that sends each query to the appropriate LLM, optimizing the trade-off between cost and performance.

Overview:
Model inference cost varies significantly. Per one million output tokens: Llama-3-70b ($1) vs. GPT-4-0613 ($60), Haiku ($1.25) vs. Opus ($75).

The RouteLLM paper proposes a router training framework based on human preference data and augmentation techniques, demonstrating over 2x cost savings on widely used benchmarks. They define the problem as having to choose between two classes of models:
(1) strong models - produce high-quality responses but at a high cost (GPT-4o, Claude 3.5)
(2) weak models - relatively lower quality and lower cost (Mixtral 8x7B, Llama3-8b)

A good router requires a deep understanding of the question's complexity as well as the strengths and weaknesses of the available LLMs. The paper explores different routing approaches:
- Similarity-weighted (SW) ranking
- Matrix factorization
- BERT query classifier
- Causal LLM query classifier

Neat ideas to build from:
- Users can collect a small amount of in-domain data to improve performance for their specific use cases via dataset augmentation.
- The problem can expand from routing between a strong and a weak LLM to multiclass routing across specialist models (language-vision model, function-calling model, etc.).
- Larger framework controlled by a router: imagine a system of 15-20 tuned small models and the router as the (n+1)th model responsible for picking the LLM that handles a particular query at inference time.
- MoA architectures: routing across different Mixture-of-Agents configurations would be a cool idea as well. Depending on the query, you decide how many proposers there should be, how many layers in the mixture, what the aggregator models should be, etc.
- Route-based caching: if you get redundant queries that are slightly different, route the query plus the previous answer to a small model for light rewriting instead of regenerating the answer.
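The strong-vs-weak routing decision can be sketched as a tiny threshold router. This is illustrative only: the model names and prices are placeholders, and the heuristic difficulty score stands in for the trained classifier (BERT or preference-data-based) that RouteLLM actually uses.

```python
# Minimal sketch of a cost-aware LLM router. The scoring heuristic and
# model names are placeholder assumptions, not the RouteLLM implementation:
# a real router would use a trained classifier over preference data.

COST_PER_M_TOKENS = {"weak-llm": 1.00, "strong-llm": 60.00}  # illustrative prices

def difficulty_score(query: str) -> float:
    """Toy proxy for query difficulty, clamped to [0, 1]."""
    hard_markers = ("prove", "derive", "analyze", "compare")
    score = min(len(query.split()) / 50, 1.0)              # longer -> harder
    score += 0.3 * sum(m in query.lower() for m in hard_markers)
    return min(score, 1.0)

def route(query: str, threshold: float = 0.5) -> str:
    """Send easy queries to the cheap model, hard ones to the strong model."""
    return "strong-llm" if difficulty_score(query) >= threshold else "weak-llm"

print(route("What is the capital of France?"))  # -> weak-llm
print(route("Prove that the sum of two even numbers is even, then "
            "derive the general case for n even numbers step by step."))
# -> strong-llm
```

Swapping the heuristic for a learned classifier, and the two-way decision for a multiclass one over specialist models, gives the larger framework described above.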
-
If you're an AI engineer trying to optimize your LLMs for inference, here's a quick guide for you 👇

Efficient inference isn't just about faster hardware; it's a multi-layered design problem. From how you compress prompts to how your memory is managed across GPUs, everything impacts latency, throughput, and cost. Here's a structured taxonomy of inference-time optimizations for LLMs:

1. Data-Level Optimization
Reduce redundant tokens and unnecessary output computation.
→ Input Compression:
- Prompt Pruning: remove irrelevant history or system tokens
- Prompt Summarization: use model-generated summaries as input
- Soft Prompt Compression: encode static context using embeddings
- RAG: replace long prompts with retrieved documents plus compact queries
→ Output Organization:
- Pre-structure output to reduce decoding time and minimize sampling steps

2. Model-Level Optimization
(a) Efficient Structure Design
→ Efficient FFN Design: use gated or sparsely activated FFNs (e.g., SwiGLU)
→ Efficient Attention: FlashAttention, linear attention, or sliding window for long context
→ Transformer Alternates: e.g., Mamba, Reformer for memory-efficient decoding
→ Multi/Group-Query Attention: share keys/values across heads to reduce KV cache size
→ Low-Complexity Attention: replace full softmax with approximations (e.g., Linformer)
(b) Model Compression
→ Quantization:
- Post-Training: no retraining needed
- Quantization-Aware Training: better accuracy, especially below 8-bit
→ Sparsification: weight pruning, sparse attention
→ Structure Optimization: neural architecture search, structure factorization
→ Knowledge Distillation:
- White-box: student learns internal states
- Black-box: student mimics output logits
→ Dynamic Inference: adaptive early exits or skipping blocks based on input complexity

3. System-Level Optimization
(a) Inference Engine
→ Graph & Operator Optimization: use ONNX, TensorRT, BetterTransformer for op fusion
→ Speculative Decoding: use a smaller model to draft tokens, validate with the full model
→ Memory Management: KV cache reuse, paging strategies (e.g., PagedAttention in vLLM)
(b) Serving System
→ Batching: group requests with similar lengths for throughput gains
→ Scheduling: token-level preemption (e.g., TGI, vLLM schedulers)
→ Distributed Systems: use tensor, pipeline, or model parallelism to scale across GPUs

My Two Cents 🫰
→ Always benchmark end-to-end latency, not just token decode speed
→ For production, 8-bit or 4-bit quantized models with MQA and PagedAttention give the best price/performance
→ If using long context (>64k), consider sliding attention plus RAG, not full dense memory
→ Use speculative decoding and batching for chat applications with high concurrency
→ LLM inference is a systems problem. Optimizing it requires thinking holistically, from tokens to tensors to threads.

Image inspo: A Survey on Efficient Inference for Large Language Models

----
Follow me (Aishwarya Srinivasan) for more AI insights!
-
Measuring ROI in AI: What Success Really Looks Like in Enterprises

I get asked this question a lot lately: "What does ROI in AI actually look like?" Not in theory. Not in a board slide. But in real enterprises trying to make this work.

Here's the uncomfortable truth: most companies are measuring AI ROI the wrong way. They're asking:
"How many hours did Copilot save?"
"Did this chatbot reduce headcount?"
"Is the model cheaper than before?"
That's like judging the success of electricity by asking 👉 "How many candles did it replace?"

What AI ROI isn't
AI ROI is not:
- A single number
- A one-quarter metric
- A cost-cutting exercise
- Or a model accuracy score
Those are inputs. Not outcomes.

What AI ROI actually looks like
From what I've seen across enterprises, real AI ROI shows up in 3 quieter but more powerful ways:

1️⃣ Work changes - before cost does
The first signal isn't savings. It's work that stops needing to happen. Example: a procurement team doesn't "save 2 hours per report." They stop writing reports altogether - because decisions are auto-prepared. That's not productivity. That's workflow elimination.

2️⃣ Decisions get faster - and safer
AI ROI often shows up as decision velocity with guardrails. Think of it like going from asking 10 people for opinions... to getting a grounded recommendation in minutes - with sources. When leaders trust the output and understand why it said what it said, adoption sticks.

3️⃣ Capability compounds over time
This is the part most ROI models miss. AI value compounds.
Month 1: A pilot works
Month 3: Teams reuse patterns
Month 6: Agents start orchestrating work
Month 12: The organization operates differently
Measuring AI ROI too early is like judging a gym membership after week one.

A better question to ask
Instead of "What's the ROI of this AI tool?", try asking:
- What work will disappear?
- What decisions will move faster?
- What capabilities will compound over time?
- And... what new risks are now controlled automatically?

If you can answer those, the financial ROI usually follows. AI success isn't about doing the same things cheaper. It's about doing different things entirely.

For those asking how enterprises are actually measuring AI success (beyond time saved), a few Microsoft perspectives worth exploring in comments 👇

Curious - how are you measuring AI success in your organization today?

*****************************************************************************
Ranjani Mani #reviewswithranjani #Technology | #Books | #BeingBetter
-
𝗔𝗿𝗲 𝘆𝗼𝘂 𝗽𝗿𝗼𝗮𝗰𝘁𝗶𝘃𝗲𝗹𝘆 𝗺𝗮𝗻𝗮𝗴𝗶𝗻𝗴 𝘆𝗼𝘂𝗿 𝗦𝗼𝘂𝗿𝗰𝗲-𝘁𝗼-𝗣𝗮𝘆 𝘁𝗲𝗰𝗵𝗻𝗼𝗹𝗼𝗴𝘆 𝗰𝗼𝘀𝘁𝘀?

If not, why let savings from smart Procurement slip away due to outdated technology or suboptimal use? S2P technology plays a central role in cost management, yet many companies lack a strategic approach to continuously assess and optimise their tech stack.

Companies can adopt Bain & Co's "𝗥𝗲𝗱𝘂𝗰𝗲, 𝗥𝗲𝗽𝗹𝗮𝗰𝗲, 𝗮𝗻𝗱 𝗥𝗲𝘁𝗵𝗶𝗻𝗸" model to continuously evaluate their technology infrastructure and costs, ensuring a more optimised and sustainable cost profile. Here is the model in action for Source-to-Pay technology cost optimisation:

▪️ 𝗥𝗲𝗱𝘂𝗰𝗲 to recover 10 to 20% of costs through short-term actions such as:
- adjusting licenses to match actual usage and adoption patterns
- discontinuing features or functionalities that add little value
- switching off modules where business capabilities have not yet caught up
Avoid over-licensing by matching user access to actual needs, ensuring modules align with Procurement's readiness.

▪️ 𝗥𝗲𝗽𝗹𝗮𝗰𝗲 to yield 20 to 30% savings by:
- transitioning to cost-optimal, flexible solutions and getting out of lock-ins
- switching subscription models when premium offerings are unnecessary
- consolidating overlapping tools that offer similar features
For example, merge multiple eSourcing tools into a primary platform and adopt tender-based pricing for niche auction needs. This helps align the cost profile of your Source-to-Pay technology with actual needs.

▪️ 𝗥𝗲𝘁𝗵𝗶𝗻𝗸 to realise up to 40% cost optimisation by:
- reimagining the architecture with a modular, composable design
- automating and orchestrating processes and integrating new digital tools
- reevaluating the mix of best-of-breed solutions vs. integrated suites
A new Procurement strategy requires a fresh look at the S2P tech stack to ensure it adapts and supports growth cost-effectively, while offering flexibility through additional digital levers like AI and automation.

𝗢𝗽𝘁𝗶𝗺𝗶𝘀𝗶𝗻𝗴 𝗦𝟮𝗣 𝘁𝗲𝗰𝗵𝗻𝗼𝗹𝗼𝗴𝘆 𝗶𝘀 𝗮 𝗰𝗼𝗻𝘁𝗶𝗻𝘂𝗼𝘂𝘀 𝗷𝗼𝘂𝗿𝗻𝗲𝘆, 𝗻𝗼𝘁 𝗮 𝗼𝗻𝗲-𝘁𝗶𝗺𝗲 𝗲𝗳𝗳𝗼𝗿𝘁, especially with contractual commitments, sunk costs, and change management challenges. Rather than following IT preferences and standards, it's about keeping technology fresh and aligned with business needs as they evolve.

❓ How do you manage your S2P technology to adapt to changing business needs while maintaining cost efficiency?
-
After optimizing costs for many AI systems, I've developed a systematic approach that consistently delivers cost reductions of 60-80%. Here's my playbook, in order of least to most effort:

Step 1: Optimizing Inference Throughput
Start here for the biggest wins with the least effort. Enabling caching (LiteLLM (YC W23), Zilliz) and strategic batch processing can substantially reduce costs with very little work. I have seen teams cut costs in half simply by implementing caching and by batching requests that don't require real-time results.

Step 2: Maximizing Token Efficiency
This can give you an additional 50% cost savings. Prompt engineering, automated compression (ScaleDown), and structured outputs can cut token usage without sacrificing quality. Small changes in how you craft prompts can lead to massive savings at scale.

Step 3: Model Orchestration
Use routers and cascades to send each prompt to the cheapest model that can handle it effectively (OpenRouter, Martian). Why use GPT-4 for simple classification when GPT-3.5 will do? Smart routing ensures you're not overpaying for intelligence you don't need.

Step 4: Self-Hosting
I only suggest self-hosting for teams at scale because of the complexities involved. This requires more technical investment upfront but pays dividends for high-volume applications.

The key is tackling these layers systematically. Most teams jump straight to self-hosting or model switching, but the real savings come from optimizing throughput and token efficiency first.

What's your experience with AI cost optimization?
-
In a recent roundtable with fellow CXOs, a recurring theme emerged: the staggering costs associated with artificial intelligence (AI) implementation. While AI promises transformative benefits, many organizations find themselves grappling with unexpectedly high Total Cost of Ownership (TCO). Businesses are seeking innovative ways to optimize AI spending without compromising performance.

Two pain points stood out in our discussion: module customization and production-readiness costs. AI isn't just about implementation; it's about sustainable integration. The real challenge lies in making AI cost-effective throughout its lifecycle. The real value of AI is not in the model, but in the data and infrastructure that support it.

As AI becomes increasingly essential for competitive advantage, how can businesses optimize costs to make it more accessible?

Strategies for AI Cost Optimization

1. Efficient Customization
- Leverage low-code/no-code platforms to reduce development time
- Utilize pre-trained models and transfer learning to cut down on customization needs

2. Streamlined Production Deployment
- Implement MLOps practices for faster time-to-market on AI projects
- Adopt containerization and orchestration tools to improve resource utilization

3. Cloud Cost Management
- Use spot instances and auto-scaling to reduce cloud costs for non-critical workloads
- Leverage reserved instances for predictable, long-term usage; the savings over on-demand pricing can be substantial

4. Hardware Optimization
- Implement edge computing to reduce data transfer costs
- Invest in specialized AI chips that offer better performance per watt than general-purpose processors

5. Software Efficiency
- Route queries to right-sized LLMs rather than a single large model, an approach many teams are now adopting
- Apply model compression techniques such as pruning and quantization, which can reduce model size without significant accuracy loss
- Adopt efficient training techniques like mixed-precision training to speed up the process
- Streamline repetitive tasks so organizations can reallocate resources to more strategic initiatives

6. Data Optimization
- Focus on data quality, since it can reduce training iterations
- Utilize synthetic data to supplement expensive real-world data, potentially cutting data acquisition costs

In conclusion, embracing AI-driven strategies for cost optimization is not just a trend; it is a necessity for organizations looking to thrive in today's competitive landscape. By leveraging AI, businesses can not only optimize their costs but also enhance their operational efficiency, paving the way for sustainable growth.

What other AI cost optimization strategies have you found effective? Share your insights below!

#MachineLearning #DataScience #CostEfficiency #Business #Technology #Innovation #ganitinc #AIOptimization #CostEfficiency #EnterpriseAI #TechInnovation #AITCO
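The quantization point in the post above is easy to verify with back-of-envelope arithmetic: cutting bits per weight cuts weight storage proportionally. The 7B parameter count is an illustrative example, not a benchmark.

```python
# Back-of-envelope check of the compression claim: quantizing a
# 7B-parameter model from fp16 to int4 shrinks weight storage ~4x.
# Illustrative arithmetic only; runtime memory also includes activations
# and the KV cache, which this ignores.

def model_size_gb(params_billion: float, bits_per_weight: int) -> float:
    """Weight storage in GB for a model of the given size and precision."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

fp16 = model_size_gb(7, 16)  # 14.0 GB
int4 = model_size_gb(7, 4)   #  3.5 GB
print(f"fp16: {fp16:.1f} GB, int4: {int4:.1f} GB, ratio: {fp16 / int4:.0f}x")
```

The same arithmetic explains why lower-precision weights also reduce serving costs: a smaller model fits on cheaper GPUs and moves fewer bytes per token.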
-
Many of you are working on 2026 budgets. Here are the percentages to spend by area 👇🏻

2026 budgets can't treat AI as side projects anymore. Leading companies are making it a core line item.
• Put 20–30% of your tech spend into AI. Not experiments, but core business.
• Build two moats: defensive (compliance, risk, trust) and offensive (fast pilots, talent, competitive edge).
• Fund only what ties directly to real outcomes (more revenue, lower costs, reduced risk). Not vanity adoption stats.
• Stay flexible: budget for multi-model tools and vertical solutions so you're not locked to one vendor.
• Invest in people: train 10–20% of your workforce as AI champions and give premium AI tools to your highest-impact roles.

Boards should ask: are we validating ROI, proving compliance, and moving money fast from experiments into what works?

SPECIFIC SPEND ALLOCATIONS
1. Fund Board + Exec AI Literacy (1–2%): workshops, advisors, training. You can't delegate this.
2. Acceleration Funds (2–5%): fast-track pilots, MVP builds, and quick wins.
3. Competitive Intelligence (2–4%): invest in data modernization and market scanning.
4. Compliance & Governance (3–6%): AI laws bite in 2026. Treat compliance as a moat, not overhead.
5. ROI & Outcome Tracking (1–3%): budget for tools that prove value (hours saved, revenue grown, risk reduced), not vanity adoption dashboards.
6. Hybrid Infrastructure (5–8%): NPU hardware plus local/cloud orchestration. Build for flexibility, privacy, and speed.
7. Talent & Training (3–7%): fund AI champion programs. 10–20% of employees trained deeply in AI will outproduce entire teams.
8. Segmented AI Access (2–3%): premium AI for high-leverage performers, commodity AI for the rest. Treat AI like capital allocation.
9. Platform Flexibility + Open Standards (2–4%): budget for Markdown, APIs, and portable formats so your data and workflows remain yours. Portability is the cheapest insurance policy against lock-in.
10. Catch-Up & Modernization (3–6%): AI labs, partnerships, and M&A to accelerate where you're behind.

💡 Bottom line: every dollar should be tied to ROI, trust, or speed.
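A quick sanity check on the ten allocation ranges above: summing them shows how much of the tech budget they claim in total. The ranges are copied from the post; summing endpoints is my own arithmetic.

```python
# Sanity check on the suggested allocations: total the low and high ends
# of the ten ranges from the post (values in % of tech spend).

RANGES = {
    "Board + exec AI literacy": (1, 2),
    "Acceleration funds": (2, 5),
    "Competitive intelligence": (2, 4),
    "Compliance & governance": (3, 6),
    "ROI & outcome tracking": (1, 3),
    "Hybrid infrastructure": (5, 8),
    "Talent & training": (3, 7),
    "Segmented AI access": (2, 3),
    "Platform flexibility": (2, 4),
    "Catch-up & modernization": (3, 6),
}

low = sum(lo for lo, hi in RANGES.values())
high = sum(hi for lo, hi in RANGES.values())
print(f"total: {low}-{high}% of tech spend")  # -> total: 24-48% of tech spend
```

The midpoints land in the neighborhood of the headline "20–30% of tech spend into AI," though taking every range at its high end would claim considerably more, so the line items need trimming against that top-line target.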
-
Over USD$800 billion will be invested in data centres across Asia Pacific by 2030, supporting rapid digital expansion. Yet with total regional electricity demand projected to rise nearly 50% by 2035, and data centre electricity needs expected to increase up to five-fold by the mid-2030s, the choices we make today will define the resilience, affordability and sustainability of tomorrow's energy systems.

Deloitte's latest Asia Pacific report shows a clear path forward:
➡️ A power-first, clean-energy-aligned approach can turn data centres from grid stressors into strategic assets that accelerate decarbonisation.
➡️ In most Asia Pacific markets, clean energy is already faster to build, cheaper and more resilient than conventional sources.
➡️ With coordinated planning, data centres can support grid stability, boost reliability for all users, and enable faster speed to market for data centre operators.

Unlocking this opportunity requires multi-stakeholder action across operators, governments, energy providers, asset owners, financiers and major customers. From my conversations with these stakeholders, it's clear that zero-carbon electricity supply is no longer a differentiator; it's a prerequisite, and they are reluctant to advance data centre investments where it's not available.

Deloitte's new report "Powering Asia Pacific's data centre boom" provides clear guidance on how to make this happen, bringing together insights from industry leaders and Deloitte's analysis of energy, regulatory and sustainability trends across Asia Pacific.

💡 The message is clear: when new data centres contribute additively to clean energy supply, everybody wins.

Excited to share this important work and continue the conversation on how we can shape an Asia Pacific digital economy that is powered sustainably.

🔗 Access the report here: https://lnkd.in/ggWEpFzs

#DataCentres #AsiaPacific #CleanEnergy #EnergyTransition #AI Yosuke I. S. Anjani Kumar Steven Zhong Matt Walden Piyush J. 
Andrea Culligan David Hill
-
Gartner emphasizes that successful CIOs transition from reactive to proactive cost management by implementing IT smart-spending strategies. This involves continuously rationalizing expenditures, optimizing underutilized assets, and reinvesting in high-performing technologies to maximize business value. To achieve this, CIOs should:
- Embrace Smart Spending: Develop a strategic cost optimization discipline within IT to maximize business value and minimize spend.
- Establish Financial Transparency: Track spending at the outcome level to better understand its value to the organization.
- Set Targets and Benchmarks: Examine how your spending compares with that of your peers through external benchmarking.
- Establish Accountability: Run cost optimization as an ongoing discipline with your business unit leaders and infuse it into your organization's culture.
- Use Savings to Drive Enterprise Strategy: Reduce and optimize where possible to help fund new initiatives that drive the strategy of the organization.
By adopting these practices, CIOs can ensure that IT investments are strategically aligned with business objectives, fostering sustainable growth and innovation.
#CIO #ITStrategy #SmartSpending