BREAKING: OpenAI’s latest models set a new standard for open-source reasoning systems. The company has released two Mixture of Experts models under the Apache 2.0 license: gpt-oss-20b and gpt-oss-120b. Both are built specifically for tool use, advanced reasoning, and integration into agent-based workflows.

Key insights:

1. Open access with strong performance: These models are fully open-weight and match or exceed the performance of commercial options such as o3-mini and o4-mini. The 120B model surpasses o3-mini on standard benchmarks including MMLU, GPQA, and code generation tasks.

2. Efficient deployment across hardware: The 20B model is small enough to run on edge devices and consumer-grade hardware. Both models support over 130,000 tokens of context and use Mixture of Experts routing to reduce compute costs during inference.

3. Advanced tool interaction capabilities: Both models can fetch current information from the web, execute Python code in a notebook-style environment, and call custom functions defined by the user.

4. Customizable reasoning depth: Users can set the reasoning level to low, medium, or high depending on task complexity and the desired response speed. This allows for dynamic control in agentic applications.

5. Seamless integration with deployment platforms: OpenAI has collaborated with several infrastructure providers to ensure these models work immediately across a wide range of systems, making them accessible to developers without extensive setup.

6. Structured interaction format: The models use the harmony chat format, which supports interleaving reasoning with tool execution. This enhances performance in multi-step, tool-augmented tasks.

Have you used them yet?
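As a rough illustration of the adjustable reasoning depth, here is a minimal sketch of how a request to a gpt-oss model served behind an OpenAI-compatible chat endpoint might be assembled. The "Reasoning: <level>" system-prompt convention and the model name are assumptions for illustration, not confirmed API details:

```python
# Sketch: building a chat-completion payload with a reasoning-depth hint
# for a gpt-oss model. The "Reasoning: <level>" system-prompt convention
# is an assumption here; check the harmony format docs for the real shape.

def build_request(prompt: str, reasoning: str = "medium") -> dict:
    """Build a chat payload with a low/medium/high reasoning hint."""
    if reasoning not in ("low", "medium", "high"):
        raise ValueError("reasoning must be low, medium, or high")
    return {
        "model": "gpt-oss-20b",  # hypothetical deployment name
        "messages": [
            {"role": "system", "content": f"Reasoning: {reasoning}"},
            {"role": "user", "content": prompt},
        ],
    }

payload = build_request("Summarize the Apache 2.0 license.", reasoning="high")
print(payload["messages"][0]["content"])  # Reasoning: high
```

In practice the payload would be POSTed to whatever inference server hosts the weights; the point is only that reasoning depth is a per-request knob rather than a fixed model property.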
Updates on New AI Model Releases
Summary
Updates on new AI model releases highlight how artificial intelligence is rapidly advancing, with new models offering smarter reasoning, better collaboration, and more autonomy in tasks. These updates explain how AI models are becoming more capable, open to the public, and able to handle both technical and creative challenges.
- Explore open access: Take advantage of open-source AI models that can be deployed on local devices, making advanced technology more accessible for personal and business use.
- Test collaborative features: Experiment with AI agents and models that can generate presentations, analyze images, and make decisions autonomously to streamline your workflow.
- Monitor ongoing progress: Stay informed about frequent upgrades, new releases, and safety improvements to keep your organization up to date with the latest AI advancements.
🚨 Big week for OpenAI. After the 4.1 model family on Monday, we now get o3 and o4-mini - and while the model naming remains chaotic, the message couldn’t be clearer: it's a step change, not an incremental gain. OpenAI calls these “the smartest models we’ve released to date,” and early signs back that up. So what’s actually new?

🖼️ They can “think with images.” Not just caption or describe, but truly reason through visual content: solve puzzles, interpret data viz, connect dots between text and image in a way that feels useful, not just novel.

🛠️ Autonomous tool use is baked into the core. They don’t wait for a prompt to browse or code - they decide when to invoke Python, DALL·E, or search. This shifts the paradigm from “chatbot with tools” to “agent with judgment.” The model is no longer the product. The agent is.

💡 They IDEATE. The wildest claim is that these models can generate novel ideas - not just retrieve or remix. That’s a step beyond summarizing the internet and towards real knowledge creation. Once users battle-test these models in the real world, we’ll have the true answer, but it seems like OpenAI may be back on top of leaderboards after this model drop.

⬆️ o3 reportedly sets new SOTA on Codeforces, SWE-bench (without model-specific scaffolding), and MMMU. On AIME 2025, o4-mini scored 99.5 percent when given access to a Python interpreter. The secret sauce? Large-scale reinforcement learning, doubling down on the “more compute = better performance” principle that powered the original GPT series.

But the bigger story is strategic. As foundation models become commoditized, labs are racing up the stack - in search of margin, control, and distribution. Anthropic recently launched Claude Code. Today, OpenAI responded with Codex CLI, a fully open-source, local coding agent that runs on your machine. It’s not just a wrapper; it’s the beginning of a developer-native AI runtime.
Now comes word that OpenAI is exploring an acquisition of Windsurf, a company building agent orchestration infra. The model wars are giving way to the agent wars. And the real moat isn’t model size - it’s vertical integration. This is the clearest signal yet that the frontier has moved up the stack.
-
OpenAI just rolled out a major update to ChatGPT, quietly releasing three new models (o3, o4-mini, and o4-mini-high) that offer the most advanced reasoning capabilities the company has ever shipped. You’ll see o3 in ChatGPT Pro and Team plans; o4-mini and o4-mini-high are running across ChatGPT Plus, with o4-mini available even to free users. The naming conventions may be opaque, but the takeaway is clear: OpenAI is moving faster and getting smarter.

The noticeable upgrade here is autonomy. These models can now decide when to browse the web, generate an image, run Python code, or analyze a file — without you prompting them to switch tools. It’s a meaningful shift from chatbots as assistants to AI as a collaborative partner that understands what you need and figures out how to get it done.

What makes this more than just another model release is how well these systems handle visual reasoning. They are multimodal. You can now drop in a chart or image, and the model will “look” at it and respond. Want to rotate a diagram, pull insights from a chart, or modify a layout on the fly? Just ask. According to the company, in side-by-side tests, o3 makes 20% fewer major reasoning errors than GPT-4 on hard tasks — real progress, not marketing fluff.

OpenAI isn’t waiting for GPT-5 to push boundaries. Instead, they’re iterating in public and optimizing as they go. The shift from monolithic releases to a cadence of smaller, high-impact upgrades that embed intelligence more deeply into workflows seems to be an industry trend. In practice, if you’re not testing these tools in your org, you’re already behind.
-
🚨 Big week in AI. Four major updates, one clear signal: the AI frontier is rapidly opening up — and transforming.

🔓 OpenAI launches gpt-oss: two open-weight language models (120B and 20B) that rival their closed counterparts on reasoning, coding, and health benchmarks. These models are available under the Apache 2.0 license and optimized for on-device deployment. With safety at the forefront, OpenAI even adversarially fine-tuned the models to test worst-case misuse scenarios — a first-of-its-kind for open model releases.

🌍 Google DeepMind unveils Genie 3, a real-time world model capable of simulating rich, navigable 3D environments from a single text prompt. It marks a leap toward embodied agents and AGI training grounds — combining physics modeling, long-horizon memory, and interactive world events at 24 fps.

🧠 Claude Opus 4.1 is here, and it’s a sharp upgrade in code generation and agentic tasks. With 74.5% on SWE-bench Verified, it edges out Claude 4 and solidifies Anthropic’s place in agentic reasoning. Companies like GitHub and Rakuten are already reporting real-world value for debugging and research tasks.

🧧 Geopolitical backdrop: This push toward openness isn’t just about access — it’s strategic. OpenAI’s release comes amid competition from China’s DeepSeek and others, as nations and companies race to define the next-gen open AI stack.

💡 Takeaway: We’re entering a new era where openness, capability, safety, and geopolitics intersect. Whether you're building tools, researching safety, or thinking about AI’s societal role — the stack is shifting fast.

Which of these breakthroughs are you most excited about? Let’s discuss 👇

#AI #OpenSourceAI #GenAI #gptOSS #Claude #Genie3 #AGI #AIOpenModels
-
The development of AI agents and top-of-the-line LLMs continues at breakneck speed. A new generation of models has just been released by OpenAI (o3, o4-mini) and Google (Gemini 2.5), and new agent vendors like Manus are rolling out their products broadly. What can these new products do for business users?

Most benchmarks for LLMs and reasoning models focus on topics such as math, science and coding. These domains are special in the sense that most questions have an unambiguously correct answer. In business, things tend to be more fluid and varied. For example, a PowerPoint presentation certainly has verifiable elements (such as cited data and sources), but how do we judge if it’s really good? Can these latest models and agents produce a presentation that is as good as a human expert’s?

I used my favorite real-world example of “Current AI Trend Update for Investors” as the topic for a test. This is an interesting problem because it’s multi-faceted and needs very fresh data.

Good news for business users: OpenAI’s o3 model can now produce actual PowerPoint files. My first attempt with Deep Research on top of o3 produced an extensive report in about 12 minutes, but then struggled to carry all the details over to the PPT — sources in particular. Surprisingly, just using the “naked” o3 model produced a result that not only was nearly as detailed, but also needed just 4 minutes to complete. A very remarkable achievement.

Google’s Gemini 2.5 in Deep Research mode turned out to be the most thorough of the bunch. It produced a very detailed, remarkably well sourced report. Unfortunately, it can’t create slide decks, so I used o3 to turn its output into an actual deck. Models can work together as a team…

Manus, the AI agent from China, was a surprise. It interpreted the task most liberally and set interesting priorities, but produced rich, pretty good-looking charts. It worked for 34 minutes, clearly the longest, but the result was truly interesting.
Each deck had its strengths and weaknesses. So what do we do to get a strong final result? I asked o3 to compare the decks, which it did in great detail, and then told it to put together a “best of” deck from all three inputs. The result is still far from perfect, but really quite compelling for something that took only minutes of manual work. With another hour or so of work this could be a final presentation, and it would have saved me at least 4-5 hours of research and drafting time. The latest models and agents are yet again a big leap forward. Just a few months ago, this kind of output wouldn’t have been possible. Which model is best? It depends. o3 is clearly a huge step forward in its agentic and analytical capabilities, and it is quite opinionated. Gemini 2.5 is great for very detailed, matter-of-fact tasks. And agents such as Manus work well for complex tasks that benefit from direct computer use. Decks: https://lnkd.in/etARUk_D
-
A 9 billion parameter AI model is now beating a 120 billion parameter LLM on major benchmarks. It runs on a laptop. And it's open source. That's the AI story you should be paying attention to today.

Alibaba's Qwen team just dropped four new models, from 0.8B to 9B parameters. The 9B version is outscoring much larger 120B models across many benchmarks. An AI model that's 13x smaller, beating its much more resource-hungry LLM cousin. Free. Open source. Apache 2.0 license. You can download it right now.

And these aren't just text models. They're natively multimodal. The 9B runs on a single consumer-grade GPU at full precision. Grab the 4-bit model and you can run it on a 3-year-old MacBook. The 0.8B model runs nicely on an iPhone or Raspberry Pi.

Here's why this matters: For the last two years, the AI conversation has been dominated by who can build the biggest model, who can raise the most money for GPU clusters, and who can win the next capability benchmark. That race isn't over. But a different race has started: the race to figure out how small you can go while keeping the performance that actually matters for production use cases.

If a 9B model can match a 120B model on reasoning and vision tasks, the economics of deploying AI just changed. That opens up a different business model that a whole new set of companies can play. The frontier models will keep getting bigger, better, and more capable. I'm excited to see where leaders like Anthropic, OpenAI, Google, and others are going with their models. But the most interesting story in AI right now might be how small the useful models are getting. Watch this space.
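The "runs on a laptop" claim follows from simple weight-storage arithmetic, sketched below. This counts weights only; activations and the KV cache add real overhead on top, so treat the numbers as lower bounds:

```python
# Back-of-the-envelope memory math for model weights: a 9B-parameter
# model stored at 16-bit, 8-bit, and 4-bit precision. Weights only;
# runtime memory (activations, KV cache) is not included.

def weight_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight storage in decimal gigabytes."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

for bits in (16, 8, 4):
    print(f"9B model at {bits}-bit ≈ {weight_gb(9, bits):.1f} GB")
# 16-bit ≈ 18 GB, 8-bit ≈ 9 GB, 4-bit ≈ 4.5 GB
```

At 4-bit, roughly 4.5 GB of weights fits comfortably in the unified memory of a recent laptop, which is exactly why quantized small models are deployable on consumer hardware.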
-
Last week in AI revealed a structural shift that many people are missing. For the past two years, the conversation has centered on one question: who has the best model? But the latest announcements suggest something deeper is happening. AI is becoming infrastructure.

OpenAI:
-- released GPT-5.4, its newest frontier model designed for complex professional tasks. On benchmarks, it scored 57.7% on SWE-Bench Pro for software engineering, 82.7% on BrowseComp, and 92.8% on the GPQA Diamond science benchmark.
-- introduced Codex Security, an AI agent designed to detect software vulnerabilities, and
-- launched ChatGPT for Excel, which allows users to analyze spreadsheets using natural language while connecting to financial data providers like FactSet and Moody’s.

At the same time, the competition is accelerating on efficiency and cost. Google released Gemini 3.1 Flash Lite, designed to deliver responses about 2.5× faster and generate output roughly 45% faster than earlier Gemini models, with pricing starting at $0.25 per million input tokens. Alibaba also released Qwen 3.5 small models ranging from 0.8B to 9B parameters. In some benchmarks, the 9B model reportedly outperformed systems with more than 120B parameters, highlighting how efficiency is becoming a competitive frontier.

But the biggest signals this week came from infrastructure. Nvidia introduced AI models designed to monitor and manage telecom networks, helping detect failures and automate network operations. Huawei announced a new AI-native network architecture that includes agent layers capable of automating telecom management, with forecasts suggesting that up to 15% of network decisions could be handled autonomously by AI agents by 2028.

Governments are now responding as well. The White House announced a Ratepayer Protection Pledge signed by Microsoft, Google, Amazon, Meta, and OpenAI.
Under the pledge, companies building large AI data centers must pay for electricity grid expansions instead of passing those costs to residential ratepayers.

Meanwhile, research shows AI is already reshaping work:
-- A study cited by Scientific American found that developers using AI coding tools produced 27% more merged code changes and nearly 20% more after-hours commits.
-- A separate survey of nearly 5,000 developers reported that more than 90% now use AI tools, and over 80% say they improve productivity, though many also reported increased debugging after releases.

Adoption globally is still uneven. Japan is investing ¥340 billion in subsidies to accelerate AI adoption as it prepares for a projected labor shortage of 11 million workers by 2040. Yet today only about 8.4% of workers in Japan report using AI at work.

The AI race is no longer just about building better models. It’s about controlling the infrastructure around them: energy systems, developer ecosystems, enterprise workflows, and the industries where AI actually runs.
-
The AI bubble has long been fueled by claims that artificial general intelligence (AGI) is within reach—just a few months or, at worst, a few years away. But recent studies suggest that not only is it decades away, the latest generative #AI models aren’t much better than the previous ones.

A recent paper in Nature argued that, because the internet data that LLMs train on is increasingly dodgy computer-generated text, “indiscriminate use of model-generated content in training causes irreversible defects in the resulting models.” An article in IEEE Spectrum agreed: "The prevailing methods to make large language models more powerful and amenable have been based on continuous scaling up (that is, increasing their size, data volume and computational resources) and bespoke shaping up (including post-filtering, fine tuning or use of human feedback). However, larger and more instructable large language models may have become less reliable."

Recent information from analyses of OpenAI’s newest model, Orion, suggests it is no better than GPT-4 in terms of hallucinations. The official release of Sora has also disappointed many because it still violates laws of physics, objects sometimes disappear, and other hallucinations are common. Similar stories are emerging from Google (Gemini 2.0 is barely better than 1.0) and Anthropic (Claude 3.5 Opus has likely been scrapped, or at least the timetable has significantly slipped), while Apple's much awaited AI update (iOS 18.2) is more for entertainment than business, according to the WSJ. Even researchers who have every incentive to tout the reasoning abilities of LLMs have become critical.
Six Apple researchers recently wrote that “current LLMs are not capable of genuine logical reasoning; instead, they attempt to replicate the reasoning steps observed in their training data.” Even the founders of the venture capital firm Andreessen Horowitz admitted recently that “they’ve noticed a drop off in AI model capability improvements in recent years.” Co-founder Ben Horowitz said that a comparison of the differences between the GPT-2, GPT-3 and GPT-3.5 models and the difference between GPT-3.5 and GPT-4 shows that “we’ve really slowed down in terms of the amount of improvement.” Co-founder Marc Andreessen added that, two years ago, the GPT-3.5 model was “way ahead of everybody else…. Sitting here today, there’s six that are on par with that. They’re sort of hitting the same ceiling on capabilities.”

Gary Smith and I have previously argued that revenue from corporate adoption of AI continues to disappoint and, so far, pales in comparison to the Internet revenue that sustained the dot-com bubble until it didn’t. A growing recognition that there are fundamental challenges that make LLMs unreliable, and that these are not going to be solved by increasingly expensive scaling, is likely to hasten the popping of the AI bubble. See Unicorns, Hype and Bubbles.

#technology #innovation #startups #artificialintelligence #hype
-
OpenAI has recently launched three new models: GPT‑4.1, GPT-4.1 Mini, and GPT-4.1 Nano. The updates emphasize performance, context length, and efficiency, while introducing a new “Nano” class of models for the first time.

Key highlights about these models:

🔹 1M-token context via API → Enables full codebase analysis, long-form reasoning, and multi-doc workflows (without chunking).
🔹 Benchmark improvements vs GPT-4o:
→ SWE-bench (coding): 54.6% (+21.4 pts)
→ MultiChallenge (instruction): 38.3% (+10.5 pts)
→ Video-MME (long-context): 72.0% (+6.7 pts)
🔹 Training data cutoff: June 2024
🔹 GPT-4.1 Nano, OpenAI’s first tiny model, is designed for ultra-low latency and edge use cases. While performance is lower than full-scale models, it’s intended for scenarios where speed and cost matter more than raw capability.
🔹 Mini bridges the gap between full-scale and Nano, targeting mid-range workloads where inference speed is important but task complexity remains moderate.

OpenAI appears to be refining its model tiering strategy, prioritizing cost-effective deployment at different levels of performance while continuing to push context limits.

Full documentation: https://lnkd.in/dx8vjywF

#technology #generativeai #llms #programming #openai
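The "without chunking" point can be sanity-checked with a rough token estimate. The 4-characters-per-token ratio below is a common rule of thumb for English text, not an exact tokenizer count, so real numbers will vary:

```python
# Rough check of whether a set of documents fits a 1M-token context
# window without chunking. The chars-per-token ratio is a heuristic
# assumption; use a real tokenizer for precise counts.

CONTEXT_LIMIT = 1_000_000  # tokens, per the GPT-4.1 announcement
CHARS_PER_TOKEN = 4        # rule-of-thumb for English text

def fits_in_context(docs: list[str], limit: int = CONTEXT_LIMIT) -> bool:
    """Estimate total tokens and compare against the window."""
    est_tokens = sum(len(d) for d in docs) // CHARS_PER_TOKEN
    return est_tokens <= limit

# A 2-million-character corpus is roughly 500K tokens: it fits.
print(fits_in_context(["x" * 2_000_000]))  # True
```

For anything near the limit, swap the heuristic for the model's actual tokenizer before deciding whether to chunk.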
-
Choosing the right AI model just got a lot easier. I pulled together the latest frontier-class models (April 2025) across Anthropic, OpenAI, Google, Meta and DeepSeek AI — and broke them down by what matters most:

1. GPT-4o (OpenAI)
🗓️ Release: May 2024
🧠 Max context: 128K tokens
🛠️ Modalities: Text, Image, Audio (input+output)
⚙️ Built-in tools: Function-calling, Browser, Code
🔒 Access: Closed (ChatGPT, API)
✨ Best for: Real-time voice, image, and document assistants

2. GPT-4.1 (OpenAI)
🗓️ Release: April 2025
🧠 Max context: 1M tokens
🛠️ Modalities: Text, Image
⚙️ Built-in tools: Batch APIs, Functions
🔒 Access: Closed (API)
✨ Best for: Working with huge documents and faster, cheaper deployments

3. o3 (OpenAI)
🗓️ Release: April 2025
🧠 Max context: 128K tokens (200K for o3-mini)
🛠️ Modalities: Text, Image
⚙️ Built-in tools: Full toolchain baked into RL training
🔒 Access: Closed (ChatGPT, API)
✨ Best for: Smart problem-solving with built-in Python and SQL skills

4. Claude 3.5 Sonnet (Anthropic)
🗓️ Release: April 2025
🧠 Max context: 200K tokens
🛠️ Modalities: Text, Image
⚙️ Built-in tools: Claude Code sandbox, MCP function calls
🔒 Access: Closed (API, Bedrock, Vertex)
✨ Best for: Enterprise chat, document analysis, and safe coding

5. Gemini 2.5 Pro (Google DeepMind)
🗓️ Release: April 2025
🧠 Max context: 1M tokens (2M on roadmap)
🛠️ Modalities: Text, Image, Audio, Video
⚙️ Built-in tools: Native agents and function calling
🔒 Access: Closed (GCP, Workspace)
✨ Best for: Multimodal copilots and large-scale data analysis

6. Llama 3 (Meta)
🗓️ Release: 70B (April 2024 refresh), 405B (November 2024)
🧠 Max context: 128K tokens
🛠️ Modalities: Text (+ Code)
⚙️ Built-in tools: Open-source agent kits
🔓 Access: Open weights
✨ Best for: Custom chatbots, fine-tuning, and private deployments

7. DeepSeek-V3 MoE (DeepSeek)
🗓️ Release: February 2025 (report) / March 2025 (GA)
🧠 Max context: 128K tokens
🛠️ Modalities: Text
⚙️ Built-in tools: External orchestration
🔓 Access: Open weights (Apache 2.0 license)
✨ Best for: Cost-effective advanced reasoning on smaller GPUs

Quick picker:
⚡ Need >128K context + closed-source safety? → Claude 3.5 or Gemini 2.5
⚡ Need million-token context + cheap mini tiers? → GPT-4.1
⚡ Real-time voice/vision UX? → GPT-4o
⚡ Deep reasoning with automatic Python/web chains? → o3
⚡ Full control, open deployments? → Llama 3 or DeepSeek-V3

Context cheat-codes:
📖 128K tokens ≈ Harry Potter #1
📚 1M tokens ≈ The entire 7-book Harry Potter series

Use this breakdown to stop guessing and pick the right model based on window, modality, tool depth, and licence.
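The quick picker above can be expressed as a simple lookup. This is a toy selector mirroring the post's own recommendations, not guidance from any vendor, and the requirement keys are made-up labels:

```python
# The post's "quick picker" rules, encoded as a lookup table.
# Keys are hypothetical requirement labels; the recommendations
# simply mirror the table above.

PICKER = {
    "long_context_closed": ["Claude 3.5 Sonnet", "Gemini 2.5 Pro"],
    "million_token_cheap": ["GPT-4.1"],
    "realtime_voice_vision": ["GPT-4o"],
    "deep_reasoning_tools": ["o3"],
    "open_weights": ["Llama 3", "DeepSeek-V3"],
}

def pick(need: str) -> list[str]:
    """Return recommended models for a requirement label."""
    return PICKER.get(need, ["no match: re-check requirements"])

print(pick("open_weights"))  # ['Llama 3', 'DeepSeek-V3']
```

A real selector would weigh several axes at once (window, modality, tool depth, licence), but even a flat table like this beats re-deriving the choice from scratch each time.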