🎉 passmark@1.0.9 now supports CUA mode! Until now, passmark drove the browser through ARIA accessibility snapshots - fast and reliable when the DOM is well-structured, but it can struggle with canvas-heavy apps, custom elements, or markup that doesn't expose a clean a11y tree. After working with a lot of customers, we realized that the winning combo is ARIA snapshot + CUA. In CUA mode, the agent sees screenshots, clicks coordinates, and types like a human - powered by the new OpenAI gpt-5.5. Snapshot mode is still the default and what we recommend for most flows. CUA is the escape hatch for when the accessibility tree isn't enough. One flag to switch: configure({ ai: { mode: "cua" } }); Requires an OPENAI_API_KEY. Would love feedback from anyone testing flows that snapshot-based agents have trouble with. Passmark is OSS. Check it out and contribute: https://lnkd.in/gbG7HrSu Shout-out to Ipseeta Priyadarshini for the PR!
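What that switch looks like in a test setup - a minimal sketch assuming passmark exposes configure as a top-level export (only the configure({ ai: { mode: "cua" } }) call and the OPENAI_API_KEY requirement come from the release note):

```typescript
// Minimal sketch: opting a suite into CUA mode.
// Assumption: `configure` is a top-level passmark export; the env-var check is illustrative.
import { configure } from "passmark";

// Snapshot mode stays the default; flip to CUA only for flows where the
// accessibility tree isn't enough (canvas-heavy apps, custom elements, messy markup).
configure({ ai: { mode: "cua" } });

// CUA mode drives the browser from screenshots via OpenAI, so the key must be set.
if (!process.env.OPENAI_API_KEY) {
  throw new Error("CUA mode requires OPENAI_API_KEY");
}
```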
-
Something very interesting is happening this week in the AI space (is there a space with no AI?), and it has nothing to do with Opus 4.7 or any other model. Cloudflare is having a week - Agents Week, to be exact - and they announced a bunch of things: Project Think (agents SDK), AI Platform, AI Search, Email Service, all for AI agents. OpenAI also announced something. That's not news; that's half of their marketing strategy. But they did update their Agents SDK, and the part that matters is that it now allows durable execution - the kind you need when your AI agent is browsing the web. I wrote about this, and what it means for you, here: https://lnkd.in/eixXzfmf
-
Claude just dropped a massive update to Claude Code on desktop. Parallel agentic sessions. Multiple agents running from one window. Drag-and-drop layout. Integrated terminal. In-app file editing. HTML and PDF preview. Side chats that let you branch without losing your main thread. Sessions that auto-archive when PRs merge. Three view modes so you can dial the interface from full transparency into every tool call down to just the results. SSH support on Mac now too. This is genuinely impressive work. I use Claude Code every single day to build skills, templates, workflows, and system prompts. It's the best tool I've found for the creation and architecture side of my AI stack. And this update just made it significantly better for running parallel work. Now here's what most people don't realize. Claude Code natively only runs Claude models. But with a proxy like LiteLLM or tools like AnyClaude, you can route it to GPT-5.4, Gemma 4, or basically any model you want. It takes some setup. And honestly the experience is best with Claude since the tool was optimized for it. But the option is there. What Anthropic WON'T let you do is run Claude inside other apps like OpenClaw without paying API rates. So Claude's door is open inward. Not outward. That's fine. I work with it. Claude Code for building. OpenClaw with GPT-5.4 and Spark for execution. Best of both worlds. The agentic AI space is moving so fast right now. OpenClaw shipping every other day. Anthropic redesigning Claude Code for parallel agents. OpenAI optimizing Codex with Spark and Fast Mode. Every player is sprinting. And honestly that's great for all of us. Competition makes everything better. Better tools. Faster updates. More options. The people who win in this environment aren't loyal to one platform. They're loyal to results. Use Claude where Claude is best. Use OpenAI where OpenAI is best. Use open source where open source is best. Stack them. Layer them. Build systems that leverage the strengths of each. That's what I've been doing. That's what I teach my students. And right now is the best time to be building with AI that I've ever seen.
-
We are releasing our Nyquest Compression Engine as open source on GitHub for everyone to use (https://lnkd.in/evv5yx94). It comes with a one-shot installer, can be set up as a systemd service, supports 300+ models, and lets you bring your own keys (BYOK) seamlessly. We are introducing Nyquest AI, a multi-model AI workspace. Oversampling wastes tokens. Undersampling distorts intent. Nyquest sits exactly on the boundary. The name comes from the Nyquist theorem, an important concept in telecommunications - and since AI usage is becoming a new layer of infrastructure in the modern world, the name fits perfectly. That is why we are making our compression engine available as open source: a drop-in proxy with 350+ compiled rules, local LLM semantic condensation (Qwen 2.5 1.5B), a one-shot installer, system preflight, and systemd service support. It works with Anthropic, OpenAI, Gemini, xAI, OpenRouter, and local models. Tokens are the new currency in the world of AI. With costs per million tokens rising for better models, both enterprises with heavy token usage and small businesses that aren't deeply versed in AI but still want to integrate it need tools that reduce costs without degrading output quality.
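For context, a "drop-in proxy" in this sense usually means pointing an existing client at the engine instead of the upstream API. A rough sketch, assuming an OpenAI-compatible endpoint on localhost (the host, port, path, and model name are placeholders, not documented values):

```typescript
// Rough sketch of the drop-in proxy pattern with BYOK.
// Assumptions: the engine exposes an OpenAI-compatible endpoint locally;
// host/port/path and the model name are placeholders - see the repo for actual values.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/v1", // hypothetical local Nyquest proxy
  apiKey: process.env.OPENAI_API_KEY,  // bring your own key; the proxy forwards it upstream
});

// Requests pass through the proxy, get condensed, then hit the upstream provider unchanged otherwise.
const res = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Summarize this incident report: ..." }],
});
console.log(res.choices[0].message.content);
```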
-
Semantic Kernel is Microsoft’s SDK for integrating AI models into applications. We analyzed its C# codebase using PVS-Studio to see what kind of issues show up in a real, actively maintained project. https://lnkd.in/ePkNPRDC #SemanticKernel #Csharp #AI #Debugging #CodeQuality
-
Countdown to Irrelevance — Apr 26, 2026
Every breakthrough brings us closer to goodbye.

1️⃣ DeepSeek Slashes V4-Pro API Prices by 75% — Limited-Time Fire Sale 🎯
DeepSeek announced a temporary 75% discount on the V4-Pro API, dropping input to ~$0.435/M tokens — roughly 1/7th of GPT-5.5. The deal runs until May 5. Alongside it, updated SDK requirements and dev toolchain docs were pushed, signaling aggressive expansion.
Relevance: Price wars are the new battleground. When inference runs cheaper than compute, incumbents lose their moats.

2️⃣ DeepSeek Engineer Drops Role-Play Mode Prompt for V4 Chain-of-Thought 🎭
An engineer on the DeepSeek team revealed a "role-play mode" prompt trick that unlocks customized chain-of-thought behavior in V4. The prompt transforms the model's reasoning style — think less "output machine," more "character simulator."
Relevance: Prompt engineering isn't dead. If a DeepSeek dev is sharing CoT tricks, the community playbook just got thicker.

3️⃣ Alibaba Qwen Drops Qwen-Image-2.0-Pro — Open-Source Image Model 🖼️
Qwen released Qwen-Image-2.0-Pro, an open-source image generation model. Weights available, MIT-style license. Another aggressive salvo from the Chinese open-source camp, pushing the cost of image generation toward zero.
Relevance: Open-source image gen is catching up fast. Proprietary image APIs had better have a moat.

4️⃣ Cursor 3.2 Ships With /multitask Commands and Async Sub-Agents 💻
Cursor 3.2 introduces `/multitask` — a command that spawns parallel async sub-agents within the editor. Think Claude Code-style orchestration, but inside an IDE. This shifts the AI coding paradigm from "pair programmer" to "team manager."
Relevance: The IDE is becoming an operating system for AI agents. Cursor just added multi-threading.

5️⃣ Google Cloud CEO Teases New Gemini Model on the Horizon ☁️
Google Cloud's CEO hinted at an upcoming Gemini model in an interview, calling it "the next leap." No specs, no timeline — just positioning. But when Google pre-announces, the release window is usually weeks, not months.
Relevance: The model release cadence is accelerating. We're entering the "announce before launch" era — and everyone's racing.

──────────
Tags: #AI #DeepSeek #V4Pro #Qwen #Cursor #Gemini #GoogleCloud #OpenSource #ArtificialIntelligence #LLM #TechNews #Innovation

Which breakthrough worries you most? 👇
-
AI writes fast. It also lies fast. Yesterday it confidently “fixed” our caching bug by wrapping a fetch in try/catch and returning null on error. The code looked clean in the diff. Production would have turned it into a silent-failure factory. The real issue was an SSR caching inconsistency: we were keying the cache by pathname only. AI didn’t notice that the query string and locale header were part of the response shape. So /products?sort=price cached over /products?sort=popular, and en-US HTML got served to fr-FR users during a traffic spike. AI suggested “just add a TTL”. I overrode it and changed the key to include the search params plus a normalized Accept-Language. Then I added a cache bypass for authenticated requests because we saw personalized fragments in the markup. The best part: AI helped me write the tests. The dangerous part: it wrote tests that asserted the implementation, not the behavior. I rewrote them to assert cache separation across two requests with different headers. AI is a great pair. But it doesn’t carry your invariants in its head. You do. Ship the diff, not the vibe. #JavaScript #SSR #Caching #FrontendArchitecture #AIEngineering
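Roughly what the fixed key looks like - a sketch with illustrative names, not the actual codebase:

```typescript
// Sketch of the cache-key fix described above (function and header checks are illustrative).
function cacheKey(req: Request): string | null {
  const url = new URL(req.url);

  // Bypass caching for authenticated requests: the rendered HTML can contain
  // personalized fragments that must never be shared between users.
  if (req.headers.has("authorization") || req.headers.get("cookie")?.includes("session=")) {
    return null;
  }

  // Normalize Accept-Language to the primary tag so "fr-FR,fr;q=0.9" and "fr-FR"
  // share an entry, while en-US and fr-FR stay separate.
  const lang = (req.headers.get("accept-language") ?? "en-US")
    .split(",")[0]
    .trim()
    .toLowerCase();

  // Sort the query so ?a=1&b=2 and ?b=2&a=1 collapse to one key,
  // while ?sort=price and ?sort=popular stay distinct.
  url.searchParams.sort();

  return `${url.pathname}?${url.searchParams.toString()}|${lang}`;
}
```

The behavioral version of the test then just asserts that two requests to the same pathname with different query strings or Accept-Language headers land in different cache entries.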
-
If you've built anything with LLMs in the last two years, you've probably noticed that every model provider, every agent framework, every MCP server speaks the same schema language: JSON Schema. They didn't coordinate. They just all arrived at it independently. OpenAI, Anthropic, Google, xAI, Mistral AI, and DeepSeek AI: all six major LLM providers ship structured outputs that take a JSON Schema as input, with .txt by Rémi Louf taking the lead here. MCP, the protocol that unifies tool calling across the ecosystem, is built on JSON Schema 2020-12. LangChain, CrewAI, Vercel AI SDK, Microsoft Semantic Kernel, the OpenAI Agents SDK: all use JSON Schema as the contract between models and tools. Excellent companies like Retab by Louis de Benoist, LandingAI, Vapi, AssemblyAI, Firecrawl, and Tavily make use of JSON Schema as part of their official interfaces. The implication for anyone building APIs or AI-facing software: your schema layer is no longer documentation. It is the interface AI systems will use to consume your software. Here is a full article digging deeper into the central role of JSON Schema in the world of LLMs: https://lnkd.in/dpiS9wBC
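To make that concrete, here is what the contract typically looks like on the wire - an OpenAI-style tool definition whose parameters block is plain JSON Schema (the tool itself is invented for illustration):

```typescript
// One concrete instance of the pattern: the `parameters` block handed to the model
// is plain JSON Schema. OpenAI-style function-calling shape; the tool itself
// ("get_order_status") is a made-up example.
const tools = [
  {
    type: "function" as const,
    function: {
      name: "get_order_status",
      description: "Look up the fulfillment status of an order",
      parameters: {
        type: "object",
        properties: {
          order_id: { type: "string", description: "Internal order identifier" },
          include_history: { type: "boolean", default: false },
        },
        required: ["order_id"],
        additionalProperties: false,
      },
    },
  },
];
```

MCP servers, structured outputs, and most agent frameworks consume this same schema vocabulary - which is exactly why the schema layer is now the interface.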
-
In the past week alone, we’ve seen major players push the boundaries of what’s possible: from fully local, high-performance models with built-in tool use, to multimodal systems that can turn ideas into working code, and agent platforms that run complex workflows across environments without human intervention. Here’s a quick breakdown of the most important updates shaping the next generation of AI development 👇

📌 Google Gemma 4 and Platform Updates: https://lnkd.in/eaaKWzne, https://lnkd.in/dTccNZVW, https://lnkd.in/e37nbcaB, https://lnkd.in/dVGg8Anv
Google open-sourced Gemma 4, a family of models (31B dense, 26B MoE) that run locally via Ollama or Hugging Face and outperform models up to 20x their size, with native tool use and 256K context for full codebases. Alongside it, Google launched Agent Skills, which let Gemma 4 execute multi-step workflows across tools and data sources directly on-device. On the media side, Veo 3.1 Lite cut video generation costs to under half of Veo 3.1 Fast while matching its speed, and the Agent Development Kit added Java support with context control and memory services.

📌 Alibaba Qwen Model Updates: qwen.ai/blog?id=qwen3.6, https://lnkd.in/eED_7V4A, create.wan.video
Alibaba shipped Qwen3.6-Plus with 1M-token context built for agent workflows, scoring 78.8 on SWE-bench Verified and 87.1 on LiveCodeBench, so it can read entire repos and execute multi-step coding tasks without stitching context manually. It also released Qwen3.5-Omni, a multimodal model that surpasses Gemini 3.1 Pro on audio reasoning and lets you describe an app idea with voice or camera and get working code back. A third release, Wan2.7-Image, enables interactive image editing using visual instructions drawn directly on the image.

📌 Cursor 3 Platform Update: cursor.com/blog/cursor-3
Cursor launched Cursor 3 with an agent-first interface that replaces file-centric navigation with a control layer for managing multiple coding agents. You can run agents in parallel across local machines, SSH servers, and cloud environments, move sessions between them without restarting, and keep tasks running when your machine goes offline. The integrated diff and PR workflow means you review and ship agent output without leaving the tool.

📌 Nous Research Hermes Agent Updates: https://lnkd.in/dttj2iKD, https://lnkd.in/d3vmH-Wr
Nous Research shipped two Hermes Agent releases in one week. The first added Hugging Face integration and expanded model support to over 400 models with improved reliability across tool-use workflows. The second introduced autonomous video generation and multi-instance agent workflows that run without human intervention, allowing you to orchestrate parallel agent tasks from a single setup.
-
We talk a lot about AI transformation, but it often starts with small, real problems. In this case: manual SOW validation, repeated errors and wasted time. NovaValidator uses AI to solve that in a practical way, with measurable impact. Neat project by Rackspace Technology's Wade Wierman: https://lnkd.in/gYjM8kXS.
-
Peter Steinberger, now at OpenAI, says Anthropic "copied popular features into their closed harness, then locked out open source." That's a convenient framing from someone with a direct commercial interest in the narrative. The reality is that OpenClaw users were always operating in a grey area. Anthropic's terms of service have technically prohibited third-party tool access since early 2024. OpenClaw users found a loophole in the OAuth authentication flow and exploited it to run agent workloads that are nothing like what a $20/month subscription was designed to cover. A single day of OpenClaw usage could consume over $100 in tokens; Anthropic's own benchmarks put typical Claude Code professional usage at $6 a day -- a stark difference. The "they copied our features" complaint doesn't hold water. OpenClaw's architecture is full of heavyweight context, relies on a flat-rate subsidy it was never entitled to, and basically only holds together if you throw millions of tokens per day at it. That's not an architecture worth emulating, and it's an economic model that only works if someone else pays for it. Anthropic handled the transition with more generosity than they were obligated to: credits, discounts, refunds. I can't say I blame them for closing this loophole. The criticism from Steinberger is exactly what you'd expect from someone who's now an OpenAI employee with a vested (literally) interest in their success. https://lnkd.in/gXPAjhc3