OpenOnco quality control: testing a closely integrated diagnostics database and codebase with LLM data review + UI regression testing.

OpenOnco grew from prototype to production in about a month: 80+ diagnostic tests, complex filtering, PDF export, comparison tools. 12K lines of code. Manual QA stopped scaling; fortunately, some smart software folks advised us. Here's our system:

(1) Multi-LLM Data Verification

Before each deploy, I run the full database through Claude, Grok, GPT-5, and Gemini 3. Each model reviews the test data for:
→ Inconsistencies between related tests
→ Outdated info vs. current clinical guidelines
→ Missing fields that should be populated
→ Logical errors (e.g., an FDA-approved test with no approval date)

Different models catch different things. Claude finds logical inconsistencies. GPT-5 catches formatting. Grok flags outdated clinical data. Gemini spots missing cross-references.

(2) Automated UI Regression Testing

Regression testing answers one question: "Did my changes break something that was working?" For us this means testing actual user workflows — clicking buttons, filling forms, navigating between pages — and verifying the interface behaves correctly every time. We test the actual UI, not just components in isolation:

→ Filter interactions: click the "IVD Kit" filter → verify the correct tests appear → click the "MRD" category → verify the intersection is correct → clear filters → verify all tests return
→ Test card workflows: click a test card → modal opens with the correct data → click "Compare" → test added to comparison → open the comparison modal → verify all fields populate
→ Search behavior: type "EGFR" → verify matching tests surface → clear the search → verify the full list returns
→ Direct URL testing: navigate to /mrd?test=mrd-1 → verify the modal auto-opens with the correct test → navigate to /tds?compare=tds-1,tds-2,tds-3 → verify the comparison modal loads with all three
→ PDF export: generate a comparison PDF → verify the page count matches the content → verify no repeated pages (this caught a real bug where Page 1 rendered on every page)
→ Mobile responsiveness: run the full suite at the 375px, 768px, 1024px, and 1440px breakpoints

We run these tests with Playwright — an open-source browser automation framework. It launches real browsers (Chromium, Firefox, WebKit), executes user actions, and asserts outcomes (a sketch of one such spec is below). Tests run on every push via GitHub Actions; deploy is blocked if anything fails. The full suite takes ~4 minutes 🤯🤯🤯

The combination of LLM data review + real UI regression testing catches what unit tests miss: hundreds of issues so far 👍🏼👍🏼👍🏼
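To make the workflow concrete, here is a minimal Playwright spec in the spirit of the tests described above. The routes and query parameters (/mrd, test=mrd-1) come from the post; the filter labels, data-testid attribute, and assertions are hypothetical and would need to match the real DOM.

```typescript
import { test, expect } from '@playwright/test';

test.describe('MRD catalog regression (sketch)', () => {
  test('filter intersection and reset', async ({ page }) => {
    await page.goto('/mrd');

    // Apply two filters and check that the intersection still shows results.
    await page.getByRole('button', { name: 'IVD Kit' }).click();
    await page.getByRole('button', { name: 'MRD' }).click();
    const cards = page.locator('[data-testid="test-card"]'); // hypothetical test id
    await expect(cards.first()).toBeVisible();

    // Clearing filters should restore the full list.
    await page.getByRole('button', { name: 'Clear filters' }).click();
    await expect(cards).not.toHaveCount(0);
  });

  test('direct URL auto-opens the right modal', async ({ page }) => {
    await page.goto('/mrd?test=mrd-1');
    const modal = page.getByRole('dialog');
    await expect(modal).toBeVisible();
    await expect(modal).toContainText('mrd-1'); // assumes the modal displays the test id
  });
});
```

The breakpoint coverage mentioned in the post can be handled in playwright.config.ts by defining one project per viewport size, so the same specs run unchanged at 375px, 768px, 1024px, and 1440px.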
Automating Code Testing and Regression Detection
Summary
Automating code testing and regression detection means using software tools to check that new code changes don’t break features that worked before. This process relies on automated systems to catch errors and highlight unintended changes, making it easier to maintain reliable software.
- Expand automation tools: Adopt AI-powered and rule-driven solutions so your team can review more code and interface changes without spending extra time on manual checks.
- Integrate visual review: Use visual regression testing to compare how pages look before and after updates, helping you spot real issues while filtering out harmless design tweaks.
- Streamline test management: Set up automatic detection and reporting for test failures, so you can quickly address problems before they reach users.
-
Don’t Focus Too Much On Writing More Tests Too Soon

📌 Prioritize Quality over Quantity: Make sure the tests you have (and this can even be just a single test) are useful, well-written, and trustworthy. Make them part of your build pipeline. Make sure you know who needs to act when the test(s) fail. Make sure you know who should write the next test.

📌 Test Coverage Analysis: Regularly assess the coverage of your tests to ensure they adequately exercise all parts of the codebase. Tools like code coverage analysis can help identify areas where additional testing is needed.

📌 Code Reviews for Tests: Just like code changes, tests should undergo thorough code reviews to ensure their quality and effectiveness. This helps catch any issues or oversights in the testing logic before they are integrated into the codebase.

📌 Parameterized and Data-Driven Tests: Incorporate parameterized and data-driven testing techniques to increase the versatility and comprehensiveness of your tests. This allows you to test a wider range of scenarios with minimal additional effort (see the sketch after this list).

📌 Test Stability Monitoring: Monitor the stability of your tests over time to detect any flakiness or reliability issues. Continuous monitoring can help identify and address recurring problems, ensuring the ongoing trustworthiness of your test suite.

📌 Test Environment Isolation: Ensure that tests run in isolated environments to minimize interference from external factors. This helps maintain consistency and reliability in test results, regardless of changes in the development or deployment environment.

📌 Test Result Reporting: Implement robust reporting mechanisms for test results, including detailed logs and notifications. This enables quick identification and resolution of any failures, improving the responsiveness and reliability of the testing process.

📌 Regression Testing: Integrate regression testing into your workflow to detect unintended side effects of code changes. Automated regression tests help ensure that existing functionality remains intact as the codebase evolves, enhancing overall trust in the system.

📌 Periodic Review and Refinement: Regularly review and refine your testing strategy based on feedback and lessons learned from previous testing cycles. This iterative approach helps continually improve the effectiveness and trustworthiness of your testing process.
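A minimal sketch of the data-driven pattern, in TypeScript with Playwright; the login route, labels, and scenarios are hypothetical. One table of cases drives many tests, so adding a scenario is a one-line change.

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical login scenarios; a data-driven test runs the same
// flow over each case instead of duplicating test bodies.
const cases = [
  { user: 'admin@example.com', password: 'correct-horse', shouldLogin: true },
  { user: 'admin@example.com', password: 'wrong', shouldLogin: false },
  { user: '', password: '', shouldLogin: false },
];

for (const c of cases) {
  test(`login as "${c.user || '<empty>'}" ${c.shouldLogin ? 'succeeds' : 'fails'}`, async ({ page }) => {
    await page.goto('/login'); // hypothetical route
    await page.getByLabel('Email').fill(c.user);
    await page.getByLabel('Password').fill(c.password);
    await page.getByRole('button', { name: 'Sign in' }).click();
    if (c.shouldLogin) {
      await expect(page).toHaveURL(/dashboard/);
    } else {
      await expect(page.getByRole('alert')).toBeVisible();
    }
  });
}
```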
-
Mutation-Guided LLM-based Test Generation at Meta

Following up on last year's super cool Meta paper on LLMs generating tests, here we have it. Testing has moved, finally, beyond mere coverage. The guarantees are a lot stronger too, because an automated compliance hardener always gives examples of the specific kinds of faults its tests will find (rather than just claiming more line coverage, which it can also do anyway).

Abstract: "This paper describes Meta's ACH system for mutation-guided LLM-based test generation. ACH generates relatively few mutants (aka simulated faults), compared to traditional mutation testing. Instead, it focuses on generating currently undetected faults that are specific to an issue of concern. From these currently uncaught faults, ACH generates tests that can catch them, thereby 'killing' the mutants and consequently hardening the platform against regressions. We use privacy concerns to illustrate our approach, but ACH can harden code against any type of regression. In total, ACH was applied to 10,795 Android Kotlin classes in 7 software platforms deployed by Meta, from which it generated 9,095 mutants and 571 privacy-hardening test cases. ACH also deploys an LLM-based equivalent mutant detection agent that achieves a precision of 0.79 and a recall of 0.47 (rising to 0.95 and 0.96 with simple preprocessing). ACH was used by Messenger and WhatsApp test-a-thons where engineers accepted 73% of its tests, judging 36% to be privacy relevant. We conclude that ACH hardens code against specific concerns and that, even when its tests do not directly tackle the specific concern, engineers find them useful for their other benefits."

https://lnkd.in/dyAn3G_k
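For intuition about the mutant-then-killing-test loop, here is a tiny illustrative sketch in TypeScript. This is not Meta's ACH code; the privacy check and the injected fault are invented for illustration.

```typescript
import { strict as assert } from 'node:assert';

// Original privacy check.
function canShareLocation(user: { consented: boolean; age: number }): boolean {
  return user.consented && user.age >= 18;
}

// A simulated fault ("mutant"): && flipped to ||, silently weakening the check.
function canShareLocationMutant(user: { consented: boolean; age: number }): boolean {
  return user.consented || user.age >= 18;
}

// This assertion passes on the original but would fail on the mutant,
// so it "kills" the mutant. ACH's insight is to generate such concern-specific
// faults first, then ask an LLM to write the tests that catch them.
const adultWithoutConsent = { consented: false, age: 30 };
assert.equal(canShareLocation(adultWithoutConsent), false);          // original: safe
assert.notEqual(canShareLocationMutant(adultWithoutConsent), false); // mutant leaks: caught
```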
-
I’m inspired by this story on AI-powered process innovation around QA from Thorsten Ott and his SiteWatch team at Fueled. 🤩 https://lnkd.in/gGeinzNY

Anyone who’s built or maintained large digital properties knows regression testing is essential... but often a tedious time sink relative to its value. Visual Regression Testing (VRT) tools promised to streamline QA by showing heat maps of visual differences before and after a code change or update. In reality, I consistently found two big issues that made them maddening:

(1) Most “flags” are just expected content changes, like new headlines, ad swaps, or personalized components, so you mostly waste time poring over false positives. 😑
(2) Even small, intentional design tweaks (like typography adjustments) can flood the screen with red highlights, obscuring real problems. 😣

When I was involved in testing, I often found myself giving up on VRT and relying on error-prone manual checks.

Fueled’s new homegrown tool changes the equation: it uses AI to automatically review every flagged difference, separating real breakages from harmless updates, then summarizes issues in plain language right inside Slack. 🤖🧠

This solution not only makes VRT faster, it also makes it more effective. By filtering out noise and focusing on true regressions, teams can expand their test coverage and review more pages with each release without exponentially driving up time and effort. Meaning: the AI saves time *and* reduces the likelihood of missing real quality issues. It’s a perfect example of how smart AI integrations can improve efficiency *and* quality.
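Fueled's tool is proprietary, but the general pattern it describes can be sketched: pixel-diff first, then escalate only nonzero diffs to an AI classifier. A minimal TypeScript sketch using Playwright, pngjs, and pixelmatch; classifyDiff and notifySlack are hypothetical stubs for the LLM review and Slack steps, and the sketch assumes baseline and screenshot share dimensions.

```typescript
import { chromium } from 'playwright';
import { PNG } from 'pngjs';
import pixelmatch from 'pixelmatch';
import fs from 'node:fs';

// Stand-in for the AI review step: a real implementation would send both
// images to a vision model and parse its verdict.
async function classifyDiff(a: PNG, b: PNG): Promise<'breakage' | 'expected'> {
  return 'expected'; // hypothetical stub
}
function notifySlack(url: string, diff: PNG): void {
  /* post a plain-language summary to Slack (hypothetical stub) */
}

async function checkPage(url: string, baselinePath: string): Promise<void> {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto(url);
  const shot = PNG.sync.read(await page.screenshot({ fullPage: true }));
  await browser.close();

  const baseline = PNG.sync.read(fs.readFileSync(baselinePath));
  const diff = new PNG({ width: baseline.width, height: baseline.height });
  const changedPixels = pixelmatch(
    baseline.data, shot.data, diff.data,
    baseline.width, baseline.height, { threshold: 0.1 },
  );

  // Only flagged differences reach the AI; only true breakages reach humans.
  if (changedPixels > 0 && (await classifyDiff(baseline, shot)) === 'breakage') {
    notifySlack(url, diff);
  }
}
```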
-
Playwright visual regression testing using your existing automation framework with minimal custom code.

🏗️ HOOK-DRIVEN ARCHITECTURE

Phase 1: Specialized Agents
- visual-regression-agent: Handles baseline capture, comparison, and management
- url-change-detector-agent: Maps code changes to affected URLs using git diff analysis
- playwright-baseline-agent: Manages scrolling capture and segmented storage

Phase 2: Git Hooks Integration
- post-commit hook: Triggers change detection and selective visual testing
- pre-push hook: Validates all baselines are current before deployment
- post-merge hook: Updates baselines after approved changes

Phase 3: Rule-Based Automation
- visual-regression-rules.md: Defines when/how visual tests trigger
- baseline-storage-rules.md: Governs folder vs. file storage logic
- change-detection-rules.md: Maps file patterns to affected URLs

Phase 4: Minimal Custom Code
- URL mapping config (JSON): File patterns → affected URLs
- Baseline storage config (JSON): Page → storage strategy
- Integration scripts: Glue code to connect hooks → agents → rules

🛠️ IMPLEMENTATION COMPONENTS

Agents (leveraging the existing Task tool)
.claude/agents/
├── visual-regression.md # Baseline management
├── url-change-detector.md # Change impact analysis
└── playwright-baseline.md # Capture automation

Hooks (extending the existing hook system)
.claude/hooks/
├── post-commit-visual.sh # Trigger after commits
├── pre-push-baseline.sh # Validate before push
└── visual-test-runner.sh # Execute selective tests

Rules (auto-loaded by keyword)
.claude/rules/
├── visual-regression.md # Testing workflow rules
├── baseline-management.md # Storage and versioning
└── change-detection.md # Impact analysis rules

Minimal Code (configuration-driven)
playwright/
├── visual-config.json # URL mappings & storage rules
├── baseline-runner.js # Lightweight test executor
└── change-detector.js # Git diff → URL mapper

🔄 AUTOMATED WORKFLOW
1. Code change → post-commit hook detects changes
2. Hook → launches url-change-detector-agent
3. Agent → applies change-detection rules to identify affected URLs
4. Hook → launches visual-regression-agent with the affected URL list
5. Agent → runs selective Playwright tests with baseline comparison
6. Results → automatically update baselines or report failures

🎯 BENEFITS
- ✅ 90% automation through the existing hook/agent/rule system
- ✅ Minimal custom code: mostly configuration
- ✅ Self-managing: hooks handle trigger logic
- ✅ Rule-driven: easy to modify behavior without code changes
- ✅ Agent-powered: leverages existing Task tool capabilities
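The post names visual-config.json and change-detector.js without showing them; a minimal sketch of what that mapping layer might look like in TypeScript (the file patterns and URLs are hypothetical, and the config is inlined for brevity rather than loaded from JSON):

```typescript
import { execSync } from 'node:child_process';

// Hypothetical visual-config.json contents: file-path prefixes → affected URLs.
const urlMap: Record<string, string[]> = {
  'src/components/Header': ['/', '/pricing', '/about'],
  'src/pages/pricing':     ['/pricing'],
  'src/styles/global':     ['/'], // global CSS: retest at least the home page
};

// Map the files touched by the last commit to the URLs whose visual
// baselines need re-checking; a post-commit hook would call this and
// pass the result to the selective Playwright runner.
export function affectedUrls(): string[] {
  const changed = execSync('git diff --name-only HEAD~1 HEAD')
    .toString().trim().split('\n');
  const urls = new Set<string>();
  for (const file of changed) {
    for (const [prefix, targets] of Object.entries(urlMap)) {
      if (file.startsWith(prefix)) targets.forEach((u) => urls.add(u));
    }
  }
  return [...urls];
}
```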
-
Let's say you are a QA manager, I am a QA engineer, and the manager asked us to start implementing automation testing for regression test cases for a website. How would you do it?

This would be my approach:

Since we already have a set of manual regression test cases, I'd begin by reviewing and prioritizing them. Not all test cases are worth automating immediately—some may be too unstable or rarely executed. So I'd focus first on high-impact, frequently executed tests like login, signup, checkout, and other critical flows. I'd organize these into a clear, shared spreadsheet or test management tool and tag them as "Ready for Automation." A tag always helps.

Next, I'd set up a basic Java + Selenium framework. If we don't already have one, I'd recommend using Maven for dependency management, TestNG or JUnit for test orchestration, and the Page Object Model (POM) as the design pattern to keep our tests modular and maintainable. I'd also propose integrating ExtentReports for test reporting and Log4j for logging. I can bootstrap this framework myself or pair with a dev/test automation resource if needed.

Once the skeleton framework is ready, I'd start converting manual test cases into automated scripts one by one, beginning with the smoke tests and top-priority regressions. For each script, I'd ensure proper setup, execution, teardown, and validations using assertions. Then I'd commit the code to a shared Git repo with meaningful branches and naming conventions.

For execution, I'd run the tests locally first, then configure them to run on different browsers. Later, we can integrate the suite with a CI/CD tool like Jenkins to schedule regular test runs (e.g., nightly builds or pre-release checks). This would give us feedback loops without manual intervention.

I'd document everything—how to run the tests, add new ones, and generate reports—so the team can scale this effort. I'd also recommend setting aside a couple of hours weekly to maintain and update tests as the app evolves.

Finally, I'd keep you in the loop with weekly updates on automation progress, blockers, and test coverage. Once the core regression suite is automated and stable, we can expand into edge cases, negative tests, and possibly integrate with tools like Selenium Grid or cloud providers (e.g., BrowserStack) for cross-browser coverage.

What would your action plan be? Let's share. #testautomation #automationtesting #testautomationframework #sdets
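The post describes a Java + Selenium + TestNG stack; to keep this page's examples in one language, here is the Page Object Model idea sketched in TypeScript with Playwright instead. The class, route, and labels are hypothetical; the pattern (selectors isolated in one class, tests reading as intent) carries over directly to Java POM.

```typescript
import { test, expect, Page } from '@playwright/test';

// Page Object: every selector for the login page lives here, so a UI
// change means editing one class instead of every test that logs in.
class LoginPage {
  constructor(private page: Page) {}

  async goto(): Promise<void> {
    await this.page.goto('/login'); // hypothetical route
  }

  async login(email: string, password: string): Promise<void> {
    await this.page.getByLabel('Email').fill(email);
    await this.page.getByLabel('Password').fill(password);
    await this.page.getByRole('button', { name: 'Sign in' }).click();
  }
}

// The test expresses the scenario, not the selector plumbing.
test('valid user reaches the dashboard', async ({ page }) => {
  const login = new LoginPage(page);
  await login.goto();
  await login.login('user@example.com', 'secret'); // hypothetical credentials
  await expect(page).toHaveURL(/dashboard/);
});
```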
-
"Quality starts before code exists", This is how AI can be used to reimagine the Testing workflow Most teams start testing after the build. But using AI, we can start it in design phase Stage - 1: WHAT: Interactions, font-size, contrast, accessibility checks etc. can be validated using GPT-4o / Claude / Gemini (LLM design review prompts) - WAVE (accessibility validation) How we use them: Design files → exported automatically → checked by accessibility scanners → run through LLM agents to evaluate interaction states, spacing, labels, copy clarity, and UX risks. Stage - 2: Tools: • LLMs (GPT-4o / Claude 3.5 Sonnet) for requirement parsing • Figma API + OCR/vision models for flow extraction • GitHub Copilot for converting scenarios to code skeletons • TestRail / Zephyr for structured test storage How we use them: PRDs + user stories + Figma flows → AI generates: ✔ functional tests ✔ negative tests ✔ boundary cases ✔ data permutations SDETs then refine domain logic instead of writing from scratch. Stage - 3: Tools: • SonarQube + Semgrep (static checks) • LLM test reviewers (custom prompt agents) • GitHub PR integration How we use them: Every test case or automation file passes through: SonarQube: static rule checks LLM quality gate that flags: - missing assertions - incomplete edge coverage - ambiguous expected outcomes - inconsistent naming or structure We focus on strategy -> AI handles structural review. Stage - 4: Tools: • Playwright, WebDriver + REST Assured • GitHub Copilot for scaffold generation • OpenAPI/Swagger + AI for API test generation How we use them: Engineers describe intent → Copilot generates: ✔ Page objects / fixtures ✔ API client definitions ✔ Custom commands ✔ Assertion scaffolding SDETs optimise logic instead of writing boilerplate. THE RESULT - Test design time reduced 60% - Visual regressions detected with near-pixel accuracy - Review overhead for SDETs significantly reduced - AI hasn’t replaced SDETs. It removed mechanical work so humans can focus on: • investigation • creativity • user empathy • product risk understanding -x-x- Learn & Implement the fundamentals required to become a Full Stack SDET in 2026: https://lnkd.in/gcFkyxaK #japneetsachdeva
-
AI coding agents like Cursor and Claude Code have accelerated development speeds by 10x. But there is a massive hidden tax if you want to prevent shipping slop: 𝐓𝐞𝐬𝐭 𝐌𝐚𝐢𝐧𝐭𝐞𝐧𝐚𝐧𝐜𝐞.

When you ship a new feature, you don't just risk old tests breaking; you risk them becoming outdated. You don't just need them to pass; you need them to reflect the flows your users actually take. Usually, that means context switching, digging through DOM selectors, and manually rewriting scripts to cover the new steps.

At Decipher AI, we treat test updates like a conversation with someone who knows your product inside and out. You don't rewrite code to update a test; you just ask for the change. Whether you added new tabs or inserted new steps into the checkout flow, you simply describe the new requirement in plain English. Our Computer Use Agent then takes over:
1️⃣ Interprets your intent and interacts with your live UI.
2️⃣ Adds the specific assertions you ask for to verify the product behaves properly.
3️⃣ Iterates on the flow to ensure stability.
4️⃣ Converts those actions into fast, reliable Playwright scripts.

This isn't just "self-healing" (we do that automatically for UI changes). This is AI-driven test expansion. You keep shipping features; let the agent handle the regression suite.

Check out the demo below to see how fast you can iterate on tests.
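For a sense of the target format, here is the kind of Playwright script such an agent might emit after being asked, in plain English, to cover a newly inserted checkout step. Everything here (routes, labels, the gift-wrap step) is a hypothetical illustration, not Decipher AI's actual output.

```typescript
import { test, expect } from '@playwright/test';

// Updated checkout regression: the flow now includes a gift-wrap step
// between the cart and payment pages (hypothetical product change).
test('checkout with gift wrap reaches confirmation', async ({ page }) => {
  await page.goto('/cart');
  await page.getByRole('button', { name: 'Checkout' }).click();

  // The newly inserted step, with the explicit assertion the user asked for.
  await page.getByLabel('Add gift wrap').check();
  await expect(page.getByText('Gift wrap added')).toBeVisible();
  await page.getByRole('button', { name: 'Continue to payment' }).click();

  await page.getByLabel('Card number').fill('4242 4242 4242 4242');
  await page.getByRole('button', { name: 'Place order' }).click();
  await expect(page.getByRole('heading', { name: 'Order confirmed' })).toBeVisible();
});
```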