Consequences of Software Failures


Summary

Consequences of software failures refer to the negative impacts that occur when programs or systems malfunction, ranging from lost revenue and damaged reputations to legal action and ruined careers. These failures can disrupt business operations, harm individuals, and create widespread chaos across industries.

  • Prioritize regular testing: Invest time and resources in thorough testing and validation before releasing new software or updates to avoid unexpected disruptions.
  • Establish clear accountability: Maintain audit trails, reporting channels, and chain of responsibility to help detect and address errors early while protecting users from cascading harm.
  • Build robust risk management: Develop backup plans and contingency strategies so your business can quickly recover from outages and minimize financial, legal, and reputational damages.
Summarized by AI based on LinkedIn member posts
  • View profile for Paul Meredith

    I build start-up and scale-up fintechs. I help fintech CEOs deliver annual revenue growth of £15m+, by leading and optimising the change and delivery function

    12,850 followers

    The biggest businesses can get major programmes horribly wrong. Here are 4 famous examples, the fundamental reasons each failed, and how that might have been avoided.

    Hershey: Sought to replace its legacy IT systems with a more powerful ERP system. However, due to a rushed timeline and inadequate testing, the implementation encountered severe issues. Orders worth over $100 million were not fulfilled. Quarterly revenues fell by 19% and the share price by 8%.
    Key Failures:
    ❌ Rushed implementation without sufficient testing
    ❌ Lack of clear goals for the transition
    ❌ Inadequate attention and resource allocation

    Hewlett-Packard: Wanted to consolidate its IT systems into one ERP. They planned to migrate to SAP, expecting any issues to be resolved within 3 weeks. However, because the new ERP was not properly configured to work with the old systems, 20% of customer orders were not fulfilled. Insufficient investment in change management and the absence of manual workarounds added to the problems. The project cost HP an estimated $160 million in lost revenue and delayed orders.
    Key Failures:
    ❌ Failure to address potential migration complications.
    ❌ Lack of interim solutions and supply chain management strategies.
    ❌ Inadequate change management planning.

    MillerCoors: Spent almost $100 million on an ERP implementation to streamline procurement, accounting, and supply chain operations. Significant delays led to the termination of the implementation partner and subsequent legal action. Mistakes included insufficient research on ERP options, choosing an inexperienced implementation partner, and the absence of capable in-house advisers overseeing the project.
    Key Failures:
    ❌ Inadequate research and evaluation of ERP options.
    ❌ Selection of an inexperienced implementation partner.
    ❌ Lack of in-house expertise and oversight.

    Revlon: Another ERP implementation disaster. Inadequate planning and testing disrupted production and caused delays in fulfilling customer orders across 22 countries. The consequences included a loss of over $64 million in unshipped orders, a 6.9% drop in share price, and investor lawsuits for financial damages.
    Key Failures:
    ❌ Insufficient planning and testing of the ERP system.
    ❌ Lack of robust backup solutions.
    ❌ Absence of a comprehensive change management strategy.

    Lessons to be learned:
    ✅ Thoroughly test and evaluate new software before deployment.
    ✅ Establish robust backup solutions to address unforeseen challenges.
    ✅ Design and implement a comprehensive change management strategy during the transition to new tools and solutions.
    ✅ Ensure sufficient in-house expertise is available; consider the capacity of those people as well as their expertise.
    ✅ Plan as much as is practical and sensible.
    ✅ Don’t try to do too much too quickly with too few people.
    ✅ Don’t expect ERP implementation to be straightforward; it rarely is.

  • View profile for Hassan Basil Hassan, Esq.

    Chief Legal Officer & General Counsel | Trusted Where It Matters Most

    5,850 followers

    Behind the Blue Screen: How the CrowdStrike Glitch Exposed Global Vulnerabilities

    Have you ever considered how a single software update could paralyze global operations? This alarming reality unfolded on July 19, when a routine update from CrowdStrike, a leading cybersecurity firm, caused widespread disruption. For me, the impact was intensely personal: my computer crashed the night before a crucial lecture for LL.M. students. Despite multiple reboot attempts, my anxiety grew as my case studies remained inaccessible. We were facing the infamous 'Blue Screen of Death', a critical system error in Windows that sounds like the punchline of a bad tech joke, but the reality was far from humorous. A routine software update had escalated into a worldwide disruption.

    Immediate Economic Impact
    The CrowdStrike BSOD incident had far-reaching consequences. Airlines experienced delays and cancellations, grounding flights and stranding passengers. Major banks reported halted transactions, leading to customer frustration and financial losses. The incident starkly illustrated how a single software patch could disrupt global operations. According to Gartner, the average cost of IT downtime is $5,600 per minute, equating to over $300,000 per hour. By 2025, 60% of organizations are expected to suffer major service failures due to mismanagement of cyber risks. Beyond the economic repercussions, the incident also highlighted significant legal challenges.

    Legal Implications
    In addition to the economic turmoil, the incident posed significant legal challenges. CrowdStrike may face lawsuits from businesses claiming damages. Grounded flights could result in breach-of-contract claims and passenger compensation demands. This raises critical questions about the liability of software providers for unintended update consequences. Such incidents may prompt regulators to introduce stricter requirements for software update testing and validation.

    Lessons for Businesses
    This incident is a stark reminder for businesses to reassess their reliance on third-party software and improve their preparedness for disruptions. Regularly reviewing risk management strategies and developing robust contingency plans is essential. Ensuring vendor agreements outline responsibilities, liabilities, and mitigation processes for update-related issues can help minimize the impact. These steps are crucial for maintaining operational continuity and reducing potential damages.

    The CrowdStrike BSOD incident underscores the urgent need for businesses to be prepared for digital disruptions. Strengthening legal frameworks and enhancing risk management are vital. This incident serves as a wake-up call: businesses must fortify their defenses against the unexpected. Personally, the experience reminded me of the importance of having reliable backups and a robust contingency plan. Is your organization prepared for the next disruption?

    #TechDisruption #RiskManagement #Cybersecurity #BusinessContinuity

  • View profile for Artem Golubev

    Co-Founder and CEO of testRigor, the #1 Generative AI-based Test Automation Tool

    35,948 followers

    "It was just a software bug" — the excuse that cost 700+ postal workers their careers. The Post Office Horizon scandal exposes what happens when we compromise on testing. Senior executives walked away with bonuses while postal workers were accused of theft and fraud because of software bugs in the Horizon accounting system. The true scandal? Officials knew Fujitsu could remotely alter accounts. They prosecuted innocent people anyway. Lives ruined. Homes lost. Prison sentences served. Some died before being cleared. All because of unchecked software bugs. Three Hard Lessons: The Cascade Software bugs escalate beyond technical issues. What starts in code impacts real people and businesses. The Cover-Up Complex systems make perfect hiding places. When bugs appear, accountability disappears. End users pay the price for others' mistakes. The Prevention Mandatory audit trails Third-party verification Protected channels for reporting issues Clear chain of responsibility Regular security assessments Tools like testRigor can help catch these issues early through intelligent test automation. But it starts with taking testing seriously. The next Horizon scandal is likely brewing right now, in a project where testing is seen as "too expensive." But cutting corners on testing isn't saving money— it's passing risk to your users. Quality isn't expensive. Failures are. What steps do you take to prevent software bugs from spiraling out of control? Image source: 4gifs #technology #qa #techleadership #software

  • View profile for Nicholas P.

    I’m the GeoSec Guy | AI GRC Specialist | Creator of GeoSec & Trust Engineering Frameworks for Enterprise Adoption | Top 1% Fastest Growing Voice in AI GRC

    3,794 followers

    Push a bad update, and your company goes under. That's not an exaggeration - it has happened at massive scale.

    In July 2024, a faulty CrowdStrike configuration update caused Windows machines globally to crash, creating widespread service disruption - including airline delays, emergency response outages, and hospital system chaos.

    Here's where QA/governance comes in:

    - Many teams still think of QA as bug-catching. But a failed release can cascade into operational downtime, compliance risk, and financial losses - sometimes worse than a data breach.
    - In regulated environments, every release should be a compliance checkpoint, not just an engineering milestone.
    - The CrowdStrike incident showed how skipping proper validation, rollback testing, and staged deployments can turn a security tool into a systemic risk.

    QA isn't just testing - it's GRC.

    With zero-days rising year over year and third-party risk at the forefront of scrutiny: how does your org plan on releasing updates ahead of competitors without turning speed into your biggest vulnerability?

    #CyberSecurity #GRC #DevOps #QA #RiskManagement #SoftwareTesting
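
    Editor's note: one way to picture "every release is a compliance checkpoint" is a release gate that refuses to ship a build unless validation, rollback testing, a staged-rollout plan, and a change record are all in place. The following C++ sketch is hypothetical; the ReleaseCandidate fields and gate rules are invented for illustration and are not taken from any vendor's actual pipeline.

    // Hypothetical release gate: a build ships only if every governance
    // check has passed, mirroring "every release is a compliance checkpoint".
    #include <iostream>
    #include <string>
    #include <vector>

    struct ReleaseCandidate {
        std::string version;
        bool validation_passed;      // functional + regression tests green
        bool rollback_tested;        // rollback to previous version rehearsed
        bool staged_rollout_planned; // canary ring defined before global push
        bool change_record_filed;    // audit/compliance record exists
    };

    std::vector<std::string> gate_failures(const ReleaseCandidate& rc) {
        std::vector<std::string> failures;
        if (!rc.validation_passed)      failures.push_back("validation not passed");
        if (!rc.rollback_tested)        failures.push_back("rollback untested");
        if (!rc.staged_rollout_planned) failures.push_back("no staged rollout plan");
        if (!rc.change_record_filed)    failures.push_back("no change record");
        return failures;
    }

    int main() {
        // Hypothetical candidate that skipped rollback testing and staging.
        ReleaseCandidate rc{"sensor-7.11.0", true, false, false, true};
        auto failures = gate_failures(rc);
        if (failures.empty()) {
            std::cout << rc.version << ": approved for release\n";
        } else {
            std::cout << rc.version << ": blocked\n";
            for (const auto& f : failures) std::cout << "  - " << f << "\n";
        }
    }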

  • View profile for Ben Thomson

    Founder and Ops Director @ Full Metal Software | Improving Efficiency and Productivity using bespoke software

    17,190 followers

    💥 𝗬𝗼𝘂𝗿 𝗕𝘂𝘀𝗶𝗻𝗲𝘀𝘀 𝗥𝘂𝗻𝘀 𝗼𝗻 𝗖𝗼𝗱𝗲—𝗦𝗼 𝗪𝗵𝘆 𝗔𝗿𝗲 𝗬𝗼𝘂 𝗚𝗮𝗺𝗯𝗹𝗶𝗻𝗴 𝗪𝗶𝘁𝗵 𝗜𝘁?

    In 2024, UK businesses lost an estimated £3.6 million each due to IT failures. Yet most companies still treat software maintenance as an afterthought, waiting for a breakdown before scrambling to fix it.

    Let's be clear: downtime is 𝗡𝗢𝗧 a tech issue. It's a revenue, reputation, and survival issue.

    When systems crash:
    🚨 Revenue stops: every minute costs an average of £4,300 in lost sales.
    🚨 Customers leave: 91% of UK consumers won't return after a bad digital experience.
    🚨 Reputation takes a hit: frustrated users don't wait; they switch.

    So why do so many businesses still react to failures instead of preventing them?

    𝙏𝙝𝙚 𝙎𝙞𝙡𝙚𝙣𝙩 𝙆𝙞𝙡𝙡𝙚𝙧 𝙞𝙣 𝙔𝙤𝙪𝙧 𝙏𝙚𝙘𝙝 𝙎𝙩𝙖𝙘𝙠
    Warning signs that your business is heading for a tech disaster:
    ⚠️ Frequent system crashes or slow performance, frustrating employees and customers alike.
    ⚠️ Delayed software updates and security patches, opening the door to cyberattacks.
    ⚠️ Unexpected outages during peak periods, costing thousands in lost sales.

    If any of these sound familiar, your business isn't running software; it's running on borrowed time.

    𝙏𝙝𝙚 𝙎𝙢𝙖𝙧𝙩 𝙁𝙞𝙭: 𝙋𝙧𝙤𝙖𝙘𝙩𝙞𝙫𝙚 𝙎𝙤𝙛𝙩𝙬𝙖𝙧𝙚 𝙎𝙩𝙖𝙗𝙞𝙡𝙞𝙩𝙮
    Preventing downtime isn't about waiting for the next crisis. It's about engineering resilience into your systems.
    🔹 Emergency Bug Fixing & System Rollbacks: rapid-response fixes to restore operations fast.
    🔹 Automated Security Patching: shield your business from compliance fines and cyber risks.
    🔹 Scalability Planning: ensure your systems can handle peak demand without breaking.
    🔹 AI-Powered Monitoring: catch small issues before they turn into major failures.

    At Full Metal Software, we specialise in Maintenance Rescue, helping UK businesses avoid IT catastrophes before they happen.

    𝙇𝙚𝙩’𝙨 𝙏𝙖𝙡𝙠: 𝙄𝙨 𝙔𝙤𝙪𝙧 𝘽𝙪𝙨𝙞𝙣𝙚𝙨𝙨 𝘽𝙪𝙞𝙡𝙩 𝙛𝙤𝙧 𝙍𝙚𝙨𝙞𝙡𝙞𝙚𝙣𝙘𝙚?
    ❓ When was the last time your software was audited for risks?
    ❓ What's your biggest challenge in keeping systems running 24/7?

    Let's discuss how UK businesses can move from reacting to IT failures to building bulletproof systems that just work.

    📩 Drop a comment or DM me if you want a free software health check, before the next outage costs you thousands.

    #SoftwareReliability #BusinessContinuity #ITLeadership

  • View profile for Pragyan Tripathi

    Clojure Developer @ Amperity | Building Chuck Data

    4,048 followers

    How did a few bits in the wrong place break millions of computers and bring the world to a standstill?

    𝗧𝗵𝗲 𝗜𝗻𝗰𝗶𝗱𝗲𝗻𝘁
    Recently, a serious issue emerged with CrowdStrike's software on Windows systems, causing widespread system crashes (Blue Screens of Death). This incident highlighted the critical nature of system driver stability and the potential far-reaching consequences of software errors at this level.

    𝗥𝗼𝗼𝘁 𝗖𝗮𝘂𝘀𝗲 𝗔𝗻𝗮𝗹𝘆𝘀𝗶𝘀
    1. Null Pointer Dereference: The core of the problem was a null pointer dereference, a common issue in memory-unsafe languages like C++.
    2. Memory Access Violation: The software attempted to read from memory address 0x9c (156 in decimal), which is an invalid region for program access. Any attempt to read from this area triggers immediate termination by Windows.
    3. Programmer Error: The issue stemmed from a failure to properly check for null pointers before accessing object members. In C++, address 0x0 is used to represent "null" or "nothing here."
    4. System Driver Context: Because the error occurred in a system driver with privileged access, Windows was forced to crash the entire system rather than just terminating a single program.

    𝗧𝗲𝗰𝗵𝗻𝗶𝗰𝗮𝗹 𝗗𝗲𝘁𝗮𝗶𝗹𝘀
    1. The error likely occurred when trying to access a member variable of a null object pointer.
    2. The memory address being accessed (0x9c) suggests that the code was attempting to read from an offset of 156 bytes from a null pointer (0 + 0x9c = 0x9c).
    3. This type of error is preventable with proper null checking or with modern tooling that can detect such issues.

    𝗜𝗺𝗽𝗹𝗶𝗰𝗮𝘁𝗶𝗼𝗻𝘀 𝗮𝗻𝗱 𝗙𝘂𝘁𝘂𝗿𝗲 𝗦𝘁𝗲𝗽𝘀
    Microsoft's role and CrowdStrike's response:
    • Need for improved policies to roll back defective drivers.
    • Potential enhancement of code safety measures.
    • Potential implementation of automated code sanitization tools.
    • Consideration of rewriting the system driver in a memory-safe language like Rust.
    • Industry-wide discussion on moving from C++ to safer languages like Rust.

    𝗕𝘂𝘀𝗶𝗻𝗲𝘀𝘀 𝗜𝗺𝗽𝗮𝗰𝘁
    • Delivery services such as FedEx, UPS, and DHL face disruptions and delays.
    • Supermarkets struggle to accept mobile payments.
    • Corporate IT departments worldwide struggle with point-of-sale systems.
    • Major hospitals halt surgeries.
    • Airports ground and delay flights while engineers recover affected systems.
    • Repercussions spread to other platforms, with Amazon Web Services reporting issues.

    𝗖𝗼𝗻𝗰𝗹𝘂𝘀𝗶𝗼𝗻
    The losses incurred by businesses will easily run into the billions of dollars. This incident serves as a stark reminder of the importance of rigorous testing and safety checks in system-level software, especially in privileged contexts like drivers. It also highlights the ongoing challenges posed by memory-unsafe languages in critical software components.

    #crowdstrike #software #tech #microsoft #techtrend
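
    Editor's note: the failure mode described above can be shown in a few lines of self-contained C++. This sketch is not CrowdStrike's actual code; the SensorConfig struct, its field names, and the 0x9c layout are invented to illustrate how a member read through a null pointer becomes an access to a tiny invalid address near zero, and how a simple null check avoids it.

    // Minimal illustration (NOT CrowdStrike's code): dereferencing a member
    // through a null pointer turns into a read at null base + member offset,
    // which the OS rejects with an access violation. In a privileged driver,
    // the same fault brings down the whole machine.
    #include <cstddef>
    #include <cstdio>

    struct SensorConfig {
        char header[0x9c];   // padding so the next field sits at offset 0x9c
        int  rule_count;     // offsetof(SensorConfig, rule_count) == 0x9c
    };

    int read_rule_count(const SensorConfig* cfg) {
        // Without this check, cfg == nullptr makes the read below target
        // address 0x0 + 0x9c == 0x9c and the process is terminated (or, in
        // a kernel driver, the system blue-screens).
        if (cfg == nullptr) {
            return -1;  // caller must handle the "no config loaded" case
        }
        return cfg->rule_count;
    }

    int main() {
        SensorConfig* cfg = nullptr;  // e.g. a content file that failed to load
        std::printf("offset of rule_count: %#zx\n",
                    offsetof(SensorConfig, rule_count));
        std::printf("rule_count: %d\n", read_rule_count(cfg));  // safe: prints -1
    }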

  • View profile for Aayush Bhatnagar

    Building 5G, 6G & AI for India 🇮🇳

    42,667 followers

    5 Learnings from the Microsoft Cloud Outage!

    (1) Mass upgrades are dangerous and should be moderated through a gradual rollout process.
    A software patch rollout should not be carried out at mass scale across all sites together. In carrier-grade telecom networks, for example, we follow the process of First Office Application (FOA), where the upgrade is first rolled out to a few clusters of the product at a SINGLE geographical site / customer site. Only after a soak period of 48 hours do we upgrade the patch on other instances at the same site and observe thereafter, and that upgrade too is carried out in batches. We apply the software upgrade to other sites only once there is confirmation that there is no collateral damage at the FOA site. Automation of upgrades should not mean that we push the software to all global locations concurrently without a feedback loop.

    (2) First Office Application upgrades require pre- and post-analysis for all software versions being upgraded.
    It is possible that the software being upgraded is running at different versions at different geographical sites. During the FOA process, there should be a proper pre- and post-analysis to ascertain the impact of the upgrade during the soak period. This can mitigate issues such as intermittent crashes / core dumps (for example, the Windows blue screen is the visible result of a crash dump) or even memory leaks that lead to "slowdown" and "hanging" of the software. This helps in early isolation of side effects and pausing further upgrades in case of faulty software, preventing it from going "viral".

    (3) Automated rollback after a crash via backed-up software.
    Once crashes are discovered during an FOA upgrade, there needs to be an automated process to fall back to the original software version. This requires a backup of the entire application binary, configurations, and databases prior to attempting an upgrade. The rollback should commence automatically as soon as a core dump is observed, or be triggered by human action at the central level (through automation).

    (4) Real-time fault management and anomaly detection via machine learning.
    It is surprising that Microsoft and CrowdStrike got to know about the issues only when customers reported the crashes. As these upgrades were done in bulk, thousands of Windows machines would have gone down in a short period of time. This should have been caught by their fault management systems as a major anomaly, prior to customers reporting failures.

    (5) Disaster recovery architecture and process were missing.
    Even while Microsoft and CrowdStrike were working to recover the primary systems, there was no process for moving mission-critical workloads to a disaster recovery site. It is reported that emergency 911 services were down in several US states, and hospitals were also impacted. Such customers, who are saving people's lives, should have been moved to a DR cloud site to restore their services immediately.

    My 2 cents :-)
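
    Editor's note: the FOA-style rollout described above can be sketched in a few lines. In this hypothetical C++ example, the update goes to one small ring first, soaks, and proceeds to the next ring only if a crash-rate check passes; otherwise every upgraded ring is rolled back automatically. Ring names, thresholds, and the crash_rate_after_soak function are invented stand-ins for real telemetry, not any vendor's actual process.

    // Hypothetical First Office Application (FOA) style rollout: upgrade one
    // ring, wait out the soak period, check telemetry, and either continue to
    // the next ring or roll everything back automatically.
    #include <iostream>
    #include <string>
    #include <vector>

    // Stand-in for real fleet telemetry gathered during the soak period.
    double crash_rate_after_soak(const std::string& ring) {
        return ring == "foa_site" ? 0.0002 : 0.0001;  // fraction of hosts crashing
    }

    bool rollout_update(const std::vector<std::string>& rings,
                        double max_crash_rate, int soak_hours) {
        std::vector<std::string> upgraded;
        for (const auto& ring : rings) {
            std::cout << "Upgrading ring: " << ring
                      << " (soak " << soak_hours << "h)\n";
            upgraded.push_back(ring);

            // After the soak period, compare observed crash rate to the threshold.
            if (crash_rate_after_soak(ring) > max_crash_rate) {
                std::cout << "Anomaly detected in " << ring
                          << ", rolling back all upgraded rings\n";
                for (auto it = upgraded.rbegin(); it != upgraded.rend(); ++it)
                    std::cout << "  restoring previous version on " << *it << "\n";
                return false;  // halt the rollout; untouched rings stay untouched
            }
        }
        return true;  // every ring passed its soak check
    }

    int main() {
        // FOA site first, then the rest of that site, then the remaining regions.
        std::vector<std::string> rings = {"foa_site", "foa_site_full",
                                          "region_emea", "region_apac", "global"};
        bool ok = rollout_update(rings, /*max_crash_rate=*/0.001, /*soak_hours=*/48);
        std::cout << (ok ? "Rollout complete\n" : "Rollout aborted\n");
    }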

  • View profile for Bask Iyer

    CEO, BaskMind.com, Advisor/ Board member at BCG, Zoom, Automation Anywhere, Cohesity, Iron Mountain, and tech startups. Former C level at Honeywell, Dell, VMware, Juniper Networks,etc.

    10,419 followers

    What Can We Learn from the CrowdStrike-Caused Outage?

    The recent widespread outage triggered by a faulty CrowdStrike upgrade has underscored a critical truth: in the complex tapestry of modern enterprise systems, the buck stops at the top. While the immediate blame may rest with CrowdStrike, the ripple effects and subsequent fallout are a stark reminder of the broader systemic vulnerabilities that exist within our IT infrastructure.

    The Customer Perspective
    From the standpoint of employees, end-users, and customers, the source of the disruption is irrelevant. Their focus is on the company they interact with, and their expectation is simple: reliable service. When this service is interrupted, the consequences, both in terms of productivity and customer satisfaction, are tangible.

    The CIO's Dilemma
    CIOs often find themselves in the crosshairs during such crises, bearing the brunt of their organization's frustrations. While this may initially seem unfair, it's essential to recognize the validity of this perspective. The decision to partner with specific vendors, including the reliance on their update processes, ultimately falls on IT leadership. The rapid pace of technological advancement has introduced unprecedented complexities. The subscription model, which has become the norm for countless software providers, has shifted control over updates from IT departments to vendors. This lack of oversight creates a new set of challenges, particularly when considering the potential impact of updates on critical systems.

    Beyond CrowdStrike
    It's crucial to emphasize that this is not an isolated incident. Similar disruptions can and have occurred with other software vendors, including antivirus software. Microsoft's infamous Blue Screen of Death (BSOD) has been around since the 90s. How many of your applications make kernel-level changes? Should Microsoft notify you before permitting this? Over-reliance on a limited pool of major providers, while seemingly a safe bet, can create vulnerabilities.

    A Call to Action
    To bolster resilience in the face of such disruptions, organizations must adopt a proactive approach:
    - Reinforce Enterprise Resilience Planning: While many IT departments have contingency plans in place, the increasing frequency and complexity of updates necessitate a more rigorous approach.
    - Diversify the Technology Stack: Consider incorporating a wider range of vendors and platforms to reduce dependency on any single provider.
    - Enhance Board Oversight: Boards of directors must take a more active role in assessing the organization's critical infrastructure. This includes holding IT leadership accountable for system reliability and demanding robust contingency plans. We must continue the trend of adding technically and operationally competent board members.

    Companies' biggest economic, legal, and brand risks are now increasingly technology-related.

    How can organizations better prepare for such incidents? Share your insights in the comments below.
