Day 6 of MCP Security: How Does MCP Handle Data Privacy and Security?

In MCPs, AI agents don’t just call APIs — they decide which APIs to call, what data to inject, and how to act across tools. But that introduces new privacy and security risks 👇

𝗪𝗵𝗮𝘁’𝘀 𝗗𝗶𝗳𝗳𝗲𝗿𝗲𝗻𝘁 𝘄𝗶𝘁𝗵 𝗠𝗖𝗣𝘀?
In traditional systems, data moves in defined flows: Frontend → API → Backend. You know what’s shared, when, and with whom.

𝗜𝗻 𝗠𝗖𝗣𝘀:
• Context (PII, tokens, metadata) is injected at runtime
• The model decides what’s relevant
• The agent can store, reason over, and share user data autonomously
• Tool calls are invisible unless explicitly audited

𝗞𝗲𝘆 𝗣𝗿𝗶𝘃𝗮𝗰𝘆 𝗥𝗶𝘀𝗸𝘀 𝘄𝗶𝘁𝗵 𝗠𝗖𝗣𝘀
1. Context Leakage: Memory and prompt history may persist across sessions, allowing PII to leak between users or flows.
2. Excessive Data Exposure: Agents may call APIs or tools with more data than needed, violating the principle of least privilege.
3. Unlogged Data Flows: Tool calls, prompt injections, and chained actions may bypass traditional logging, breaking auditability.
4. Consent Drift: A user consents to one action, but the agent infers and performs other actions based on the user's intent. That’s a privacy violation.

𝗪𝗵𝗮𝘁 𝗣𝗿𝗶𝘃𝗮𝗰𝘆 𝗖𝗼𝗻𝘁𝗿𝗼𝗹𝘀 𝗠𝗖𝗣 𝗦𝘆𝘀𝘁𝗲𝗺𝘀 𝗠𝘂𝘀𝘁 𝗜𝗻𝗰𝗹𝘂𝗱𝗲:
✔️ Context Isolation
Prevent data from crossing agent sessions or user boundaries without explicit logic.
✔️ Prompt-Level Redaction
Strip sensitive data before it's passed into agent prompts (a minimal redaction sketch follows this post).
✔️ Chain-Aware Access Controls
Control not just what tool can be called, but how and when it’s called, especially for downstream flows.
✔️ Logging & Audit Trails for Reasoning
Log not just API calls, but:
• Prompt inputs
• Tool decisions
• Context usage
• Response paths
✔️ Dynamic Consent Models
Support user-level prompts that include consent logic, especially when agents make cross-domain decisions.

In short: MCPs don’t just call APIs, they decide what data to use and how. If you’re not securing the context, the memory, and the tools, you’re not securing the system.
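To make prompt-level redaction concrete, here is a minimal sketch: a regex-based scrubber that masks a few common PII patterns before user text is placed into an agent prompt. The patterns and the `redact` helper are illustrative assumptions, not part of any MCP specification; a production system would use a dedicated PII detector rather than hand-rolled regexes.

```python
import re

# Illustrative patterns only; real deployments need a proper PII detector.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def redact(text: str) -> str:
    """Mask PII before the text is injected into an agent prompt."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}_REDACTED]", text)
    return text

user_input = "Reach me at jane@example.com, SSN 123-45-6789."
prompt = f"Summarize this request:\n{redact(user_input)}"
print(prompt)  # email and SSN are replaced with placeholders
```

The same hook can sit in front of tool-call arguments, not just the initial prompt, so redaction follows the data into downstream flows.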
Data Privacy Risks in Writing Software
Explore top LinkedIn content from expert professionals.
Summary
Data privacy risks in writing software are the threats that arise when sensitive information, such as personal data or confidential business details, is exposed or mishandled during the development and use of software, especially with AI-powered tools like large language models. These risks include accidental leaks, unauthorized access, and improper data sharing, making it crucial for developers and users to prioritize privacy protections.
- Secure data flows: Always map out how information moves through your software and make sure sensitive data is protected at every stage.
- Audit and monitor: Regularly review access logs and AI interactions to catch unintentional exposures or breaches before they escalate.
- Limit data sharing: Only share the minimum amount of information needed for software functions and use privacy-preserving techniques like anonymization.
🚨 AI Privacy Risks & Mitigations in Large Language Models (LLMs), by Isabel Barberá, is the 107-page report about AI & privacy you were waiting for! [Bookmark & share below.]

Topics covered:

- Background: "This section introduces Large Language Models, how they work, and their common applications. It also discusses performance evaluation measures, helping readers understand the foundational aspects of LLM systems."
- Data Flow and Associated Privacy Risks in LLM Systems: "Here, we explore how privacy risks emerge across different LLM service models, emphasizing the importance of understanding data flows throughout the AI lifecycle. This section also identifies risks and mitigations and examines roles and responsibilities under the AI Act and the GDPR."
- Data Protection and Privacy Risk Assessment: Risk Identification: "This section outlines criteria for identifying risks and provides examples of privacy risks specific to LLM systems. Developers and users can use this section as a starting point for identifying risks in their own systems."
- Data Protection and Privacy Risk Assessment: Risk Estimation & Evaluation: "Guidance on how to analyse, classify and assess privacy risks is provided here, with criteria for evaluating both the probability and severity of risks. This section explains how to derive a final risk evaluation to prioritize mitigation efforts effectively."
- Data Protection and Privacy Risk Control: "This section details risk treatment strategies, offering practical mitigation measures for common privacy risks in LLM systems. It also discusses residual risk acceptance and the iterative nature of risk management in AI systems."
- Residual Risk Evaluation: "Evaluating residual risks after mitigation is essential to ensure risks fall within acceptable thresholds and do not require further action. This section outlines how residual risks are evaluated to determine whether additional mitigation is needed or if the model or LLM system is ready for deployment."
- Review & Monitor: "This section covers the importance of reviewing risk management activities and maintaining a risk register. It also highlights the importance of continuous monitoring to detect emerging risks, assess real-world impact, and refine mitigation strategies."
- Examples of LLM Systems’ Risk Assessments: "Three detailed use cases are provided to demonstrate the application of the risk management framework in real-world scenarios. These examples illustrate how risks can be identified, assessed, and mitigated across various contexts."
- Reference to Tools, Methodologies, Benchmarks, and Guidance: "The final section compiles tools, evaluation metrics, benchmarks, methodologies, and standards to support developers and users in managing risks and evaluating the performance of LLM systems."

👉 Download it below.
👉 NEVER MISS my AI governance updates: join my newsletter's 58,500+ subscribers (below).

#AI #AIGovernance #Privacy #DataProtection #AIRegulation #EDPB
Isabel Barberá: "This document provides practical guidance and tools for developers and users of Large Language Model (LLM) based systems to manage privacy risks associated with these technologies. The risk management methodology outlined in this document is designed to help developers and users systematically identify, assess, and mitigate privacy and data protection risks, supporting the responsible development and deployment of LLM systems. This guidance also supports the requirements of GDPR Article 25 (data protection by design and by default) and Article 32 (security of processing) by offering technical and organizational measures to help ensure an appropriate level of security and data protection. However, the guidance is not intended to replace a Data Protection Impact Assessment (DPIA) as required under Article 35 of the GDPR. Instead, it complements the DPIA process by addressing privacy risks specific to LLM systems, thereby enhancing the robustness of such assessments.

Guidance for Readers
> For Developers: Use this guidance to integrate privacy risk management into the development lifecycle and deployment of your LLM-based systems, from understanding data flows to implementing risk identification and mitigation measures.
> For Users: Refer to this document to evaluate the privacy risks associated with LLM systems you plan to deploy and use, helping you adopt responsible practices and protect individuals’ privacy.
> For Decision-makers: The structured methodology and use case examples will help you assess the compliance of LLM systems and make informed risk-based decisions."

European Data Protection Board
On Protecting the Data Privacy of Large Language Models (LLMs): A Survey

From the research paper: "In this paper, we extensively investigate data privacy concerns within LLMs, specifically examining potential privacy threats from two folds: privacy leakage and privacy attacks, and the pivotal technologies for privacy protection during various stages of LLM privacy inference, including federated learning, differential privacy, knowledge unlearning, and hardware-assisted privacy protection."

Some key aspects from the paper:

1) Challenges: Given the intricate complexity involved in training LLMs, privacy protection research tends to dissect various phases of LLM development and deployment, including pre-training, prompt tuning, and inference.

2) Future Directions: Protecting the privacy of LLMs throughout their creation process is paramount and requires a multifaceted approach.
(i) During data collection, minimizing the collection of sensitive information and obtaining informed consent from users are critical steps. Data should be anonymized or pseudonymized to mitigate re-identification risks.
(ii) In data preprocessing and model training, techniques such as federated learning, secure multiparty computation, and differential privacy can be employed to train LLMs on decentralized data sources while preserving individual privacy (see the differential-privacy sketch after this post).
(iii) Conducting privacy impact assessments and adversarial testing during model evaluation ensures potential privacy risks are identified and addressed before deployment.
(iv) In the deployment phase, privacy-preserving APIs and access controls can limit access to LLMs, while transparency and accountability measures foster trust with users by providing insight into data handling practices.
(v) Ongoing monitoring and maintenance, including continuous monitoring for privacy breaches and regular privacy audits, are essential to ensure compliance with privacy regulations and the effectiveness of privacy safeguards.

By implementing these measures comprehensively throughout the LLM creation process, developers can mitigate privacy risks and build trust with users, thereby leveraging the capabilities of LLMs while safeguarding individual privacy.

#privacy #llm #llmprivacy #mitigationstrategies #riskmanagement #artificialintelligence #ai #languagelearningmodels #security #risks
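To make the differential-privacy idea in (ii) concrete, here is a minimal sketch of the classic Laplace mechanism: a count query over training data is released with noise scaled to sensitivity/epsilon, so any single individual's presence changes the output only slightly. This is a textbook illustration under simple assumptions, not the DP-SGD machinery an actual LLM training pipeline would use.

```python
import numpy as np

def laplace_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Release a count with Laplace noise calibrated to sensitivity/epsilon.

    A count query has sensitivity 1: adding or removing one person's
    record changes the true answer by at most 1.
    """
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Example query: how many records in the corpus mention a medical condition?
true_count = 1_042
for eps in (0.1, 1.0, 10.0):  # smaller epsilon = stronger privacy, noisier answer
    print(f"epsilon={eps}: {laplace_count(true_count, eps):.1f}")
```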
⚠️ CRITICAL AI SECURITY ALERT FOR MEDICAL WRITERS

The recent Fortune investigation into Microsoft Copilot's "EchoLeak" vulnerability should be a wake-up call for the medical writing industry. As medical writers increasingly rely on AI tools like Copilot to draft clinical study reports, regulatory submissions, and other documents containing sensitive patient data, we need to address some uncomfortable truths.

The Reality Check:
⚠️ A "zero-click" attack could expose patient data without any user interaction
⚠️ Hackers could access clinical trial data, patient information, and proprietary research simply by sending an email
⚠️ The vulnerability bypassed Copilot's built-in protections designed to secure user files

Why This Matters for Medical Writing:
✅ We handle HIPAA-protected patient data daily
✅ Clinical study reports contain sensitive efficacy and safety information
✅ Regulatory submissions include proprietary drug development data
✅ Competitive intelligence could be compromised through document access

While Microsoft has reportedly fixed this specific flaw, the researchers warn this represents a "fundamental design flaw" in AI agents, similar to vulnerabilities that plagued software for decades.

Questions We Need to Ask:
⁉️ Are our current AI tool policies adequate for protecting patient privacy?
⁉️ Do we have sufficient oversight when AI assistants access clinical databases?
⁉️ Are we creating audit trails for AI interactions with sensitive documents?
⁉️ Have we assessed the security posture of ALL AI tools in our workflows?

The pharmaceutical industry has been cautiously adopting AI agents, and frankly, this caution appears justified. As one researcher noted: "Every Fortune 500 I know is terrified of getting agents to production."

Moving Forward:
We can't abandon AI innovation, but we must demand transparency about security measures, implement robust data governance, and maintain human oversight of AI interactions with sensitive clinical data.

❓ What security protocols has your organization implemented for AI tool usage? How are you balancing innovation with patient data protection?

#MedicalWriting #AIethics #DataSecurity #ClinicalTrials #HIPAA #PharmaSecurity #RegulatoryAffairs
https://lnkd.in/eEX2pJ6d
The EDPB recently published a report on AI Privacy Risks and Mitigations in LLMs. This is one of the most practical and detailed resources I've seen from the EDPB, with extensive guidance for developers and deployers. The report walks through privacy risks associated with LLMs across the AI lifecycle, from data collection and training to deployment and retirement, and offers practical tips for identifying, measuring, and mitigating risks.

Here's a quick summary of some of the key mitigations mentioned in the report:

For providers:
• Fine-tune LLMs on curated, high-quality datasets and limit the scope of model outputs to relevant and up-to-date information.
• Use robust anonymisation techniques and automated tools to detect and remove personal data from training data.
• Apply input filters and user warnings during deployment to discourage users from entering personal data, as well as automated detection methods to flag or anonymise sensitive input data before it is processed.
• Clearly inform users about how their data will be processed through privacy policies, instructions, warnings or disclaimers in the user interface.
• Encrypt user inputs and outputs during transmission and storage to protect data from unauthorized access (a minimal encryption sketch follows this post).
• Protect against prompt injection and jailbreaking by validating inputs, monitoring LLMs for abnormal input behaviour, and limiting the amount of text a user can input.
• Apply content filtering and human review processes to flag sensitive or inappropriate outputs.
• Limit data logging and provide configurable options to deployers regarding log retention.
• Offer easy-to-use opt-in/opt-out options for users whose feedback data might be used for retraining.

For deployers:
• Enforce strong authentication to restrict access to the input interface and protect session data.
• Mitigate adversarial attacks by adding a layer for input sanitization and filtering, and by monitoring and logging user queries to detect unusual patterns.
• Work with providers to ensure they do not retain or misuse sensitive input data.
• Guide users to avoid sharing unnecessary personal data through clear instructions, training and warnings.
• Educate employees and end users on proper usage, including the appropriate use of outputs and phishing techniques that could trick individuals into revealing sensitive information.
• Ensure employees and end users avoid overreliance on LLMs for critical or high-stakes decisions without verification, and ensure outputs are reviewed by humans before implementation or dissemination.
• Securely store outputs and restrict access to authorised personnel and systems.

This is a rare example where the EDPB strikes a good balance between practical safeguards and legal expectations. Link to the report included in the comments.

#AIprivacy #LLMs #dataprotection #AIgovernance #EDPB #privacybydesign #GDPR
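As one concrete reading of the "encrypt user inputs and outputs during transmission and storage" item, here is a minimal sketch using the Python cryptography package's Fernet API (a real, widely used symmetric-encryption interface). The inline key generation is a placeholder assumption for the sketch; real deployments would fetch keys from a KMS or HSM.

```python
# pip install cryptography
from cryptography.fernet import Fernet

# Placeholder key management: real systems pull keys from a KMS/HSM,
# never generate them inline next to the data they protect.
key = Fernet.generate_key()
fernet = Fernet(key)

user_prompt = b"Patient J.D., DOB 1984-03-02, asks about trial eligibility"
ciphertext = fernet.encrypt(user_prompt)  # store/transmit only this
plaintext = fernet.decrypt(ciphertext)    # authorized read path

assert plaintext == user_prompt
print(ciphertext[:40])  # opaque token; useless without the key
```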
As AI systems become smarter, they also become juicier targets for attackers, and unlike traditional software, AI brings new kinds of risks. Here are the big ones to watch:

🔹 Input Manipulation Risks
These are the "front door" attacks; they exploit how an AI is fed data.
- Prompt Injection → Super common in LLMs. Attackers hide instructions inside text (or documents/images) that override safety rules. Defending is hard because natural language itself is so flexible.
- Data Poisoning → If attackers sneak bad data into your training set, the model "learns" to make biased or dangerous outputs. Datasets scraped from the internet are especially vulnerable.
- Adversarial Examples → Small tweaks to an input (like barely pixel-changed images or weird punctuation in text) can mislead AI. This is one of the hardest to detect because to humans the input looks "normal."

🔹 Protocol Vulnerabilities
These reflect traditional cyber risks in an AI-enabled system.
- API Misuse → If the AI API isn't rate-limited or validated, attackers can overload it or run "prompt brute-forcing" (a minimal rate-limiter sketch follows this post).
- Session Hijacking → Common in any authenticated AI service. If a hijacker steals your token/session, they control your AI feed.
- Weak Authentication → A human/system design failure rather than an AI-specific one, but still a big gap.

🔹 System & Privacy Risks
This is where AI overlaps with sensitive data handling.
- Unauthorized Access → Hackers running arbitrary commands through AI; think "prompt as the new SQL injection."
- Memory Leaks → Chatbots sometimes "remember" and accidentally share PII or corporate secrets in later conversations.
- Data Exfiltration → Attackers can use crafted prompts to slowly extract confidential knowledge from the system.

🔹 Model Compromise
This is the "core AI asset" risk.
- Model Extraction → Attackers query your model enough times to clone its behavior. Bad for companies with proprietary LLMs.
- Model Inversion → Attackers pull private training data (e.g., names, addresses, secrets) out of model responses. A GDPR/privacy nightmare.
- Backdoor Attacks → If a model is trained on poisoned data with a hidden trigger ("if I type 🔑word, give admin access"), it may look normal until activated. This can sit undetected for a long time.

💡 Which is hardest to defend?
👉 In practice, Input Manipulation (Prompt Injection and Adversarial Examples) is the toughest. Why?
- AI works on probabilistic reasoning, not strict rules, so attackers can always find new wordings, encodings, or formats that "slip through."
- Unlike traditional software bugs, you can't patch human language.
- Every new feature (like letting AI browse the web or run tools) widens the attack surface.

That's why companies focus heavily on red-teaming, layered defense, human-in-the-loop monitoring, and continuous fine-tuning.

#AISecurity #AIrisks #PromptInjection #AdversarialAI #CyberSecurity #DataPrivacy #ResponsibleAI #FutureOfAI
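To illustrate the rate-limiting defense mentioned under API Misuse, here is a minimal token-bucket sketch. The bucket size and refill rate are arbitrary assumptions for the demo; a production service would enforce this per API key at the gateway rather than in application code.

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity` requests, refilling at `rate` per second."""

    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill tokens proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should reject, e.g. with HTTP 429

bucket = TokenBucket(capacity=5, rate=1.0)  # 5-request burst, 1 req/s sustained
for i in range(7):
    print(i, "allowed" if bucket.allow() else "throttled (429)")
```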
In AI tools, the fine print isn’t optional. It’s everything.

I recently checked out a cool new AI tool that promised awesome graphics.

First red flag? No mention of data use, privacy, or security on the site.
Second red flag? The terms of service said the tool takes no responsibility; it's all on the LLMs it uses.
Third red flag? The same terms say it can use the data for its own purposes.
Fourth red flag? The same terms specifically state: do not upload confidential information.

Even if my content would be outward-facing, I don't want to knowingly share my information with a third party who then shares it with LLMs and uses it for themselves. And that was just one review of one AI tool.

Managing AI privacy risks is critical for all companies, no matter the size. Here are 5 tips to help manage AI risk:

1. Strengthen Your Data Governance
Create a cross-functional team to develop clear policies on AI use cases. Consider third-party data access and usage, how AI will be used within the business, and whether it involves sensitive data.
Pro Tip: Use frameworks like NIST’s Data Privacy Framework to guide your efforts.

2. Conduct Privacy Impact Assessments (PIAs) for AI
Review your existing PIA processes to determine whether AI can be integrated into the assessment process. Assess AI-specific risks like bias, ethics, discrimination, and the data inferences often made by AI models.

3. Train Your Team on AI Transparency
Develop ongoing training programs to increase awareness of AI and how it intersects with privacy and employee roles.

4. Address Privacy Rights Challenges Posed by AI
Determine how you will uphold privacy rights once data is embedded in a model. Consider how you will handle requests for access, portability, rectification, erasure, and processing restrictions. Remember, privacy notices should include provisions about how AI is used.

5. Manage Third-Party AI Vendors Carefully
Ask vendors where they get their AI model, what kind of data is used to train it, and how often they refresh their data. Determine how vendors handle bias, inaccuracies, or underrepresentation in the AI’s outputs. Audit AI vendors and contracts regularly to identify new risks.

AI’s potential is immense, but so are the challenges it brings. Be proactive. Build trust. Stay ahead.

Learn more in our carousel and blog link below 👇
How would you evaluate privacy risks after purchasing a web-scraped dataset for model training?

You may be curious about why this is even necessary, as a dataset provider will likely claim it is de-identified or anonymized. Well, the authors of the attached paper evaluated the privacy risks of CommonPool, one of the largest publicly available image-text datasets scraped from the web (it has been downloaded over 2 million times and used by well-known image generation models).

👉 My key takeaways:

1️⃣ In the CommonPool dataset, the authors found identifiable human faces, full names and contact details on resumes, government ID numbers, financial information (such as credit card numbers with security codes), and even content involving children (e.g., birth certificates). The dataset also included image EXIF tags that can reveal the precise geolocation of individuals.

2️⃣ To identify personal data remaining in the dataset, the authors used the following techniques (a minimal Presidio sketch follows this post):
• optical character recognition (OCR);
• Microsoft Presidio named entity recognition;
• querying the OCR-extracted text for keywords matching regular expressions related to religion, race and ethnicity, and sexual orientation;
• identifying children-related websites by relying on the Cloudflare website categorizations;
• the Amazon Rekognition face detection algorithm and the Single-Shot Scale-Aware Face Detector algorithm.

3️⃣ Main legal privacy and data protection risks considered:
• Data protection by design and by default: the obligation to remove unnecessary data from the dataset before processing;
• The distinction between data that is legally public and data that is merely available online (CCPA/CPRA/OCPA require a basis to believe the data was made public, so additional privacy requirements might apply to the dataset);
• Identification of sensitive/special categories of data, children's data, or financial data: all these types of data in the dataset could trigger privacy-law requirements (e.g., consent, purpose limitation, data breach notification);
• Legal basis, transparency, and data subjects' rights in the EU: the findings are similar to the CNIL's recent recommendations I posted here: https://lnkd.in/dMjp-Htz

-------------------------------------------------------------------------------
👋 I'm Vadym, an expert in integrating privacy requirements into AI-driven data processing operations.
🔔 Follow me to stay ahead of the latest trends and to receive actionable guidance on the intersection of AI and privacy.
✍ Expect content that is solely authored by me, reflecting my reading and experiences.
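For a sense of how the Presidio step works, here is a minimal sketch using the presidio-analyzer package's AnalyzerEngine (a real Microsoft library). The post does not state the authors' exact configuration, so the entity list and sample text here are illustrative assumptions.

```python
# pip install presidio-analyzer
# python -m spacy download en_core_web_lg   # NLP model Presidio relies on
from presidio_analyzer import AnalyzerEngine

analyzer = AnalyzerEngine()
ocr_text = "Jane Doe, card 4111 1111 1111 1111, jane.doe@example.com"

# Entity list is an illustrative assumption; Presidio supports many more types.
results = analyzer.analyze(
    text=ocr_text,
    entities=["PERSON", "CREDIT_CARD", "EMAIL_ADDRESS"],
    language="en",
)
for r in results:
    print(r.entity_type, ocr_text[r.start:r.end], round(r.score, 2))
```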
⚡️ When AI Developers Don’t Agree on Privacy Risks

📍 A key pain point is the lack of shared standards for dealing with AI risks at the development level.

🚨 A recent study, "We Are Not Future-Ready" (Klymenko et al., 2025, from Technische Universität München and Google), captures an important truth from inside Europe’s AI industry:
👉 AI privacy is being managed inconsistently and often without a shared understanding of the risks involved.

📌 Through 25 developer interviews, the authors found deep misalignment about where privacy risks arise: some focus on training data, others on deployment features like model memory, logging, or connected tools. Few systematically assess how AI systems leak or retain sensitive information once deployed.
📌 Instead, most rely on manual controls, ad-hoc reviews, and informal fixes, leaving gaps that become critical once systems scale.
📌 Legal and product teams, meanwhile, often disagree on who owns privacy governance.

👆 The authors argue that privacy-by-design for AI must go beyond data redaction. It should include clear ownership, threat modelling, and standardised development practices that make privacy risk measurable and auditable.

🎯 Bottom Line: Developing standards for AI model development and deployment, similar to those in security or safety engineering, could help align the industry and move AI privacy from reactive patchwork to structured protection.

🔗 to the paper in the comments

Florian Matthes Alexandra Klymenko Stephen Meisenbacher Patrick Gage Kelley Sai Teja Peddinti Kurt Thomas

#artificialintelligence #development #risk #responsibility #innovation #governance