CISOs, you're likely spending more on Splunk or Elastic than you're comfortable admitting. You're not alone. I've recently spoken to many SOC leaders who felt almost helpless about their SIEM bills, largely because they will never replace their legacy SIEMs given the cost of switching, feature dependencies, integrations, and so on. (The story around next-gen SIEM is for another day.)

Regardless of your SIEM deployment, security teams across the industry face a common pain: growing data volumes → rising Splunk bills → limited visibility due to cost-driven ingestion filters.

But there's a fix. The smartest SOC leaders are now deploying Security Data Pipeline Platforms (SDPPs): solutions purpose-built to optimize, enrich, and route security telemetry before it hits the destination SIEM, essentially helping you get the best out of your Splunk, Elastic, or Sentinel deployment. These solutions help:

▪️ Reduce data sources and ingestion volume
▪️ Filter out noise and enrich critical signals for alerts
▪️ Centralize policy management: define routing, filtering, masking, and enrichment rules once and apply them across multiple destinations (e.g., Splunk, S3, Snowflake), which also makes it easy to route to lower-cost tiers (SIEM + data lake + cold storage)
▪️ Improve visibility and troubleshooting through data observability: track dropped logs, schema errors, misrouted data, and delayed ingestion with a real-time view of data-flow health
▪️ Redact or mask PII: strip sensitive fields before logs reach third-party analytics tools, supporting privacy compliance (e.g., GDPR, HIPAA)

And much more, which I outline in my report below (a toy sketch of the filter/mask/route logic also follows at the end of this post). This new class of data pipeline vendor extends the life of your SIEM: not replacing it, but helping you leverage it better.

There are many solutions on the market, but in our research piece we go in depth on some of the leading vendors as case studies for the overall market:

✔️ Cribl
✔️ Abstract Security
✔️ Onum
✔️ VirtualMetric
✔️ Monad
✔️ DataBahn.ai
✔️ Datadog
✔️ Stellar Cyber
➕ There is a longer list in the market map, but every leader should look at these solutions first.

TL;DR: Based on the numbers I've heard from SOC leaders using the solutions above, the ROI and cost savings from an SDPP (especially in front of a legacy SIEM) are mind-blowing. In my opinion, if you're running an older SIEM without a telemetry pipeline, you're likely paying for noise and a stack of extra bills, and adopting one honestly feels like a no-brainer. Worse, you're probably not filtering for the context your SOC actually needs for solid threat hunting and compliance reporting.

🔗 I published a full market guide on everything here: https://lnkd.in/gYfKwYCA

If you're a SOC leader, feel free to DM me about any of these solutions. Would love your thoughts as well: what tools are helping you balance cost and signal?
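To make the SDPP idea concrete, here is a minimal, vendor-neutral sketch in Python of the three moves described above: filter noise, mask PII, and route by value. The field names (event_type, severity, user_email), the noise list, and the severity threshold are assumptions for illustration only, not any vendor's actual schema or API.

```python
import hashlib
import json
import re

# Illustrative only: a toy pre-SIEM pipeline stage. Event fields and
# rules below are invented, not any vendor's real schema.
NOISY_EVENT_TYPES = {"heartbeat", "dns_query_internal"}
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def process(event: dict) -> dict | None:
    """Filter noise, mask PII, then tag a routing destination."""
    if event.get("event_type") in NOISY_EVENT_TYPES:
        return None  # dropped before it ever reaches (or bills) the SIEM

    # Mask PII: replace email addresses with a stable hash so analysts
    # can still correlate a user's events without seeing the address.
    for key, value in event.items():
        if isinstance(value, str):
            event[key] = EMAIL_RE.sub(
                lambda m: hashlib.sha256(m.group().encode()).hexdigest()[:16],
                value,
            )

    # Route: high-severity events go to the SIEM, the rest to cheap storage.
    event["route"] = "siem" if event.get("severity", 0) >= 7 else "data_lake"
    return event

if __name__ == "__main__":
    raw = {"event_type": "auth_failure", "severity": 8,
           "user_email": "alice@example.com", "src_ip": "10.0.0.5"}
    print(json.dumps(process(raw), indent=2))
```

Real platforms do this declaratively and at wire speed across thousands of sources; the point here is only the shape of the logic.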
Solutions for Managing Telemetry Data
Explore top LinkedIn content from expert professionals.
Summary
Solutions for managing telemetry data refer to tools and practices that help organizations collect, organize, and analyze streams of monitoring information from their systems. These solutions are crucial for making sense of complex data, improving system reliability, and reducing costs in today’s technology-driven businesses.
- Adopt standardized schemas: Use consistent data formats and naming conventions to make telemetry data easier to share, analyze, and maintain across different tools and teams.
- Centralize policy management: Set up systems where you can define and update data routing, filtering, and privacy rules in one place, so you avoid duplicated effort and reduce errors.
- Include ownership attributes: Add information about which team or service owns each data stream to quickly identify issues, charge back costs, and streamline communication when problems arise.
-
Imagine this: you're debugging a critical issue in a distributed system. Logs from one service point to an error, but the trace IDs don't match what's in your monitoring tool. Metrics arrive in inconsistent formats, and key attributes like http.status_code are labelled differently across services (status, statusCode, response_code). Sound familiar?

The problem isn't just the complexity of distributed systems; it's the lack of a shared language. Without a standard way to structure telemetry data, every team ends up reinventing the wheel, leading to fragmented observability and wasted effort.

This is where the OpenTelemetry Schema comes in: not as another tool, but as a universal framework for making sense of telemetry data. It standardizes the structure and format of telemetry data, including traces, metrics, and logs: how data should be structured, which fields should be included, and how different types of telemetry relate to each other. By adhering to this schema, different logging systems and tools can interchange log data in a standardized way, promoting interoperability and easing integration between the components of a logging infrastructure.

1️⃣ Interoperability without lock-in: the schema is vendor-neutral, meaning you can adopt it today without tying yourself to a specific observability platform. Whether you're using Jaeger, Prometheus, or something else entirely, the data model ensures compatibility.

2️⃣ Future-proofing your observability: even if you're not ready to adopt OpenTelemetry libraries, structuring your telemetry data according to the schema sets you up for seamless integration later.

3️⃣ Flexibility without complexity: you don't need the OpenTelemetry SDKs to benefit from the schema. It's just a set of guidelines. Use it to structure your custom instrumentation, serialize data in JSON, or even design your own exporters (a minimal sketch of this follows below).

If you're interested in applying these principles in practice, I've pieced together the essential information, best practices, and tools to build a logging system that's consistent, scalable, and future-proof.

GitHub link: https://lnkd.in/gNri_HxK

#OpenTelemetry #Observability #DistributedSystems #logging
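To make point 3 concrete, here is a minimal sketch of emitting a schema-aligned log record as plain JSON with no SDK involved. The attribute keys (service.name, http.request.method, http.response.status_code, url.path) follow the OTel semantic conventions; the emit_log helper and the simplified record shape are my own, loosely modeled on the OTLP log data model rather than a faithful implementation of it.

```python
import json
import time

def emit_log(body: str, severity: str, attrs: dict) -> None:
    """Emit one schema-aligned log record as a JSON line (no SDK needed)."""
    record = {
        "timestamp": time.time_ns(),  # OTLP log timestamps are nanoseconds
        "severity_text": severity,
        "body": body,
        # Resource attributes: who emitted this, named per the conventions.
        "resource": {"service.name": "checkout", "service.version": "1.4.2"},
        "attributes": attrs,
    }
    print(json.dumps(record))

emit_log(
    "request completed",
    "INFO",
    {
        # One shared spelling instead of status/statusCode/response_code:
        "http.request.method": "GET",
        "http.response.status_code": 200,
        "url.path": "/api/orders",
    },
)
```

Because the names match the shared contract, any backend or teammate can correlate this record with traces and metrics without a translation layer.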
-
OpenTelemetry started as a way to emit telemetry. But if you look closely at where the project is headed, something bigger is happening. OTel is quietly becoming a control plane for observability: not just data in motion, but configuration, policy, rollout, and coordination across the entire telemetry system.

Here's what that looks like in practice:
• OTLP standardizes the data plane
• Collectors centralize processing and routing
• OCB lets you build purpose-fit distributions
• OpAMP enables remote configuration, upgrades, and lifecycle management
• Kubernetes Operators make observability declarative
• Semantic conventions act as shared contracts
• Pipelines encode policy, not just transport

Individually, these are useful features. Together, they form something more powerful: a control plane.

🧠 Why this matters
In most organizations, observability fails not because of missing tools, but because it's inconsistent, fragmented, and unmanaged. Different teams instrument differently. Collectors drift. Configs diverge. Upgrades lag. Policies are tribal knowledge.

A control plane solves systemic problems:
• Centralized policy, distributed execution
• Safe rollouts of config and pipeline changes
• Standardization without blocking teams
• Platform ownership instead of ad-hoc tooling
• Observability as infrastructure, not a side quest

This is the shift from "everyone does observability their own way" to "this is how observability works here."

🧩 A concrete example
Imagine a platform team responsible for observability across hundreds of services. Instead of:
• Manually updating collectors
• Chasing config drift
• Debugging inconsistent pipelines
• Relying on docs and best effort

They define:
• Approved collector builds (OCB)
• Default pipelines and processors
• Semantic conventions
• Rollout policies
• Remote config and upgrades (OpAMP)

Teams still own their services. But observability becomes governed, reliable, and evolvable. That's not just telemetry. That's control. (A toy sketch of a policy gate follows below.)

🎯 The takeaway
OpenTelemetry isn't trying to be flashy. It's doing something harder: turning observability into a managed system with standards, policy, and operational leverage. OTel isn't just the instrumentation layer anymore. It's becoming the backbone that observability platforms, and AI-driven systems, are built on.

💬 Do you see OpenTelemetry evolving this way in your org, or is observability still treated as tooling?

#OpenTelemetry #PlatformEngineering #Observability #ControlPlane #O11yEngineering
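As a toy illustration of "centralized policy, distributed execution", here is a sketch of a policy gate a platform team might run against each team's collector configuration before rollout. The policy fields and the config shape are invented for this example; a real control plane would enforce this through OCB-built distributions and OpAMP-driven remote config, not a standalone script.

```python
# Illustrative only: the policy and config dicts below are invented.
APPROVED_POLICY = {
    "required_processors": ["memory_limiter", "batch"],
    "allowed_exporters": {"otlp", "prometheus"},
    "required_resource_attrs": ["service.name", "team.name"],
}

def validate(config: dict) -> list[str]:
    """Return policy violations for one team's collector config."""
    violations = []
    processors = config.get("processors", [])
    for p in APPROVED_POLICY["required_processors"]:
        if p not in processors:
            violations.append(f"missing required processor: {p}")
    for e in config.get("exporters", []):
        if e not in APPROVED_POLICY["allowed_exporters"]:
            violations.append(f"exporter not on the allow-list: {e}")
    for attr in APPROVED_POLICY["required_resource_attrs"]:
        if attr not in config.get("resource_attributes", {}):
            violations.append(f"missing resource attribute: {attr}")
    return violations

team_config = {
    "processors": ["batch"],
    "exporters": ["otlp", "kafka"],
    "resource_attributes": {"service.name": "payments"},
}
for v in validate(team_config):
    print("POLICY:", v)
```

The value is not the script itself but the pattern: one published policy, checked mechanically everywhere, instead of tribal knowledge.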
-
I recently had the opportunity to work with a large financial services organization implementing OpenTelemetry across their distributed systems. The journey revealed some fascinating insights I wanted to share.

When they first approached us, their observability strategy was fragmented: multiple monitoring tools, inconsistent instrumentation, and slow MTTR. Sound familiar? Their engineering teams were spending hours troubleshooting issues rather than building new features. They had plenty of data but struggled to extract meaningful insights.

Here's what made their OpenTelemetry implementation particularly effective:

1️⃣ They started small but thought big. Rather than attempting a company-wide rollout, they began with one critical payment processing service, demonstrating value quickly before scaling.

2️⃣ They prioritized distributed tracing from day one. By focusing on end-to-end transaction flows, they gained visibility into previously hidden performance bottlenecks. One trace revealed a third-party API call causing sporadic 3-second delays.

3️⃣ They standardized on semantic conventions across teams. This seemingly small detail paid significant dividends: consistent naming conventions for spans and attributes made correlating data substantially easier.

4️⃣ They integrated OpenTelemetry with Elasticsearch for powerful analytics. The ability to run complex queries across billions of spans helped identify patterns that would otherwise have gone unnoticed.

The results? Mean time to detection dropped by 71%. Developer productivity increased as teams spent less time debugging and more time building. They could now confidently answer "what's happening in production right now?" Interestingly, their infrastructure costs decreased despite collecting more telemetry data: the unified approach eliminated redundant collection and storage systems.

What impressed me most wasn't the technology itself, but how this organization approached the human elements of the implementation. They recognized that observability is as much about culture as it is about tools.

Have you implemented OpenTelemetry in your organization? What unexpected challenges or benefits did you encounter? If you're still considering it, what's your biggest concern about making the transition?

#OpenTelemetry #DistributedTracing #Observability #SiteReliabilityEngineering #DevOps
-
I learned a new trick while reviewing the telemetry from an OllyGarden user recently. The platform team treats `team` ownership as a required resource attribute, just like `service.name`. They set attributes like `team.squad` or `team.channel` as internal requirements. Not every team follows the standard perfectly yet (a gap we plan to help visualize in OllyGarden), but it solves the million-dollar question of centralized observability: who is sending all this data?

Without this attribution, you cannot charge back costs, enforce quotas, or quickly find the team that just deployed a noisy loop. Identifying *who* owns a service is often harder than identifying the service itself. We often see ad-hoc attributes like `owner`, `team`, or `contact` scattered across different services.

There is good news on the horizon. A new proposal in the OpenTelemetry Semantic Conventions (Issue #3101) aims to formalize the `service.owner` entity. Proposed attributes include:
* `service.owner.name`: the team responsible (e.g., "Checkout Team")
* `service.owner.url`: link to the repo or docs
* `service.owner.contact`: Slack channel or email

Standardizing this means your platform can automatically route alerts to the right Slack channel or generate cost reports by team, without custom mapping logic. Until then, take a page from these power users: make ownership attributes a standard requirement. (A minimal sketch of how that looks in code follows below.)
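Here is a minimal sketch of stamping ownership onto telemetry with the OpenTelemetry Python SDK (opentelemetry-api and opentelemetry-sdk installed). The `team.*` keys mirror the ad-hoc convention described above, and `service.owner.*` is still only a proposal, so treat the exact attribute names as assumptions.

```python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider

# Ownership travels with everything this service emits. The attribute
# names below are one team's convention, not a finalized standard.
resource = Resource.create({
    "service.name": "checkout",
    "team.squad": "checkout-team",       # who to charge back or page
    "team.channel": "#checkout-oncall",  # where alerts should land
})

trace.set_tracer_provider(TracerProvider(resource=resource))
tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("demo"):
    pass  # exported spans now carry the ownership attributes automatically
```

Because resource attributes attach once at provider setup, individual developers never have to remember them per span, which is what makes the requirement enforceable.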
-
Security data issues rarely begin at the SIEM. They start upstream, when no one defines what good telemetry looks like, who owns it, or how it should evolve as threats and infrastructure change. When governance is missing, monitoring agents fail silently, log formats shift without warning, and critical fields vanish, leaving detection teams debugging broken data pipelines instead of stopping real threats.

Real-world examples? Here are three:
▪ Microsoft lost over two weeks of security logs due to a silent failure in its telemetry agents, leaving customers blind to potential threats during that window.
▪ In a major retail breach, authentication logs were either misconfigured or ignored, and failed login attempts went unmonitored. The attackers used this gap to move laterally into payment systems and steal the credit and debit card numbers of millions of customers.
▪ OpenAI overloaded internal infrastructure, causing a widespread outage, when resource usage wasn't properly staged or governed.

So what does effective telemetry governance actually involve?

▪ Define telemetry expectations upfront. Set clear, use-case-driven standards for what "good" telemetry looks like, down to required fields, formats, and frequency. Align logs to specific detection or compliance needs, so that critical fields like device_id, user_agent, or geo_ip are treated as non-negotiable, not best-effort.

▪ Establish ownership across the pipeline. Governance starts with clarity on who owns what. Define responsibilities for source selection, enrichment, normalization, validation, and routing, ensuring each team knows its role in maintaining telemetry integrity.

▪ Monitor for drift, not just volume. Telemetry should be continuously validated in flight. Use telemetry pipeline solutions to catch missing fields, schema shifts, malformed events, and time drift before they impact detection. (A toy validation sketch follows below.)

▪ Align detection logic with telemetry evolution. Threats change, and so do detection rules. Governance ensures telemetry keeps pace by creating structured feedback loops between detection engineers and those managing telemetry.

👉 Data governance means building operational habits into how telemetry is defined, owned, and maintained, turning telemetry from something you hope is working into something you know is working.

#TelemetryGovernance #SecurityData #DetectionEngineering #SIEM #Observability #DataOwnership #SecOps #DataTrust #CyberSecurity #SOC #ThreatDetection #Telemetry #DataStrategy #DataQuality #OptimizeLogs #LogReduction #SecurityEfficiency #SIEMOptimization #AlertFatigue #TelemetryPipeline
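As a toy sketch of the "monitor for drift" point, here is an in-flight check that flags required fields that have gone missing or empty. The contract format and field names (device_id, user_agent, geo_ip, matching the expectations above) are illustrative; real deployments run checks like this inside the telemetry pipeline itself, not in a one-off script.

```python
# Illustrative only: one declared contract per source, checked per event.
REQUIRED_FIELDS = {
    "auth_logs": ["timestamp", "device_id", "user_agent", "geo_ip"],
}

def check_event(source: str, event: dict) -> list[str]:
    """Return drift findings: required fields that are missing or empty."""
    findings = []
    for field in REQUIRED_FIELDS.get(source, []):
        if event.get(field) in (None, ""):
            findings.append(f"{source}: missing required field '{field}'")
    return findings

event = {"timestamp": "2024-05-01T12:00:00Z", "device_id": "abc-123",
         "user_agent": ""}  # geo_ip vanished after an agent upgrade
for finding in check_event("auth_logs", event):
    print("DRIFT:", finding)
```

The habit that matters is declaring the contract explicitly; once it exists, detecting drift is trivial, and silence stops being an option.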
-
I came across this cloud monitoring cheat sheet, and it points to one of the core reasons most teams struggle with observability.

The chart shows native monitoring tools across AWS, GCP, Azure, and Oracle Cloud: data collection, storage, analysis, alerting, visualization, compliance, automation, integration. Everything looks organized until you actually start using it.

Native cloud tools are built to keep you inside that provider's ecosystem. Single cloud? Fine. The moment you have workloads across AWS and GCP, you are managing multiple monitoring stacks with different query languages, different alerting setups, and different cost models.

The bigger issue is cost predictability. Cloud monitoring bills compound in ways that are hard to see upfront. You start with basic metrics, add logs, layer in traces, and suddenly your monitoring bill is growing faster than your infrastructure spend. This is why OpenTelemetry matters: instrument once, route telemetry wherever it needs to go.

The other issue is data residency. If you are dealing with GDPR, HIPAA, DPDP, or data localization laws, sending telemetry to a SaaS vendor outside your region creates problems.

We built CubeAPM to solve this. OpenTelemetry-native, deploys inside your VPC, predictable pricing at $0.15/GB with no hidden fees. Teams typically see 60-80% lower costs compared to traditional APM tools, with full observability and no lock-in.
-
Automatically classifying petabytes of historical data across distributed systems requires systematic technical precision.

One global enterprise came to us for help with a massive challenge: years of telemetry and behavioral data stored across multiple environments, including databases, data warehouses, data lakes, and third-party vendors. All of it lay unlabeled (or, where it was tagged, tagged ineffectively). In large part, it was almost ungoverned and almost ungovernable.

Traditional approaches couldn't work:
- Manual classification would take years and miss evolving data flows
- Broad categorization would lack the granularity so critical for proper governance
- Point-in-time analysis could never hope to keep pace with ongoing data collection

Our technical approach via the Fides suite allowed comprehensive discovery:
→ Deep system scanning across all data repositories, identifying data types, relationships, and contextual information with high accuracy.
→ Pattern recognition algorithms automatically detected personal identifiers, sensitive categories, and regulatory classifications, even within unstructured and semi-structured data. (A toy sketch of this idea follows below.)
→ Continuous monitoring ensured new data flows inherited appropriate classifications automatically, preventing future governance gaps.
→ Metadata integration embedded governance context directly into data schemas, making compliance information instantly available.

The system, a new trusted data layer for global enterprises, maintains accuracy while scaling to the massive data volumes behind modern global enterprises. Classification happens in real time as data moves through pipelines, ensuring governance context never gets lost.

One critical outcome: technical teams can now build with confidence, because they know that every dataset systematically carries the governance context needed for compliant use.

Is your team using a systematic approach for large-scale data classification and governance? Would you like them to have one at their fingertips?
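For a sense of the simplest possible version of that pattern-recognition step, here is a rule-based PII detector in Python. Real classifiers (the Fides suite included) go far beyond regexes into ML-based and context-aware detection; the patterns and labels below are my own illustrations.

```python
import re

# Illustrative patterns only; production detectors use far richer signals.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def classify(text: str) -> set[str]:
    """Return the set of PII labels whose patterns match a free-text field."""
    return {label for label, rx in PATTERNS.items() if rx.search(text)}

# Detects both the email address and the IP address in one pass:
print(classify("contact bob@example.com from 10.1.2.3"))
```

The hard parts the post describes, continuous re-scanning and propagating labels into schemas as metadata, are exactly what separate a toy like this from a governance platform.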
-
🌐 Building Real-Time Observability Pipelines with AWS OpenSearch, Kinesis, and QuickSight

Modern systems generate high-velocity telemetry data (logs, metrics, traces) that needs to be processed and visualized with minimal lag. Here's how combining Kinesis, OpenSearch, and QuickSight creates an end-to-end observability pipeline:

🔹 1️⃣ Kinesis Data Streams: ingestion at scale
Kinesis captures raw event data in near real time:
✅ Application logs
✅ Structured metrics
✅ Custom trace spans
💡 Tip: Use Kinesis Data Firehose to buffer and transform records before indexing. (A minimal producer sketch follows below.)

🔹 2️⃣ AWS OpenSearch: searchable log and trace store
Once data lands in Kinesis, it's streamed to OpenSearch for indexing.
✅ Fast search across logs and trace IDs
✅ Full-text queries for error investigation
✅ JSON document storage with flexible schemas
💡 Tip: Create index templates that auto-apply mappings and retention policies.

🔹 3️⃣ QuickSight: operational dashboards in minutes
QuickSight connects to OpenSearch (or S3 snapshots) to visualize trends:
✅ Error rates over time
✅ Latency distributions by service
✅ Top error codes or patterns
💡 Tip: Use SPICE caching to accelerate dashboard performance for high-volume datasets.

🚀 Why this stack works
✅ Low-latency ingestion with Kinesis
✅ Rich search and correlation with OpenSearch
✅ Interactive visualization with QuickSight
✅ Fully managed services, so less operational burden

🔧 Common use cases
🔸 Real-time monitoring of microservice health
🔸 Automated anomaly detection and alerting
🔸 Centralized log aggregation for compliance
🔸 SLA tracking with drill-down capability

💡 Implementation tips
- Define consistent index naming conventions for clarity (e.g., logs-application-yyyy-mm)
- Attach resource-based policies to secure Kinesis and OpenSearch access
- Automate index lifecycle management to control costs
- Embed QuickSight dashboards into internal portals for live visibility

Bottom line: if you need scalable, real-time observability without stitching together a dozen tools, this AWS-native stack is one of the most effective options.

#Observability #AWS #OpenSearch #Kinesis #QuickSight #RealTimeMonitoring #Infodataworx #DataEngineering #Logs #Metrics #Traces #CloudNative #DevOps #C2C #C2H #SiteReliability #DataPipelines
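As a minimal sketch of step 1, here is a boto3 producer that ships one structured log event into a Kinesis data stream. AWS credentials and the stream itself are assumed to exist; the stream name (logs-application, matching the naming tip above) and the event fields are hypothetical.

```python
import json
import time

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Hypothetical event shape; real producers emit whatever the pipeline's
# schema expects downstream in OpenSearch.
event = {
    "timestamp": time.time(),
    "service": "orders-api",
    "level": "ERROR",
    "message": "upstream timeout",
    "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
}

kinesis.put_record(
    StreamName="logs-application",   # hypothetical stream name
    Data=json.dumps(event).encode(),
    PartitionKey=event["service"],   # keeps one service's events on one shard
)
```

From there, Firehose can buffer, transform, and deliver records into OpenSearch without any custom consumer code, which is what keeps the stack low-maintenance.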
-
Dashboards are dead. This is what killed them:

⚠️ 3 AM. An alert fires. You scramble through 50 dashboards with zero idea where to start. The original dev left 6 months ago.

🛑 Stop drowning in metrics. AI has advanced enough that we can just ask what's wrong.

😑 The old way:
→ Build dashboards for known problems
→ Scroll through logs manually
→ Pray you find the issue
→ Waste hours connecting dots

😍 The new way:
→ Ask in plain English
→ AI investigates your entire system
→ Get answers in seconds
→ See the exact query it ran

⌨️ Type: "Have there been any slowdowns?" The AI returns:
✓ The exact route with latency spikes
✓ Error patterns by region
✓ Root cause analysis
✓ A full investigation report

No query language needed. No tribal knowledge required. No guessing.

👩‍🔧 This fixes the real problem: systems outlive developers. When code is disposable and teams shift, dashboards become archaeological artifacts.

📊 Your data needs to explain itself. honeycomb.io Canvas is a great start: https://fandf.co/3JH8lZx
→ Instrument with OpenTelemetry
→ Send traces to Honeycomb, from your apps in any cloud ☁️
→ Ask questions in Canvas

Thanks Honeycomb for partnering with me on this post.

#honeycombpartner #observability #telemetry #troubleshooting