The Hidden Problem Breaking Your Packet Analysis
The Troubleshooting Scenario
You're deep into troubleshooting a performance issue. The application team reports intermittent slowdowns on a critical database connection. Response times spike from 2ms to 200ms randomly, then return to normal. Users are complaining, and you need answers.
You start with TCP analysis, applying the techniques from my previous articles: retransmission rates, duplicate ACKs, window behavior.
The data points to packet loss somewhere in the path. Time to dig deeper.
You request packet captures from both endpoints: one trace taken on the client, another taken on the server.
The captures arrive. You open both in Wireshark, ready to correlate the two sides of the connection and pinpoint where packets are disappearing.
Then things get weird.
You start with the TCP handshake - the simplest interaction - comparing the client's and server's timestamps for the SYN.
They're already suspicious. But maybe there's congestion you haven't seen yet, so you check the SYN-ACK response in both captures.
Wait. Stop.
According to these timestamps, the client received the SYN-ACK at 10:15:23.450891. But the server didn't even receive the SYN until 10:15:23.462891 - 12 milliseconds later. The response arrived before the request was received. Time is flowing backwards.
You try to manually align the captures by offsetting one by 12ms. Some packets line up better. But then other parts of the flow break - packets appear to arrive before they're sent, responses precede requests. The offset isn't consistent throughout the capture.
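Why doesn't a single offset work? Because the two clocks aren't just out of step - they're ticking at slightly different rates. Here's a toy model (illustrative numbers, not taken from the captures above) showing how a constant-offset correction degrades over the length of a capture:

```python
# Toy model: the server's clock starts 12 ms ahead of the client's and
# also drifts an extra 50 microseconds per second (50 ppm) - well within
# spec for an ordinary crystal oscillator.

OFFSET_AT_T0 = 0.012      # seconds: initial disagreement between the clocks
DRIFT_RATE   = 50e-6      # seconds of extra error per second of real time

def server_clock(true_time: float) -> float:
    """Timestamp the server's clock would record at a given true time."""
    return true_time + OFFSET_AT_T0 + DRIFT_RATE * true_time

def align_with_constant_offset(server_ts: float) -> float:
    """Naive correction: subtract the offset measured at t=0."""
    return server_ts - OFFSET_AT_T0

# At t=0 the correction is perfect...
residual_start = align_with_constant_offset(server_clock(0.0)) - 0.0
# ...but 60 seconds into the capture, the residual error has grown to 3 ms -
# enough to reorder packets that are only microseconds apart.
residual_60s = align_with_constant_offset(server_clock(60.0)) - 60.0

print(f"residual at t=0s:  {residual_start * 1e6:.0f} us")
print(f"residual at t=60s: {residual_60s * 1e6:.0f} us")
```

A fixed offset fixes one instant; drift guarantees every other instant is wrong, which is exactly why some packets line up and others break.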
You can't determine:
- Which segment of the path is actually dropping packets
- The real one-way transit time in either direction
- Whether a retransmission was triggered by genuine loss or a spurious timeout
- The true order of events across the two captures
Without accurate timestamps, you can't merge these captures into a single coherent timeline. You have two separate stories being told by clocks that disagree about what time it is.
Root cause analysis requires understanding causality - what happened first, what happened as a result. With broken time, causality is impossible.
You're flying blind.
The Revelation
The problem isn't your captures. It's not Wireshark. It's not even the network.
It's clock drift.
You check the time synchronization on both systems. Each reports a healthy NTP status: recent successful polls, low jitter, an offset of a few milliseconds.
Both systems think they're synchronized. By NTP standards, they are.
But for packet-level analysis, you need microsecond precision. These clocks are off by 12,700 microseconds. You're trying to measure 400μs of network transit time with clocks that disagree by 12,700μs.
It's like trying to measure the thickness of a piece of paper with a yardstick.
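The arithmetic makes the mismatch stark (numbers from the scenario above):

```python
# The quantity you want to measure vs. the error built into your "ruler".
transit_time_us = 400        # one-way network transit you're trying to measure
clock_error_us  = 12_700     # disagreement between the two endpoint clocks

error_ratio = clock_error_us / transit_time_us
print(f"clock error is {error_ratio:.2f}x the quantity being measured")
```

The measurement error is over thirty times larger than the thing being measured - any conclusion drawn from those timestamps is noise.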
Why did this happen? Because NTP did exactly what it was designed to do: keep each clock within a few milliseconds of its reference. Between polls, each clock drifts at its own rate - and millisecond-scale disagreement counts as success.
The broader realization hits you: This isn't just about packet captures.
Every distributed system in your infrastructure suffers from this same problem:
- Log correlation across services assumes the hosts agree on time
- Distributed tracing stitches spans together using each host's local timestamps
- Database replication and conflict resolution order events by clock
- Metrics pipelines align series from different collectors by timestamp
You've been troubleshooting in quicksand this whole time, building analysis on top of timestamps that are fundamentally unreliable.
Why This Matters Beyond Troubleshooting
The engineering principle: In distributed systems, time is not a free resource you can take for granted. It's infrastructure that must be explicitly designed, maintained, and monitored - just like your network, compute, or storage.
Here's a concrete example of why this matters in business terms.
In financial markets and trading systems (where I spent years of my career), time accuracy isn't just a troubleshooting inconvenience - it's a regulatory requirement with serious consequences:
MiFID II (Markets in Financial Instruments Directive) in Europe requires, under RTS 25:
- High-frequency trading systems: clocks within 100 microseconds of UTC, with timestamps recorded at microsecond granularity
- Other electronic trading: within 1 millisecond of UTC, at millisecond granularity
- Documented traceability to UTC that the firm can demonstrate on demand
CAT (Consolidated Audit Trail) in US equity markets requires:
- Business clocks synchronized to within 50 milliseconds of the time maintained by NIST
- Event timestamps reported at millisecond granularity - or finer, where the firm's systems already capture it
- Documented clock synchronization procedures
Why regulators care: Market manipulation investigations, trade dispute resolution, and systemic risk analysis all depend on accurately reconstructing the sequence of events across multiple systems and venues. If your timestamps are wrong, you can't prove what happened when.
Without proper time synchronization, firms face:
- Regulatory fines for inaccurate or unprovable timestamps
- Rejected or unreliable audit trail submissions
- An inability to defend themselves in trade disputes or manipulation investigations
This isn't theoretical. Firms have been fined for timestamp accuracy violations. The infrastructure investment in precision time synchronization is mandatory, not optional.
If you're not actively managing time synchronization, you're building your observability, compliance, and troubleshooting capabilities on quicksand.
Why NTP Isn't Enough
NTP uses network round-trip time to estimate the time offset between client and server. The protocol assumes network delay is symmetric - the time from client to server equals the time from server to client (RFC 5905).
But in real networks, this assumption breaks constantly:
- Asymmetric routing: the request and response take different paths
- Queuing delay: congestion on one direction of a link but not the other
- Asymmetric bandwidth: upload and download rates differ, common on access links
- Load-dependent processing delay on the NTP server itself
NTP's accuracy in production:
- Low single-digit milliseconds on a well-behaved LAN
- Tens of milliseconds across WANs and the public internet
- Worse still under congestion, path asymmetry, or long polling intervals
The fundamental issue: NTP delivers millisecond-scale accuracy for systems that need microsecond-scale precision.
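To see where the error comes from, here's the textbook offset/delay calculation from RFC 5905 in toy form, with hypothetical timestamps and a deliberately asymmetric path. The server's clock is exactly correct, yet the symmetric-delay assumption still produces a 3 ms phantom offset:

```python
# Textbook NTP on-wire calculation (RFC 5905), illustrative numbers.
# t1: client sends request, t2: server receives it, t3: server replies,
# t4: client receives the reply. t2/t3 are stamped on the server's clock.

def ntp_offset_and_delay(t1, t2, t3, t4):
    offset = ((t2 - t1) + (t3 - t4)) / 2   # assumes symmetric path delay
    delay  = (t4 - t1) - (t3 - t2)         # round-trip minus server processing
    return offset, delay

# True situation: server clock is perfect (true offset = 0), but the path
# is asymmetric - 2 ms outbound, 8 ms back (queuing on the return leg).
t1 = 100.0000
t2 = t1 + 0.0020         # request arrives after 2 ms
t3 = t2 + 0.0005         # 0.5 ms server processing
t4 = t3 + 0.0080         # reply arrives after 8 ms

offset, delay = ntp_offset_and_delay(t1, t2, t3, t4)
print(f"estimated offset: {offset * 1e3:+.1f} ms (true offset: 0.0 ms)")
print(f"round-trip delay: {delay * 1e3:.1f} ms")
```

The estimated offset is wrong by half the path asymmetry - here, (2 ms - 8 ms) / 2 = -3 ms - and no amount of averaging fixes a bias that's baked into the routing.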
What Accuracy Do You Actually Need?
This is the critical question. The answer depends on what you're trying to measure or correlate.
For packet-level troubleshooting and network analysis: microseconds. You're measuring transit times and inter-packet gaps in the tens to hundreds of microseconds, so clock error needs to be at least an order of magnitude smaller.
For distributed systems observability: sub-millisecond is usually enough to order spans and correlate logs - but low-latency services push the requirement into the tens of microseconds.
The reality: If you're doing performance engineering, capacity planning, or root cause analysis with NTP-synchronized clocks, your data has a ±1-50ms margin of error built into every timestamp. You're measuring microsecond-scale phenomena with millisecond-scale clocks.
You're building conclusions on garbage time data.
Check your current time sync. On Linux systems running chrony, run `chronyc tracking`; on ntpd, run `ntpq -c rv` (or `ntpq -p` for peer offsets).
Look at the "System time" or "offset" value. That's your current accuracy. If it's measured in milliseconds, you have a problem.
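If you want to automate that check, here's a minimal sketch that parses the "System time" line of `chronyc tracking` output (the field name matches chrony's real output; the sample values below are illustrative) and flags millisecond-scale offsets:

```python
import re

# Illustrative `chronyc tracking` output - your values will differ.
sample = """\
Reference ID    : A9FEA9FE (203.0.113.7)
Stratum         : 3
System time     : 0.012700000 seconds fast of NTP time
Last offset     : +0.003211000 seconds
RMS offset      : 0.004876000 seconds
"""

def system_time_offset_seconds(tracking_output: str) -> float:
    """Extract the 'System time' offset; positive means the clock is fast."""
    m = re.search(r"System time\s*:\s*([\d.]+) seconds (fast|slow)",
                  tracking_output)
    value = float(m.group(1))
    return value if m.group(2) == "fast" else -value

offset = system_time_offset_seconds(sample)
print(f"offset: {offset * 1e3:.1f} ms")
if abs(offset) > 0.001:   # more than 1 ms is too coarse for packet analysis
    print("WARNING: millisecond-scale offset - unfit for packet-level correlation")
```

Wire something like this into your monitoring and you'll see drift before it ruins a capture, not after.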
What's Next
So what's the solution? This is where Precision Time Protocol (PTP) enters the picture.
But simply switching from NTP to PTP isn't enough. There are critical decisions around:
- Software vs. hardware timestamping
- Boundary clocks vs. transparent clocks in the network path
- Grandmaster selection and GNSS reference sources
- Monitoring and failover when synchronization degrades
In the next article, we'll dig into PTP architectures and explore when software PTP with kernel timestamping is "good enough" versus when you need to invest in hardware-accelerated solutions with smart NICs and dedicated grandmaster clocks.
We'll also cover the monitoring and validation you need to ensure your time sync is actually working - because a misconfigured PTP deployment can be worse than NTP.
Until then: go check your clock offsets. You might be surprised what you find.
References
[1] Mills, D., et al. "Network Time Protocol Version 4: Protocol and Algorithms Specification." RFC 5905, Internet Engineering Task Force, June 2010. https://datatracker.ietf.org/doc/html/rfc5905
[2] European Securities and Markets Authority (ESMA). "Commission Delegated Regulation (EU) 2017/574 - RTS 25 on clock synchronisation." Official Journal of the European Union, June 2016. http://ec.europa.eu/finance/securities/docs/isd/mifid/rts/160607-rts-25_en.pdf
[3] U.S. Securities and Exchange Commission. "Rule 613 - Consolidated Audit Trail." https://www.sec.gov/about/divisions-offices/division-trading-markets/rule-613-consolidated-audit-trail
[4] U.S. Securities and Exchange Commission. "Order Granting Conditional Exemptive Relief, Pursuant to Section 36 of the Securities Exchange Act of 1934 and Rule 608(e) of Regulation NMS, Relating to Granularity of Timestamps Specified in Section 6.8(b) and Appendix D, Section 3 of the National Market System Plan Governing the Consolidated Audit Trail." Securities Exchange Act Release No. 88608, 85 FR 20743, April 14, 2020. https://www.federalregister.gov/documents/2020/04/14/2020-07789/order-granting-conditional-exemptive-relief-pursuant-to-section-36-of-the-securities-exchange-act-of
[5] FINRA. "Regulatory Notice 20-41: FINRA Amends Its Equity Trade Reporting Rules Relating to Timestamp Granularity." December 2020. https://www.finra.org/rules-guidance/notices/20-41
[6] Network Time Protocol Project. "Association Management." NTP.org Documentation. https://www.ntp.org/documentation/4.2.8-series/assoc/
[7] Network Time Protocol Project. "Poll Process." NTP.org Documentation. https://www.ntp.org/documentation/4.2.8-series/poll/
[8] Meinberg Radio Clocks. "Time Synchronization Accuracy with NTP." Knowledge Base. https://kb.meinbergglobal.com/kb/time_sync/time_synchronization_accuracy_with_ntp
[9] Meinberg Radio Clocks. "Time Synchronization Errors Caused by Network Asymmetries." Knowledge Base. https://kb.meinbergglobal.com/kb/time_sync/time_synchronization_errors_caused_by_network_asymmetries
[10] Lombardi, M.A., et al. "Practical Limitations of NTP Time Transfer." National Institute of Standards and Technology, 2016. https://tf.nist.gov/general/pdf/2776.pdf
[11] FSMLabs. "MiFID II - Ten Things You Need to Know About Clock Sync." April 2021. https://fsmlabs.com/mifid-ii-ten-things-you-need-to-know-about-clock-sync/