Sub-second network visibility - what are you missing?
Standard network monitoring tools can lull you into a false sense of security and leave you with blind spots in your network.
High-performance network monitoring tools can be expensive to deploy and maintain.
stream2Cloud gives you the best of both worlds - low cost and easy deployment, with the in-depth analysis you would expect from a high-end solution.
The new normal
A trading application reports regular perceived delays on data arriving from the exchange. "My prices are late" - it's a common problem raised to infrastructure teams.
The chart below shows the utilisation on the 1Gbit link to the exchange, with metrics computed every 10 seconds. The link appears to be at only around 1.2% utilisation - looks good, right?
The next port of call would be to look for common problems with packet loss.
The TCP gap analysis shows a regular low level of gaps happening - worrying.
However, the TCP retransmission analysis shows no observed retransmissions. This means that the TCP gaps are an artefact of the data capture layer. In this case we know that the connection from the source of packets to the stream2Cloud endpoint is experiencing problems of its own. Despite that, we can keep troubleshooting, as we are confident that the monitored TCP sessions themselves are not showing retransmissions.
There are no TCP zero-byte window size events either - another common cause of perceived latency, which occurs when the receiving application becomes a slow consumer.
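The gap-versus-retransmission reasoning can be sketched in code. This is a simplified illustration, not stream2Cloud's actual analysis. It assumes a hypothetical list of (sequence number, payload length) pairs for one direction of a single TCP session, in capture order, and it ignores sequence wraparound, SACK and packet reordering. A segment that starts beyond the highest byte seen so far counts as a gap; one that starts before it counts as a retransmission.

```python
def classify_segments(segments):
    """Count in-order segments, gaps and retransmissions.

    segments: list of (seq, payload_len) tuples for one direction of one
    TCP session, in capture order (a hypothetical input format).
    """
    counts = {"in_order": 0, "gap": 0, "retransmission": 0}
    expected = None  # next sequence number we expect to see
    for seq, length in segments:
        if length == 0:
            continue  # pure ACKs carry no data
        if expected is None or seq == expected:
            counts["in_order"] += 1
        elif seq > expected:
            counts["gap"] += 1  # bytes missing from the capture
        else:
            counts["retransmission"] += 1  # bytes we have already passed
        end = seq + length
        expected = end if expected is None else max(expected, end)
    return counts


# Made-up examples: bytes 1200-1299 are never seen in the first capture,
# while in the second the missing bytes 1100-1199 arrive late.
capture_loss = classify_segments([(1000, 100), (1100, 100), (1300, 100), (1400, 100)])
network_loss = classify_segments([(1000, 100), (1200, 100), (1100, 100)])
print(capture_loss)
print(network_loss)
```

Gaps with no matching retransmissions, as in the first example, suggest that the capture path missed packets the receiver still got, since a sender has to resend anything the receiver really lost.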
Only when looking at the stream2Cloud microburst metrics can you start to see the signs of a problem - regular utilisation up to around 45% at the millisecond scale. Network bursts can cause delays and are a good indicator of congestion or components within your infrastructure that are misbehaving. Here you can see that this behaviour is fairly consistent.
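To see why the same traffic can look idle at 10 seconds and busy at 1 millisecond, here is a minimal sketch that computes peak utilisation for a chosen window size. The packet list is synthetic and invented for illustration (it is not the capture discussed here), and the function and parameter names are assumptions rather than part of stream2Cloud.

```python
from collections import defaultdict

LINK_BPS = 1_000_000_000  # 1Gbit link


def peak_utilisation(packets, window_us, link_bps=LINK_BPS):
    """Highest utilisation (0..1) seen in any window of window_us microseconds.

    packets: iterable of (timestamp_us, size_bytes) tuples.
    """
    bits_per_window = defaultdict(int)
    for ts_us, size in packets:
        bits_per_window[ts_us // window_us] += size * 8
    if not bits_per_window:
        return 0.0
    # bits / (window in seconds * link rate), kept in integers until the end
    return max(bits_per_window.values()) * 1_000_000 / (window_us * link_bps)


def synthetic_packets():
    """Invented traffic: every 37ms, 37 packets of 1500 bytes spaced 25us apart."""
    packets = []
    for burst_start in range(0, 10_000_000, 37_000):  # 10 seconds of bursts
        for i in range(37):
            packets.append((burst_start + i * 25, 1500))
    return packets


packets = synthetic_packets()
print(f"10s window: {peak_utilisation(packets, 10_000_000):.1%}")  # 1.2%
print(f"1ms window: {peak_utilisation(packets, 1_000):.1%}")       # 44.4%
```

The averaging window is the only thing that changes between the two figures: the bursts are identical, but a 10 second window spreads them across long idle periods.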
Looking at the IP addresses involved, also broken down at the millisecond level, you can see that although 10.101.1.32 (the exchange gateway) has the highest burstiness, no unusual patterns stand out - the microburst behaviour seems to be happening uniformly.
Network Microscope
How can we look into this problem in more detail?
stream2Cloud captures the packets as well as generating telemetry, so we can drill down to the packets from this time-frame and see what can be observed with our microscope - CloudShark.
We can zoom in to a finer and finer resolution, looking to see where the 400Mbit bursts appear.
1 second - all looking fine.
Peaks appearing at the 10ms level, but we can zoom in further still.
This starts to tally with what we were observing at the 1ms level in the stream2Cloud telemetry - in this case, a single major spike during this 10 second interval.
Zooming in further we can see there are actually several bursts with large pauses.
CloudShark has a feature to allow for sub-millisecond analysis.
Finally - the maximum level of detail - 200 microseconds - shows the full scale of the problem: huge bursts followed by large delays of tens of milliseconds. Behaviour like this can be caused by buffering in switch ports - packets are not lost, just held in a buffer before being forwarded. This is a more common problem than you might think with store-and-forward switches, especially with line-rate conversion - e.g. from 1G to 10G.
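As a rough feel for how buffering turns into delay, the sketch below estimates how long the last byte of a queue waits before it leaves a port. The queue sizes and the function name are hypothetical, chosen only to make the arithmetic concrete; they are not measurements from this capture.

```python
def drain_time_ms(queued_bytes, egress_bps):
    """Milliseconds needed to send queued_bytes out of a port at egress_bps."""
    return queued_bytes * 8 * 1000 / egress_bps


# Hypothetical queue depths on a 1Gbit egress port.
one_ms_burst = 56_250    # bytes carried by a 1ms burst at 45% of 1Gbit
deep_queue = 1_250_000   # an assumed 1.25MB of buffered packets

print(drain_time_ms(one_ms_burst, 1_000_000_000))  # about 0.45 ms
print(drain_time_ms(deep_queue, 1_000_000_000))    # about 10 ms
```

Under these assumptions, delays of tens of milliseconds imply megabytes of data sitting in buffers somewhere along the path, with nothing dropped and nothing retransmitted.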
This kind of behaviour may not be important to many enterprise applications, but when your business depends on timely delivery of data - such as electronic trading - your network may be hiding some terrible secrets.
Take it for a spin
Find out more about how to get started with stream2Cloud - and sign up for a free evaluation.