Use Java Agent to Profile Hadoop Job
- the header image is from FreeDigitalPhotos.net by jscreationzs.


Introduction

Hadoop is a great way to get started analyzing your big data. Sometimes, though, you need to analyze a Hadoop job to figure out why it is slow or takes a long time to finish. A common practice is to attach a Java agent to the JVMs that run the job's tasks, so the whole job is profiled in a distributed way. The agent collects thread stack traces, which you can then turn into flame graphs to visualize where CPU time is spent. This technique can be applied to different types of Hadoop jobs, including Hive and Spark.
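As a rough illustration of the "attach a Java agent" step, here is a minimal sketch of an agent entry point. The JVM calls a public static premain(String, Instrumentation) method before the task's main method when it is started with -javaagent:/path/to/agent.jar=<args> and the jar's manifest names the class in its Premain-Class attribute. The class name ProfilerAgent and the intervalMs argument are hypothetical; the sketch only starts a daemon sampling thread and leaves the actual stack capture as a placeholder.

```java
import java.lang.instrument.Instrumentation;

// Hypothetical minimal profiling agent. When packaging a real agent jar, declare
// this class public in its own ProfilerAgent.java and reference it from the
// jar manifest's Premain-Class attribute.
final class ProfilerAgent {
    // Sampling interval in milliseconds; "intervalMs" is a made-up agent argument.
    static volatile long intervalMs = 100;

    // Called by the JVM before the application's main method.
    public static void premain(String agentArgs, Instrumentation inst) {
        intervalMs = parseInterval(agentArgs, 100);
        Thread sampler = new Thread(ProfilerAgent::sampleLoop, "profiler-sampler");
        sampler.setDaemon(true); // do not keep the task JVM alive after the task ends
        sampler.start();
    }

    // Parses comma-separated "key=value" agent arguments such as "intervalMs=50".
    // Falls back to the default when the argument is missing or invalid.
    static long parseInterval(String agentArgs, long defaultMs) {
        if (agentArgs == null) {
            return defaultMs;
        }
        for (String pair : agentArgs.split(",")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2 && kv[0].trim().equals("intervalMs")) {
                try {
                    long value = Long.parseLong(kv[1].trim());
                    return value > 0 ? value : defaultMs;
                } catch (NumberFormatException e) {
                    return defaultMs;
                }
            }
        }
        return defaultMs;
    }

    private static void sampleLoop() {
        while (true) {
            // Placeholder: a real agent would capture and record stack traces here.
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                return;
            }
        }
    }
}
```

To profile every task of a job, the -javaagent option is typically added to the task JVM options, for example through mapreduce.map.java.opts and mapreduce.reduce.java.opts for MapReduce (and Hive on MapReduce) or spark.executor.extraJavaOptions for Spark, assuming the agent jar is available at the same local path on every node.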

A good example is "Spark in Flames – Profiling Spark Applications Using Flame Graphs". The basic idea is to sample the program's stack traces at a fixed frequency (the sample rate, e.g. once every 100 milliseconds) and count how many times each distinct stack trace appears. Since the number of samples a stack accumulates is roughly proportional to the time spent in it, these counts tell you where CPU time is spent.
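The counting step can be sketched in Java as below, assuming the stacks have already been captured at each sampling tick. The class name StackCounter is hypothetical. Each stack is turned into one "folded" line, with frames joined by semicolons from the root down, and identical lines are counted. Lines of the form frame1;frame2;frame3 count are the input format read by Brendan Gregg's flamegraph.pl script.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical aggregator that counts sampled stacks in flame graph "folded" format.
final class StackCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    // Records one sampled stack. StackTraceElement arrays are ordered innermost
    // frame first, so they are walked backwards to put the root frame first.
    void record(StackTraceElement[] frames) {
        if (frames.length == 0) {
            return;
        }
        StringBuilder folded = new StringBuilder();
        for (int i = frames.length - 1; i >= 0; i--) {
            if (folded.length() > 0) {
                folded.append(';');
            }
            folded.append(frames[i].getClassName()).append('.').append(frames[i].getMethodName());
        }
        counts.merge(folded.toString(), 1, Integer::sum);
    }

    // Returns one "stack count" line per distinct stack, sorted for stable output.
    String toFolded() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> e : new TreeMap<>(counts).entrySet()) {
            out.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        return out.toString();
    }
}
```

Writing the output of toFolded() to a file and passing it to flamegraph.pl produces an SVG flame graph in which wider frames appeared in more samples.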

Three Types of Java Profilers

There are several ways to write a Java agent that does the stack trace sampling, each with its own pros and cons. The following overview will help you choose which type of Java agent to use to profile your Hadoop job:

JMX Thread Profiler: uses the ThreadMXBean API to dump JVM threads and get their stack traces, for example, Statsd Profiler

AsyncGetCallTrace Profiler: uses an internal JVM API (AsyncGetCallTrace) to get stack traces, for example, Honest Profiler

Linux Perf Profiler: uses the Linux perf command to get stack traces, for example, KittenWhisker
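One sampling tick of a JMX-style profiler can be sketched with the standard java.lang.management API, as below. This is an illustration under stated assumptions, not how Statsd Profiler itself is implemented; the class name JmxStackSampler is hypothetical. Keeping only RUNNABLE threads is an assumption used here to approximate "on CPU", and RUNNABLE also includes threads blocked in native I/O.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical JMX-based sampler: one call takes one sample of all live threads.
final class JmxStackSampler {
    private final ThreadMXBean threads = ManagementFactory.getThreadMXBean();

    // Dumps all live threads and returns the stacks of those in RUNNABLE state.
    // Passing false for lockedMonitors and lockedSynchronizers keeps the dump cheaper.
    List<StackTraceElement[]> sampleRunnable() {
        List<StackTraceElement[]> stacks = new ArrayList<>();
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.RUNNABLE) {
                stacks.add(info.getStackTrace());
            }
        }
        return stacks;
    }
}
```

An agent would call sampleRunnable() once per sampling interval from a daemon thread and aggregate identical stacks into counts. Because the thread dump is taken at safepoints, this style of sampler inherits the safepoint bias described below.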

More Details

The JMX Thread Profiler is very easy to implement and deploy, but it suffers from safepoint bias. In other words, the JVM can only dump stack traces after threads reach a safepoint, so the samples are skewed toward safepoint locations instead of being consistently random. It will also miss stack traces in code where the JVM cannot enter a safepoint.

The AsyncGetCallTrace Profiler uses an internal JVM API (AsyncGetCallTrace) to avoid safepoint bias and get consistent random sampling. However, not every JVM implementation provides that API. Calling it also requires native JNI code, which makes this kind of profiler fairly complicated to build.

The Linux Perf Profiler uses the Linux perf command to sample the Java process from the outside, so it has no safepoint bias. However, you need extra work to generate a symbol file for the Java process, which translates memory addresses into function and variable names. Also, you cannot get stack traces for Java bytecode executed in interpreted mode (as opposed to JIT-compiled code).
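For JIT-compiled Java code, the symbol file is commonly a perf map file at /tmp/perf-<pid>.map, which tools such as perf-map-agent can generate and which perf reads to name JIT addresses. Each line has the form START SIZE symbolname, with START and SIZE in hexadecimal. The sketch below (the class name PerfMapSymbols is hypothetical) shows how such a file lets a sampled address be translated back into a method name; it is an illustration of the lookup, not perf's own implementation.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical resolver for the perf map text format ("START SIZE symbolname").
final class PerfMapSymbols {
    private static final class Symbol {
        final long start;
        final long size;
        final String name;

        Symbol(long start, long size, String name) {
            this.start = start;
            this.size = size;
            this.name = name;
        }
    }

    // Keyed by start address. Signed Long ordering is fine for typical x86-64
    // user-space addresses, which are below 2^63.
    private final TreeMap<Long, Symbol> byStart = new TreeMap<>();

    // Parses one entry per line; START and SIZE are hex. Malformed lines are skipped.
    PerfMapSymbols(String mapText) {
        for (String line : mapText.split("\n")) {
            String[] parts = line.trim().split("\\s+", 3); // symbol names may contain spaces
            if (parts.length < 3) {
                continue;
            }
            try {
                long start = Long.parseUnsignedLong(strip0x(parts[0]), 16);
                long size = Long.parseUnsignedLong(strip0x(parts[1]), 16);
                byStart.put(start, new Symbol(start, size, parts[2]));
            } catch (NumberFormatException e) {
                // skip lines whose address or size is not valid hex
            }
        }
    }

    private static String strip0x(String s) {
        return s.startsWith("0x") ? s.substring(2) : s;
    }

    // Returns the symbol whose [start, start + size) range contains addr, or null.
    String resolve(long addr) {
        Map.Entry<Long, Symbol> e = byStart.floorEntry(addr);
        if (e == null) {
            return null;
        }
        Symbol s = e.getValue();
        return Long.compareUnsigned(addr - s.start, s.size) < 0 ? s.name : null;
    }
}
```

For example, given the made-up map text "7f0000001000 40 Lcom/example/Foo;::bar", an address 0x10 bytes past the start resolves to Lcom/example/Foo;::bar, while an address 0x40 bytes past it falls outside the entry and resolves to null.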

Summary

As for choosing a profiler, the short suggestion is to try the JMX Thread Profiler first. If safepoint bias turns out to matter for your job, try the Linux Perf Profiler. If you still run into issues, try the AsyncGetCallTrace Profiler.



More articles by Bo Y.