Analyzing Java “Garbage First Garbage Collection” (G1GC) Logs

Garbage Collection can take a big toll on any Java application, so it’s important to understand its behavior and impact. After a JVM upgrade of Knewton’s Cassandra database, we needed a tool to compare the performance and impact of different garbage collection strategies, but  we couldn’t find an existing tool that would parse gc logs and analyze them.

This post explains the process followed and discusses some results that we believe may be useful while evaluating Java garbage collection strategies.

Evolution of Garbage Collection on Java

One of the biggest challenges for any programming language is the way that it will allocate memory while executing a task. Garbage collection is the action to regain sectors of memory that are no longer in use. These mechanisms are usually evaluated in three dimensions that are known as the three legs of the performance stool: latency, throughput, and footprint. In real life, there is always a tradeoff, and if all the stars align, you can optimize for two of the three dimensions.

carlos-blog-1

Source: Explorations of the three legged performance stool

Over the course of its history, Java has handled garbage collection in different ways. The early JDK (before 1.4) had a single thread to sweep memory to find out which areas were in use and which ones could be reclaimed.

Java 4 introduced more complex algorithms based on tracing collectors The idea was to find root objects and identify all instances that were still in use. If an object didn’t have a trace, it was considered not in use, and that area of memory would be reclaimed. The trouble with this approach is that applications had cease all activities to allow the analysis. Examples of these “stop the world” algorithms include Mark and Sweep, Heap Compaction, and Copying Collector.

Garbage collection strategies before Java 7 relied on the definition of fixed regions of memory named generations:

  • Objects created recently were gathered in the Eden Generation,
  • When the garbage collector acted, objects that had traces to be in use were “promoted” to the Survivor Generation when an object were not referenced by other objects, they are removed.
  • Objects that survived multiple passes eventually reach the Permanent Generation.

One disadvantage of these garbage collection algorithms is that they require copying and deleting objects during promotions, temporarily using more memory and in some cases causing an overflow that could crash an application. As the size of those memory buckets is inflexible, this approach could limit applications when they needed to scale. Examples of these strategies are Concurrent Mark-Sweep (CMS) and Parallel Collector.

Java 7 update 4 came with a new algorithm: Garbage First (G1), which divides memory into heap regions of the same size. Unlike CMS, while sweeping memory, G1 uses a pause predictive model to identify areas likely to contain objects not in use and targets them as first to be evacuated, hence the name. Only the evacuation process, when objects from the heap that are still in use are compacted and copied to a different region to avoid fragmentation has a stop-the-world behavior. In our tests, the stop-the-world phase has been imperceptible, however.

G1 doesn’t sort objects by generation, thus there are no limits on generation size. The result is a strategy that frees up large portions of memory in less time and that keeps adjusting based on of previous garbage collections.

Java 8 supports G1, CMS, and Parallel Collector, and G1 has been proposed as the default strategy for Java 9.

Why Knewton Created a Parser/Analyzer

While upgrading the Cassandra database, the database  team searched for a tool that could evaluate objectively the performance of G1 vs. CMS; Although G1 was released in April 2012, utilities available for analyzing the garbage collection logs, like Netflix’s gcviz, didn’t support G1 logs; We also tried an approach proposed by Jeff Taylor of Oracle, but their code was not able to interpret all the information from our servers. The scripts are an update of Taylor’s approach: With AWK they collect the logs and format them as a table so they can be interpreted and analyzed with R.

The JVM options that we have been using are:

-XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42
-Xms1995M -Xmx1995M
-Xss256k -XX:StringTableSize=1000003
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:+UseTLAB
-XX:+ResizeTLAB
-XX:CompileCommandFile=/hotspot_compiler
-XX:MaxGCPauseMillis=500
-XX:InitiatingHeapOccupancyPercent=25
-XX:+UseG1GC
-XX:G1RSetUpdatingPauseTimePercent=5
-XX:+ParallelRefProcEnabled
-XX:ParallelGCThreads=2
-XX:ConcGCThreads=2
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure
-XX:PrintFLSStatistics=1
-Xloggc:/<valid path>/gc.log
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=10M

The resulting logs contain entries formatted as JSON:

2016-05-12T20:42:14.917+0000: 2.438: Total time for which application threads were stopped: 0.0204068 seconds, Stopping threads took: 0.0000649 seconds
{Heap before GC invocations=2 (full 0):
garbage-first heap  total 2043904K, used 120064K [0x0000000083400000, 0x0000000083503e60, 0x0000000100000000) region size 1024K, 111 young (113664K), 7 survivors (7168K)
Metaspace     used 18551K, capacity 18694K, committed 18944K, reserved 1067008K 
class space    used 2265K, capacity 2299K, committed 2304K, reserved 1048576K
2016-05-12T20:42:15.303+0000: 2.824: [GC pause (G1 Evacuation Pause) (young) 
Desired survivor size 7340032 bytes, new threshold 1 (max 1) - age   1:    6402368 bytes,    6402368 total, 0.0183564 secs]
  [Parallel Time: 17.2 ms, GC Workers: 2]
     [GC Worker Start (ms): Min: 2824.3, Avg: 2824.3, Max: 2824.3, Diff: 0.0]
     [Ext Root Scanning (ms): Min: 2.2, Avg: 2.2, Max: 2.2, Diff: 0.0, Sum: 4.5]
     [Update RS (ms): Min: 0.4, Avg: 0.4, Max: 0.4, Diff: 0.1, Sum: 0.8]
        [Processed Buffers: Min: 2, Avg: 3.5, Max: 5, Diff: 3, Sum: 7]
     [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.2]
     [Code Root Scanning (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.2]
     [Object Copy (ms): Min: 9.2, Avg: 11.7, Max: 14.2, Diff: 5.0, Sum: 23.4]
     [Termination (ms): Min: 0.0, Avg: 2.6, Max: 5.2, Diff: 5.2, Sum: 5.2]
        [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2]
     [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
     [GC Worker Total (ms): Min: 17.1, Avg: 17.1, Max: 17.1, Diff: 0.0, Sum: 34.3]
     [GC Worker End (ms): Min: 2841.5, Avg: 2841.5, Max: 2841.5, Diff: 0.0]
  [Code Root Fixup: 0.0 ms]
  [Code Root Purge: 0.0 ms]
  [Clear CT: 0.1 ms]
  [Other: 1.1 ms]
     [Choose CSet: 0.0 ms]
     [Ref Proc: 0.6 ms]
     [Ref Enq: 0.0 ms]
     [Redirty Cards: 0.0 ms]
     [Humongous Register: 0.1 ms]
     [Humongous Reclaim: 0.0 ms]
     [Free CSet: 0.1 ms]
  [Eden: 104.0M(104.0M)->0.0B(1189.0M) Survivors: 7168.0K->8192.0K Heap: 117.3M(1996.0M)->15.0M(1996.0M)]
Heap after GC invocations=3 (full 0):
garbage-first heap   total 2043904K, used 15360K [0x0000000083400000, 0x0000000083503e60, 0x0000000100000000) region size 1024K, 8 young (8192K), 8 survivors (8192K)
Metaspace       used 18551K, capacity 18694K, committed 18944K, reserved 1067008K
class space     used 2265K, capacity 2299K, committed 2304K, reserved 1048576K

The Cassandra upgrade was a consequence of a fix for a Thrift Buffer anomaly; this is another example of the continuous efforts of Knewton’s database team to tune and optimize our database servers.

Results

Knewton has conducted extensive analysis of the performance and tuning of our Cassandra nodes, the following are some of the information collected from a cluster with 24 nodes worth of information that are presented as the hypothetical Knerd Cassandra cluster, also described in Dr. Josh Wickman’s post. Each node has 2 CPU’s and 8 GB of RAM.

The following examples show ways to analyze the the gc.logs. They consider only the logs obtained from Production Cassandra nodes during a 24 hours window; during that period it was also done a full repair on the database, and an automated backup. Note that the results shown were from the test once after the change was done; for exploratory phases we used cassandra-stress tool for functional testing, and an end-to-end load test using Locust.

The following histogram compares the duration of garbage collection execution and groups them in three categories:

  1. Real Time: the total time from start to finish for the garbage collection
  2. User Time: the time used for processes that were not related to the kernel (operating system) and pauses.
  3. Sys Time: The time spent by kernel operations,  most of it corresponding to time that the CPU is locked.lso known as stop-the-world scenario.

carlos-blog-2

Time spent on garbage collection in a “stop-the-world” phase (Sys Time) is a fraction of the time when user processes are blocking operations. The garbage collection adapts to minimize impact on the the application.

We can also compare the sizes for each generation before and after the garbage collection and confirm the impact that a Cassandra nodetool repair action can have on the garbage collection.

First, a brief explanation of how repairs work in Cassandra: Cassandra is a masterless database, which means that each server that is part of the same cluster will share data (sometimes several copies of the same record). Each node is responsible for knowing where to find a record for a fraction of the data; each record will have a unique token identification; and each node will be responsible for keeping track of all the copies of specific tokens.

When a repair is requested, each node will be responsible for validating the consistency of all copies of the data. (If there are differences the latest version will prevail.) Depending on the amount of data and the replication factor, this can be a very expensive operation on CPU and memory.

The graph below shows the total memory used before and after a garbage collection

carlos-blog-3

At (1), the command nodetool repair executed a sequential repair in the entire cluster, which used more memory after the garbage collection since the anti-entropy mechanism of the repair put more objects in use.

carlos-blog-4

Garbage collection has a non deterministic behaviour: it does not start at the exact time that the memory threshold is surpassed (2), as sometimes it can take fractions of the second afterward, also, there can be a delay while G1 determines the areas that will release the most memory; that delay will depend on the heap size allocated and how much is in use, this can be an area of concern for environments with limited resources (i.e.: development virtual environments, docker containers).

carlos-blog-5

After the repair, the pattern changed, with garbage collections that effectively decreased the memory used below 0.25 GB (3); one explanation is that the garbage collector learns from its previous runs when the repair was being executed.

carlos-blog-6

Finally, there is an additional spike (4) caused by our automated backup mechanism.

carlos-blog-7

From the total heap size, we can also analyze and compare components; in the example we’ll show only the parts of the New Generation: Eden and Survivor spaces, as they usually are the earliest indicators of issues with memory consumption.

carlos-blog-8carlos-blog-9
Garbage collection is triggered by the total heap size heap, so garbage collection will happen even when the Eden and Survivors Generations are relatively small.

After garbage collection, the Eden space should be at or close to zero, while the Survivor space gets bigger.. The Eden space is also larger than the Survivor space; in this example, by 20:1. This ratio will be different depending on the application.

If needed, we can get more granular information about the different operations (like scanning memory sections or copying objects) that take place during a garbage collection; in the same way as before, we can also identify the change of the behavior that occurs during a repair.
carlos-blog-10carlos-blog-11

As explained before, a Cassandra repair can consume a lot of resources; but this is necessary to ensure that the data is consistent (especially when there are constant updates). One finding of the composition is that the change on the behavior pattern: the Objects Copy operation (in green) stayed longer hours after the repair was completed. which could be also an effect of the adaptive nature of the G1 algorithm.

Conclusions

The algorithms and strategies for garbage collection will keep evolving and changing, as we have not yet found one that maximizes throughput with minimum latency and footprint. In the meantime, Java applications will continue to rely upon G1GC for versions 7, 8, and, most likely, 9. By sharing some experiences with logs from Generation First Garbage Collection (G1GC) and open source data modeling tools, we hope to help other database teams understand their systems better.

References

For Garbage Collection

IBM – Java theory and practice: A brief history of garbage collection

infoq – G1: One Garbage Collector To Rule Them All

Oracle – Getting Started with the G1 Garbage Collector

Oracle – Java SE 6 Garbage Collection Tuning

jaxenter – Java 9’s new garbage collector: What’s changing? What’s staying?

Oracle – Understanding G1 GC Logs

For visualization tools of Garbage Collection logs

GCViewer

Netflix’s gcviz

For data modeling with R

r-project.org

Code

Code is available at this gist