What a great discovery! A bug that causes high consumption of JVM physical memory

Summary

Recently, while helping a customer investigate a JVM problem (JDK 1.8.0_191-b12), we found that one of their systems kept getting killed by the OS, which turned out to be caused by a memory leak. In the course of that investigation we stumbled upon another JVM bug that can cause large amounts of physical memory to be consumed. We reported it to the community and received a quick response; the fix is expected to appear in an upcoming OpenJDK 8 release (JDK 11 has the same problem).

PS: The customer's own problem was eventually solved as well: it turned out to be a design flaw in C2 that caused large amounts of memory to be used, with no guarantee that the memory would be released.

Finding the thread that consumes large amounts of memory

Here we mainly share how the bug was discovered. First, we tracked the customer's process in real time. Whenever memory usage rose significantly, /proc/<pid>/smaps showed many 64 MB memory mappings whose Rss was almost entirely consumed:

7fd690000000-7fd693f23000 rw-p 00000000 00:00 0 
Size:              64652 kB
Rss:               64652 kB
Pss:               64652 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:     64652 kB
Referenced:        64652 kB
Anonymous:         64652 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me nr sd 
7fd693f23000-7fd694000000 ---p 00000000 00:00 0 
Size:                884 kB
Rss:                   0 kB
Pss:                   0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: mr mw me nr sd 

Next, we traced system calls with strace (for example, strace -f -e trace=mmap -p <pid>) and searched the output for the virtual address above, which turned up the relevant mmap call:

[pid    71] 13:34:41.982589 mmap(0x7fd690000000, 67108864, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7fd690000000 <0.000107>

The mmap was executed by thread 71 (0x47 in hex). Dumping the threads with jstack shows that the thread with nid=0x47 is in fact C2 CompilerThread0:

"C2 CompilerThread0" #39 daemon prio=9 os_prio=0 tid=0x00007fd8acebb000 nid=0x47 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

Finally, grepping the strace output for this thread shows that it allocated a great deal of memory, more than 2 GB in total.

The classic 64 MB problem

The 64 MB problem is a classic one. There is no logic in the JVM itself that allocates 64 MB blocks in large numbers, so JVM-specific allocation can be ruled out. These blocks actually come from glibc's malloc implementation: since version 2.10, glibc has provided an arena mechanism to make memory allocation more efficient, and on 64-bit systems each arena is 64 MB by default. The calculation is shown below; with sizeof(long) == 8, DEFAULT_MMAP_THRESHOLD_MAX is 4 * 1024 * 1024 * 8 = 32 MB, and HEAP_MAX_SIZE is twice that, 64 MB:

#define DEFAULT_MMAP_THRESHOLD_MAX (4 * 1024 * 1024 * sizeof(long))
#define HEAP_MAX_SIZE (2 * DEFAULT_MMAP_THRESHOLD_MAX)

p2 = (char *) MMAP (aligned_heap_area, HEAP_MAX_SIZE, PROT_NONE,
                          MAP_NORESERVE);

The maximum number of arenas a process can create is 8 * cores on 64-bit systems and 2 * cores on 32-bit systems:

#define NARENAS_FROM_NCORES(n) ((n) * (sizeof (long) == 4 ? 2 : 8))

 {
              int n = __get_nprocs ();

              if (n >= 1)
                narenas_limit = NARENAS_FROM_NCORES (n);
              else
                /* We have no information about the system.  Assume two
                   cores.  */
                narenas_limit = NARENAS_FROM_NCORES (2);
            }

The main advantage of this mechanism is in multithreaded environments: each core is left several 64 MB cache blocks, so threads can allocate memory more efficiently without fighting over a lock. For example, on an 8-core 64-bit machine a process can create up to 64 arenas, i.e. up to 4 GB of 64 MB reservations. Once the upper limit is reached, allocation falls back to the slower main_arena.
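
To see the mechanism in action, here is a small standalone C++ sketch (not from the original article; the thread count and loop sizes are arbitrary). Running it and then inspecting /proc/<pid>/maps should reveal several of the 64 MB reservations described above:

#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <thread>
#include <unistd.h>
#include <vector>

// Workers allocate concurrently; under contention glibc binds threads to
// their own arenas, each backed by a 64 MB (HEAP_MAX_SIZE) reservation.
static void worker() {
  for (int i = 0; i < 1000000; ++i) {
    void* p = malloc(4096);
    free(p);
  }
  // Keep the thread alive so its arena stays visible in /proc/<pid>/maps.
  std::this_thread::sleep_for(std::chrono::seconds(60));
}

int main() {
  std::printf("pid=%d -- inspect /proc/%d/maps for 64 MB regions\n",
              (int) getpid(), (int) getpid());
  std::vector<std::thread> threads;
  for (int i = 0; i < 8; ++i) threads.emplace_back(worker);
  for (auto& t : threads) t.join();
  return 0;
}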

The environment variable MALLOC_ARENA_MAX controls the number of these 64 MB blocks. When we set it to 1, the 64 MB blocks disappeared and everything was allocated from one large central area, main_arena, confirming that the parameter takes effect.
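
The same cap can also be set programmatically from C/C++ code via mallopt (a sketch; requires glibc 2.10 or later):

#include <malloc.h>

int main() {
  // Equivalent to running with MALLOC_ARENA_MAX=1: cap glibc at a single
  // arena, so all allocations are served from main_arena.
  mallopt(M_ARENA_MAX, 1);
  void* p = malloc(1024);
  free(p);
  return 0;
}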

An inadvertent discovery

While wondering why the C2 thread's memory consumption exceeded 2 GB, I happened to read through the C2 code and noticed that the following code can consume a large amount of memory. It lives in nmethod::metadata_do in nmethod.cpp. Note, however, that if this path were the culprit you would not see the C2 threads allocating heavily, but rather the VMThread, because this code is mainly executed by it.

void nmethod::metadata_do(void f(Metadata*)) {
  address low_boundary = verified_entry_point();
  if (is_not_entrant()) {
    low_boundary += NativeJump::instruction_size;
    // %%% Note:  On SPARC we patch only a 4-byte trap, not a full NativeJump.
    // (See comment above.)
  }
  {
    // Visit all immediate references that are embedded in the instruction stream.
    RelocIterator iter(this, low_boundary);
    while (iter.next()) {
      if (iter.type() == relocInfo::metadata_type ) {
        metadata_Relocation* r = iter.metadata_reloc();
        // In this metadata, we must only follow those metadatas directly embedded in
        // the code.  Other metadatas (oop_index>0) are seen as part of
        // the metadata section below.
        assert(1 == (r->metadata_is_immediate()) +
               (r->metadata_addr() >= metadata_begin() && r->metadata_addr() < metadata_end()),
               "metadata must be found in exactly one place");
        if (r->metadata_is_immediate() && r->metadata_value() != NULL) {
          Metadata* md = r->metadata_value();
          if (md != _method) f(md);
        }
      } else if (iter.type() == relocInfo::virtual_call_type) {
        // Check compiledIC holders associated with this nmethod
        CompiledIC *ic = CompiledIC_at(&iter);
        if (ic->is_icholder_call()) {
          CompiledICHolder* cichk = ic->cached_icholder();
          f(cichk->holder_metadata());
          f(cichk->holder_klass());
        } else {
          Metadata* ic_oop = ic->cached_metadata();
          if (ic_oop != NULL) {
            f(ic_oop);
          }
        }
      }
    }
  }
  // ... (remainder of the method omitted)
}

inline CompiledIC* CompiledIC_at(RelocIterator* reloc_iter) {
  assert(reloc_iter->type() == relocInfo::virtual_call_type ||
      reloc_iter->type() == relocInfo::opt_virtual_call_type, "wrong reloc. info");
  CompiledIC* c_ic = new CompiledIC(reloc_iter);
  c_ic->verify();
  return c_ic;
}

Note the call CompiledIC *ic = CompiledIC_at(&iter); above. Because CompiledIC is a ResourceObj, it is allocated in the thread's resource area, whose underlying memory blocks are obtained from the C heap (malloc) and tracked per thread. If a ResourceMark is declared somewhere, the current position in the resource area is recorded when execution reaches it; from then on, if the thread needs more memory, new blocks are malloc'ed and added to the area, otherwise existing memory is reused. When the ResourceMark's destructor runs, it restores the previous position, and subsequent allocations reuse the memory from that position onward. Note that the memory blocks mentioned here are not the same as the 64 MB blocks above.
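
As an illustration, this is roughly how a ResourceMark is used (HotSpot-internal pseudocode, not compilable outside the JVM source tree):

void example(Thread* thread) {
  ResourceMark rm(thread);   // record the current position in the thread's resource area
  // Resource-allocated memory is carved out of that area, not freed individually:
  char* buf = NEW_RESOURCE_ARRAY(char, 1024);
  memset(buf, 0, 1024);
}  // rm's destructor restores the position; later allocations reuse buf's memory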

Because this call sits inside the while loop with no ResourceMark in scope, it is executed many times over, and the memory allocated in one iteration cannot be reused in the next; instead, new memory keeps being allocated, so physical memory consumption can become very large, far exceeding -Xmx.

The fix is also very simple: add ResourceMark rm; right before the call CompiledIC *ic = CompiledIC_at(&iter);.
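
Applied to the loop above, the patched branch would look roughly like this (a sketch based on the description; the exact upstream patch may differ):

      } else if (iter.type() == relocInfo::virtual_call_type) {
        // The fix: scope a ResourceMark to this iteration so the memory
        // allocated by CompiledIC_at is reclaimed on each pass of the loop.
        ResourceMark rm;
        // Check compiledIC holders associated with this nmethod
        CompiledIC *ic = CompiledIC_at(&iter);
        if (ic->is_icholder_call()) {
          CompiledICHolder* cichk = ic->cached_icholder();
          f(cichk->holder_metadata());
          f(cichk->holder_klass());
        } else {
          Metadata* ic_oop = ic->cached_metadata();
          if (ic_oop != NULL) {
            f(ic_oop);
          }
        }
      }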

This problem mainly occurs under frequent, large-scale class retransformation or redefinition, so if your system runs an agent that does this, keep an eye out for it.
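
For context, the kind of workload that triggers it looks like the following JVMTI sketch (illustrative only; the function and loop are made up, and the agent must have the can_retransform_classes capability and a ClassFileLoadHook registered):

#include <jvmti.h>

// Illustrative only: repeatedly retransforming a class is the kind of
// workload that drives the VMThread through nmethod::metadata_do over and over.
static void retransform_in_loop(jvmtiEnv* jvmti, jclass klass, int times) {
  for (int i = 0; i < times; ++i) {
    jvmti->RetransformClasses(1, &klass);
  }
}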

After finding the problem, we submitted a patch to the community, only to discover that it had already been fixed in JDK 12, though in none of the earlier versions. Once the issue was reported, the community responded quickly, and the fix may land in OpenJDK 8u212.

Finally, a brief word about the customer's own problem: the main reason the C2 threads consumed so much memory was that some very large methods needed to be compiled, and compiling them requires a great deal of memory, which is why usage rose so suddenly. So a word of advice: do not write overly large methods; if such a method is also called frequently, the consequences can be truly painful.

Author: PerfMa
Reprinted from: a bug that causes high consumption of JVM physical memory - PerfMa community - OSCHINA
