The NVRAM technology deployed in this project (3D XPoint) offers higher densities and lower energy usage than DRAM, making larger memory capacities feasible. Its access times, however, are still inferior to those of DRAM. It is therefore important for tools to offer insight into when and how an application uses NVRAM, so that developers can optimize their code to avoid undue access penalties, much as tools help developers optimize for CPU caches today. Conversely, when NVRAM serves as an I/O system instead of, or in addition to, a network filesystem, it is important to highlight when the NVRAM I/O system is being used, or, perhaps more pertinently, when it is not and extra I/O overhead is incurred.
Finding potential hotspots manually is difficult, as pointers to mapped DRAM and NVRAM memory are indistinguishable at first glance. We therefore extended the Vampir tool with separate low-overhead, sampling-based memory usage statistics for local and remote accesses to NVRAM and DRAM. The memory types are distinguished by their respective address spaces, which we identify by tracking allocation and mapping calls to the Persistent Memory Development Kit (PMDK) library. A post-processing step then generates monotonically increasing load and store counters for each address space and its corresponding mapped file.
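The core of this post-processing step can be sketched as follows. This is a simplified, illustrative model, not the actual Vampir implementation: the mapped address ranges are hard-coded hypothetical values standing in for ranges recorded from intercepted PMDK mapping calls (e.g. `pmem_map_file`), and the sample format is invented for the example.

```python
# Illustrative sketch: classify sampled memory accesses by address space
# and accumulate monotonically increasing load/store counters per space.
# The mapped ranges would come from tracked PMDK allocation and mapping
# calls; here they are hypothetical hard-coded values.

from bisect import bisect_right

# (start, end, label) for each tracked mapping, sorted by start address.
MAPPED_RANGES = [
    (0x7F0000000000, 0x7F0040000000, "DRAM heap"),
    (0x7F8000000000, 0x7F8100000000, "NVRAM /mnt/pmem/buffer"),
]

def classify(addr):
    """Return the label of the mapping containing addr, or None."""
    starts = [r[0] for r in MAPPED_RANGES]
    i = bisect_right(starts, addr) - 1
    if i >= 0 and addr < MAPPED_RANGES[i][1]:
        return MAPPED_RANGES[i][2]
    return None

def count_accesses(samples):
    """Turn (address, is_store) samples into per-address-space
    (loads, stores) counters; each counter only ever increases."""
    counters = {}
    for addr, is_store in samples:
        label = classify(addr)
        if label is None:
            continue  # access outside any tracked mapping
        loads, stores = counters.get(label, (0, 0))
        counters[label] = (loads + (0 if is_store else 1),
                          stores + (1 if is_store else 0))
    return counters
```

For example, two store samples inside the NVRAM range and one load sample inside the DRAM range would yield `{"NVRAM /mnt/pmem/buffer": (0, 2), "DRAM heap": (1, 0)}`.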
The figure below shows the performance timeline of a multi-threaded heat equation application. The compute thread of this 2D heat equation solver alternates between a primary buffer in DRAM and a secondary buffer in NVRAM; the latter is flushed every second step. The top-most dot graph shows store accesses to the buffer in NVRAM, whereas the dot graph below it visualizes load accesses. The call stack representation underneath depicts eight iterations of the heat equation solver, which correlate nicely with the access graphs above.
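The access pattern visible in the timeline can be illustrated with a minimal single-threaded sketch: a 1D Jacobi heat stencil alternating between a DRAM buffer and an NVRAM buffer, with the NVRAM copy flushed every second step. All names are illustrative; the real application is multi-threaded and 2D, and the flush would be an actual PMDK call such as `pmem_persist`, recorded here only symbolically.

```python
# Sketch of the alternating double-buffer pattern described above.
# Even steps store into the NVRAM buffer (and flush it); odd steps
# load from it and store into the DRAM buffer. Illustrative only.

def step(src, dst):
    """One Jacobi update with fixed boundary values."""
    for i in range(1, len(src) - 1):
        dst[i] = 0.5 * src[i] + 0.25 * (src[i - 1] + src[i + 1])

def solve(grid, steps):
    dram = list(grid)    # primary buffer in DRAM
    nvram = list(grid)   # secondary buffer, mapped from NVRAM
    flushes = []
    for s in range(steps):
        if s % 2 == 0:
            step(dram, nvram)   # stores hit the NVRAM address space
            flushes.append(s)   # stand-in for pmem_persist on nvram
        else:
            step(nvram, dram)   # loads hit the NVRAM address space
    return dram, nvram, flushes
```

Sampling the stores of such a loop yields exactly the bursts seen in the top dot graph: NVRAM store activity in every second iteration, interleaved with NVRAM load activity in the iterations between.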
Memory accesses to an address space in NVRAM as depicted with the Vampir tool