Memory behavior analysis is one of the most important
technologies for architecture design, system software (e.g., OS,
compiler) optimization, and application performance
improvement. Moreover, multicore imposes higher demands to the
A complete trace includes all memory references made by
each component of the system, including all user-level processes
and the operating system kernel. User-level processes include
not only applications, but also OS server and daemon processes.
An ideal detailed trace is one that is annotated with
information beyond simple raw addresses. Useful annotations include
changes in VM page-table state for translating between physical
and virtual addresses, context switch points with identifiers
specifying newly-activated processes, and tags that mark each
address with a reference type (read, write, execute),
size (word, half word, byte) and a timestamp.
Traces should be undistorted so that they do not include
any additional memory references, or references that appear
out of order relative to the actual reference stream of
the workload had it not been monitored.
Portability, both in moving to other machines of the same type
and to machines that are architecturally different, is important.
An ideal trace collector should be fast,
inexpensive and easy to operate.
Many approaches, such as simulation, instrumentation, and hardware
snooping, can collect memory traces. However, they are usually
subject to time, accuracy, and capacity constraints.
Overview of the HMTT
We designed and implemented the Hybrid Memory Trace Toolkit (HMTT),
an approach that integrates hardware and software to
track and analyze physical or virtual memory traces of the OS kernel,
libraries, and applications in real systems.
The HMTT is nearly an ideal memory trace collector:
The HMTT is able to track the complete memory reference trace
of a real system, including applications, libraries, and the kernel.
It is also able to track memory traces at different levels of the memory hierarchy.
With the cache enabled, the HMTT tracks only the cache-filtered trace, which suits analysis of the L2/L3 cache,
memory controller, and memory system performance, and incurs no slowdown.
With the cache disabled, it can track the whole trace, from the L1 cache to DRAM,
with a slowdown factor of 10 to 100.
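The difference between the whole trace and the cache-filtered trace can be illustrated with a small software model. The sketch below is only an illustration, not part of the HMTT: it assumes a tiny fully associative LRU cache with 64-byte lines, ignores write-backs of dirty lines, and the names `cache_filter`, `num_lines`, and `line_size` are hypothetical.

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes per cache line (assumed value)

def cache_filter(addresses, num_lines=4, line_size=LINE_SIZE):
    """Return the line addresses that miss in a small fully associative LRU cache.

    Only these misses would be visible on the memory bus; hits are absorbed
    by the cache. Write-backs of dirty lines are not modeled.
    """
    cache = OrderedDict()  # line address -> None, ordered from LRU to MRU
    misses = []
    for addr in addresses:
        line = addr // line_size * line_size  # align down to the line start
        if line in cache:
            cache.move_to_end(line)           # hit: refresh recency only
        else:
            misses.append(line)               # miss: the fill reaches DRAM
            cache[line] = None
            if len(cache) > num_lines:
                cache.popitem(last=False)     # evict the least recently used line
    return misses

# Five references from the processor, three of which miss in the cache.
refs = [0x1000, 0x1008, 0x1040, 0x1000, 0x2000]
filtered = cache_filter(refs)
```

Here `refs` plays the role of the whole trace, and `filtered` is what a monitor on the memory bus would observe with the cache enabled.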
The trace collected by the HMTT includes the physical address, virtual address, read/write type,
timestamp, process pid, page_table changes, kernel entry/exit tags, etc.
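One way to picture such an annotated trace entry is as a small record type. The following is a minimal sketch, not the HMTT's actual on-disk format: the class name `TraceRecord`, the field names, and the 4 KiB page size are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum

PAGE_SHIFT = 12  # assumes 4 KiB pages

class Access(Enum):
    READ = "r"
    WRITE = "w"

@dataclass(frozen=True)
class TraceRecord:
    """One annotated memory reference (hypothetical layout)."""
    timestamp: int    # time of the reference, in arbitrary ticks
    pid: int          # process that issued the reference
    phys_addr: int    # physical address seen on the memory bus
    virt_addr: int    # virtual address recovered via page_table information
    access: Access    # read or write
    in_kernel: bool   # True between kernel entry and exit tags

def physical_page(record):
    """Return the physical page number of a record."""
    return record.phys_addr >> PAGE_SHIFT

rec = TraceRecord(
    timestamp=1000,
    pid=42,
    phys_addr=0x3F2A_1040,
    virt_addr=0x7FFD_0000_1040,
    access=Access.READ,
    in_kernel=False,
)
```

Keeping the record immutable (`frozen=True`) makes it safe to share between analysis passes that only read the trace.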
There are almost no additional references except those needed to synchronize
the HMTT with page_table changes, which introduce less
than 1% additional references and about 1% additional execution time.
The hardware board of the HMTT is plugged into a DIMM slot, which is commonly
available in contemporary computers. The software components currently work on Linux
and can be ported to other OSs easily.
There is no slowdown when collecting the cache-filtered trace.
The slowdown factor is about 10 to 100 when the cache is disabled in order to collect the whole
trace. However, it is still competitive with other approaches, such as simulation,
The HMTT is quite easy and cheap to implement.
Our hardware implementation costs less than $1000.
Easy to operate:
The HMTT provides several toolkits to auto-generate and auto-analyze memory reference
The trace size is quite large because we have not yet adopted any compression approach.
For most applications, the trace-generation rate is about 30 to 50 MB/s.
Moreover, if the cache is disabled, the trace grows even larger.
The HMTT therefore provides a toolkit to instrument code in specified functions or loops.
We suggest collecting traces only for the functions or loops of
interest, which reduces the trace size when the cache is disabled.
Moreover, the HMTT can listen to only one DIMM at a time
because the Chip Select (CS) signal is not shared. However, a
large-capacity memory chip can be used to work around this limitation.
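To put the trace-generation rate of roughly 30 to 50 MB/s in perspective, the arithmetic below estimates the uncompressed trace volume for a one-hour run. It is a back-of-the-envelope sketch: the helper name `trace_volume_gb` is hypothetical, and decimal units (1 GB = 1000 MB) are assumed.

```python
def trace_volume_gb(rate_mb_per_s, seconds):
    """Uncompressed trace volume in GB, assuming 1 GB = 1000 MB."""
    return rate_mb_per_s * seconds / 1000.0

ONE_HOUR = 3600  # seconds

low = trace_volume_gb(30, ONE_HOUR)   # lower end of the quoted rate
high = trace_volume_gb(50, ONE_HOUR)  # upper end of the quoted rate
print(f"one hour of tracing: {low:.0f} to {high:.0f} GB")
```

At these rates, an hour of tracing produces on the order of 100 to 200 GB, which is why restricting collection to selected functions or loops matters.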
This figure shows the comparison with other approaches:
Features of the HMTT
The HMTT provides memory reference traces. Moreover, it also provides online
and offline memory trace analysis, e.g.
Memory bandwidth statistics;
Page reuse distance calculation;
Hot pages collection;
Virtual or physical memory reference pattern of an individual process (including kernel);
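Two of the analyses listed above, hot pages collection and page reuse distance calculation, can be sketched in a few lines over a list of addresses. This is a simplified offline illustration and not the HMTT's own tooling: the function names are hypothetical, 4 KiB pages are assumed, and reuse distance is taken here to mean the number of distinct other pages touched between two references to the same page. The naive stack search is quadratic in the worst case; a tree-based structure would be needed for long traces.

```python
from collections import Counter

PAGE_SHIFT = 12  # assumes 4 KiB pages

def hot_pages(addresses, top_n=2):
    """Return the top_n most referenced pages as (page number, count) pairs."""
    counts = Counter(addr >> PAGE_SHIFT for addr in addresses)
    return counts.most_common(top_n)

def page_reuse_distances(addresses):
    """Return one reuse distance per reference, or None for a first touch."""
    stack = []       # pages ordered from least to most recently used
    distances = []
    for addr in addresses:
        page = addr >> PAGE_SHIFT
        if page in stack:
            idx = stack.index(page)
            # Pages above idx were touched since the last reference to page.
            distances.append(len(stack) - 1 - idx)
            stack.pop(idx)
        else:
            distances.append(None)
        stack.append(page)  # page becomes the most recently used
    return distances

# Page sequence: 1, 2, 1, 3, 2, 1
trace = [0x1000, 0x2000, 0x1004, 0x3000, 0x2008, 0x1010]
hot = hot_pages(trace)
dists = page_reuse_distances(trace)
```

For this short trace, page 1 is the hottest page, and the later references to pages 2 and 1 each have two other pages between them and their previous use.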