Memory Controller Design

Virtual Channel Mechanism

Insights

Interleaved Memory Accesses
[mc_vc_1]
Physical addresses issued by multiple processes arrive interleaved at the memory controller. Most optimization techniques proposed at the memory controller level see only this interleaved address stream, so they may not achieve the same improvements on multicore platforms: they cannot distinguish the address spaces of different VMMs or processes.
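
As a rough illustration of why interleaving matters, the sketch below builds a small, made-up trace in which two processes each access memory with a fixed stride. The process IDs, base addresses and strides are hypothetical values chosen for the example; they are not taken from the paper.

```python
from collections import defaultdict


def strides(addresses):
    """Return the differences between consecutive addresses."""
    return [b - a for a, b in zip(addresses, addresses[1:])]


def split_by_process(stream):
    """Group an interleaved (pid, address) stream into per-process address lists."""
    per_process = defaultdict(list)
    for pid, address in stream:
        per_process[pid].append(address)
    return dict(per_process)


# Hypothetical trace: process 1 walks forward in 64-byte steps,
# process 2 walks backward in 128-byte steps, and requests alternate.
stream = []
for i in range(4):
    stream.append((1, 0x1000 + 64 * i))
    stream.append((2, 0x9000 - 128 * i))

# What a controller sees if it only looks at the interleaved addresses.
interleaved = strides([address for _, address in stream])

# What it could see if requests were separated by process.
per_process = {pid: strides(addrs) for pid, addrs in split_by_process(stream).items()}

print(interleaved)   # large, irregular jumps between the two regions
print(per_process)   # one constant stride per process
```

In the interleaved view every stride is different, while each per-process view has a single constant stride. A stride-based optimization would find nothing to exploit in the first view and a simple pattern in the second.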

Information Flow in Memory Hierarchy
[mc_vc_2]
In a conventional memory hierarchy, less information about a request is available at each successive level. After address translation, only the physical address survives.

Virtual Channel

Reverse-TLB (RTLB)
[mc_vc_3]
A Reverse-TLB (RTLB) is introduced to support the virtual channel mechanism in the memory controller without requiring modifications to the caches or to program code.
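
The text here does not describe the RTLB's internal organization, so the following is only a minimal sketch under stated assumptions: that an RTLB entry maps a physical frame number back to a (process ID, virtual page number) pair, that pages are 4 KB, and that requests missing in the RTLB fall back to a shared channel. The names ReverseTLB, update, lookup and route_to_channels are hypothetical.

```python
PAGE_SHIFT = 12  # assumption: 4 KB pages


class ReverseTLB:
    """Toy reverse TLB: physical frame number -> (pid, virtual page number)."""

    def __init__(self):
        self._entries = {}

    def update(self, pid, virtual_page, physical_frame):
        """Record a reverse mapping (when this is triggered is an assumption)."""
        self._entries[physical_frame] = (pid, virtual_page)

    def lookup(self, physical_address):
        """Return (pid, virtual_address) for a physical address, or None on a miss."""
        entry = self._entries.get(physical_address >> PAGE_SHIFT)
        if entry is None:
            return None
        pid, virtual_page = entry
        offset = physical_address & ((1 << PAGE_SHIFT) - 1)
        return pid, (virtual_page << PAGE_SHIFT) | offset


def route_to_channels(rtlb, physical_addresses):
    """Sort requests into per-process virtual channels keyed by pid."""
    channels = {}
    for pa in physical_addresses:
        hit = rtlb.lookup(pa)
        # Assumption: requests that miss in the RTLB share one channel (key None).
        key, address = hit if hit is not None else (None, pa)
        channels.setdefault(key, []).append(address)
    return channels


rtlb = ReverseTLB()
rtlb.update(pid=1, virtual_page=0x10, physical_frame=0x80)
rtlb.update(pid=2, virtual_page=0x10, physical_frame=0x33)

channels = route_to_channels(rtlb, [0x80040, 0x33008, 0x80080, 0x99000])
print(channels)
```

In this toy run, the two requests to frame 0x80 end up in process 1's channel with their virtual addresses restored, the request to frame 0x33 goes to process 2's channel, and the unmapped request stays in the shared channel with its physical address.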

Memory Controller Architecture
[mc_vc_4]

RTLB Update
[mc_vc_5]

Experimental Results

Experimental Data Source
[mc_vc_6]

Virtual Channel helps explore more spatial regularity
[mc_vc_7]

Reference

[A Virtual Channel Mechanism For Memory Controller Design in Multicore Era]


Network Memory Architecture Model

Motivation

The research was motivated by three questions: Is remote memory feasible if the remote access delay can be reduced to a sufficiently low level? How does the remote access delay influence application performance? How can caching and prefetching be used to improve performance?

Network Memory Architecture and Performance Model

Remote memory architecture
[mc_nm_1]
The Smart Memory Controller (SMC) can access remote memory directly in hardware. The Memory Access Monitor Engine (MAME) monitors all memory accesses and provides hints to the Prefetching Engine (PE), which performs the prefetching. Local memory is used only as a local page buffer cache and a local prefetching buffer. Cached and prefetched pages are all indexed by virtual page address, and the cache line size is also 4 KB.
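
To make the role of the local page buffer concrete, here is a small simulation sketch. It assumes LRU replacement and a naive "prefetch the next page" policy standing in for the MAME-to-PE hint; neither is specified in the text above, and the class name LocalPageCache is hypothetical. Only demand misses are counted as remote accesses, so prefetch traffic is not included in the rate.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # 4 KB cache line, as stated in the text


class LocalPageCache:
    """Toy local page buffer indexed by virtual page number (LRU is an assumption)."""

    def __init__(self, capacity_pages, prefetch_next=False):
        self.capacity = capacity_pages
        self.prefetch_next = prefetch_next
        self.pages = OrderedDict()  # virtual page number -> None, oldest first
        self.accesses = 0
        self.remote_accesses = 0    # demand misses served from remote memory

    def _install(self, vpn):
        """Insert or refresh a page, evicting least recently used pages if full."""
        self.pages[vpn] = None
        self.pages.move_to_end(vpn)
        while len(self.pages) > self.capacity:
            self.pages.popitem(last=False)

    def access(self, virtual_address):
        vpn = virtual_address // PAGE_SIZE
        self.accesses += 1
        if vpn in self.pages:
            self.pages.move_to_end(vpn)
        else:
            self.remote_accesses += 1
            self._install(vpn)
        if self.prefetch_next:
            # Stand-in for a prefetch hint: bring in the following page.
            self._install(vpn + 1)

    def remote_access_rate(self):
        return self.remote_accesses / self.accesses if self.accesses else 0.0


# Hypothetical workload: a sequential scan over 8 pages, 4 accesses per page.
trace = range(0, 8 * PAGE_SIZE, 1024)

no_prefetch = LocalPageCache(capacity_pages=4)
with_prefetch = LocalPageCache(capacity_pages=4, prefetch_next=True)
for address in trace:
    no_prefetch.access(address)
    with_prefetch.access(address)

print(no_prefetch.remote_access_rate(), with_prefetch.remote_access_rate())
```

For this purely sequential trace, the cache without prefetching misses once per page, while the prefetching variant misses only on the first page. Real workloads are less regular, and a full comparison would also have to account for the communication cost of the prefetches themselves.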

Page Frame Stream Prefetching

Linpack and Quicksort memory page accesses
[mc_nm_2]
The memory trace collection tool HMTT monitors the DIMM slot to capture memory access traces. Each trace record contains the following fields: process ID (16 bits), address (physical or virtual, at cache-line granularity, 16 bits), a read/write bit, and duration (nanosecond resolution, 17 bits).
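
The field widths listed above add up to 50 bits per record. The sketch below packs and unpacks such a record as a single integer. The field order (process ID in the most significant bits, duration in the least significant) and the function names are assumptions made for illustration; the actual HMTT on-disk layout is not given here.

```python
PID_BITS, ADDR_BITS, RW_BITS, DURATION_BITS = 16, 16, 1, 17
RECORD_BITS = PID_BITS + ADDR_BITS + RW_BITS + DURATION_BITS  # 50 bits in total


def pack_record(pid, address, is_write, duration_ns):
    """Pack one trace record into an integer (field order is an assumption)."""
    for value, bits in ((pid, PID_BITS), (address, ADDR_BITS), (duration_ns, DURATION_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"value {value} does not fit in {bits} bits")
    word = pid
    word = (word << ADDR_BITS) | address
    word = (word << RW_BITS) | int(bool(is_write))
    word = (word << DURATION_BITS) | duration_ns
    return word


def unpack_record(word):
    """Inverse of pack_record: return (pid, address, is_write, duration_ns)."""
    duration_ns = word & ((1 << DURATION_BITS) - 1)
    word >>= DURATION_BITS
    is_write = bool(word & 1)
    word >>= RW_BITS
    address = word & ((1 << ADDR_BITS) - 1)
    pid = word >> ADDR_BITS
    return pid, address, is_write, duration_ns


record = pack_record(pid=42, address=0x1A2B, is_write=True, duration_ns=1500)
print(unpack_record(record))
```

A 17-bit duration field at nanosecond resolution can represent intervals up to 131071 ns, so longer gaps between accesses would need some additional convention that this sketch does not model.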

Remote access rate comparison with a 4 MB cache and a 5 µs remote delay
[mc_nm_3]

Communication cost comparison
[mc_nm_4]

Reference

[A Network Memory Architecture Model and Performance Analysis]