DMA

DMA cache

Motivation

DMA Receiving Mechanism
[dma_cache_1]
As the throughput of I/O devices grows rapidly, the memory data movement required by the DMA scheme has become a performance bottleneck for I/O operations.

Research Methodology

HMTT: A Hyper Memory Trace Tool
[dma_cache_2]
HMTT is a platform-independent, full-system memory trace monitoring system. It adopts a DIMM-snooping mechanism, which uses hardware boards plugged into DIMM slots to track the virtual memory reference trace of the full system (including the OS, VMMs, libraries, and applications). Furthermore, HMTT provides APIs that let users inject user-defined tags into the memory trace. To distinguish memory references issued by the DMA engine from those issued by the processor, we inserted calls to HMTT's APIs into the device drivers of the hard disk controller and the network interface card (NIC) on the Linux platform.
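As an illustration of how injected tags can separate DMA references from processor references, here is a minimal sketch. The record layout, the `DMA_BEGIN`/`DMA_END` tag names, and the rule that any access inside a tagged buffer counts as DMA are all assumptions made for this example; they are not HMTT's actual trace format or API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Record:
    kind: str      # "R", "W", or a hypothetical tag: "DMA_BEGIN" / "DMA_END"
    addr: int      # address of the access, or buffer base address for a tag
    size: int = 0  # buffer length in bytes; only used by DMA_BEGIN


def classify(trace: Iterable[Record]) -> List[Tuple[str, str, int]]:
    """Label each memory reference as issued by "DMA" or by the "CPU"."""
    active: Dict[int, int] = {}  # buffer base -> exclusive end address
    labeled = []
    for rec in trace:
        if rec.kind == "DMA_BEGIN":
            # The driver announces a buffer that the DMA engine will touch.
            active[rec.addr] = rec.addr + rec.size
        elif rec.kind == "DMA_END":
            # The transfer is complete; later accesses belong to the CPU.
            active.pop(rec.addr, None)
        else:
            in_dma = any(base <= rec.addr < end for base, end in active.items())
            labeled.append(("DMA" if in_dma else "CPU", rec.kind, rec.addr))
    return labeled


demo_trace = [
    Record("W", 0x1000),
    Record("DMA_BEGIN", 0x2000, 0x100),
    Record("W", 0x2040),   # falls inside the tagged buffer
    Record("R", 0x1000),   # outside the buffer, so attributed to the CPU
    Record("DMA_END", 0x2000),
    Record("R", 0x2040),   # after the transfer, the CPU consumes the data
]
labels = classify(demo_trace)
```

In this toy trace, only the write to `0x2040` between the two tags is attributed to the DMA engine; the later read of the same address is attributed to the processor.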

FPGA-based Trace-driven Emulation System
Combining the HMTT memory trace collection tool with the trace-driven, FPGA-based cache emulation system has significantly shortened our research cycle. For example, we can use HMTT to collect memory traces with user-defined tags from various real machines without slowing them down, analyze the traces offline to gain observations and insights, and evaluate new ideas on the FPGA-based emulation system. With this method, the whole cycle, from trace collection and offline analysis to emulation and result collection, can be reduced to only a few hours.
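The idea of trace-driven cache emulation can be sketched in software. The following is a minimal stand-in, not the FPGA design: it replays a list of addresses through a small set-associative cache with LRU replacement and counts hits and misses. The class name, the cache geometry, and the LRU policy are assumptions chosen for illustration.

```python
from collections import OrderedDict


class SetAssocCache:
    """Tiny set-associative cache model with LRU replacement."""

    def __init__(self, num_sets=4, ways=2, line_size=64):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # One ordered dict per set; the oldest entry is the LRU line.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Look up one address; return True on a hit, False on a miss."""
        line = addr // self.line_size
        cache_set = self.sets[line % self.num_sets]
        if line in cache_set:
            cache_set.move_to_end(line)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)  # evict the LRU line
        cache_set[line] = True
        return False


def replay(cache, addrs):
    """Drive the cache model with a recorded address trace."""
    for addr in addrs:
        cache.access(addr)
    return cache.hits, cache.misses


# One set with two ways: the fourth access evicts line 1, so the fifth misses.
hits, misses = replay(SetAssocCache(num_sets=1, ways=2), [0, 64, 0, 128, 64])
```

A real emulator would also model timing and coherence traffic; this sketch only shows how a recorded trace drives a cache model.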

The DMA Cache for Improving I/O Performance

DMA Cache Scheme
[dma_cache_3]
In this scheme, the DMA cache is a dedicated cache placed at the same level as the processor's last-level cache (LLC) in the memory hierarchy. A data path between the processor cache and the DMA cache is used for data exchange and coherence maintenance.
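The placement described above can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the paper's protocol: both caches are unbounded dictionaries, a DMA write simply invalidates any stale processor copy, and a processor read that misses its own cache is served from the DMA cache over the data path before falling back to memory. All names are hypothetical.

```python
class TwoCacheModel:
    """Toy model of a processor cache and a DMA cache at the same level."""

    def __init__(self):
        self.cpu_cache = {}    # line address -> data (stands in for the LLC)
        self.dma_cache = {}    # line address -> data (dedicated I/O cache)
        self.memory = {}       # backing store
        self.mem_accesses = 0  # number of reads that went to memory

    def dma_write(self, line, data):
        # Incoming I/O data lands in the DMA cache instead of memory.
        # Assumed coherence action: drop any stale copy in the CPU cache.
        self.cpu_cache.pop(line, None)
        self.dma_cache[line] = data

    def cpu_read(self, line):
        if line in self.cpu_cache:
            return self.cpu_cache[line]
        if line in self.dma_cache:
            # Served over the data path between the two caches.
            return self.dma_cache[line]
        # Miss in both caches: fetch from memory and fill the CPU cache.
        self.mem_accesses += 1
        data = self.memory.get(line, 0)
        self.cpu_cache[line] = data
        return data


model = TwoCacheModel()
model.memory[0x40] = "old"
first = model.cpu_read(0x40)      # miss in both caches, goes to memory
model.dma_write(0x40, "packet")   # device produces new data for the line
second = model.cpu_read(0x40)     # consumed from the DMA cache, no memory access
```

In this model, the second read returns the freshly written I/O data without touching memory, which is the produce-consume pattern the DMA cache is meant to exploit.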

Design Issues

  • Data Coherency

  • Data Migration between Processor Cache and DMA Cache

  • Replacement Policy and Write Policy

  • Prefetching

Evaluations

Normalized Speedup
[dma_cache_4]

Breakdown of Normalized Total Cycles
[dma_cache_5]

Cumulative Distribution of DMA Request Size
[dma_cache_6]

References

[Exploiting the Produce-Consume Relationship in DMA to Improve I/O Performance]

Dan Tang, Yungang Bao, Weiwu Hu, Mingyu Chen, DMA Cache: Using On-Chip Storage to Architecturally Separate I/O Data from CPU Data for Improving I/O Performance, the 16th IEEE International Symposium on High-Performance Computer Architecture (HPCA-16), Jan 2010

** Intel DDIO was announced several years after this paper was published.