Hardware
- Mellanox ConnectX-3 MT27500
- NVIDIA Tesla K40c
Problems
Mellanox OFED does not compile on ARM
Solution: use an Intel Atom (x86) host instead
Problems
NVIDIA driver sometimes crashes during boot (dmesg => “Oops”)
Solution: reboot until it comes up cleanly (the NVIDIA driver is closed source, so the crash itself cannot be fixed)
Problems
Mellanox driver “mlx4_core” is not bound to the Mellanox card (check with lspci -k | grep -A 2 Mellanox)
Solution: remove the device from the PCI bus and rescan so the driver re-binds
echo "1" > /sys/bus/pci/devices/0000\:03\:00.0/remove
echo "1" > /sys/bus/pci/rescan
Problems
GPUdirect transmission does not mark caches as dirty, so the data you read may be stale.
Solution:
Always copy the data back with cudaMemcpyDeviceToHost before inspecting it (see the sketch below).
Check status via script
#!/bin/bash
# Sanity checks after boot.

# Did the NVIDIA driver crash during boot?
OOPS=`dmesg | grep "Oops"`
if [ -n "$OOPS" ]; then
    echo "ERROR: nvidia driver crashed during boot."
fi

# Is mlx4_core bound to the InfiniBand card?
MLX=`lspci -k | grep "mlx4_core"`
if [ -z "$MLX" ]; then
    echo "ERROR: mlx4_core is not assigned to ib card."
fi

# Is the IP address configured?
IFCONFIG=`ifconfig | grep 192.168.11.72`
if [ -z "$IFCONFIG" ]; then
    echo "ERROR: IP is not set."
fi

# Is the InfiniBand hostname visible on the fabric?
IBHOSTS=`ibhosts | grep gpudirect`
if [ -z "$IBHOSTS" ]; then
    echo "ERROR: Infiniband hostname not set."
fi
NVIDIA interface
The closed-source NVIDIA driver exposes an API to pin GPU memory for access by third-party devices; a usage sketch follows the declaration.
// From nv-p2p.h
/*
 * @brief
 *   Make the pages underlying a range of GPU virtual memory
 *   accessible to a third-party device.
 *
 * ...
 */
int nvidia_p2p_get_pages(uint64_t p2p_token, uint32_t va_space,
                         uint64_t virtual_address,
                         uint64_t length,
                         struct nvidia_p2p_page_table **page_table,
                         void (*free_callback)(void *data),
                         void *data);
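A minimal sketch of how a kernel module might call this API; pin_gpu_range() and free_cb() are illustrative names, the 64 KiB length is arbitrary, and the zero token/va_space values reflect the non-tokenized usage assumed here:

// Sketch only: pin one 64 KiB range of GPU virtual memory for DMA.
#include "nv-p2p.h"

static void free_cb(void *data)
{
    /* Called by the NVIDIA driver if the GPU memory goes away
     * (e.g. cudaFree); drop all references to the pages here. */
}

static int pin_gpu_range(uint64_t gpu_va)
{
    struct nvidia_p2p_page_table *page_table = NULL;
    int ret;

    /* Assumption: 0 for p2p_token and va_space (non-tokenized use). */
    ret = nvidia_p2p_get_pages(0, 0, gpu_va, 64 * 1024,
                               &page_table, free_cb, NULL);
    if (ret)
        return ret;

    /* page_table now lists the pinned physical GPU pages; they can be
     * handed to a third-party device (here: the HCA) for DMA.
     * Release them later with nvidia_p2p_put_pages(). */
    return 0;
}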
Mellanox kernel module
The graphics card is registered as a peer memory client with the InfiniBand driver.
// From nv_peer_mem.c
static int __init nv_mem_client_init(void)
{
    strcpy(nv_mem_client.name, DRV_NAME);
    strcpy(nv_mem_client.version, DRV_VERSION);
    reg_handle = ib_register_peer_memory_client(&nv_mem_client,
                                                &mem_invalidate_callback);
    if (!reg_handle)
        return -EINVAL;

    return 0;
}
libibverbs InfiniBand library
Memory is registered from the userland program. If addr has been allocated with cudaMalloc(), libibverbs knows to use the previously registered peer memory client (see the sketch after the declaration).
// From verbs.h
/**
* ibv_reg_mr - Register a memory region
*/
struct ibv_mr *ibv_reg_mr(struct ibv_pd *pd, void *addr,
                          size_t length, int access);
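A minimal sketch of this from userspace; register_gpu_buffer() is an illustrative helper, the access flags are just an example, and a protection domain pd is assumed to exist already:

// Sketch: register a cudaMalloc'd buffer as an RDMA memory region.
#include <cuda_runtime.h>
#include <infiniband/verbs.h>
#include <stddef.h>

struct ibv_mr *register_gpu_buffer(struct ibv_pd *pd, size_t size)
{
    void *gpu_buf = NULL;

    if (cudaMalloc(&gpu_buf, size) != cudaSuccess)
        return NULL;

    /* With nv_peer_mem loaded, ibv_reg_mr() recognizes that this
     * virtual address belongs to the GPU and pins it through the
     * registered peer memory client instead of host page pinning. */
    return ibv_reg_mr(pd, gpu_buf, size,
                      IBV_ACCESS_LOCAL_WRITE |
                      IBV_ACCESS_REMOTE_WRITE |
                      IBV_ACCESS_REMOTE_READ);
}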
kiro rdma extended for GPUdirect
If the nv_peer_mem kernel module has registered the graphics card as a "peer memory client", it is sufficient to give ibv_reg_mr() a virtual address that maps to a physical address on the graphics card.
// From kiro-rdma.h
#ifdef GPUDIRECT
    void *mem_handle = mem;
    int error;

    if (!mem_handle) {
        error = cudaMalloc (&mem_handle, mem_size);
        ...
    }
    *mr = ibv_reg_mr (pd, mem_handle, mem_size, access);
GPUdirect Benchmark
Default case: receive data over InfiniBand and store it in main memory. Copy it to graphics card memory, run a simple CUDA math algorithm, and copy the data back into main memory for inspection.
Average throughput: 1.75 Gbit/s.
GPUdirect case: receive data over InfiniBand and store it directly in graphics card memory. Run a simple CUDA math algorithm, then copy the data into main memory for inspection.
Average throughput: 3.70 Gbit/s.
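A rough sketch of the two measured paths; recv_into() and run_kernel() are hypothetical stand-ins for the RDMA receive and the CUDA math step:

// Sketch: default path vs. GPUdirect path of the benchmark.
#include <cuda_runtime.h>
#include <stddef.h>

void recv_into(void *buf, size_t size);      /* RDMA receive (assumed)   */
void run_kernel(void *gpu_buf, size_t size); /* CUDA math step (assumed) */

void default_path(void *host_buf, void *gpu_buf, size_t size)
{
    recv_into(host_buf, size);               /* NIC -> main memory       */
    cudaMemcpy(gpu_buf, host_buf, size, cudaMemcpyHostToDevice);
    run_kernel(gpu_buf, size);
    cudaMemcpy(host_buf, gpu_buf, size, cudaMemcpyDeviceToHost);
}

void gpudirect_path(void *host_buf, void *gpu_buf, size_t size)
{
    recv_into(gpu_buf, size);                /* NIC -> GPU memory        */
    run_kernel(gpu_buf, size);
    cudaMemcpy(host_buf, gpu_buf, size, cudaMemcpyDeviceToHost);
}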
GPUdirect Benchmark
The Mellanox InfiniBand PCI card in the test system runs at 5 GT/s with 4 lanes. Since throughput is inversely proportional to transfer time, the speedup in this setup is:
$$S_p = \frac{T_1}{T_p} = \frac{1/1.75}{1/3.70} = \frac{3.70}{1.75} \approx 2.11$$
Using GPUdirect in this typical workflow makes the transfer about 111% faster.