The “P” in HIP literally stands for portability – HIP’s full and formal name is the “Heterogeneous-computing Interface for Portability”. However, even in a portable world you still may find the occasional need to specialize compile steps or code for the target platform – for example, to access functionality only available on one platform, or to tune the core sections of an algorithm in a platform-specific way. This post discusses how to specialize these core pieces of code while still retaining the portability benefits provided by HIP.
Readers should have a working HIP installation and a compiler (HCC or NVCC) as covered in previous posts.
First we’ll look at how to detect the platform and use this to provide specialized compiler options. Here’s a simple example from the Makefile.
HIP_PLATFORM=$(shell hipconfig --platform)
ifeq (${HIP_PLATFORM}, nvcc)
HIPCC_FLAGS += -gencode=arch=compute_20,code=sm_20
endif
ifeq (${HIP_PLATFORM}, hcc)
# Can add HCC-specific flags here:
HIPCC_FLAGS +=
endif
$(EXE): transpose.cpp
	$(HIPCC) $(HIPCC_FLAGS) $< -o $@
hipconfig is an executable program that lives in the hip/bin directory and should be on your path after correctly setting up HIP. It returns configuration information about HIP, such as the HIP_PATH setting, compiler options for standard compilers, and the compiler name. The first line shown above calls hipconfig to extract the name of the platform; it returns either “nvcc” or “hcc”. The Makefile then uses this to set HIPCC_FLAGS to platform-specific options. hipcc passes all arguments on to the underlying compiler (merging in the options set by hipcc), so on the nvcc platform the “gencode=…” options are effectively passed only to nvcc. We could use the same technique to add hcc-specific compiler options as well (none are required in this example).
If we set the HIPCC_VERBOSE environment variable, hipcc will show us the command line for the underlying platform. Here’s the above make run on nvcc – note that the “-gencode…” options from the Makefile are passed to the nvcc compilation step (near the end):
FPTITAN1:~/bit_extract$ HIPCC_VERBOSE=1 make
hipcc -gencode=arch=compute_20,code=sm_20 bit_extract.cpp -o bit_extract
hipcc-cmd: /usr/local/cuda/bin/nvcc -I/usr/local/cuda/include -I/home/fpadmin/ben/hip2/include
-x cu -gencode=arch=compute_20,code=sm_20 bit_extract.cpp -o bit_extract
CUDA® code will sometimes test the “compute capability” to determine if the device supports a given feature (for example, double-precision floating point or cross-lane “shuffle” instructions). AMD hardware has a different mapping of features to architecture, and thus a comparison against an aggregated compute capability revision number is insufficient to tell if the device supports a given feature. Instead, HIP provides feature query defines (for use inside device code) and property bits (for use in host code):
Inside device code, the __HIP_ARCH* family of defines is set to 1 if the feature is supported on the target architecture, or 0 if not. These should be used to replace checks against specific values of the __CUDA_ARCH__ define. For example:
__global__ void
myKernel (hipLaunchParm lp, …)
{
// #if __CUDA_ARCH__ >= 300 /* non-portable */
#if __HIP_ARCH_HAS_WARP_SHUFFLE__ /* portable hip query feature */
// use cool __shfl* instructions
int l = __shfl(x, laneId+1);
#else
// Implement another way (perhaps using shared memory)
#endif
}
Note the __HIP_ARCH feature flags are always defined with a value of 0 or 1 – so proper code should check the value, not merely whether the flag is defined. And, like __CUDA_ARCH__, the __HIP_ARCH flags always have a value of 0 in host code when hipcc is run. In host code, the hipDeviceProp_t structure returned by hipGetDeviceProperties contains architecture feature bits that describe the capabilities of the current device. For example:
hipDeviceProp_t deviceProp;
hipGetDeviceProperties(&deviceProp, device);
//if ((deviceProp.major == 1 && deviceProp.minor < 2)) // non-portable
if (deviceProp.arch.hasSharedInt32Atomics) { // portable hip feature query
// has shared int32 atomic operations …
}
The full set of feature capabilities (defines and feature bits) is described in the HIP Porting Guide. Also, you can use the hipInfo tool included in the samples directory to print device properties, including the architectural feature flags.
Now we’ll look at detecting the HIP platform inside the source code and controlling code generation appropriately. This is handy when the application needs features that are supported by only one platform. A good example is the CUDA texture APIs, which are supported by NVCC but not (yet) by HCC.
The __HIP_PLATFORM_NVCC__ macro is defined when the compiler is targeting NVCC, and the __HIP_PLATFORM_HCC__ macro is defined when the compiler is targeting HCC. Exactly one of these macros is defined. The macros are defined for standard compilers (i.e. g++) as well as accelerator compilers (hcc or nvcc), so you can safely use them in header files. Here’s some example pseudo-code:
#ifdef __HIP_PLATFORM_NVCC__
#define USE_TEXTURES 1
#else
#define USE_TEXTURES 0
#endif
#if USE_TEXTURES
texture<float, 1, cudaReadModeElementType> t_features;
#endif
__global__ void MyKernel(hipLaunchParm lp, float *d_features /* pass a pointer parameter, if not already available */ …)
{
// …
#if USE_TEXTURES
float tval = tex1Dfetch(t_features,addr);
#else
float tval = d_features[addr];
#endif
}
__host__ void myFunc ()
{
// …
hipMalloc(&d_features, N*sizeof(float));
#if USE_TEXTURES
cudaChannelFormatDesc chDesc0 = cudaCreateChannelDesc<float>();
t_features.filterMode = cudaFilterModePoint;
t_features.normalized = false;
t_features.channelDesc = chDesc0;
cudaBindTexture(NULL, &t_features, d_features, &chDesc0, N*sizeof(float));
#endif
hipLaunchKernel(MyKernel, dim3(grid), dim3(blocks), 0, 0, d_features, …);
}
The code guards the texture code with #ifdef checks against __HIP_PLATFORM_NVCC__ (setting USE_TEXTURES only on the NVCC platform). If textures are not supported, the code provides an alternate implementation which passes the data used by the texture (d_features) to the kernel as a kernel parameter, and then accesses this data with a regular load instruction rather than the “tex1Dfetch” texture load. Applications which use textures often already contain an alternate implementation like the one shown here, so they can experiment with the performance of the texture code on different architectures. More generally, these #ifdef checks provide a powerful mechanism to access unique features of a platform which are outside the boundaries provided by HIP, or to apply platform-specific tuning inside host or kernel code.
We looked at techniques to pass compiler options based on the target platform, to detect architecture features in a portable way, and to compile code conditionally based on the platform. These techniques are useful for introducing small pockets of platform-specific code inside a larger portable HIP application.
So I find this very interesting and would like to use it, however I can’t find the documentation anywhere. Could someone please direct me to it?
Hi Ethan – Docs are on the GitHub site, there is a mini table-of-contents near the top of README.md.
https://github.com/GPUOpen-ProfessionalCompute-Tools/HIP/blob/master/README.md
Ahh, sorry – not quite sure how I missed that. Thanks!
Something that you may wish to know is that the “HIP Porting Guide” link is broken – it’s a 404.
Thanks! We fixed the link.