The “P” in HIP literally stands for portability – HIP’s full and formal name is the “Heterogeneous-computing Interface for Portability”. However, even in a portable world you still may find the occasional need to specialize compile steps or code for the target platform – for example, to access functionality only available on one platform, or to tune the core sections of an algorithm in a platform-specific way. This post discusses how to specialize these core pieces of code while still retaining the portability benefits provided by HIP.
Readers should have a working HIP installation and a compiler (HCC or NVCC) as covered in previous posts.
First we’ll look at how to detect the platform and use this to provide specialized compiler options. Here’s a simple example from the Makefile.
HIP_PLATFORM=$(shell hipconfig --platform)
ifeq (${HIP_PLATFORM}, nvcc)
HIPCC_FLAGS += -gencode=arch=compute_20,code=sm_20
endif
ifeq (${HIP_PLATFORM}, hcc)
# Can add HCC-specific flags here:
HIPCC_FLAGS +=
endif
$(EXE): transpose.cpp
	$(HIPCC) $(HIPCC_FLAGS) $< -o $@
hipconfig is an executable program that lives in the hip/bin directory and should be on your path after correctly setting up HIP. It returns configuration information about HIP, such as the HIP_PATH setting, compiler options for standard compilers, and the compiler name. The first line shown above calls hipconfig to extract the name of the platform; it returns either “nvcc” or “hcc”. The Makefile then uses this to set HIPCC_FLAGS to platform-specific options. hipcc passes all arguments on to the underlying compiler (merging in the options set by hipcc), so on the nvcc platform the “gencode=…” options are effectively passed only to nvcc. We could use the same technique to add hcc-specific compiler options as well (none are required in this example).
If we set the HIPCC_VERBOSE environment variable, hipcc will show us the command line for the underlying platform. Here’s the above make run on nvcc – note that the “-gencode…” options from the Makefile are passed to the nvcc compilation step (near the end):
FPTITAN1:~/bit_extract$ HIPCC_VERBOSE=1 make
hipcc -gencode=arch=compute_20,code=sm_20 bit_extract.cpp -o bit_extract
hipcc-cmd: /usr/local/cuda/bin/nvcc -I/usr/local/cuda/include -I/home/fpadmin/ben/hip2/include
-x cu -gencode=arch=compute_20,code=sm_20 bit_extract.cpp -o bit_extract
CUDA® code will sometimes test the “compute capability” to determine if the device supports a given feature (for example, double-precision floating point or cross-lane “shuffle” instructions). AMD hardware has a different mapping of features to architecture, and thus a comparison against an aggregated compute capability revision number is insufficient to tell if the device supports a given feature. Instead, HIP provides feature query defines (for use inside device code) and property bits (for use in host code):
Inside device code, the __HIP_ARCH* family of defines is set to 1 if the feature is supported on the target architecture, or 0 if not. These should be used to replace checks against specific values of the __CUDA_ARCH__ define. For example:
__global__ void
myKernel (hipLaunchParm lp, …)
{
// #if __CUDA_ARCH__ >= 300 /* non-portable */
#if __HIP_ARCH_HAS_WARP_SHUFFLE__ /* portable hip query feature */
// use cool __shfl* instructions
int l = __shfl(x, laneId+1);
#else
// Implement another way (perhaps using shared memory)
#endif
}
Note the __HIP_ARCH feature flags are always defined with a value of 0 or 1 – so proper code should check the value, not merely whether the flag is defined. And, like __CUDA_ARCH__, the __HIP_ARCH flags always have a value of 0 in host code when hipcc is run. In host code, the hipDeviceProp_t structure returned by hipGetDeviceProperties contains architecture feature bits that describe the capabilities of the current device. For example:
hipDeviceProp_t deviceProp;
hipGetDeviceProperties(&deviceProp, device);
//if ((deviceProp.major == 1 && deviceProp.minor < 2)) // non-portable
if (deviceProp.arch.hasSharedInt32Atomics) { // portable hip feature query
// has shared int32 atomic operations …
}
The full set of feature capabilities (defines and feature bits) is described in the HIP Porting Guide. Also, you can use the hipInfo tool included in the samples directory to print device properties, including the architectural feature flags.
Now we’ll look at detecting the HIP platform inside the source code and controlling code generation appropriately. This is handy when the application needs features that are supported by only one platform. A good example is the CUDA texture APIs, which are supported by NVCC but not (yet) by HCC.
The __HIP_PLATFORM_NVCC__ macro is defined when the compiler is targeting NVCC, and the __HIP_PLATFORM_HCC__ macro is defined when the compiler is targeting HCC. Exactly one of these macros is defined. The macros are defined for standard compilers (i.e. g++) as well as accelerator compilers (hcc or nvcc), so you can safely use them in header files. Here’s some example pseudo-code:
#ifdef __HIP_PLATFORM_NVCC__
#define USE_TEXTURES 1
#else
#define USE_TEXTURES 0
#endif
#if USE_TEXTURES
texture<float, 1, cudaReadModeElementType> t_features;
#endif
__global__ void MyKernel(hipLaunchParm lp, float *d_features /* pass a pointer parameter, if not already available */ …)
{
// …
#if USE_TEXTURES
float tval = tex1Dfetch(t_features,addr);
#else
float tval = d_features[addr];
#endif
}
__host__ void myFunc ()
{
// …
hipMalloc(&d_features, N*sizeof(float));
#if USE_TEXTURES
cudaChannelFormatDesc chDesc0 = cudaCreateChannelDesc<float>();
t_features.filterMode = cudaFilterModePoint;
t_features.normalized = false;
t_features.channelDesc = chDesc0;
cudaBindTexture(NULL, &t_features, d_features, &chDesc0, N*sizeof(float));
#endif
hipLaunchKernel(MyKernel, dim3(grid), dim3(blocks), 0, 0, d_features, …);
}
The code guards the texture code with #ifdef checks against __HIP_PLATFORM_NVCC__ (setting USE_TEXTURES only on the NVCC platform). If textures are not supported, the code provides an alternate implementation which passes the data used by the texture (d_features) to the kernel as a kernel parameter, and then accesses this data with a regular load instruction rather than the “tex1Dfetch” texture load. Applications which use textures often already contain an alternate implementation like the one shown here, so they can experiment with the performance of the texture code on different architectures. More generally, these #ifdef checks provide a powerful mechanism to access unique features of a platform which are outside the boundaries provided by HIP, or to apply platform-specific tuning inside host or kernel code.
We looked at techniques to pass compiler options based on the target platform, to detect architecture features in a portable way, and to compile code conditionally based on the platform. These techniques are useful for introducing small pockets of platform-specific code inside a larger portable HIP application.
So I find this very interesting and would like to use it, however I can’t find the documentation anywhere. Could someone please direct me to it?
Hi Ethan – Docs are on the GitHub site, there is a mini table-of-contents near the top of README.md.
https://github.com/GPUOpen-ProfessionalCompute-Tools/HIP/blob/master/README.md
Ahh, sorry – not quite sure how I missed that. Thanks!
Something that you may wish to know is that the “HIP Porting Guide” link is broken – it’s a 404.
Thanks! We fixed the link.