This example showcases launching kernels and printing from device programs.
- A kernel is launched: function `hello_world_kernel` is executed on the device. This function uses the coordinate built-ins to print a unique identifier from each thread.
- Synchronization is performed: the host program execution halts until all kernels on the device have finished executing.
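
The two steps above can be sketched as follows. This is a minimal sketch, not the example's actual source: the kernel body, the launch sizes (2 blocks of 4 threads), and the error handling are assumptions made for illustration. Only the kernel name, the launch syntax, `printf`, and `hipDeviceSynchronize` come from the text.

```cpp
#include <hip/hip_runtime.h>

#include <cstdio>
#include <cstdlib>

// Kernel name taken from the text; the body is an assumed illustration.
__global__ void hello_world_kernel()
{
    // Copy the coordinate built-ins into plain integers before passing them to printf.
    const unsigned int block_id  = blockIdx.x;
    const unsigned int thread_id = threadIdx.x;
    printf("Hello from block %u, thread %u\n", block_id, thread_id);
}

int main()
{
    // Assumed launch sizes: 2 blocks of 4 threads each.
    const dim3 grid_dim(2);
    const dim3 block_dim(4);

    // Step 1: launch the kernel with no dynamic shared memory on the default stream.
    hello_world_kernel<<<grid_dim, block_dim, 0, hipStreamDefault>>>();
    hipError_t status = hipGetLastError();

    // Step 2: halt the host until the device has finished all submitted work.
    if(status == hipSuccess)
    {
        status = hipDeviceSynchronize();
    }

    if(status != hipSuccess)
    {
        std::fprintf(stderr, "HIP error: %s\n", hipGetErrorString(status));
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
```

The order in which the threads' lines appear is not guaranteed, since the threads run concurrently.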
- `myKernelName<<<gridDim, blockDim, dynamicShared, stream>>>(kernelArguments)` launches a kernel. In other words, it calls a function marked with `__global__` to execute on the device. The execution configuration specifies the grid and block dimensions, the amount of additional shared memory to allocate, and the stream in which the kernel should execute. The kernel function may optionally take arguments as well.
- `hipDeviceSynchronize` synchronizes with the device, halting the host until all commands associated with the device have finished executing.
- Printing from device functions is performed using `printf`.
- Function-type qualifiers are used to indicate the type of a function.
  - `__global__` functions are executed on the device and called from the host.
  - `__device__` functions are executed on the device and called from the device only.
  - `__host__` functions are executed on the host and called from the host.
  - Functions marked with both `__device__` and `__host__` are compiled for both host and device. This means that these functions cannot contain any device- or host-specific code.
- Coordinate built-ins determine the coordinate of the active work item in the execution grid.
  - `threadIdx` is the 3D coordinate of the active work item in the block of threads.
  - `blockIdx` is the 3D coordinate of the active work item's block in the grid of blocks.
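
The function-type qualifiers and coordinate built-ins listed above can be illustrated together. The sketch below is an assumption-laden illustration, not part of the example: `HOST_DEVICE`, `flat_index`, `print_id`, `print_ids_kernel`, and `launch_print_ids` are hypothetical names, and the `__HIPCC__` guard is only there so that the shared function also builds with a plain C++ compiler.

```cpp
#include <cstdio>

#if defined(__HIPCC__)
    #include <hip/hip_runtime.h>
    // Hypothetical convenience macro: compile the function for host and device.
    #define HOST_DEVICE __host__ __device__
#else
    // Assumption for illustration: a plain C++ compiler sees an ordinary function.
    #define HOST_DEVICE
#endif

// Compiled for both host and device, so it contains no device- or host-specific code.
HOST_DEVICE inline unsigned int
    flat_index(unsigned int block, unsigned int block_size, unsigned int thread)
{
    return block * block_size + thread;
}

#if defined(__HIPCC__)
// __device__: executed on the device and callable from device code only.
__device__ void print_id(unsigned int id)
{
    printf("work item %u\n", id);
}

// __global__: executed on the device and launched from the host.
__global__ void print_ids_kernel()
{
    // blockIdx locates the block in the grid, threadIdx the thread in its block.
    print_id(flat_index(blockIdx.x, blockDim.x, threadIdx.x));
}

// __host__: executed on the host and called from the host.
// Returns true when both the launch and the synchronization succeed.
__host__ bool launch_print_ids()
{
    print_ids_kernel<<<dim3(2), dim3(4), 0, hipStreamDefault>>>();
    return hipGetLastError() == hipSuccess && hipDeviceSynchronize() == hipSuccess;
}
#endif
```

Because `flat_index` is compiled for the host as well, host code can call it directly: `flat_index(2, 4, 3)` evaluates to `11`, the same value the thread with `threadIdx.x == 3` in block 2 of a 4-thread-wide launch would compute on the device.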
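
The example demonstrates the following HIP runtime call and device symbols:
- `hipDeviceSynchronize`
- `__device__`
- `__global__`
- `__host__`
- `threadIdx`
- `blockIdx`
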
Windows is currently not supported by the hello world example, due to a driver failure with printf from device code.