Merged
Fixes an issue where the device compute capability is larger than the supported maximum of the CUDA runtime used to build ArrayFire. This happens, for example, when you run a Turing card with a CUDA runtime of 9.0. The compute capability of Turing is 7.5, while the maximum supported by that runtime is 7.0/7.2. Before this change we were only checking the major compute capability, not the minor version, when setting the max compute capability of the device. This caused errors like:

In file src/backend/cuda/compile_module.cpp:266
NVRTC Error(5): NVRTC_ERROR_INVALID_OPTION
Log: nvrtc: error: invalid value for --gpu-architecture (-arch)

This commit also updates the error messages for failure cases.

The utility header in cuda_fp16.hpp is not included automatically in CUDA 9. Additionally, we need to pass the --device-as-default-execution-space flag to nvrtc for JIT and non-JIT kernels.

* The moduleKey is a size_t, so the maximum number of digits it can have is 20; the format length for that value is updated accordingly.
* The runtime check messages are always logged (but not displayed). Errors are still thrown only in debug mode.
* Display the compute capability of the CUDA device along with its name and other stats, for example: Found device: Quadro T2000 (sm_75) (3.82 GB | ~3164.06 GFLOPs | 16 SMs)
9prady9
reviewed
Aug 15, 2020
9prady9 (Member) left a comment:
Really like the commit messages 👍
9prady9
approved these changes
Aug 15, 2020
Hi, I'm having a similar issue on this configuration: This is the error: I do not understand what I should do to fix this. Maybe I should upgrade to CUDA 11?
Member
@tvandera That is the expected outcome, given that the v3.7.3 installers aren't built with CUDA 11 support. Please check the latest 3.8 release, which has CUDA 11 support.
Fixes checkAndSet
Description
Fixes an issue where the device compute capability is larger than
the supported maximum of the CUDA runtime used to build ArrayFire.
This happens, for example, when you run a Turing card with a CUDA
runtime of 9.0. The compute capability of Turing is 7.5, while the
maximum supported by that runtime is 7.0/7.2. Before this change
we were only checking the major compute capability, not the minor
version, when setting the max compute capability of the device.
This caused errors like:

In file src/backend/cuda/compile_module.cpp:266
NVRTC Error(5): NVRTC_ERROR_INVALID_OPTION
Log: nvrtc: error: invalid value for --gpu-architecture (-arch)

This PR also updates the error messages for failure cases.
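The major+minor comparison described above can be sketched as follows. This is a minimal illustration, not ArrayFire's actual code; the function name and signature are hypothetical. The idea is to compare the (major, minor) pair lexicographically instead of the major version alone, clamping the device's compute capability to the runtime's maximum when it exceeds it.

```cpp
#include <cassert>
#include <utility>

// Hypothetical sketch (names are illustrative, not ArrayFire's symbols):
// clamp a device's compute capability to the maximum the CUDA runtime
// supports, comparing major AND minor versions.
static std::pair<int, int> clampComputeCapability(int devMajor, int devMinor,
                                                  int maxMajor, int maxMinor) {
    // Lexicographic comparison of (major, minor), not major alone.
    if (devMajor > maxMajor || (devMajor == maxMajor && devMinor > maxMinor)) {
        return {maxMajor, maxMinor};  // fall back to the runtime's maximum
    }
    return {devMajor, devMinor};  // device is fully supported as-is
}
```

With this check, a Turing device (7.5) built against a runtime whose maximum is 7.2 would be clamped to 7.2; the old major-only check saw 7 == 7 and left it at 7.5, producing the invalid --gpu-architecture error shown above.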
The utility header in cuda_fp16.hpp is not included automatically
in CUDA 9. Additionally, we need to pass the
--device-as-default-execution-space flag to nvrtc for JIT and
non-JIT kernels.
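Together with the clamped architecture, the fix above amounts to assembling an NVRTC option list like the one sketched below. The helper function is hypothetical (it is not ArrayFire's code); it only shows the two flags this PR is concerned with, built as strings so the sketch runs without a CUDA toolkit.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative sketch: build the NVRTC option list from a clamped
// compute capability. In real code these strings would be passed to
// nvrtcCompileProgram().
static std::vector<std::string> buildNvrtcOptions(int major, int minor) {
    std::vector<std::string> opts;
    // Architecture flag uses the clamped (major, minor) pair.
    opts.push_back("--gpu-architecture=compute_" + std::to_string(major) +
                   std::to_string(minor));
    // Required for JIT and non-JIT kernels, per the commit message above.
    opts.push_back("--device-as-default-execution-space");
    return opts;
}
```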
The moduleKey is a size_t, so the maximum number of digits it can
have is 20; the format length for that value is updated accordingly.
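The 20-digit bound comes from the range of a 64-bit size_t: its maximum value, 18446744073709551615, has exactly 20 decimal digits. A one-line check (hypothetical helper, just to demonstrate the arithmetic):

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// 18446744073709551615 (UINT64_MAX) is the largest value a 64-bit
// size_t can hold; counting its decimal digits gives the required
// format width of 20.
static std::size_t maxSizeTDigits() {
    return std::to_string(UINT64_MAX).size();
}
```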
The runtime check messages are always logged (but not displayed).
Errors are still thrown only in debug mode.
Display the compute capability of the CUDA device along with
its name and other stats, for example:

Found device: Quadro T2000 (sm_75) (3.82 GB | ~3164.06 GFLOPs | 16 SMs)
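A banner line of that shape can be produced with a format string like the following. This is an illustrative sketch, not ArrayFire's actual formatting code; the function name and parameter choices are assumptions. The key change the PR describes is the added `sm_MM` field next to the device name.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: format a device banner with the compute
// capability shown as "sm_<major><minor>" next to the device name.
static std::string deviceBanner(const char* name, int major, int minor,
                                double gb, double gflops, int sms) {
    char buf[128];
    std::snprintf(buf, sizeof(buf),
                  "Found device: %s (sm_%d%d) (%.2f GB | ~%.2f GFLOPs | %d SMs)",
                  name, major, minor, gb, gflops, sms);
    return std::string(buf);
}
```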
Changes to Users
Better error messages and better support for newer devices with older CUDA toolkits.
Checklist
- [ ] Functions added to unified API
- [ ] Functions documented