Merged
Fixes an issue where the device compute capability is larger than the supported maximum of the CUDA runtime used to build ArrayFire. This happens, for example, when you run a Turing card with a CUDA runtime of 9.0. The compute capability of Turing is 7.5, while the maximum supported by that runtime is 7.0/7.2. Before this change we were only checking the major compute capability, not the minor version, when setting the max compute capability of the device. This caused errors like:

In file src/backend/cuda/compile_module.cpp:266
NVRTC Error(5): NVRTC_ERROR_INVALID_OPTION
Log: nvrtc: error: invalid value for --gpu-architecture (-arch)

This commit also updates the error messages for failure cases.

The utility header in cuda_fp16.hpp is not included automatically in CUDA 9. Additionally, we need to pass the --device-as-default-execution-space flag to nvrtc for JIT and non-JIT kernels.

* The moduleKey is a size_t, so the maximum number of digits it can have is 20; the format length for that value is updated accordingly.
* The runtime check messages are always logged (but not displayed). Errors are still thrown only in debug mode.
* Display the compute capability of the CUDA device along with its name and other stats, for example: Found device: Quadro T2000 (sm_75) (3.82 GB | ~3164.06 GFLOPs | 16 SMs)
9prady9
reviewed
Aug 15, 2020
9prady9 (Member) left a comment:
Really like the commit messages 👍
9prady9
approved these changes
Aug 15, 2020
Hi, I'm having a similar issue on this configuration: This is the error: I do not understand what I should do to fix this. Maybe I should upgrade to CUDA 11?
Member
@tvandera That is the expected outcome, given that the v3.7.3 installers aren't built with CUDA 11 support. Please check the latest 3.8 release, which has CUDA 11 support.
Fixes checkAndSet
Description
Fixes an issue where the device compute capability is larger than
the supported maximum of the CUDA runtime used to build ArrayFire.
This happens, for example, when you run a Turing card with a CUDA
runtime of 9.0. The compute capability of Turing is 7.5, while the
maximum supported by that runtime is 7.0/7.2. Before this change
we were only checking the major compute capability, not the minor
version, when setting the max compute capability of the device.
This caused errors like:

In file src/backend/cuda/compile_module.cpp:266
NVRTC Error(5): NVRTC_ERROR_INVALID_OPTION
Log: nvrtc: error: invalid value for --gpu-architecture (-arch)

This PR also updates the error messages for failure cases.
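The major+minor comparison described above can be sketched as follows. This is a minimal illustration, not ArrayFire's actual code; the function name and signature are hypothetical. The idea is to compare the (major, minor) pair lexicographically instead of the major version alone, clamping the device's compute capability to the runtime's maximum when it exceeds it.

```cpp
#include <cassert>
#include <utility>

// Hypothetical sketch (names are illustrative, not ArrayFire's symbols):
// clamp a device's compute capability to the maximum the CUDA runtime
// supports, comparing major AND minor versions.
static std::pair<int, int> clampComputeCapability(int devMajor, int devMinor,
                                                  int maxMajor, int maxMinor) {
    // Lexicographic comparison of (major, minor), not major alone.
    if (devMajor > maxMajor || (devMajor == maxMajor && devMinor > maxMinor)) {
        return {maxMajor, maxMinor};  // fall back to the runtime's maximum
    }
    return {devMajor, devMinor};  // device is fully supported as-is
}
```

With this check, a Turing device (7.5) built against a runtime whose maximum is 7.2 would be clamped to 7.2; the old major-only check saw 7 == 7 and left it at 7.5, producing the invalid --gpu-architecture error shown above.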
The utility header in cuda_fp16.hpp is not included automatically
in CUDA 9. Additionally, we need to pass the
--device-as-default-execution-space flag to nvrtc for JIT and
non-JIT kernels.
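Together with the clamped architecture, the fix above amounts to assembling an NVRTC option list like the one sketched below. The helper function is hypothetical (it is not ArrayFire's code); it only shows the two flags this PR is concerned with, built as strings so the sketch runs without a CUDA toolkit.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative sketch: build the NVRTC option list from a clamped
// compute capability. In real code these strings would be passed to
// nvrtcCompileProgram().
static std::vector<std::string> buildNvrtcOptions(int major, int minor) {
    std::vector<std::string> opts;
    // Architecture flag uses the clamped (major, minor) pair.
    opts.push_back("--gpu-architecture=compute_" + std::to_string(major) +
                   std::to_string(minor));
    // Required for JIT and non-JIT kernels, per the commit message above.
    opts.push_back("--device-as-default-execution-space");
    return opts;
}
```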
The moduleKey is a size_t, so the maximum number of digits it can
have is 20; the format length for that value is updated accordingly.
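The 20-digit bound comes from the range of a 64-bit size_t: its maximum value, 18446744073709551615, has exactly 20 decimal digits. A one-line check (hypothetical helper, just to demonstrate the arithmetic):

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// 18446744073709551615 (UINT64_MAX) is the largest value a 64-bit
// size_t can hold; counting its decimal digits gives the required
// format width of 20.
static std::size_t maxSizeTDigits() {
    return std::to_string(UINT64_MAX).size();
}
```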
The runtime check messages are always logged (but not displayed).
Errors are still thrown only in debug mode.
Display the compute capability of the CUDA device along with
its name and other stats, for example:

Found device: Quadro T2000 (sm_75) (3.82 GB | ~3164.06 GFLOPs | 16 SMs)
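A banner line of that shape can be produced with a format string like the following. This is an illustrative sketch, not ArrayFire's actual formatting code; the function name and parameter choices are assumptions. The key change the PR describes is the added `sm_MM` field next to the device name.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: format a device banner with the compute
// capability shown as "sm_<major><minor>" next to the device name.
static std::string deviceBanner(const char* name, int major, int minor,
                                double gb, double gflops, int sms) {
    char buf[128];
    std::snprintf(buf, sizeof(buf),
                  "Found device: %s (sm_%d%d) (%.2f GB | ~%.2f GFLOPs | %d SMs)",
                  name, major, minor, gb, gflops, sms);
    return std::string(buf);
}
```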
Changes to Users
Better error messages and better support for newer devices with older CUDA toolkits.
Checklist
- [ ] Functions added to unified API
- [ ] Functions documented