OPT: Optimize indexing using dynamic thread block sizes. #3111
Merged
9prady9 merged 1 commit into arrayfire:master on Mar 23, 2021
Member
What is the speedup range?
Member
Author
Master: This PR: 3.8x faster
This optimization dynamically sets the block size based on the output array dimension. Originally we had a block size of 32x8 threads per block. This configuration was not ideal when indexing into a long array where you had few columns and many rows. The current approach creates blocks of 256x1, 128x2, 64x4 and 32x8 to better accommodate smaller dimensions.
9prady9 approved these changes on Mar 23, 2021
Description
This optimization dynamically sets the block size based on the output array
dimension. Originally we had a block size of 32x8 threads per block. This
configuration was not ideal when indexing into a long array where you
had few columns and many rows. The current approach creates blocks of
256x1, 128x2, 64x4 and 32x8 to better accommodate smaller dimensions.
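As a rough illustration of the idea described above, the following host-side sketch picks one of the four block shapes from the size of the output's second dimension. It is not the code from this PR: the names `BlockShape` and `chooseBlockShape` are hypothetical, and the selection rule (round the column count up to a power of two, capped at 8) is an assumption about how such a choice could be made. The only facts taken from the description are the four shapes and the 256-thread total they share.

```cpp
// Hypothetical sketch, not ArrayFire's actual implementation.
struct BlockShape {
    unsigned x;  // threads along the first (row) dimension
    unsigned y;  // threads along the second (column) dimension
};

// Assumed rule: use the smallest power-of-two y extent (1, 2, 4 or 8) that
// covers the number of output columns, and give the remaining threads to x
// so that x * y is always 256.
BlockShape chooseBlockShape(unsigned long long outCols) {
    const unsigned threadsPerBlock = 256;
    const unsigned maxY            = 8;  // 32x8 was the original fixed shape

    unsigned y = 1;
    while (y < maxY && y < outCols) { y *= 2; }

    return BlockShape{threadsPerBlock / y, y};
}
```

Under this assumed rule, an output with a single column gets a 256x1 block, so no threads sit idle in the y direction, while an output with eight or more columns falls back to the original 32x8 shape.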
Changes to Users
None
Checklist
[ ] Functions added to unified API
[ ] Functions documented