Move lookup tables in CUDA backend to texture memory to reduce global constant memory usage #2791
Merged
9prady9 merged 2 commits into arrayfire:master on Mar 14, 2020
umar456
reviewed
Mar 13, 2020
class LookupTable1D {
   public:
    LookupTable1D() = delete;
    LookupTable1D(const LookupTable1D& arg) = delete;
Member
It makes sense to have a copy constructor for this, right? You can copy a texture if you need to.
Member
Author
Then we would also have to take care of how the texture object is copied; copying just the handle won't work. A move operation won't have that issue, though. We don't need either for these use cases.
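To illustrate the ownership concern raised in this reply, here is a minimal sketch of a move-only wrapper around a handle-like resource. It is not the PR's `LookupTable1D`: `FakeTexture`, `createTexture`, `destroyTexture`, `MoveOnlyLUT`, and `liveTextures` are hypothetical stand-ins that only count creations and destructions, so the sketch runs without a GPU. A real wrapper would presumably call the CUDA runtime's texture object create/destroy functions instead.

```cpp
#include <utility>

// Hypothetical stand-in for a driver-managed handle such as a texture object.
// Zero is treated as "no texture" in this sketch.
using FakeTexture = unsigned long long;

static int g_live = 0;          // textures created but not yet destroyed
static FakeTexture g_next = 1;  // next handle value to hand out

FakeTexture createTexture() {
    ++g_live;
    return g_next++;
}

void destroyTexture(FakeTexture) { --g_live; }

int liveTextures() { return g_live; }

// Owns exactly one handle. Copying is deleted because two owners of the same
// handle would each destroy it; moving transfers ownership instead.
class MoveOnlyLUT {
   public:
    MoveOnlyLUT() : tex_(createTexture()) {}

    MoveOnlyLUT(const MoveOnlyLUT&) = delete;
    MoveOnlyLUT& operator=(const MoveOnlyLUT&) = delete;

    // Steal the handle and leave the source empty so its destructor is a no-op.
    MoveOnlyLUT(MoveOnlyLUT&& other) noexcept
        : tex_(std::exchange(other.tex_, 0)) {}

    MoveOnlyLUT& operator=(MoveOnlyLUT&& other) noexcept {
        if (this != &other) {
            release();
            tex_ = std::exchange(other.tex_, 0);
        }
        return *this;
    }

    ~MoveOnlyLUT() { release(); }

    FakeTexture get() const { return tex_; }

   private:
    void release() {
        if (tex_ != 0) {
            destroyTexture(tex_);
            tex_ = 0;
        }
    }

    FakeTexture tex_;
};
```

With this shape, `MoveOnlyLUT b(std::move(a));` leaves exactly one live texture, and it is destroyed once when `b` goes out of scope. A correct copy constructor would instead have to create a second texture object (and decide whether the underlying data is shared or duplicated), which is the extra work the reply refers to.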
added 2 commits
March 13, 2020 23:30
cuda::kernel::locate_features is the CUDA kernel that uses the FAST lookup table. Shown below is the performance of the kernel using constant memory versus texture memory. There is negligible to no difference between the two versions. Hence, the LUT was moved to texture memory to reduce global constant memory usage.

Performance using constant memory LUT
-------------------------------------
Time(%)  Time      Calls  Avg       Min       Max       Name
1.48%    101.09us  3      33.696us  32.385us  34.976us  void cuda::kernel::locate_features<float, int=9>
1.34%    91.713us  2      45.856us  45.792us  45.921us  void cuda::kernel::locate_features<double, int=9>
1.02%    69.505us  2      34.752us  34.400us  35.105us  void cuda::kernel::locate_features<unsigned int, int=9>
0.99%    67.456us  2      33.728us  32.768us  34.688us  void cuda::kernel::locate_features<int, int=9>
0.95%    65.186us  2      32.593us  31.201us  33.985us  void cuda::kernel::locate_features<short, int=9>
0.93%    63.874us  2      31.937us  30.817us  33.057us  void cuda::kernel::locate_features<unsigned short, int=9>

Performance using texture LUT
-----------------------------
Time(%)  Time      Calls  Avg       Min       Max       Name
1.45%    99.776us  3      33.258us  32.896us  33.504us  void cuda::kernel::locate_features<float, int=9>
1.33%    91.105us  2      45.552us  44.961us  46.144us  void cuda::kernel::locate_features<double, int=9>
1.02%    70.017us  2      35.008us  34.273us  35.744us  void cuda::kernel::locate_features<unsigned int, int=9>
0.97%    66.689us  2      33.344us  32.065us  34.624us  void cuda::kernel::locate_features<int, int=9>
0.95%    65.249us  2      32.624us  31.585us  33.664us  void cuda::kernel::locate_features<short, int=9>
0.95%    65.025us  2      32.512us  30.945us  34.080us  void cuda::kernel::locate_features<unsigned short, int=9>
cuda::kernel::extract_orb is the CUDA kernel that uses the ORB lookup table. Shown below is the performance of the kernel using constant memory versus texture memory. There is negligible to no difference between the two versions. Hence, the LUT was moved to texture memory to reduce global constant memory usage.

Performance using constant memory LUT
-------------------------------------
Time(%)  Time      Calls  Avg       Min       Max       Name
3.02%    292.26us  24     12.177us  11.360us  14.528us  void cuda::kernel::extract_orb<float>
2.16%    209.00us  16     13.062us  11.616us  16.033us  void cuda::kernel::extract_orb<double>

Performance using texture LUT
-----------------------------
Time(%)  Time      Calls  Avg       Min       Max       Name
2.84%    270.63us  24     11.276us  9.6970us  15.040us  void cuda::kernel::extract_orb<float>
2.20%    209.28us  16     13.080us  10.688us  16.960us  void cuda::kernel::extract_orb<double>
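To put a number on "negligible", the average kernel times quoted in the two commit messages above can be compared directly. The following is a small sketch that computes the relative change of the texture version against the constant-memory version. The `KernelTiming` struct and the helper names are made up for this illustration, and the kernel labels are shortened. The numbers are the Avg column values from the profiler output, in microseconds.

```cpp
#include <cmath>

// One row per kernel instantiation: average time with the constant-memory LUT
// and with the texture LUT, both in microseconds (taken from the Avg columns).
struct KernelTiming {
    const char* name;
    double constantAvgUs;
    double textureAvgUs;
};

static const KernelTiming kTimings[] = {
    {"locate_features<float>", 33.696, 33.258},
    {"locate_features<double>", 45.856, 45.552},
    {"locate_features<unsigned int>", 34.752, 35.008},
    {"locate_features<int>", 33.728, 33.344},
    {"locate_features<short>", 32.593, 32.624},
    {"locate_features<unsigned short>", 31.937, 32.512},
    {"extract_orb<float>", 12.177, 11.276},
    {"extract_orb<double>", 13.062, 13.080},
};

// Negative result: texture version is faster; positive: it is slower.
double relativeChange(const KernelTiming& t) {
    return (t.textureAvgUs - t.constantAvgUs) / t.constantAvgUs;
}

// Largest magnitude of relative change across all listed kernels.
double largestRelativeChange() {
    double worst = 0.0;
    for (const KernelTiming& t : kTimings) {
        const double r = std::fabs(relativeChange(t));
        if (r > worst) { worst = r; }
    }
    return worst;
}
```

With these figures, the largest change is for `extract_orb<float>`, where the texture version is about 7% faster on average. Every other kernel is within about 2% in either direction. Given the small call counts (2 to 24 per kernel), differences of this size are plausibly run-to-run noise rather than a real effect.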
umar456
approved these changes
Mar 14, 2020
There is negligible to no performance difference between the texture-based look-up table and the constant-memory look-up table. Hence, the look-up tables were moved to texture memory to reduce global constant memory usage.