Add nondeterministic alert to index_copy, median CUDA and kthvalue CUDA #46942
Conversation
Force-pushed 2b49191 to 5f0265a
Quick notes (did not do a full review)
Hi @kurtamohler! Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours needs attention. You currently have a record in our system, but we do not have a signature on file. In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!
Hey @kurtamohler! Great to see more nondeterministic behavior caught by this flag. For now @ngimel and I think we should just always error like we do with other operations. It's an interesting design question to follow up on whether we should add these checks to all operations that are nondeterministic when given duplicate indices.
Follow-up: we should also throw an error if the determinism flag is set and indices are returned for median or kth value on CUDA. |
Force-pushed 30ae80e to 8596d36
Tests for …

Looks like the …
Force-pushed 8596d36 to 7c99047
Barring any additional test failures, I think this is ready for a re-review.
Force-pushed 2da9bf6 to a564999
@@ -410,6 +413,9 @@ std::tuple<Tensor&, Tensor&> kthvalue_out_cuda(
    int64_t k,
    int64_t dim,
    bool keepdim) {
  // See note [Writing Nondeterministic Operations]
  // Nondeterministic if indices output contains duplicates
  at::globalContext().alertNotDeterministic("kthvalue CUDA with indices output");
mruberry (Contributor) · Nov 6, 2020
Unfortunately, unlike median, the kthvalue documentation does not mention its possible nondeterminism. I think the nondeterminism occurs if there are multiple valid kth values, in which case any index can be returned. This PR should probably update the documentation to reflect this, too.
Is it really only kth value on CUDA that has this issue? If so, we should be explicit about which indices the CPU version will return in the documentation.
kurtamohler (Author, Contributor) · Nov 6, 2020
Thanks for the extra details! Do you know if there is an issue somewhere describing the nondeterminism of kthvalue? I only put the alert in the CUDA version because in a comment above, you said "kth value on CUDA."
mruberry (Contributor) · Nov 6, 2020
I don't know if there's an issue for kthvalue, sorry. We could always file an extra one and resolve it as a duplicate later if it is redundant.
That's true, and it probably is just CUDA, but it might be worth checking and documenting the CPU behavior.
kurtamohler (Author, Contributor) · Nov 9, 2020
Ok I understand kthvalue's nondeterminism more clearly now after writing a test: https://github.com/kurtamohler/pytorch-perf-test-scripts/blob/master/determinism_test/kthvalue_determinism.py
In the cases I tried, CPU looks like it could be deterministic, but CUDA is not, depending on the inputs. Results are below.
Aside from determinism, there's another interesting aspect: the uniqueness of the k-to-index mapping. If the input has N duplicate elements, do we get a unique index output from kthvalue for each corresponding k argument?
For example, with an input tensor a = torch.tensor([0.12, 3.4, 0.12, 5.2]), we have duplicate 0.12 elements at index 0 and 2. The k arguments for these are 1 and 2, since 0.12 would be at indices 0 and 1 after sorting the tensor. The question is, when we call a.kthvalue(1) and a.kthvalue(2), do the outputs of those calls always give two different index values, 0 and 2? If so, then we can say that the k-to-index mapping is unique, because each k value will give a different index. CPU gave me unique mappings for each case I tried. CUDA never gave me unique mappings for duplicates, but for some inputs the mapping was at least consistent (and thus deterministic). For an example of a consistent but not unique mapping, in the above simple example, imagine that a.kthvalue(1) and a.kthvalue(2) both always return an index of 0.
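The k-to-index uniqueness check described above can be sketched in plain Python (this is not PyTorch code; `kthvalue_stable` is a stand-in for a deterministic CPU-style kthvalue that resolves ties by a stable sort):

```python
# Pure-Python sketch of the k-to-index uniqueness check. kthvalue_stable
# stands in for a deterministic kthvalue: it returns the value and original
# index of the k-th smallest element, breaking ties via a stable sort.
def kthvalue_stable(values, k):
    """Return (value, original_index) of the k-th smallest element (1-based)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    idx = order[k - 1]
    return values[idx], idx

def mapping_is_unique(values):
    """True if every k maps to a distinct original index."""
    indices = [kthvalue_stable(values, k)[1] for k in range(1, len(values) + 1)]
    return len(set(indices)) == len(indices)

a = [0.12, 3.4, 0.12, 5.2]
print(kthvalue_stable(a, 1))  # (0.12, 0)
print(kthvalue_stable(a, 2))  # (0.12, 2)
print(mapping_is_unique(a))   # True
```

With a stable tie-break, k=1 and k=2 return the two distinct indices 0 and 2, so the mapping is unique; a non-unique (CUDA-like) implementation could return index 0 for both.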
Results:
device size number-of-copies k-to-index-unique-mapping maybe-deterministic
cpu 20 1 True True
cpu 20 2 True True
cpu 20 5 True True
cpu 20 10 True True
cpu 20 11 True True
cpu 100 1 True True
cpu 100 2 True True
cpu 100 5 True True
cpu 100 10 True True
cpu 100 11 True True
cpu 1000 1 True True
cpu 1000 2 True True
cpu 1000 5 True True
cpu 1000 10 True True
cpu 1000 11 True True
cpu 10000 1 True True
cpu 10000 2 True True
cpu 10000 5 True True
cpu 10000 10 True True
cpu 10000 11 True True
cuda 20 1 True True
cuda 20 2 False True
cuda 20 5 False True
cuda 20 10 False True
cuda 20 11 False True
cuda 100 1 True True
cuda 100 2 False True
cuda 100 5 False True
cuda 100 10 False True
cuda 100 11 False True
cuda 1000 1 True True
cuda 1000 2 N/A False
cuda 1000 5 N/A False
cuda 1000 10 N/A False
cuda 1000 11 N/A False
cuda 10000 1 True True
cuda 10000 2 N/A False
cuda 10000 5 N/A False
cuda 10000 10 N/A False
cuda 10000 11 N/A False
I have some questions about what to do next:
- My tests do not show that CPU is nondeterministic, but that doesn't mean that we can be sure that it isn't. Should we just assume that CPU is nondeterministic and add the alert, or should we leave it alone until someone finds a nondeterministic case?
- I can document what I know about CUDA's nondeterminism. What about the question of unique k-to-index mappings? Is that important or useful for users to know?
ngimel (Contributor) · Nov 9, 2020
Please check the CPU implementation for sources of nondeterminism; it's a question to which we can know the exact answer.
As for CUDA, there are no guarantees of a consistent mapping; any of the indices corresponding to a particular value can be returned. It may appear deterministic on some hardware or small examples, but don't be deceived by that.
kurtamohler (Author, Contributor) · Nov 10, 2020
I read through the kthvalue CPU code, and as far as I can tell there is no obvious source of nondeterminism. There is a parallel aspect to kthvalue on CPU, but it only splits up the separate arrays that need to be sorted (which happens when the input tensor has more than one dimension), so it is not a source of nondeterminism. On each array, a serial quicksort algorithm is used, and then the kth value is pulled from the sorted array. The quicksort looks deterministic to me: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/Sorting.cpp#L36
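As an illustration of why a serial selection routine with a fixed pivot rule is deterministic (this is a sketch, not the actual ATen code): the pivot choice and partition order depend only on the input, so repeated calls on the same input always give the same result.

```python
# Illustrative serial quickselect with a fixed pivot rule (middle element).
# Nothing here depends on thread scheduling or hardware, so the result is
# fully determined by the input.
def quickselect(arr, k):
    """Return the k-th smallest element (1-based) of arr."""
    arr = list(arr)
    lo, hi = 0, len(arr) - 1
    target = k - 1
    while lo < hi:
        pivot = arr[(lo + hi) // 2]   # fixed, input-determined pivot
        i, j = lo, hi
        while i <= j:                 # Hoare-style partition
            while arr[i] < pivot:
                i += 1
            while arr[j] > pivot:
                j -= 1
            if i <= j:
                arr[i], arr[j] = arr[j], arr[i]
                i += 1
                j -= 1
        if target <= j:
            hi = j
        elif target >= i:
            lo = i
        else:
            break                     # target lies in the pivot-equal region
    return arr[target]

print(quickselect([5, 1, 4, 1, 2], 2))  # 1
```

Note this returns only the value; the nondeterminism discussed in this thread concerns which *index* is reported when duplicate values exist, which is where CPU and CUDA implementations can differ.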
Hey @kurtamohler, I made a few inline suggestions to align the comments and documentation. Is there a good determinism test that verifies this runtime error will trigger when these functions are called? It'd be good to see it triggered on the function, method, inplace, and out variants. For median, it should not be triggered if indices aren't returned, but triggered if they are.
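A hypothetical sketch (plain Python, not the PR's test code) of the call-type dispatch such a test could use; `FakeNS` and `FakeTensor` are stand-ins for `torch` and `Tensor`, and a real test would assert that the nondeterministic alert fires for each variant:

```python
# Hypothetical dispatch helper covering the function, method, and out
# variants of an op. FakeNS/FakeTensor are stand-ins so the pattern itself
# is runnable without PyTorch.
def call_variant(ns, obj, name, call_type, out=None):
    """Invoke op `name` as a function, a method, or with an out= argument."""
    if call_type == "function":
        return getattr(ns, name)(obj)
    if call_type == "method":
        return getattr(obj, name)()
    if call_type == "out":
        return getattr(ns, name)(obj, out=out)
    raise ValueError(f"unknown call_type: {call_type}")

class FakeNS:
    @staticmethod
    def median(t, out=None):
        result = sorted(t.data)[(len(t.data) - 1) // 2]  # lower median
        if out is not None:
            out.append(result)
        return result

class FakeTensor:
    def __init__(self, data):
        self.data = data
    def median(self):
        return FakeNS.median(self)

t = FakeTensor([3, 1, 2])
print(call_variant(FakeNS, t, "median", "function"))  # 2
print(call_variant(FakeNS, t, "median", "method"))    # 2
```

One `test_func(slf, device, call_type)` body can then be run once per variant, which is roughly the shape the test in this PR takes.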
Force-pushed a564999 to c83faf3
Almost: I need to find out how to fix the buffer overflow one of my tests is causing in …
Force-pushed 52f6fd2 to 38a06ab
I found out that the buffer overflow happens when …

I think this is ready for another review when you get a chance, @mruberry.
I think we should disallow overlapping outputs in this case; kthvalue, at least on CUDA, won't produce a correct result.
Ah, ok, makes sense. I can add an overlap check and error in a different PR.
Force-pushed 38a06ab to 8747a2d
Codecov Report
@@ Coverage Diff @@
## master #46942 +/- ##
==========================================
+ Coverage 80.91% 81.14% +0.23%
==========================================
Files 1855 1838 -17
Lines 200241 198605 -1636
==========================================
- Hits 162021 161164 -857
+ Misses 38220 37441 -779
else:
    restore()

def __init__(self):
mruberry (Contributor) · Nov 17, 2020
Again, I don't think we want to model a decorator using `__init__` and `__del__`. There are some examples of different kinds of Pythonic decorators in https://github.com/pytorch/pytorch/blob/master/torch/testing/_internal/common_device_type.py that may be interesting to look at.
kurtamohler (Author, Contributor) · Nov 17, 2020
I've changed CuBLASConfigGuard to act like a context manager.
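A context-manager style guard of the kind described here might look like the following sketch (illustrative only, not the PR's actual `CuBLASConfigGuard`; the env-var name is the real cuBLAS workspace setting, but the function name and default value are assumptions):

```python
import os
from contextlib import contextmanager

# Illustrative guard: set CUBLAS_WORKSPACE_CONFIG on entry and restore the
# prior value on exit, even if the guarded block raises. This avoids the
# __init__/__del__ pattern the review objected to.
@contextmanager
def cublas_config_guard(value=":4096:8"):
    key = "CUBLAS_WORKSPACE_CONFIG"
    old = os.environ.get(key)
    os.environ[key] = value
    try:
        yield
    finally:
        if old is None:
            os.environ.pop(key, None)
        else:
            os.environ[key] = old

with cublas_config_guard():
    assert os.environ["CUBLAS_WORKSPACE_CONFIG"] == ":4096:8"
```

Using `try/finally` inside `contextmanager` guarantees restoration on exceptions, which `__del__` cannot promise (finalization timing is unspecified).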
Hey @kurtamohler! Overall the actual change looks straightforward and good. Just a few questions/comments about the "Python-ness" of the testing stuff.
Force-pushed 27823d0 to 330ba8a
I'm not sure what is causing the …
There was an upstream failure for this; rebase and you should be fine.
Force-pushed 330ba8a to c6048a1
Force-pushed ddcda11 to e7386ba
Force-pushed e7386ba to 9f68aaf
Had to rebase to fix a conflict. I think this is ready to go.
# Ensures that median throws nondeterministic alerts in the correct cases
@dtypes(torch.double)
def test_median_nondeterministic_alert(self, device, dtype):
    def test_func(slf, device, call_type):
@@ -3894,6 +3894,10 @@ def merge_dicts(*dicts):
(see :func:`torch.squeeze`), resulting in both the :attr:`values` and
:attr:`indices` tensors having 1 fewer dimension than the :attr:`input` tensor.

.. note::
    When there are multiple valid :attr:`k` th values, this function may
mruberry (Contributor) · Dec 2, 2020
Only if the input is CUDA, though.
"When :attr:`input` is on CUDA and there are multiple valid :attr:`k`th values, this function may nondeterministically return :attr:`indices` for any of them."
@@ -3894,6 +3894,10 @@ def merge_dicts(*dicts):
(see :func:`torch.squeeze`), resulting in both the :attr:`values` and
:attr:`indices` tensors having 1 fewer dimension than the :attr:`input` tensor.

.. note::
mruberry (Contributor) · Dec 2, 2020
`index_copy_` needs a note, too:
https://pytorch.org/docs/master/tensors.html?highlight=index_copy#torch.Tensor.index_copy_
but median already has one:
https://pytorch.org/docs/master/generated/torch.median.html?highlight=median#torch.median
Hey @kurtamohler! Had a look, just a few more comments/suggestions.
@@ -408,33 +421,28 @@ def wrapper(*args, **kwargs):
def wrapDeterministicFlagAPITest(fn):
mruberry (Contributor) · Dec 2, 2020
Please elaborate in this comment that tests using this wrapper need to start a subprocess (how and why), so that their cuBLAS is initialized with the new workspace config.
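The subprocess requirement can be sketched as follows (illustrative only; the helper name is hypothetical, but `CUBLAS_WORKSPACE_CONFIG` is the real variable cuBLAS reads at initialization, which is why setting it inside an already-running CUDA process is too late):

```python
import os
import subprocess
import sys

# Run a snippet in a fresh interpreter with CUBLAS_WORKSPACE_CONFIG set, so
# the variable is in place before any CUDA/cuBLAS initialization happens.
def run_in_subprocess(code, workspace_config=":4096:8"):
    env = dict(os.environ)
    env["CUBLAS_WORKSPACE_CONFIG"] = workspace_config
    return subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True,
    )

result = run_in_subprocess(
    "import os; print(os.environ['CUBLAS_WORKSPACE_CONFIG'])"
)
print(result.stdout.strip())  # :4096:8
```

In the actual test wrapper the child process would import torch and run the deterministic-flag assertions; here the child merely echoes the env var to show that it is visible from process start.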
Also fixes issue where skipped tests did not properly restore deterministic flag.
Fixes #46743