The Wayback Machine - https://web.archive.org/web/20241008011253/https://github.com/python/cpython/issues/125022
Provide detection for SIMD features in autoconf and at runtime #125022

Open
picnixz opened this issue Oct 6, 2024 · 3 comments
Assignees
Labels
build The build process and cross-build performance Performance or resource usage type-feature A feature request or enhancement

Comments

@picnixz
Contributor

picnixz commented Oct 6, 2024

Feature or enhancement

Proposal:

In #124951, there has been some initial discussion on improving the performance of base64 and possibly {bytearray,bytes,str}.translate using SIMD instructions.

More generally, if we want to use specific SIMD instructions, it'd be good if we at least knew whether the processor supports them. Note that we already use SIMD in blake2 when possible. As such, I suggest an internal framework for detecting SIMD features at runtime, usable by other parts of the library, together with build-time detection of the corresponding compiler flags.
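
As a rough illustration of what runtime detection can look like (this is not the code from the linked PR), the sketch below relies on the `__builtin_cpu_init` and `__builtin_cpu_supports` built-ins that GCC and Clang provide on x86. The `simd_flags` struct and `detect_simd_flags` function are hypothetical names introduced only for this example; on other compilers or architectures the sketch simply reports no support.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for detected features (illustrative names only). */
typedef struct {
    bool sse2;
    bool avx;
    bool avx2;
} simd_flags;

/* Fill `flags` with the features usable on the machine we are running on. */
static void
detect_simd_flags(simd_flags *flags)
{
    assert(flags != NULL);
    /* Conservative default: report nothing unless we can positively detect it. */
    flags->sse2 = false;
    flags->avx = false;
    flags->avx2 = false;
#if (defined(__GNUC__) || defined(__clang__)) \
    && (defined(__x86_64__) || defined(__i386__))
    /* GCC/Clang built-ins; they query the CPU at runtime, not at build time. */
    __builtin_cpu_init();
    flags->sse2 = __builtin_cpu_supports("sse2") != 0;
    flags->avx = __builtin_cpu_supports("avx") != 0;
    flags->avx2 = __builtin_cpu_supports("avx2") != 0;
#endif
}
```

A caller would run `detect_simd_flags(&flags)` once at startup and consult the cached result afterwards. MSVC and non-x86 targets would need their own detection paths, which this sketch does not attempt.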

Note that a single part of the code could benefit from some SIMD calls without having to build the entire library for the whole SIMD-128 or SIMD-256 instruction sets. Note also that having a way to detect SIMD support should probably be independent of whether we end up using SIMD outside the blake2 module, since it could only benefit the standard library if we were to include such code.
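
One way to confine SIMD to a single function, sketched here under the assumption of GCC or Clang on x86, is the `target` function attribute combined with a runtime check. Only `count_byte_avx2` is compiled with AVX2 enabled; the rest of the file keeps the baseline flags. The byte-counting task and all function names are hypothetical and chosen purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

#if (defined(__GNUC__) || defined(__clang__)) \
    && (defined(__x86_64__) || defined(__i386__))
#  include <immintrin.h>
#  define HAVE_AVX2_TARGET_ATTR 1
#endif

/* Portable fallback: always safe to call. */
static size_t
count_byte_scalar(const unsigned char *buf, size_t len, unsigned char c)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        count += (buf[i] == c);
    }
    return count;
}

#ifdef HAVE_AVX2_TARGET_ATTR
/* Only this function is compiled with AVX2 code generation enabled. */
__attribute__((target("avx2")))
static size_t
count_byte_avx2(const unsigned char *buf, size_t len, unsigned char c)
{
    const __m256i needle = _mm256_set1_epi8((char)c);
    size_t count = 0;
    size_t i = 0;
    /* Compare 32 bytes at a time and count the matching lanes. */
    for (; i + 32 <= len; i += 32) {
        __m256i chunk = _mm256_loadu_si256((const __m256i *)(buf + i));
        __m256i eq = _mm256_cmpeq_epi8(chunk, needle);
        unsigned int mask = (unsigned int)_mm256_movemask_epi8(eq);
        count += (size_t)__builtin_popcount(mask);
    }
    /* Handle the remaining tail bytes one by one. */
    for (; i < len; i++) {
        count += (buf[i] == c);
    }
    return count;
}
#endif

/* Dispatcher: pick the AVX2 path only if the running CPU supports it. */
static size_t
count_byte(const unsigned char *buf, size_t len, unsigned char c)
{
    assert(buf != NULL || len == 0);
#ifdef HAVE_AVX2_TARGET_ATTR
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) {
        return count_byte_avx2(buf, len, c);
    }
#endif
    return count_byte_scalar(buf, len, c);
}
```

Because the choice is made at runtime, a binary built on an AVX2-capable machine still takes the scalar path on a machine without AVX2. A real implementation would likely cache the detection result instead of querying it on every call.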


The blake2 module SIMD support is fairly... complicated due to the wide variety of platforms that need to be supported and due to the mixture of many SIMD instructions. So I don't think I want to touch that part and make it work under the new interface (at least, not for now). While I can say that I'm confident in detecting features on "widely used" systems, there are definitely systems that I don't know so I'd appreciate any help on this topic.

Has this already been discussed elsewhere?

I don't want to open a Discourse thread for now since it's mainly something that will be used internally and not to be exposed to the world.

Links to previous discussion of this feature:

There has been some discussion on Discourse already about SIMD in general and whether to include it (e.g., https://discuss.python.org/t/standard-library-support-for-simd/35138) but the number of results containing "SIMD" or "AVX" is very small. Either this is because the topic is too advanced (detecting CPU features is NOT fun and there is a lack of documentation, the best source being the Wikipedia page) or because the feature request is too broad.

Linked PRs

@picnixz picnixz added type-feature A feature request or enhancement build The build process and cross-build labels Oct 6, 2024
@picnixz picnixz self-assigned this Oct 6, 2024
@picnixz picnixz added the performance Performance or resource usage label Oct 6, 2024
@corona10
Member

corona10 commented Oct 6, 2024

My general opinion about managing SIMD logic on the CPython side: #124951 (comment)
And if we begin to depend on SIMD detection, are you concerned that an unexpected illegal instruction error could occur on an unsupported machine because of differences between the build machine and the execution machine?

@picnixz
Contributor Author

picnixz commented Oct 6, 2024

I do have concerns and that's why I'd like to hear from people that 1) know about weird architectures 2) deal with real-life scenarios.

What I have in mind:

  • The feature should be entirely opt-in. Possibly under some optimization flag as well.
  • We already have SIMD in blake2 so we may already have those "unexpected illegal instruction" situations. I'm currently (well, not today) trying to harden the detection of AVX instructions because even recognizing -mavx may not be sufficient (e.g., we also need to handle XSAVE and how it handles YMM registers).
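
To make the XSAVE point concrete: on x86, the CPU advertising AVX is not enough, because the OS must also have enabled saving of the YMM registers. The usual check reads CPUID leaf 1 (ECX bit 27 for OSXSAVE, bit 28 for AVX) and then uses XGETBV to confirm that XCR0 bits 1 and 2 are set. The following is a minimal sketch of that check, assuming GCC or Clang on x86; it is not the hardened detection from the PR, and `avx_status`, `query_avx_status` and the other names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if (defined(__GNUC__) || defined(__clang__)) \
    && (defined(__x86_64__) || defined(__i386__))
#  include <cpuid.h>
#  define HAVE_X86_CPUID 1
#endif

/* CPUID leaf 1, ECX register. */
#define CPUID_1_ECX_OSXSAVE (UINT32_C(1) << 27)
#define CPUID_1_ECX_AVX     (UINT32_C(1) << 28)
/* XCR0 state-component bits. */
#define XCR0_SSE_STATE (UINT64_C(1) << 1)
#define XCR0_AVX_STATE (UINT64_C(1) << 2)

/* Hypothetical result type: CPU capability and OS support kept separate. */
typedef struct {
    bool cpu_has_avx;   /* CPUID reports the AVX instructions */
    bool os_saves_ymm;  /* OS enabled XMM and YMM state in XCR0 */
} avx_status;

/* YMM registers are only preserved if both SSE and AVX state are enabled. */
static bool
xcr0_enables_ymm(uint64_t xcr0)
{
    const uint64_t needed = XCR0_SSE_STATE | XCR0_AVX_STATE;
    return (xcr0 & needed) == needed;
}

#ifdef HAVE_X86_CPUID
/* Read XCR0; must only be called once OSXSAVE is known to be set. */
static uint64_t
read_xcr0(void)
{
    uint32_t lo, hi;
    __asm__ volatile ("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((uint64_t)hi << 32) | lo;
}
#endif

static void
query_avx_status(avx_status *status)
{
    assert(status != NULL);
    status->cpu_has_avx = false;
    status->os_saves_ymm = false;
#ifdef HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return;  /* CPUID leaf 1 unavailable */
    }
    status->cpu_has_avx = (ecx & CPUID_1_ECX_AVX) != 0;
    /* Executing XGETBV without OSXSAVE would itself raise #UD. */
    if (ecx & CPUID_1_ECX_OSXSAVE) {
        status->os_saves_ymm = xcr0_enables_ymm(read_xcr0());
    }
#endif
}

/* AVX code is safe to run only when both conditions hold. */
static bool
avx_is_usable(void)
{
    avx_status status;
    query_avx_status(&status);
    return status.cpu_has_avx && status.os_saves_ymm;
}
```

AVX-512 would need further XCR0 bits (the opmask and ZMM state components) checked in the same way; the sketch deliberately stops at AVX.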

So, yes, I definitely have concerns about the differences. Using SIMD instructions could probably make local builds, or builds managed by distributions themselves, faster, though we should be really careful. This is also the reason why I want to keep runtime detection: to avoid such issues.

The idea was to open a wider discussion on SIMD support itself. If you want we can move to Discourse though I'm not sure whether it's better to keep it internal for now (the PR is just a PoC and it probably won't cover those cases we're worried about).


I don't think we should add SIMD to every possible part of the library, only to those that are critical enough IMO. And they should be carefully worked out. However, in order to investigate them (and test them using the CI), I think having an internal detection framework would at least be the first step (or maybe I'm wrong here?).

@picnixz
Contributor Author

picnixz commented Oct 7, 2024

I've hardened the detection of AVX instructions. I've also learned that macOS may not like AVX-512 at all (or at least some register states won't be restored correctly upon context switching). So there are real-life issues that we should address. What I'll maybe do is first try to make a PoC for str.translate and see how AVX could be used and how it could improve Python, then I'll come back (as Gregory said on the other issue, we are targeting relatively simpler algorithms).
