Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.
Sign upbpo-30102: Improve libssl performance on POWER8 for, e.g sha256 #1181
Conversation
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
mention-bot
Apr 19, 2017
@gut, thanks for your PR! By analyzing the history of the files in this pull request, we identified @tiran, @loewis and @teoliphant to be potential reviewers.
mention-bot
commented
Apr 19, 2017
|
@gut, thanks for your PR! By analyzing the history of the files in this pull request, we identified @tiran, @loewis and @teoliphant to be potential reviewers. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
the-knights-who-say-ni
Apr 19, 2017
Hello, and thanks for your contribution!
I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).
Unfortunately our records indicate you have not signed the CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.
Thanks again to your contribution and we look forward to looking at it!
the-knights-who-say-ni
commented
Apr 19, 2017
|
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA). Unfortunately our records indicate you have not signed the CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. Thanks again to your contribution and we look forward to looking at it! |
the-knights-who-say-ni
added
the
CLA not signed
label
Apr 19, 2017
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
gut
Apr 19, 2017
Can anybody support me on this AppVeyor build fail? It looks like on windows build it's not linking because it doesn't find _OPENSSL_config symbol. Should I #ifdef the block to linux only or what?
gut
commented
Apr 19, 2017
|
Can anybody support me on this AppVeyor build fail? It looks like on windows build it's not linking because it doesn't find |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
pitrou
Apr 19, 2017
Member
Apparently OPENSSL_config is deprecated and has been mistakenly removed from Windows builds, see openssl/openssl#684 and openssl/openssl@c35f5c3.
Perhaps @zware or @pfmoore can chime in on whether our current OpenSSL version on Windows is subject to this issue. Regardless, the aforelinked changeset shows how to replace OPENSSL_config with the newer idiom?
|
Apparently Perhaps @zware or @pfmoore can chime in on whether our current OpenSSL version on Windows is subject to this issue. Regardless, the aforelinked changeset shows how to replace |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
gut
Apr 20, 2017
@pitrou : I found a useful documentation talking about the deprecated interface:
https://github.com/openssl/openssl/blob/cda3ae5bd0798c56fef5a5c1462d51ca1776504e/doc/crypto/OPENSSL_config.pod#notes
However I didn't have the same results when using CONF_modules_load_file(NULL, NULL, 0); instead. I'm checking it...
gut
commented
Apr 20, 2017
|
@pitrou : I found a useful documentation talking about the deprecated interface: However I didn't have the same results when using |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
gut
Apr 20, 2017
BTW, is there a deinitialization hook in order to call ENGINE_cleanup();? As a python user, I don't recall such a feature however there might be something on cpython's core... like after the whole module was garbage collected?
https://wiki.openssl.org/index.php/Manual:Engine(3)
gut
commented
Apr 20, 2017
|
BTW, is there a deinitialization hook in order to call |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
gut
Apr 20, 2017
Another question: I thought it would fix windows build but now some other symbols are not being found at link time but I don't see it related to my change. Does anybody have a hint?
gut
commented
Apr 20, 2017
|
Another question: I thought it would fix windows build but now some other symbols are not being found at link time but I don't see it related to my change. Does anybody have a hint? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
pitrou
Apr 20, 2017
Member
The requirement to call ENGINE_cleanup() sounds a bit surprising. Wouldn't the "engines" simply be deallocated at process exit?
|
The requirement to call |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
pitrou
Apr 20, 2017
Member
I thought it would fix windows build but now some other symbols are not being found at link time
Those seem to be certificate handling function symbols but the module being built is hashlib, why is hashlib using those functions?
Those seem to be certificate handling function symbols but the module being built is hashlib, why is hashlib using those functions? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
pitrou
Apr 20, 2017
Member
As for the Windows issue, perhaps you can try applying the following diff:
diff --git a/PCbuild/_hashlib.vcxproj b/PCbuild/_hashlib.vcxproj
index b1300cb..704527c 100644
--- a/PCbuild/_hashlib.vcxproj
+++ b/PCbuild/_hashlib.vcxproj
@@ -64,7 +64,7 @@
<AdditionalIncludeDirectories>$(opensslIncludeDir);%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>
</ClCompile>
<Link>
- <AdditionalDependencies>ws2_32.lib;$(OutDir)libeay$(PyDebugExt).lib;$(OutDir)ssleay$(PyDebugExt).lib;%(AdditionalDependencies)</AdditionalDependencies>
+ <AdditionalDependencies>ws2_32.lib;crypt32.lib;$(OutDir)libeay$(PyDebugExt).lib;$(OutDir)ssleay$(PyDebugExt).lib;%(AdditionalDependencies)</AdditionalDependencies>
</Link>
</ItemDefinitionGroup>
<ItemGroup>
If it doesn't work, perhaps @zware, @pfmoore or @zooba (our resident Windows experts) can give a hand to debug this.
|
As for the Windows issue, perhaps you can try applying the following diff:
If it doesn't work, perhaps @zware, @pfmoore or @zooba (our resident Windows experts) can give a hand to debug this. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
gut
Apr 20, 2017
As for the Windows issue, perhaps you can try applying the following diff:
it worked! Thanks.
gut
commented
Apr 20, 2017
it worked! Thanks. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
pitrou
Apr 20, 2017
Member
Great! Perhaps @haypo and @tiran give the PR another look now that it passes the CI tests.
|
Great! Perhaps @haypo and @tiran give the PR another look now that it passes the CI tests. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
pitrou
Apr 21, 2017
Member
@gut, do you know whether this issue only applies to POWER8, or does it also affect e.g. AES performance on x86?
|
@gut, do you know whether this issue only applies to POWER8, or does it also affect e.g. AES performance on x86? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
gut
Apr 24, 2017
@pitrou This would be a good question to an OpenSSL expert. Well, I'll try some benchmarking on X86 and come back to you. Hang on...
gut
commented
Apr 24, 2017
|
@pitrou This would be a good question to an OpenSSL expert. Well, I'll try some benchmarking on X86 and come back to you. Hang on... |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
gut
Apr 24, 2017
@pitrou I am more focused on hashing and I didn't notice different instructions or performance related to python or from openssl directly on X86.
As for AES, could you benchmark it with the change? I don't have a benchmark environment set and I can't spend the time for doing that at the moment.
gut
commented
Apr 24, 2017
|
@pitrou I am more focused on hashing and I didn't notice different instructions or performance related to python or from openssl directly on X86. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
gut
commented
Apr 25, 2017
|
@haypo @tiran : ping. Any other comments? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
pitrou
Apr 25, 2017
Member
@gut, I am seeing around 760 MB/s here (Core i5-2500K), with and without the patch.
The script I'm using is here: https://gist.github.com/pitrou/0c3586abb17d6a00853509e0afda9a68, perhaps you want to give it a quick try on POWER8.
|
@gut, I am seeing around 760 MB/s here (Core i5-2500K), with and without the patch. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
gut
Apr 25, 2017
@pitrou Hi. On my POWER8 I also noticed ~750 MB/s with and without patch.
Interestingly the python version 3.5.3 found on Ubuntu 17.04 outputs ~860 MB/s. Maybe it's just my build but I didn't expect a performance drop.
Thanks for the script
gut
commented
Apr 25, 2017
|
@pitrou Hi. On my POWER8 I also noticed ~750 MB/s with and without patch. Interestingly the python version 3.5.3 found on Ubuntu 17.04 outputs ~860 MB/s. Maybe it's just my build but I didn't expect a performance drop. Thanks for the script |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
tiran
Apr 25, 2017
Member
Did you test your patch with OpenSSL 1.1.0? With 1.1.0, OpenSSL should automatically initialize all subsystems including the engines, https://www.openssl.org/docs/man1.1.0/crypto/OPENSSL_init_crypto.html
#ifndef OPENSSL_VERSION_1_1
OpenSSL_add_all_digests();
ERR_load_crypto_strings();
ENGINE_load_builtin_engines();
#endif
|
Did you test your patch with OpenSSL 1.1.0? With 1.1.0, OpenSSL should automatically initialize all subsystems including the engines, https://www.openssl.org/docs/man1.1.0/crypto/OPENSSL_init_crypto.html
|
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
gut
Apr 25, 2017
Very interesting. I was testing with libssl1.0.0 version 1.0.2g-1ubuntu11.
However besides init and deinit performed automatically as stated on:
As of version 1.1.0 OpenSSL will automatically allocate all resources that it needs so no explicit initialisation is required. Similarly it will also automatically deinitialise as required.
It also states that some initialization are only performed explicitly. E.g: (emphasis added by me)
OPENSSL_INIT_ENGINE_ALL_BUILTIN
With this option the library will automatically load and initialise all the built in engines listed above with the exception of the openssl and dasync engines. This not a default option.
Which looks like what I'm trying to initialize on this PR.
I'd suggest to update these initialization interfaces of OpenSSL 1.1.0 on another PR, what about it? It looks like there are other initialization aspects to be considered.
gut
commented
Apr 25, 2017
|
Very interesting. I was testing with However besides init and deinit performed automatically as stated on: It also states that some initialization are only performed explicitly. E.g: (emphasis added by me) Which looks like what I'm trying to initialize on this PR. I'd suggest to update these initialization interfaces of OpenSSL 1.1.0 on another PR, what about it? It looks like there are other initialization aspects to be considered. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
tiran
Apr 26, 2017
Member
I don't think that OPENSSL_INIT_ENGINE_ALL_BUILTIN is a good idea, either. It loads too many engines. Hashing and crypto accelerator engines are covered by OPENSSL_INIT_ADD_ALL_CIPHERS and OPENSSL_INIT_ADD_ALL_DIGEST, both are default. The ALL_BUILTIN option also loads and potentially enables RDRAND on Intel, CRYPTODEV on BSD, PADLOCK on VIA CPUs, CAPI on Windows and more.
I don't trust RDRAND and so do countless people which more security experience than me. Before I agree on ENGINE_load_builtin_engines(), we must make sure that the call does not register any of the extra engines and set them as default provider for ciphers, digests, RSA, ECDSA, RNG and so on.
|
I don't think that I don't trust |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
gut
Apr 26, 2017
BTW, I was testing the new openssl v1.1 and I ran a short example without any of this special initialization (engines) and it still used the optimized version just like I wanted! So don't need to init an engine with OPENSSL_INIT_ENGINE_ALL_BUILTIN.
As for prior OpenSSL version, I'll check how can I avoid loading all engines and still having the optimized code for, e.g. sha256.
gut
commented
Apr 26, 2017
|
BTW, I was testing the new openssl v1.1 and I ran a short example without any of this special initialization (engines) and it still used the optimized version just like I wanted! So don't need to init an engine with As for prior OpenSSL version, I'll check how can I avoid loading all engines and still having the optimized code for, e.g. sha256. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
tiran
Apr 27, 2017
Member
Awesome! Thanks for you hard work! :)
I'm sorry if my reviews sound harsh to you. Crypto and security is a mine field. OpenSSL is a pretty nasty and highly explosive mine field. As you have noticed, it is getting better with OpenSSL >= 1.1.0.
|
Awesome! Thanks for you hard work! :) I'm sorry if my reviews sound harsh to you. Crypto and security is a mine field. OpenSSL is a pretty nasty and highly explosive mine field. As you have noticed, it is getting better with OpenSSL >= 1.1.0. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
gut
Apr 28, 2017
@tiran Don't worry. I see your reasons for being picky.
I analysed OpenSSL closely and I noticed that only the platform detection is important for me. I'll quote the related code below:
- Calling
OPENSSL_cpuid_setupon ppc sets this HW detection flag: https://github.com/openssl/openssl/blob/master/crypto/ppccap.c#L342 - That flag can be used to call a more efficient implementation on POWER8 on hashing functions like sha256_block_data_order: https://github.com/openssl/openssl/blob/master/crypto/ppccap.c#L69
One drawback though is that I don't like to declare the function OPENSSL_cpuid_setup myself. How do you see it?
gut
commented
Apr 28, 2017
•
|
@tiran Don't worry. I see your reasons for being picky. I analysed OpenSSL closely and I noticed that only the platform detection is important for me. I'll quote the related code below:
One drawback though is that I don't like to declare the function |
brettcannon
removed
the
CLA not signed
label
May 9, 2017
the-knights-who-say-ni
added
the
CLA not signed
label
May 9, 2017
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
gut
Jul 11, 2017
Hi. I just got my CLA approved (check on http://bugs.python.org/issue30102 , there is an * now next to my username).
How do we trigger the @the-knights-who-say-ni to recheck this?
I hope we can continue reviewing this now. I didn't ping before as I wanted to sign the CLA first.
Thanks in advance
gut
commented
Jul 11, 2017
|
Hi. I just got my CLA approved (check on http://bugs.python.org/issue30102 , there is an I hope we can continue reviewing this now. I didn't ping before as I wanted to sign the CLA first. Thanks in advance |
zware
removed
the
CLA not signed
label
Jul 11, 2017
the-knights-who-say-ni
added
the
CLA signed
label
Jul 11, 2017
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
gut
Aug 7, 2017
just solved the conflict: now the PCbuild/_hashlib.vcxproj file looks more similar to the approach of PCbuild/_ssl.vcxproj.
Is there anything else missing?
@tiran : please comment my previous reply to your review. Is that enough?
gut
commented
Aug 7, 2017
|
just solved the conflict: now the Is there anything else missing? @tiran : please comment my previous reply to your review. Is that enough? |
tiran
referenced this pull request
Aug 16, 2017
Merged
bpo-30102: Call OPENSSL_add_all_algorithms_noconf #3112
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
gut
Sep 5, 2017
I guess we can close this PR as I just confirmed that the approved #3112 solves the issue I'm interested on.
Thanks you all for taking your time to handle this issue.
gut
commented
Sep 5, 2017
|
I guess we can close this PR as I just confirmed that the approved #3112 solves the issue I'm interested on. |

Formed in 2009, the Archive Team (not to be confused with the archive.org Archive-It Team) is a rogue archivist collective dedicated to saving copies of rapidly dying or deleted websites for the sake of history and digital heritage. The group is 100% composed of volunteers and interested parties, and has expanded into a large amount of related projects for saving online and digital history.

gut commentedApr 19, 2017
•
edited by bedevere-bot
To correctly pick the best algorithm for the current architecture,
libssl needs to have OPENSSL_config(NULL) called as described on:
https://wiki.openssl.org/index.php/Libcrypto_API
This short change lead to a speedup of 50% on POWER8 when using
hashlib.sha256 digest functionality as it now uses a SIMD approach that
was already existing but not used by cpython.
https://bugs.python.org/issue30102