New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-30102: Improve libssl performance on POWER8 for, e.g sha256 #1181

Closed
wants to merge 7 commits into
base: master
from

Conversation

Projects
None yet
8 participants
@gut

gut commented Apr 19, 2017

To correctly pick the best algorithm for the current architecture,
libssl needs to have OPENSSL_config(NULL) called as described on:
https://wiki.openssl.org/index.php/Libcrypto_API

This short change lead to a speedup of 50% on POWER8 when using
hashlib.sha256 digest functionality as it now uses a SIMD approach that
was already existing but not used by cpython.

https://bugs.python.org/issue30102

Gustavo Serra Scalet
Improve libssl performance on POWER8 for, e.g sha256
To correctly pick the best algorithm for the current architecture,
libssl needs to have OPENSSL_config(NULL) called as described on:
https://wiki.openssl.org/index.php/Libcrypto_API

This short change lead to a speedup of 50% on POWER8 when using
hashlib.sha256 digest functionality as it now uses a SIMD approach that
was already existing but not used by cpython.
@mention-bot

This comment has been minimized.

Show comment
Hide comment
@mention-bot

mention-bot Apr 19, 2017

@gut, thanks for your PR! By analyzing the history of the files in this pull request, we identified @tiran, @loewis and @teoliphant to be potential reviewers.

mention-bot commented Apr 19, 2017

@gut, thanks for your PR! By analyzing the history of the files in this pull request, we identified @tiran, @loewis and @teoliphant to be potential reviewers.

@the-knights-who-say-ni

This comment has been minimized.

Show comment
Hide comment
@the-knights-who-say-ni

the-knights-who-say-ni Apr 19, 2017

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately our records indicate you have not signed the CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

Thanks again to your contribution and we look forward to looking at it!

the-knights-who-say-ni commented Apr 19, 2017

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately our records indicate you have not signed the CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

Thanks again to your contribution and we look forward to looking at it!

@gut

This comment has been minimized.

Show comment
Hide comment
@gut

gut Apr 19, 2017

Can anybody support me on this AppVeyor build fail? It looks like on windows build it's not linking because it doesn't find _OPENSSL_config symbol. Should I #ifdef the block to linux only or what?

gut commented Apr 19, 2017

Can anybody support me on this AppVeyor build fail? It looks like on windows build it's not linking because it doesn't find _OPENSSL_config symbol. Should I #ifdef the block to linux only or what?

@pitrou

This comment has been minimized.

Show comment
Hide comment
@pitrou

pitrou Apr 19, 2017

Member

Apparently OPENSSL_config is deprecated and has been mistakenly removed from Windows builds, see openssl/openssl#684 and openssl/openssl@c35f5c3.

Perhaps @zware or @pfmoore can chime in on whether our current OpenSSL version on Windows is subject to this issue. Regardless, the aforelinked changeset shows how to replace OPENSSL_config with the newer idiom?

Member

pitrou commented Apr 19, 2017

Apparently OPENSSL_config is deprecated and has been mistakenly removed from Windows builds, see openssl/openssl#684 and openssl/openssl@c35f5c3.

Perhaps @zware or @pfmoore can chime in on whether our current OpenSSL version on Windows is subject to this issue. Regardless, the aforelinked changeset shows how to replace OPENSSL_config with the newer idiom?

Show outdated Hide outdated Modules/_hashopenssl.c
@gut

This comment has been minimized.

Show comment
Hide comment
@gut

gut Apr 20, 2017

@pitrou : I found a useful documentation talking about the deprecated interface:
https://github.com/openssl/openssl/blob/cda3ae5bd0798c56fef5a5c1462d51ca1776504e/doc/crypto/OPENSSL_config.pod#notes

However I didn't have the same results when using CONF_modules_load_file(NULL, NULL, 0); instead. I'm checking it...

gut commented Apr 20, 2017

@pitrou : I found a useful documentation talking about the deprecated interface:
https://github.com/openssl/openssl/blob/cda3ae5bd0798c56fef5a5c1462d51ca1776504e/doc/crypto/OPENSSL_config.pod#notes

However I didn't have the same results when using CONF_modules_load_file(NULL, NULL, 0); instead. I'm checking it...

Gustavo Serra Scalet
Use ENGINES instead of deprecated config interface
Both solutions lead to a performance improvement on POWER8 when using
sha digest
@gut

This comment has been minimized.

Show comment
Hide comment
@gut

gut Apr 20, 2017

BTW, is there a deinitialization hook in order to call ENGINE_cleanup();? As a python user, I don't recall such a feature however there might be something on cpython's core... like after the whole module was garbage collected?
https://wiki.openssl.org/index.php/Manual:Engine(3)

gut commented Apr 20, 2017

BTW, is there a deinitialization hook in order to call ENGINE_cleanup();? As a python user, I don't recall such a feature however there might be something on cpython's core... like after the whole module was garbage collected?
https://wiki.openssl.org/index.php/Manual:Engine(3)

@gut

This comment has been minimized.

Show comment
Hide comment
@gut

gut Apr 20, 2017

Another question: I thought it would fix windows build but now some other symbols are not being found at link time but I don't see it related to my change. Does anybody have a hint?

gut commented Apr 20, 2017

Another question: I thought it would fix windows build but now some other symbols are not being found at link time but I don't see it related to my change. Does anybody have a hint?

@pitrou

This comment has been minimized.

Show comment
Hide comment
@pitrou

pitrou Apr 20, 2017

Member

The requirement to call ENGINE_cleanup() sounds a bit surprising. Wouldn't the "engines" simply be deallocated at process exit?

Member

pitrou commented Apr 20, 2017

The requirement to call ENGINE_cleanup() sounds a bit surprising. Wouldn't the "engines" simply be deallocated at process exit?

@pitrou

This comment has been minimized.

Show comment
Hide comment
@pitrou

pitrou Apr 20, 2017

Member

I thought it would fix windows build but now some other symbols are not being found at link time

Those seem to be certificate handling function symbols but the module being built is hashlib, why is hashlib using those functions?

Member

pitrou commented Apr 20, 2017

I thought it would fix windows build but now some other symbols are not being found at link time

Those seem to be certificate handling function symbols but the module being built is hashlib, why is hashlib using those functions?

@pitrou

This comment has been minimized.

Show comment
Hide comment
@pitrou

pitrou Apr 20, 2017

Member

As for the Windows issue, perhaps you can try applying the following diff:

diff --git a/PCbuild/_hashlib.vcxproj b/PCbuild/_hashlib.vcxproj
index b1300cb..704527c 100644
--- a/PCbuild/_hashlib.vcxproj
+++ b/PCbuild/_hashlib.vcxproj
@@ -64,7 +64,7 @@
       <AdditionalIncludeDirectories>$(opensslIncludeDir);%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>
     </ClCompile>
     <Link>
-      <AdditionalDependencies>ws2_32.lib;$(OutDir)libeay$(PyDebugExt).lib;$(OutDir)ssleay$(PyDebugExt).lib;%(AdditionalDependencies)</AdditionalDependencies>
+      <AdditionalDependencies>ws2_32.lib;crypt32.lib;$(OutDir)libeay$(PyDebugExt).lib;$(OutDir)ssleay$(PyDebugExt).lib;%(AdditionalDependencies)</AdditionalDependencies>
     </Link>
   </ItemDefinitionGroup>
   <ItemGroup>

If it doesn't work, perhaps @zware, @pfmoore or @zooba (our resident Windows experts) can give a hand to debug this.

Member

pitrou commented Apr 20, 2017

As for the Windows issue, perhaps you can try applying the following diff:

diff --git a/PCbuild/_hashlib.vcxproj b/PCbuild/_hashlib.vcxproj
index b1300cb..704527c 100644
--- a/PCbuild/_hashlib.vcxproj
+++ b/PCbuild/_hashlib.vcxproj
@@ -64,7 +64,7 @@
       <AdditionalIncludeDirectories>$(opensslIncludeDir);%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>
     </ClCompile>
     <Link>
-      <AdditionalDependencies>ws2_32.lib;$(OutDir)libeay$(PyDebugExt).lib;$(OutDir)ssleay$(PyDebugExt).lib;%(AdditionalDependencies)</AdditionalDependencies>
+      <AdditionalDependencies>ws2_32.lib;crypt32.lib;$(OutDir)libeay$(PyDebugExt).lib;$(OutDir)ssleay$(PyDebugExt).lib;%(AdditionalDependencies)</AdditionalDependencies>
     </Link>
   </ItemDefinitionGroup>
   <ItemGroup>

If it doesn't work, perhaps @zware, @pfmoore or @zooba (our resident Windows experts) can give a hand to debug this.

@gut

This comment has been minimized.

Show comment
Hide comment
@gut

gut Apr 20, 2017

As for the Windows issue, perhaps you can try applying the following diff:

it worked! Thanks.

gut commented Apr 20, 2017

As for the Windows issue, perhaps you can try applying the following diff:

it worked! Thanks.

@pitrou

This comment has been minimized.

Show comment
Hide comment
@pitrou

pitrou Apr 20, 2017

Member

Great! Perhaps @haypo and @tiran give the PR another look now that it passes the CI tests.

Member

pitrou commented Apr 20, 2017

Great! Perhaps @haypo and @tiran give the PR another look now that it passes the CI tests.

@pitrou

This comment has been minimized.

Show comment
Hide comment
@pitrou

pitrou Apr 21, 2017

Member

@gut, do you know whether this issue only applies to POWER8, or does it also affect e.g. AES performance on x86?

Member

pitrou commented Apr 21, 2017

@gut, do you know whether this issue only applies to POWER8, or does it also affect e.g. AES performance on x86?

@gut

This comment has been minimized.

Show comment
Hide comment
@gut

gut Apr 24, 2017

@pitrou This would be a good question to an OpenSSL expert. Well, I'll try some benchmarking on X86 and come back to you. Hang on...

gut commented Apr 24, 2017

@pitrou This would be a good question to an OpenSSL expert. Well, I'll try some benchmarking on X86 and come back to you. Hang on...

@gut

This comment has been minimized.

Show comment
Hide comment
@gut

gut Apr 24, 2017

@pitrou I am more focused on hashing and I didn't notice different instructions or performance related to python or from openssl directly on X86.
As for AES, could you benchmark it with the change? I don't have a benchmark environment set and I can't spend the time for doing that at the moment.

gut commented Apr 24, 2017

@pitrou I am more focused on hashing and I didn't notice different instructions or performance related to python or from openssl directly on X86.
As for AES, could you benchmark it with the change? I don't have a benchmark environment set and I can't spend the time for doing that at the moment.

@gut

This comment has been minimized.

Show comment
Hide comment
@gut

gut Apr 25, 2017

@haypo @tiran : ping. Any other comments?

gut commented Apr 25, 2017

@haypo @tiran : ping. Any other comments?

@pitrou

This comment has been minimized.

Show comment
Hide comment
@pitrou

pitrou Apr 25, 2017

Member

@gut, I am seeing around 760 MB/s here (Core i5-2500K), with and without the patch.
The script I'm using is here: https://gist.github.com/pitrou/0c3586abb17d6a00853509e0afda9a68, perhaps you want to give it a quick try on POWER8.

Member

pitrou commented Apr 25, 2017

@gut, I am seeing around 760 MB/s here (Core i5-2500K), with and without the patch.
The script I'm using is here: https://gist.github.com/pitrou/0c3586abb17d6a00853509e0afda9a68, perhaps you want to give it a quick try on POWER8.

@gut

This comment has been minimized.

Show comment
Hide comment
@gut

gut Apr 25, 2017

@pitrou Hi. On my POWER8 I also noticed ~750 MB/s with and without patch.

Interestingly the python version 3.5.3 found on Ubuntu 17.04 outputs ~860 MB/s. Maybe it's just my build but I didn't expect a performance drop.

Thanks for the script

gut commented Apr 25, 2017

@pitrou Hi. On my POWER8 I also noticed ~750 MB/s with and without patch.

Interestingly the python version 3.5.3 found on Ubuntu 17.04 outputs ~860 MB/s. Maybe it's just my build but I didn't expect a performance drop.

Thanks for the script

@tiran

This comment has been minimized.

Show comment
Hide comment
@tiran

tiran Apr 25, 2017

Member

Did you test your patch with OpenSSL 1.1.0? With 1.1.0, OpenSSL should automatically initialize all subsystems including the engines, https://www.openssl.org/docs/man1.1.0/crypto/OPENSSL_init_crypto.html

#ifndef OPENSSL_VERSION_1_1
OpenSSL_add_all_digests();
ERR_load_crypto_strings();
ENGINE_load_builtin_engines();
#endif
Member

tiran commented Apr 25, 2017

Did you test your patch with OpenSSL 1.1.0? With 1.1.0, OpenSSL should automatically initialize all subsystems including the engines, https://www.openssl.org/docs/man1.1.0/crypto/OPENSSL_init_crypto.html

#ifndef OPENSSL_VERSION_1_1
OpenSSL_add_all_digests();
ERR_load_crypto_strings();
ENGINE_load_builtin_engines();
#endif
@gut

This comment has been minimized.

Show comment
Hide comment
@gut

gut Apr 25, 2017

Very interesting. I was testing with libssl1.0.0 version 1.0.2g-1ubuntu11.

However besides init and deinit performed automatically as stated on:
As of version 1.1.0 OpenSSL will automatically allocate all resources that it needs so no explicit initialisation is required. Similarly it will also automatically deinitialise as required.

It also states that some initialization are only performed explicitly. E.g: (emphasis added by me)
OPENSSL_INIT_ENGINE_ALL_BUILTIN
With this option the library will automatically load and initialise all the built in engines listed above with the exception of the openssl and dasync engines. This not a default option.

Which looks like what I'm trying to initialize on this PR.

I'd suggest to update these initialization interfaces of OpenSSL 1.1.0 on another PR, what about it? It looks like there are other initialization aspects to be considered.

gut commented Apr 25, 2017

Very interesting. I was testing with libssl1.0.0 version 1.0.2g-1ubuntu11.

However besides init and deinit performed automatically as stated on:
As of version 1.1.0 OpenSSL will automatically allocate all resources that it needs so no explicit initialisation is required. Similarly it will also automatically deinitialise as required.

It also states that some initialization are only performed explicitly. E.g: (emphasis added by me)
OPENSSL_INIT_ENGINE_ALL_BUILTIN
With this option the library will automatically load and initialise all the built in engines listed above with the exception of the openssl and dasync engines. This not a default option.

Which looks like what I'm trying to initialize on this PR.

I'd suggest to update these initialization interfaces of OpenSSL 1.1.0 on another PR, what about it? It looks like there are other initialization aspects to be considered.

@tiran

This comment has been minimized.

Show comment
Hide comment
@tiran

tiran Apr 26, 2017

Member

I don't think that OPENSSL_INIT_ENGINE_ALL_BUILTIN is a good idea, either. It loads too many engines. Hashing and crypto accelerator engines are covered by OPENSSL_INIT_ADD_ALL_CIPHERS and OPENSSL_INIT_ADD_ALL_DIGEST, both are default. The ALL_BUILTIN option also loads and potentially enables RDRAND on Intel, CRYPTODEV on BSD, PADLOCK on VIA CPUs, CAPI on Windows and more.

I don't trust RDRAND and so do countless people which more security experience than me. Before I agree on ENGINE_load_builtin_engines(), we must make sure that the call does not register any of the extra engines and set them as default provider for ciphers, digests, RSA, ECDSA, RNG and so on.

Member

tiran commented Apr 26, 2017

I don't think that OPENSSL_INIT_ENGINE_ALL_BUILTIN is a good idea, either. It loads too many engines. Hashing and crypto accelerator engines are covered by OPENSSL_INIT_ADD_ALL_CIPHERS and OPENSSL_INIT_ADD_ALL_DIGEST, both are default. The ALL_BUILTIN option also loads and potentially enables RDRAND on Intel, CRYPTODEV on BSD, PADLOCK on VIA CPUs, CAPI on Windows and more.

I don't trust RDRAND and so do countless people which more security experience than me. Before I agree on ENGINE_load_builtin_engines(), we must make sure that the call does not register any of the extra engines and set them as default provider for ciphers, digests, RSA, ECDSA, RNG and so on.

@gut

This comment has been minimized.

Show comment
Hide comment
@gut

gut Apr 26, 2017

BTW, I was testing the new openssl v1.1 and I ran a short example without any of this special initialization (engines) and it still used the optimized version just like I wanted! So don't need to init an engine with OPENSSL_INIT_ENGINE_ALL_BUILTIN.

As for prior OpenSSL version, I'll check how can I avoid loading all engines and still having the optimized code for, e.g. sha256.

gut commented Apr 26, 2017

BTW, I was testing the new openssl v1.1 and I ran a short example without any of this special initialization (engines) and it still used the optimized version just like I wanted! So don't need to init an engine with OPENSSL_INIT_ENGINE_ALL_BUILTIN.

As for prior OpenSSL version, I'll check how can I avoid loading all engines and still having the optimized code for, e.g. sha256.

@tiran

This comment has been minimized.

Show comment
Hide comment
@tiran

tiran Apr 27, 2017

Member

Awesome! Thanks for you hard work! :)

I'm sorry if my reviews sound harsh to you. Crypto and security is a mine field. OpenSSL is a pretty nasty and highly explosive mine field. As you have noticed, it is getting better with OpenSSL >= 1.1.0.

Member

tiran commented Apr 27, 2017

Awesome! Thanks for you hard work! :)

I'm sorry if my reviews sound harsh to you. Crypto and security is a mine field. OpenSSL is a pretty nasty and highly explosive mine field. As you have noticed, it is getting better with OpenSSL >= 1.1.0.

Gustavo Serra Scalet
Detecting platform with OPENSSL_cpuid_setup instead
Replace ENGINE_load_builtin_engines to OPENSSL_cpuid_setup. It leads to
the performance gain on hashing on POWER8 platform.

I don't like the explicit function declaration but I didn't find it on
OpenSSL headers.
@gut

This comment has been minimized.

Show comment
Hide comment
@gut

gut Apr 28, 2017

@tiran Don't worry. I see your reasons for being picky.

I analysed OpenSSL closely and I noticed that only the platform detection is important for me. I'll quote the related code below:

  1. Calling OPENSSL_cpuid_setup on ppc sets this HW detection flag: https://github.com/openssl/openssl/blob/master/crypto/ppccap.c#L342
  2. That flag can be used to call a more efficient implementation on POWER8 on hashing functions like sha256_block_data_order: https://github.com/openssl/openssl/blob/master/crypto/ppccap.c#L69

One drawback though is that I don't like to declare the function OPENSSL_cpuid_setup myself. How do you see it?

gut commented Apr 28, 2017

@tiran Don't worry. I see your reasons for being picky.

I analysed OpenSSL closely and I noticed that only the platform detection is important for me. I'll quote the related code below:

  1. Calling OPENSSL_cpuid_setup on ppc sets this HW detection flag: https://github.com/openssl/openssl/blob/master/crypto/ppccap.c#L342
  2. That flag can be used to call a more efficient implementation on POWER8 on hashing functions like sha256_block_data_order: https://github.com/openssl/openssl/blob/master/crypto/ppccap.c#L69

One drawback though is that I don't like to declare the function OPENSSL_cpuid_setup myself. How do you see it?

@gut

This comment has been minimized.

Show comment
Hide comment
@gut

gut Jul 11, 2017

Hi. I just got my CLA approved (check on http://bugs.python.org/issue30102 , there is an * now next to my username).
How do we trigger the @the-knights-who-say-ni to recheck this?

I hope we can continue reviewing this now. I didn't ping before as I wanted to sign the CLA first.

Thanks in advance

gut commented Jul 11, 2017

Hi. I just got my CLA approved (check on http://bugs.python.org/issue30102 , there is an * now next to my username).
How do we trigger the @the-knights-who-say-ni to recheck this?

I hope we can continue reviewing this now. I didn't ping before as I wanted to sign the CLA first.

Thanks in advance

Show outdated Hide outdated Modules/_ssl.c
Show outdated Hide outdated Modules/_ssl.c
@gut

This comment has been minimized.

Show comment
Hide comment
@gut

gut Aug 7, 2017

just solved the conflict: now the PCbuild/_hashlib.vcxproj file looks more similar to the approach of PCbuild/_ssl.vcxproj.

Is there anything else missing?

@tiran : please comment my previous reply to your review. Is that enough?

gut commented Aug 7, 2017

just solved the conflict: now the PCbuild/_hashlib.vcxproj file looks more similar to the approach of PCbuild/_ssl.vcxproj.

Is there anything else missing?

@tiran : please comment my previous reply to your review. Is that enough?

@gut

This comment has been minimized.

Show comment
Hide comment
@gut

gut Sep 5, 2017

I guess we can close this PR as I just confirmed that the approved #3112 solves the issue I'm interested on.
Thanks you all for taking your time to handle this issue.

gut commented Sep 5, 2017

I guess we can close this PR as I just confirmed that the approved #3112 solves the issue I'm interested on.
Thanks you all for taking your time to handle this issue.

@gut gut closed this Sep 5, 2017

@gut gut deleted the PPC64:improve-POWER8-performance-on-libssl branch Sep 5, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment