New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
deps: add simdutf dependency #45803
deps: add simdutf dependency #45803
Conversation
|
Review requested: |
2c20c9a
to
6daa546
Compare
|
This would help speedup both ws and undici's WebSocket implementation (which is still WIP). When we receive a text frame or receive a close frame with a reason, we need to validate that the buffer contains valid utf-8. There are a few ways of doing so currently: a js implementation by default in both undici and ws, and optionally a package such as utf-8-validate. Note that simdutf is many times faster than the c++ version of utf-8-validate in the benchmark above, and the js fallback version is the slowest. Here is a PR from @lpinca that shows massive speedups when using simdutf: websockets/utf-8-validate#101. Considering how widespread usage of ws is, exposing a very fast ability to validate utf-8 would improve a ton of the ecosystem. |
5027cae
to
e94ba5f
Compare
bed88cc
to
4269faf
Compare
ced7ef2
to
5566c99
Compare
|
@lpinca Please open an issue upstream ( |
|
I will do it tomorrow. Anway as written in #45803 (comment) those 1.3 million downloads a week are basically all |
PR-URL: #45803 Reviewed-By: Robert Nagy <ronagy@icloud.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Michael Dawson <midawson@redhat.com>
PR-URL: #45803 Reviewed-By: Robert Nagy <ronagy@icloud.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Michael Dawson <midawson@redhat.com>
Co-authored-by: Daniel Lemire <daniel@lemire.me> PR-URL: #45803 Reviewed-By: Robert Nagy <ronagy@icloud.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Michael Dawson <midawson@redhat.com>
Notable changes: buffer: * (SEMVER-MINOR) add buffer.isUtf8 for utf8 validation (Yagiz Nizipli) #45947 deps: * disable avx512 for simutf on benchmark ci (Yagiz Nizipli) #45803 * add simdutf dependency (Yagiz Nizipli) #45803 http: * (SEMVER-MINOR) improved timeout defaults handling (Paolo Insogna) #45778 net * add autoSelectFamily global getter and setter (Paolo Insogna) #45777 os: * (SEMVER-MINOR) add availableParallelism() (Colin Ihrig) #45895 util: * add fast path for text-decoder fatal flag (Yagiz Nizipli) #45803 PR-URL: TBD
Notable changes: buffer: * (SEMVER-MINOR) add buffer.isUtf8 for utf8 validation (Yagiz Nizipli) #45947 deps: * disable avx512 for simutf on benchmark ci (Yagiz Nizipli) #45803 * add simdutf dependency (Yagiz Nizipli) #45803 http: * (SEMVER-MINOR) improved timeout defaults handling (Paolo Insogna) #45778 net * add autoSelectFamily global getter and setter (Paolo Insogna) #45777 os: * (SEMVER-MINOR) add availableParallelism() (Colin Ihrig) #45895 util: * add fast path for text-decoder fatal flag (Yagiz Nizipli) #45803 PR-URL: #46061
Notable changes: buffer: * (SEMVER-MINOR) add buffer.isUtf8 for utf8 validation (Yagiz Nizipli) #45947 deps: * disable avx512 for simutf on benchmark ci (Yagiz Nizipli) #45803 * add simdutf dependency (Yagiz Nizipli) #45803 http: * (SEMVER-MINOR) improved timeout defaults handling (Paolo Insogna) #45778 net * add autoSelectFamily global getter and setter (Paolo Insogna) #45777 os: * (SEMVER-MINOR) add availableParallelism() (Colin Ihrig) #45895 util: * add fast path for text-decoder fatal flag (Yagiz Nizipli) #45803 PR-URL: #46061
Notable changes: buffer: * (SEMVER-MINOR) add buffer.isUtf8 for utf8 validation (Yagiz Nizipli) #45947 deps: * disable avx512 for simutf on benchmark ci (Yagiz Nizipli) #45803 * add simdutf dependency (Yagiz Nizipli) #45803 http: * (SEMVER-MINOR) improved timeout defaults handling (Paolo Insogna) #45778 net * add autoSelectFamily global getter and setter (Paolo Insogna) #45777 os: * (SEMVER-MINOR) add availableParallelism() (Colin Ihrig) #45895 util: * add fast path for text-decoder fatal flag (Yagiz Nizipli) #45803 PR-URL: #46061
Notable changes: buffer: * (SEMVER-MINOR) add buffer.isUtf8 for utf8 validation (Yagiz Nizipli) #45947 deps: * disable avx512 for simutf on benchmark ci (Yagiz Nizipli) #45803 * add simdutf dependency (Yagiz Nizipli) #45803 http: * (SEMVER-MINOR) improved timeout defaults handling (Paolo Insogna) #45778 net * add autoSelectFamily global getter and setter (Paolo Insogna) #45777 os: * (SEMVER-MINOR) add availableParallelism() (Colin Ihrig) #45895 util: * add fast path for text-decoder fatal flag (Yagiz Nizipli) #45803 PR-URL: #46061
Notable changes: buffer: * (SEMVER-MINOR) add buffer.isUtf8 for utf8 validation (Yagiz Nizipli) #45947 deps: * disable avx512 for simutf on benchmark ci (Yagiz Nizipli) #45803 * add simdutf dependency (Yagiz Nizipli) #45803 http: * (SEMVER-MINOR) improved timeout defaults handling (Paolo Insogna) #45778 net * add autoSelectFamily global getter and setter (Paolo Insogna) #45777 os: * (SEMVER-MINOR) add availableParallelism() (Colin Ihrig) #45895 util: * add fast path for text-decoder fatal flag (Yagiz Nizipli) #45803 PR-URL: #46061
PR-URL: #45803 Reviewed-By: Robert Nagy <ronagy@icloud.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Michael Dawson <midawson@redhat.com>
PR-URL: #45803 Reviewed-By: Robert Nagy <ronagy@icloud.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Michael Dawson <midawson@redhat.com>
Co-authored-by: Daniel Lemire <daniel@lemire.me> PR-URL: #45803 Reviewed-By: Robert Nagy <ronagy@icloud.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Michael Dawson <midawson@redhat.com>
Notable changes: buffer: * (SEMVER-MINOR) add buffer.isUtf8 for utf8 validation (Yagiz Nizipli) #45947 deps: * disable avx512 for simutf on benchmark ci (Yagiz Nizipli) #45803 * add simdutf dependency (Yagiz Nizipli) #45803 http: * (SEMVER-MINOR) improved timeout defaults handling (Paolo Insogna) #45778 net * add autoSelectFamily global getter and setter (Paolo Insogna) #45777 os: * (SEMVER-MINOR) add availableParallelism() (Colin Ihrig) #45895 util: * add fast path for text-decoder fatal flag (Yagiz Nizipli) #45803 PR-URL: #46061
Notable changes: buffer: * (SEMVER-MINOR) add buffer.isUtf8 for utf8 validation (Yagiz Nizipli) #45947 deps: * disable avx512 for simutf on benchmark ci (Yagiz Nizipli) #45803 * add simdutf dependency (Yagiz Nizipli) #45803 http: * (SEMVER-MINOR) improved timeout defaults handling (Paolo Insogna) #45778 net * add autoSelectFamily global getter and setter (Paolo Insogna) #45777 os: * (SEMVER-MINOR) add availableParallelism() (Colin Ihrig) #45895 util: * add fast path for text-decoder fatal flag (Yagiz Nizipli) #45803 PR-URL: #46061
Notable changes: buffer: * (SEMVER-MINOR) add buffer.isUtf8 for utf8 validation (Yagiz Nizipli) #45947 http: * (SEMVER-MINOR) improved timeout defaults handling (Paolo Insogna) #45778 net * add autoSelectFamily global getter and setter (Paolo Insogna) #45777 os: * (SEMVER-MINOR) add availableParallelism() (Colin Ihrig) #45895 util: * add fast path for text-decoder fatal flag (Yagiz Nizipli) #45803 PR-URL: #46061
Notable changes: buffer: * (SEMVER-MINOR) add buffer.isUtf8 for utf8 validation (Yagiz Nizipli) #45947 http: * (SEMVER-MINOR) improved timeout defaults handling (Paolo Insogna) #45778 net * add autoSelectFamily global getter and setter (Paolo Insogna) #45777 os: * (SEMVER-MINOR) add availableParallelism() (Colin Ihrig) #45895 util: * add fast path for text-decoder fatal flag (Yagiz Nizipli) #45803 PR-URL: #46061
PR-URL: #45803 Reviewed-By: Robert Nagy <ronagy@icloud.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Michael Dawson <midawson@redhat.com>
PR-URL: #45803 Reviewed-By: Robert Nagy <ronagy@icloud.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Michael Dawson <midawson@redhat.com>
Co-authored-by: Daniel Lemire <daniel@lemire.me> PR-URL: #45803 Reviewed-By: Robert Nagy <ronagy@icloud.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Michael Dawson <midawson@redhat.com>
Notable changes: buffer: * (SEMVER-MINOR) add buffer.isUtf8 for utf8 validation (Yagiz Nizipli) #45947 http: * (SEMVER-MINOR) improved timeout defaults handling (Paolo Insogna) #45778 net * add autoSelectFamily global getter and setter (Paolo Insogna) #45777 os: * (SEMVER-MINOR) add availableParallelism() (Colin Ihrig) #45895 util: * add fast path for text-decoder fatal flag (Yagiz Nizipli) #45803 PR-URL: #46061
Notable changes: buffer: * (SEMVER-MINOR) add buffer.isUtf8 for utf8 validation (Yagiz Nizipli) #45947 http: * (SEMVER-MINOR) improved timeout defaults handling (Paolo Insogna) #45778 net * add autoSelectFamily global getter and setter (Paolo Insogna) #45777 os: * (SEMVER-MINOR) add availableParallelism() (Colin Ihrig) #45895 util: * add fast path for text-decoder fatal flag (Yagiz Nizipli) #45803 PR-URL: #46061
Notable changes: buffer: * (SEMVER-MINOR) add buffer.isUtf8 for utf8 validation (Yagiz Nizipli) #45947 http: * (SEMVER-MINOR) improved timeout defaults handling (Paolo Insogna) #45778 net * add autoSelectFamily global getter and setter (Paolo Insogna) #45777 os: * (SEMVER-MINOR) add availableParallelism() (Colin Ihrig) #45895 util: * add fast path for text-decoder fatal flag (Yagiz Nizipli) #45803 PR-URL: #46061


simdutf provides a faster way of providing utf8 operations with SIMD instructions. @nodejs/undici team was looking for a way to validate utf8 input, and this dependency can make it happen.
Edit: I'm proposing either exposing the following functionality through a new module (like
node:encoding) or throughutil.typesorbuffervalidate_ascii(string)validate_utf8(string)count_utf8(string)PS:
simdutfsupports more features, and depending on the need, it makes more sense to expose them through a new module, instead ofutil.typesorbuffer.