The Wayback Machine - https://web.archive.org/web/20200918095425/https://github.com/PowerShell/PSScriptAnalyzer/pull/1465

Performance: Eliminate Regex overhead in AvoidTrailingWhitespace -> Speedup of 5% (PowerShell 5.1) or 2.5 % (PowerShell 7.1-preview.2) #1465

Merged
merged 7 commits into from Apr 28, 2020


@bergmeister
Collaborator

bergmeister commented Apr 27, 2020

PR Summary

Viewing the diff with whitespace changes ignored makes it clearer. This was the most expensive script analysis rule when run in warm mode, and it was also easy to fix :-)
It also shows the performance improvements in .NET 5 (hence the smaller speedup on PowerShell 7.1-preview.2).

PR Checklist

@bergmeister bergmeister changed the title Performance: Eliminate Regex overhead in AvoidTrailingWhitespace -> Speedup of 5% (PowerShell 5.1) or 2.5 % (PowerShell 7) Performance: Eliminate Regex overhead in AvoidTrailingWhitespace -> Speedup of 5% (PowerShell 5.1) or 2.5 % (PowerShell 7.1-preview.2) Apr 27, 2020
@bergmeister bergmeister requested review from JamesWTruher and rjmholt Apr 27, 2020
Member

rjmholt left a comment

If we use regexes anywhere else in the codebase, we could probably save some performance by making the regex static and constructing it with RegexOptions.Compiled.
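As a minimal sketch of that suggestion (the class and field names are hypothetical, not from the codebase), the Regex is constructed once as a static field and compiled to IL with RegexOptions.Compiled, instead of being parsed on each call:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the review suggestion; not the rule's actual code.
public static class LineSplitter
{
    // Built once per process and compiled to IL, so each call skips
    // pattern parsing and runs the compiled matcher.
    private static readonly Regex s_newLine = new Regex(@"\r?\n", RegexOptions.Compiled);

    // Splits on "\n" or "\r\n", like the original Regex.Split call.
    public static string[] SplitLines(string text) => s_newLine.Split(text);
}
```

Note that the static Regex.Split(string, string) overload already keeps a small cache of parsed patterns (see Regex.CacheSize), so most of the gain from this pattern would likely come from RegexOptions.Compiled, which in turn adds a one-time startup cost.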

Rules/AvoidTrailingWhitespace.cs
));
continue;
}
if (line[line.Length - 1] != ' ' &&


@rjmholt

rjmholt Apr 27, 2020 Member

Would this be better as char.IsWhiteSpace(line[line.Length - 1])?


@bergmeister

bergmeister Apr 27, 2020 Author Collaborator

@rjmholt because of

  • readability
  • performance
  • covering the variety of Unicode chars? From the docs here, it would probably be good, but what about the UnicodeCategory.LineSeparator char? Tbh, I don't have enough Unicode experience to judge whether this list includes too much or not.


@rjmholt

rjmholt Apr 28, 2020 Member

My thinking here is actually just that PowerShell uses that API to see whitespace.

Given how we already split the string, it's possibly dangerous to go by Unicode whitespace, but possibly not...

I suspect that really this won't make much difference; leaving non-ASCII whitespace at the ends of lines isn't something I can imagine being an issue for anyone really.


@bergmeister

bergmeister Apr 28, 2020 Author Collaborator

Ok, so that sounds more like a tendency to use IsWhiteSpace? I'd be OK with that. You are right that the impact is probably quite low, especially since this rule is not enabled by default for VS Code users.
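To make the trade-off concrete, here is a hedged sketch with hypothetical helper names. The first check accepts only a space or a tab as the last character (an assumption; the review snippet above is truncated after its first comparison). The second uses char.IsWhiteSpace, which also accepts characters such as U+00A0 (no-break space) and U+2028 (UnicodeCategory.LineSeparator):

```csharp
// Hypothetical helpers contrasting the two checks discussed in review.
public static class TrailingWhitespaceCheck
{
    // ASCII-only: a line counts as having trailing whitespace only if it ends in ' ' or '\t'.
    public static bool EndsWithSpaceOrTab(string line) =>
        line.Length > 0 && (line[line.Length - 1] == ' ' || line[line.Length - 1] == '\t');

    // Unicode-aware: any char for which char.IsWhiteSpace returns true,
    // e.g. U+00A0 (no-break space) or U+2028 (line separator).
    public static bool EndsWithWhiteSpace(string line) =>
        line.Length > 0 && char.IsWhiteSpace(line[line.Length - 1]);
}
```

For example, a line ending in U+00A0 is reported by EndsWithWhiteSpace but not by EndsWithSpaceOrTab.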

Rules/AvoidTrailingWhitespace.cs
@@ -36,52 +36,67 @@ public IEnumerable<DiagnosticRecord> AnalyzeScript(Ast ast, string fileName)

var diagnosticRecords = new List<DiagnosticRecord>();

-string[] lines = Regex.Split(ast.Extent.Text, @"\r?\n");
+string[] lines = ast.Extent.Text.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None);
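The two split calls in this diff are not strictly equivalent. A small comparison sketch (the wrapper names are hypothetical) shows that the old pattern \r?\n does not treat a lone \r as a line break, while the new string.Split call does. Listing "\r\n" first makes string.Split consume a CRLF pair as one separator, because when several separators match at the same position, the earlier array element wins:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrappers around the old and new calls from the diff.
public static class SplitComparison
{
    // Old approach: only "\n" or "\r\n" are line breaks; a lone "\r" is kept in the line.
    public static string[] RegexSplit(string text) => Regex.Split(text, @"\r?\n");

    // New approach: "\r\n", "\r" and "\n" are all line breaks; "\r\n" is listed first
    // so a CRLF pair is one separator rather than "\r" followed by "\n".
    public static string[] StringSplit(string text) =>
        text.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None);
}
```

For "a\r\nb" both return two lines, but for "a\rb" RegexSplit returns a single element while StringSplit returns two.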


@rjmholt

rjmholt Apr 27, 2020 Member

This makes me wonder: if we're just trying to find the extents of trailing whitespace, there's no need to split the string at all; we should just read through ourselves without allocating all these strings... But too much burden for this PR!


@bergmeister

bergmeister Apr 27, 2020 Author Collaborator

Hmm, yeah, I hear what you're saying; I guess for perf what counts is the 80-20 rule :-) Technically speaking, string.IndexOf would probably be the fastest way of finding the indices where \s\r or \s\n occurs...
I'm aware of lots of other small micro-optimisations that one could make, and I even tried some, but they didn't have a measurable outcome. Therefore I am focussing only on fixes that give a measurable return.
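Along the lines of both comments above, trailing whitespace can be located in a single pass without allocating a string per line. This is a hypothetical sketch, not the rule's implementation: it jumps between line breaks with string.IndexOfAny, walks back over spaces and tabs, and reports 1-based line and column numbers (an assumption; the rule's actual extent and diagnostic types may differ):

```csharp
using System.Collections.Generic;

// Hypothetical allocation-light scanner; names and result shape are illustrative only.
public static class TrailingWhitespaceScanner
{
    private static readonly char[] s_lineBreaks = { '\r', '\n' };

    // Returns (Line, Column, Length) for each line that ends in spaces/tabs.
    // Line and Column are 1-based; Length is the number of trailing whitespace chars.
    public static List<(int Line, int Column, int Length)> Scan(string text)
    {
        var results = new List<(int Line, int Column, int Length)>();
        int lineStart = 0;
        int lineNumber = 1;
        while (true)
        {
            // Find the next line break; the last line runs to the end of the text.
            int lineEnd = text.IndexOfAny(s_lineBreaks, lineStart);
            if (lineEnd < 0) lineEnd = text.Length;

            // Walk backwards from the end of the line over spaces and tabs.
            int wsStart = lineEnd;
            while (wsStart > lineStart && (text[wsStart - 1] == ' ' || text[wsStart - 1] == '\t'))
                wsStart--;

            if (wsStart < lineEnd)
                results.Add((lineNumber, wsStart - lineStart + 1, lineEnd - wsStart));

            if (lineEnd == text.Length) break;

            // Treat "\r\n" as a single line break; a lone "\r" or "\n" is one char.
            bool isCrLf = text[lineEnd] == '\r' && lineEnd + 1 < text.Length && text[lineEnd + 1] == '\n';
            lineStart = lineEnd + (isCrLf ? 2 : 1);
            lineNumber++;
        }
        return results;
    }
}
```

For example, scanning "a \r\nb\t\t\nc" yields (1, 2, 1) and (2, 2, 2). Whether this is measurably faster than the Split-based version would need benchmarking, in keeping with the 80-20 point above.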

bergmeister and others added 2 commits Apr 27, 2020
Co-Authored-By: Robert Holt <rjmholt@gmail.com>
@bergmeister bergmeister merged commit 6fa29cb into PowerShell:master Apr 28, 2020
12 checks passed
PSScriptAnalyzer-CI Build #20200428.6 succeeded
PSScriptAnalyzer-CI (Build Full_Build) Build Full_Build succeeded
PSScriptAnalyzer-CI (Test Ubuntu_16_04) Test Ubuntu_16_04 succeeded
PSScriptAnalyzer-CI (Test Ubuntu_18_04) Test Ubuntu_18_04 succeeded
PSScriptAnalyzer-CI (Test Windows_Server2016_PowerShell_5_1) Test Windows_Server2016_PowerShell_5_1 succeeded
PSScriptAnalyzer-CI (Test Windows_Server2016_PowerShell_Core) Test Windows_Server2016_PowerShell_Core succeeded
PSScriptAnalyzer-CI (Test Windows_Server2019_PowerShell_5_1) Test Windows_Server2019_PowerShell_5_1 succeeded
PSScriptAnalyzer-CI (Test Windows_Server2019_PowerShell_Core) Test Windows_Server2019_PowerShell_Core succeeded
PSScriptAnalyzer-CI (Test macOS_10_14_Mojave) Test macOS_10_14_Mojave succeeded
PSScriptAnalyzer-CI (Test macOS_10_15_Catalina) Test macOS_10_15_Catalina succeeded
continuous-integration/appveyor/pr AppVeyor build succeeded
license/cla All CLA requirements met.