Performance: Eliminate Regex overhead in AvoidTrailingWhitespace -> Speedup of 5% (PowerShell 5.1) or 2.5% (PowerShell 7.1-preview.2) #1465
If we use regexes anywhere else in the codebase, we could probably save some performance by just making the regex static and constructing it with |
));
continue;
}
if (line[line.Length - 1] != ' ' &&
bergmeister
Apr 27, 2020
Author
Collaborator
@rjmholt because of
- readability
- performance
- covering the variety of Unicode chars? From the docs here, it would probably be good, but what about the UnicodeCategory.LineSeparator char? I don't have enough Unicode experience to judge whether this list includes too much, tbh.

rjmholt
Apr 28, 2020
Member
My thinking here is actually just that PowerShell uses that API to see whitespace.
Given how we split the string already, it's possibly dangerous to go by unicode whitespace, but possibly not...
I suspect that really this won't make much difference; leaving non-ASCII whitespace at the ends of lines isn't something I can imagine being an issue for anyone really.
bergmeister
Apr 28, 2020
Author
Collaborator
OK, so that sounds more like a tendency to use IsWhiteSpace? I'd be OK with that. You are right that the impact is probably quite low, especially since this rule is not enabled by default for VS Code users.
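To make the trade-off in this thread concrete, here is a minimal sketch comparing a narrow ASCII check with char.IsWhiteSpace. The class and method names are hypothetical, and including the tab character in the narrow check is an assumption, since the condition in the diff fragment above is cut off after the space comparison.

```csharp
using System;

// Hypothetical helper, not part of PSScriptAnalyzer.
public static class TrailingWhitespaceChecks
{
    // Narrow check: only an ASCII space or tab counts as trailing whitespace.
    // (Checking for tab is an assumption; the diff fragment is truncated.)
    public static bool EndsWithSpaceOrTab(string line) =>
        line.Length > 0 &&
        (line[line.Length - 1] == ' ' || line[line.Length - 1] == '\t');

    // Broad check: anything char.IsWhiteSpace reports, which includes
    // Unicode space separators and the line separator U+2028.
    public static bool EndsWithUnicodeWhitespace(string line) =>
        line.Length > 0 && char.IsWhiteSpace(line[line.Length - 1]);
}
```

A line ending in U+00A0 (no-break space) or U+2028 (UnicodeCategory.LineSeparator) is flagged by the broad check but not by the narrow one, which is the difference being weighed above.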
@@ -36,52 +36,67 @@ public IEnumerable<DiagnosticRecord> AnalyzeScript(Ast ast, string fileName)
 var diagnosticRecords = new List<DiagnosticRecord>();
-string[] lines = Regex.Split(ast.Extent.Text, @"\r?\n");
+string[] lines = ast.Extent.Text.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None);
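The two split calls in this hunk can be tried side by side with a small sketch like the one below. The wrapper class and method names are hypothetical; the two expressions are taken from the diff.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper for comparing the two lines of the diff.
public static class LineSplitting
{
    // Old line: the split goes through the regex engine.
    public static string[] SplitWithRegex(string text) =>
        Regex.Split(text, @"\r?\n");

    // New line: plain string separators. "\r\n" comes first so that a
    // Windows line ending is consumed as a single separator.
    public static string[] SplitWithString(string text) =>
        text.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None);
}
```

One behavioral difference worth noting: the string version also splits on a lone "\r", which the pattern `\r?\n` does not. For input that uses only "\n" or "\r\n" line endings, both produce the same lines.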
rjmholt
Apr 27, 2020
Member
This makes me wonder: if we're just trying to find the extents of trailing whitespace, there's no need to split the string at all; we should just read through ourselves without allocating all these strings... But too much burden for this PR!
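The idea of reading through the text without splitting could look roughly like the sketch below. This is not code from the PR: the class name, the returned offset tuples, and the choice to treat only space and tab as whitespace are all assumptions made for illustration.

```csharp
using System.Collections.Generic;

// Hypothetical single-pass scanner; not part of PSScriptAnalyzer.
public static class TrailingWhitespaceScanner
{
    // Returns (Start, Length) offsets of each run of spaces/tabs that
    // sits directly before a line ending or the end of the text.
    public static List<(int Start, int Length)> Find(string text)
    {
        var results = new List<(int Start, int Length)>();
        int runStart = -1; // start of the current space/tab run, or -1

        for (int i = 0; i <= text.Length; i++)
        {
            bool atLineEnd = i == text.Length || text[i] == '\r' || text[i] == '\n';
            if (atLineEnd)
            {
                // A run that reaches the line end is trailing whitespace.
                if (runStart >= 0)
                {
                    results.Add((runStart, i - runStart));
                }
                runStart = -1;
            }
            else if (text[i] == ' ' || text[i] == '\t')
            {
                if (runStart < 0)
                {
                    runStart = i;
                }
            }
            else
            {
                // Any other character ends the run; it was not trailing.
                runStart = -1;
            }
        }

        return results;
    }
}
```

The only allocations are the result list and its entries. Note that this sketch also reports lines consisting entirely of spaces or tabs, and a real rule would still need to map the offsets back to line and column extents for its diagnostic records.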
bergmeister
Apr 27, 2020
Author
Collaborator
Hmm, yeah, I hear what you say. I guess for perf what counts is the 80-20 rule :-) Technically speaking, string.IndexOf would probably be the fastest way of finding the indices where \s\r or \s\n occurs...
I'm aware of lots of other small micro-optimisations that one can make and even tried some, but they didn't have a measurable outcome. Therefore I am focussed on just fixing what gives at least a measurable return.
6fa29cb
into
PowerShell:master


bergmeister commented Apr 27, 2020 (edited)
PR Summary
A whitespace-ignoring diff makes the change clearer. This was the most expensive script analysis rule when run in warm mode, and it was also easy to fix :-)
It also shows the performance improvements in .NET 5.
PR Checklist
- .cs, .ps1 and .psm1 files have the correct copyright header
- WIP: to the beginning of the title and remove the prefix when the PR is ready.