X Tutup
The Wayback Machine - https://web.archive.org/web/20250624132115/https://github.com/PowerShell/PowerShell/pull/24711
Skip to content

Fix Start-Process -Wait polling #24711

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jan 28, 2025
Merged

Conversation

jborean93
Copy link
Collaborator

@jborean93 jborean93 commented Dec 27, 2024

PR Summary

Update the logic for waiting for a process and its dependencies to use a completion port rather than a spin lock that polls every second. This makes it more efficient to wait for it to complete as it will return straight away.

Also adds stopping support for Unix and -Wait and when a job isn't used to track processes as currently they do not respond to StopProcessing.

PR Context

Fixes: #24709

The new approach is the one recommended by Raymond Chen at https://devblogs.microsoft.com/oldnewthing/20130405-00/?p=4743. It sets up an IO completion port and monitors the completion codes returned by the job until we've received JOB_OBJECT_MSG_ACTIVE_PROCESS_ZERO signalling all the processes have ended inside the job.

There should not be any breaking changes here, just an improvement in how we poll for the job completion.

PR Checklist

Copy link
Collaborator

@iSazonov iSazonov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only style comments.

Also please add more info in the PR description: motivation, links to blog posts.

out completionCode,
out _,
out _,
INFINITE);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

INFINITE

Will does Ctrl-C still kill the cmdlet/pipeline?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That’s a good point, I never hooked that up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This SO post contains a reasonable-looking summary of possible ways to wake up the thread: https://stackoverflow.com/questions/32921513/how-to-force-getqueuedcompletionstatus-to-return-immediately

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we only have a single thread on the completion port, I think the cleanest option is to use PostQueuedCompletionStatus and post a custom completion to the IOCP.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added code to post the same JOB_OBJECT_MSG_ACTIVE_PROCESS_ZERO completion code when a cancellation is done. This means we don't step on any other completion codes added in the future and a simple is cancelled check means we can differentiate between no processes and cancelled.

int dwSize = 0;
const int JOB_OBJECT_BASIC_PROCESS_ID_LIST = 3;
JOBOBJECT_BASIC_PROCESS_ID_LIST JobList = new();
if (this._completionPortHandle == nint.Zero)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any specific reason not to just use == 0?

Copy link
Collaborator Author

@jborean93 jborean93 Jan 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just personal preference, I prefer to be explicit that we are working with native int types with these checks. The generated IL for both methods is practically the same.

@@ -9,7 +9,7 @@ internal static partial class Interop
{
internal static partial class Windows
{
[LibraryImport("Kernel32.dll", SetLastError = true)]
[LibraryImport("api-ms-win-core-job-l2-1-0.dll", SetLastError = true)]
Copy link
Contributor

@MatejKafka MatejKafka Dec 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure we should depend on the specific API set? Documentation recommends to link with kernel32.dll, and CsWin32 also uses kernel32.dll, so I'd follow that convention.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I'll update them to use only what is documented.

@microsoft-github-policy-service microsoft-github-policy-service bot added the Review - Needed The PR is being reviewed label Jan 3, 2025
@jborean93
Copy link
Collaborator Author

jborean93 commented Jan 4, 2025

@iSazonov someone should look at why code factor is not respecting the rules set by PowerShell. You can see the failures from missing this. at https://www.codefactor.io/repository/github/powershell/powershell/pull/24711. There are other failures but now I have no idea if they are ok to ignore or whether they are actual changes to conform to the pwsh repo. I'm not going to spend time fixing things that turn out aren't rules pwsh cares about.

This problem has existed all the way back in September, see 6736622 for a change I made to satisfy code factor and this. I never had to do it before but if CI is failing then either we need to conform to the new rule or it should be fixed so it respects the pwsh rules.

Quality software, faster.

@iSazonov
Copy link
Collaborator

iSazonov commented Jan 5, 2025

@jborean93 I filtered out two noise issues on CodeFactor site. It seems now they takes into account only .editorconfig file. At least their documentation points only to this file. I will open a new issue here so that someone migrates the Settings.StyleCop rules to this file.

Also Code Factor check is optional. Usually maintainers ask contributors to fix some CodeFactor issues. All mandatory rules are in .globalconfig file and violations of the rules will break the build forcing contributor to fix the code.

@microsoft-github-policy-service microsoft-github-policy-service bot removed the Review - Needed The PR is being reviewed label Jan 5, 2025
@jborean93
Copy link
Collaborator Author

Also Code Factor check is optional. Usually maintainers ask contributors to fix some CodeFactor issues. All mandatory rules are in .globalconfig file and violations of the rules will break the build forcing contributor to fix the code.

Thanks for looking into that. If this is optional then really the check should be removed. While this isn't probably the best place to talk about it having what looks to be a failing CI check can be annoying for contributors trying to get their PR in a ready for review state. An optional check is ultimately not a check at all and is extra noise and work for everyone involved. It should either be enforced or just removed altogether.

As for the other comments, I'll push a commit too add the safe handles back in. I've replied to some of the other review comments you've made though as to why I think they shouldn't change. Appreciate you looking through this.

@iSazonov
Copy link
Collaborator

iSazonov commented Jan 6, 2025

@jborean93 Before we continue the code review in details I want to share some conceptual thoughts.

If this is optional then really the check should be removed.

We can not remove CodeFactor because it does useful work for us. For example, look issues for XML documentation comments. Historically pwsh has thousands of empty XML doc comments. If we make CodeFactor mandatory we block whole project since it is not really fix all such issues. We can not disable such issues because we want contributors add XML doc comments in new code. So CodeFactor as "optional" check is a compromise. Our workflow is quite gentle - if the maintainer sees significant CodeFactor notices, it will ask contributor to fix them to follow the repository rules.
Also issues the CodeFactor reports comes from StyleCop analyzer. The analyzer works locally for you too. Main problem is CodeFactor has false positives from time to time. We can't influence it. So this is another argument to make it optional.
I hope this information will help make your work here a little more comfortable.


Looking on a piece of the method we change here, it is conceptually as simple as:

  • create Process object
  • write it in pipeline if needed
  • wait it if needed

But the real code, due to the previously added workarounds, has become more confusing, and with the current fixes, the situation has become even worse.
We have already fixed this code more than once, and we can assume that this will continue.
In this case, we would like to bring the code to a state more convenient for maintenance.
Now, in addition to the problems already mentioned, there are other amazing things.
For example, a JobCollection. Why is this a collection? Obviously, it was designed for use in pipeline, but we only use it only in BeginProcessing for years. It is worth not only renaming this type but also simplifying it.

Another amazing thing is the comment:

using ProcessInformation processInfo = StartWithCreateProcess(startInfo);
process = Process.GetProcessById(processInfo.ProcessId);
// Starting a process as another user might make it impossible
// to get the process handle from the S.D.Process object. Use
// the ALL_ACCESS token from CreateProcess here to setup the
// job object assignment early if -Wait was specified.
// https://github.com/PowerShell/PowerShell/issues/17033

For some reason, this comment is located after the code it refers to.

Next amazing thing is:

if (PassThru.IsPresent)
{
if (process != null)
{
WriteObject(process);
}
else
{
message = StringUtil.Format(ProcessResources.CannotStarttheProcess);
ErrorRecord er = new(new InvalidOperationException(message), "InvalidOperationException", ErrorCategory.InvalidOperation, null);
ThrowTerminatingError(er);
}
}
if (Wait.IsPresent)
{
if (process != null)
{

The process is created by constructor (throw on error) and cannot be null. So no need to check this and write terminating error (after if (Wait.IsPresent) we do the same too).

Perhaps I didn't see all the problems when I skimmed through this method.

And I wouldn't write all this if you had already fixed this code more than once and probably would again. So I would welcome converting this code into something cleaner and more maintainable.
At first glance, it would be possible to create a ProcessWating wrapper for Process with different implementations for Windows and Unix. This would make the main code very simple. The wrapper could be just a disposable struct.

@jborean93
Copy link
Collaborator Author

jborean93 commented Jan 6, 2025

and with the current fixes, the situation has become even worse.

While yes the current code is a bit of a mess I'm not sure I agree that this makes it worse. This change

  • Moves out a nested class
  • Moves out some PInvoke definitions, even some that was defined in another area for some reason
  • Simplifies some nullability checks and object initialisation
  • Removes some null condition checks around the wait handle to the more common cancellation token setup

Yes some work should be done to tidy the code up and probably unpick the attempts to share some of the code flow into their own specific paths but I don't think it should be done in the same PR. I'm happy to open another PR to try and clean some more stuff up later on though.

We have already fixed this code more than once, and we can assume that this will continue.

We've fixed similar code but not this issue, this specific issue around jobs polling every 1 second has always been around.

For some reason, this comment is located after the code it refers to.

The comment is directly for the code just below it

if (Wait)
{
    jobObject = new();
    jobAssigned = jobObject.AssignProcessToJobObject(processInfo.Process);
}

It is referring to the fact that AssignProcessToJobObject might fail unless we have the ALL_ACCESS token from creating the process. In the past we did

Process process = Process.GetProcessById(processInfo.ProcessId);
jobObject.AssignProcessToJobObject(process.SafeHandle);

This was problematic because the handle returned by process.SafeHandle may not have enough permissions in the case when we started the process with custom credentials.

The process is created by constructor (throw on error) and cannot be null. So no need to check this and write terminating error (after if (Wait.IsPresent) we do the same too).

I've not looked at this code and think it's unrelated to the problem at hand.

And I wouldn't write all this if you had already fixed this code more than once and probably would again. So I would welcome converting this code into something cleaner and more maintainable.

I certainly agree that this should be fixed but I don't think it should be rewritten at the same time as fixing an unrelated problem. Refactoring should only happen when you want to cleanup the code, not do bugfixes at the same time. Doing both at the same time is makes it harder for reviews as it has to deal with both behaviour changes and just general code changes. Doing them in smaller steps makes it easier on both the person implementing the change and the people who review the code.

@iSazonov
Copy link
Collaborator

iSazonov commented Jan 6, 2025

/azp run

Copy link

Azure Pipelines successfully started running 5 pipeline(s).

@microsoft-github-policy-service microsoft-github-policy-service bot added the Review - Needed The PR is being reviewed label Jan 15, 2025
jborean93 and others added 6 commits January 27, 2025 07:20
Update the logic for waiting for a process and its dependencies to use a
completion port rather than a spin lock that polls every second. This
makes it more efficient to wait for it to complete as it will return
straight away.
Co-authored-by: Ilya <darpa@yandex.ru>
Copy link
Collaborator

@iSazonov iSazonov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jborean93 LGTM with same style comments.
Please don't rebase without maintainer ask. Rebase kill commit history and resets previous reviews.

{
Interop.Windows.GetQueuedCompletionStatus(
_completionPort,
INFINITE,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the const and move it to the helper.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@jborean93
Copy link
Collaborator Author

Please don't rebase without maintainer ask. Rebase kill commit history and resets previous reviews.

CI says that the branch is out of date and I rebase to bring it back up to date. Using a merge pull creates merge commits and while yes a rebase requires a force push it still preserves the commit history for the PR. If you have a better suggestion then please share, personally I'm getting sick of PowerShell's CI that marks PRs as red or not ready when some things seem to be optional.

I still see the reviews, I marked them as resolved because I responded to the latest comments on there, sorry if I missed anything.

jborean93

This comment was marked as off-topic.

@iSazonov
Copy link
Collaborator

@jborean93 MSFT team use a tool for reviewing commit by commit and they ask us to keep commit history. GitHub Web GUI also doesn't like rebase - it can hide some comments (and it is very tricky to find them).
Of course, there are situations (very rarely) when we are forced to do a rebase, but at the same time we must protect the time spent by others. So it's better to coordinate rebase.

"Update branch with merge commit" button is not useful for us since we always do "Rebase and Merge" finally. I have not permission to remove the option. "Update branch with rebase" could be useful for us (mainly maintainers) if we need to rebase for running CIs before merge of very old PR.

@microsoft-github-policy-service microsoft-github-policy-service bot removed the Review - Needed The PR is being reviewed label Jan 28, 2025
@iSazonov

This comment was marked as outdated.

This comment was marked as duplicate.

@iSazonov iSazonov added the CL-General Indicates that a PR should be marked as a general cmdlet change in the Change Log label Jan 28, 2025
@iSazonov iSazonov self-assigned this Jan 28, 2025
@iSazonov iSazonov merged commit 4e79421 into PowerShell:master Jan 28, 2025
39 of 41 checks passed
Copy link
Contributor

microsoft-github-policy-service bot commented Jan 28, 2025

📣 Hey @jborean93, how did we do? We would love to hear your feedback with the link below! 🗣️

🔗 https://aka.ms/PSRepoFeedback

@jborean93 jborean93 deleted the proc-wait branch January 28, 2025 17:08
@jborean93
Copy link
Collaborator Author

Thanks for the review and merge.

@iSazonov
Copy link
Collaborator

@jborean93 Thanks for great work! I hope you will find the time to clean up the file(s) so that it will be easier for us to do the subsequent work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CL-General Indicates that a PR should be marked as a general cmdlet change in the Change Log
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Start-Process -Wait always waits for a whole number of seconds on Windows
3 participants
X Tutup