X Tutup
The Wayback Machine - https://web.archive.org/web/20221022035828/https://github.com/PowerShell/PowerShell/pull/17857
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Preserve stdout byte stream for native commands #17857

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

SeeminglyScience
Copy link
Collaborator

@SeeminglyScience SeeminglyScience commented Aug 5, 2022

Note that this is currently a proof of concept (albeit completely functional afaik). A lot of work is still needed in terms of polish and consistency. I'll try to outline what work still needs to be done below. Trying to get some feedback as early as possible for this change though so please feel free to review!

PR Summary

PR Context

Left to do:

  • Hook up provider path resolution and error handling
  • See if it's feasible to wrap as an experimental feature
  • Investigate Stream.CopyToAsync (thank you @daxian-dbw!)
  • Tests

PR Checklist

Also preserve for `native > file.txt`.

namespace System.Management.Automation;

internal sealed class AsyncByteStreamDrainer : IDisposable
Copy link
Collaborator Author

@SeeminglyScience SeeminglyScience Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really don't like this name. StreamReader implies string or I'd use that. Open to suggestions

Copy link

@AE1020 AE1020 Aug 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AsyncByteStreamRepeater?


namespace System.Management.Automation;

internal interface IStreamSource
Copy link
Collaborator Author

@SeeminglyScience SeeminglyScience Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole file is a little over engineered and needs to be rethought a bit

if (_isPreparedCalled)
{
return;
}

_isPreparedCalled = true;

DownStreamNativeCommand?.Prepare(psDefaultParameterValues);
Copy link
Collaborator Author

@SeeminglyScience SeeminglyScience Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can probably be removed. Initially I wasn't lazily retrieving the stdout stream so the downstream native command had to be prepared before the upstream. I believe I've fixed that

public ProcessOutputHandler(
Process process,
BlockingCollection<ProcessOutputObject> queue,
BytePipe stdOutDestination,
out AsyncByteStreamDrainer stdOutDrainer)
Copy link
Collaborator Author

@SeeminglyScience SeeminglyScience Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying not to too drastically alter the classes here but ProcessOutputHandler is a bit non-typical already. We were just creating an instance of it and not saving it anywhere so the constructor sort of functioned like a static method. I need to take a closer look at if it's worth rethinking this class in general as this out param doesn't help it's readability.

foreach (var redirection in redirections) {
if (redirection is MergingRedirection)
{
redirection.Bind(pipe, commandProcessor, context);
}
}
Copy link
Collaborator Author

@SeeminglyScience SeeminglyScience Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The double enumeration here was a quick band-aid for native 2>&1 > file.txt. In that scenario we want to revert to the current string reading behavior so I needed to process merging redirections before I tried to create a specialized byte pipe for a file stream.

I need to implement a better pattern here.

@@ -723,7 +757,7 @@ internal static SteppablePipeline GetSteppablePipeline(PipelineAst pipelineAst,

foreach (var commandTuple in commandTuples)
{
var commandProcessor = AddCommand(pipelineProcessor, commandTuple.Item2.ToArray(), commandTuple.Item1, commandTuple.Item3.ToArray(), context);
var commandProcessor = AddCommand(pipelineProcessor, commandTuple.Item2.ToArray(), commandTuple.Item1, commandTuple.Item3.ToArray(), context, false);
Copy link
Collaborator Author

@SeeminglyScience SeeminglyScience Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't touched steppable pipelines yet, that likely doesn't work atm.

&& FromStream is RedirectionStream.Output
&& !string.IsNullOrWhiteSpace(File))
{
nativeCommand.StdOutDestination = FileBytePipe.Create(File, Appending);
Copy link
Collaborator Author

@SeeminglyScience SeeminglyScience Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is currently just opening a file stream. It needs path resolution to be done through FileSystemProvider and it needs to surface errors similarly to how Out-File does today.

if (_inputFormat is not NativeCommandIOFormat.Text)
{
AddTextInput(input);
AddXmlInput(input);
return;
}
else // Xml

object baseObjInput = input;
if (input is PSObject pso)
{
AddXmlInput(input);
baseObjInput = pso.BaseObject;
}

if (baseObjInput is byte[] bytes)
{
_streamWriter.BaseStream.Write(bytes, 0, bytes.Length);
return;
}

if (baseObjInput is byte b)
{
_streamWriter.BaseStream.WriteByte(b);
return;
}

AddTextInput(input);
Copy link
Collaborator Author

@SeeminglyScience SeeminglyScience Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit enables piping bytes directly to native commands. Might need to be in a different PR, not strictly in scope.

return new FileBytePipe(
new FileStream(
fileName,
append ? FileMode.Append : FileMode.OpenOrCreate,
FileAccess.Write,
FileShare.ReadWrite | FileShare.Delete));
Copy link
Collaborator Author

@SeeminglyScience SeeminglyScience Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The options here likely need to be changed to match Out-File

lock (_syncObject)
{
GetStream().Write(bytes);
}
Copy link
Collaborator Author

@SeeminglyScience SeeminglyScience Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lock might not be needed here. Originally this was due to the drainers for stdout and stderr writing to the same stream when redirected, but now we revert to string reading for that.

@AE1020
Copy link

AE1020 commented Aug 6, 2022

Glad to see this being taken seriously! It is important for generic shells--especially ones that aspire to work cross platform--to be able to pass data through agnostically.

(I cloned and built Powershell just to try this PR! 😃)

The stdout redirection did appear to work on my tests (which require not introducing CR into LF-only streams). That is good! I was able to pipe | and redirect > successfully.

However, the powershell I built seems to put the redirected native outputs wherever powershell was started up, not in the current directory. So if run from some directory like:

C:\projects\> C:\projects\powershell\src\powershell-win-core\bin\Debug\net7.0\win7-x64\publish\pwsh
PowerShell 7.3.0-preview.3-267-g4faa5f99e3f6c7f959da35ce264959bb43017126
PS C:\Projects> cd test
PS C:\Projects\test> git --version > gitver.txt
PS C:\Projects\test> cat gitver.txt
Get-Content: Cannot find path 'C:\Projects\test\gitver.txt' because it does not exist.

But we find the file in the original directory:

PS C:\Projects\test> cat ..\gitver.txt
git version 2.36.1.windows.1

Piping commands that aren't native output, e.g. echo "hello" > echotest.txt act as expected, and go to the current directory.

Redirecting 2> for errors is introducing CRs when CMD.EXE does not give them, and does not have the directory issue...so I assume it is not running the new handling. (As per my post on the discussion thread, my vote is very much that stderr be covered by the same handling!)

AE1020 added a commit to AE1020/PowerShell that referenced this pull request Aug 6, 2022
I do not know much about PowerShell, but wanted to try [building a PR to help test it](PowerShell#17857 (comment)).  (The PR addresses one big reason why I *can't* use PowerShell...[pipe/redirect corruption](PowerShell#1908).)

It is nice that there is a script which will automate the build process.  But the default disposition of powershell is to reject it.  And the link suggested by the warning message given goes to a very long page, with no obvious prescription for the exact incantation to make to run the script.

I don't know if what I did is the "right" answer: (`pwsh -ExecutionPolicy Unrestricted`).  But it worked--so I'm offering it as a first draft.  If there's a better way to advise first-time-builders, then that should be said instead.  But some sort of guidance here would be very helpful.
@SeeminglyScience
Copy link
Collaborator Author

SeeminglyScience commented Aug 6, 2022

Glad to see this being taken seriously! It is important for generic shells--especially ones that aspire to work cross platform--to be able to pass data through agnostically.

(I cloned and built Powershell just to try this PR! 😃)

❤️

However, the powershell I built seems to put the redirected native outputs wherever powershell was started up, not in the current directory. So if run from some directory like:

Ah yeah that makes sense. That'd be part of what gets fixed with my comment above about hooking up path resolution via FileSystemProvider.

Redirecting 2> for errors is introducing CRs when CMD.EXE does not give them, and does not have the directory issue...so I assume it is not running the new handling. (As per my post on the discussion thread, my vote is very much that stderr be covered by the same handling!)

Yeah I can definitely understand the desire, but the only real way to do that with consistency would be to rewrite the Process class from scratch with p/invoke code specific to every platform. While that does genuinely sound like a fun thing to write, that's a ton of extra code to maintain for the level of impact it would have. I asked in that thread because if there were show stoppers around reverting to current behavior for stderr, I'd likely have to scrap the idea entirely.

Maybe in the future dotnet will add something to the SD.Process class to make it feasible, but for now this is an acceptable compromise imo.

@AE1020
Copy link

AE1020 commented Aug 6, 2022

the only real way to do that with consistency would be to rewrite the Process class from scratch with p/invoke code specific to every platform

Can you explain in more detail why this is significantly different from the stdout case?

@SeeminglyScience
Copy link
Collaborator Author

SeeminglyScience commented Aug 6, 2022

Can you explain in more detail why this is significantly different from the stdout case?

The biggest technical issue is around merging stderr into stdout. So the ideal way this would work (using Windows APIs as an example) is I would create a single pipe (or file for redirection) with kernel32!CreatePipe (or kernel32!CreateFile) that I could pass to kernel32!CreateProcess for both stdout and stderr. I don't have a generic way to do this without writing a lot of platform specific code so instead I'm reading from the Stream created by System.Diagnostics.Process for stdout.

Doing the same for stderr and trying to flush them both into the downstream's stdin Stream leads to a significant amount of inconsistency in order. That inconsistency already exists in the current string reading behavior but it's made worse when dealing with bytes directly as you can end up writing half of a line (or even half of a unicode glyph) from stdout and then another half from stderr.

This doesn't directly exclude lighting up the native 2> log.txt scenario and reverting only for native 2>&1, but it does make it less clear cut. It's easy to understand that "error is text, so it's treated like strings, and when you merge it makes stdout also like strings". It's harder to understand "neither are treated like strings unless you merge them together" if that makes sense.

It's also not necessarily a simple switch I'd be flipping to enable it for stderr as well. There are a lot of places it needs to be special cased in pipeline creation similarly to how I'm doing it for stdout but subtly different. If/when this PR is merged I'll create a separate issue that is specifically for preserving bytes when redirecting stderr so the engine WG can discuss it. This PR will be for stdout specifically, but that doesn't necessarily mean another PR can't address it.

@AE1020
Copy link

AE1020 commented Aug 7, 2022

The stdout case probably would satisfy most people, whose concerns are more about binary preservation with things like curl/zip. My issue with cross-platform consistency on CRLF is its own crusade, which is distinct from this if one is going to actually "use" PowerShell in a bigger sense. But it happens to be measurably better with this change.

I can see that interleaving the streams raises a lot of issues. I'm not clear on how it ever works (are UNIX shells at risk of doing half-codepoints in UTF-8? If not, how? Would their guarantee be accomplished by line buffering vs. understanding UTF-8 encodings specifically? If so, what happens when the line buffer size is exceeded?) 😦 Sounds like something one would find a lot of inconvenient truths by reading up on...

@SeeminglyScience
Copy link
Collaborator Author

SeeminglyScience commented Aug 8, 2022

I can see that interleaving the streams raises a lot of issues. I'm not clear on how it ever works (are UNIX shells at risk of doing half-codepoints in UTF-8? If not, how? Would their guarantee be accomplished by line buffering vs. understanding UTF-8 encodings specifically? If so, what happens when the line buffer size is exceeded?) 😦 Sounds like something one would find a lot of inconvenient truths by reading up on...

The trouble mostly comes from utilizing System.Diagnostics.Process. When you redirect a stream using that API, it creates a pipe for each stream you've redirected. Then I would be asynchronously reading from both of these separate streams and trying to merge the output after the fact. These streams will also have different buffer sizes and flush intervals.

It's much easier when you are synchronously reading from and writing to a single pipe, as most platform specific shells with less output processing will do.

@msftbot msftbot bot added the Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept label Aug 11, 2022
@msftbot msftbot bot added the Stale label Aug 26, 2022
@msftbot
Copy link

msftbot bot commented Aug 26, 2022

This pull request has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 15 days. It will be closed if no further activity occurs within 10 days of this comment.

@AE1020
Copy link

AE1020 commented Aug 29, 2022

@adityapatwardhan What sort of feedback would be needed for PowerShell to prioritize reviewing this?

It is a rather popular request (see #1908, and touched upon by some other issues), where these seem like things people expect to work in a shell:

curl.exe http://whatever/a.png > a.png

node a.js | gzip -c > out.gz

Modulo the output directory I found when trying it, it seemed to work for that.

@msftbot msftbot bot removed the Stale label Aug 29, 2022
@SeeminglyScience
Copy link
Collaborator Author

SeeminglyScience commented Aug 29, 2022

What sort of feedback would be needed for PowerShell to prioritize reviewing this?

It's not finished yet, it's a work in progress PR. You can ignore the bot, if it closed it then I'd just reopen

@msftbot msftbot bot added Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept and removed Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept labels Aug 29, 2022
{
isNativeCommand = true;
nativeCommandProcessor.UpstreamIsNativeCommand = lastCommandWasNative;
}
Copy link
Member

@daxian-dbw daxian-dbw Sep 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this won't work for pipelines invoked via PowerShell.AddCommand, given the pipeline construction won't go through the compiler code.

Similar to the usage of Runspace.CreatePipeline() and Runspace.CreateNestedPipeline, as they don't go through the compiler code either. We may need to do this within the pipeline processor.


private Task? _readToBufferTask;

public AsyncByteStreamDrainer(IStreamSource streamSource, SpanAction<byte, object?> callback, object? callbackArg, Action completedCallback)
Copy link
Member

@daxian-dbw daxian-dbw Sep 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you try the Stream.CopyToAsync API? If it works, we may not need AsyncByteStreamDrainer.

@msftbot msftbot bot added the Stale label Sep 17, 2022
@msftbot
Copy link

msftbot bot commented Sep 17, 2022

This pull request has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 15 days. It will be closed if no further activity occurs within 10 days of this comment.

@daxian-dbw daxian-dbw removed the Stale label Sep 17, 2022
@msftbot msftbot bot added the Stale label Oct 2, 2022
@msftbot
Copy link

msftbot bot commented Oct 2, 2022

This pull request has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 15 days. It will be closed if no further activity occurs within 10 days of this comment.

@pull-request-quantifier
Copy link

pull-request-quantifier bot commented Oct 7, 2022

This PR has 412 quantified lines of changes. In general, a change size of upto 200 lines is ideal for the best PR experience!


Quantification details

Label      : Extra Large
Size       : +367 -45
Percentile : 80.4%

Total files changed: 5

Change summary by file extension:
.cs : +367 -45

Change counts above are quantified counts, based on the PullRequestQuantifier customizations.

Why proper sizing of changes matters

Optimal pull request sizes drive a better predictable PR flow as they strike a
balance between between PR complexity and PR review overhead. PRs within the
optimal size (typical small, or medium sized PRs) mean:

  • Fast and predictable releases to production:
    • Optimal size changes are more likely to be reviewed faster with fewer
      iterations.
    • Similarity in low PR complexity drives similar review times.
  • Review quality is likely higher as complexity is lower:
    • Bugs are more likely to be detected.
    • Code inconsistencies are more likely to be detected.
  • Knowledge sharing is improved within the participants:
    • Small portions can be assimilated better.
  • Better engineering practices are exercised:
    • Solving big problems by dividing them in well contained, smaller problems.
    • Exercising separation of concerns within the code changes.

What can I do to optimize my changes

  • Use the PullRequestQuantifier to quantify your PR accurately
    • Create a context profile for your repo using the context generator
    • Exclude files that are not necessary to be reviewed or do not increase the review complexity. Example: Autogenerated code, docs, project IDE setting files, binaries, etc. Check out the Excluded section from your prquantifier.yaml context profile.
    • Understand your typical change complexity, drive towards the desired complexity by adjusting the label mapping in your prquantifier.yaml context profile.
    • Only use the labels that matter to you, see context specification to customize your prquantifier.yaml context profile.
  • Change your engineering behaviors
    • For PRs that fall outside of the desired spectrum, review the details and check if:
      • Your PR could be split in smaller, self-contained PRs instead
      • Your PR only solves one particular issue. (For example, don't refactor and code new features in the same PR).

How to interpret the change counts in git diff output

  • One line was added: +1 -0
  • One line was deleted: +0 -1
  • One line was modified: +1 -1 (git diff doesn't know about modified, it will
    interpret that line like one addition plus one deletion)
  • Change percentiles: Change characteristics (addition, deletion, modification)
    of this PR in relation to all other PRs within the repository.


Was this comment helpful? 👍  👌  👎 (Email)
Customize PullRequestQuantifier for this repository.

catch (Exception e) when (e.Data.Contains(typeof(ErrorRecord)))
{
// The error record is attached to the exception when thrown to preserve
// the call stack.
ErrorRecord? errorRecord = e.Data[typeof(ErrorRecord)] as ErrorRecord;
if (errorRecord is null)
{
throw;
}

e.Data.Remove(typeof(ErrorRecord));
throw new RuntimeException(null, e, errorRecord);
}
Copy link
Collaborator Author

@SeeminglyScience SeeminglyScience Oct 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the cleanest way I could think of to throw the underlying error record/exception while retaining the same exception type as Out-File without stripping the stack trace. Open to other ideas

@msftbot msftbot bot removed the Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept label Oct 7, 2022
@iSazonov
Copy link
Collaborator

iSazonov commented Oct 8, 2022

Combine sequential native pipline elements and treat them as one pipeline element. Link them with IO redirection directly instead of powershell pipeline.

This was in #559 (comment)
If I understand write it is that we want, but can not implement due to Process class limitations, so we would have to fall in low APIs?

@SeeminglyScience
Copy link
Collaborator Author

SeeminglyScience commented Oct 10, 2022

This was in #559 (comment) If I understand write it is that we want, but can not implement due to Process class limitations, so we would have to fall in low APIs?

So it kind of depends how you interpret what @vors was saying there. In a sense, that's what I'm doing in this PR.

In another sense (which is what you're getting at) there is another route that would be a much bigger lift but would be closer to how other shells implement it.

On Windows that would be using CreatePipe to create a single pipe that we pass to CreateProcess for both the upstream's stdout and the downstream's stdin. And something somewhat similar on *nix with pipe.

That would require rewriting a significant chunk of the code as we wouldn't be able to use System.Diagnostics.Process at all really. I'm definitely interested to see if I can make that work, but not in this PR. Probably something I need to pursue on my own time, and tbh highly likely to be too great of an additional maintenance burden for the value add anyway.

@msftbot msftbot bot added the Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept label Oct 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Extra Large Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants
X Tutup