The Wayback Machine - https://web.archive.org/web/20221226174047/https://github.com/PowerShell/PowerShell/issues/5643

PowerShell should support creating a List similar to how it supports arrays #5643

Open
TravisEz13 opened this issue Dec 6, 2017 · 64 comments
Labels
Committee-Reviewed PS-Committee has reviewed this and made a decision Issue-Enhancement the issue is more of a feature request than a bug WG-Language parser, language semantics

Comments

@TravisEz13
Member

TravisEz13 commented Dec 6, 2017

PowerShell supports creating arrays with $array = 'a', 1, '3'. You can then add an element to the array with $array += 4, but this creates a new array (copying every existing element) on each addition, which is not performant.
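A minimal sketch of the difference described above, assuming a PowerShell 7 session: += on an array rebinds the variable to a freshly allocated array, while a System.Collections.Generic.List[object] grows in place. The variable names are illustrative only.

```powershell
# Arrays are fixed-size: += allocates a new array and rebinds the variable.
$array = 'a', 1, '3'
$before = $array
$array += 4
[object]::ReferenceEquals($before, $array)   # False: a new array was allocated
$before.Count                                # 3: the original array is unchanged

# A List<object> grows in place via .Add().
$list = [System.Collections.Generic.List[object]]::new()
$list.AddRange([object[]] ('a', 1, '3'))
$sameList = $list
$list.Add(4)
[object]::ReferenceEquals($sameList, $list)  # True: still the same instance
$list.Count                                  # 4
```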

PowerShell should have a syntax for creating lists.
Assuming the operator is @[...], you could create a list with $list = @['a', 1, '3'] and then you could add an element to the existing list with $list += 4 without PowerShell having to create a new list.
Note: this new operator might function more like @(...)

This design assumes that changing , would be a breaking change. I'm open to discussing changing , as well.

I filed this based on an offline discussion about this comment on a PR: #5625 (comment)

@rkeithhill
Collaborator

rkeithhill commented Dec 6, 2017

I like the idea, but to truly impact performance you'd need to be operating on large lists. For a convenient way to create large lists, I would expect something like this to work: @[Get-ChildItem C:\Windows -r -file *.dll -ea 0]. While the list literal form is nice, I don't see folks creating lists large enough to gain much of a perf benefit over using an array. Well, unless the list literal is created inside a busy (large n) loop.

@lzybkr
Member

lzybkr commented Dec 6, 2017

Two points:

  • @() says - make sure the thing inside is an array. It's not necessary if you use , because the comma operator always creates an array.
  • I've often wondered if the comma operator could create a list instead of an array. I have a feeling most scripts would never notice a difference because of how freely things are converted to an object array.
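To illustrate the distinction between @() and the comma operator, here is a small sketch (assumes any recent PowerShell version; the values are arbitrary):

```powershell
# The comma operator always produces an [object[]], even for one element.
$a = , 42
$a.GetType().FullName   # System.Object[]
$a.Count                # 1

# @() guarantees an array around whatever its contents produce.
$b = @(42)
$b.GetType().FullName   # System.Object[]

# Wrapping an array literal in @() adds no extra level of nesting.
$c = @(1, 2, 3)
$c.Count                # 3
```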

@TravisEz13
Member Author

TravisEz13 commented Dec 6, 2017

@lzybkr I updated my description based on your comments.

@daxian-dbw
Member

daxian-dbw commented Dec 7, 2017

I like the idea of having a list literal in PowerShell. I think it could have a syntax like @[1, 2, 3] to directly create a list with elements 1, 2 and 3 without first creating an array literal from 1,2,3 and then making it a list using @[].

@cchiu1979

cchiu1979 commented Dec 7, 2017

@lzybkr is it something like this?
1,2,3
(1,2,3).length
( , (1,2,3) ).length
( @(1,2,3) ).length

@lzybkr
Member

lzybkr commented Dec 7, 2017

Sure, alias properties would be needed to make lists work just like arrays.

@SteveL-MSFT SteveL-MSFT added the WG-Language parser, language semantics label Dec 12, 2017
@mklement0
Contributor

mklement0 commented Dec 15, 2017

I've often wondered if the comma operator could create a list instead of an array.

If that's not considered too much of a breaking change, it would certainly be the best solution.

Otherwise:

@rkeithhill:

I get what you're saying about large lists, but that's where += comes in as a convenient syntax for appending to the list (calling .Add() or .AddRange() on the [System.Collections.ArrayList] or [System.Collections.Generic.List[object]] instance behind the scenes, unlike today's behavior of +=, which either silently recreates the variable content as an array or, if the variable was type-constrained, as a new instance).

In other words: something like the following would make sense:

$al = @[] # simpler than: [System.Collections.ArrayList]::new()

for ($i = 0; $i -lt 1000; ++$i) {
  $al += $i # simpler than: $null = $al.Add($i)
}
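For comparison, the loop above can be written with today's syntax roughly as follows. This is a sketch: @[] is only a proposal, so the block spells out the full type names, and the List<object> variant is included as an assumption about what a list literal might map to.

```powershell
# Today's spelling of the proposed loop, using ArrayList.
$al = [System.Collections.ArrayList]::new()
for ($i = 0; $i -lt 1000; ++$i) {
  $null = $al.Add($i)   # ArrayList.Add() returns the new index; discard it
}

# The same with List<object>, whose Add() returns void.
$list = [System.Collections.Generic.List[object]]::new()
for ($i = 0; $i -lt 1000; ++$i) {
  $list.Add($i)
}

$al.Count     # 1000
$list.Count   # 1000
```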

@daxian-dbw
Member

daxian-dbw commented Dec 29, 2017

I submitted two PRs (WIP) with different designs for the List support in PowerShell:

  1. [WIP] Support ListExpression '@[]' in PowerShell #5762 -- Support @[], similar to @()
  2. [WIP] Support ListLiteralExpression '[]' in PowerShell #5761 -- Support ListLiteralExpression '[]', similar to ArrayLiteralExpression

@[] is my first design. However, I ran into a blocking issue regarding the closing bracket character ']'. Quoted from #5762:

@[] has a SubExpression like '@()' and '$()'. However, unlike the closing parenthesis character ')', the closing bracket character ']' doesn't always force a new token to start, and it can be included in a generic token, meaning that ']' can appear in a command name, argument, or function name. This makes it impossible for @[dir] to determine the end of the list expression because dir] will be treated as a single generic token.

This PR adds the property InListSubExpression to Tokenizer, and makes ']' a force-to-start-new-token character when _tokenizer.InListSubExpression is set. This approach solves the most common UX problem but is by no means perfect; for example, compared to @(funcHas[]inName) or @(dir has[]inpath), @[funcHas[]inName] and @[dir has[]inpath] won't work because the first ']' will force the command name to end.

Without a breaking change, I think the best we can do is probably to make ']' a force-to-start-new-token character when parsing a command invocation pipeline in @[] but not when parsing any nested expression or statement within the @[].

At the same time, I started to think an alternative -- add ListLiteralExpression like the ArrayLiteralExpression. In that case, a list can only contain Expression elements and hence command name, arguments, and function names won't be a problem for the ending bracket. PR #5761 is for that design, where we use '[]' (same token pair as TypeConstraint and Attribute).

I hope those 2 PRs can draw more discussion on the design.
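The tokenizer behavior behind the ']' problem can be observed directly in command mode. This sketch assumes that, as the quoted PR text says, ']' does not terminate a generic token, so the argument below should arrive as the single string 'dir]':

```powershell
# In argument (command) mode, ')' forces a new token but ']' does not,
# so 'dir]' is passed to Write-Output as one string argument.
$arg = Write-Output dir]
$arg          # dir]
$arg.Length   # 4
```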

@markekraus
Contributor

markekraus commented Dec 29, 2017

@daxian-dbw since this code is beyond my understanding, does it attempt to create a strongly typed list, or is always List<Object>?

@daxian-dbw
Member

daxian-dbw commented Dec 29, 2017

@markekraus It always creates a List<object>, just as @() always creates an object[].

@daxian-dbw
Member

daxian-dbw commented Dec 30, 2017

@lzybkr proposed to use new token pairs instead of @[] to represent a ListExpression in #5762 (comment):

You could consider 2 character tokens.
For example, F# uses this syntax for an array literal:

[| 1; 2 |]

There are other possibilities that probably aren't breaking changes, e.g. [< 1, 2 >].
The key here is to use a second character that can't be in a command name.

It would be great to have @[] represent ListExpression, but I'm fine with new token pairs. I initially planned to prototype with [<>], but [<>] won't work because '>]' is allowed in a generic token. [| .. |] may work. If new token pairs are acceptable, I definitely prefer ListExpression over ListLiteral.

@mklement0
Contributor

mklement0 commented Dec 30, 2017

@daxian-dbw: I'm really glad to see you take this on, but before we go any further with the syntax debate:

Is the consensus that we cannot just simply switch ,, the array construction operator, to an array-list/generic-list implementation behind the scenes, as @lzybkr hinted at - for reasons of backward compatibility?

The answer may well be that yes, it's too risky to make that change (I personally cannot tell), but if it happens to be no, after all, there's no need for a syntax debate.

@daxian-dbw
Member

daxian-dbw commented Dec 31, 2017

@mklement0 IMHO, there would be 3 problems if we simply change the comma operator ',' to return a list:

  1. The AST type name ArrayLiteralAst would be inconsistent, but changing it would be a huge breaking change. There would be other breaking changes like the returned value of StaticType property, but the AST type name would be the most problematic one I guess.
  2. With the comma operator, we wouldn't be able to create an empty list.
  3. The comma operator only takes Expression elements, not arbitrary statements like @() does; for example, compared to @(dir), you would have to use ,(dir). Besides, the comma operator doesn't unwrap the Expression value because it's a literal (ArrayLiteralAst). So ,(dir) would return a one-element list that contains an object array.

I prefer a ListExpressionAst '@[]' over a ListLiteralAst '[]' because of the 3rd one above.
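Point 3 can be sketched with an expression in place of dir, so the result is deterministic (the values are arbitrary, and this shows today's array behavior rather than a list):

```powershell
# The comma operator does not unwrap its operand, so the inner array
# becomes a single element of the outer one.
$nested = ,(1, 2, 3)
$nested.Count      # 1
$nested[0].Count   # 3

# @() yields the elements themselves, without an extra level of nesting.
$flat = @(1, 2, 3)
$flat.Count        # 3
```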

@iSazonov
Collaborator

iSazonov commented Jan 1, 2018

@daxian-dbw Thanks for great prototypes!
I'd prefer @[] if it's possible to implement. I'd be surprised to see something like [| ... |]; if there's no other way, I'd rather see a simple List( ... ) or [List]1,2,3.

  • If we have a problem with the closing ] in @[], could we use @[ 1, 2, 3 ]@, like here-string literals?
  • @@[] doesn't resolve the problem.
  • We could reuse parentheses with another prefix: if @() means array and $() means singleton, then %(), &() or *() could mean list.

I personally like *().

@markekraus
Contributor

markekraus commented Jan 1, 2018

*() would be somewhat ambiguous. Should 5*(Get-Random) throw a RuntimeException for a missing op_Multiply on List, throw a CommandNotFoundException, or multiply a random number by 5?

@daxian-dbw
Member

daxian-dbw commented Jan 1, 2018

'%(1)' is parsed into a CommandAst today, where '%' is the command name (foreach-object), and the argument is (1).
'&(1)' is parsed into a CommandAst today, where '&' is the invocation operator and the command name is (1).
'*()' is also ambiguous, as @markekraus pointed out.
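One way to check these parsing claims is to feed the snippets to the public parser API, System.Management.Automation.Language.Parser. The helper function Get-FirstPipelineElementType below is hypothetical, written only for this illustration; '5 %(1)' uses a space so that % is unambiguously the modulo operator:

```powershell
function Get-FirstPipelineElementType([string] $Source) {
  $tokens = $null
  $errors = $null
  $ast = [System.Management.Automation.Language.Parser]::ParseInput($Source, [ref] $tokens, [ref] $errors)
  # Each snippet is a single pipeline statement; report the AST type of its first element.
  $ast.EndBlock.Statements[0].PipelineElements[0].GetType().Name
}

Get-FirstPipelineElementType '%(1)'     # CommandAst: '%' is the command name
Get-FirstPipelineElementType '&(1)'     # CommandAst: '&' invokes the command (1)
Get-FirstPipelineElementType '5 %(1)'   # CommandExpressionAst: modulo expression
```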

@markekraus
Contributor

markekraus commented Jan 2, 2018

Minor correction: %{} would be ForEach-Object. %() is ambiguous with modulo, e.g. 5%(Get-Random -Minimum 1 -Maximum 5)

Also, @@[] would possibly be problematic for extended splat literals (if they ever make their way out of RFC).

Outside of the literals, I like the idea of Lists getting an accelerator, but only if it works similarly to using namespace System.Collections.Generic, which makes $MyList = [List[MyClass]]::New() easier. I would not like a [List] accelerator without the ability to set the type, unless it could play nicely by creating List<Object> by default while still allowing lists of a desired type.
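For reference, this is roughly what the using namespace approach looks like today; [int] and [object] stand in for the MyClass placeholder above, and the element values are arbitrary:

```powershell
using namespace System.Collections.Generic

# With the namespace imported, generic list types only need their short name.
$ints = [List[int]]::new()
$ints.Add(42)

# List<object> plays the role of the untyped default discussed above.
$objects = [List[object]]::new()
$objects.Add('a')
$objects.Add(1)

$ints.GetType().GenericTypeArguments[0].Name   # Int32
$objects.Count                                 # 2
```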

@iSazonov
Collaborator

iSazonov commented Jan 2, 2018

I have only one question: where can I buy a Unicode keyboard with 32000 buttons to replace my 102-key keyboard? 😄

We could combine the accelerator idea and list literals:

@[int](1,2,3)
@[string](dir C:\)
@[](1,2,3) as short cut of @[object](1,2,3)

@mklement0
Contributor

mklement0 commented Jan 5, 2018

@daxian-dbw: Thanks for the detailed feedback.

I can't speak to 1. (AST names), but perhaps the answer is to special-case @() for the @(<empty-or-scalar-or-array-literal>) cases, such as @(), @(3), or @(1, 2, 3) (note that @(<array-literal>) already is special-cased - see #4280), while leaving any @() that involves a command and/or multiple statements to work as it does now.

The alternative is to simply make @() always return a list. This has the advantage of allowing the definition of lists as a series of individual expression statements (defining an element each), obviating the need for , in multiline definitions (in which case the line breaks take the place of the statement-separating ;). The downside is that lists would be created in many situations where an array will do; while @(Get-ChildItem) is more convenient than @((Get-ChildItem)), creating a list in such a case strikes me as less important.

Again, it might be too risky, but it would solve the syntax problem.

That said, that alone wouldn't address the desire for explicit typing.

Perhaps the special casing could be tweaked to translate something like
@([string[]] (...)) into a List<string> instance.

The need for the inner (...) - due to operator precedence - makes this slightly awkward, however, and forgetting them can easily go unnoticed, because you quietly get [object[]].

On the other hand, explicit typing is a more advanced use case, and optimizing for the typical case is arguably more important.

@daxian-dbw
Member

daxian-dbw commented Jan 5, 2018

The alternative is to simply make @() always return a list.

I talked to @jpsnover about this today and he also brought up changing the semantics of @() to return a list. The downsides are:

  1. the AST type name ArrayExpressionAst being inconsistent with the semantic and StaticType property.
  2. a list would be created in some situations where you need an array, but PowerShell can convert List<object> to object[] implicitly, so this might not be an issue.

For (1), could it be OK to have this inconsistency?
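A quick sketch of the implicit conversion mentioned in point 2: a parameter typed [object[]] accepts a List<object> and receives an array. The function name Get-ParameterType is hypothetical, used only for this illustration:

```powershell
function Get-ParameterType {
  param([object[]] $Items)
  # Report the runtime type the parameter binder actually passed in.
  $Items.GetType().FullName
}

$list = [System.Collections.Generic.List[object]]::new()
$list.Add(1)
$list.Add('two')
Get-ParameterType -Items $list   # System.Object[]
```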

@lzybkr
Member

lzybkr commented Jan 5, 2018

The Ast type name doesn't matter that much.

There are many examples outside of PowerShell where the name can be misleading - ArrayList is a good one.

Lua is another good example - quoting from here.

Tables in Lua are not a data structure; they are the data structure. All structures that other languages offer---arrays, records, lists, queues, sets---are represented with tables in Lua. More to the point, tables implement all these structures efficiently.

@SteveL-MSFT SteveL-MSFT added the Issue-Enhancement the issue is more of a feature request than a bug label Jan 5, 2018
@SteveL-MSFT SteveL-MSFT added this to the 6.1.0-Consider milestone Jan 5, 2018
@SteveL-MSFT SteveL-MSFT added the Review - Committee The PR/Issue needs a review from the PowerShell Committee label Jan 5, 2018
@iSazonov
Collaborator

iSazonov commented Jan 5, 2018

but powershell can convert List<object> to object[] implicitly, so this might not be an issue.

If $a = @(1, 2, 3) defines a List, then I'd expect $a = $a + 4 and $a += 4 not to convert the List to an array; we could use $a.ToArray() for that. In that case we should also add a magic ToArray() to arrays, just as we added the magic Count, Length, Where() and ForEach().
Also I expect many customers will ask about typed lists like [int]@(1, 2, 3) or @[int](1, 2, 3).
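Today's behavior, which the comment above would change, can be sketched as follows (assuming PowerShell 7; this shows current semantics, not the proposal): an unconstrained variable silently becomes an [object[]] after +=, and a type-constrained one is converted back into a new list instance rather than extended in place.

```powershell
# Unconstrained variable: += replaces the list with a new [object[]].
$a = [System.Collections.Generic.List[object]] (1, 2, 3)
$a += 4
$a.GetType().FullName   # System.Object[]

# Type-constrained variable: the result is converted back into a new List.
[System.Collections.Generic.List[object]] $b = 1, 2, 3
$original = $b
$b += 4
$b.GetType().Name                          # List`1
[object]::ReferenceEquals($original, $b)   # False
```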

@mklement0
Contributor

mklement0 commented Jan 6, 2018

@iSazonov:

If $a = @(1, 2, 3) defines a List, then I'd expect $a = $a + 4 and $a += 4 not to convert the List to an array.

Actually, I would expect that to work with instances of any type that implements the IList interface and therefore has an .Add(Object) method - irrespective of this specific issue; see #5805

Also I expect many customers will ask about typed lists like [int]@(1, 2, 3) or @[int](1, 2, 3)

While slightly awkward, as discussed, @([int[]] (1,2,3)) has the advantage of not introducing new syntax (only new semantics).

@KirkMunro
Contributor

KirkMunro commented Jun 27, 2019

I don't get the drive error you get. Are you testing that in a session that defines a CommandNotFoundAction handler?

I suppose @: could work too. That mucks up the reference to the angry emoji though. 😠 🤣

I really like the notion that you can add a character to an enclosure prefix in your scripts and voilà, they'll use a more efficient data structure. That would be a very low cost performance enhancement for some scripts if the data structure was implemented properly with operator support for things like +=, etc.

@KirkMunro
Contributor

KirkMunro commented Jun 27, 2019

Just to put another alternative on the table:

:(1,2,3,4)

That's shorter, but : could be a command (still, it would only be a breaking change if someone had that as a command and they invoked that command by passing arguments in using round brackets).

@vexx32
Collaborator

vexx32 commented Jun 27, 2019

Nope, fresh PS7-preview1 session. 🤷‍♂

Yeah, could do, but then you lose the callback to @() a bit and the meaning is a little less clear, I feel?

@iSazonov
Collaborator

iSazonov commented Jun 27, 2019

As Jason mentioned above, lists are probably an edge case for scripts, so there's no need for syntactic sugar for creating lists. Perhaps we could just enhance the + (+=) operator to support lists and concurrent collections (other types?).
We could start with this and add syntactic sugar later if we find a compromise.

@vexx32
Collaborator

vexx32 commented Jun 27, 2019

That's definitely a no-brainer; we need the + / += support for lists and similar.

The syntactic sugar would really be nice as well though 😊

@ili101

ili101 commented Aug 31, 2020

Please consider adding support for + / +=.
Even if you ignore the performance benefit this is more natural to use, for example I just did something like this and was surprised by the error:

$MyArrayList = [System.Collections.ArrayList]@(0, 1, 3, 4)
$MyArrayList += 5 # += silently replaces the ArrayList with a new [object[]] array
$MyArrayList.Insert(2, 2) # Exception calling "Insert" with "2" argument(s): "Collection was of a fixed size."
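A sketch of the in-place alternative that avoids the error today, calling ArrayList.Add() directly (its return value, the new element's index, is discarded with $null =):

```powershell
$MyArrayList = [System.Collections.ArrayList]@(0, 1, 3, 4)
$null = $MyArrayList.Add(5)   # mutates in place; the variable stays an ArrayList
$MyArrayList.Insert(2, 2)     # works, because the collection is not fixed-size
$MyArrayList -join ','        # 0,1,2,3,4,5
```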

@vexx32
Collaborator

vexx32 commented Aug 31, 2020

I think we have an existing issue for that specifically: #5805

It came up again recently as a duplicate, but my comment there still stands: #13152 (comment)

@SteveL-MSFT
Member

SteveL-MSFT commented Oct 10, 2020

@daxian-dbw perhaps we can turn your @[] implementation w/ addition operator support into an experimental feature? As part of this, we can make the breaking change so that ] forces a new token, as it seems like a bucket 3 breaking change, and we can get real-world feedback via the experimental feature.

@vexx32
Collaborator

vexx32 commented Oct 10, 2020

@SteveL-MSFT clarification point on that -- would @[] become another subexpression operator in that case to match @() and $() or would it be more akin to () in that line breaks within it aren't permitted?

@SteveL-MSFT
Member

SteveL-MSFT commented Oct 16, 2020

@vexx32 good question. I suppose it should probably match @(), so that, hypothetically, people could just search and replace in many cases and get the benefits.

@oising
Contributor

oising commented Nov 30, 2020

Has anyone mentioned exposing a $pscollectionpreference variable where we could override which collection type is used natively for all array operators and pipeline output?

@SteveL-MSFT SteveL-MSFT modified the milestones: 7.2-Consider, 7.3-Consider Dec 7, 2020
@ghost

ghost commented Dec 25, 2020

@KirkMunro

I'm late to this conversation, but so far in PowerShell, square brackets are always used for indices. For that reason I personally don't like @[] as an enclosure.

I'm late too, but in PowerShell square brackets are also used for types: [Math]::Round(2.2). The same applies to ( and ): they are not only used for defining arrays via @(). I don't see any reason why square brackets should not be used for the proposed feature.

I'm all for @[] implementation which matches @().

@SteveL-MSFT
Member

SteveL-MSFT commented Oct 15, 2022

Reviving this. Rather than introduce a new language syntax, I think it would be simpler to just introduce the proposed [list] type accelerator which produces a [system.collections.generic.list[object]].
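Until such an accelerator ships, a comparable one can be registered per session through a non-public engine class. This is a sketch relying on an undocumented internal API (System.Management.Automation.TypeAccelerators), which may change between versions and will likely be blocked in Constrained Language Mode; the accelerator name list is an assumption matching the proposal.

```powershell
# Undocumented, non-public engine type; it may change between versions.
$accelerators = [psobject].Assembly.GetType('System.Management.Automation.TypeAccelerators')

# Hypothetical [list] accelerator mapped to List<object>, as proposed above.
$accelerators::Add('list', [System.Collections.Generic.List[object]])

# Confirm the registration by looking it up in the accelerator table.
$accelerators::Get['list'].FullName
```

Once registered, later commands in the same session should be able to use [list]::new(); the registration is process-wide rather than scoped to the current script.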

@iRon7

iRon7 commented Oct 19, 2022

... if only because this syntactic sugar proposal would likely also be supported in Constrained Language Mode.
See also: Mutual lists in Constrained Language Mode

@iSazonov
Collaborator

iSazonov commented Oct 19, 2022

The type accelerator doesn't change behavior of related operators.

@SeeminglyScience
Collaborator

SeeminglyScience commented Oct 19, 2022

I meant to comment on this thread but apparently never did. Specifically, regarding making += work that would be the first instance of += actually mutating the object on the LHS rather than returning something new.

With regards to List<> specifically, I think we'd be adding another common pitfall if it sees widespread use. The mutability of List<> is partially why you don't see it a lot in public API surfaces. It's so easy to slip up in this regard there's even an instance of mutating the caller's list in one of our public APIs (#12928).
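A sketch contrasting the two behaviors mentioned above (values arbitrary): += on an array leaves other references untouched, whereas .Add() on a shared List<T> is visible through every reference, which is the kind of accidental mutation that makes List<T> risky in public APIs.

```powershell
# Arrays: += builds a new array and rebinds only the variable on the left.
$a = 1, 2
$b = $a
$a += 3
$b.Count   # 2: $b still references the original array
$a.Count   # 3

# List<T>: .Add() mutates the shared instance, so every reference sees it.
$list = [System.Collections.Generic.List[int]]::new()
$sameList = $list
$sameList.Add(1)
$list.Count   # 1
```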

That said, I did prototype (two years ago apparently) a List<>-y implementation with comparable Add performance, but with some extra guards against mutation. Definitely not ready to be added as is, but it's an approach worth considering.


As a side note on the topic of lists and their place in PowerShell, they aren't actually better very often than simply using $myCollection = foreach ($a in $b) { }. Letting the engine collect the output is almost always better, so the number of use cases for List<> specifically isn't as high as it appears. Granted, that's not always an option; I'm not saying scenarios like adding across named blocks don't exist, just that they are not as frequent.
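A minimal sketch of letting the engine collect loop output, as suggested above; the input 1..5 is arbitrary:

```powershell
$b = 1..5
# Every value the loop body emits is collected by the engine; with more
# than one value, the result is an [object[]] array.
$myCollection = foreach ($a in $b) {
  $a * $a
}
$myCollection -join ','            # 1,4,9,16,25
$myCollection.GetType().FullName   # System.Object[]
```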

@mklement0
Contributor

mklement0 commented Oct 21, 2022

Good point that using list types is often not needed, @SeeminglyScience.
@iRon7 created a "canonical" answer on Stack Overflow that advises against += and preaches the gospel of statements and pipelines as expressions (where the engine automatically collects multiple outputs in an [object[]] array for you).

As for your "List<>-y" implementation: at least at first glance, it sounds similar to what @PetSerAl (who agrees that += shouldn't mutate the LHS, as do I now) attempted in the context of Assigning the result of an addition (+ operator) with an IList LHS back to that LHS should preserve the list (collection) type (#5805), specifically here.

Implementing our own list type that plays nicely with += without sacrificing (lots of) performance would provide two additional benefits:

  • Potentially providing syntactic sugar for construction, as previously discussed (@[...] or ....) - though finding a consensus on the syntax may be challenging.
  • Being able to avoid the pitfall mentioned by @powercode, namely List<T>'s native .ForEach() method shadowing its intrinsic (engine-provided) counterpart.

The least-effort alternative, perhaps acceptable in light of the need for lists not being as pressing as it may seem, would be:

  • Simply provide type accelerators for existing list types, say [arraylist] and [list[object]] - though I'm not sure if the latter, with its generic parameter, fits into the current type-accelerator mechanism.
  • Consider their use an advanced use case and expect users to know how those types work and their pitfalls:
    • the need to use .Add() and, for [arraylist], to additionally suppress its usually unwanted return value.
    • that for [list[T]] .ForEach() isn't PowerShell's .ForEach() method
  • By contrast, [list], while easier to type, wouldn't readily reveal its relationship with the List<Object> type it represents, and could increase the risk of falling into the .ForEach() pitfall.
  • An unpleasant pitfall that applies to any type-literal / cast-based solution is that an array on the RHS must be (...)-enclosed, though using a type constraint avoids the problem:
using namespace System.Collections.Generic

$list = [List[int]] 1..10 # !! WRONG: the cast binds more tightly than .., so this is ([List[int]] 1)..10

$list = [List[int]] (1..10) # OK
[List[int]] $list = 1..10 # OK
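The two pitfalls listed for the accelerator approach can be sketched as follows, using full type names since [arraylist] and [list[object]] are only proposed accelerators. This assumes PowerShell 7, where List<T>.ForEach(Action<T>) takes precedence over the intrinsic .ForEach():

```powershell
# [arraylist]: .Add() returns the index of the new element, which would
# otherwise leak into the output stream.
$al = [System.Collections.ArrayList]::new()
$null = $al.Add('a')
$index = $al.Add('b')
$index                   # 1

# [list[T]]: List<T> defines its own .ForEach(Action<T>), which is chosen
# over PowerShell's intrinsic .ForEach() and returns nothing.
$list = [System.Collections.Generic.List[int]] (1, 2, 3)
$result = $list.ForEach({ param($x) $x * 2 })
$null -eq $result        # True

# Enumerating into an array first reaches the intrinsic .ForEach() again.
$doubled = @($list).ForEach({ $_ * 2 })
$doubled -join ','       # 2,4,6
```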

@SeeminglyScience
Collaborator

SeeminglyScience commented Oct 21, 2022

As for your "List<>-y" implementation: at least at first glance, it sounds similar to what @PetSerAl (who agrees that += shouldn't mutate the LHS, as do I now) attempted in the context of Assigning the result of an addition (+ operator) with an IList LHS back to that LHS should preserve the list (collection) type (#5805), specifically here.

Hah! They beat me to it by two years, that's amazing. Thanks for the link ❤️

Projects
Developer Experience/SDK
  
Awaiting triage
Development

No branches or pull requests
