-
-
Notifications
You must be signed in to change notification settings - Fork 30.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
minmax function: (minitem, maxitem) in one pass #92720
Comments
|
See my comment in 2019. |
|
Here is the implementation of I have tried to match the existing API for |
|
|
The itertools module is for functions that produce iterators, not ones that consume them. If anything, min/max would go in the statistics module to be used along with quantiles and other summary statistics. That said, I don't think this should be in the standard library at all. I published a Python min/max recipe over a decade ago. It is mostly only a minor optimization (and cute trick), almost never a bottleneck in real code. What night be of more interest is a single-pass statistics module function that computed many statistics in a single pass: summary(data) -> count, total, mean, min, max, stdev, pstdev, ... However, it seems that the early design decisions for the statistics module was to opt for separate functions, to be applied in separate passes, using only pure python implementations, with efficiency being only of secondary concern. The decision was that for anything more advanced or optimized, a user should opt for scipy/numpy or other more powerful tooling. |
|
The main use case for a single pass minmax function would be iterators, since two separate passes over an iterator is not going to work :-)
But your point is taken. I withdraw the suggestion about itertools.
I don't think that a 25% speed up is a "minor optimization". But you are correct, it is unlikely to be an application bottleneck. And I don't think it is merely a cute trick, it is one of the algorithms given in "Introduction to Algorithms" by Cormen, Leiserson and Rivest. (The same Rivest of MD5 and RSA fame.)
single-pass statistics module function that computed many statistics in a single pass
That's an interesting idea. CAS calculators have similar functionality (although I don't know if they implement it in a single pass).
|
|
As for performance, the advantage over two calls of For iterators it can be written as functools.reduce(lambda x, y: (min(x[0], y), max(x[1], y)), iterator, (next(iterator),)*2)or functools.reduce(lambda x, y: ((y, x[1]) if y < x[0] else (x[0], y) if y > x[1] else x), iterator, (next(iterator),)*2)or functools.reduce(lambda x, y: sorted([*x, y])[::2], [1, 2, 3, 4, 5], (3, 3)) |
In reality, this is almost never needed. We do routinely consume iterators with min() or max(), but it is rare to need both and not also need some other information such as size.
Yes, I was just about to post this. And if the extreme values occur early in the dataset, there is essentially no relevant reduction in work. Everybody likes this cute recipe (that's why I posted it many years ago), but almost no one needs it (otherwise, I would have proposed it a decade ago). I think we should pass on this one. |
|
In Python a more direct approach seems to perform better: _object = object()
def minmax_oda(iterable, *args, key = None, default = _object):
iterable = (iterable, *args) if args else iterable
iterator = iter(iterable)
try:
minitem = maxitem = next(iterator)
except StopIteration:
if default is _object:
raise ValueError('minmax() arg is an empty sequence')
return default
if key is None:
for x in iterator:
if x > maxitem:
maxitem = x
elif x < minitem:
minitem = x
else:
maxval = minval = key(minitem)
for x in iterator:
if (val := key(x)) > maxval:
maxval, maxitem = val, x
elif val < minval:
minval, minitem = val, x
return (minitem, maxitem) |
It is possible to optimize the suggested implementation. The dream scenario would be to almost halve the time it takes to call
This would be a great idea. Even a Python implementation that calculates summary statistics in one pass may prove useful. The statistics module documentation says "[i]t is aimed at the level of graphing and scientific calculators" and as Steven said, "CAS calculators have similar functionality". Would the statistics module welcome any C implementations? The goal of the statistics module would not change: the aim would not be to compete with specialized libraries such as
I am still very much for having a one-pass However, I do also understand your points of views. |
|
I'm going to decline the suggestion, but @stevendaprano can reopen if he thinks it's important for the statistics module. Since we already have min() and max(), this is mainly being sold as a optimization. For data of a size when min() takes 60 seconds, numpy would likely be a much better choice. It is likely that other costs will dominate (i.e. loading the data), and it is likely that some other processing such as size, sum, etc will be needed. |
|
Thank you for your consideration @rhettinger. |
|
Thank you for the suggestion. |
@serhiy-storchaka @rhettinger Neither of you were talking about the solution looking at element pairs, were you? That does only 1.5 comparisons per element, instead of 2. Regardless of randomness and when the extremes occur. "need to perform two" is incorrect. |

Formed in 2009, the Archive Team (not to be confused with the archive.org Archive-It Team) is a rogue archivist collective dedicated to saving copies of rapidly dying or deleted websites for the sake of history and digital heritage. The group is 100% composed of volunteers and interested parties, and has expanded into a large amount of related projects for saving online and digital history.


Feature or enhancement
A
minmaxfunction that computes and returns (as a 2-tuple) the minimum and maximum items of an iterable in one pass. It should have the same interface asminandmax.Pitch
Many applications require computing both the minimum and maximum items of an iterable. Python code that finds the min and max in one pass tends to be slower than the two-pass algorithm of applying both
minandmax. An implementation ofminmaxcan be written by making simple modifications tomin_maxinbltinmodule.c.Here are some benchmarks with
x = range(10_000_000)andf = lambda x: -x. The benefits of the one-pass approach are more pronounced when a key function is provided.In this graph,

x = range(n)wherenranges in[2 ** k for k in range(27)]. TheO(n)complexity of all four approaches is evident. However, a one-pass algorithm is obviously more efficient.Previous discussion
Possible Implementation
This is the implementation used for the above benchmarks. It was written using
min_maxfrombltinmodule.cas a basis. It looks long primarily because there are two virtually identicalwhileloops depending on whether a key function was provided or not. It has the same interface as theminandmaxbuilt-in functions.The text was updated successfully, but these errors were encountered: