
WIP: Experimenting with rounding vs truncating RGB values #31134

Draft
ayshih wants to merge 3 commits into matplotlib:main from ayshih:round_cmap


@ayshih
Contributor

ayshih commented Feb 11, 2026

PR summary

This PR is just for testing for now. Per #20459 (comment), the conversion of RGB values from floats to uint8 is not handled consistently:

  • rgb2hex() rounds RGB values (this is used by the SVG backend for some artists)
  • Colormap.__call__(..., bytes=True) truncates RGB values (this is used when colorizing data arrays)

I want to see how many image-comparison tests break when changing one to match the other.
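
To make the difference between the two conversions concrete, here is a minimal NumPy sketch. It does not call rgb2hex() or Colormap itself; it only mimics the truncating cast and the rounding described above, using channel values that are exact binary fractions so that `x * 255` is computed without floating-point error.

```python
import numpy as np

# Float channel values in [0, 1]; exact binary fractions, so the products are exact.
values = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
scaled = values * 255  # 0.0, 63.75, 127.5, 191.25, 255.0

# Truncation: casting float -> uint8 drops the fractional part
# (the behavior described for Colormap.__call__(..., bytes=True)).
truncated = scaled.astype(np.uint8)

# Rounding to the nearest integer (the behavior described for rgb2hex()).
# np.round rounds exact .5 ties to the nearest even integer.
rounded = np.round(scaled).astype(np.uint8)

print(truncated.tolist())  # [0, 63, 127, 191, 255]
print(rounded.tolist())    # [0, 64, 128, 191, 255]
```

The two only disagree when the fractional part is at least 0.5, which is why a switch in either direction changes some, but not all, pixels in the baseline images.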

Update:
The score is:

  • Changing Colormap.__call__(..., bytes=True) to round RGB values breaks at least 58 tests
  • Changing rgb2hex() to truncate RGB values breaks 12 tests, all SVG

From a technical perspective, I'd lean toward rounding, but that would mean updating a lot more baseline images than going with truncating. Either way, it might make sense to lump the change into #31021 to avoid updating a bunch of baseline images twice.


@timhoffm
Member

As a note / side question: Is there a measurable performance difference? Some colormap design decisions in the past have been performance-based, since colormaps handle many values. Probably negligible, but worth a thought before going in one direction.

@anntzer
Contributor

anntzer commented Feb 11, 2026

A few additional notes that may be relevant:

  • Colormaps already quantize colors (by default to 256 values, via image.lut = 256; I'm not even sure it makes sense to change this value, which is fixed at import time), so it may make sense to have these quantized values align exactly with values representable as uint8, because that's what will usually end up being used anyway... but see the second point.
  • Currently Matplotlib only supports 8-bit color depth (per channel); however, there is interest in supporting higher bit depths (in particular floating-point buffers), see e.g. #5949 (Transparency, color mixing, gamma & linear color space?). I have implemented support for these in mplcairo.
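
A small pure-Python check related to the first point: if a LUT stores colors meant to be exactly uint8-representable (k/255), rounding maps every one back to k, whereas truncation can land one level low whenever the float product (k/255)*255 comes out a hair below k. Whether that happens for any k depends on floating-point details, so the snippet counts the affected levels instead of assuming a number. Here `int()` stands in for `astype(np.uint8)`, since both drop the fractional part of a non-negative value.

```python
def truncated(x: float) -> int:
    # Like (x * 255).astype(np.uint8) for x in [0, 1]: drop the fractional part.
    return int(x * 255)

def rounded(x: float) -> int:
    # Round to the nearest integer instead.
    return round(x * 255)

# Colors intended to be exactly uint8-representable, stored as floats k/255.
aligned = [k / 255 for k in range(256)]

round_ok = all(rounded(v) == k for k, v in enumerate(aligned))
trunc_mismatches = [k for k, v in enumerate(aligned) if truncated(v) != k]

print("rounding recovers every level:", round_ok)
print("levels shifted down by truncation:", trunc_mismatches)
```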

I'd say rgb2hex should round rgb values, though I don't have a very strong opinion there.

@ayshih
Contributor Author

ayshih commented Feb 11, 2026

Is there a measurable performance difference?

Not really: the run times stay within the typical fluctuation range. I've timed make_image() for different image sizes (NxN) and different interpolation stages ('data' vs 'rgba'). interpolation_stage='rgba' applies the colormap before resampling, which makes it noticeably slower for the 3000x3000 array, but even then, the additional rounding doesn't appreciably slow things down.

Timings of make_image() (mean ± std. dev. of 7 runs; loops per run in parentheses):

N       stage   Colormap truncating        Colormap rounding
30      data    7.8 ms ± 170 μs (100)      8.1 ms ± 299 μs (100)
30      rgba    13.7 ms ± 247 μs (100)     13.5 ms ± 138 μs (100)
300     data    7.98 ms ± 227 μs (100)     7.99 ms ± 284 μs (100)
300     rgba    18.6 ms ± 299 μs (100)     17.6 ms ± 568 μs (100)
3000    data    39.4 ms ± 452 μs (10)      40.7 ms ± 1.27 ms (10)
3000    rgba    425 ms ± 15.2 ms (1)       431 ms ± 15.5 ms (1)

@timhoffm
Member

  • Colormaps already quantize colors (by default to 256 values, via image.lut = 256; I'm not even sure it makes sense to change this value, which is fixed at import time)

The quantization is done at import time. You can set a different value via matplotlibrc and then get higher sampling, and there is a need for that (e.g. #22728). However, for ListedColormaps, this only results in linear interpolation between the list elements. LinearSegmentedColormaps can by construction be interpolated arbitrarily. But we currently do not have a way to sample a colormap from a functional relationship at a requested resolution (e.g. turbo).
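
One way to picture "sampling from a functional relationship at a requested resolution": evaluate a color function at however many points are wanted, rather than interpolating a fixed list. This is purely illustrative: `sample_functional_colormap` and `example_ramp` are hypothetical names, and `example_ramp` is a made-up formula, not turbo. The resulting array could then be handed to `matplotlib.colors.ListedColormap`.

```python
import numpy as np

def sample_functional_colormap(rgb_func, n):
    """Hypothetical helper: sample a color function on [0, 1] at n evenly
    spaced points and return an (n, 3) array of RGB floats."""
    t = np.linspace(0.0, 1.0, n)
    rgb = np.asarray(rgb_func(t), dtype=float)
    return np.clip(rgb, 0.0, 1.0)  # keep every channel in the valid [0, 1] range

def example_ramp(t):
    # Hypothetical smooth color function standing in for a real formula.
    return np.stack([t, t ** 2, 0.5 * (1.0 - t)], axis=-1)

colors = sample_functional_colormap(example_ramp, 1024)
print(colors.shape)  # (1024, 3)
```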

Colormapping speed should not depend on the size of the LUT, so the only thing that grows with the LUT is memory. Currently we have 86 colormaps x 256 entries x 3 channels x 8 bytes (note: the LUT uses floats), which comes to about 0.5 MB of storage. A factor of 2-4 would be bearable for most modern systems, but OTOH we should strive to keep our footprint low. Deferred colormap creation (#30578) could help with that, but only if we don't put all the raw data in a single Python file.
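
The storage estimate can be checked with a few lines of arithmetic, assuming 256 entries per LUT (the default image.lut), 3 channels, and float64 values; `lut_bytes` is a hypothetical helper for the calculation only.

```python
def lut_bytes(n_cmaps: int, lut_size: int, channels: int, bytes_per_value: int = 8) -> int:
    """Memory taken by the float LUTs of n_cmaps colormaps (float64 = 8 bytes/value)."""
    return n_cmaps * lut_size * channels * bytes_per_value

MIB = 2 ** 20

# 86 colormaps, 256 entries each, 3 channels, float64.
current = lut_bytes(86, 256, 3)
print(current, round(current / MIB, 2))  # 528384 0.5

# The "factor of 2-4" case, i.e. 512 or 1024 LUT entries per colormap.
for factor in (2, 4):
    print(factor, round(lut_bytes(86, 256 * factor, 3) / MIB, 2))
```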

@ayshih
Contributor Author

ayshih commented Feb 12, 2026

Ha, whoops, my timing tests of making a single image were silly. They basically just affirm that the one-time cost of generating the uint8 RGB values for the LUT is negligible compared to the rest of the image processing, which scales with image size. I guess the question is whether there's a realistic use case of Colormap() being called a large number of times for a single figure.
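
A microbenchmark that isolates only the per-call float-to-uint8 conversion, rather than a whole make_image() call, might look like the sketch below. The 256x4 random array is an assumed stand-in for a colormap's float RGBA LUT, and the timing excludes the cost of actually applying the LUT to data; no particular numbers are implied.

```python
import timeit

import numpy as np

# Assumed stand-in for a float RGBA LUT: 256 entries x 4 channels in [0, 1).
lut = np.random.default_rng(0).random((256, 4))

def to_bytes_truncate(lut):
    # Current approach: scale and cast, dropping the fractional part.
    return (lut * 255).astype(np.uint8)

def to_bytes_round(lut):
    # Candidate approach: scale, round to nearest, then cast.
    return np.round(lut * 255).astype(np.uint8)

n = 10_000
for func in (to_bytes_truncate, to_bytes_round):
    seconds = timeit.timeit(lambda: func(lut), number=n)
    print(f"{func.__name__}: {seconds / n * 1e6:.2f} μs per call")
```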

@timhoffm
Member

Not sure I understand your argument/question. Instantiating colormaps via Colormap() is rare: it is basically all an up-front cost at import time when populating the ColormapRegistry. There are no other Colormap() calls. We currently sometimes copy colormaps, but that is just copying the LUT and cheap enough (we are additionally working towards immutable colormaps, so that copies are rarely necessary).

Colormap creation is not performance-critical, and we still have means to improve it. The question with rounding is about evaluation. And thinking about it, rounding is essentially truncation after adding 0.5 to the scaled data (or LUT), so it is still cheap.
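
A minimal NumPy sketch of the 0.5-shift observation, assuming a float LUT whose entries sit at k/255 (one channel shown): adding 0.5 to the scaled values before the uint8 cast gives the same result as rounding, except at exact .5 ties, where the shift rounds half up and `np.round` rounds half to even.

```python
import numpy as np

# Assumed float LUT with entries at k/255 (a single channel for brevity).
lut = np.linspace(0.0, 1.0, 256)
scaled = lut * 255

shifted = (scaled + 0.5).astype(np.uint8)     # shift by 0.5, then truncate
rounded = np.round(scaled).astype(np.uint8)   # round half to even

# Away from exact .5 ties, shifting then truncating matches rounding.
print(np.array_equal(shifted, rounded))  # True

# At an exact tie they differ: +0.5 rounds half up, np.round rounds to even.
tie = np.array([126.5])
print((tie + 0.5).astype(np.uint8).tolist(), np.round(tie).astype(np.uint8).tolist())  # [127] [126]
```

The extra cost over plain truncation is a single addition on the scaled LUT, and it could also be folded into a precomputed uint8 LUT.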

Semi-OT: I noticed that colormaps by default evaluate on the float LUT. They only discretize to uint8 with cmap(..., bytes=True). Incidentally, this is used for images, but not for collections 👀.

@ayshih
Contributor Author

ayshih commented Feb 12, 2026

Sorry, when I wrote Colormap(), I was not referring to object instantiation, but rather to calling __call__(..., bytes=True) on a Colormap object. While the LUT with float RGB values is generated at import time, the LUT with uint8 RGB values (as currently written) is dynamically computed on every Colormap.__call__(..., bytes=True) call, namely when colorizing an image. So simply changing that one line to include a rounding step, as I did in the first commit here, without adding any caching, means that a tiny performance hit is added to every image colorized, including every redraw cycle of any image artist. I think for any realistic use case this would still be a negligible hit, because actually applying the LUT, not to mention resampling, is far more expensive for non-trivial images.

One highly unrealistic case is if someone chose to use a gigantic number of imshow() calls, one image per input data pixel, as a brute-force alternative to pcolor(). Given a huge number of image artists, it might be possible to detect a minuscule slowdown of Colormap.__call__(..., bytes=True). However, that'd be crazy, and regardless, precomputing and caching the uint8 LUT would still prevent any performance hit from the added rounding.
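
The precompute-and-cache idea could be sketched as below, under the assumption that the float LUT never changes after construction (in line with the move toward immutable colormaps). `ByteLUTCache` is a hypothetical helper, not a Matplotlib class, and the LUT layout (rows of RGBA floats) is assumed.

```python
import numpy as np

class ByteLUTCache:
    """Hypothetical helper: lazily builds a rounded uint8 copy of a float RGBA
    LUT once, so the rounding cost is not paid on every colorizing call."""

    def __init__(self, float_lut):
        # Copy and freeze the float LUT; the cache is only valid if it never changes.
        self._lut = np.array(float_lut, dtype=float)
        self._lut.flags.writeable = False
        self._byte_lut = None

    @property
    def byte_lut(self):
        if self._byte_lut is None:
            # Computed on first use only, then reused.
            self._byte_lut = np.round(self._lut * 255).astype(np.uint8)
            self._byte_lut.flags.writeable = False
        return self._byte_lut

    def __call__(self, indices):
        # Map integer indices to uint8 RGBA rows; out-of-range indices are clipped.
        return self.byte_lut.take(indices, axis=0, mode="clip")

cache = ByteLUTCache([[0.0, 0.25, 0.5, 1.0], [1.0, 0.75, 0.5, 1.0]])
print(cache([0, 1, 5]).tolist())
# [[0, 64, 128, 255], [255, 191, 128, 255], [255, 191, 128, 255]]
```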
