-
Notifications
You must be signed in to change notification settings - Fork 107
Expand file tree
/
Copy pathGL_EXT_float8_e5m2_e4m3.txt
More file actions
201 lines (135 loc) · 7.61 KB
/
GL_EXT_float8_e5m2_e4m3.txt
File metadata and controls
201 lines (135 loc) · 7.61 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
Name
EXT_float8_e5m2_e4m3
Name Strings
GL_EXT_float_e5m2
GL_EXT_float_e4m3
Contact
Jeff Bolz, NVIDIA (jbolz 'at' nvidia.com)
Contributors
Status
Draft.
Version
Last Modified: June 4, 2025
Revision: 1
Dependencies
This extension can be applied to OpenGL GLSL versions 4.50
(#version 450) and higher.
This extension can be applied to OpenGL ES ESSL versions 3.20
(#version 320) and higher.
This extension is written against the OpenGL Shading Language
Specification, version 4.60.8, dated August 14, 2023.
Overview
This extension introduces two new 8-bit floating point types known as "e4m3"
and "e5m2". The names describe how many exponent and mantissa bits are in
the type. These types are primarily intended for machine learning usage and
only a few operations are supported, including loading from and storing to
memory, conversion to/from other types, and optionally cooperative matrix
support.
Mapping to SPIR-V
-----------------
For informational purposes (non-normative), the following is an
expected way for an implementation to map GLSL constructs to SPIR-V
constructs:
floate5m2_t -> OpTypeFloat 8 FPEncodingFloat8E5M2EXT
floate4m3_t -> OpTypeFloat 8 FPEncodingFloat8E4M3EXT
floate5m2BitsToIntEXT,
floate5m2BitsToUintEXT,
intBitsToFloate5m2EXT,
uintBitsToFloate5m2EXT,
floate4m3BitsToIntEXT,
floate4m3BitsToUintEXT,
intBitsToFloate4m3EXT,
uintBitsToFloate4m3EXT -> OpBitcast
saturatedConvertEXT -> Op*Convert with SaturatedToLargestFloat8NormalConversionEXT
Modifications to the OpenGL Shading Language Specification, Version 4.60
Including the following lines in a shader can be used to control the
language features described in this extension:
#extension GL_EXT_float_e5m2 : <behavior>
#extension GL_EXT_float_e4m3 : <behavior>
where <behavior> is as specified in section 3.3.
GL_EXT_float_e5m2 must be enabled to use the floate5m2_t types.
GL_EXT_float_e4m3 must be enabled to use the floate4m3_t types.
New preprocessor #defines are added to the OpenGL Shading Language:
#define GL_EXT_float_e5m2 1
#define GL_EXT_float_e4m3 1
Modify Section 3.6, Keywords
(add to list of keywords)
floate5m2_t fe5m2vec2 fe5m2vec3 fe5m2vec4
floate4m3_t fe4m3vec2 fe4m3vec3 fe4m3vec4
Modify Section 4.1, Basic Types
Add entries to the table of Transparent Types:
floate5m2_t a floating-point scalar with a leading sign bit, 5 exponent bits, and 2 mantissa bits
fe5m2vec2 a two-component floate5m2_t vector
fe5m2vec3 a three-component floate5m2_t vector
fe5m2vec4 a four-component floate5m2_t vector
floate4m3_t a floating-point scalar with a leading sign bit, 4 exponent bits, and 3 mantissa bits
fe4m3vec2 a two-component floate4m3_t vector
fe4m3vec3 a three-component floate4m3_t vector
fe4m3vec4 a four-component floate4m3_t vector
Modify Section 4.1.10, Implicit Conversions
Add the following implicit conversions:
Type of expression Can be implicitly converted to
floate5m2_t bfloat16_t, float16_t, float32_t, float64_t
floate4m3_t bfloat16_t, float16_t, float32_t, float64_t
(and all corresponding vector promotions)
Modify Section 5.4.1, Conversion and Scalar Constructors
Add constructors for floate5m2_t and floate4m3_t to convert to and from
every other floating point and integer type. There are no conversion to or
from boolean types.
Modify Section 5.9, Expressions
Arithmetic operators are not supported on floate5m2_t, floate4m3_t, or
vectors of those types.
Modify Chapter 8, Built-In Functions
Add "genFE5M2Type" as an alias for floate5m2_t, fe5m2vec2, fe5m2vec3, fe5m2vec4.
Add "genFE4M3Type" as an alias for floate4m3_t, fe4m3vec2, fe4m3vec3, fe4m3vec4.
Add bitcast functions:
genI8Type floate5m2BitsToIntEXT(genFE5M2Type value);
genU8Type floate5m2BitsToUintEXT(genFE5M2Type value);
genFE5M2Type intBitsToFloate5m2EXT(genI8Type value);
genFE5M2Type uintBitsToFloate5m2EXT(genU8Type value);
genI8Type floate4m3BitsToIntEXT(genFE4M3Type value);
genU8Type floate4m3BitsToUintEXT(genFE4M3Type value);
genFE4M3Type intBitsToFloate4m3EXT(genI8Type value);
genFE4M3Type uintBitsToFloate4m3EXT(genU8Type value);
Add functions to do clamped conversions to the new types. These are
equivalent to constructing the result type from the source value, except
out of range values are replaced with the largest representable normal
value of the same sign.
void saturatedConvertEXT(out genFE5M2Type result, genF16Type value);
void saturatedConvertEXT(out genFE5M2Type result, genBF16Type value);
void saturatedConvertEXT(out genFE5M2Type result, genFType value);
void saturatedConvertEXT(out genFE5M2Type result, genDType value);
void saturatedConvertEXT(out genFE4M3Type result, genF16Type value);
void saturatedConvertEXT(out genFE4M3Type result, genBF16Type value);
void saturatedConvertEXT(out genFE4M3Type result, genFType value);
void saturatedConvertEXT(out genFE4M3Type result, genDType value);
// valid if the result type is constructible from the value type
void saturatedConvertEXT(out coopmat result, coopmat value);
Modify Section 8.X, Cooperative Matrix Functions
Add overloads for cooperative matrix load and store functions:
void coopMatLoad(out coopmat m, volatile coherent floate5m2_t[] buf, uint element, uint stride, int matrixLayout);
void coopMatLoad(out coopmat m, volatile coherent fe5m2vec2[] buf, uint element, uint stride, int matrixLayout);
void coopMatLoad(out coopmat m, volatile coherent fe5m2vec4[] buf, uint element, uint stride, int matrixLayout);
void coopMatLoad(out coopmat m, volatile coherent floate4m3_t[] buf, uint element, uint stride, int matrixLayout);
void coopMatLoad(out coopmat m, volatile coherent fe4m3vec2[] buf, uint element, uint stride, int matrixLayout);
void coopMatLoad(out coopmat m, volatile coherent fe4m3vec4[] buf, uint element, uint stride, int matrixLayout);
void coopMatStore(coopmat m, volatile coherent out floate5m2_t[] buf, uint element, uint stride, int matrixLayout);
void coopMatStore(coopmat m, volatile coherent out fe5m2vec2[] buf, uint element, uint stride, int matrixLayout);
void coopMatStore(coopmat m, volatile coherent out fe5m2vec4[] buf, uint element, uint stride, int matrixLayout);
void coopMatStore(coopmat m, volatile coherent out floate4m3_t[] buf, uint element, uint stride, int matrixLayout);
void coopMatStore(coopmat m, volatile coherent out fe4m3vec2[] buf, uint element, uint stride, int matrixLayout);
void coopMatStore(coopmat m, volatile coherent out fe4m3vec4[] buf, uint element, uint stride, int matrixLayout);
Modify Chapter 9, Shading Language Grammar for Core Profile
(Add to token list)
FLOATE5M2_T FE5M2VEC2 FE5M2VEC3 FE5M2VEC4
FLOATE4M3_T FE4M3VEC2 FE4M3VEC3 FE4M3VEC4
(Add to type_specifier_non_array)
FLOATE5M2_T FE5M2VEC2 FE5M2VEC3 FE5M2VEC4
FLOATE4M3_T FE4M3VEC2 FE4M3VEC3 FE4M3VEC4
Issues
(1) Why no conversions to/from boolean types?
RESOLVED: glslang uses OpFUnordNotEqual to convert float->bool, and this is
not supported for the new types.
Revision History
Revision 1
- Internal revisions.