Enable NVFP4 RHT amax for grouped SReLU MLP#3133
Signed-off-by: Siddhartha Raman <sraman@nvidia.com>
fa32e3b to 79def34
Greptile Summary
This PR extends the NVFP4 RHT (Randomized Hadamard Transform) amax computation to the
Confidence Score: 5/5
Safe to merge; the new SReLU Hadamard path is gated behind both a cuDNN frontend version check and a runtime capability probe. The production path change is minimal (a renamed method accessor, a new version gate, and a small SReLU branch), and all of it mirrors the existing SwiGLU Hadamard implementation. The test extension correctly guards the quantization_tols call, and the new dedicated test exercises the exact added path. No files require special attention.
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[fuser_forward] --> B{use_nvfp4_rht_amax?}
B -- No --> Z[grouped_gemm_activation_kernel]
B -- Yes --> C{activation_supports_hadamard?}
C -- No --> Z
C -- Yes --> D[kernel_getter = grouped_gemm_act_hadamard_kernel]
D --> E{kernel available?}
E -- No --> Z
E -- Yes --> F{activation_is_srelu?}
F -- Yes --> G[act_func = srelu]
F -- No --> H[act_func = _cudnn_act_func]
G --> I[grouped_gemm_act_hadamard_kernel]
H --> I
I --> J[_group_quantize_with_amax_for_grouped_mlp]
Z --> K[norm_const_tensor path]
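The branching in the flowchart above can be sketched in plain Python. This is only an illustration of the decision order, not the PR's code: `FusionConfig`, `HADAMARD_ACTIVATIONS`, `select_kernel`, and the `default_act_func` placeholder are hypothetical names, and the strings returned simply reuse the node labels from the flowchart.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class FusionConfig:
    """Hypothetical stand-in for the state that fuser_forward inspects."""
    use_nvfp4_rht_amax: bool
    activation: str  # e.g. "srelu" or "swiglu"


# Assumption: the set of activations with a Hadamard-fused kernel.
HADAMARD_ACTIVATIONS = {"swiglu", "srelu"}


def select_kernel(
    cfg: FusionConfig,
    hadamard_kernel_getter: Callable[[], Optional[object]],
    default_act_func: str = "_cudnn_act_func",
) -> Tuple[str, Optional[str], str]:
    """Return (kernel label, act_func label, follow-up step label)."""
    # Fallback branch (node Z, then K). The flowchart does not say which
    # act_func the fallback uses, so None is returned for it here.
    fallback = ("grouped_gemm_activation_kernel", None, "norm_const_tensor path")

    if not cfg.use_nvfp4_rht_amax:
        return fallback
    if cfg.activation not in HADAMARD_ACTIVATIONS:
        return fallback
    if hadamard_kernel_getter() is None:  # kernel not available at runtime
        return fallback

    # SReLU gets its own act_func; everything else keeps the default one.
    act_func = "srelu" if cfg.activation == "srelu" else default_act_func
    return (
        "grouped_gemm_act_hadamard_kernel",
        act_func,
        "_group_quantize_with_amax_for_grouped_mlp",
    )
```

Every "No" edge in the flowchart collapses to the same fallback tuple, which is why the three guards can be written as early returns.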
Reviews (6): Last reviewed commit: "[pre-commit.ci] auto fixes from pre-comm..."
vthumbe1503
left a comment
Mostly LGTM, except for the cuDNN guard update, which I think is needed.
"""Fused grouped GEMM activation kernel that also emits NVFP4 RHT amaxes."""
try:
    from cudnn import (
        grouped_gemm_glu_hadamard_wrapper_sm100,
Do we need a newer cuDNN version to support SReLU in this kernel? If so, we should update the guard.
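For context, a guard of the kind discussed here usually combines a minimum-version check with a runtime probe for the kernel symbol. The sketch below shows that pattern in generic form only. `parse_version`, `kernel_available`, and the `min_version` argument are hypothetical, the presence of a `__version__` attribute on the backend module is an assumption, and the actual minimum cuDNN version for SReLU is not stated in this conversation.

```python
import importlib
from itertools import takewhile
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Turn a string such as '1.14.0' or '1.2.3rc1' into a comparable tuple."""
    parts = []
    for piece in text.split(".")[:3]:
        # Keep only the leading digits so suffixes like 'rc1' are ignored.
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return (parts[0], parts[1], parts[2])


def kernel_available(module_name: str, symbol: str,
                     min_version: Tuple[int, int, int]) -> bool:
    """Version gate plus capability probe for an optional backend kernel."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False  # backend not installed
    # Assumption: the backend exposes its version as `__version__`.
    version = getattr(module, "__version__", None)
    if version is None or parse_version(version) < min_version:
        return False  # too old (or unknown) for the requested feature
    return hasattr(module, symbol)  # probe for the kernel entry point
```

Callers would fall back to the non-Hadamard kernel whenever this returns `False`, so an older install degrades gracefully instead of failing at import time.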
/te-ci pytorch
Co-authored-by: vthumbe1503 <vthumbe@nvidia.com>
Signed-off-by: Siddhartha Raman Sundara Raman <sraman@nvidia.com>
Signed-off-by: vthumbe1503 <vthumbe@nvidia.com>
Signed-off-by: Siddhartha Raman Sundara Raman <sraman@nvidia.com>
a076f41 to 6ce5259
/te-ci pytorch
Set default tolerance values for quantization checks.
Signed-off-by: vthumbe1503 <vthumbe@nvidia.com>
/te-ci pytorch
for more information, see https://pre-commit.ci