Add automatic Downstream Evaluation to Modalities - Simplified Implementation by CYHSM · Pull Request #455 · Modalities/modalities

CYHSM · 2026-06-19T14:25:27Z

What does this PR do?

This PR is the bare version of #450 which works with a converted HF checkpoint. For this to work we need to first merge the conversion code from #424 and then merge this on top.

In #450 it integrates the conversion + the evaluation but @le1nux suggested to first merge @BlueCrescent PR. For more details see original PR.

Checklist before submitting final PR

My PR is minimal and addresses one issue in isolation
I have merged the latest version of the target branch into this feature branch
I have reviewed my own code w.r.t. correct implementation, missing type hints, proper documentation, etc.
I have run a sample config for model training
I have checked that all tests run through (python tests/tests.py)
I have updated the internal changelog (CHANGELOG_DEV.md)

le1nux

I think this is a good draft but we need to consider a few architectural / workflow adaptations for large-scale setups.
When we train large scale models the required training GPUs exceed the required eval GPUs by far. Also the training typically utilized most of the GPU memory, leaving no memory for evals potentially.

That's why I would not include the downstream eval into the training / eval loop of modalities and decouple this instead.

HF checkpoint conversion and downstream eval should run on dedicated, non-training nodes as new checkpoints are written out during the training.

So a better way in my opinion would be to have a CP conversion and eval service running in the background, polling e.g., every 60s for new checkpoints including explicit sentinel.

CP conversion and downstream eval can be CMD entry points within modalities.

le1nux · 2026-06-24T10:01:40Z

    mfu_calculator: PydanticMFUCalculatorABCType | None = None
    scheduled_pipeline: PydanticPipelineType | None = None
    device_mesh: PydanticDeviceMeshIFType | None = None
+    downstream_evaluator: Optional[PydanticDownstreamEvaluatorType] = None


Suggested change

downstream_evaluator: Optional[PydanticDownstreamEvaluatorType] = None

downstream_evaluator: PydanticDownstreamEvaluatorType | None = None

le1nux · 2026-06-24T10:35:41Z

        )
+
+
+class DownstreamEvaluator:


maybe we should have this implement an EvaluationIF interface

le1nux · 2026-06-24T10:46:22Z

+        cmd = self.olmes_command_template.format(
+            hf_model_dir=str(hf_model_dir),


I think running eval on the same node(s) as the training is unrealistic. Typically especially for larger models GPU memory headroom for evaluation is rather limited.

le1nux · 2026-06-24T10:48:35Z

+            logger.error(f"Failed to launch downstream evaluation: {e}")
+
+    def wait_for_evaluations(self) -> None:
+        if not hasattr(self, "active_processes") or not self.active_processes:


why so complicated? It's an instance variable.

Suggested change

if not hasattr(self, "active_processes") or not self.active_processes:

if len(self.active_processes) == 0:

le1nux · 2026-06-24T10:59:39Z

+        if getattr(components, "downstream_evaluator", None) is not None:
+            print_rank_0("\n" + "="*80)
+            print_rank_0("Training loop complete! Waiting for background evaluations to finish...")
+            print_rank_0("="*80 + "\n")
+            components.downstream_evaluator.wait_for_evaluations()
+            print_rank_0("All background evaluations completed successfully!")
+


as written in the overall feedback, I think we should exclude evaluation from the eval loop.

minimal changes for olmes integration

2e1421e

le1nux requested changes Jun 24, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add automatic Downstream Evaluation to Modalities - Simplified Implementation#455

Add automatic Downstream Evaluation to Modalities - Simplified Implementation#455
CYHSM wants to merge 1 commit into
Modalities:mainfrom
CYHSM:downstream-evaluator-simple

CYHSM commented Jun 19, 2026

Uh oh!

le1nux left a comment

Uh oh!

le1nux Jun 24, 2026

Uh oh!

le1nux Jun 24, 2026

Uh oh!

le1nux Jun 24, 2026

Uh oh!

le1nux Jun 24, 2026

Uh oh!

le1nux Jun 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	downstream_evaluator: Optional[PydanticDownstreamEvaluatorType] = None
	downstream_evaluator: PydanticDownstreamEvaluatorType \| None = None

		cmd = self.olmes_command_template.format(
		hf_model_dir=str(hf_model_dir),

	if not hasattr(self, "active_processes") or not self.active_processes:
	if len(self.active_processes) == 0:

		)


		class DownstreamEvaluator:

Uh oh!

Conversation

CYHSM commented Jun 19, 2026

What does this PR do?

Checklist before submitting final PR

Uh oh!

le1nux left a comment

Choose a reason for hiding this comment

Uh oh!

le1nux Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

le1nux Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

le1nux Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

le1nux Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

le1nux Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants