Human in the loop by IdirLISN · Pull Request #2443 · codalab/codabench

IdirLISN · 2026-06-24T11:40:47Z

@ mention of reviewers

A brief description of the purpose of the changes contained in this PR.

The Human in the loop (HITL) feature lets organizers enable this option for their competitions.

If this option is enabled, each submission requires validation on the compute worker side.

If no validation occurs before the timeout, the submission is marked as failed.
If the compute worker administrator validates the scoring file, the submission's score is returned to Codabench.
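The validate-or-time-out flow above could be sketched as follows. This is a minimal illustration, not the PR's code: it assumes the compute worker polls some source for the operator's decision. The names `wait_for_validation` and `get_decision`, the status strings, and treating an explicit rejection as a failure are all hypothetical.

```python
import time
from typing import Callable, Optional


def wait_for_validation(
    get_decision: Callable[[], Optional[bool]],
    timeout_s: float,
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Hypothetical HITL gate run by the compute worker before scores are sent.

    get_decision() returns True (approved), False (rejected) or None (pending).
    The returned string is the status to report to Codabench (assumed names).
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        decision = get_decision()
        if decision is True:
            return "Finished"  # approved: the scores can be sent to Codabench
        if decision is False:
            return "Failed"  # rejected: the scores are withheld (assumption)
        sleep(poll_interval_s)  # still pending: wait, then ask again
    return "Failed"  # no decision before the timeout
```

Injecting `clock` and `sleep` keeps the timeout logic testable without real waiting; in production the defaults (`time.monotonic`, `time.sleep`) would be used.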

Issues this PR resolves

It secures the scoring process by letting the compute worker side review the scoring output before it is sent to Codabench.

A checklist for hand testing

Create a submission and click the HITL button (see screenshot).
Submit and check the compute worker logs.
Check the scoring file; its path is displayed in the compute worker logs.
Choose whether or not to validate.
Check the Codabench interface and make sure the score is or is not returned, depending on the choice made in the previous step.

Caution

Compute worker changes.
The feature still needs some polish.

Checklist

Didayolo · 2026-06-26T11:29:26Z

@IdirLISN I find this feature a bit confusing.

Naming and placement

First of all, as shown on your screenshot, we have an option "Auto-run submission". When it is disabled, organizers need to validate submissions before sending them:

I understand that this new feature is a variation of it in which the validation happens at the compute worker level. However, it is confusing to have a completely different name and checkbox for a variation of the same feature.

We could, for instance, have an additional option when disabling "auto-run submissions": "Validate submissions from website" vs. "Validate submissions from compute worker", or something like this.
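One way to model this suggestion is a single setting with mutually exclusive values instead of two independent checkboxes. The sketch below is hypothetical: `ValidationMode`, `mode_from_form`, and the `"website"` / `"compute_worker"` values are illustrative names, not fields that exist in Codabench.

```python
from enum import Enum


class ValidationMode(Enum):
    """Hypothetical single setting replacing two independent checkboxes."""

    AUTO_RUN = "auto_run"  # submissions run without manual validation
    VALIDATE_ON_WEBSITE = "website"  # organizer approves in Codabench before sending
    VALIDATE_ON_COMPUTE_WORKER = "compute_worker"  # validated on the worker side


def mode_from_form(auto_run: bool, validate_from: str = "website") -> ValidationMode:
    """Map the suggested form fields to one mode.

    validate_from is only consulted when auto-run is disabled.
    """
    if auto_run:
        return ValidationMode.AUTO_RUN
    if validate_from == "website":
        return ValidationMode.VALIDATE_ON_WEBSITE
    if validate_from == "compute_worker":
        return ValidationMode.VALIDATE_ON_COMPUTE_WORKER
    raise ValueError(f"unknown validation location: {validate_from!r}")
```

With this shape, the validation location only matters once auto-run is disabled, so the two variants of manual validation live under one option.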

Also, the name "Human in the loop" is defined differently across the community, but it usually refers to setups where the evaluation itself is done by humans (not just the pre-run validation).

How does it work?

Question: concretely, how do the organizers receive and validate the submissions? Is it done directly inside the compute worker through the command line? How does it work if there are 10 workers in the queue? Did you document this?

IdirLISN · 2026-06-29T08:40:34Z

@Didayolo

This validation procedure is designed to secure datasets while they are on the compute worker side.
When the dataset provider doesn't want to give access to the data and mounts it inside the compute worker volume, we need to make sure the scoring file doesn't return anything other than metrics, and to let the dataset owner verify the scoring content before it is sent to the platform.
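A check that the scoring output contains only metrics might look like the sketch below. It assumes the scoring program writes a flat JSON object mapping metric names to numbers (for example a `scores.json` file); `check_metrics_only` and `allowed_keys` are hypothetical names, and a real scores file may contain other fields that would need to be allowed explicitly.

```python
import json
import math
from typing import Dict, Iterable, Optional


def check_metrics_only(
    raw_json: str, allowed_keys: Optional[Iterable[str]] = None
) -> Dict[str, float]:
    """Hypothetical pre-send check: accept only a flat {metric: number} object."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("scores must be a JSON object")
    allowed = set(allowed_keys) if allowed_keys is not None else None
    for key, value in data.items():
        if allowed is not None and key not in allowed:
            raise ValueError(f"unexpected key: {key!r}")
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"non-numeric value for {key!r}")
        # json.loads accepts NaN and Infinity by default; reject them too
        if not math.isfinite(value):
            raise ValueError(f"non-finite value for {key!r}")
    return {key: float(value) for key, value in data.items()}
```

Type checks like these cannot prevent data from being encoded into the numeric values themselves, so they complement rather than replace the manual review by the dataset owner.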

The organizer just activates the option in the edit section of the competition, and then someone on the compute worker (CW) side validates the scoring before it is sent.

As I progress through the feature, I will expand the PR description to make it clear.
About the name and the feature design, we should discuss them, because I'm not the only one involved.

Thank you for your review :)

IdirLISN requested review from Didayolo, ObadaS, acletournel and wlln June 24, 2026 11:41

IdirLISN marked this pull request as draft June 24, 2026 12:58

IdirLISN self-assigned this Jun 24, 2026

IdirLISN added 2 commits June 29, 2026 15:24

HITL enabled and working, needs polish and test

0ccc352

disable HITL for public CWs

8b79a61

IdirLISN force-pushed the feature/HITL branch from 753c2b6 to 8b79a61 Compare June 29, 2026 13:24

HITL secute on public CW + UI improvement

a0fe2a7

IdirLISN wants to merge 3 commits into develop from feature/HITL

2 participants
