Human in the loop #2443

Draft
IdirLISN wants to merge 3 commits into develop from feature/HITL

IdirLISN commented Jun 24, 2026

Collaborator

@ mention of reviewers

@acletournel
@ObadaS
@wlln
@Didayolo

A brief description of the purpose of the changes contained in this PR.

This PR adds a Human in the loop (HITL) feature, which organizers can enable as an option for their competitions.
image

If this option is enabled, each submission requires validation on the compute worker side.
image

If no validation occurs before the timeout, the submission is marked as failed.
If the compute worker administrator validates the scoring file, the submission's score is returned to Codabench.
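
The flow described above can be sketched as a small gate on the compute worker. This is only an illustration under stated assumptions, not the code in this PR: `gate_scores`, `Outcome`, and the use of a `queue.Queue` to carry the administrator's decision are hypothetical names and choices.

```python
import enum
import queue


class Outcome(enum.Enum):
    APPROVED = "approved"    # scores may be sent back to the platform
    REJECTED = "rejected"    # administrator refused; nothing is sent
    TIMED_OUT = "timed_out"  # nobody answered; the PR maps this to a failed submission


def gate_scores(scores, decisions, timeout_s):
    """Hold `scores` until a human decision arrives on `decisions`.

    `decisions` is a queue.Queue on which the administrator's choice
    (True = validate, False = reject) is put. This transport is an
    assumption made for the sketch.
    """
    try:
        approved = decisions.get(timeout=timeout_s)
    except queue.Empty:
        # No validation within the allotted time.
        return Outcome.TIMED_OUT, None
    if approved:
        return Outcome.APPROVED, scores
    return Outcome.REJECTED, None


if __name__ == "__main__":
    decisions = queue.Queue()
    decisions.put(True)  # simulate the administrator validating
    print(gate_scores({"accuracy": 0.9}, decisions, timeout_s=1.0))
```

In this sketch the scores never leave the function unless the decision is an explicit approval, so both a rejection and a timeout result in nothing being returned.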

Issues this PR resolves

It secures the scoring process.

A checklist for hand testing

  • Create a submission and click the HITL button (see screenshot).
  • Submit and check the compute worker logs.
  • Check the scoring file at the path displayed in the compute worker logs.
  • Choose whether or not to validate it.
  • Check the Codabench interface and make sure the score is returned, or not, according to the choice made in the previous step.

Caution

Compute worker changes.
The feature still needs some polish.

Checklist

  • Code review by me
  • Hand tested by me
  • I'm proud of my work
  • Code review by reviewer
  • Hand tested by reviewer
  • CircleCI tests are passing
  • Ready to merge

@IdirLISN IdirLISN marked this pull request as draft June 24, 2026 12:58
@IdirLISN IdirLISN self-assigned this Jun 24, 2026
@Didayolo

Member

@IdirLISN I find this feature a bit confusing.

Naming and placement

First of all, as shown in your screenshot, we already have an "Auto-run submission" option. When it is disabled, organizers need to validate submissions before they are sent:

Screenshot 2026-06-26 at 13 24 25

I understand that this new feature is a variation of that option, in which the validation happens at the compute worker level. However, it is confusing to have a completely different name and a separate checkbox for a variation of the same feature.

We could, for instance, have an additional option when "Auto-run submission" is disabled: "Validate submissions from website" vs. "Validate submissions from compute worker", or something like this.

Also, the term "Human in the loop" has different definitions in the community, but it usually refers to setups where the evaluation itself is done by humans (not just the pre-run validation).

How does it work?

Question: concretely, how do the organizers receive and validate the submissions? Is it done directly inside the compute worker through the command line? How does it work if there are 10 workers in the queue? Did you document this?

IdirLISN commented Jun 29, 2026

Collaborator Author

@Didayolo

This validation procedure is meant to secure datasets that stay on the compute worker side.
When the dataset provider does not want to give access to the data and instead mounts them inside a compute worker volume, we need to make sure the scoring file does not return anything other than metrics, and to let the dataset owner verify the scoring content before it is sent to the platform.
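
As a rough illustration of what "nothing other than metrics" could mean in practice, here is a minimal sketch of an automated pre-check that could run before the human review. It assumes the scoring file is a flat JSON object mapping metric names to numbers; that format and the helper name `non_metric_keys` are assumptions for this example, not something taken from this PR.

```python
import json
import math
import numbers


def non_metric_keys(scores_text):
    """Return the keys whose values are not finite plain numbers.

    Assumes `scores_text` is a JSON object such as {"accuracy": 0.91}.
    Anything else (strings, lists, nested objects, booleans, NaN)
    is flagged so a human can look at it before the scores are sent.
    """
    data = json.loads(scores_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object mapping metric names to numbers")
    flagged = []
    for key, value in data.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(value, numbers.Real) and not isinstance(value, bool)
        if not is_number or not math.isfinite(value):
            flagged.append(key)
    return flagged


if __name__ == "__main__":
    print(non_metric_keys('{"accuracy": 0.91, "rows": ["a", "b"]}'))
```

A check like this does not replace the dataset owner's review; it only narrows down what needs to be looked at.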

The organizer just activates the option in the edit section of the competition, and then someone on the compute worker (CW) side validates the scoring before it is sent.

As I progress on the feature, I will expand the PR description to make it clearer.
We should discuss the name and the feature design together, because I'm not the only one involved.

Thank you for your review :)
