Skip to content

fix(server): avoid loading huge task results for metadata queries#3060

Open
contrueCT wants to merge 2 commits into
apache:masterfrom
contrueCT:task/avoid-loading-task-result
Open

fix(server): avoid loading huge task results for metadata queries#3060
contrueCT wants to merge 2 commits into
apache:masterfrom
contrueCT:task/avoid-loading-task-result

Conversation

@contrueCT

Copy link
Copy Markdown
Contributor

Purpose of the PR

Fix task metadata/recovery paths that can still load and decompress large task results, making GET /tasks, task restore, and DELETE /tasks/{id}?force=true unusable when historical tasks contain huge result payloads.

This is related to #3057 and #3059. The PR focuses on metadata-only access and cleanup paths, while large Gremlin result export/chunking can be handled separately.

Main Changes

  • Add scheduler APIs that allow task reads with or without result payloads.
  • Add metadata-only task deserialization that avoids reading ~task_result.
  • Keep GET /tasks/{id} default behavior compatible, and add with_result=false.
  • Make GET /tasks, task restore, and delete paths use metadata-only reads.
  • Add lightweight task vertex deletion to avoid GraphTransaction.removeVertex() force-loading indexed task vertices.
  • Avoid loading separated ~taskresult vertices when cleaning distributed task results.
  • Return a result-stripped copy for in-memory tasks when withResult=false.

Verifying these changes

  • Trivial rework / code cleanup without any test coverage. (No Need)
  • Already covered by existing tests, such as (please modify tests here).
  • Need tests and can be verified as follows:
    • mvn test -pl hugegraph-server/hugegraph-test -am -P core-test,rocksdb -Dtest=TaskCoreTest -DfailIfNoTests=false -Dcheckstyle.skip=true -Drat.skip=true
    • mvn compile -pl hugegraph-server/hugegraph-api -am -DskipTests=true -Dcheckstyle.skip=true -Drat.skip=true
    • mvn test-compile -pl hugegraph-server/hugegraph-test -am -P api-test,rocksdb -Dtest=TaskApiTest -DskipTests=true -Dcheckstyle.skip=true -Drat.skip=true

Does this PR potentially affect the following parts?

Documentation Status

  • Doc - TODO
  • Doc - Done
  • Doc - No Need

@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. api Changes of API bug Something isn't working labels Jun 17, 2026
@codecov

codecov Bot commented Jun 17, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 59.84252% with 51 lines in your changes missing coverage. Please review.
✅ Project coverage is 29.90%. Comparing base (c3f56b5) to head (47fb2f9).

Files with missing lines Patch % Lines
.../apache/hugegraph/task/TaskAndResultScheduler.java 0.00% 26 Missing ⚠️
...g/apache/hugegraph/task/StandardTaskScheduler.java 70.37% 7 Missing and 1 partial ⚠️
.../main/java/org/apache/hugegraph/task/HugeTask.java 86.84% 2 Missing and 3 partials ⚠️
...pache/hugegraph/task/DistributedTaskScheduler.java 0.00% 4 Missing ⚠️
...ava/org/apache/hugegraph/task/TaskTransaction.java 76.47% 1 Missing and 3 partials ⚠️
.../org/apache/hugegraph/auth/HugeGraphAuthProxy.java 57.14% 3 Missing ⚠️
...ain/java/org/apache/hugegraph/api/job/TaskAPI.java 80.00% 1 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (c3f56b5) and HEAD (47fb2f9). Click for more details.

HEAD has 2 uploads less than BASE
Flag BASE (c3f56b5) HEAD (47fb2f9)
3 1
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #3060      +/-   ##
============================================
- Coverage     36.10%   29.90%   -6.20%     
+ Complexity      338      264      -74     
============================================
  Files           803      804       +1     
  Lines         68291    68382      +91     
  Branches       8970     8982      +12     
============================================
- Hits          24656    20452    -4204     
- Misses        40990    45564    +4574     
+ Partials       2645     2366     -279     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@imbajin imbajin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking: yes. Summary: Found a scheduler API compatibility regression and a failing targeted test. Evidence: TaskCoreTest#testTaskWithoutResult failed locally; codecov/project is also failing.


🔗 Please also check the failed codecov/project status: https://app.codecov.io/gh/apache/hugegraph/pull/3060

imbajin

This comment was marked as outdated.

@dosubot dosubot Bot added the lgtm This PR has been approved by a maintainer label Jun 19, 2026

@imbajin imbajin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the current head looks directionally good and preserves the default GET /tasks/{id} behavior. Could you please add or clarify a bit more regression coverage before merge?

Two areas seem worth covering:

  1. A regression case with a genuinely large/compressed task result, verifying metadata-only paths such as GET /tasks, task restore/scheduler loops, and with_result=false do not load/decompress task_result.
  2. The distributed task/result delete path: please either make the ~task + ~taskresult cleanup failure-safe/atomic, or add a targeted test/explanation for the current two-step delete path. Since TaskTransaction auto-commits after removeTaskVertex(), a failure between deleting ~taskresult and deleting ~task could otherwise leave a live task whose result has already been removed.

Minor checklist note: with_result is a new query parameter, so the PR checklist/docs status may need to reflect that public API surface, even if the runtime OpenAPI annotation already exposes it.

HugeTask<V> result = HugeTask.fromVertex(vertex);
this.tx().removeVertex(vertex);
HugeTask<V> result = HugeTask.fromVertex(vertex, false);
this.deleteTaskResultFromTx(id);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make the cleanup expectation explicit here. This path now deletes ~taskresult before deleting ~task, but TaskTransaction/TaskAndResultTransaction run with auto-commit and removeTaskVertex() commits in afterWrite(). If the second delete fails after this call succeeds, we can leave a live task whose result has already been removed. Could you either make the two removals failure-safe/atomic, or add a targeted test/explanation showing why this intermediate state is acceptable/recoverable?


HugeTask<?> taskWithResult = scheduler.task(id, true);
Assert.assertEquals("\"metadata-result\"", taskWithResult.result());

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This currently proves the withResult switch for a small result, but it does not reproduce the original failure mode: loading/decompressing a large historical task result. Please add a regression case with a genuinely large/compressed result and assert that metadata-only reads (task(id, false), tasks(..., false), and ideally restore/scheduler metadata paths) do not touch task_result.

@PathParam("id") long id) {
@PathParam("id") long id,
@Parameter(description = "Whether to load task result")
@DefaultValue("true")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this introduces a new public query parameter, please update the PR checklist/docs status accordingly, or briefly explain why the generated OpenAPI annotation is sufficient and no user-facing docs are needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api Changes of API bug Something isn't working lgtm This PR has been approved by a maintainer size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants