fix(server): avoid loading huge task results for metadata queries#3060
fix(server): avoid loading huge task results for metadata queries#3060contrueCT wants to merge 2 commits into
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #3060 +/- ##
============================================
- Coverage 36.10% 29.90% -6.20%
+ Complexity 338 264 -74
============================================
Files 803 804 +1
Lines 68291 68382 +91
Branches 8970 8982 +12
============================================
- Hits 24656 20452 -4204
- Misses 40990 45564 +4574
+ Partials 2645 2366 -279 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
imbajin
left a comment
There was a problem hiding this comment.
Blocking: yes. Summary: Found a scheduler API compatibility regression and a failing targeted test. Evidence: TaskCoreTest#testTaskWithoutResult failed locally; codecov/project is also failing.
🔗 Please also check the failed codecov/project status: https://app.codecov.io/gh/apache/hugegraph/pull/3060
There was a problem hiding this comment.
Thanks, the current head looks directionally good and preserves the default GET /tasks/{id} behavior. Could you please add or clarify a bit more regression coverage before merge?
Two areas seem worth covering:
- A regression case with a genuinely large/compressed task result, verifying metadata-only paths such as
GET /tasks, task restore/scheduler loops, andwith_result=falsedo not load/decompresstask_result. - The distributed task/result delete path: please either make the
~task+~taskresultcleanup failure-safe/atomic, or add a targeted test/explanation for the current two-step delete path. SinceTaskTransactionauto-commits afterremoveTaskVertex(), a failure between deleting~taskresultand deleting~taskcould otherwise leave a live task whose result has already been removed.
Minor checklist note: with_result is a new query parameter, so the PR checklist/docs status may need to reflect that public API surface, even if the runtime OpenAPI annotation already exposes it.
| HugeTask<V> result = HugeTask.fromVertex(vertex); | ||
| this.tx().removeVertex(vertex); | ||
| HugeTask<V> result = HugeTask.fromVertex(vertex, false); | ||
| this.deleteTaskResultFromTx(id); |
There was a problem hiding this comment.
Please make the cleanup expectation explicit here. This path now deletes ~taskresult before deleting ~task, but TaskTransaction/TaskAndResultTransaction run with auto-commit and removeTaskVertex() commits in afterWrite(). If the second delete fails after this call succeeds, we can leave a live task whose result has already been removed. Could you either make the two removals failure-safe/atomic, or add a targeted test/explanation showing why this intermediate state is acceptable/recoverable?
|
|
||
| HugeTask<?> taskWithResult = scheduler.task(id, true); | ||
| Assert.assertEquals("\"metadata-result\"", taskWithResult.result()); | ||
|
|
There was a problem hiding this comment.
This currently proves the withResult switch for a small result, but it does not reproduce the original failure mode: loading/decompressing a large historical task result. Please add a regression case with a genuinely large/compressed result and assert that metadata-only reads (task(id, false), tasks(..., false), and ideally restore/scheduler metadata paths) do not touch task_result.
| @PathParam("id") long id) { | ||
| @PathParam("id") long id, | ||
| @Parameter(description = "Whether to load task result") | ||
| @DefaultValue("true") |
There was a problem hiding this comment.
Since this introduces a new public query parameter, please update the PR checklist/docs status accordingly, or briefly explain why the generated OpenAPI annotation is sufficient and no user-facing docs are needed.
Purpose of the PR
Fix task metadata/recovery paths that can still load and decompress large task results, making
GET /tasks, task restore, andDELETE /tasks/{id}?force=trueunusable when historical tasks contain huge result payloads.This is related to #3057 and #3059. The PR focuses on metadata-only access and cleanup paths, while large Gremlin result export/chunking can be handled separately.
Main Changes
~task_result.GET /tasks/{id}default behavior compatible, and addwith_result=false.GET /tasks, task restore, and delete paths use metadata-only reads.GraphTransaction.removeVertex()force-loading indexed task vertices.~taskresultvertices when cleaning distributed task results.withResult=false.Verifying these changes
mvn test -pl hugegraph-server/hugegraph-test -am -P core-test,rocksdb -Dtest=TaskCoreTest -DfailIfNoTests=false -Dcheckstyle.skip=true -Drat.skip=truemvn compile -pl hugegraph-server/hugegraph-api -am -DskipTests=true -Dcheckstyle.skip=true -Drat.skip=truemvn test-compile -pl hugegraph-server/hugegraph-test -am -P api-test,rocksdb -Dtest=TaskApiTest -DskipTests=true -Dcheckstyle.skip=true -Drat.skip=trueDoes this PR potentially affect the following parts?
Documentation Status
Doc - TODODoc - DoneDoc - No Need