feat(manifest): add manifest and index manifest support#87
Open
lucasfang wants to merge 1 commit into
Open
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR introduces C++ implementations and tests for reading/writing and merging Paimon manifest-related structures (manifest files, manifest lists, partition entries, index manifests, and committable messages).
Changes:
- Add
PartitionEntryand aggregation logic fromManifestEntry. - Add
ManifestListandManifestFilefile abstractions with ORC/AVRO compatibility tests. - Add index manifest handling (
IndexManifestFile, serializer, handler/combiner) andManifestCommittablemodel + tests.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| src/paimon/core/manifest/partition_entry.h | Defines PartitionEntry API for aggregating per-partition stats. |
| src/paimon/core/manifest/partition_entry.cpp | Implements merge/convert/statistics logic for PartitionEntry. |
| src/paimon/core/manifest/partition_entry_test.cpp | Adds unit tests for PartitionEntry merge behavior and time handling. |
| src/paimon/core/manifest/manifest_list.h | Defines ManifestList abstraction and snapshot helper reads. |
| src/paimon/core/manifest/manifest_list.cpp | Implements ManifestList creation and writing. |
| src/paimon/core/manifest/manifest_list_test.cpp | Adds compatibility tests for reading/writing manifest lists. |
| src/paimon/core/manifest/manifest_file.h | Defines ManifestFile abstraction for manifest entry I/O. |
| src/paimon/core/manifest/manifest_file.cpp | Implements ManifestFile creation and rolling write logic. |
| src/paimon/core/manifest/manifest_file_test.cpp | Adds compatibility tests for manifest entry reading. |
| src/paimon/core/manifest/manifest_committable.h | Adds ManifestCommittable commit message container + formatting. |
| src/paimon/core/manifest/manifest_committable_test.cpp | Adds unit tests for ManifestCommittable. |
| src/paimon/core/manifest/index_manifest_file_handler.h | Declares index-manifest “combine then write” handler. |
| src/paimon/core/manifest/index_manifest_file_handler.cpp | Implements combiners and index-manifest write logic. |
| src/paimon/core/manifest/index_manifest_file_handler_test.cpp | Adds tests for bucketed/global combiner behavior and error cases. |
| src/paimon/core/manifest/index_manifest_file.h | Defines index manifest file abstraction. |
| src/paimon/core/manifest/index_manifest_file.cpp | Implements index manifest file creation and write delegation to handler. |
| src/paimon/core/manifest/index_manifest_entry.h | Defines IndexManifestEntry schema + model utilities. |
| src/paimon/core/manifest/index_manifest_entry_serializer.h | Declares serializer for index manifest entries. |
| src/paimon/core/manifest/index_manifest_entry_serializer.cpp | Implements index manifest entry (de)serialization. |
| src/paimon/core/manifest/index_manifest_entry_serializer_test.cpp | Adds serializer round-trip tests for index manifest entries. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| #include "paimon/core/deletionvectors/deletion_vectors_index_file.h" | ||
| namespace paimon { | ||
|
|
||
| using BucketIdentifier = std::tuple<BinaryRow, int32_t, std::string>; |
Comment on lines
+33
to
+37
| std::unordered_map<BucketIdentifier, IndexManifestEntry> index_entries; | ||
| for (const auto& entry : prev_index_files) { | ||
| index_entries.emplace( | ||
| BucketIdentifier(entry.partition, entry.bucket, entry.index_file->IndexType()), entry); | ||
| } |
Comment on lines
+42
to
+47
| PartitionEntry PartitionEntry::Merge(const PartitionEntry& entry) const { | ||
| return PartitionEntry( | ||
| partition_, record_count_ + entry.record_count_, | ||
| file_size_in_bytes_ + entry.file_size_in_bytes_, file_count_ + entry.file_count_, | ||
| std::max(last_file_creation_time_, entry.last_file_creation_time_), entry.total_buckets_); | ||
| } |
| return Read(snapshot.DeltaManifestList(), /*filter=*/nullptr, manifests); | ||
| } | ||
|
|
||
| /// Return a `ManifestFileMeta` for each changelog manifest in this snapshot. |
Comment on lines
+110
to
+120
| Status ReadChangelogManifests(const Snapshot& snapshot, | ||
| std::vector<ManifestFileMeta>* manifests) const { | ||
| const std::optional<std::string>& changelog_manifest_list = | ||
| snapshot.ChangelogManifestList(); | ||
| if (changelog_manifest_list) { | ||
| return Status::NotImplemented("do not support read changelog manifest list"); | ||
| // return Read(changelog_manifest_list.value(), /*filter=*/nullptr, manifests); | ||
| } else { | ||
| return Status::OK(); | ||
| } | ||
| } |
Comment on lines
+179
to
+180
| paimon::test::GetDataDir() + "orc/append_with_global_index.db/append_with_global_index/", | ||
| "manifest-list-2bccccf8-9f5e-48f2-b706-5b33f8c3bfc0-0", pool); |
Comment on lines
+21
to
+27
| #include <cstdint> | ||
| #include <memory> | ||
| #include <string> | ||
|
|
||
| #include "paimon/common/data/binary_row.h" | ||
| #include "paimon/core/index/index_file_meta.h" | ||
| #include "paimon/core/manifest/file_kind.h" |
Comment on lines
+62
to
+66
| std::string ToString() const { | ||
| return fmt::format("IndexManifestEntry{{kind={}, partition={}, bucket={}, indexFile={}}}", | ||
| kind.ToByteValue(), partition.ToString(), bucket, | ||
| index_file->ToString()); | ||
| } |
Comment on lines
+70
to
+71
| static Result<std::unique_ptr<IndexManifestFileCombiner>> GetIndexManifestFileCombine( | ||
| const std::string& index_type, int32_t bucket_mode); |
Contributor
|
+1 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Purpose
Linked issue: No linked issue
This change adds manifest and index manifest support for Paimon C++.
Included changes:
Manifest core:
ManifestFile,ManifestList,ManifestCommittable, andPartitionEntryclasses for managing table manifests and partition metadata.Index manifest:
IndexManifestEntry,IndexManifestEntrySerializer,IndexManifestFile, andIndexManifestFileHandlerfor index file tracking and serialization.Tests
Not run. Local compile, CMake, and gtest environment checks are not part of this PR description.
Test coverage included in this change:
ManifestFileTestManifestListTestManifestCommittableTestPartitionEntryTestIndexManifestEntrySerializerTestIndexManifestFileHandlerTestAPI and Format
This change adds public API in
src/paimon/core/manifest/.No storage format or protocol changes.
Documentation
No documentation changes required.
Generative AI tooling
Migrate-by: Aone Copilot (Qwen-3.7-Max)