Skip to content

feat(manifest): add manifest and index manifest support#87

Open
lucasfang wants to merge 1 commit into
apache:mainfrom
lucasfang:migrate
Open

feat(manifest): add manifest and index manifest support#87
lucasfang wants to merge 1 commit into
apache:mainfrom
lucasfang:migrate

Conversation

@lucasfang

Copy link
Copy Markdown
Contributor

Purpose

Linked issue: No linked issue

This change adds manifest and index manifest support for Paimon C++.

Included changes:

  • Manifest core:

    • Adds ManifestFile, ManifestList, ManifestCommittable, and PartitionEntry classes for managing table manifests and partition metadata.
    • Adds test coverage for manifest file operations, manifest list handling, committable serialization, and partition entry logic.
  • Index manifest:

    • Adds IndexManifestEntry, IndexManifestEntrySerializer, IndexManifestFile, and IndexManifestFileHandler for index file tracking and serialization.
    • Adds test coverage for index manifest entry serialization, index manifest file I/O, and handler workflows.

Tests

Not run. Local compile, CMake, and gtest environment checks are not part of this PR description.

Test coverage included in this change:

  • ManifestFileTest
  • ManifestListTest
  • ManifestCommittableTest
  • PartitionEntryTest
  • IndexManifestEntrySerializerTest
  • IndexManifestFileHandlerTest

API and Format

This change adds public API in src/paimon/core/manifest/.

No storage format or protocol changes.

Documentation

No documentation changes required.

Generative AI tooling

Migrate-by: Aone Copilot (Qwen-3.7-Max)

Copilot AI review requested due to automatic review settings June 16, 2026 07:04

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR introduces C++ implementations and tests for reading/writing and merging Paimon manifest-related structures (manifest files, manifest lists, partition entries, index manifests, and committable messages).

Changes:

  • Add PartitionEntry and aggregation logic from ManifestEntry.
  • Add ManifestList and ManifestFile file abstractions with ORC/AVRO compatibility tests.
  • Add index manifest handling (IndexManifestFile, serializer, handler/combiner) and ManifestCommittable model + tests.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
src/paimon/core/manifest/partition_entry.h Defines PartitionEntry API for aggregating per-partition stats.
src/paimon/core/manifest/partition_entry.cpp Implements merge/convert/statistics logic for PartitionEntry.
src/paimon/core/manifest/partition_entry_test.cpp Adds unit tests for PartitionEntry merge behavior and time handling.
src/paimon/core/manifest/manifest_list.h Defines ManifestList abstraction and snapshot helper reads.
src/paimon/core/manifest/manifest_list.cpp Implements ManifestList creation and writing.
src/paimon/core/manifest/manifest_list_test.cpp Adds compatibility tests for reading/writing manifest lists.
src/paimon/core/manifest/manifest_file.h Defines ManifestFile abstraction for manifest entry I/O.
src/paimon/core/manifest/manifest_file.cpp Implements ManifestFile creation and rolling write logic.
src/paimon/core/manifest/manifest_file_test.cpp Adds compatibility tests for manifest entry reading.
src/paimon/core/manifest/manifest_committable.h Adds ManifestCommittable commit message container + formatting.
src/paimon/core/manifest/manifest_committable_test.cpp Adds unit tests for ManifestCommittable.
src/paimon/core/manifest/index_manifest_file_handler.h Declares index-manifest “combine then write” handler.
src/paimon/core/manifest/index_manifest_file_handler.cpp Implements combiners and index-manifest write logic.
src/paimon/core/manifest/index_manifest_file_handler_test.cpp Adds tests for bucketed/global combiner behavior and error cases.
src/paimon/core/manifest/index_manifest_file.h Defines index manifest file abstraction.
src/paimon/core/manifest/index_manifest_file.cpp Implements index manifest file creation and write delegation to handler.
src/paimon/core/manifest/index_manifest_entry.h Defines IndexManifestEntry schema + model utilities.
src/paimon/core/manifest/index_manifest_entry_serializer.h Declares serializer for index manifest entries.
src/paimon/core/manifest/index_manifest_entry_serializer.cpp Implements index manifest entry (de)serialization.
src/paimon/core/manifest/index_manifest_entry_serializer_test.cpp Adds serializer round-trip tests for index manifest entries.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

#include "paimon/core/deletionvectors/deletion_vectors_index_file.h"
namespace paimon {

using BucketIdentifier = std::tuple<BinaryRow, int32_t, std::string>;
Comment on lines +33 to +37
std::unordered_map<BucketIdentifier, IndexManifestEntry> index_entries;
for (const auto& entry : prev_index_files) {
index_entries.emplace(
BucketIdentifier(entry.partition, entry.bucket, entry.index_file->IndexType()), entry);
}
Comment on lines +42 to +47
PartitionEntry PartitionEntry::Merge(const PartitionEntry& entry) const {
return PartitionEntry(
partition_, record_count_ + entry.record_count_,
file_size_in_bytes_ + entry.file_size_in_bytes_, file_count_ + entry.file_count_,
std::max(last_file_creation_time_, entry.last_file_creation_time_), entry.total_buckets_);
}
return Read(snapshot.DeltaManifestList(), /*filter=*/nullptr, manifests);
}

/// Return a `ManifestFileMeta` for each changelog manifest in this snapshot.
Comment on lines +110 to +120
Status ReadChangelogManifests(const Snapshot& snapshot,
std::vector<ManifestFileMeta>* manifests) const {
const std::optional<std::string>& changelog_manifest_list =
snapshot.ChangelogManifestList();
if (changelog_manifest_list) {
return Status::NotImplemented("do not support read changelog manifest list");
// return Read(changelog_manifest_list.value(), /*filter=*/nullptr, manifests);
} else {
return Status::OK();
}
}
Comment on lines +179 to +180
paimon::test::GetDataDir() + "orc/append_with_global_index.db/append_with_global_index/",
"manifest-list-2bccccf8-9f5e-48f2-b706-5b33f8c3bfc0-0", pool);
Comment on lines +21 to +27
#include <cstdint>
#include <memory>
#include <string>

#include "paimon/common/data/binary_row.h"
#include "paimon/core/index/index_file_meta.h"
#include "paimon/core/manifest/file_kind.h"
Comment on lines +62 to +66
std::string ToString() const {
return fmt::format("IndexManifestEntry{{kind={}, partition={}, bucket={}, indexFile={}}}",
kind.ToByteValue(), partition.ToString(), bucket,
index_file->ToString());
}
Comment on lines +70 to +71
static Result<std::unique_ptr<IndexManifestFileCombiner>> GetIndexManifestFileCombine(
const std::string& index_type, int32_t bucket_mode);
@lxy-9602

Copy link
Copy Markdown
Contributor

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants