feat(io): add VendedCredentialsProvider for S3 credential refresh #3507

Open
gabeiglio wants to merge 2 commits into apache:main from gabeiglio:vended-credentials-provider

@gabeiglio

Contributor

Introduces the credential provider class and the set_credentials_provider hook on the FileIO base class. No FileIO implementation is wired yet.

Rationale for this change

First PR to implement automatic vended credential refresh in PyIceberg.

See epic issue here

Are these changes tested?

Yes, I created test_credentials_provider.py.

Are there any user-facing changes?

Yes, users will be able to set new config properties such as:

client.refresh-credentials-endpoint
client.refresh-credentials-enabled
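
As a rough illustration, the two properties could sit alongside other catalog properties in a plain dictionary. This is only a sketch: the endpoint URL is a placeholder, and the string value `"true"` is an assumption about how the flag is encoded, not something this PR states.

```python
# Hypothetical property values; only the two keys come from the PR description.
catalog_properties = {
    # Assumed to toggle automatic refresh of vended credentials.
    "client.refresh-credentials-enabled": "true",
    # Placeholder URL standing in for the catalog's credentials endpoint.
    "client.refresh-credentials-endpoint": "https://catalog.example.com/v1/credentials",
}
```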

@gabeiglio gabeiglio closed this Jun 15, 2026
@gabeiglio gabeiglio deleted the vended-credentials-provider branch June 15, 2026 12:47
@gabeiglio gabeiglio restored the vended-credentials-provider branch June 15, 2026 12:48
@gabeiglio gabeiglio reopened this Jun 15, 2026
Comment thread pyiceberg/io/__init__.py
Args:
provider (VendedCredentialsProvider): A concrete type of VendedCredentialsProvider (e.g. S3VendedCredentialsProvider)
"""

Collaborator

Can we put a TODO and a pass in here? I believe that should be enough to get rid of the noqa: B027 exception.

I actually have no issues with assigning the provider to an internal variable (even though it isn't used yet).
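
A minimal sketch of the second option, storing the provider on an internal attribute so the hook has a real body. Both classes below are simplified stand-ins rather than the actual PyIceberg code, and the attribute name `_credentials_provider` is an assumption.

```python
from abc import ABC
from typing import Optional


class VendedCredentialsProvider:
    """Stand-in for the provider type introduced in this PR."""


class FileIO(ABC):
    """Simplified stand-in for the FileIO base class; only the hook is shown."""

    def __init__(self) -> None:
        # Hypothetical attribute name; nothing reads it yet.
        self._credentials_provider: Optional[VendedCredentialsProvider] = None

    def set_credentials_provider(self, provider: VendedCredentialsProvider) -> None:
        """Attach a credentials provider to this FileIO.

        Args:
            provider (VendedCredentialsProvider): A concrete type of
                VendedCredentialsProvider (e.g. S3VendedCredentialsProvider)
        """
        # TODO: wire the provider into concrete FileIO implementations.
        self._credentials_provider = provider
```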

credentials: list[StorageCredential] = Field(alias="storage-credentials")


class VendedCredentialsProvider:

Collaborator

This class seems to be handling two different situations:

  • A generic provider for handling vended credentials
  • An implementation that handles S3.

Can we split the first part out into an ABC and keep the second part as an implementation of it?
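
One way such a split might look, as a sketch only. The `fetch` callable, the `refresh` method, and the `_validate` hook are hypothetical names chosen for illustration; the S3 property keys are used as examples of what an S3-specific check could require.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict

Properties = Dict[str, str]


class VendedCredentialsProvider(ABC):
    """Generic part: fetches and holds credentials, knows nothing about S3."""

    def __init__(self, fetch: Callable[[], Properties]) -> None:
        # `fetch` is a hypothetical callable, e.g. one wrapping the refresh endpoint.
        self._fetch = fetch
        self._properties: Properties = {}

    def refresh(self) -> Properties:
        """Fetch new credentials, validate them, and keep them if valid."""
        properties = self._fetch()
        self._validate(properties)
        self._properties = properties
        return properties

    @abstractmethod
    def _validate(self, properties: Properties) -> None:
        """Storage-specific validation, supplied by subclasses."""


class S3VendedCredentialsProvider(VendedCredentialsProvider):
    """S3 part: only the checks that are specific to S3 credentials."""

    REQUIRED_KEYS = ("s3.access-key-id", "s3.secret-access-key", "s3.session-token")

    def _validate(self, properties: Properties) -> None:
        missing = [key for key in self.REQUIRED_KEYS if key not in properties]
        if missing:
            raise ValueError(f"Missing S3 credential properties: {missing}")
```

With this shape, supporting another object store would mean adding another subclass while the refresh logic stays in one place.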

@kevinjqliu

Contributor

We can reuse the LoadCredentialsResponse components from #3499 now that it's merged 😄.

I think it'll be a good idea to keep the concerns of the REST catalog, FileIO, and object storage credentials separate.
Here's how I'm thinking about it:

  • RestCatalog: owns REST/auth/session concerns.
    It should parse inline credentials, know how to call /credentials, and create a credential provider. It should not make PyArrow/Fsspec-specific decisions.

  • CredentialsProvider: owns credential refresh and path lookup.
    A good Python contract is something like properties_for(location) -> Properties. It can keep the full StorageCredential list, refresh when needed, and do longest-prefix matching internally. This avoids duplicating prefix logic in every FileIO.

  • FileIO: owns file access and backend cache behavior.
    PyArrow/Fsspec should ask the provider for credential properties for the actual file location, merge those into local backend construction properties, and invalidate/rebuild cached filesystems when returned credentials change.

  • Cloud-specific helpers: own property validation/mapping.
    S3 expiry fields and required key/token checks belong in an S3 helper/provider path, not in generic catalog logic.
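
The `properties_for(location)` contract with longest-prefix matching could be sketched roughly as follows. The `StorageCredential` dataclass here is a simplified stand-in with assumed `prefix` and `config` fields, and refresh handling is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Properties = Dict[str, str]


@dataclass(frozen=True)
class StorageCredential:
    """Simplified stand-in: a location prefix plus the properties valid under it."""

    prefix: str
    config: Properties = field(default_factory=dict)


class CredentialsProvider:
    """Keeps the full credential list and resolves properties per location."""

    def __init__(self, credentials: List[StorageCredential]) -> None:
        self._credentials = list(credentials)

    def properties_for(self, location: str) -> Properties:
        """Return the config of the credential with the longest matching prefix."""
        matches = [c for c in self._credentials if location.startswith(c.prefix)]
        if not matches:
            return {}
        best = max(matches, key=lambda c: len(c.prefix))
        # Return a copy so callers cannot mutate the stored credential.
        return dict(best.config)
```

A FileIO would then call `properties_for` with the actual file location and merge the result into its backend construction properties, so the prefix logic lives in one place.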

What do you think about this?


3 participants