Feature Store Iceberg Properties#5685

Open
alexyoung13 wants to merge 17 commits into aws:master from alexyoung13:youngag/iceberg-properties

Description of changes:

NOTE: Based off of BassemHalim:feature-store-lakeformation @ commit d21ca67ab723cf5fcef9e6e1090efcd643e1ded3

Design

We will not make any changes to the sagemaker core package, because its code is autogenerated from the Feature Store APIs and manual edits would be overwritten on regeneration. Instead, all changes go in the mlops package, where a new class, FeatureGroupManager, extends the FeatureGroup class from the sagemaker core package. The extended class adds a new input type, IcebergProperties, overrides three core methods (get, create, and update), and adds two helper methods.

IcebergProperties type

This new type wraps a Dict[str, str] and validates its keys against our allow list of supported properties.

class IcebergProperties(Base):
    """Configuration for Iceberg table properties in a Feature Group offline store."""

    properties: Optional[Dict[str, str]] = None
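The key validation described above might look roughly like the following. This is a minimal sketch, not the PR's implementation: it uses a standard-library dataclass instead of the core package's Base class so that it runs on its own, and both `IcebergPropertiesSketch` and `ALLOWED_ICEBERG_PROPERTIES` are hypothetical names. The three keys are simply the ones that appear in the usage examples of this description; the real allow list may differ.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical allow list for illustration only; the PR's real list may differ.
ALLOWED_ICEBERG_PROPERTIES = frozenset(
    {
        "write.target-file-size-bytes",
        "write.delete.mode",
        "history.expire.min-snapshots-to-keep",
    }
)


@dataclass
class IcebergPropertiesSketch:
    """Stand-in for IcebergProperties that rejects keys outside the allow list."""

    properties: Optional[Dict[str, str]] = None

    def __post_init__(self) -> None:
        # Collect every key that is not on the allow list and fail fast.
        unknown = sorted(set(self.properties or {}) - ALLOWED_ICEBERG_PROPERTIES)
        if unknown:
            raise ValueError(f"Unsupported Iceberg properties: {unknown}")
```

Validating at construction time means an unsupported key is reported before any call reaches the customer's offline store.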

Overridden functions

    @classmethod
    def get(
        cls,
        *args,
        include_iceberg_properties: bool = False,
        **kwargs,
    ) -> Optional["FeatureGroup"]:
        """Get a FeatureGroup resource with optional Iceberg property retrieval."""

        # Gets a FG given its name. If the new include_iceberg_properties flag is set,
        # the Iceberg properties are also added to the response.

    @classmethod
    def create(
        cls,
        *args,
        lake_formation_config: Optional[LakeFormationConfig] = None,
        iceberg_properties: Optional[IcebergProperties] = None,
        **kwargs,
    ) -> Optional["FeatureGroup"]:
        """Create a FeatureGroup resource with optional Lake Formation governance and Iceberg properties."""

        # Creates a FG by calling the super method. Once the FG is created, calls a helper
        # function to set the given Iceberg properties in the customer's offline store.

    def update(
        self,
        *args,
        iceberg_properties: Optional[IcebergProperties] = None,
        session: Optional[Session] = None,
        region: Optional[StrPipeVar] = None,
        **kwargs,
    ) -> Optional["FeatureGroup"]:
        """Update a FeatureGroup resource with optional Iceberg property updates."""

        # Updates a FG by calling the super method. Once the FG is updated, calls a helper
        # function to set the given Iceberg properties in the customer's offline store.
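The "call the super method, then call a helper" pattern used by create and update can be illustrated with stand-in classes. This is a sketch under stated assumptions: `CoreFeatureGroupStub` and `FeatureGroupManagerSketch` are hypothetical, the stub does not call any AWS API, and the helper only records the properties instead of writing to the offline store.

```python
from typing import Any, Dict, Optional


class CoreFeatureGroupStub:
    """Hypothetical stand-in for the autogenerated core FeatureGroup class."""

    def __init__(self, feature_group_name: str) -> None:
        self.feature_group_name = feature_group_name

    @classmethod
    def create(cls, *args: Any, **kwargs: Any) -> "CoreFeatureGroupStub":
        # The real class would call the CreateFeatureGroup API here.
        return cls(kwargs["feature_group_name"])


class FeatureGroupManagerSketch(CoreFeatureGroupStub):
    """Subclass that adds an extra keyword argument without touching the base class."""

    applied_properties: Optional[Dict[str, str]] = None

    @classmethod
    def create(
        cls,
        *args: Any,
        iceberg_properties: Optional[Dict[str, str]] = None,
        **kwargs: Any,
    ) -> "FeatureGroupManagerSketch":
        # The keyword-only argument is consumed here, so the parent never sees it.
        fg = super().create(*args, **kwargs)
        if iceberg_properties is not None:
            fg._update_iceberg_properties(iceberg_properties)
        return fg

    def _update_iceberg_properties(self, properties: Dict[str, str]) -> None:
        # Placeholder: the PR's helper writes to the Glue table; this only records.
        self.applied_properties = dict(properties)
```

Because the new argument is stripped before delegating, the autogenerated base class keeps its original signature and can be regenerated without conflicts.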

Helper functions

    def _get_iceberg_properties(
        self,
        session: Optional[Session] = None,
        region: Optional[StrPipeVar] = None,
    ) -> Dict[str, Any]:
        """Fetch the current Glue table definition for the Feature Group's Iceberg offline store."""

        # Validates that the Feature Group has an Iceberg-formatted offline store,
        # retrieves the Glue table, and strips non-TableInput fields. Uses the given session
        # and region to create a Glue client and read the customer's Iceberg table from the
        # Glue catalog.

    def _update_iceberg_properties(
        self,
        iceberg_properties: IcebergProperties,
        session: Optional[Session] = None,
        region: Optional[StrPipeVar] = None,
    ) -> Dict[str, Any]:
        """Update Iceberg table properties for the Feature Group's offline store."""

        # Updates the Glue table properties for an Iceberg-formatted offline store.
        # The Feature Group must have an offline store configured with
        # table_format='Iceberg'. Calls _get_iceberg_properties to fetch the Glue
        # catalog table, then uses transactions to write the new value of each
        # Iceberg property passed in.
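The get-then-update flow of the two helpers can be sketched as a single function. The assumptions here are significant: `update_table_parameters` and `TABLE_INPUT_KEYS` are hypothetical names, the function takes any client object exposing Glue's `get_table` and `update_table` calls (in real use, something like `boto3.client("glue")`), and it merges the new values into the table's `Parameters` map. The Iceberg format check and the transaction handling mentioned above are omitted.

```python
from typing import Any, Dict

# Top-level keys accepted by Glue's TableInput structure. GetTable also returns
# read-only fields (for example DatabaseName and CreateTime) that UpdateTable rejects.
TABLE_INPUT_KEYS = frozenset(
    {
        "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
        "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
        "ViewExpandedText", "TableType", "Parameters", "TargetTable",
    }
)


def update_table_parameters(
    glue_client: Any,
    database: str,
    table: str,
    new_parameters: Dict[str, str],
) -> Dict[str, Any]:
    """Merge new_parameters into a Glue table's Parameters and write it back."""
    current = glue_client.get_table(DatabaseName=database, Name=table)["Table"]

    # Keep only the fields that UpdateTable accepts in TableInput.
    table_input = {k: v for k, v in current.items() if k in TABLE_INPUT_KEYS}

    # New values override existing ones; untouched parameters are preserved.
    merged = dict(table_input.get("Parameters", {}))
    merged.update(new_parameters)
    table_input["Parameters"] = merged

    glue_client.update_table(DatabaseName=database, TableInput=table_input)
    return table_input
```

Passing the client in as an argument keeps the function easy to exercise with a fake client in unit tests, without AWS credentials.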

Security considerations

  • Allow list validation: The allow list contains the properties we guarantee compatibility with. Because this is an SDK change, customers can modify the allow list locally. If they do, we can no longer confirm that our service will work with their offline store, so that is a risk they accept.
  • Glue catalog access: The helper functions create a Glue client using the customer's session/credentials and modify the customer's own Glue catalog table. No cross-account access occurs. Permissions required: glue:GetTable, glue:UpdateTable.
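For reference, an identity policy granting the two permissions listed above could be generated as follows. This is an illustrative sketch only: `glue_table_policy` is a hypothetical helper that is not part of this PR, and it assumes the policy is scoped to a single table through the standard Glue catalog, database, and table ARNs.

```python
from typing import Any, Dict


def glue_table_policy(
    region: str, account_id: str, database: str, table: str
) -> Dict[str, Any]:
    """Build an IAM policy document allowing GetTable/UpdateTable on one Glue table."""
    prefix = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["glue:GetTable", "glue:UpdateTable"],
                # Glue table actions are authorized against the catalog,
                # the database, and the table resources.
                "Resource": [
                    f"{prefix}:catalog",
                    f"{prefix}:database/{database}",
                    f"{prefix}:table/{database}/{table}",
                ],
            }
        ],
    }
```

The returned dictionary can be serialized with `json.dumps` and attached to the role whose credentials back the session passed to the helpers.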

Usage

Create FG with Iceberg Properties

fg = FeatureGroupManager.create(
    #...Other Params...
    offline_store_config=OfflineStoreConfig(
        s3_storage_config=S3StorageConfig(s3_uri="s3://my-bucket/features/"),
        table_format="Iceberg",  # The offline store must use the Iceberg table format to accept iceberg_properties
    ),
    iceberg_properties=IcebergProperties(
        properties={
            "write.target-file-size-bytes": "536870912",
            "history.expire.min-snapshots-to-keep": "3",
        }
    )
)

Update existing FG with Iceberg Properties

fg = FeatureGroupManager.get(feature_group_name="my-feature-group")
fg.update(
    iceberg_properties=IcebergProperties(
        properties={
            "write.target-file-size-bytes": "268435456",
            "write.delete.mode": "merge-on-read",
        }
    ),
)

Get a FG's Iceberg properties

fg = FeatureGroupManager.get(
    feature_group_name="my-feature-group",
    include_iceberg_properties=True,
)
print(fg.iceberg_properties.properties)  # e.g. {"write.target-file-size-bytes": "536870912"}

adishaa and others added 17 commits January 16, 2026 07:00
- Add LakeFormationConfig class to configure Lake Formation governance on offline stores
- Implement FeatureGroup subclass with Lake Formation integration capabilities
- Add helper methods for S3 URI/ARN conversion and Lake Formation role management
- Add S3 deny policy generation for Lake Formation access control
- Implement Lake Formation resource registration and S3 bucket policy setup
- Add integration tests for Lake Formation feature store workflows
- Add unit tests for Lake Formation configuration and policy generation
- Update feature_store module exports to include FeatureGroup and LakeFormationConfig
- Update API documentation to include Feature Store section in sagemaker_mlops.rst
- Enable fine-grained access control for feature store offline stores using AWS Lake Formation

Replace 10 bare print() calls with a single logger.info() call for the
S3 deny policy output in enable_lake_formation(). This makes the policy
display consistent with the rest of the LF workflow which uses logger.

Update 12 tests to mock the logger instead of builtins.print.

---
X-AI-Prompt: replace print with logger.info for s3 bucket policy display in enable_lake_formation
X-AI-Tool: kiro-cli

Rename the mlops FeatureGroup class to FeatureGroupManager to
distinguish it from the core FeatureGroup base class. Update all
references in unit and integration lake formation tests. Fix missing
comma in __init__.py __all__ list.
---
X-AI-Prompt: rename FeatureGroup to FeatureGroupManager and update lakeformation tests
X-AI-Tool: kiro-cli