This YouTube video demonstrates Tecton's API for working with the feature store. What can we glean from it?

@pyspark_transformation(inputs=sources.clicks)
def count_distinct_users(input_df):
    # do stuff

distinct_users = Feature(
    name="distinct_users",
    transformation=count_distinct_users,
    entities=entities.ad,
    materialization=MaterializationConfig(
        schedule_interval="1d",
        backfill_start_date="2020-01-01"
    )
)
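
The body of count_distinct_users is elided in the video. As a rough sketch of what such a transformation might look like - the decorator and input source come from the demo, but the column names and aggregation logic below are my assumptions:

import pyspark.sql.functions as F
from pyspark.sql import DataFrame

@pyspark_transformation(inputs=sources.clicks)
def count_distinct_users(input_df: DataFrame) -> DataFrame:
    # one row per ad with the number of distinct users who clicked it
    # (the ad_id / user_id column names are assumed, not shown in the demo)
    return (
        input_df
        .groupBy("ad_id")
        .agg(F.countDistinct("user_id").alias("distinct_users"))
    )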

A Terraform-like interface is used to apply the defined feature pipeline configuration:

# I presume there are steps like "publish" and then "apply" - or apply happens on `git push`
tecton apply
# 1. importing...
# 2. collecting feature declarations
# 3. performing server side validation
# starting plan...
# 4. feature applied!

A separate interface is used to serve the features - a feature service groups the features used to build a training dataset.

model_features = FeatureService(
    name="model_featrues",
    description="feature service for model ABC",
    features=[
        Feature(...), # as defined above...
        Feature(...),
        ...
    ]
)

# implies that it works off an events dataset only
# tecton just requires a list of historical events - it appears the events will need to be pre-denormalised
training_values = model_features.get_features(events_dataset)
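
I'm only guessing at the shape of events_dataset here - presumably a denormalised spine of historical events keyed by the entity and a timestamp, with the label attached. A hypothetical example (the column names and values are my assumptions):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# hypothetical events spine: one row per historical event, keyed by entity + timestamp
events_dataset = spark.createDataFrame(
    [
        ("ad_123", "2021-03-01 10:15:00", 1),
        ("ad_123", "2021-03-02 08:00:00", 0),
        ("ad_456", "2021-03-02 09:30:00", 1),
    ],
    ["ad_id", "timestamp", "label"],
)

# presumably each event row gets the feature values as of its timestamp
training_values = model_features.get_features(events_dataset)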

How does this link to the previous post? It suggests that the ideal way to present an enterprise feature store is to use event-based data only and ignore state or SCD data!
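
To make that distinction concrete (my own illustration, not from the video): event data is an append-only log of things that happened, whereas state/SCD data records what an attribute's value was over a validity window.

# event data: an append-only log of facts - this is all Tecton appears to need
click_events = [
    {"user_id": "u1", "ad_id": "ad_123", "event": "click", "ts": "2021-03-01 10:15:00"},
    {"user_id": "u2", "ad_id": "ad_123", "event": "click", "ts": "2021-03-01 10:16:00"},
]

# state / SCD type 2 data: attribute values over validity windows -
# the kind of table this approach would ignore
ad_dimension = [
    {"ad_id": "ad_123", "campaign": "summer", "valid_from": "2021-01-01", "valid_to": "2021-02-28"},
    {"ad_id": "ad_123", "campaign": "autumn", "valid_from": "2021-03-01", "valid_to": None},
]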

More on this later!