Feast: it’s an open-source feature store for machine learning.

At this point you might be asking why we are thinking about this project at all. Is it dead on arrival? Rather than answering that directly, we’ll talk through some of the design goals in Feast, and where I perceive its gaps and deficiencies to be, since those are what we should be thinking about.

Feature Store Design

Feast describes the feature store in terms of three “layers”, presented as follows:

graph LR;
    A[Stream] --> D
    B[Warehouse] --> D
    C[Files] --> D
    D[Feast] --> E[Train Model]
    D --> F[Serve Model]
    subgraph b[Data Sources]
        A
        B
        C
    end
    subgraph c[Feature Data]
        D
    end
    subgraph d[Production]
        E
        F
    end
    style b fill:#aec7e8,stroke:#000,color:#FFF
    style c fill:#aec7e8,stroke:#000,color:#FFF
    style d fill:#FFF,stroke:#000,color:#FFF
    style A fill:#b4cbc5,stroke:#000,color:#FFF
    style B fill:#b4cbc5,stroke:#000,color:#FFF
    style C fill:#b4cbc5,stroke:#000,color:#FFF
    style D fill:#b4cbc5,stroke:#000,color:#FFF
    style E fill:#FFF,stroke:#000,color:#FFF
    style F fill:#FFF,stroke:#000,color:#FFF
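One way to read the “Feature Data” layer is as a single store with two read paths: point-in-time lookups for building training sets, and latest-value lookups for serving. The sketch below illustrates that idea with Python’s built-in `sqlite3`. The `feature_values` table, its columns, and the helpers `get_feature_as_of` and `get_latest_feature` are hypothetical and are not Feast’s API.

```python
import sqlite3

# Hypothetical feature table: one row per (entity, feature, event time).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feature_values ("
    " entity_id INTEGER, feature TEXT, value REAL, event_ts INTEGER)"
)
conn.executemany(
    "INSERT INTO feature_values VALUES (?, ?, ?, ?)",
    [
        (1, "avg_order_value", 10.0, 100),
        (1, "avg_order_value", 12.5, 200),
        (1, "avg_order_value", 15.0, 300),
        (2, "avg_order_value", 7.0, 150),
    ],
)


def get_feature_as_of(entity_id, feature, as_of_ts):
    """Training path (hypothetical helper): latest value at or before as_of_ts,
    so a training example never sees feature values from its own future."""
    row = conn.execute(
        "SELECT value FROM feature_values"
        " WHERE entity_id = ? AND feature = ? AND event_ts <= ?"
        " ORDER BY event_ts DESC LIMIT 1",
        (entity_id, feature, as_of_ts),
    ).fetchone()
    return row[0] if row else None


def get_latest_feature(entity_id, feature):
    """Serving path (hypothetical helper): the most recent value for the entity."""
    row = conn.execute(
        "SELECT value FROM feature_values"
        " WHERE entity_id = ? AND feature = ?"
        " ORDER BY event_ts DESC LIMIT 1",
        (entity_id, feature),
    ).fetchone()
    return row[0] if row else None


# Build a tiny training set from labelled events: (entity_id, event_ts, label).
labelled_events = [(1, 250, 1), (2, 100, 0)]
training_rows = [
    (get_feature_as_of(eid, "avg_order_value", ts), label)
    for eid, ts, label in labelled_events
]
```

Here `training_rows` comes out as `[(12.5, 1), (None, 0)]`, because entity 2 has no value yet at time 100, while `get_latest_feature(1, "avg_order_value")` returns `15.0` for serving. Both consumers read from the same table; only the time filter differs.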

If we place Feast’s three layers back into our Lambda Architecture diagram, we can begin to see some patterns:

graph LR;
    A1[New Data] --> A
    A1 --> B
    A1 --> C
    A[Stream] --> D
    B[Warehouse] --> D
    C[Files] --> D
    D[Feast] --> E[Train Model]
    D --> F[Serve Model]
    subgraph a[Speed Layer]
        A
    end
    subgraph b[Batch Layer]
        B
        C
    end
    subgraph c[Serving Layer]
        D
    end
    subgraph d[Application]
        E
        F
    end
    style a fill:#aec7e8,stroke:#000,color:#FFF
    style b fill:#aec7e8,stroke:#000,color:#FFF
    style c fill:#aec7e8,stroke:#000,color:#FFF
    style d fill:#FFF,stroke:#000,color:#FFF
    style A1 fill:#FFF,stroke:#000,color:#FFF
    style A fill:#b4cbc5,stroke:#000,color:#FFF
    style B fill:#b4cbc5,stroke:#000,color:#FFF
    style C fill:#b4cbc5,stroke:#000,color:#FFF
    style D fill:#b4cbc5,stroke:#000,color:#FFF
    style E fill:#FFF,stroke:#000,color:#FFF
    style F fill:#FFF,stroke:#000,color:#FFF

Okay, so based on this model, what does Feast presume? As alluded to, one approach is to simply make assumptions. In fact, Feast uses a pull model to satisfy the demands of both real-time and batch processing as part of the serving layer in the infrastructure!
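To make the pull model concrete, here is a minimal sketch of a serving layer that, at request time, pulls a feature row from a batch view and from a speed view and merges them. The view contents, the `pull_features` helper, and the merge rule (a speed-layer value supersedes the batch value) are hypothetical assumptions for illustration, not Feast’s implementation.

```python
from typing import Any, Dict, Iterable, Mapping

# Hypothetical views, keyed by entity id. The batch view is complete but
# stale (recomputed periodically); the speed view holds only newer updates.
batch_view: Dict[str, Dict[str, Any]] = {
    "user_1": {"total_orders": 40, "avg_order_value": 12.0},
    "user_2": {"total_orders": 3, "avg_order_value": 8.5},
}
speed_view: Dict[str, Dict[str, Any]] = {
    "user_1": {"avg_order_value": 13.1},  # fresher value from the stream
}


def pull_features(
    entity_id: str,
    feature_names: Iterable[str],
    batch: Mapping[str, Mapping[str, Any]],
    speed: Mapping[str, Mapping[str, Any]],
) -> Dict[str, Any]:
    """Pull both views at request time; a speed-layer value, when present,
    supersedes the batch value. Missing features come back as None."""
    batch_row = batch.get(entity_id, {})
    speed_row = speed.get(entity_id, {})
    return {
        name: speed_row[name] if name in speed_row else batch_row.get(name)
        for name in feature_names
    }
```

For example, `pull_features("user_1", ["total_orders", "avg_order_value"], batch_view, speed_view)` returns `{"total_orders": 40, "avg_order_value": 13.1}`. The last-write-wins rule suits snapshot features; additive aggregates such as counts would instead need to combine the batch and speed values (for example, batch total plus the speed-layer increment).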

Implicit in other aspects of Feast is the assumption that incoming data is already appropriately modelled and suitable for ingestion. If the data being provided isn’t denormalized, Feast unfortunately can’t really “deal” with it. This becomes more explicit when we consider the Lambda Architecture diagram: the stream, warehouse and file sources feed straight into the serving layer, with no step in between for modelling or joining the data.
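To illustrate the kind of modelling step that Feast expects to have happened upstream, here is a sketch that turns two normalized tables into one denormalized row per entity using a join and an aggregation in `sqlite3`. The `customers` and `orders` tables and the derived `order_count` and `total_spend` features are hypothetical.

```python
import sqlite3

# Hypothetical normalized source data: customers and their orders.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'AU'), (2, 'NZ');
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 30.0), (12, 2, 5.0);
    """
)

# Denormalize: one flat row per customer, ready for feature ingestion.
# LEFT JOIN keeps customers without orders; COALESCE turns their NULL sum into 0.
rows = conn.execute(
    """
    SELECT c.customer_id,
           c.country,
           COUNT(o.order_id) AS order_count,
           COALESCE(SUM(o.amount), 0.0) AS total_spend
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.country
    ORDER BY c.customer_id
    """
).fetchall()
```

Here `rows` is `[(1, 'AU', 2, 50.0), (2, 'NZ', 1, 5.0)]`: the flat shape a feature store can ingest directly, whereas the two source tables on their own are not.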

What are the Gaps

One of the key goals of Project Maquette is to create something truly end-to-end without fancy infrastructure. Even with Feast, there is still a gap: it lacks support for normalized data stores, since the data being ingested is assumed to be already denormalized.

What should our software applications consist of?

To start with, let’s see how far we can get using only the standard library. This means we will rely on:

  • sqlite3, for the serving and storage of data.

For analytics at a later stage we may make use of Pandas. Serving the data might be easier with a web framework, but we’ll also consider how far we can get without those dependencies.

After all, wouldn’t it be interesting if we could serve a working feature store using out-of-the-box Python?
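As a rough indication of what “out-of-the-box Python” could look like, here is a minimal sketch of an online feature endpoint built only on `sqlite3`, `json`, `urllib.parse` and `wsgiref`. The `online_features` table, the `entity_id` query parameter, and the `lookup_features`, `app` and `serve` names are hypothetical choices for illustration, not a design taken from Feast.

```python
import json
import sqlite3
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

# Hypothetical online store: the latest value per (entity_id, feature).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE online_features ("
    " entity_id TEXT, feature TEXT, value REAL,"
    " PRIMARY KEY (entity_id, feature))"
)
# INSERT OR REPLACE acts as an upsert on the primary key.
conn.executemany(
    "INSERT OR REPLACE INTO online_features VALUES (?, ?, ?)",
    [
        ("user_1", "avg_order_value", 12.0),
        ("user_1", "avg_order_value", 13.1),  # replaces the previous row
        ("user_1", "total_orders", 40.0),
    ],
)


def lookup_features(entity_id):
    """Return {feature: value} for one entity (empty dict if unknown)."""
    rows = conn.execute(
        "SELECT feature, value FROM online_features WHERE entity_id = ?",
        (entity_id,),
    ).fetchall()
    return dict(rows)


def app(environ, start_response):
    """WSGI app: GET /?entity_id=user_1 returns that entity's features as JSON."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    entity_id = params.get("entity_id", [""])[0]
    features = lookup_features(entity_id)
    if features:
        status, payload = "200 OK", {"entity_id": entity_id, "features": features}
    else:
        status, payload = "404 Not Found", {"error": "unknown entity"}
    body = json.dumps(payload).encode("utf-8")
    start_response(
        status,
        [("Content-Type", "application/json"), ("Content-Length", str(len(body)))],
    )
    return [body]


def serve(host="127.0.0.1", port=8000):
    """Block and serve requests with the standard-library WSGI server."""
    with make_server(host, port, app) as server:
        server.serve_forever()
```

Calling `serve()` and requesting `http://127.0.0.1:8000/?entity_id=user_1` should return a JSON object containing that entity’s features, and an unknown entity gets a 404. Note that `wsgiref.simple_server` is built on `http.server`, which the Python documentation does not recommend for production use, so this is a starting point for experimentation rather than a deployment target.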