Let’s think for a moment about what a data scientist needs: the value of a particular feature for an instance at a point in time, in order to make a prediction.

Naturally this leads to data modelling patterns that are best suited to point-in-time calculations. However, this is a tremendously difficult problem: a naive implementation introduces a great deal of data redundancy. In fact, I don’t believe there is a simple way to achieve this. Nevertheless, let’s work backwards, starting from what we want and seeing what it looks like.

What a data scientist wants

We want to be able to determine what a feature looks like at a point in time.
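As a rough sketch, this is essentially an as-of join: for each prediction request, pick up the most recent feature value at or before the request time. The table and column names below are invented for illustration; here it is expressed with pandas’ merge_asof.

```python
import pandas as pd

# Hypothetical feature table: one row per (entity, timestamp) feature observation.
features = pd.DataFrame({
    "entity_id": [1, 1, 1, 2],
    "event_time": pd.to_datetime(
        ["2021-01-01", "2021-02-01", "2021-03-01", "2021-01-15"]
    ),
    "total_spend": [10.0, 25.0, 40.0, 5.0],
})

# Prediction requests: we want the feature value as it was at each request time.
requests = pd.DataFrame({
    "entity_id": [1, 2],
    "request_time": pd.to_datetime(["2021-02-15", "2021-02-15"]),
})

# As-of join: for each request, take the latest feature row at or before request_time.
point_in_time = pd.merge_asof(
    requests.sort_values("request_time"),
    features.sort_values("event_time"),
    left_on="request_time",
    right_on="event_time",
    by="entity_id",
)
print(point_in_time[["entity_id", "request_time", "total_spend"]])
```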

Why do we want to know something at a point in time?

So that we can determine what an instance looks like at a point in time. Another compelling reason is to detect whether an instance has changed, so that an event-driven approach can be used to trigger downstream processes.


Okay, so the compelling reason is to determine something from an event-driven perspective! With this mindset, if the incoming data is transaction-like, then computing and updating the feature store becomes much more straightforward. Basically, as sketched below:

  1. An event or transactional update occurs
  2. Update the features associated with that event
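A minimal sketch of that two-step loop, assuming an in-memory feature store keyed by entity and a transaction-style event (all names here are hypothetical):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class TransactionEvent:
    entity_id: int
    amount: float
    occurred_at: datetime

# Hypothetical in-memory feature store: entity_id -> current feature values.
feature_store: dict[int, dict[str, float]] = {}

def handle_event(event: TransactionEvent) -> None:
    """Step 2: update the features associated with the incoming event."""
    feats = feature_store.setdefault(
        event.entity_id, {"txn_count": 0.0, "total_spend": 0.0}
    )
    feats["txn_count"] += 1
    feats["total_spend"] += event.amount

# Step 1: an event arrives (here, simulated).
handle_event(TransactionEvent(entity_id=1, amount=25.0, occurred_at=datetime.now()))
print(feature_store[1])  # {'txn_count': 1.0, 'total_spend': 25.0}
```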

This kind of pattern can be used to calculate deltas easily (though perhaps with a lot of redundancy), and it also aligns well with lambda architecture ideals.
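To make the delta idea concrete, here is one hypothetical way to compare the current feature values against the last published snapshot and only trigger downstream work when something actually changed:

```python
# Hypothetical current feature values for one entity, and the last snapshot
# that was published downstream.
current = {"txn_count": 5.0, "total_spend": 120.0}
last_snapshot = {"txn_count": 4.0, "total_spend": 95.0}

# The delta between the two states; any non-zero entry means the instance changed.
delta = {name: current[name] - last_snapshot.get(name, 0.0) for name in current}

if any(abs(change) > 0 for change in delta.values()):
    # Stand-in for publishing a change event to downstream consumers.
    print("instance changed:", delta)

# Refresh the snapshot so the next comparison starts from the current state.
last_snapshot = dict(current)
```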