There is a certain appeal to building a machine learning pipeline once and deploying it everywhere. Often this refers to pipelines which are built in batch and then deployed as a batch job, an API, or a stream; in this post, I thought I’d explore what it might mean to build a Python pipeline once and deploy it on node.

Advantages

  • Easier integration into the node ecosystem - we no longer have to worry about mixing languages, nor are we tied to a pure Python server-side stack. Of course this was never strictly true in the first place, but some of the complexity is now removed
  • Speed to deployment - if the tooling and build system are set up correctly, we should be able to move things along directly into the node ecosystem.

Disadvantages

  • Transcompiling woes - how would you debug JavaScript code that was generated from Python?
  • Performance - we would be running transpiled JavaScript rather than the optimized C code that underpins the usual Python libraries, so some slowdown should be expected

Approach

The approach to do this can be found here. The general gist of it is that we are:

  • Using Transcrypt to convert plain Python objects to JavaScript (a toy example follows this list)
  • Using tensorflow.js to handle things like converting TensorFlow models to JavaScript, allowing us to use these mature libraries in a sensible fashion. After all, why would we want to rebuild ML algorithms from scratch!
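To make the first point concrete, here is a toy sketch of the kind of input Transcrypt expects: plain Python with no C extensions (the module and function names are made up for illustration). Running Transcrypt over it places the generated JavaScript in a __target__ directory.

# pipeline.py - plain Python with no C dependencies, which is what
# makes it a valid input for Transcrypt.
def scale(values, factor):
    # Pure-Python arithmetic and comprehensions transpile cleanly.
    return [v * factor for v in values]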

In a typical ML pipeline, you would want to

  • Perform some feature manipulation, e.g. convert text to a numeric vector
  • Convert categorical variables to a one-hot encoding

and many other things. In the Python ecosystem, we typically leverage scikit-learn to facilitate this. However, as scikit-learn relies heavily on C bindings for performance reasons, it may prove difficult to port in an automated way. To work around this, we may need to write a “pure Python” re-implementation of the pieces we need (maybe a mini-project for the future).
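As a taste of what such a re-implementation might look like, here is a minimal one-hot encoder in pure, Transcrypt-friendly Python; the interface loosely mirrors scikit-learn, but the class itself is purely illustrative:

# A pure-Python one-hot encoder with no C dependencies, so Transcrypt
# can transpile it; the interface loosely mirrors scikit-learn.
class OneHotEncoder:
    def fit(self, values):
        # Record the sorted set of categories seen during fitting.
        self.categories = sorted(set(values))
        return self

    def transform(self, values):
        # Map each value to a 0/1 indicator vector over the categories.
        return [[1 if v == c else 0 for c in self.categories]
                for v in values]

encoder = OneHotEncoder().fit(['cat', 'dog', 'cat'])
print(encoder.transform(['dog', 'cat']))  # [[0, 1], [1, 0]]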

Once these features are constructed, we can use them to train the model, and also ensure we can recover the parameters which have been learnt.
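To keep the learnt parameters recoverable on the JavaScript side, one option is to export the trained model with the tensorflowjs converter. A minimal sketch, assuming a Keras model and the tensorflowjs pip package; the data, architecture and paths are illustrative:

import numpy as np
import tensorflow as tf
import tensorflowjs as tfjs

# Illustrative training data: 100 rows of 4 already-transformed features.
X_transformed = np.random.rand(100, 4)
y = np.random.rand(100, 1)

# A deliberately tiny model; any Keras model exports the same way.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer='sgd', loss='mse')
model.fit(X_transformed, y, epochs=10)

# Writes model.json plus weight files that tensorflow.js can load.
tfjs.converters.save_keras_model(model, './model')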

Once this is complete, we can freeze the pipeline using Transcrypt (-b rebuilds from scratch, -p .none keeps the output free of a parent object, and -n skips minification):

python -m transcrypt -b -p .none -n $(py)

Since the generated JavaScript may use ES6 module syntax, when working with node we may also need to convert it (or enable experimental module support); here we used Babel to suit our needs:

npx babel __target__/$(py).js --out-file $(py).js
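One caveat: on its own, Babel passes files through unchanged; to actually rewrite ES6 import/export statements into CommonJS require calls, it needs a configuration. A minimal .babelrc, assuming @babel/core, @babel/cli and @babel/preset-env are installed (this config is my assumption, not necessarily the original setup):

{
  "presets": [["@babel/preset-env", { "targets": { "node": "current" } }]]
}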

Finally, when we put it all together, we need to make sure all the pieces fit. In this case, the pattern I used was a code template of the following form:

'use strict';
const tf = require('@tensorflow/tfjs')
require('@tensorflow/tfjs-node')

{{ my_pipeline }}

tf_predict = """async function predict(data){
    // Define a model.
    const model = await tf.loadModel('file:///path/to/model.json');
    model.predict(tf.tensor2d(data, [1, data[0].length])).print();
}
var pred = predict(X_transformed)
console.log(pred)
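To fill in the {{ my_pipeline }} placeholder, the Babel-converted pipeline can be stitched into this template with a small render step. A sketch assuming Jinja2-style templating; the file names template.js, pipeline.js and index.js are illustrative:

from jinja2 import Template

# Read the code template shown above.
with open('template.js') as f:
    template = Template(f.read())

# Read the Babel-converted pipeline from the earlier step.
with open('pipeline.js') as f:
    pipeline_js = f.read()

# Render the final script that node can run directly.
with open('index.js', 'w') as f:
    f.write(template.render(my_pipeline=pipeline_js))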

This approach has worked well so far for the use cases that I have encountered.