In my study of Deep Learning, there have been a lot of times that I’ve created something cool but had no way to show it to anyone. I found it the hard way that not many people believe in a complex system like Speech Recognition looking at the loss graph and the output of some test data.

The problems I faced: I built a great AI model, finetuning it for weeks. I tried to show it to other people but I had to.

  • Load google colab
  • Connect it to google drive
  • Copy and load the model
  • Wait…

That is enough to kill a man’s enthusiasm .

I copied the colab code into a server, built a flask API around it and tried to run it. It then showed the kind of tensorflow errors that I’d never seen before. Running tensorflow graphs and trying to build an API around it in the same program didn’t work that easily. One workaround was to run the AI system and show the output in the terminal. That output would then be taken by another program for further processing. That worked for me but didn’t seem like an elegant soulution.

Upon further research I found about tensorflow serving. It serves a tensorflow model as a REST server. And It works pretty well.
For simple cases, tf.keras.save_model() works fine but not if you have a little complex things going inside your model.

Prerequisites: Tensorflow, Docker
You need not know much about docker but a good bit about tensorflow.

Step 1:

The first step is to save the model.

  • Initially save the model using tf.keras.save_model() and if it doesn’t work.
  • Trace graphs from your tensorflow model. It varies according to use-cases but the documentation is pretty good.
    A Code example:
      callable = tf.function(
      concrete_function = callable.get_concrete_function(......), signatures=concrete_function)

    (This is how I once saved a BERT model)

Now that the model is saved, next is to serve it.

Step 2:

  • Tensorflow serving
    $ docker pull tensorflow/serving
  • Run it:
    $ docker run -p 8501:8501 \
    --mount type=bind,source=/path/to/my_model/,target=/models/my_model \
    -e MODEL_NAME=my_model -t tensorflow/serving

Is it done?
Well no, You’ve only served a model that outputs logits or probability distribution of something. That, you’ll have to interpret according to your requirement. So, you’d need to build an app around it. (You could use something other than python but you might need to use tensorflow again for post-processing of those logits). Now, the app just needs to send a HTTP request to port 8501.

If it’s a cats/dogs classification, don’t directly send the picture of a cat. Instead, read the documentation.

And just like that you can have a chatbot



I’m currently working on Emotion Recognition with Landmark Detection and hopefully I’ll have enough motivation to deploy it and release the code accompanying this post.