TensorFlow Serving

Version v0.2 of the documentation is no longer actively maintained. The site that you are currently viewing is an archived snapshot. For up-to-date documentation, see the latest version.

Training and serving using TFJob

Serving a model

Each deployed model is treated as a component in your ksonnet app.

Generate the TensorFlow model server component

ks generate tf-serving ${MODEL_COMPONENT} --name=${MODEL_NAME}

Depending on where the model files are stored, set the appropriate parameters.

Google Cloud

ks param set ${MODEL_COMPONENT} modelPath ${MODEL_PATH}


S3

To use S3, first create a Secret that contains the access credentials:

apiVersion: v1
kind: Secret
metadata:
  name: secretname
# Add the base64-encoded S3 access credentials under a data: field.
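Values under a Secret's data field must be base64-encoded. The Python sketch below builds such a manifest. The key names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are an assumption (they are the conventional AWS variable names, but this page doesn't say which keys the tf-serving component reads), the helper names are hypothetical, and the credentials are placeholders.

```python
import base64
import json


def encode_secret_value(value: str) -> str:
    """Base64-encode a string, as required for values under a Secret's `data` field."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def build_s3_secret(name: str, access_key_id: str, secret_access_key: str) -> dict:
    """Return a Kubernetes Secret manifest (as a dict) holding S3 credentials.

    ASSUMPTION: the key names below are the conventional AWS ones; use
    whatever keys your tf-serving component actually expects.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "data": {
            "AWS_ACCESS_KEY_ID": encode_secret_value(access_key_id),
            "AWS_SECRET_ACCESS_KEY": encode_secret_value(secret_access_key),
        },
    }


if __name__ == "__main__":
    # Placeholder credentials for illustration only.
    manifest = build_s3_secret("secretname", "my-access-key", "my-secret-key")
    # kubectl accepts JSON manifests as well as YAML (kubectl apply -f secret.json).
    print(json.dumps(manifest, indent=2))
```

Alternatively, kubectl create secret generic with --from-literal=KEY=value performs the base64 encoding for you.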

Enable S3, set the model path, and point the component at the Secret:

ks param set ${MODEL_COMPONENT} modelPath ${MODEL_PATH}
ks param set ${MODEL_COMPONENT} s3Enable True
ks param set ${MODEL_COMPONENT} s3SecretName secretname

Optionally, you can also override the default S3 parameters:

# S3 region
ks param set ${MODEL_COMPONENT} s3AwsRegion us-west-1

# Whether or not to use HTTPS for S3 connections
ks param set ${MODEL_COMPONENT} s3UseHttps true

# Whether or not to verify HTTPS certificates for S3 connections
ks param set ${MODEL_COMPONENT} s3VerifySsl true

# URL of your S3-compatible endpoint
ks param set ${MODEL_COMPONENT} s3Endpoint http://s3.us-west-1.amazonaws.com
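The example above pairs s3UseHttps true with an http:// endpoint, and this page doesn't say which of the two settings takes precedence. As a hypothetical pre-flight check (not part of ksonnet or the tf-serving prototype), the Python sketch below flags an endpoint whose URL scheme disagrees with the s3UseHttps value; endpoints given without a scheme are left alone.

```python
from urllib.parse import urlparse


def s3_setting_warnings(endpoint: str, use_https: bool) -> list:
    """Return warnings when the endpoint's URL scheme contradicts s3UseHttps.

    Hypothetical helper: it only compares the two values and contacts nothing.
    """
    scheme = urlparse(endpoint).scheme.lower()
    if scheme not in ("http", "https"):
        # No recognizable scheme (e.g. a bare host name): nothing to compare.
        return []
    if (scheme == "https") != use_https:
        return [f"s3UseHttps is {str(use_https).lower()} but s3Endpoint uses {scheme}://"]
    return []


if __name__ == "__main__":
    # Values taken from the example parameters above.
    for warning in s3_setting_warnings("http://s3.us-west-1.amazonaws.com", True):
        print("WARNING:", warning)
```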


NFS

ks param set ${MODEL_COMPONENT} modelPath ${MODEL_PATH}
ks param set ${MODEL_COMPONENT} modelStorageType ${MODEL_STORAGE_TYPE}
ks param set ${MODEL_COMPONENT} nfsPVC ${NFS_PVC_NAME}

Deploy the model component. ksonnet picks up the existing parameters for your environment (e.g. cloud, nocloud) and customizes the resulting deployment accordingly.

ks apply ${KF_ENV} -c ${MODEL_COMPONENT}

As before, a few pods and services have been created in your cluster. You can get the inception serving endpoint by querying Kubernetes:

kubectl get svc inception -n=${NAMESPACE}
NAME        TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)          AGE
inception   LoadBalancer   <cluster-ip>   ww.xx.yy.zz    9000:30936/TCP   28m

In this example, you can use the inception_client to send requests to ww.xx.yy.zz:9000.
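Rather than reading EXTERNAL-IP from the table by eye, you can ask kubectl for JSON output (-o json) and read the LoadBalancer address from status.loadBalancer.ingress. A Python sketch, assuming kubectl is on your PATH; the helper names are hypothetical, and port 9000 is taken from the example output.

```python
import json
import subprocess


def fetch_service(name: str, namespace: str) -> dict:
    """Run `kubectl get svc NAME -n NAMESPACE -o json` and parse the result."""
    out = subprocess.run(
        ["kubectl", "get", "svc", name, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def external_endpoint(svc: dict, port: int = 9000) -> str:
    """Return "host:port" for a LoadBalancer Service dict."""
    ingress = svc.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    if not ingress:
        raise RuntimeError("external address not assigned yet (EXTERNAL-IP is <pending>)")
    # Depending on the cloud provider, the ingress entry carries "ip" or "hostname".
    host = ingress[0].get("ip") or ingress[0].get("hostname")
    if not host:
        raise RuntimeError("ingress entry has neither ip nor hostname")
    ports = [p["port"] for p in svc.get("spec", {}).get("ports", [])]
    if port not in ports:
        raise ValueError(f"service does not expose port {port}: {ports}")
    return f"{host}:{port}"
```

For example, external_endpoint(fetch_service("inception", namespace)) returns the address to pass to the client, with namespace set to your ${NAMESPACE}.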

The model at gs://kubeflow-models/inception is publicly accessible. However, if your environment doesn't have Google Cloud credentials set up, TF Serving will not be able to read the model. See this issue for an example. To set up Google Cloud credentials, either set the environment variable GOOGLE_APPLICATION_CREDENTIALS to point to the credential file, or run gcloud auth login. See the doc for more details.
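You can check the first option programmatically before deploying. The Python sketch below only inspects GOOGLE_APPLICATION_CREDENTIALS (it cannot tell whether gcloud auth login has been run) and only confirms that the file exists and parses as JSON, not that the credentials are valid. The helper name is hypothetical.

```python
import json
import os


def credential_problems(environ=None) -> list:
    """Return problems found with GOOGLE_APPLICATION_CREDENTIALS (empty list if none)."""
    env = os.environ if environ is None else environ
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return ["GOOGLE_APPLICATION_CREDENTIALS is not set"]
    if not os.path.isfile(path):
        return [f"credential file not found: {path}"]
    try:
        with open(path) as f:
            json.load(f)
    except (OSError, ValueError) as exc:
        # json.JSONDecodeError is a subclass of ValueError.
        return [f"credential file is not readable JSON: {exc}"]
    return []


if __name__ == "__main__":
    for problem in credential_problems():
        print("WARNING:", problem)
```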

Telemetry using Istio

Please look at the Istio guide.

Logs and metrics with Stackdriver

See here for instructions to get logs and metrics using Stackdriver.

Request logging

Request logging currently supports streaming only to BigQuery.

Logging the requests and responses enables log analysis, continuous training, and skew detection.

Create the BigQuery dataset D and table T under your project P, and set the table's schema.

ks pkg install kubeflow/tf-serving
ks generate tf-serving-request-log mnist --gcpProject=P --dataset=D --table=T

Modify tf-serving-with-request-log.jsonnet as needed. For example, change the logging parameter of the HTTP proxy, such as --request_log_prob=0.1 (the default is 0.01).
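This page doesn't describe exactly how the proxy applies request_log_prob. Assuming each request is logged independently with that probability, the default 0.01 logs roughly 1 in 100 requests and 0.1 roughly 1 in 10. The Python sketch below simulates that assumed per-request sampling; it is not the proxy's actual code, and the function names are hypothetical.

```python
import random


def should_log(request_log_prob: float, rng: random.Random) -> bool:
    """Decide, independently for one request, whether to log it."""
    if not 0.0 <= request_log_prob <= 1.0:
        raise ValueError("request_log_prob must be in [0, 1]")
    # rng.random() is uniform on [0, 1), so this is True with probability p.
    return rng.random() < request_log_prob


def simulate(n_requests: int, request_log_prob: float, seed: int = 0) -> int:
    """Count how many of n_requests would be logged under the assumed sampling."""
    rng = random.Random(seed)
    return sum(should_log(request_log_prob, rng) for _ in range(n_requests))


if __name__ == "__main__":
    n = 100_000
    for p in (0.01, 0.1):
        print(f"request_log_prob={p}: logged {simulate(n, p)} of {n} requests")
```

Raising the probability tenfold raises the expected BigQuery write volume tenfold, so choose it with your request rate in mind.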

ks apply ENV -c mnist

Start sending requests; the fluentd worker streams them to BigQuery.

Next steps:

  1. Support backends other than BigQuery.
  2. Support a request ID (so that the logs can be joined). Issue.
  3. Optionally log responses and other metadata. This probably requires a log config beyond a sampling probability.