Triton Server——Model Serving

For the official Triton Server documentation, see https://github.com/triton-inference-server/server/blob/main/docs/quickstart.md

For the model formats it supports, see https://github.com/triton-inference-server/server/blob/main/docs/model_repository.md#model-files

Triton's PyTorch backend requires models saved as TorchScript (a compiled graph representation, loosely analogous to TensorFlow v1 graphs) rather than eager-mode modules; see https://pytorch.org/tutorials/beginner/Intro_to_TorchScript_tutorial.html. Keep this in mind when saving the model.
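To make the requirement concrete, here is a minimal sketch of the two ways to export an eager nn.Module to TorchScript (the module and file name are illustrative, not from the setup below):

import torch
import torch.nn as nn

class Net(nn.Module):
    def forward(self, x):
        return torch.relu(x)

net = Net().eval()
scripted = torch.jit.script(net)                  # compiles the Python source; preserves control flow
traced = torch.jit.trace(net, torch.randn(1, 2))  # alternative: records one example execution
torch.jit.save(scripted, "model.pt")              # the artifact Triton's PyTorch backend loads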

Each model needs a configuration file describing its inputs and outputs. The config.pbtxt reference: https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md

A minimal config.pbtxt example (since max_batch_size > 0, dims describe a single item and exclude the batch dimension; the output name OUTPUT__0 follows the PyTorch backend's <name>__<index> naming convention):

name: "simple"
platform: "pytorch_libtorch"
max_batch_size: 4
input [
{
name: "input"
data_type: TYPE_FP32
dims: [ 2 ]
}
]
output [
{
name: "OUTPUT__0"
data_type: TYPE_FP32
dims: [ 1 ]
}
]

Assume the model architecture is:

import json
import torch.nn as nn

in_dim, hidden_dim, out_dim = 2, 4, 1
hp = json.load(open('hp.json', 'r'))  # hyperparameters, e.g. {"dropout": 0.1}

model = nn.Sequential(
    nn.Linear(in_dim, hidden_dim),
    nn.GELU(),
    nn.Dropout(hp["dropout"]),
    nn.Linear(hidden_dim, hidden_dim),
    nn.GELU(),
    nn.Dropout(hp["dropout"]),
    nn.Linear(hidden_dim, out_dim),
)

After training, store both the model and the config file in S3 from Python:

import io

import boto3
import torch
from torch.jit import script

session = boto3.session.Session()

s3_client = session.client(
    service_name='s3',
    aws_access_key_id='XXX',
    aws_secret_access_key='XXX',
    endpoint_url='http://10.105.222.7:24850',
)
print(s3_client.list_buckets())

# Script the model and serialize it into an in-memory buffer
model.eval()  # ensure dropout layers are disabled for serving
buffer = io.BytesIO()
model_scripted = script(model)
torch.jit.save(model_scripted, buffer)
s3_client.put_object(Bucket="mlflow-artifact",
                     Key='manual/test/model_repository/simple/1/model.pt',
                     Body=buffer.getvalue())
with open('config.pbtxt', 'r') as f:
    s3_client.put_object(Bucket="mlflow-artifact",
                         Key='manual/test/model_repository/simple/config.pbtxt',
                         Body=f.read())
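The two keys above produce the repository layout Triton expects: one directory per model, with a numeric subdirectory per version:

model_repository/
└── simple/
    ├── config.pbtxt
    └── 1/
        └── model.pt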

Then point Triton Server at the model repository in S3; try it with Docker first (Triton's syntax for a custom S3 endpoint embeds the endpoint URL in the repository path):

docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -e AWS_ACCESS_KEY_ID=XXX \
  -e AWS_SECRET_ACCESS_KEY=XXX \
  nvcr.io/nvidia/tritonserver:22.05-py3 tritonserver \
  --model-repository=s3://http://10.105.222.7:24850/mlflow-artifact/manual/test/model_repository
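Once the container is up, the standard KServe v2 health endpoints can confirm that the server and the model are ready (ports as mapped above):

curl -v localhost:8000/v2/health/ready
curl -v localhost:8000/v2/models/simple/ready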

Once it works in Docker, deploy Triton Server to the Kubernetes cluster with Seldon Core:

apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: triton-simple-model
  namespace: seldon
spec:
  name: triton-simple
  predictors:
    - graph:
        implementation: TRITON_SERVER
        modelUri: s3://mlflow-artifact/manual/test/model_repository
        name: classifier
      name: default
      replicas: 1
  # Important: this field is required; it corresponds to the image configured in the Helm chart values
  protocol: v2
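Apply the manifest and watch for the predictor pod to become ready (the manifest filename here is arbitrary):

kubectl apply -f triton-simple-model.yaml
kubectl -n seldon get pods -w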

After the deployment is up, verify it:

curl triton-simple-model-default-classifier:9000/v2/models/simple/config

The output:

{"name":"simple","platform":"pytorch_libtorch","backend":"pytorch","version_policy":{"latest":{"num_versions":1}},"max_batch_size":4,"input":[{"name":"input","data_type":"TYPE_FP32","format":"FORMAT_NONE","dims":[2],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false}],"output":[{"name":"OUTPUT__0","data_type":"TYPE_FP32","dims":[1],"label_filename":"","is_shape_tensor":false}],"batch_input":[],"batch_output":[],"optimization":{"priority":"PRIORITY_DEFAULT","input_pinned_memory":{"enable":true},"output_pinned_memory":{"enable":true},"gather_kernel_buffer_threshold":0,"eager_batching":false},"instance_group":[{"name":"simple","kind":"KIND_CPU","count":1,"gpus":[],"secondary_devices":[],"profile":[],"passive":false,"host_policy":""}],"default_model_filename":"model.pt","cc_model_filenames":{},"metric_tags":{},"parameters":{},"model_warmup":[]}

Verify model inference:

curl 10.111.154.79:9000/v2/models/simple/infer -d \
  '{"inputs":[{"name":"input","datatype":"FP32","shape":[1,2],"data":[[2,3]]}]}'

The model outputs:

{"model_name":"simple","model_version":"1","outputs":[{"name":"OUTPUT__0","datatype":"FP32","shape":[1,1],"data":[15.969637870788575]}]}
