PyTorch Model Deployment I: Torch2ONNX

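Among deep learning frameworks, TF1 has the most mature deployment toolchain, while the equally popular PyTorch keeps running into trouble at deployment time, which is deeply frustrating for a loyal PyTorch user like me. Recent PyTorch releases have updated how TorchScript is used, but whether by scripting or by tracing, I cannot get a standard transformer model to convert to TorchScript correctly. It may of course be that my transformer implementation is at fault, but the bigger reason is PyTorch's dynamic graph, which is also why there is no direct equivalent of the model.summary() function of Keras/TF1 models.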

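For deploying deep learning models on embedded targets (phones, in-car head units and so on), Tencent's NCNN is a common choice. Converting TF1 to NCNN is not that simple either. (But I digress.)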

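Tools for converting a well-formed ONNX model to NCNN are fairly mature, and PyTorch ships its own ONNX exporter. This post briefly walks through the steps for converting PyTorch to ONNX, including things to watch out for when writing the PyTorch model.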

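Before we start, a sad confession: I still have not managed to get a Transformer model deployed end to end. If deployment is all you want, the cost of learning TF1 is actually not that high. That said, the model-writing notes in this post are useful for writing PyTorch models in general.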

References:
Official PyTorch documentation: torch.onnx, Pytorch2onnx and using onnxruntime

ONNX model simplification and visualization: onnx-simplifier, Netron (visualization)

onnx and onnxruntime

Install onnx and onnxruntime:

pip install onnx onnxruntime

Implementation

Notes:

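Every layer used in a network module (class) must follow the standard pattern: define it in __init__(self, *args) and call it in forward(*args).
Try to avoid unsqueeze and squeeze and use view instead. Note that view requires the tensor to be contiguous; call .contiguous() first when needed, e.g. target_batch.contiguous().view(-1). In everyday code you can simply use reshape or transpose.
Update: ONNX does support squeeze, unsqueeze, view, reshape and transpose. If you plan to run onnx-simplifier, though, it is better to avoid view, expand and repeat for now.
Do not call other custom functions inside forward.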
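
The note about view and .contiguous() can be checked with a small sketch. The tensor below is made up for illustration; the point is only that a transposed tensor is not contiguous, so view fails on it while .contiguous().view(...) and reshape both work:

```python
import torch

# A transposed tensor shares storage with the original but is not contiguous.
x = torch.arange(6).reshape(2, 3)
xt = x.transpose(0, 1)
print(xt.is_contiguous())

# view needs a memory layout compatible with the requested shape,
# so flattening the transposed tensor raises a RuntimeError.
try:
    xt.view(-1)
    view_failed = False
except RuntimeError:
    view_failed = True

# Either make the tensor contiguous first, or let reshape copy when needed.
flat_a = xt.contiguous().view(-1)
flat_b = xt.reshape(-1)
print(view_failed, torch.equal(flat_a, flat_b))
```

reshape is the more forgiving choice in everyday code because it only copies when it has to.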

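All of the points above stem from the fact that PyTorch builds its graph dynamically: the graph only exists once the model is actually called. So whether you go through torch.jit tracing or torch.onnx.export, you must supply an example input before a complete static graph can be formed.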
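
As a minimal sketch of why an input is needed, the snippet below traces a tiny model with torch.jit.trace. TinyNet and its sizes are hypothetical and only serve the illustration; the graph is recorded by actually running the example input through forward:

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):  # hypothetical model, for illustration only
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)
        self.actv = nn.ReLU()

    def forward(self, x):
        return self.actv(self.fc(x))


model = TinyNet().eval()
example = torch.randn(3, 4)

# Tracing runs forward once on the example input and records the ops it sees.
traced = torch.jit.trace(model, example)

with torch.no_grad():
    same = torch.allclose(model(example), traced(example))

print(same)
print(traced.graph)  # the static graph that was recorded
```
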
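Stick to the standard pattern in everyday code too, otherwise the saved model will be incomplete: model.state_dict() only stores the parameters of layers defined in __init__, so a layer instantiated directly inside forward cannot be saved.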

Here is a wrong example, in which the nn.ReLU() and nn.LayerNorm(d_model) layers are lost:

import torch.nn as nn

# d_model and d_ff are hyperparameters assumed to be defined at module level.
class PoswiseFFN(nn.Module):
    def __init__(self):
        super(PoswiseFFN, self).__init__()
        self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1)
        self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1)

    def forward(self, inputs):
        residual = inputs
        output = nn.ReLU()(self.conv1(inputs.transpose(1, 2)))  # wrong: layer created inside forward
        output = self.conv2(output).transpose(1, 2)
        return nn.LayerNorm(d_model)(output + residual)  # wrong: a fresh, unregistered LayerNorm on every call

The correct way:

class PoswiseFFN(nn.Module):
    def __init__(self):
        super(PoswiseFFN, self).__init__()
        self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1)
        self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1)
        # define every layer in __init__
        self.actv = nn.ReLU()
        self.norm = nn.LayerNorm(d_model)

    def forward(self, inputs):
        residual = inputs
        output = self.actv(self.conv1(inputs.transpose(1, 2)))  # correct
        output = self.conv2(output).transpose(1, 2)
        return self.norm(output + residual)  # correct
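
The difference between the two versions shows up directly in state_dict(). The sketch below rebuilds both variants under the hypothetical names BadFFN and GoodFFN, with small made-up sizes for d_model and d_ff, and compares the saved keys:

```python
import torch
import torch.nn as nn

d_model, d_ff = 8, 16  # small illustrative sizes, not the article's values


class BadFFN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv1d(d_model, d_ff, kernel_size=1)
        self.conv2 = nn.Conv1d(d_ff, d_model, kernel_size=1)

    def forward(self, inputs):
        output = nn.ReLU()(self.conv1(inputs.transpose(1, 2)))
        output = self.conv2(output).transpose(1, 2)
        # A new LayerNorm is built on every call and never registered.
        return nn.LayerNorm(d_model)(output + inputs)


class GoodFFN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv1d(d_model, d_ff, kernel_size=1)
        self.conv2 = nn.Conv1d(d_ff, d_model, kernel_size=1)
        self.actv = nn.ReLU()
        self.norm = nn.LayerNorm(d_model)

    def forward(self, inputs):
        output = self.actv(self.conv1(inputs.transpose(1, 2)))
        output = self.conv2(output).transpose(1, 2)
        return self.norm(output + inputs)


bad_keys = sorted(BadFFN().state_dict().keys())
good_keys = sorted(GoodFFN().state_dict().keys())
out_shape = tuple(GoodFFN()(torch.randn(2, 5, d_model)).shape)

print(bad_keys)   # only the conv1/conv2 weights and biases
print(good_keys)  # additionally contains norm.weight and norm.bias
```

nn.ReLU has no parameters, so it never appears in state_dict() either way; the LayerNorm weights are the ones that go missing in the wrong version.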

Converting to ONNX:

Below is the code that converts the torch model to ONNX and verifies the result; the complete model code is in transformer_onnx.py.

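import numpy as np
import torch

# Encoder, enc_inputs and enc_self_attn_mask are assumed to be defined as in transformer_onnx.py
model = Encoder()
model.eval()
x = (enc_inputs, enc_self_attn_mask)
torch_out, attn = model(enc_inputs, enc_self_attn_mask)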

# save onnx model
import onnx
torch.onnx.export(model,               # model being run
                  x,                         # model input (or a tuple for multiple inputs)
                  "transformer_encoder.onnx",   # where to save the model (can be a file or file-like object)
                  export_params=True,        # store the trained parameter weights inside the model file
                  opset_version=10,          # the ONNX version to export the model to
                  do_constant_folding=True,  # whether to execute constant folding for optimization
                  input_names = ['input'],   # the model's input names
                  output_names = ['output'])
# load the onnx model and check that it is well formed

onnx_model = onnx.load("transformer_encoder.onnx")
onnx.checker.check_model(onnx_model)

# run inference, check and compare the results
import onnxruntime

ort_session = onnxruntime.InferenceSession("transformer_encoder.onnx")

def to_numpy(tensor):
    # detach() is harmless for tensors that do not require grad
    return tensor.detach().cpu().numpy()

# compute ONNX Runtime output prediction
ort_inputs = {ort_session.get_inputs()[i].name: to_numpy(x[i]) for i in range(len(x))}
ort_outs = ort_session.run(None, ort_inputs)

# compare ONNX Runtime and PyTorch results
np.testing.assert_allclose(to_numpy(torch_out), ort_outs[0], rtol=1e-03, atol=1e-05)

print("Exported model has been tested with ONNXRuntime, and the result looks good!")

onnx2ncnn

Simplify the ONNX model: onnx-simplifier

Directly in a Python program:

import onnxsim
onnx_model_simp, check_ok = onnxsim.simplify(onnx_model)

From the shell:

python -m onnxsim input_onnx_model output_onnx_model

To be honest, onnx2ncnn failed for me: the ONNX export interprets the embedding as a Gather op, and NCNN does not support Gather, even though NCNN does support embedding.