To go from PyTorch to TensorFlow you can use ONNX as an intermediate format: export the PyTorch model to ONNX, then convert ONNX to TensorFlow. In practice, though, this route tends to hit all sorts of odd problems. Reading the parameters out of the PyTorch model by hand and filling them into a matching TensorFlow model is actually quite convenient, so this post summarizes how to do the conversion manually.
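Before the per-layer details, here is a minimal sketch of the overall workflow. The names TorchNet, build_keras_model, and model.pt are hypothetical placeholders for your own model, builder, and checkpoint; the input layout also assumes a channels-first PyTorch model. Whatever layers you convert, always verify by feeding the same input through both models:

import numpy as np
import torch
import tensorflow as tf

# Hypothetical PyTorch model and checkpoint; replace with your own.
torch_model = TorchNet()
torch_model.load_state_dict(torch.load('model.pt', map_location='cpu'))
torch_model.eval()
state_dict = torch_model.state_dict()        # all parameters are read from here

# Hypothetical builder that creates the Keras model and fills the weights
# using one of the two methods described below.
keras_model = build_keras_model(state_dict)

# Verification: same input, compare outputs.
x = np.random.randn(1, 100, 64).astype(np.float32)   # (batch, length, channels)
with torch.no_grad():
    # PyTorch Conv1d/LSTM here assumed channels-first, hence the permutes.
    y_pt = torch_model(torch.from_numpy(x).permute(0, 2, 1)).permute(0, 2, 1).numpy()
y_tf = keras_model(x).numpy()
print(np.abs(y_pt - y_tf).max())             # should be close to 0 (~1e-5)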
Method 1: fill the weights directly with kernel_initializer
- Dense layer conversion:
# PyTorch Linear stores weight as (out, in); Keras Dense expects (in, out).
dense_w = state_dict['dense.weight'].permute(1, 0).numpy()
dense_b = state_dict['dense.bias'].numpy()
output = tf.keras.layers.Dense(
    dense_w.shape[-1],
    kernel_initializer=tf.constant_initializer(dense_w),
    bias_initializer=tf.constant_initializer(dense_b),
    name='bottleneck')(output)
output = tf.keras.layers.Softmax(-1)(output)
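As a quick sanity check, here is a self-contained sketch (toy shapes assumed) showing that the transposed Linear weight reproduces the PyTorch output exactly:

import numpy as np
import torch
import tensorflow as tf

lin = torch.nn.Linear(8, 4)
state_dict = {'dense.weight': lin.weight.data, 'dense.bias': lin.bias.data}

dense_w = state_dict['dense.weight'].permute(1, 0).numpy()   # (in, out)
dense_b = state_dict['dense.bias'].numpy()
dense = tf.keras.layers.Dense(
    dense_w.shape[-1],
    kernel_initializer=tf.constant_initializer(dense_w),
    bias_initializer=tf.constant_initializer(dense_b))

x = np.random.randn(2, 8).astype(np.float32)
y_pt = lin(torch.from_numpy(x)).detach().numpy()
y_tf = dense(x).numpy()
print(np.abs(y_pt - y_tf).max())   # ~1e-7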
- Convolution layer conversion:
# PyTorch Conv1d stores weight as (out, in, kernel); Keras Conv1D expects (kernel, in, out).
conv_w = state_dict['conv.weight'].permute(2, 1, 0).numpy()
conv_b = state_dict['conv.bias'].numpy()
output = tf.keras.layers.Conv1D(
    filters=conv_w.shape[-1],
    kernel_size=conv_w.shape[0],
    padding='same',
    kernel_initializer=tf.constant_initializer(conv_w),
    bias_initializer=tf.constant_initializer(conv_b),
    name='conv')(input)
output = tf.keras.layers.LeakyReLU(alpha=0.01)(output)
Note: the default LeakyReLU slope differs between PyTorch (negative_slope=0.01) and TensorFlow (alpha=0.3); you must keep them consistent.
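Another thing to keep in mind when verifying the convolution: PyTorch Conv1d is channels-first (batch, channels, length) while Keras Conv1D is channels-last (batch, length, channels). A self-contained check with toy shapes, including the LeakyReLU slope from the note above:

import numpy as np
import torch
import tensorflow as tf

conv_pt = torch.nn.Conv1d(in_channels=3, out_channels=5, kernel_size=3, padding=1)
conv_w = conv_pt.weight.data.permute(2, 1, 0).numpy()   # (kernel, in, out)
conv_b = conv_pt.bias.data.numpy()

conv_tf = tf.keras.layers.Conv1D(
    filters=conv_w.shape[-1], kernel_size=conv_w.shape[0], padding='same',
    kernel_initializer=tf.constant_initializer(conv_w),
    bias_initializer=tf.constant_initializer(conv_b))

x = np.random.randn(2, 10, 3).astype(np.float32)        # (batch, length, channels)
y_pt = conv_pt(torch.from_numpy(x).permute(0, 2, 1))    # PyTorch wants (N, C, L)
y_pt = y_pt.permute(0, 2, 1).detach().numpy()           # back to (N, L, C)
y_tf = conv_tf(x).numpy()
print(np.abs(y_pt - y_tf).max())                        # ~1e-6

# LeakyReLU slopes only match when set explicitly on the Keras side.
assert np.allclose(
    torch.nn.functional.leaky_relu(torch.from_numpy(x), 0.01).numpy(),
    tf.keras.layers.LeakyReLU(alpha=0.01)(x).numpy())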
- BiLSTM layer conversion:
hidden_channel = 128
# Forward direction: transpose each weight matrix, sum the two PyTorch biases.
lstm_w = state_dict['layer.weight_ih_l0'].permute(1, 0).numpy()
lstm_r = state_dict['layer.weight_hh_l0'].permute(1, 0).numpy()
lstm_b = state_dict['layer.bias_hh_l0'].numpy() + state_dict['layer.bias_ih_l0'].numpy()
# Backward direction.
lstm_w_inv = state_dict['layer.weight_ih_l0_reverse'].permute(1, 0).numpy()
lstm_r_inv = state_dict['layer.weight_hh_l0_reverse'].permute(1, 0).numpy()
lstm_b_inv = state_dict['layer.bias_hh_l0_reverse'].numpy() + state_dict['layer.bias_ih_l0_reverse'].numpy()
fw = tf.keras.layers.LSTM(
    hidden_channel, return_sequences=True,
    recurrent_activation='sigmoid', use_bias=True,
    unit_forget_bias=False,
    kernel_initializer=tf.keras.initializers.constant(lstm_w),
    recurrent_initializer=tf.keras.initializers.constant(lstm_r),
    bias_initializer=tf.keras.initializers.constant(lstm_b))
bw = tf.keras.layers.LSTM(
    hidden_channel, return_sequences=True, go_backwards=True,
    recurrent_activation='sigmoid', use_bias=True,
    unit_forget_bias=False,
    kernel_initializer=tf.keras.initializers.constant(lstm_w_inv),
    recurrent_initializer=tf.keras.initializers.constant(lstm_r_inv),
    bias_initializer=tf.keras.initializers.constant(lstm_b_inv))
output = tf.keras.layers.Bidirectional(fw, backward_layer=bw)(output)
Note: the LSTM conversion is a bit special. PyTorch keeps four parameter groups per direction (input weights, recurrent weights, and two bias vectors; per the docs the second bias mainly exists for CuDNN parallelization, see torch.nn.modules.rnn - PyTorch 1.7.0 documentation and NVIDIA Deep Learning cuDNN Documentation), whereas TensorFlow keeps three (input weights, recurrent weights, and a single bias). During conversion, sum PyTorch's two bias vectors to get the TensorFlow bias. With this method unit_forget_bias must be False, otherwise the bias values will not line up.
!!! Although recurrent_activation in tf.keras.layers.LSTM supposedly defaults to 'sigmoid', the outputs did not match until it was set to 'sigmoid' explicitly.
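To see the four-vs-three grouping concretely, this sketch (toy sizes) prints the parameter shapes on both sides. Both frameworks order the gates as input, forget, cell, output along the stacked axis, which is why a plain transpose of each weight matrix is enough:

import torch
import tensorflow as tf

hidden = 4
pt_lstm = torch.nn.LSTM(input_size=3, hidden_size=hidden)
for name, p in pt_lstm.named_parameters():
    print(name, tuple(p.shape))
# weight_ih_l0 (16, 3), weight_hh_l0 (16, 4), bias_ih_l0 (16,), bias_hh_l0 (16,)

tf_lstm = tf.keras.layers.LSTM(hidden, recurrent_activation='sigmoid',
                               unit_forget_bias=False)
tf_lstm.build((None, None, 3))    # (batch, time, features)
for w in tf_lstm.weights:
    print(w.name, tuple(w.shape))
# kernel (3, 16), recurrent_kernel (4, 16), bias (16,)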
Method 2: fill the weights with set_weights
- Pay special attention to the BiLSTM layer:
lstm_w = state_dict['layer.weight_ih_l0'].permute(1, 0).numpy()
lstm_r = state_dict['layer.weight_hh_l0'].permute(1, 0).numpy()
lstm_b = state_dict['layer.bias_hh_l0'].numpy() + state_dict['layer.bias_ih_l0'].numpy()
lstm_w_inv = state_dict['layer.weight_ih_l0_reverse'].permute(1, 0).numpy()
lstm_r_inv = state_dict['layer.weight_hh_l0_reverse'].permute(1, 0).numpy()
lstm_b_inv = state_dict['layer.bias_hh_l0_reverse'].numpy() + state_dict['layer.bias_ih_l0_reverse'].numpy()
fw = tf.keras.layers.LSTM(hidden_channel, return_sequences=True, recurrent_activation='sigmoid')
bw = tf.keras.layers.LSTM(hidden_channel, return_sequences=True, go_backwards=True, recurrent_activation='sigmoid')
output = tf.keras.layers.Bidirectional(fw, backward_layer=bw)(output)
keras_format_weights = [lstm_w, lstm_r, lstm_b, lstm_w_inv, lstm_r_inv, lstm_b_inv]
# Fetch the layer by name and fill all six arrays at once.
model.get_layer('xxxx').set_weights(keras_format_weights)
Note: a BiLSTM holds six weight arrays (a unidirectional LSTM holds three); other layer types are converted in the same way.
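For completeness, a minimal sketch of the same set_weights pattern applied to a Dense layer, reusing the state_dict and dense_w from the Method 1 snippet (the layer name 'bottleneck' is illustrative). The layer must already be built, i.e. have been called on an input, before set_weights:

dense = tf.keras.layers.Dense(dense_w.shape[-1], name='bottleneck')
output = dense(output)   # calling the layer builds it and creates its variables
dense.set_weights([
    state_dict['dense.weight'].permute(1, 0).numpy(),   # kernel: (in, out)
    state_dict['dense.bias'].numpy(),                   # bias: (out,)
])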