Part I: Environment (OpenVINO + TensorFlow 1.12.0)
I. OpenVINO: l_openvino_toolkit_p_2019.1.094
Step 1: standard installation, reference: https://docs.openvinotoolkit.org/latest/_docs_install_guides_installing_openvino_linux.html
Step 2: build the Inference Engine samples:
cd /path/to/deployment_tools/inference_engine/samples
run ./build_samples.sh
Path of the build output:
/root/inference_engine_samples_build/intel64/release
II. Building TensorFlow
Build TensorFlow tools with Bazel.
Reference: https://blog.csdn.net/chenyuping333/article/details/82108509
1. Bazel download: https://github.com/bazelbuild/bazel/releases
(bazel-0.18.0-installer-linux-x86_64.sh)
2. TensorFlow download: https://github.com/tensorflow/tensorflow/tags (tensorflow-1.12.0)
Step 1: cd /path/to/tensorflows
Step 2: ./configure (answer "no" to all options)
Step 3: build freeze_graph
bazel build tensorflow/python/tools:freeze_graph
Possible issues:
Error: Python.h: No such file or directory
Solution: yum install python34-devel
Note: be sure to switch the default python to python3. If numpy is missing and yum then errors out during installation, change the /usr/bin/python shebang at the top of the affected yum scripts to /usr/bin/python2, and install numpy with pip3.
References for this error:
https://www.jianshu.com/p/db943b0f1627
https://blog.csdn.net/xjmxym/article/details/73610648
https://www.cnblogs.com/toseek/p/6192481.html
Step 4: build transform_graph
bazel build tensorflow/tools/graph_transforms:transform_graph
Step 5: build summarize_graph
bazel build tensorflow/tools/graph_transforms:summarize_graph
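Once built, summarize_graph is useful for confirming a graph's input and output node names before the freeze and Model Optimizer steps below; a typical invocation (the .pb path here is only an example):
bazel-bin/tensorflow/tools/graph_transforms/summarize_graph --in_graph=/path/to/pb/vgg_19.pb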
Part II: OpenVINO for Classification
Dataset preparation: ImageNet val.
Create six folders containing 1, 8, 16, 32, 64 and 96 images respectively; the folder layout (original figure omitted) is sketched below.
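A minimal sketch of building these folders from the ImageNet validation images (the source directory name and the val<N> folder names are assumptions; the test commands later refer to val1):
cd /path/to/imagenet
for n in 1 8 16 32 64 96; do
  mkdir -p val$n
  ls ILSVRC2012_img_val/ | head -n $n | xargs -I{} cp ILSVRC2012_img_val/{} val$n/
done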
Models tested: VGG-19, ResNet-50, ResNet-101, ResNet-152, Inception-V3, Inception-V4
Reference: https://github.com/vdevaram/deep_learning_utilities_cpu/blob/master/dldt/run_dldt_tf.sh
Step I: download the six pre-trained models
mkdir pretrainedmodels && cd pretrainedmodels
wget http://download.tensorflow.org/models/vgg_19_2016_08_28.tar.gz
tar -xvf vgg_19_2016_08_28.tar.gz
wget http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz
tar -xvf inception_v3_2016_08_28.tar.gz
wget http://download.tensorflow.org/models/inception_v4_2016_09_09.tar.gz
tar -xvf inception_v4_2016_09_09.tar.gz
wget http://download.tensorflow.org/models/resnet_v1_50_2016_08_28.tar.gz
tar -xvf resnet_v1_50_2016_08_28.tar.gz
wget http://download.tensorflow.org/models/resnet_v1_101_2016_08_28.tar.gz
tar -xvf resnet_v1_101_2016_08_28.tar.gz
wget http://download.tensorflow.org/models/resnet_v1_152_2016_08_28.tar.gz
tar -xvf resnet_v1_152_2016_08_28.tar.gz
After extraction each archive yields a checkpoint file (e.g. vgg_19.ckpt).
Step II: export the inference graph (.pb) for each pre-trained classification model
cd /path/to/pretrainedmodels
mkdir frozen && mkdir pb
(*The command differs slightly between networks.)
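export_inference_graph.py is part of the TF-Slim models repository; a hedged way to obtain it (the tensorflowmodels directory name is only chosen to match the paths used below):
git clone https://github.com/tensorflow/models.git /path/to/tensorflowmodels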
1. python3.6 /path/to/tensorflowmodels/research/slim/export_inference_graph.py \
--alsologtostderr \
--model_name=vgg_19 \
--output_file=/path/to/pb/vgg_19.pb \
--labels_offset=1
2. python3.6 /path/to/tensorflowmodels/research/slim/export_inference_graph.py \
--alsologtostderr \
--model_name=resnet_v1_50 \
--output_file=/path/to/pb/resnet_v1_50.pb \
--labels_offset=1
3. python3.6 /path/to/tensorflowmodels/research/slim/export_inference_graph.py \
--alsologtostderr \
--model_name=resnet_v1_101 \
--output_file=/path/to/pb/resnet_v1_101.pb \
--labels_offset=1
4. python3.6 /path/to/tensorflowmodels/research/slim/export_inference_graph.py \
--alsologtostderr \
--model_name=resnet_v1_152 \
--output_file=/path/to/pb/resnet_v1_152.pb \
--labels_offset=1
5. python3.6 /path/to/tensorflowmodels/research/slim/export_inference_graph.py \
--alsologtostderr \
--model_name=inception_v3 \
--output_file=/path/to/pb/inception_v3.pb
6. python3.6 /path/to/tensorflowmodels/research/slim/export_inference_graph.py \
--alsologtostderr \
--model_name=inception_v4 \
--output_file=/path/to/pb/inception_v4.pb
Step III: freeze each exported .pb file with its checkpoint
(*If the commands below fail, you can instead generate the frozen graphs by running the corresponding freeze_graph.py directly with python3.6.)
cd /path/to/tensorflowmodels
1. bazel-bin/tensorflow/python/tools/freeze_graph \
--input_graph=/path/to/pb/vgg_19.pb \
--input_checkpoint=/path/to/vgg_19.ckpt \
--input_binary=true \
--output_graph=/path/to/frozen/frozen_vgg_19.pb \
--output_node_names=vgg_19/fc8/squeezed
2. bazel-bin/tensorflow/python/tools/freeze_graph \
--input_graph=/path/to/pb/resnet_v1_50.pb \
--input_checkpoint=/path/to/resnet_v1_50.ckpt \
--input_binary=true \
--output_graph=/path/to/frozen/frozen_resnet_v1_50.pb \
--output_node_names=resnet_v1_50/predictions/Reshape_1
3. bazel-bin/tensorflow/python/tools/freeze_graph \
--input_graph=/path/to/pb/resnet_v1_101.pb \
--input_checkpoint=/path/to/resnet_v1_101.ckpt \
--input_binary=true \
--output_graph=/path/to/frozen/frozen_resnet_v1_101.pb \
--output_node_names=resnet_v1_101/predictions/Reshape_1
4. bazel-bin/tensorflow/python/tools/freeze_graph \
--input_graph=/path/to/pb/resnet_v1_152.pb \
--input_checkpoint=/path/to/resnet_v1_152.ckpt \
--input_binary=true \
--output_graph=/path/to/frozen/frozen_resnet_v1_152.pb \
--output_node_names=resnet_v1_152/predictions/Reshape_1
5. bazel-bin/tensorflow/python/tools/freeze_graph \
--input_graph=/path/to/pb/inception_v3.pb \
--input_checkpoint=/path/to/inception_v3.ckpt \
--input_binary=true \
--output_graph=/path/to/frozen/frozen_inception_v3.pb \
--output_node_names=InceptionV3/Predictions/Reshape_1
6. bazel-bin/tensorflow/python/tools/freeze_graph \
--input_graph=/path/to/pb/inception_v4.pb \
--input_checkpoint=/path/to/inception_v4.ckpt \
--input_binary=true \
--output_graph=/path/to/frozen/frozen_inception_v4.pb \
--output_node_names=InceptionV4/Logits/Predictions
Step IV: generate the IR files
cd /path/to/deployment_tools/model_optimizer
1. python3.6 mo.py --framework tf \
--input_model /path/to/frozen/frozen_vgg_19.pb \
--data_type FP32 \
--output_dir /path/to/frozen/ \
--reverse_input_channels \
--batch 64
(batch size can be 1, 8, 16, 32, 64 or 96)
2. python3.6 mo.py --framework tf \
--input_model /path/to/frozen/frozen_inception_v3.pb \
--data_type FP32 \
--scale 255 \
--reverse_input_channels \
--output_dir /path/to/frozen/ \
--batch 16
3. python3.6 mo.py --framework tf \
--input_model /path/to/frozen/frozen_inception_v4.pb \
--data_type FP32 \
--scale 255 \
--reverse_input_channels \
--output_dir /path/to/frozen/ \
--batch 16
4. python3.6 mo.py --framework tf \
--input_model /path/to/frozen/frozen_resnet_v1_50.pb \
--data_type FP32 \
--output_dir /path/to/frozen/ \
--reverse_input_channels \
--batch 16
5. python3.6 mo.py --framework tf \
--input_model /path/to/frozen/frozen_resnet_v1_101.pb \
--data_type FP32 \
--output_dir /path/to/frozen/ \
--reverse_input_channels \
--batch 16
6. python3.6 mo.py --framework tf \
--input_model /path/to/frozen/frozen_resnet_v1_152.pb \
--data_type FP32 \
--output_dir /path/to/frozen/ \
--reverse_input_channels \
--batch 16
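If one model needs an IR for every batch size, a small loop avoids repeating the command; a sketch, assuming per-batch output subdirectories (the bs$b naming is not part of the original layout):
for b in 1 8 16 32 64 96; do
  python3.6 mo.py --framework tf \
    --input_model /path/to/frozen/frozen_resnet_v1_50.pb \
    --data_type FP32 \
    --reverse_input_channels \
    --output_dir /path/to/frozen/bs$b/ \
    --batch $b
done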
Step V: testing (-ni / -niter = 100 iterations)
cd /root/inference_engine_samples_build/intel64/release
./classification_sample \
-m /path/to/frozen/frozen_inception_v3.xml \
-d CPU -ni 100 \
-l /path/to/deployment_tools/inference_engine/samples/intel64/release/lib/libcpu_extension.so \
-nt 1 \
-i /path/to/val1
./benchmark_app \
-m /path/to/frozen/frozen_inception_v3.xml \
-d CPU -api async -niter 100 \
-l /path/to/deployment_tools/inference_engine/samples/intel64/release/lib/libcpu_extension.so -nireq 32 \
-i /path/to/val1
(-nireq: number of cores of a single CPU; check with lscpu)
(The number of input images must equal the batch size, i.e. use the matching val folder prepared earlier; to test a different model, point -m at that model's .xml file.)
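To sweep all six classification models with benchmark_app, a hedged loop over the IR files (file names follow the freeze step above; adjust -nireq and the val folder to match the batch size baked into each IR):
cd /root/inference_engine_samples_build/intel64/release
for m in vgg_19 resnet_v1_50 resnet_v1_101 resnet_v1_152 inception_v3 inception_v4; do
  ./benchmark_app -m /path/to/frozen/frozen_$m.xml \
    -d CPU -api async -niter 100 \
    -l /path/to/deployment_tools/inference_engine/samples/intel64/release/lib/libcpu_extension.so \
    -nireq 32 -i /path/to/val16
done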
Part III: OpenVINO for Object Detection
Dataset preparation: COCO val2017.
Create six folders containing 1, 8, 16, 32, 64 and 96 images respectively, laid out as in Part II (figure omitted).
Note: TensorFlow Object Detection API reference:
https://github.com/tensorflow/models/tree/master/research/object_detection
Step I: download the pre-trained models
mkdir object_detection && cd object_detection && mkdir test_models
Download the desired pre-trained object detection models.
Model zoo:
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md
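For example, downloading and unpacking one SSD model from the zoo (the exact archive name below is an assumption; pick whichever models you need):
cd /path/to/object_detection/test_models
wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_coco_2018_03_29.tar.gz
tar -xvf ssd_mobilenet_v2_coco_2018_03_29.tar.gz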
After extraction each model directory contains frozen_inference_graph.pb and pipeline.config (directory listing figure omitted).
Step II: generate the IR files
Reference:
https://docs.openvinotoolkit.org/latest/_docs_mo_dg_prepare_model_convert_model_tf_specific_convert_object_detection_api_models.html
(Use the .json and pipeline.config files matching the model type: SSD, Faster R-CNN, R-FCN or Mask R-CNN. The corresponding .json files are:
ssd_v2_support.json
faster_rcnn_support.json
rfcn_support.json
mask_rcnn_support.json)
python3.6 mo_tf.py \
--input_model=/path/to/frozen_inference_graph.pb \
--tensorflow_use_custom_operations_config /path/to/deployment_tools/model_optimizer/extensions/front/tf/ssd_v2_support.json \
--tensorflow_object_detection_api_pipeline_config /path/to/pipeline.config \
--reverse_input_channels --batch 16
(Batch size is only adjustable for SSD; Faster R-CNN and R-FCN can only be tested with batch 1; see the Faster R-CNN example below.)
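For a Faster R-CNN model the command only changes the support .json and the batch size; a hedged example (paths are placeholders):
python3.6 mo_tf.py \
--input_model=/path/to/faster_rcnn/frozen_inference_graph.pb \
--tensorflow_use_custom_operations_config /path/to/deployment_tools/model_optimizer/extensions/front/tf/faster_rcnn_support.json \
--tensorflow_object_detection_api_pipeline_config /path/to/faster_rcnn/pipeline.config \
--reverse_input_channels --batch 1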
Step III: testing
For the SSD / Faster R-CNN / R-FCN series:
(benchmark_app is only applicable to SSD)
./benchmark_app \
-m /path/to/frozen_inference_graph.xml \
-d CPU -api async -niter 100 \
-l /path/to/intel64/debug/lib/libcpu_extension.so \
-nireq 32 -i /path/to/val1/
(-nireq is the number of cores of a single CPU; check with lscpu. The val folder must match the batch size.)
(R-FCN, SSD and Faster R-CNN are all tested with the object_detection_sample_ssd command;
***Faster R-CNN / R-FCN only work with batch=1.)
./object_detection_sample_ssd \
-m /path/to/frozen_inference_graph.xml -d CPU -ni 100 \
-l /path/to/intel64/debug/lib/libcpu_extension.so -i /path/to/val1/
For Mask R-CNN:
Reference: https://docs.openvinotoolkit.org/latest/_inference_engine_samples_mask_rcnn_demo_readme.html
./mask_rcnn_demo \
-m /path/to/frozen_inference_graph.xml -d CPU -ni 100 \
-l /root/inference_engine_samples_build/intel64/debug/lib/libcpu_extension.so \
-i /path/to/coco_val/val64/
When Mask R-CNN is run with a batch size larger than 16, a "segmentation fault" occurs: the demo writes result images to the current directory during the test, and memory is exhausted as the batch grows. This image-writing code can be commented out without affecting the timing measurement; the region to comment out was marked in the original screenshot (omitted here).
Path of main.cpp: /opt/intel/openvino_2019.1.0394/deployment_tools/inference_engine/samples/mask_rcnn_demo
After modifying the file, rebuild the samples:
run ./build_samples.sh
Part IV: OpenVINO for DeepLabV3
Reference: https://github.com/fionazz92/openvino/tree/master/deeplabv3+_mobilenetv2
TensorFlow run commands (wwen.sh):
echo 1 > /proc/sys/vm/compact_memory
echo 3 > /proc/sys/vm/drop_caches
echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
echo 0 > /sys/devices/system/cpu/intel_pstate/no_turbo
echo 0 > /proc/sys/kernel/numa_balancing
cpupower frequency-set -g performance
export KMP_BLOCKTIME=0
export KMP_SETTINGS=1
export KMP_AFFINITY=granularity=fine,compact,1,0
export OMP_NUM_THREADS=16
numactl --physcpubind=0-15,32-47 --membind=0 python3.6 demo_multi.py --input_folder ./img --output_folder ./output --logdir ./model > node_2_1.log 2>&1 &
numactl --physcpubind=16-31,48-63 --membind=1 python3.6 demo_multi.py --input_folder ./img --output_folder ./output --logdir ./model > node_2_2.log 2>&1 &
Step 1: generate the IR files
python3.6 mo_tf.py --input_model /home/gsj/deeplab/research/deeplab/model/frozen_inference_graph.pb --data_type FP32 --output_dir /home/gsj/super-resolution/tf_estimator_barebone/models/ --input 0:xception_65/Pad --output aspp0/Relu,aspp1_pointwise/Relu,aspp2_pointwise/Relu,aspp3_pointwise/Relu,ResizeBilinear_1 --input_shape [1,1953,2593,3]
(Adjust --input_shape according to the size of the pre-processed image produced in the inference step below; original screenshot omitted.)
Step 2: inference (infer_ie_tf.py is located under intel64/release)
echo 1 > /proc/sys/vm/compact_memory
echo 3 > /proc/sys/vm/drop_caches
echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
echo 0 > /sys/devices/system/cpu/intel_pstate/no_turbo
echo 0 > /proc/sys/kernel/numa_balancing
cpupower frequency-set -g performance
export KMP_BLOCKTIME=0
export KMP_SETTINGS=1
export KMP_AFFINITY=granularity=fine,compact,1,0
export OMP_NUM_THREADS=16
python3.6 infer_ie_tf.py -m /home/gsj/super-resolution/tf_estimator_barebone/models/frozen_inference_graph.xml -i 1.jpg -l lib/libcpu_extension.so
# infer_ie_tf.py source
# author: fourmi_gsj
from __future__ import print_function
import sys
import os
import time
from argparse import ArgumentParser
import numpy as np
import cv2
import tensorflow as tf
from tensorflow.python.platform import gfile
from openvino.inference_engine import IENetwork, IEPlugin


def build_argparser():
    parser = ArgumentParser()
    parser.add_argument("-m", "--model", help="Path to an .xml file with a trained model.", required=True, type=str)
    parser.add_argument("-i", "--input", help="Path to a folder with images or path to an image file", required=True,
                        type=str)
    parser.add_argument("-l", "--cpu_extension",
                        help="MKLDNN (CPU)-targeted custom layers. Absolute path to a shared library with the kernels "
                             "impl.", type=str, default=None)
    parser.add_argument("-pp", "--plugin_dir", help="Path to a plugin folder", type=str, default=None)
    parser.add_argument("-d", "--device",
                        help="Specify the target device to infer on; CPU, GPU, FPGA or MYRIAD is acceptable. Sample "
                             "will look for a suitable plugin for device specified (CPU by default)", default="CPU",
                        type=str)
    parser.add_argument("-nt", "--number_top", help="Number of top results", default=10, type=int)
    parser.add_argument("-pc", "--performance", help="Enables per-layer performance report", action='store_true')
    return parser


def resize_for_concat(a0, a1, a2, a3, rb):
    # Collect the five IE output blobs (NCHW) and convert them back to NHWC for the TF post-processing graph.
    iimg_ir = []
    '''
    print('a0:', a0.shape)
    print('a1:', a1.shape)
    print('a2:', a2.shape)
    print('a3:', a3.shape)
    print('rb:', rb.shape)
    resize_aspp0 = np.float32(np.zeros((1, 256, 123, 163)))
    resize_resizebilinear_1 = np.float32(np.zeros((1, 256, 123, 163)))
    resize_aspp1 = np.float32(np.zeros((1, 256, 123, 163)))
    resize_aspp2 = np.float32(np.zeros((1, 256, 123, 163)))
    resize_aspp3 = np.float32(np.zeros((1, 256, 123, 163)))
    for i in range(256):
        resize_aspp0[0, i] = cv2.resize(a0[0, i], (163, 123), interpolation=cv2.INTER_LINEAR)
        resize_resizebilinear_1[0, i] = cv2.resize(rb[0, i], (163, 123), interpolation=cv2.INTER_LINEAR)
        resize_aspp1[0, i] = cv2.resize(a1[0, i], (163, 123), interpolation=cv2.INTER_LINEAR)
        resize_aspp2[0, i] = cv2.resize(a2[0, i], (163, 123), interpolation=cv2.INTER_LINEAR)
        resize_aspp3[0, i] = cv2.resize(a3[0, i], (163, 123), interpolation=cv2.INTER_LINEAR)
    resizebilinear_1 = resize_resizebilinear_1.transpose((0, 2, 3, 1))
    aspp0 = resize_aspp0.transpose((0, 2, 3, 1))
    aspp1 = resize_aspp1.transpose((0, 2, 3, 1))
    aspp2 = resize_aspp2.transpose((0, 2, 3, 1))
    aspp3 = resize_aspp3.transpose((0, 2, 3, 1))
    '''
    resizebilinear_1 = rb.transpose((0, 2, 3, 1))
    aspp0 = a0.transpose((0, 2, 3, 1))
    aspp1 = a1.transpose((0, 2, 3, 1))
    aspp2 = a2.transpose((0, 2, 3, 1))
    aspp3 = a3.transpose((0, 2, 3, 1))
    '''
    print(aspp0.shape)
    print(aspp1.shape)
    print(aspp2.shape)
    print(aspp3.shape)
    print(resizebilinear_1.shape)
    '''
    iimg_ir.append(resizebilinear_1)
    iimg_ir.append(aspp0)
    iimg_ir.append(aspp1)
    iimg_ir.append(aspp2)
    iimg_ir.append(aspp3)
    return iimg_ir


class _model_preprocess():
    # Runs the original TF graph up to the pre-processing output ('sub_7').
    def __init__(self):
        graph = tf.Graph()
        f_handle = gfile.FastGFile("/home/gsj/deeplab/research/deeplab/model/frozen_inference_graph.pb", 'rb')
        graph_def = tf.GraphDef.FromString(f_handle.read())
        with graph.as_default():
            tf.import_graph_def(graph_def, name='')
        self.sess = tf.Session(graph=graph)

    def _pre_process(self, image):
        seg_map = self.sess.run('sub_7:0', feed_dict={'ImageTensor:0': [image]})
        # print('the shape of the seg_map is:', seg_map.shape)
        return seg_map


class _model_postprocess():
    # Re-imports the TF graph with the five intermediate tensors remapped to placeholders,
    # so the tail of the network can run on the outputs produced by the Inference Engine.
    def __init__(self):
        graph = tf.Graph()
        f_handle = gfile.FastGFile("/home/gsj/deeplab/research/deeplab/model/frozen_inference_graph.pb", 'rb')
        graph_def = tf.GraphDef.FromString(f_handle.read())
        with graph.as_default():
            new_input0 = tf.placeholder(tf.float32, shape=(1, 123, 163, 256), name='new_input0')
            new_input1 = tf.placeholder(tf.float32, shape=(1, 123, 163, 256), name='new_input1')
            new_input2 = tf.placeholder(tf.float32, shape=(1, 123, 163, 256), name='new_input2')
            new_input3 = tf.placeholder(tf.float32, shape=(1, 123, 163, 256), name='new_input3')
            new_input4 = tf.placeholder(tf.float32, shape=(1, 123, 163, 256), name='new_input4')
            tf.import_graph_def(graph_def,
                                input_map={'ResizeBilinear_1:0': new_input0,
                                           'aspp0/Relu:0': new_input1,
                                           'aspp1_pointwise/Relu:0': new_input2,
                                           'aspp2_pointwise/Relu:0': new_input3,
                                           'aspp3_pointwise/Relu:0': new_input4},
                                name='')
        self.sess = tf.Session(graph=graph)

    def _post_process(self, image_ir, image):
        seg_map = self.sess.run('SemanticPredictions:0',
                                feed_dict={'ImageTensor:0': [image],
                                           'new_input0:0': image_ir[0],
                                           'new_input1:0': image_ir[1],
                                           'new_input2:0': image_ir[2],
                                           'new_input3:0': image_ir[3],
                                           'new_input4:0': image_ir[4]})
        return seg_map


_pre = _model_preprocess()
_post = _model_postprocess()


def main_ie_infer():
    args = build_argparser().parse_args()
    model_xml = args.model
    model_bin = os.path.splitext(model_xml)[0] + ".bin"
    image = cv2.imread(args.input)
    print("the size of the orig image is:", image.shape[0], image.shape[1])
    h_input_size = 1360  # the height of the output
    w_input_size = 1020  # the width of the output
    h_ratio = 1.0 * h_input_size / image.shape[0]
    w_ratio = 1.0 * w_input_size / image.shape[1]
    shrink_size = (int(w_ratio * image.shape[1]), int(h_ratio * image.shape[0]))
    image = cv2.resize(image, shrink_size, interpolation=cv2.INTER_LINEAR)
    print("the shape of the resized image is:", image.shape)

    # Plugin initialization for specified device and load extensions library if specified
    plugin = IEPlugin(device=args.device, plugin_dirs=args.plugin_dir)
    if args.cpu_extension and 'CPU' in args.device:
        plugin.add_cpu_extension(args.cpu_extension)
    if args.performance:
        plugin.set_config({"PERF_COUNT": "YES"})

    # Read IR
    net = IENetwork.from_ir(model=model_xml, weights=model_bin)
    # print("the output info of the net is:", net.outputs)
    input_blob = next(iter(net.inputs))
    print('input_blob is:', input_blob)
    exec_net = plugin.load(network=net)

    img_ir = []
    for itr in range(1):
        now = time.time()
        image_ = _pre._pre_process(image)
        image_ = image_.transpose((0, 3, 1, 2))
        # print("the shape of the front net's output:", image_.shape)
        res = exec_net.infer(inputs={input_blob: image_})
        # print(res.keys())
        aspp0 = res['aspp0/Relu']
        aspp1 = res['aspp1_pointwise/Relu']
        aspp2 = res['aspp2_pointwise/Relu']
        aspp3 = res['aspp3_pointwise/Relu']
        resizebilinear_1 = res['ResizeBilinear_1']
        img_ir = resize_for_concat(aspp0, aspp1, aspp2, aspp3, resizebilinear_1)
        result = _post._post_process(img_ir, image)[0]
        print('time cost:', time.time() - now)
        # print(result)
        result[result != 0] = 255
        cv2.imwrite('./result_deeplabv3.jpg', result)

    del net
    del exec_net
    del plugin


if __name__ == '__main__':
    sys.exit(main_ie_infer() or 0)
Part V: OpenVINO for Super Resolution
Step 1: test with TensorFlow
GitHub: https://github.com/ychfan/tf_estimator_barebone
Run the inference program and look up the model's output node; the node name found is "clip_by_value".
Working directory: /home/gsj/super-resolution/tf_estimator_barebone/
Run the following commands:
export KMP_BLOCKTIME=1
export KMP_AFFINITY=granularity=fine,compact,1,0
export OMP_NUM_THREADS=16
numactl -c 0-15,32-47 -m 0 python3.6 -m datasets.div2k --mode wdsr --model-dir /home/gsj/super-resolution/tf_estimator_barebone/models/ --input-dir /home/gsj/super-resolution/tf_estimator_barebone/data/div2k_valid_hr/ --output-dir ../output
Step 2:
Process the model files under the models folder (saved_model.pb and the variables directory): freeze saved_model.pb to produce pruned_saved_model_or_whatever.pb.
Working directory: /home/gsj/super-resolution/tf_inference_demo/tensorflow-1.12.0
Command:
bazel-bin/tensorflow/python/tools/freeze_graph --in_graph=/home/gsj/super-resolution/tf_estimator_barebone/models/saved_model.pb --output_graph=/home/gsj/super-resolution/tf_estimator_barebone/models/pruned_saved_model_or_whatever.pb --input_saved_model_dir=/home/gsj/super-resolution/tf_estimator_barebone/models --input_checkpoint=/home/gsj/super-resolution/tf_estimator_barebone/models/variables --output_node_names="clip_by_value" --input_binary=true
Step 3:
Apply a further graph-transform pass to pruned_saved_model_or_whatever.pb (replacing the "mul" node operations with constants) to produce transform.pb; --inputs is set to "input_tensor", the model's input node name.
Working directory: /home/gsj/super-resolution/tf_inference_demo/tensorflow-1.12.0
Command:
bazel-bin/tensorflow/tools/graph_transforms/transform_graph --in_graph=/home/gsj/super-resolution/tf_estimator_barebone/models/pruned_saved_model_or_whatever.pb --out_graph=/home/gsj/super-resolution/tf_estimator_barebone/models/transform.pb --inputs=input_tensor --outputs=clip_by_value --transforms='fold_constants'
Step 4:
Generate the IR files (transform.bin, transform.xml, transform.mapping).
Output directory: /home/gsj/super-resolution/tf_estimator_barebone/models
Working directory: /opt/intel/computer_vision_sdk_2018.5.455/deployment_tools/model_optimizer
Command (note: --input_shape is determined by the size of the image to be tested):
python3.6 mo_tf.py --input_model /home/gsj/super-resolution/tf_estimator_barebone/models/transform.pb --input_shape [1,1080,1920,3] --data_type FP32 --output_dir /home/gsj/super-resolution/tf_estimator_barebone/models/ --scale_values [255.0] --input input_tensor
Step 5: edit the generated transform.xml
Two places need changing: replace the input node name "input_tensor" with 0, once near the beginning of the file and once near the end (the regions were highlighted in the original screenshots, omitted here).
Step 6: run the test
Working directory:
/root/inference_engine_samples_build/intel64/release
Command:
./super_resolution_demo -i /home/gsj/super-resolution/tf_estimator_barebone/va1/0896_1920_1080.png -m /home/gsj/super-resolution/tf_estimator_barebone/models/transform.xml
Step 7: location of the image generated by OpenVINO
/root/inference_engine_samples_build/intel64/release/sr_1.png
Step 8: location of the image generated by TensorFlow
/home/gsj/super-resolution/output
--------------------------------------------------------------------------------
Vanilla TensorFlow test
TensorFlow download: https://pypi.org/project/tensorflow/1.12.0/#files
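Alternatively, the wheel can be installed directly from PyPI (version pinned to match this guide; assumes pip is available for python3.6):
python3.6 -m pip install tensorflow==1.12.0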
Test as described in Step 1 above.
Optimized TensorFlow test
1. Uninstall the old TensorFlow version, then run in the current shell:
export PATH=/home/build_gcc72/bin:$PATH
export LD_LIBRARY_PATH=/home/build_gcc72/lib64:$LD_LIBRARY_PATH
Check whether the optimized (MKL) TensorFlow is loaded by running:
python3.6 -c "import tensorflow; print(tensorflow.pywrap_tensorflow.IsMklEnabled())"
If it returns True, the MKL build is loaded.
2. Test as in Step 1.