Semantic Segmentation: FCN-32s (a worked example)

Author: 马育民 • 2020-02-26 14:37

# Introduction

This article walks through an FCN-32s semantic segmentation implementation. It starts from the VGG16 network, drops the 3 fully connected layers, and appends 3 convolutional layers in their place; to reduce overfitting, a dropout layer follows each of the first two of these convolutional layers. Finally, a transposed convolution layer upsamples the result by a factor of 32, restoring the output to the original image size.

Dataset link: https://www.kaggle.com/mayumin8211/head-location

### Model structure

[![](https://www.malaoshi.top/upload/0/0/1EF53VROsFDf.png)](https://www.malaoshi.top/upload/0/0/1EF53VROsFDf.png)

# Code

### Imports

```
import tensorflow as tf
import glob
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from IPython.display import display
import os.path
```

### Constants

```
# image size
IMG_WIDTH=224

AUTOTUNE=tf.data.experimental.AUTOTUNE

# VGG16 weights file
vgg16_h5='/kaggle/input/vgg16-weights-tf-dim-ordering-tf-kernels-notop/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5'
```

### Get the paths of all jpg images

```
# the images folder also contains 3 .mat files, so glob only *.jpg
img_paths=glob.glob("/kaggle/input/head-location/images/images/*.jpg")
print(img_paths[:3])
```

### Check the number of images

```
count_img=len(img_paths)
print(count_img)
```

### Get the paths of all png images

The jpg and png files are not guaranteed to pair up one-to-one, so each png path is derived from the corresponding jpg file name:

```
png_path="/kaggle/input/head-location/annotations/annotations/trimaps/"
png_paths=[]

def get_png_paths():
    for item in img_paths:
        # same base name as the jpg, with a .png extension
        name=os.path.basename(item).split(".")[0]+".png"
        png_paths.append(os.path.join(png_path,name))

get_png_paths()
print(png_paths[0])
print(len(png_paths))
```

### Test: display a jpg and its png (key step)

```
display(Image.open(img_paths[0]))
display(Image.open(png_paths[0]))
```

**Note:** the png does not display normally; it looks almost completely black.

### Inspect the png pixel data (key step)

```
from collections import Counter

def show_png_data():
    arr=np.array(Image.open(png_paths[0]))
    print(type(arr))
    print(arr)
    print("min:",np.min(arr))
    print("max:",np.max(arr))
    print(Counter(arr.flatten()))

show_png_data()
```

Output:

```
[[2 2 2 ... 2 2 2]
 [2 2 2 ... 2 2 2]
 [2 2 2 ... 2 2 2]
 ...
 [2 2 2 ... 2 2 2]
 [2 2 2 ... 2 2 2]
 [2 2 2 ... 2 2 2]]
min: 1
max: 3
Counter({2: 132176, 1: 37847, 3: 17477})
```

So the png annotation contains only the values 1, 2 and 3, i.e. every pixel belongs to one of three classes.
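This also explains why `display()` showed a black image above: on a 0-255 grayscale, the values 1-3 are nearly indistinguishable from 0. As a quick sanity check, here is a minimal sketch that reuses `png_paths` from above; the ×85 stretch factor is an arbitrary choice that maps the labels 1/2/3 to 85/170/255:

```
import numpy as np
from PIL import Image
from IPython.display import display

arr = np.array(Image.open(png_paths[0]))
# stretch the labels 1/2/3 to 85/170/255 so the three classes become visible
display(Image.fromarray((arr * 85).astype(np.uint8)))
```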
### Display the images with matplotlib (key step)

matplotlib rescales unusual value ranges automatically (mapping them through a colormap), so the annotation becomes recognizable to the eye:

```
plt.imshow(Image.open(img_paths[0]))
plt.show()

plt.imshow(Image.open(png_paths[0]))
plt.show()
```

### Build the Dataset

```
ds=tf.data.Dataset.from_tensor_slices((img_paths,png_paths))
```

Test:

```
for jpg,png in ds.take(2):
    print(jpg)
    print(png)
    print('--')
```

### Image-parsing functions

```
def parse_jpg(path):
    img=tf.io.read_file(path)
    img=tf.image.decode_jpeg(img,channels=3)
    img=tf.image.resize(img,(IMG_WIDTH,IMG_WIDTH))
    # normalize to [0,1]
    img=img/255
    return img

def parse_png(path):
    img=tf.io.read_file(path)
    img=tf.image.decode_png(img,channels=1)
    # nearest-neighbor resizing keeps the mask values integral
    # (bilinear interpolation would produce fractional labels)
    img=tf.image.resize(img,(IMG_WIDTH,IMG_WIDTH),method="nearest")
    # shift the labels 1/2/3 to 0/1/2, as expected by
    # sparse_categorical_crossentropy
    img=img-1
    return img

def parse_img(jpg_path,png_path):
    jpg=parse_jpg(jpg_path)
    png=parse_png(png_path)
    return jpg,png
```

Apply the functions:

```
# reshuffle_each_iteration=False keeps the shuffled order fixed, so the
# take/skip split below does not leak test samples into training
ds2=ds.map(parse_img,num_parallel_calls=AUTOTUNE).shuffle(count_img,reshuffle_each_iteration=False)
```

### Split into training and test sets

```
train_count=int(count_img*0.7)
train_ds=ds2.take(train_count)
test_ds=ds2.skip(train_count)
```

Inspect a couple of samples:

```
for jpg,png in train_ds.take(2):
    plt.imshow(jpg.numpy())
    plt.show()

    png2=np.squeeze(png.numpy())
    plt.imshow(png2)
    plt.show()
```

### Prepare the training and test pipelines

```
train_ds2=train_ds.shuffle(train_count).batch(8).prefetch(AUTOTUNE)
test_ds2=test_ds.batch(8).prefetch(AUTOTUNE)
```

### Load the VGG16 model

```
app_vgg16=tf.keras.applications.VGG16(include_top=False,weights=vgg16_h5,input_shape=(IMG_WIDTH,IMG_WIDTH,3))
# VGG16 is used only as a frozen feature extractor, so freeze its weights
app_vgg16.trainable=False
app_vgg16.summary()
```

### Build the FCN-32s model (on top of VGG16)

With the fully connected layers already removed (`include_top=False`), three convolutional layers are appended:

1. The first convolutional layer: 4096 filters, kernel size (7,7), padding "same", so the spatial size is unchanged
2. The second convolutional layer: 4096 filters, kernel size (1,1), padding "same", so the spatial size is unchanged
3. The third convolutional layer: 1000 filters, kernel size (1,1), padding "same", so the spatial size is unchanged (the 1000 is carried over from the 1000-way fc8 layer of the original VGG16)

To reduce overfitting, a **dropout layer** follows each of the first two convolutional layers.

Finally, a transposed convolution layer is added:

- 3 filters, because the segmentation map in this example has 3 pixel classes
- kernel size (32,32)
- stride 32, i.e. a 32× upsampling, which restores the 7×7 feature map to 224×224

```
# build fcn32_vgg16
def build_fcn32_vgg16():
    print(app_vgg16.name,":",app_vgg16.output.shape)

    o=tf.keras.layers.Conv2D(4096,7,activation="relu",padding="same",name="fc6")(app_vgg16.output)
    print(o.shape)
    o=tf.keras.layers.Dropout(rate=0.5)(o)

    o=tf.keras.layers.Conv2D(4096,1,activation="relu",padding="same",name="fc7")(o)
    print(o.shape)
    o=tf.keras.layers.Dropout(rate=0.5)(o)

    o=tf.keras.layers.Conv2D(1000,1,activation="relu",padding="same",kernel_initializer="he_normal",name="fc8")(o)
    print(o.shape)

    o=tf.keras.layers.Conv2DTranspose(3,32,32,padding="same",activation="softmax",name="Conv2DTran")(o)
    print(o.name,":",o.shape)

    # variant without the three convolutional layers; see the note below
    # o=tf.keras.layers.Conv2DTranspose(3,32,32,padding="same",activation="softmax",name="Conv2DTran")(app_vgg16.output)
    # print(o.name,":",o.shape)

    model=tf.keras.Model(inputs=app_vgg16.input,outputs=[o],name="fcn32_vgg16_2")
    print("|"*50)
    model.summary()
    return model

model=build_fcn32_vgg16()
```

**Note:** some references attach the transposed convolution directly to the VGG16 output, without the three convolutional layers (the commented-out variant above), but in testing this gave worse results, so it stays commented out.

### Model visualization

```
tf.keras.utils.plot_model(model,to_file=model.name+".png",show_shapes=True)
```

[![](https://www.malaoshi.top/upload/0/0/1EF53VROsFDf.png)](https://www.malaoshi.top/upload/0/0/1EF53VROsFDf.png)

### Compile and train

```
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),loss="sparse_categorical_crossentropy",metrics=["acc"])
history=model.fit(train_ds2,epochs=10,validation_data=test_ds2)
```

[![](https://www.malaoshi.top/upload/0/0/1EF53iZUObDq.png)](https://www.malaoshi.top/upload/0/0/1EF53iZUObDq.png)

### Plot the loss and accuracy
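The curves below are screenshots from the original run. As a minimal plotting sketch, assuming the `history` object returned by `model.fit` above and `plt` from the imports cell (the "acc"/"val_acc" key names follow the `metrics=["acc"]` setting):

```
plt.plot(history.epoch, history.history["loss"], label="loss")
plt.plot(history.epoch, history.history["val_loss"], label="val_loss")
plt.xlabel("epoch")
plt.legend()
plt.show()

plt.plot(history.epoch, history.history["acc"], label="acc")
plt.plot(history.epoch, history.history["val_acc"], label="val_acc")
plt.xlabel("epoch")
plt.legend()
plt.show()
```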
[![](https://www.malaoshi.top/upload/0/0/1EF53iZxisBv.png)](https://www.malaoshi.top/upload/0/0/1EF53iZxisBv.png)

[![](https://www.malaoshi.top/upload/0/0/1EF53iaHlg5y.png)](https://www.malaoshi.top/upload/0/0/1EF53iaHlg5y.png)

### Save the model

```
model.save(model.name+".h5")
```

# Prediction

```
for jpg,png in test_ds2.take(1):
    prd=model.predict(jpg)

    for i in range(2):
        # class index per pixel: (224,224,3) probabilities -> (224,224) labels
        prd_arr=tf.argmax(prd[i],axis=2)
        print("prd_arr.shape:",prd_arr.shape)
        print(jpg[i].shape)
        print(png[i].shape)

        # input image
        plt.imshow(jpg[i].numpy())
        plt.show()

        # ground-truth annotation
        plt.imshow(np.squeeze(png[i].numpy()))
        plt.show()

        # predicted segmentation
        plt.imshow(prd_arr)
        plt.show()
```

[![](https://www.malaoshi.top/upload/0/0/1EF53ib11931.png)](https://www.malaoshi.top/upload/0/0/1EF53ib11931.png)

[![](https://www.malaoshi.top/upload/0/0/1EF53ibH33N4.png)](https://www.malaoshi.top/upload/0/0/1EF53ibH33N4.png)

[![](https://www.malaoshi.top/upload/0/0/1EF53ibYrC87.png)](https://www.malaoshi.top/upload/0/0/1EF53ibYrC87.png)

Thanks:

https://github.com/YigeunLee/fcn32

https://github.com/advaitsave/Multiclass-Semantic-Segmentation-CamVid/blob/master/Multiclass_Semantic_Segmentation_using_FCN_32.ipynb

https://github.com/nayemabs/keras_segmentation/blob/master/Models/FCN32.py

https://github.com/lsh1994/keras-segmentation/tree/master/Models

Original source: http://malaoshi.top/show_1EF53OZZLiWQ.html