Semantic Segmentation: FCN-32s (a worked example)

Author: 马育民 • 2020-02-26 14:37

# Introduction

This article walks through an FCN-32s semantic segmentation implementation. It starts from the VGG16 network, drops the 3 fully connected layers, and appends 3 convolutional layers in their place; to reduce overfitting, a dropout layer follows each of the first two of these convolutional layers. Finally, a transposed convolution layer upsamples the result by a factor of 32, restoring the output to the original image size.

Dataset link: https://www.kaggle.com/mayumin8211/head-location

### Model structure

[![](https://www.malaoshi.top/upload/0/0/1EF53VROsFDf.png)](https://www.malaoshi.top/upload/0/0/1EF53VROsFDf.png)

# Code

### Imports

```
import tensorflow as tf
import glob
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from IPython.display import display
import os.path
```

### Constants

```
# image size
IMG_WIDTH=224

AUTOTUNE=tf.data.experimental.AUTOTUNE

# VGG16 weights file
vgg16_h5='/kaggle/input/vgg16-weights-tf-dim-ordering-tf-kernels-notop/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5'
```

### Get the paths of all jpg images

```
# the images folder also contains 3 .mat files, so glob only *.jpg
img_paths=glob.glob("/kaggle/input/head-location/images/images/*.jpg")
print(img_paths[:3])
```

### Check the number of images

```
count_img=len(img_paths)
print(count_img)
```

### Get the paths of all png images

The jpg and png files are not guaranteed to pair up one-to-one, so each png path is derived from the corresponding jpg file name:

```
png_path="/kaggle/input/head-location/annotations/annotations/trimaps/"
png_paths=[]

def get_png_paths():
    for item in img_paths:
        # same base name as the jpg, with a .png extension
        name=os.path.basename(item).split(".")[0]+".png"
        png_paths.append(os.path.join(png_path,name))

get_png_paths()
print(png_paths[0])
print(len(png_paths))
```

### Test: display a jpg and its png (key step)

```
display(Image.open(img_paths[0]))
display(Image.open(png_paths[0]))
```

**Note:** the png does not display normally; it looks almost completely black.

### Inspect the png pixel data (key step)

```
from collections import Counter

def show_png_data():
    arr=np.array(Image.open(png_paths[0]))
    print(type(arr))
    print(arr)
    print("min:",np.min(arr))
    print("max:",np.max(arr))
    print(Counter(arr.flatten()))

show_png_data()
```

Output:

```
[[2 2 2 ... 2 2 2]
 [2 2 2 ... 2 2 2]
 [2 2 2 ... 2 2 2]
 ...
 [2 2 2 ... 2 2 2]
 [2 2 2 ... 2 2 2]
 [2 2 2 ... 2 2 2]]
min: 1
max: 3
Counter({2: 132176, 1: 37847, 3: 17477})
```

So the png annotation contains only the values 1, 2 and 3, i.e. every pixel belongs to one of three classes.
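This also explains why `display()` showed a black image above: on a 0-255 grayscale, the values 1-3 are nearly indistinguishable from 0. As a quick sanity check, here is a minimal sketch that reuses `png_paths` from above; the ×85 stretch factor is an arbitrary choice that maps the labels 1/2/3 to 85/170/255:

```
import numpy as np
from PIL import Image
from IPython.display import display

arr = np.array(Image.open(png_paths[0]))
# stretch the labels 1/2/3 to 85/170/255 so the three classes become visible
display(Image.fromarray((arr * 85).astype(np.uint8)))
```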
### Display the images with matplotlib (key step)

matplotlib rescales unusual value ranges automatically (mapping them through a colormap), so the annotation becomes recognizable to the eye:

```
plt.imshow(Image.open(img_paths[0]))
plt.show()

plt.imshow(Image.open(png_paths[0]))
plt.show()
```

### Build the Dataset

```
ds=tf.data.Dataset.from_tensor_slices((img_paths,png_paths))
```

Test:

```
for jpg,png in ds.take(2):
    print(jpg)
    print(png)
    print('--')
```

### Image-parsing functions

```
def parse_jpg(path):
    img=tf.io.read_file(path)
    img=tf.image.decode_jpeg(img,channels=3)
    img=tf.image.resize(img,(IMG_WIDTH,IMG_WIDTH))
    # normalize to [0,1]
    img=img/255
    return img

def parse_png(path):
    img=tf.io.read_file(path)
    img=tf.image.decode_png(img,channels=1)
    # nearest-neighbor resizing keeps the mask values integral
    # (bilinear interpolation would produce fractional labels)
    img=tf.image.resize(img,(IMG_WIDTH,IMG_WIDTH),method="nearest")
    # shift the labels 1/2/3 to 0/1/2, as expected by
    # sparse_categorical_crossentropy
    img=img-1
    return img

def parse_img(jpg_path,png_path):
    jpg=parse_jpg(jpg_path)
    png=parse_png(png_path)
    return jpg,png
```

Apply the functions:

```
# reshuffle_each_iteration=False keeps the shuffled order fixed, so the
# take/skip split below does not leak test samples into training
ds2=ds.map(parse_img,num_parallel_calls=AUTOTUNE).shuffle(count_img,reshuffle_each_iteration=False)
```

### Split into training and test sets

```
train_count=int(count_img*0.7)
train_ds=ds2.take(train_count)
test_ds=ds2.skip(train_count)
```

Inspect a couple of samples:

```
for jpg,png in train_ds.take(2):
    plt.imshow(jpg.numpy())
    plt.show()

    png2=np.squeeze(png.numpy())
    plt.imshow(png2)
    plt.show()
```

### Prepare the training and test pipelines

```
train_ds2=train_ds.shuffle(train_count).batch(8).prefetch(AUTOTUNE)
test_ds2=test_ds.batch(8).prefetch(AUTOTUNE)
```

### Load the VGG16 model

```
app_vgg16=tf.keras.applications.VGG16(include_top=False,weights=vgg16_h5,input_shape=(IMG_WIDTH,IMG_WIDTH,3))
# VGG16 is used only as a frozen feature extractor, so freeze its weights
app_vgg16.trainable=False
app_vgg16.summary()
```

### Build the FCN-32s model (on top of VGG16)

With the fully connected layers already removed (`include_top=False`), three convolutional layers are appended:

1. The first convolutional layer: 4096 filters, kernel size (7,7), padding "same", so the spatial size is unchanged
2. The second convolutional layer: 4096 filters, kernel size (1,1), padding "same", so the spatial size is unchanged
3. The third convolutional layer: 1000 filters, kernel size (1,1), padding "same", so the spatial size is unchanged (the 1000 is carried over from the 1000-way fc8 layer of the original VGG16)

To reduce overfitting, a **dropout layer** follows each of the first two convolutional layers.

Finally, a transposed convolution layer is added:

- 3 filters, because the segmentation map in this example has 3 pixel classes
- kernel size (32,32)
- stride 32, i.e. a 32× upsampling, which restores the 7×7 feature map to 224×224

```
# build fcn32_vgg16
def build_fcn32_vgg16():
    print(app_vgg16.name,":",app_vgg16.output.shape)

    o=tf.keras.layers.Conv2D(4096,7,activation="relu",padding="same",name="fc6")(app_vgg16.output)
    print(o.shape)
    o=tf.keras.layers.Dropout(rate=0.5)(o)

    o=tf.keras.layers.Conv2D(4096,1,activation="relu",padding="same",name="fc7")(o)
    print(o.shape)
    o=tf.keras.layers.Dropout(rate=0.5)(o)

    o=tf.keras.layers.Conv2D(1000,1,activation="relu",padding="same",kernel_initializer="he_normal",name="fc8")(o)
    print(o.shape)

    o=tf.keras.layers.Conv2DTranspose(3,32,32,padding="same",activation="softmax",name="Conv2DTran")(o)
    print(o.name,":",o.shape)

    # variant without the three convolutional layers; see the note below
    # o=tf.keras.layers.Conv2DTranspose(3,32,32,padding="same",activation="softmax",name="Conv2DTran")(app_vgg16.output)
    # print(o.name,":",o.shape)

    model=tf.keras.Model(inputs=app_vgg16.input,outputs=[o],name="fcn32_vgg16_2")
    print("|"*50)
    model.summary()
    return model

model=build_fcn32_vgg16()
```

**Note:** some references attach the transposed convolution directly to the VGG16 output, without the three convolutional layers (the commented-out variant above), but in testing this gave worse results, so it stays commented out.

### Model visualization

```
tf.keras.utils.plot_model(model,to_file=model.name+".png",show_shapes=True)
```

[![](https://www.malaoshi.top/upload/0/0/1EF53VROsFDf.png)](https://www.malaoshi.top/upload/0/0/1EF53VROsFDf.png)

### Compile and train

```
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),loss="sparse_categorical_crossentropy",metrics=["acc"])
history=model.fit(train_ds2,epochs=10,validation_data=test_ds2)
```

[![](https://www.malaoshi.top/upload/0/0/1EF53iZUObDq.png)](https://www.malaoshi.top/upload/0/0/1EF53iZUObDq.png)

### Plot the loss and accuracy
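The curves below are screenshots from the original run. As a minimal plotting sketch, assuming the `history` object returned by `model.fit` above and `plt` from the imports cell (the "acc"/"val_acc" key names follow the `metrics=["acc"]` setting):

```
plt.plot(history.epoch, history.history["loss"], label="loss")
plt.plot(history.epoch, history.history["val_loss"], label="val_loss")
plt.xlabel("epoch")
plt.legend()
plt.show()

plt.plot(history.epoch, history.history["acc"], label="acc")
plt.plot(history.epoch, history.history["val_acc"], label="val_acc")
plt.xlabel("epoch")
plt.legend()
plt.show()
```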
[![](https://www.malaoshi.top/upload/0/0/1EF53iZxisBv.png)](https://www.malaoshi.top/upload/0/0/1EF53iZxisBv.png)

[![](https://www.malaoshi.top/upload/0/0/1EF53iaHlg5y.png)](https://www.malaoshi.top/upload/0/0/1EF53iaHlg5y.png)

### Save the model

```
model.save(model.name+".h5")
```

# Prediction

```
for jpg,png in test_ds2.take(1):
    prd=model.predict(jpg)

    for i in range(2):
        # class index per pixel: (224,224,3) probabilities -> (224,224) labels
        prd_arr=tf.argmax(prd[i],axis=2)
        print("prd_arr.shape:",prd_arr.shape)
        print(jpg[i].shape)
        print(png[i].shape)

        # input image
        plt.imshow(jpg[i].numpy())
        plt.show()

        # ground-truth annotation
        plt.imshow(np.squeeze(png[i].numpy()))
        plt.show()

        # predicted segmentation
        plt.imshow(prd_arr)
        plt.show()
```

[![](https://www.malaoshi.top/upload/0/0/1EF53ib11931.png)](https://www.malaoshi.top/upload/0/0/1EF53ib11931.png)

[![](https://www.malaoshi.top/upload/0/0/1EF53ibH33N4.png)](https://www.malaoshi.top/upload/0/0/1EF53ibH33N4.png)

[![](https://www.malaoshi.top/upload/0/0/1EF53ibYrC87.png)](https://www.malaoshi.top/upload/0/0/1EF53ibYrC87.png)

Thanks:

https://github.com/YigeunLee/fcn32

https://github.com/advaitsave/Multiclass-Semantic-Segmentation-CamVid/blob/master/Multiclass_Semantic_Segmentation_using_FCN_32.ipynb

https://github.com/nayemabs/keras_segmentation/blob/master/Models/FCN32.py

https://github.com/lsh1994/keras-segmentation/tree/master/Models

Original source: http://malaoshi.top/show_1EF53OZZLiWQ.html