当前位置：首页 > AI > 基于 PaddlePaddle 框架的 SegNet 论文复现

基于 PaddlePaddle 框架的 SegNet 论文复现

发布时间：2025-07-18　　　　编辑:游乐网

本文复现了SegNet语义分割模型，其为编码器-解码器结构，编码器对应VGG-16前13层，解码器共13层，损失函数采用带median frequency weight的加权交叉熵。在Ai Studio环境配置后，基于PaddleSeg，经模型搭建、损失函数设计及调参，在camvid数据集上复现，结果超原论文60.1%的mIoU精度。

基于 paddlepaddle 框架的 segnet 论文复现 - 游乐网

论文简介

论文名称：SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
论文地址：https://arxiv.org/pdf/1511.00561.pdf

1. 网络结构

基于 PaddlePaddle 框架的 SegNet 论文复现 - 游乐网
总体结构是一个“编码器——解码器”最后加上Softmax层的结构
编码器与VGG-16的前13层，相对应的，解码器也是13层

2. Loss

使用的是交叉熵函数
原论文使用的是“median frequency weight”的加权交叉熵进行计算，每一类的 weight=median(weights)/weights
实验时我们同时使用了不加权重的交叉熵和加权重的交叉熵进行计算，发现加权重的交叉熵效果的确更好

3. Experiments

3.1 Ai Studio 环境配置

由于 Ai Studio 的部分兼容问题，目前使用加权交叉熵需要修改 PaddlePaddle 内置的交叉熵函数，可以在终端中执行如下命令进行暂时的修复：
1.使用 vim 编辑文件：
vim /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/nn/functional/loss.py
2.注释掉问题语句：
:1414,1420s/^/#/g

之后就可以使用 Ai Studio 环境进行复现

3.2 训练、验证参数设置

在 config.yml 中可以对训练、验证参数进行设置

batch_size: 12  #设定batch_size的值即为迭代一次送入网络的图片数量，一般显卡显存越大，batch_size的值可以越大iters: 1000    #模型迭代的次数train_dataset: #训练数据设置  type: Dataset #选择数据集格式  dataset_root: data/PaddleSeg/camvid #选择数据集路径  train_path: data/PaddleSeg/camvid/train_list.txt #选择数据集list  num_classes: 12 #指定目标的类别个数（背景也算为一类）  transforms: #数据预处理/增强的方式    - type: Resize #送入网络之前需要进行resize      target_size: [512, 512] #将原图resize成512*512在送入网络    - type: RandomHorizontalFlip #采用水平反转的方式进行数据增强    - type: Normalize #图像进行归一化  mode: trainval_dataset: #验证数据设置  type: Dataset #选择数据集格式  dataset_root: data/PaddleSeg/camvid #选择数据集路径  val_path: data/PaddleSeg/camvid/val_list.txt #选择数据集list  num_classes: 12 #指定目标的类别个数（背景也算为一类）  transforms: #数据预处理/增强的方式    - type: Resize  #将原图resize成512*512在送入网络      target_size: [512, 512]  #将原图resize成512*512在送入网络    - type: Normalize #图像进行归一化  mode: valoptimizer: #设定优化器的类型  type: sgd #采用SGD（Stochastic Gradient Descent）随机梯度下降方法为优化器  momentum: 0.9 #动量  weight_decay: 4.0e-5 #权值衰减，使用的目的是防止过拟合learning_rate: #设定学习率  value: 0.1  #初始学习率  decay:    type: poly  #采用poly作为学习率衰减方式。    power: 0.9  #衰减率    end_lr: 0   #最终学习率loss: #设定损失函数的类型  types:    - type: CrossEntropyLoss #损失函数类型  coef: [1]model: #模型说明  type: SegNet  #设定模型类别  num_classes: 12

登录后复制

3.3 开始训练

快速开始命令介绍：

cd PaddleSegpython train.py \ # 使用 PaddleSeg 进行训练       --config config.yml \ # 配置文件路径       --do_eval \ # 在保存模型时验证       --use_vdl \ # 使用 visualdl 保存日志       --save_interval 500 \ # 保存模型的频率       --save_dir output # 模型及日志保存路径

登录后复制

3.4 训练过程展示

训练日志可以在 github 项目下载：https://github.com/stuartchen1949/segnet_paddle/tree/main/log_upload
下面截取部分信息展示：基于 PaddlePaddle 框架的 SegNet 论文复现 - 游乐网

基于 PaddlePaddle 框架的 SegNet 论文复现 - 游乐网

3.5 调参技巧说明

最好是按照原论文的超参数进行复现，这样工作量低
在本论文中，bs=12，lr=0.1，使用 SGD 方法，动量为0.9 但是也可以另辟蹊径，比如：分阶段训练，每个阶段设置学习率不同

3.6 其他说明

截至2024年8月18日，在解码器的 unpool 过程中，Paddle 并无对应的 API，所以在本次复现使用的是 Paddle 的 interpolate 函数，插值方式设置的是“双三次插值”

项目介绍

项目背景

本项目为“飞桨论文复现挑战赛（第四期）”参赛作品，比赛链接：https://aistudio.baidu.com/aistudio/competition/detail/106
目的是复现论文“SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation”

项目结果

基于 PaddlePaddle 框架的 SegNet 论文复现 - 游乐网

结果说明

原论文使用“camvid 11类”数据集达到 miou 精度 60.1% ，我们的结果为 0.5553/11*12=0.605 >60.1% （因为最后一类是void，原论文并未计算，需要消除这个影响）

项目实现思路

大方向

1.模型搭建

2.损失函数设计

3.调参

模型搭建

由于属于语义分割问题，故使用 Paddle 的 PaddleSeg 模块进行快速开发
PaddleSeg 快速指南：https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.2/docs/quick_start.md
为了实现复现任务，我们只需要：
1.定义模型并添加至 PaddleSeg 的模型库中
2.计算出交叉熵的权值并修改 PaddleSeg 的交叉熵函数.py
3.创建 config.py 文件设置超参数

损失函数设计

原论文使用的是“median frequency weight”的加权交叉熵进行计算，每一类的 weight=median(weights)/weights

调参

按照原论文的数据进行设置即可

我们使用的 config.yml 文件内容如下

batch_size: 12iters: 10000train_dataset:  type: Dataset  dataset_root: PaddleSeg/camvid  train_path: PaddleSeg/camvid/train_list.txt  num_classes: 12  transforms:    # - type: Resize    #   target_size: [512, 512]    # - type: RandomHorizontalFlip    - type: Normalize  mode: trainval_dataset:  type: Dataset  dataset_root: PaddleSeg/camvid  val_path: PaddleSeg/camvid/val_list.txt  num_classes: 12  transforms:    # - type: Resize    #   target_size: [512, 512]    - type: Normalize  mode: valoptimizer:  type: sgd  momentum: 0.9  weight_decay: 4.0e-5lr_scheduler:  type: PolynomialDecay  learning_rate: 0.1  end_lr: 0  power: 0.9loss:  types:    - type: CrossEntropyLoss  coef: [1]model:  type: SegNet  num_classes: 12

登录后复制

我们需要修改 PaddleSeg 中的损失函数 cross_entropy_loss.py ，修改后如下

import paddlefrom paddle import nnimport paddle.nn.functional as Ffrom paddleseg.cvlibs import manager@manager.LOSSES.add_componentclass CrossEntropyLoss(nn.Layer):    """    Implements the cross entropy loss function.    Args:        weight (tuple|list|ndarray|Tensor, optional): A manual rescaling weight            given to each class. Its length must be equal to the number of classes.            Default ``None``.        ignore_index (int64, optional): Specifies a target value that is ignored            and does not contribute to the input gradient. Default ``255``.        top_k_percent_pixels (float, optional): the value lies in [0.0, 1.0]. When its value < 1.0, only compute the loss for            the top k percent pixels (e.g., the top 20% pixels). This is useful for hard pixel mining. Default ``1.0``.        data_format (str, optional): The tensor format to use, 'NCHW' or 'NHWC'. Default ``'NCHW'``.    """    def __init__(self,                 weight=                #  None,                #  [0.49470329175952255,                #  0.35828960644888747,                #  8.478075684728779,                #  0.2632281509876036,                #  1.8575191953843206,                #  0.8569813461262166,                #  7.10457223748049,                #  7.395517740818225,                #  1.4206921357883606,                #  13.036496170307238,                #  28.57158303913671,                # #  2.1105473453421433                #  0],                 [0.5183676152953121,                 0.3754285285214713,                 8.883627718253265,                 0.2758197715255286,                 1.946374345422177,                 0.8979753806852697,                 7.444422201714914,                 7.749285195873823,                 1.4886514942686155,                 13.660102013013443,                 29.938315778195324,                 0],                 ignore_index=11,                 top_k_percent_pixels=1.0,                 data_format='NCHW'):        super(CrossEntropyLoss, self).__init__()        # if weight is not None:        #     weight = paddle.to_tensor(weight, dtype='float32')        if weight is not None:            self.weight = paddle.to_tensor(weight, dtype='float32')            long_weight = weight + [0] * (256 - len(weight))            self.long_weight = paddle.to_tensor(long_weight, dtype='float32')        else:            self.weight = None            self.long_weight = None        # self.weight = weight        self.ignore_index = 11        self.top_k_percent_pixels = top_k_percent_pixels        self.EPS = 1e-8        self.data_format = data_format    def forward(self, logit, label, semantic_weights=None):        """        Forward computation.        Args:            logit (Tensor): Logit tensor, the data type is float32, float64. Shape is                (N, C), where C is number of classes, and if shape is more than 2D, this                is (N, C, D1, D2,..., Dk), k >= 1.            label (Tensor): Label tensor, the data type is int64. Shape is (N), where each                value is 0 <= label[i] <= C-1, and if shape is more than 2D, this is                (N, D1, D2,..., Dk), k >= 1.            semantic_weights (Tensor, optional): Weights about loss for each pixels, shape is the same as label. Default: None.        """        channel_axis = 1 if self.data_format == 'NCHW' else -1        if self.weight is not None and logit.shape[channel_axis] != len(                self.weight):            raise ValueError(                'The number of weights = {} must be the same as the number of classes = {}.'                .format(len(self.weight), logit.shape[1]))        logit = paddle.transpose(logit, [0, 2, 3, 1])        # logit = paddle.transpose(logit, [0, 2, 1])        loss = F.cross_entropy(            logit,            label,            ignore_index=self.ignore_index,            reduction='none',            # weight=self.weight            weight=self.long_weight)        mask = label != self.ignore_index        mask = paddle.cast(mask, 'float32')        loss = loss * mask        if semantic_weights is not None:            loss = loss * semantic_weights        if self.weight is not None:            _one_hot = F.one_hot(label, logit.shape[-1])            coef = paddle.sum(_one_hot * self.weight, axis=-1)        else:            coef = paddle.ones_like(label)        label.stop_gradient = True        mask.stop_gradient = True        if self.top_k_percent_pixels == 1.0:            avg_loss = paddle.mean(loss) / (paddle.mean(mask * coef) + self.EPS)            return avg_loss        loss = loss.reshape((-1, ))        top_k_pixels = int(self.top_k_percent_pixels * loss.numel())        loss, indices = paddle.topk(loss, top_k_pixels)        coef = coef.reshape((-1, ))        coef = paddle.gather(coef, indices)        coef.stop_gradient = True        return loss.mean() / (paddle.mean(coef) + self.EPS)

登录后复制

添加我们的模型

1. 将 my_model.py 拷贝至 PaddleSeg/paddleseg/model

2. 修改 init.py

项目使用

本项目基于 PaddleSeg 开发，所以使用过程简单，下面进行逐步介绍

0. 解压数据集并进行预处理

1. 下载 PaddleSeg

2. 将相应配置文件修改（本项目采用 cp 命令直接对原文件进行覆盖）

3. 运行训练即可

4. 运行验证即可

0. 解压数据集并进行预处理

In [ ]

# 解压数据!unzip -oq /home/aistudio/data/data79232/camvid.zip

登录后复制 In [ ]

# 创建listimport osdir = "camvid"%cd camvid!touch train_list.txt!touch val_list.txt%cdnames = os.listdir(os.path.join(dir, "train"))l = len(names)print("总共%d个数据" %l)n = 0for name in names:    with open("camvid/train_list.txt","r+") as f:        f.read()        # t = "camvid/train/" + name + " camvid/trainannot/" + name + "\n"        t = "train/" + name + " trainannot/" + name + "\n"        f.write(t)    n+=1    # with open("train_list.txt","r+") as f:    #     f.read()    #     t = "camvid/train/" + name + " camvid/trainannot/" + name + "\n"    #     f.write(t)    # n+=1    print("已写入%d路径" %n)dir = "camvid"names = os.listdir(os.path.join(dir, "val"))l = len(names)print("总共%d个数据" %l)n = 0for name in names:    with open("camvid/val_list.txt","r+") as f:        f.read()        # t = "camvid/val/" + name + " camvid/valannot/" + name + "\n"        t = "val/" + name + " valannot/" + name + "\n"        f.write(t)    n+=1    # with open("val_list.txt","r+") as f:    #     f.read()    #     t = "camvid/val/" + name + " camvid/valannot/" + name + "\n"    #     f.write(t)    # n+=1    print("已写入%d路径" %n)dir = "camvid"# %cd camvid!touch train_list.txt# !touch val_list.txt# %cdnames = os.listdir(os.path.join(dir, "train"))l = len(names)print("总共%d个数据" %l)n = 0for name in names:    with open("train_list.txt","r+") as f:        f.read()        t = "camvid/train/" + name + " camvid/trainannot/" + name + "\n"        f.write(t)    n+=1    print("已写入%d路径" %n)

登录后复制 In [ ]

# 平衡数据集 lossimport cv2 as cvimport numpy as nppaths = open("train_list.txt", "r")CLASS_NUM = 11SUM = [[] for i in range(CLASS_NUM)]SUM_ = 0for line in paths:    line.rstrip("\n")    line.lstrip("\n")    path = line.split()    img = cv.imread(path[1], 0)    img_np = np.array(img)    for i in range(CLASS_NUM):        SUM[i].append(np.sum((img_np == i)))for index, iter in enumerate(SUM):    print("类别{}的数量：".format(index), sum(iter))for iter in SUM:    SUM_ += sum(iter)median = 1/CLASS_NUMfor index, iter in enumerate(SUM):    print("weight_{}:".format(index), median/(sum(iter)/SUM_))

登录后复制

1. 下载 PaddleSeg

In [ ]

# 安装 paddleseg!git clone https://gitee.com/paddlepaddle/PaddleSeg.git!pip install PaddleSeg

登录后复制

2. 将相应配置文件修改（本项目采用 cp 命令直接对原文件进行覆盖）

In [ ]

# 覆写损失函数!cp cross_entropy_loss.py PaddleSeg/paddleseg/models/losses/# 添加自定义 model!cp my_model.py PaddleSeg/paddleseg/models!cp __init__.py PaddleSeg/paddleseg/models

登录后复制

3. 运行训练即可

In [ ]

# 训练一 3000 iter ，模型保存至output!python PaddleSeg/train.py \       --config config.yml \       --do_eval \       --use_vdl \       --save_interval 100 \       --log_iters 1 \       --save_dir output \       --iters 3000 \       --keep_checkpoint_max 10 \       --batch_size 12

登录后复制 In [ ]

# 训练二 36000 iter （只训练到了12111），从output读取模型，保存至output1!python PaddleSeg/train.py \       --config config.yml \       --do_eval \       --use_vdl \       --save_interval 1 \       --log_iters 1 \       --save_dir output_1 \       --iters 36000 \       --keep_checkpoint_max 10 \       --resume_model output/iter_3000 \       --batch_size 12

登录后复制

4. 运行验证即可

In [ ]

# 计算miou，选取output1最好模型!python PaddleSeg/val.py \       --config config.yml \       --model_path output_1/best_model/model.pdparams    #    --model_path model.pdparams # 复制到/home/aistudio文件夹了

登录后复制

全站导航

首页

单机

手游

电竞

资讯

专题

排行

基于 PaddlePaddle 框架的 SegNet 论文复现

论文简介

1. 网络结构

2. Loss

3. Experiments

3.1 Ai Studio 环境配置

3.2 训练、验证参数设置

3.3 开始训练

3.4 训练过程展示

3.5 调参技巧说明

3.6 其他说明

项目介绍

项目背景

项目结果

结果说明

项目实现思路

大方向

1.模型搭建

2.损失函数设计

3.调参

模型搭建

损失函数设计

调参

我们使用的 config.yml 文件内容如下

我们需要修改 PaddleSeg 中的损失函数 cross_entropy_loss.py ，修改后如下

添加我们的模型

1. 将 my_model.py 拷贝至 PaddleSeg/paddleseg/model

2. 修改 __init__.py

项目使用

本项目基于 PaddleSeg 开发，所以使用过程简单，下面进行逐步介绍

0. 解压数据集并进行预处理

1. 下载 PaddleSeg

2. 将相应配置文件修改（本项目采用 cp 命令直接对原文件进行覆盖）

3. 运行训练即可

4. 运行验证即可

0. 解压数据集并进行预处理

1. 下载 PaddleSeg

2. 将相应配置文件修改（本项目采用 cp 命令直接对原文件进行覆盖）

3. 运行训练即可

4. 运行验证即可

相关阅读

MORE

热门合集

MORE

MORE

文章资讯

MORE

热门资讯

MORE

变态游戏推荐

MORE

最新专题

MORE

热门游戏推荐

MORE

手机游戏

端游游戏

文章资讯

电竞游戏

补丁工具

2. 修改 init.py