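involution: let's all start involuting
This article builds the involution operator as a Layer and uses it to modify ResNet into the RedNet model. The model has been added to the 【Paddle-Image-Models】 project together with the latest converted pretrained weights, with accuracy largely matching the original. The article also shows the code for the operator and the model, test runs, and accuracy validation results. According to the paper, RedNet outperforms ResNet and comparable models in both accuracy and efficiency.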

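Introduction
Real "involution" (内卷, the rat race) is everywhere, and now neural networks can involute too. This project builds the involution operator as a Layer and, following the paper, uses it to modify ResNet into a new model, RedNet. The model has also been added to the 【Paddle-Image-Models】 project, including the latest converted pretrained weights, with accuracy largely aligned with the original, so that everyone can start involuting inside their neural networks as soon as possible.
Related resources
Paper: 【Involution: Inverting the Inherence of Convolution for Visual Recognition】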
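Code: 【d-li14/involution】
Paper summary
The paper proposes a new neural network operator (operator, or op) called involution. It is lighter and more efficient than convolution and simpler in form than self-attention, and it can be used in models for a variety of vision tasks to improve both accuracy and efficiency. The structural design of involution also gives a unified view of the classic convolution operation and the recently popular self-attention operation.
Operator and model construction
Import the required packages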
In [1]:
import paddle
import paddle.nn as nn
from paddle.vision.models import resnet
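involution (内卷)
For the feature vector at a single coordinate of the input feature map, involution works in two steps. First, the vector is passed through (FC-BN-ReLU-FC) and a reshape (channel-to-space) transform that expands it into the shape of a kernel, which gives the involution kernel for that coordinate. Second, this kernel is combined by Multiply-Add with the feature vectors in the neighborhood of that coordinate on the input feature map, producing the final output feature map.
[Figure: involution schematic]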
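To make the Multiply-Add step concrete, here is a minimal pure-Python sketch, not the author's implementation. It assumes stride 1, zero padding, and kernels that have already been generated; the function name `involution_reference` and the nested-list layout are illustrative choices only.

```python
def involution_reference(x, kernels, kernel_size, group_channels):
    """Apply per-position involution kernels to x (stride 1, zero padding).

    x:       nested lists shaped [C][H][W]
    kernels: nested lists shaped [G][K*K][H][W], i.e. one K*K kernel
             per group and per spatial position
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    pad = (kernel_size - 1) // 2
    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for c in range(channels):
        g = c // group_channels  # all channels of a group share one kernel
        for i in range(height):
            for j in range(width):
                acc = 0.0
                for u in range(kernel_size):
                    for v in range(kernel_size):
                        ii, jj = i + u - pad, j + v - pad
                        # Positions outside the map contribute zero.
                        if 0 <= ii < height and 0 <= jj < width:
                            tap = kernels[g][u * kernel_size + v][i][j]
                            acc += tap * x[c][ii][jj]
                out[c][i][j] = acc
    return out


# Tiny demo: 2 channels in a single group, 2x2 feature map, 3x3 kernels.
x = [[[1.0, 2.0], [3.0, 4.0]],
     [[5.0, 6.0], [7.0, 8.0]]]
K = 3
# Identity kernels: only the centre tap (index 4) is 1 at every position.
identity = [[[[1.0 if t == 4 else 0.0 for _ in range(2)] for _ in range(2)]
             for t in range(K * K)]]
# All-ones kernels: each output is the sum over the padded 3x3 neighbourhood.
ones = [[[[1.0] * 2 for _ in range(2)] for _ in range(K * K)]]

same = involution_reference(x, identity, K, group_channels=2)
summed = involution_reference(x, ones, K, group_channels=2)
print(same == x)
print(summed[0][0][0], summed[1][0][0])
```

Unlike convolution, the kernel here varies with the spatial position (i, j) and is shared across the channels of a group, which is the inversion the paper's title refers to.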
In [2]:
class involution(nn.Layer):
    def __init__(self, channels, kernel_size, stride):
        super(involution, self).__init__()
        self.kernel_size = kernel_size
        self.stride = stride
        self.channels = channels
        reduction_ratio = 4
        self.group_channels = 16
        self.groups = self.channels // self.group_channels
        # Kernel generation, first half: 1x1 conv (FC) - BN - ReLU
        self.conv1 = nn.Sequential(
            ('conv', nn.Conv2D(
                in_channels=channels,
                out_channels=channels // reduction_ratio,
                kernel_size=1,
                bias_attr=False
            )),
            ('bn', nn.BatchNorm2D(channels // reduction_ratio)),
            ('activate', nn.ReLU())
        )
        # Kernel generation, second half: 1x1 conv (FC) giving K*K weights per group
        self.conv2 = nn.Sequential(
            ('conv', nn.Conv2D(
                in_channels=channels // reduction_ratio,
                out_channels=kernel_size**2 * self.groups,
                kernel_size=1,
                stride=1))
        )
        if stride > 1:
            self.avgpool = nn.AvgPool2D(stride, stride)

    def forward(self, x):
        # Generate one kernel per output position (downsample first if stride > 1)
        weight = self.conv2(self.conv1(
            x if self.stride == 1 else self.avgpool(x)))
        b, c, h, w = weight.shape
        # channel-to-space reshape: (b, groups, 1, K*K, h, w)
        weight = weight.reshape((
            b, self.groups, self.kernel_size**2, h, w)).unsqueeze(2)
        # Gather the K*K neighborhood of every position
        out = nn.functional.unfold(
            x, self.kernel_size, strides=self.stride,
            paddings=(self.kernel_size-1)//2, dilations=1)
        out = out.reshape(
            (b, self.groups, self.group_channels, self.kernel_size**2, h, w))
        # Multiply-Add over the neighborhood
        out = (weight * out).sum(axis=3).reshape((b, self.channels, h, w))
        return out

Operator test
In [3]:
inv = involution(128, 7, 1)
paddle.summary(inv, (1, 128, 64, 64))
out = inv(paddle.randn((1, 128, 64, 64)))
print(out.shape)
---------------------------------------------------------------------------
 Layer (type)       Input Shape           Output Shape         Param #
===========================================================================
 Conv2D-1           [[1, 128, 64, 64]]    [1, 32, 64, 64]      4,096
 BatchNorm2D-1      [[1, 32, 64, 64]]     [1, 32, 64, 64]      128
 ReLU-1             [[1, 32, 64, 64]]     [1, 32, 64, 64]      0
 Conv2D-2           [[1, 32, 64, 64]]     [1, 392, 64, 64]     12,936
===========================================================================
Total params: 17,160
Trainable params: 17,032
Non-trainable params: 128
---------------------------------------------------------------------------
Input size (MB): 2.00
Forward/backward pass size (MB): 15.25
Params size (MB): 0.07
Estimated Total Size (MB): 17.32
---------------------------------------------------------------------------
[1, 128, 64, 64]
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/nn/layer/norm.py:648: UserWarning: When training, we now always track global mean and variance.
  "When training, we now always track global mean and variance.")
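The parameter counts in the summary above can be reproduced by hand. The following is a small sketch of that arithmetic; the helper names are hypothetical, and the BatchNorm count of four values per channel (weight, bias, running mean, running variance) is an assumption inferred from the 128 reported for 32 channels.

```python
def involution_param_count(channels, kernel_size,
                           reduction_ratio=4, group_channels=16):
    """Count parameters of the involution layer defined above."""
    groups = channels // group_channels
    reduced = channels // reduction_ratio
    kernel_out = kernel_size ** 2 * groups
    conv1 = channels * reduced               # 1x1 conv, no bias
    bn = 4 * reduced                         # weight, bias, mean, variance
    conv2 = reduced * kernel_out + kernel_out  # 1x1 conv with bias
    return {"conv1": conv1, "bn": bn, "conv2": conv2,
            "total": conv1 + bn + conv2}


def conv_param_count(channels, kernel_size):
    """Weights of a plain channels -> channels convolution without bias."""
    return channels * channels * kernel_size ** 2


counts = involution_param_count(128, 7)
print(counts)
# For comparison: a standard 3x3 convolution with the same channel count.
print(conv_param_count(128, 3))
```

For 128 channels the 7x7 involution needs 17,160 parameters in total, while a standard 3x3 convolution with 128 input and output channels already needs 147,456 weights.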
RedNet
Replacing the 3x3 convolution in the ResNet BottleneckBlock with involution yields a new family of backbone networks, RedNet. According to the paper, RedNet outperforms ResNet and other SOTA models that use self-attention as the op, in both accuracy and efficiency.
[RedNet model details]
In [4]:
class BottleneckBlock(resnet.BottleneckBlock):
    def __init__(self, inplanes, planes, stride=1, downsample=None,
                 groups=1, base_width=64, dilation=1, norm_layer=None):
        super(BottleneckBlock, self).__init__(inplanes, planes, stride,
                                              downsample, groups, base_width, dilation, norm_layer)
        width = int(planes * (base_width / 64.)) * groups
        # Replace the 3x3 convolution with a 7x7 involution
        self.conv2 = involution(width, 7, stride)


class RedNet(resnet.ResNet):
    def __init__(self, block, depth, num_classes=1000, with_pool=True):
        # The parent is always constructed with depth=50; the stem and the
        # four stages are rebuilt below for the requested depth.
        super(RedNet, self).__init__(block=block, depth=50,
                                     num_classes=num_classes, with_pool=with_pool)
        layer_cfg = {
            26: [1, 2, 4, 1],
            38: [2, 3, 5, 2],
            50: [3, 4, 6, 3],
            101: [3, 4, 23, 3],
            152: [3, 8, 36, 3]
        }
        layers = layer_cfg[depth]

        # Drop the ResNet stem; it is replaced by self.stem
        self.conv1 = None
        self.bn1 = None
        self.relu = None
        self.inplanes = 64
        self.stem = nn.Sequential(
            nn.Sequential(
                ('conv', nn.Conv2D(
                    in_channels=3,
                    out_channels=self.inplanes // 2,
                    kernel_size=3,
                    stride=2,
                    padding=1,
                    bias_attr=False
                )),
                ('bn', nn.BatchNorm2D(self.inplanes // 2)),
                ('activate', nn.ReLU())
            ),
            involution(self.inplanes // 2, 3, 1),
            nn.BatchNorm2D(self.inplanes // 2),
            nn.ReLU(),
            nn.Sequential(
                ('conv', nn.Conv2D(
                    in_channels=self.inplanes // 2,
                    out_channels=self.inplanes,
                    kernel_size=3,
                    stride=1,
                    padding=1,
                    bias_attr=False
                )),
                ('bn', nn.BatchNorm2D(self.inplanes)),
                ('activate', nn.ReLU())
            )
        )

        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)

    def forward(self, x):
        x = self.stem(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        if self.with_pool:
            x = self.avgpool(x)

        if self.num_classes > 0:
            x = paddle.flatten(x, 1)
            x = self.fc(x)

        return x

Model test
In [5]:
model = RedNet(BottleneckBlock, 26)
paddle.summary(model, (1, 3, 224, 224))
out = model(paddle.randn((1, 3, 224, 224)))
print(out.shape)
-------------------------------------------------------------------------------
 Layer (type)          Input Shape            Output Shape          Param #
===============================================================================
 Conv2D-88             [[1, 3, 224, 224]]     [1, 32, 112, 112]     864
 BatchNorm2D-71        [[1, 32, 112, 112]]    [1, 32, 112, 112]     128
 ReLU-35               [[1, 32, 112, 112]]    [1, 32, 112, 112]     0
 Conv2D-89             [[1, 32, 112, 112]]    [1, 8, 112, 112]      256
 BatchNorm2D-72        [[1, 8, 112, 112]]     [1, 8, 112, 112]      32
 ReLU-36               [[1, 8, 112, 112]]     [1, 8, 112, 112]      0
 Conv2D-90             [[1, 8, 112, 112]]     [1, 18, 112, 112]     162
 involution-18         [[1, 32, 112, 112]]    [1, 32, 112, 112]     0
 BatchNorm2D-73        [[1, 32, 112, 112]]    [1, 32, 112, 112]     128
 ReLU-37               [[1, 32, 112, 112]]    [1, 32, 112, 112]     0
 Conv2D-91             [[1, 32, 112, 112]]    [1, 64, 112, 112]     18,432
 BatchNorm2D-74        [[1, 64, 112, 112]]    [1, 64, 112, 112]     256
 ReLU-38               [[1, 64, 112, 112]]    [1, 64, 112, 112]     0
 MaxPool2D-1           [[1, 64, 112, 112]]    [1, 64, 56, 56]       0
 Conv2D-93             [[1, 64, 56, 56]]      [1, 64, 56, 56]       4,096
 BatchNorm2D-76        [[1, 64, 56, 56]]      [1, 64, 56, 56]       256
 ReLU-39               [[1, 256, 56, 56]]     [1, 256, 56, 56]      0
 Conv2D-96             [[1, 64, 56, 56]]      [1, 16, 56, 56]       1,024
 BatchNorm2D-79        [[1, 16, 56, 56]]      [1, 16, 56, 56]       64
 ReLU-40               [[1, 16, 56, 56]]      [1, 16, 56, 56]       0
 Conv2D-97             [[1, 16, 56, 56]]      [1, 196, 56, 56]      3,332
 involution-19         [[1, 64, 56, 56]]      [1, 64, 56, 56]       0
 BatchNorm2D-77        [[1, 64, 56, 56]]      [1, 64, 56, 56]       256
 Conv2D-95             [[1, 64, 56, 56]]      [1, 256, 56, 56]      16,384
 BatchNorm2D-78        [[1, 256, 56, 56]]     [1, 256, 56, 56]      1,024
 Conv2D-92             [[1, 64, 56, 56]]      [1, 256, 56, 56]      16,384
 BatchNorm2D-75        [[1, 256, 56, 56]]     [1, 256, 56, 56]      1,024
 BottleneckBlock-17    [[1, 64, 56, 56]]      [1, 256, 56, 56]      0
 Conv2D-99             [[1, 256, 56, 56]]     [1, 128, 56, 56]      32,768
 BatchNorm2D-81        [[1, 128, 56, 56]]     [1, 128, 56, 56]      512
 ReLU-41               [[1, 512, 28, 28]]     [1, 512, 28, 28]      0
 AvgPool2D-4           [[1, 128, 56, 56]]     [1, 128, 28, 28]      0
 Conv2D-102            [[1, 128, 28, 28]]     [1, 32, 28, 28]       4,096
 BatchNorm2D-84        [[1, 32, 28, 28]]      [1, 32, 28, 28]       128
 ReLU-42               [[1, 32, 28, 28]]      [1, 32, 28, 28]       0
 Conv2D-103            [[1, 32, 28, 28]]      [1, 392, 28, 28]      12,936
 involution-20         [[1, 128, 56, 56]]     [1, 128, 28, 28]      0
 BatchNorm2D-82        [[1, 128, 28, 28]]     [1, 128, 28, 28]      512
 Conv2D-101            [[1, 128, 28, 28]]     [1, 512, 28, 28]      65,536
 BatchNorm2D-83        [[1, 512, 28, 28]]     [1, 512, 28, 28]      2,048
 Conv2D-98             [[1, 256, 56, 56]]     [1, 512, 28, 28]      131,072
 BatchNorm2D-80        [[1, 512, 28, 28]]     [1, 512, 28, 28]      2,048
 BottleneckBlock-18    [[1, 256, 56, 56]]     [1, 512, 28, 28]      0
 Conv2D-104            [[1, 512, 28, 28]]     [1, 128, 28, 28]      65,536
 BatchNorm2D-85        [[1, 128, 28, 28]]     [1, 128, 28, 28]      512
 ReLU-43               [[1, 512, 28, 28]]     [1, 512, 28, 28]      0
 Conv2D-107            [[1, 128, 28, 28]]     [1, 32, 28, 28]       4,096
 BatchNorm2D-88        [[1, 32, 28, 28]]      [1, 32, 28, 28]       128
 ReLU-44               [[1, 32, 28, 28]]      [1, 32, 28, 28]       0
 Conv2D-108            [[1, 32, 28, 28]]      [1, 392, 28, 28]      12,936
 involution-21         [[1, 128, 28, 28]]     [1, 128, 28, 28]      0
 BatchNorm2D-86        [[1, 128, 28, 28]]     [1, 128, 28, 28]      512
 Conv2D-106            [[1, 128, 28, 28]]     [1, 512, 28, 28]      65,536
 BatchNorm2D-87        [[1, 512, 28, 28]]     [1, 512, 28, 28]      2,048
 BottleneckBlock-19    [[1, 512, 28, 28]]     [1, 512, 28, 28]      0
 Conv2D-110            [[1, 512, 28, 28]]     [1, 256, 28, 28]      131,072
 BatchNorm2D-90        [[1, 256, 28, 28]]     [1, 256, 28, 28]      1,024
 ReLU-45               [[1, 1024, 14, 14]]    [1, 1024, 14, 14]     0
 AvgPool2D-5           [[1, 256, 28, 28]]     [1, 256, 14, 14]      0
 Conv2D-113            [[1, 256, 14, 14]]     [1, 64, 14, 14]       16,384
 BatchNorm2D-93        [[1, 64, 14, 14]]      [1, 64, 14, 14]       256
 ReLU-46               [[1, 64, 14, 14]]      [1, 64, 14, 14]       0
 Conv2D-114            [[1, 64, 14, 14]]      [1, 784, 14, 14]      50,960
 involution-22         [[1, 256, 28, 28]]     [1, 256, 14, 14]      0
 BatchNorm2D-91        [[1, 256, 14, 14]]     [1, 256, 14, 14]      1,024
 Conv2D-112            [[1, 256, 14, 14]]     [1, 1024, 14, 14]     262,144
 BatchNorm2D-92        [[1, 1024, 14, 14]]    [1, 1024, 14, 14]     4,096
 Conv2D-109            [[1, 512, 28, 28]]     [1, 1024, 14, 14]     524,288
 BatchNorm2D-89        [[1, 1024, 14, 14]]    [1, 1024, 14, 14]     4,096
 BottleneckBlock-20    [[1, 512, 28, 28]]     [1, 1024, 14, 14]     0
 Conv2D-115            [[1, 1024, 14, 14]]    [1, 256, 14, 14]      262,144
 BatchNorm2D-94        [[1, 256, 14, 14]]     [1, 256, 14, 14]      1,024
 ReLU-47               [[1, 1024, 14, 14]]    [1, 1024, 14, 14]     0
 Conv2D-118            [[1, 256, 14, 14]]     [1, 64, 14, 14]       16,384
 BatchNorm2D-97        [[1, 64, 14, 14]]      [1, 64, 14, 14]       256
 ReLU-48               [[1, 64, 14, 14]]      [1, 64, 14, 14]       0
 Conv2D-119            [[1, 64, 14, 14]]      [1, 784, 14, 14]      50,960
 involution-23         [[1, 256, 14, 14]]     [1, 256, 14, 14]      0
 BatchNorm2D-95        [[1, 256, 14, 14]]     [1, 256, 14, 14]      1,024
 Conv2D-117            [[1, 256, 14, 14]]     [1, 1024, 14, 14]     262,144
 BatchNorm2D-96        [[1, 1024, 14, 14]]    [1, 1024, 14, 14]     4,096
 BottleneckBlock-21    [[1, 1024, 14, 14]]    [1, 1024, 14, 14]     0
 Conv2D-120            [[1, 1024, 14, 14]]    [1, 256, 14, 14]      262,144
 BatchNorm2D-98        [[1, 256, 14, 14]]     [1, 256, 14, 14]      1,024
 ReLU-49               [[1, 1024, 14, 14]]    [1, 1024, 14, 14]     0
 Conv2D-123            [[1, 256, 14, 14]]     [1, 64, 14, 14]       16,384
 BatchNorm2D-101       [[1, 64, 14, 14]]      [1, 64, 14, 14]       256
 ReLU-50               [[1, 64, 14, 14]]      [1, 64, 14, 14]       0
 Conv2D-124            [[1, 64, 14, 14]]      [1, 784, 14, 14]      50,960
 involution-24         [[1, 256, 14, 14]]     [1, 256, 14, 14]      0
 BatchNorm2D-99        [[1, 256, 14, 14]]     [1, 256, 14, 14]      1,024
 Conv2D-122            [[1, 256, 14, 14]]     [1, 1024, 14, 14]     262,144
 BatchNorm2D-100       [[1, 1024, 14, 14]]    [1, 1024, 14, 14]     4,096
 BottleneckBlock-22    [[1, 1024, 14, 14]]    [1, 1024, 14, 14]     0
 Conv2D-125            [[1, 1024, 14, 14]]    [1, 256, 14, 14]      262,144
 BatchNorm2D-102       [[1, 256, 14, 14]]     [1, 256, 14, 14]      1,024
 ReLU-51               [[1, 1024, 14, 14]]    [1, 1024, 14, 14]     0
 Conv2D-128            [[1, 256, 14, 14]]     [1, 64, 14, 14]       16,384
 BatchNorm2D-105       [[1, 64, 14, 14]]      [1, 64, 14, 14]       256
 ReLU-52               [[1, 64, 14, 14]]      [1, 64, 14, 14]       0
 Conv2D-129            [[1, 64, 14, 14]]      [1, 784, 14, 14]      50,960
 involution-25         [[1, 256, 14, 14]]     [1, 256, 14, 14]      0
 BatchNorm2D-103       [[1, 256, 14, 14]]     [1, 256, 14, 14]      1,024
 Conv2D-127            [[1, 256, 14, 14]]     [1, 1024, 14, 14]     262,144
 BatchNorm2D-104       [[1, 1024, 14, 14]]    [1, 1024, 14, 14]     4,096
 BottleneckBlock-23    [[1, 1024, 14, 14]]    [1, 1024, 14, 14]     0
 Conv2D-131            [[1, 1024, 14, 14]]    [1, 512, 14, 14]      524,288
 BatchNorm2D-107       [[1, 512, 14, 14]]     [1, 512, 14, 14]      2,048
 ReLU-53               [[1, 2048, 7, 7]]      [1, 2048, 7, 7]       0
 AvgPool2D-6           [[1, 512, 14, 14]]     [1, 512, 7, 7]        0
 Conv2D-134            [[1, 512, 7, 7]]       [1, 128, 7, 7]        65,536
 BatchNorm2D-110       [[1, 128, 7, 7]]       [1, 128, 7, 7]        512
 ReLU-54               [[1, 128, 7, 7]]       [1, 128, 7, 7]        0
 Conv2D-135            [[1, 128, 7, 7]]       [1, 1568, 7, 7]       202,272
 involution-26         [[1, 512, 14, 14]]     [1, 512, 7, 7]        0
 BatchNorm2D-108       [[1, 512, 7, 7]]       [1, 512, 7, 7]        2,048
 Conv2D-133            [[1, 512, 7, 7]]       [1, 2048, 7, 7]       1,048,576
 BatchNorm2D-109       [[1, 2048, 7, 7]]      [1, 2048, 7, 7]       8,192
 Conv2D-130            [[1, 1024, 14, 14]]    [1, 2048, 7, 7]       2,097,152
 BatchNorm2D-106       [[1, 2048, 7, 7]]      [1, 2048, 7, 7]       8,192
 BottleneckBlock-24    [[1, 1024, 14, 14]]    [1, 2048, 7, 7]       0
 AdaptiveAvgPool2D-1   [[1, 2048, 7, 7]]      [1, 2048, 1, 1]       0
 Linear-1              [[1, 2048]]            [1, 1000]             2,049,000
===============================================================================
Total params: 9,264,318
Trainable params: 9,202,014
Non-trainable params: 62,304
-------------------------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 188.62
Params size (MB): 35.34
Estimated Total Size (MB): 224.53
-------------------------------------------------------------------------------
[1, 1000]
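As a sanity check on the configuration and the summary above, the sketch below relates each `layer_cfg` entry to its model name and traces the spatial size through the network. The helper names are hypothetical. The depth formula assumes the usual ResNet counting convention (three weighted layers per bottleneck block, plus one stem layer and the final fully connected layer), and the size trace assumes every stride-2 step exactly halves an even input size.

```python
LAYER_CFG = {
    26: [1, 2, 4, 1],
    38: [2, 3, 5, 2],
    50: [3, 4, 6, 3],
    101: [3, 4, 23, 3],
    152: [3, 8, 36, 3],
}


def bottleneck_depth(layers):
    """Nominal depth: 3 layers per bottleneck block + stem + classifier."""
    return 3 * sum(layers) + 2


def feature_map_sizes(input_size=224):
    """Trace the spatial size after the stem, max pool and each stage."""
    sizes = {}
    size = input_size // 2          # first stem convolution has stride 2
    sizes["stem"] = size
    size //= 2                      # max pool with stride 2
    sizes["maxpool"] = size
    for name, stride in [("layer1", 1), ("layer2", 2),
                         ("layer3", 2), ("layer4", 2)]:
        size //= stride
        sizes[name] = size
    return sizes


for depth, layers in LAYER_CFG.items():
    print(depth, bottleneck_depth(layers))
print(feature_map_sizes(224))
```

For a 224x224 input this gives 112, 56, 56, 28, 14 and 7, which agrees with the output shapes listed in the RedNet-26 summary.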
Model accuracy validation
Paddle-Image-Models (PPIM) is used to validate the model's accuracy.
Install PPIM
In [ ]:
!pip install ppim==1.0.1 -i https://pypi.python.org/pypi
Extract the dataset
Extract the ILSVRC2012 validation set.
In [ ]:
# Extract the dataset
!mkdir -p ~/data/ILSVRC2012
!tar -xf ~/data/data68594/ILSVRC2012_img_val.tar -C ~/data/ILSVRC2012
Model evaluation
Validate accuracy on the ILSVRC2012 validation set.
In [ ]:
import os
import cv2
import numpy as np
import paddle
import paddle.vision.transforms as T
from ppim import rednet26, rednet38, rednet50, rednet101, rednet152


# Build the dataset
# backend cv2
class ILSVRC2012(paddle.io.Dataset):
    def __init__(self, root, label_list, transform):
        self.transform = transform
        self.root = root
        self.label_list = label_list
        self.load_datas()

    def load_datas(self):
        self.imgs = []
        self.labels = []
        with open(self.label_list, 'r') as f:
            for line in f:
                # Each line is "<image path> <label>"; strip() also handles
                # a final line without a trailing newline
                img, label = line.strip().split(' ')
                self.imgs.append(os.path.join(self.root, img))
                self.labels.append(int(label))

    def __getitem__(self, idx):
        label = self.labels[idx]
        image = self.imgs[idx]
        image = cv2.imread(image)
        image = self.transform(image)
        return image.astype('float32'), np.array(label).astype('int64')

    def __len__(self):
        return len(self.imgs)


# Configure the model
model, val_transforms = rednet26(pretrained=True)
model = paddle.Model(model)
model.prepare(metrics=paddle.metric.Accuracy(topk=(1, 5)))

# Configure the dataset
val_dataset = ILSVRC2012('data/ILSVRC2012', transform=val_transforms, label_list='data/data68594/val_list.txt')

# Validate the model
model.evaluate(val_dataset, batch_size=16)

{'acc_top1': 0.75956, 'acc_top5': 0.9319}
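The two reported numbers are top-1 and top-5 accuracy. As a rough illustration of what these metrics measure, here is a pure-Python sketch; it is not how `paddle.metric.Accuracy` is implemented, and the function name and toy scores are made up for the example.

```python
def topk_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the top k.
        ranked = sorted(range(len(row)), key=lambda idx: row[idx], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)


# Toy example: 3 samples, 3 classes.
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
labels = [1, 1, 0]

top1 = topk_accuracy(scores, labels, k=1)
top2 = topk_accuracy(scores, labels, k=2)
print(top1, top2)
```

With `topk=(1, 5)` as in the evaluation cell, a prediction counts as correct for `acc_top5` whenever the true class is among the five highest-scoring classes, so `acc_top5` is never lower than `acc_top1`.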

