Building a New Way to Play Super Mario with Pose and Voice

PaddlePaddle Hackathon Introduction
The 2024 PaddlePaddle Hackathon is a deep learning programming event for developers worldwide, hosted by PaddlePaddle together with the National Engineering Laboratory for Deep Learning Technology and Applications, and co-produced with open-source projects such as OpenVINO, MLFlow, KubeFlow, and TVM. It aims to encourage developers to learn about and contribute to open-source deep learning projects.

Project Introduction
This project uses a pose estimation model and a speech keyword classification model to build a simple, practical new style of human-computer interaction.
The demo is built on a PyGame Super Mario clone (interested readers can try other games as well). A pose estimation model extracts geometric and motion features from the player's body and translates poses into game commands. The whole process involves a fair amount of physical activity, so it doubles as entertainment and exercise; when you get tired, you can switch to voice mode for a more natural interaction.
Building on this project you can take the idea much further: foreign-language practice, a fitness app, a touch of the metaverse with PaddleGAN, playing against online friends, and so on.
GitHub repository: https://github.com/thunder95/Play_Mario_With_PaddlePaddle
Note: the code was written from scratch during the two-day competition window and still has rough edges. The project is being improved continuously; feedback and discussion are very welcome.
Demo video on Bilibili: https://www.bilibili.com/video/BV1B64y1i7GM
Feature Modules

Super Mario Game
A game loaded with childhood memories. It has already been faithfully recreated with PyGame on GitHub; the author implemented the game up to level 4.
GitHub repository: https://github.com/justinmeister/Mario-Level-1
This project only changes the interaction layer slightly: the original listens for keyboard events through PyGame, while here commands produced by the other modules are pushed into a queue that replaces the key-press signals.
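The swap-in can be sketched roughly like this (a minimal sketch with hypothetical names; the repo's actual queue wiring and key constants differ):

```python
import queue

# Shared queue: the pose/speech threads put command strings in,
# and the game loop polls it instead of reading PyGame key events.
cmd_queue = queue.Queue()

# Mapping from recognized commands to the original key actions
# (these names are illustrative, not the repo's actual identifiers).
KEY_MAP = {"left": "K_LEFT", "right": "K_RIGHT", "jump": "K_a", "run": "K_s"}

def poll_command():
    """Non-blocking read, called once per game frame."""
    try:
        cmd = cmd_queue.get_nowait()
    except queue.Empty:
        return None
    return KEY_MAP.get(cmd)

# Producer side (pose or speech module):
cmd_queue.put("jump")
print(poll_command())  # -> K_a
```

The non-blocking `get_nowait` matters here: the game loop must never stall waiting for a recognizer that has nothing to say.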

Human Keypoint Estimation
Human-computer interaction demands highly real-time inference. After surveying several models, we settled on PicoDet-S-Pedestrian and PP-TinyPose from PaddleDetection; inference takes about 20 ms per frame, satisfying both speed and quality requirements.
PP-TinyPose is PaddleDetection's real-time pose detection model optimized for mobile devices, able to run multi-person pose estimation smoothly on a phone. Built on PaddleDetection's own lightweight PicoDet detector, it also ships with a dedicated lightweight pedestrian detection model.
PP-TinyPose: https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/keypoint/tiny_pose

Since an additional action-recognition model would add latency to every command, this project simply classifies the detected keypoints by their coordinates, which turns out to be good enough in practice.
```shell
!git clone https://github.com/PaddlePaddle/PaddleDetection
%cd PaddleDetection
!python3 deploy/python/det_keypoint_unite_infer.py \
    --det_model_dir=output_inference/picodet_s_192_pedestrian \
    --keypoint_model_dir=output_inference/tinypose_128x96 \
    --image_file=demo/000000014439.webp \
    --device=GPU
```
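As an illustration of the coordinate-based classification idea, here is a minimal rule-based sketch over the 17 COCO keypoints that PP-TinyPose outputs (the thresholds and the specific rules are assumptions, not the project's actual logic):

```python
import numpy as np

# COCO 17-keypoint indices (subset); y grows downward in image coordinates
L_SHOULDER, R_SHOULDER, L_WRIST, R_WRIST = 5, 6, 9, 10

def classify_pose(kpts):
    """kpts: (17, 2) array of (x, y) keypoint coordinates.
    Maps a pose to a game command using simple geometric rules."""
    shoulder_y = (kpts[L_SHOULDER, 1] + kpts[R_SHOULDER, 1]) / 2
    # Both wrists raised above shoulder height -> jump
    if kpts[L_WRIST, 1] < shoulder_y and kpts[R_WRIST, 1] < shoulder_y:
        return "jump"
    # One arm stretched far out horizontally -> move in that direction
    if kpts[R_WRIST, 0] > kpts[R_SHOULDER, 0] + 50:
        return "right"
    if kpts[L_WRIST, 0] < kpts[L_SHOULDER, 0] - 50:
        return "left"
    return "idle"

# Synthetic example: shoulders at y=100, both wrists raised to y=60
kpts = np.zeros((17, 2))
kpts[[L_SHOULDER, R_SHOULDER], 1] = 100
kpts[[L_WRIST, R_WRIST], 1] = 60
print(classify_pose(kpts))  # -> jump
```

Rules like these run in microseconds, which is why they add essentially no latency on top of the keypoint model itself.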
Speech Classification Training
Speech sample collection
AIStudio does not currently support recording online, so download the code and run it locally:
!python speech_cmd_cls/generate_data.py
Using the PyAudio library, the collection script above records audio automatically. Only the 7 keywords spoken by the player need to be captured; the audio is cut into 500 ms segments and saved into the directory of the matching keyword. Around 2-3 minutes of recording per keyword is enough, though you can collect more samples if time allows.
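The 500 ms splitting step can be sketched with plain NumPy (the actual script records through PyAudio first; the 16 kHz sample rate here is an assumption):

```python
import numpy as np

SR = 16000            # assumed sample rate
CHUNK = SR // 2       # 500 ms worth of samples

def split_clips(signal, chunk=CHUNK):
    """Cut a mono recording into fixed 500 ms clips, dropping the tail."""
    n = len(signal) // chunk
    return [signal[i * chunk:(i + 1) * chunk] for i in range(n)]

rec = np.random.randn(3 * SR + 1234).astype("float32")  # ~3 s recording
clips = split_clips(rec)
print(len(clips), len(clips[0]))  # -> 6 8000
```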
Speech data cleaning
Clips that are silent, contain electrical hum, or sound unclear need to be moved into an 8th directory (named "other").
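The silent-clip part of this cleaning pass can be automated with a simple energy check (a sketch; the threshold and directory layout are assumptions, and clips that are merely unclear still need a human listener):

```python
import os
import shutil
import wave
import numpy as np

def rms_db(path):
    """Rough loudness of a 16-bit mono WAV file in dBFS."""
    with wave.open(path, "rb") as w:
        samples = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
    rms = np.sqrt(np.mean(samples.astype(np.float64) ** 2)) + 1e-9
    return 20 * np.log10(rms / 32768.0)

def move_silent(src_dir, other_dir, threshold_db=-45.0):
    """Move near-silent clips into the 'other' class directory."""
    os.makedirs(other_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        path = os.path.join(src_dir, name)
        if name.endswith(".wav") and rms_db(path) < threshold_db:
            shutil.move(path, os.path.join(other_dir, name))
```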
Speech data preprocessing
Using the librosa library, load each audio file, extract mel-spectrogram features, and filter out low-decibel frames.
!python speech_cmd_cls/preprocess.py
P.S.: speech_cmd_cls/data contains recordings of the author's own voice, so everyone can test easily.
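The low-decibel filtering step can be sketched like this (the real preprocess.py works on librosa mel-spectrograms; the -60 dB floor relative to the loudest frame is an assumption):

```python
import numpy as np

def drop_quiet_frames(frames, floor_db=-60.0):
    """frames: (n_frames, n_bins) magnitude spectrogram.
    Keeps only frames whose total energy exceeds the dB floor
    relative to the loudest frame."""
    energy = (frames ** 2).sum(axis=1)
    db = 10 * np.log10(energy / (energy.max() + 1e-12) + 1e-12)
    return frames[db > floor_db]

# 4 loud frames followed by 3 near-silent ones, 80 mel bins each
spec = np.vstack([np.ones((4, 80)), 1e-6 * np.ones((3, 80))])
print(drop_quiet_frames(spec).shape)  # -> (4, 80)
```

Dropping quiet frames keeps the fixed-length model input focused on the spoken keyword rather than on leading and trailing silence.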
```shell
# Data preprocessing
!unzip speech_cmd_cls.zip
%cd speech_cmd_cls/
!python preprocess.py
```
Output (the eight labels are: left, right, down, stop, run, jump, hit, other):

```
/home/aistudio/speech_cmd_cls
标签名: ['左', '右', '下', '停', '跑', '跳', '打', '其它']
preprocess data finished
```
```python
# A simple custom LSTM network with attention
import paddle
from paddle import nn

class SpeechCommandModel(nn.Layer):
    def __init__(self, num_classes=10):
        super(SpeechCommandModel, self).__init__()
        self.conv1 = nn.Conv2D(126, 10, (5, 1), padding="SAME")
        self.relu1 = nn.ReLU()
        self.bn1 = nn.BatchNorm2D(10)
        self.conv2 = nn.Conv2D(10, 1, (5, 1), padding="SAME")
        self.relu2 = nn.ReLU()
        self.bn2 = nn.BatchNorm2D(1)
        self.lstm1 = nn.LSTM(input_size=80, hidden_size=64, direction="bidirect")
        self.lstm2 = nn.LSTM(input_size=128, hidden_size=64, direction="bidirect")
        self.query = nn.Linear(128, 128)
        self.softmax = nn.Softmax(axis=-1)
        self.fc1 = nn.Linear(128, 64)
        self.fc1_relu = nn.ReLU()
        self.fc2 = nn.Linear(64, 32)
        self.classifier = nn.Linear(32, num_classes)
        self.cls_softmax = nn.Softmax(axis=-1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.bn1(x)
        x = self.conv2(x)
        x = self.relu2(x)
        x = self.bn2(x)
        x = x.squeeze(axis=-1)
        x, _ = self.lstm1(x)
        x, _ = self.lstm2(x)
        x = x.squeeze(axis=1)
        q = self.query(x)
        attScores = paddle.matmul(q, x, transpose_y=True)
        attScores = self.softmax(attScores)
        attVector = paddle.matmul(attScores, x)
        output = self.fc1(attVector)
        output = self.fc1_relu(output)
        output = self.fc2(output)
        output = self.classifier(output)
        output = self.cls_softmax(output)
        return output

model = SpeechCommandModel(num_classes=8)
print(model)
```
```
SpeechCommandModel(
  (conv1): Conv2D(126, 10, kernel_size=[5, 1], padding=SAME, data_format=NCHW)
  (relu1): ReLU()
  (bn1): BatchNorm2D(num_features=10, momentum=0.9, epsilon=1e-05)
  (conv2): Conv2D(10, 1, kernel_size=[5, 1], padding=SAME, data_format=NCHW)
  (relu2): ReLU()
  (bn2): BatchNorm2D(num_features=1, momentum=0.9, epsilon=1e-05)
  (lstm1): LSTM(80, 64
    (0): BiRNN(
      (cell_fw): LSTMCell(80, 64)
      (cell_bw): LSTMCell(80, 64)
    )
  )
  (lstm2): LSTM(128, 64
    (0): BiRNN(
      (cell_fw): LSTMCell(128, 64)
      (cell_bw): LSTMCell(128, 64)
    )
  )
  (query): Linear(in_features=128, out_features=128, dtype=float32)
  (softmax): Softmax(axis=-1)
  (fc1): Linear(in_features=128, out_features=64, dtype=float32)
  (fc1_relu): ReLU()
  (fc2): Linear(in_features=64, out_features=32, dtype=float32)
  (classifier): Linear(in_features=32, out_features=8, dtype=float32)
  (cls_softmax): Softmax(axis=-1)
)
```
Model training
The network is trained with PaddlePaddle's high-level API; training accuracy lands around 95%.
Even without a GPU, training this small network under the Paddle framework is very fast.
!python speech_cmd_cls/train.py
```shell
!python train.py
```
```
The loss value printed in the log is the current step, and the metric is the average value of previous steps.
Epoch 1/20
step 193/193 [==============================] - loss: 1.2740 - acc: 0.9538 - 17ms/step
Eval begin...
step 22/22 [==============================] - loss: 1.6995 - acc: 0.9657 - 6ms/step
Eval samples: 175
Epoch 2/20
step 193/193 [==============================] - loss: 1.2740 - acc: 0.9551 - 16ms/step
Eval begin...
step 22/22 [==============================] - loss: 1.5585 - acc: 0.9714 - 6ms/step
Eval samples: 175
...
Epoch 20/20
step 193/193 [==============================] - loss: 1.2740 - acc: 0.9545 - 15ms/step
Eval begin...
step 22/22 [==============================] - loss: 1.6629 - acc: 0.9486 - 6ms/step
Eval samples: 175
```
Model evaluation and prediction
Once training finishes you can run a first-pass evaluation of the model, and also verify it in real time offline with a microphone:
!python speech_cmd_cls/eval.py
!python speech_cmd_cls/realtime_infer.py
Important caveat: even when the model scores well on the validation set, a network and dataset this small generalize relatively weakly. Switching to a different device, a different speaker, or a different background-noise environment may noticeably degrade the results.
```shell
!python eval.py
```
```
Eval begin...
step 3/3 - loss: 1.3763 - acc: 0.9543 - 27ms/step
Eval samples: 175
{'loss': [1.3763338], 'acc': 0.9542857142857143}
```