JoyCaption Local Deployment Guide, with Support for SFW and NSFW Image Captioning
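Joy Caption Quick Start
Image-to-prompt (reverse captioning) models have been getting a lot of attention lately, and Joy Caption is one of the standouts. It generates detailed, explicit image descriptions and, crucially, supports both SFW and NSFW content, so it can produce a usable caption for any type of image. The model combines Google's siglip-so400m-patch14-384 with a fine-tuned Meta-Llama-3.1-8B-bnb-4bit, and running it locally takes roughly 7 GB of VRAM. The key points of installation and usage are summarized below.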
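Joy Caption Installation Guide
In ComfyUI, search for Comfyui_CXH_joy_caption in the plugin manager, install it, and restart. The harder part is the model files: once the transformers version is confirmed, three sets of files have to be prepared manually:
- Confirm that the local transformers version is >= 4.44.2
- Download google/siglip-so400m-patch14-384 and place it at /ComfyUI/models/clip/siglip-so400m-patch14-384; note that the entire repository folder is needed
- Download unsloth/Meta-Llama-3.1-8B-bnb-4bit and place it in /ComfyUI/models/LLM
- Download fancyfeast/joy-caption-pre-alpha and place it in /ComfyUI/models/Joy_caption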
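The checklist above can be verified with a short script before launching ComfyUI. This is a minimal sketch and not part of the plugin: the helper names (parse_version, meets_minimum, missing_model_dirs) and the COMFYUI_ROOT value are assumptions for illustration, while the directory names and the minimum version come from the list above. The version comparison only looks at the leading numeric components, so pre-release suffixes are ignored.

```python
from importlib import metadata
from pathlib import Path

# Directories named in the checklist, relative to the ComfyUI root.
REQUIRED_DIRS = [
    "models/clip/siglip-so400m-patch14-384",
    "models/LLM",
    "models/Joy_caption",
]
MIN_TRANSFORMERS = "4.44.2"


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True when the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def missing_model_dirs(root):
    """List the required directories that do not exist under root."""
    return [d for d in REQUIRED_DIRS if not (Path(root) / d).is_dir()]


if __name__ == "__main__":
    COMFYUI_ROOT = "/ComfyUI"  # assumption: adjust to your install location
    try:
        installed = metadata.version("transformers")
        ok = meets_minimum(installed, MIN_TRANSFORMERS)
        print(f"transformers {installed}: {'OK' if ok else 'too old'}")
    except metadata.PackageNotFoundError:
        print("transformers is not installed in this environment")
    for d in missing_model_dirs(COMFYUI_ROOT):
        print(f"missing: {d}")
```

Run it with the same Python interpreter that ComfyUI uses; otherwise the reported transformers version may belong to a different environment.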




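If you would rather not drag files in one by one, use git instead: in CMD, navigate to the target directory, run git lfs install, then pull the model repository with git lfs clone (on recent Git LFS releases, git lfs clone is deprecated and plain git clone does the same job). Remember to substitute the actual repository URL.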
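The git route can also be scripted. The sketch below builds the commands and, by default, only prints them. Several things here are assumptions rather than facts from this guide: the https://huggingface.co base URL (the guide does not give repository URLs, and the joy-caption-pre-alpha repository in particular may live under a different URL path), the choice to clone each repository into a folder named after it inside the directories listed earlier, and the helper names clone_commands and run_all. It also uses plain git clone after git lfs install instead of git lfs clone.

```python
import subprocess
from pathlib import Path

# (repository ID, parent directory relative to the ComfyUI root), from the install list.
REPOS = [
    ("google/siglip-so400m-patch14-384", "models/clip"),
    ("unsloth/Meta-Llama-3.1-8B-bnb-4bit", "models/LLM"),
    ("fancyfeast/joy-caption-pre-alpha", "models/Joy_caption"),
]


def clone_commands(comfy_root, base_url="https://huggingface.co"):
    """Build one `git clone` argv per repository without running anything.

    base_url is an assumption; replace it with the actual repository host.
    """
    commands = []
    for repo_id, parent in REPOS:
        # Each repository lands in a folder named after it (assumed layout).
        target = Path(comfy_root) / parent / repo_id.split("/")[-1]
        commands.append(["git", "clone", f"{base_url}/{repo_id}", str(target)])
    return commands


def run_all(comfy_root, dry_run=True):
    """Print every command; execute them only when dry_run is False."""
    steps = [["git", "lfs", "install"]] + clone_commands(comfy_root)
    for argv in steps:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all("/ComfyUI")  # dry run: prints the commands only
```

Check the printed URLs and target paths against the real repositories before calling run_all with dry_run=False, since the downloads are several gigabytes.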

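Joy Caption Captioning Workflow
Add two nodes to a basic Flux workflow: the JoyCaption model loader and the caption node. This example also uses a Flux detail-and-texture enhancement LoRA, which gives the overall image a more solid, tangible look.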

Below are a few real captioning results; the prompts and the generated images speak for themselves.
01. Hanging Laundry
flim rendering, depicting a young Asian woman standing on a rooftop on a bright, sunny day. She has long, straight black hair tied into a high ponytail, and is wearing a simple, white, short-sleeved T-shirt and light blue denim shorts. The woman is facing away from the camera, gazing towards the horizon with a serene expression. She holds a wooden clothespin in her right hand, which is holding a white T-shirt on a clothesline strung horizontally across the rooftop. The clothesline is made of thin, yellow string, and the clothespin is positioned near the sleeve of the shirt. In the background, there is a clear blue sky with a few scattered clouds, and a view of a cityscape featuring multiple high-rise apartment buildings with balconies. The rooftop surface is concrete, with a few small plants adding some greenery. The overall scene conveys a sense of tranquility and simplicity, with the bright sunlight casting soft, natural shadows.


02. Together
octane rendering,UE5,Maya,blender, . This is a digital photograph featuring two fingers, one on top of the other, with the tips touching. Each finger is drawn with black marker to resemble a person. The top finger is a girl, depicted with closed eyes and a small smile, suggesting happiness. She has a pink heart above her head, and her arms are bent at the elbow, with her hands clasped together. The bottom finger is a boy, with closed eyes and a small smile, also suggesting happiness. He has a pink heart above his head and his arms are bent at the elbow, with his hands clasped together. The fingers are positioned upright and facing each other, with the girl’s finger on top. The background is a soft, pale yellow color, providing a neutral and soothing backdrop that enhances the warm, affectionate theme of the image. Text is written in black, playful, hand-like font, with the words “Together Forever” above the girl’s finger, and “I love you…” below the boy’s finger. The text is surrounded by small pink hearts, adding a whimsical touch. The overall mood is one of love and affection, conveyed through the simple yet charming depiction of the fingers and the accompanying text.

03. Selling Piglets
octane rendering,UE5,Maya,blender, Slung over his shoulder was a stick with two caged piglets on either side, of a photorealistic CGI (computer-generated imagery) artwork. This digital artwork depicts a chubby, adorable baby with dark hair and large, round eyes, dressed in a sleeveless, light pink dress with a subtle polka dot pattern. The baby is holding two woven baskets on either side of its body, balanced on its shoulders. Each basket contains a small piglet with pink skin and short snouts. The baby’s expression is one of contentment and innocence. The background features a rural setting with a paved path leading into the distance, flanked by lush green foliage and a wooden building on the left. The lighting is soft and natural, creating a serene atmosphere. The textures are meticulously detailed, from the smoothness of the baby’s skin to the coarse texture of the woven baskets and the soft fur of the piglets. This CGI artwork combines photorealism with a whimsical, almost surrealistic touch, enhancing the charm and cuteness of the subject.


04. Cucumber Fashion Show
octane rendering,UE5,Maya,blender, . This is a highly detailed, high-resolution photograph featuring a life-sized, stylized human figure crafted entirely from cucumber slices. The figure stands against a plain white background, emphasizing its vivid green hue. The person, with a serene expression, wears an elegant, sleeveless gown made from cucumber slices arranged in a layered, petal-like fashion. The gown’s neckline is V-shaped, and the slices form a series of overlapping, scalloped edges that resemble a flower’s petals. The figure’s head is adorned with a crown of cucumber leaves, adding to the botanical theme. The texture of the cucumber slices is smooth and glossy, with the light reflecting off the wet surface, giving it a fresh, vibrant appearance. The overall effect is both surreal and artistic, blending elements of nature and human craftsmanship. The photograph captures the intricate details of the cucumber slices, emphasizing their natural patterns and the delicate arrangement that creates the gown. The figure’s skin is a pale, almost translucent white, contrasting starkly with the green of the cucumber slices, enhancing the surreal nature of the image.


