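Zhipu AI open-sources GLM-Image, a hybrid-architecture image generation and editing model with standout text rendering and knowledge understanding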

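In January 2026, Zhipu AI officially released GLM-Image, its new-generation flagship image generation model. The model uses an original hybrid architecture that pairs an autoregressive generator with a diffusion decoder. Its general image generation quality is on par with mainstream latent diffusion models, while it shows a clear advantage in text rendering and knowledge-intensive generation.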

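GLM-Image is built around a two-module design. The autoregressive generator, based on the GLM-4-9B large language model, produces a compact semantic encoding of the image. The 7-billion-parameter diffusion decoder then turns that encoding into high-fidelity, fine-grained visual detail, and a dedicated Glyph Encoder substantially improves the accuracy of text rendered inside images.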
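
The two-stage data flow described above can be illustrated with a toy sketch. This is not GLM-Image code: the function names, the sizes (`VOCAB_SIZE`, `NUM_TOKENS`, `PATCH`), and the arithmetic that stands in for the language model and the denoiser are all hypothetical and chosen only to show the shape of the pipeline, where stage 1 emits a short token sequence one token at a time and stage 2 refines noise toward the values those tokens imply.

```python
import random

# Hypothetical toy sizes; the article does not state the real vocabulary or token grid.
VOCAB_SIZE = 16
NUM_TOKENS = 8
PATCH = 2  # each semantic token expands to PATCH values in the toy "image"


def autoregressive_stage(prompt, rng):
    """Stage 1 (toy): emit semantic tokens one by one, each depending on the prompt and the prefix."""
    tokens = []
    for _ in range(NUM_TOKENS):
        # Stand-in for a language-model forward pass over (prompt, tokens so far).
        context = sum(ord(c) for c in prompt) + sum(tokens) + len(tokens)
        tokens.append((context + rng.randrange(VOCAB_SIZE)) % VOCAB_SIZE)
    return tokens


def tokens_to_target(tokens):
    """Map each token to PATCH values in [0, 1]; this is the coarse layout the decoder refines toward."""
    return [t / (VOCAB_SIZE - 1) for t in tokens for _ in range(PATCH)]


def diffusion_decoder_stage(tokens, rng, steps=4):
    """Stage 2 (toy): start from noise and move step by step toward the token-conditioned target."""
    target = tokens_to_target(tokens)
    image = [rng.random() for _ in target]  # pure noise
    for step in range(steps):
        blend = 1.0 / (steps - step)  # 1/4, 1/3, 1/2, 1: the last step lands on the target
        image = [x + blend * (y - x) for x, y in zip(image, target)]
    return image


if __name__ == "__main__":
    semantic_tokens = autoregressive_stage("raspberry mousse cake", random.Random(0))
    pixels = diffusion_decoder_stage(semantic_tokens, random.Random(1))
    print(len(semantic_tokens), "semantic tokens ->", len(pixels), "pixel values")
```

The point of the split is that the sequence produced in stage 1 is much shorter than the final output, so the autoregressive model only has to decide on global content, while the iterative decoder is responsible for fine detail.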

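In addition, the model is post-trained with decoupled reinforcement learning that optimizes global semantic alignment and local detail fidelity separately. As a result, GLM-Image can generate information-rich images from complex text descriptions, and it also supports image-to-image tasks such as image editing, style transfer, and identity-preserving generation, offering a powerful, unified solution for industrial-grade applications.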

Example code
# pip install git+https://github.com/huggingface/transformers.git
# pip install git+https://github.com/huggingface/diffusers.git

# Text-to-image generation
import torch
from diffusers.pipelines.glm_image import GlmImagePipeline

# Load the pipeline in bfloat16 and place it on the GPU
pipe = GlmImagePipeline.from_pretrained("zai-org/GLM-Image", torch_dtype=torch.bfloat16, device_map="cuda")

# The prompt and the remainder of the example are truncated in the source
prompt = "A beautifully designed modern food magazine style dessert recipe illustration, themed around a raspberry mousse cake. The overall layout is clean and bright, divided into four main areas: the top left features a bold black title 'Raspberry Mousse Cake Recipe Guide', with a soft-lit close-up photo of the finished cake on the right, showcasing a light pink cake adorned with fresh raspberries and mint leaves; the bottom left contains an ingredient..."