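name: gemini-image-gen
description: Guide for implementing image generation with the Google Gemini API, using the gemini-2.5-flash-image model to create high-quality images from text prompts. Use when generating images, creating visual content, or implementing text-to-image features. Supports text-to-image, image editing, multi-image composition, and iterative refinement.
license: MIT
version: 1.0.0
allowed-tools:
  - Bash
  - Read
  - Write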

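Gemini Image Generation Skill

Generate high-quality images with Google's Gemini 2.5 Flash Image model from text prompts, with support for image editing and multi-image composition.

When to Use This Skill

Use this skill when you need to:

Generate images from text descriptions
Edit existing images by adding or removing elements or changing the style
Combine multiple source images into a new composition
Refine images iteratively through conversational editing
Create visual content for documentation, design, or creative projects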

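Prerequisites

API Key Setup

The skill automatically detects your GEMINI_API_KEY in the following order:

Process environment: export GEMINI_API_KEY="your-key"
Skill directory: .claude/skills/gemini-image-gen/.env
Project directory: ./.env (project root)

Get your API key from Google AI Studio.

Create a .env file containing: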

GEMINI_API_KEY=your_api_key_here
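
The lookup order above can be sketched in a few lines of Python. This is a minimal illustration, not the helper script's actual code: the function names `read_env_file` and `find_api_key` are hypothetical, and the parser assumes plain `KEY=value` lines with optional quotes.

```python
import os
from pathlib import Path

# Candidate .env files, checked in this order after the process environment.
ENV_FILES = [
    Path(".claude/skills/gemini-image-gen/.env"),  # skill directory
    Path(".env"),                                  # project root
]


def read_env_file(path, name):
    """Return the value of `name` from a KEY=value file, or None if absent."""
    path = Path(path)
    if not path.is_file():
        return None
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            return value.strip().strip('"').strip("'") or None
    return None


def find_api_key(name="GEMINI_API_KEY", env=None, env_files=ENV_FILES):
    """Hypothetical helper: the process environment wins, then each file in order."""
    env = os.environ if env is None else env
    if env.get(name):
        return env[name]
    for path in env_files:
        value = read_env_file(path, name)
        if value:
            return value
    return None
```

Calling `find_api_key()` with no arguments checks the real environment and the two default paths; a `None` result means no key was found in any of the three places.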

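Python Setup

Install the required packages (Pillow is used by the image editing examples):

pip install google-genai pillow

Quick Start

Basic Text-to-Image Generation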

from google import genai
from google.genai import types
import os

# Read the key from the environment; the helper script also checks the .env files
client = genai.Client(api_key=os.getenv('GEMINI_API_KEY'))

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='A serene mountain landscape at sunset with snow-capped peaks',
    config=types.GenerateContentConfig(
        response_modalities=['image'],
        # The aspect ratio is set through image_config (requires a recent google-genai release)
        image_config=types.ImageConfig(aspect_ratio='16:9')
    )
)

# Save to ./docs/assets/, creating the directory if it is missing
os.makedirs('./docs/assets', exist_ok=True)
for i, part in enumerate(response.candidates[0].content.parts):
    if part.inline_data:
        with open(f'./docs/assets/generated-{i}.png', 'wb') as f:
            f.write(part.inline_data.data)

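Using the Helper Script

For convenience, use the bundled helper script, which handles API key detection and file saving:

# Generate a single image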
python .claude/skills/gemini-image-gen/scripts/generate.py \
  "A futuristic city with flying cars" \
  --aspect-ratio 16:9 \
  --output ./docs/assets/city.png

# Request specific response modalities
python .claude/skills/gemini-image-gen/scripts/generate.py \
  "Modern architecture design" \
  --response-modalities image text \
  --aspect-ratio 1:1

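Key Features

Aspect Ratios

Ratio	Resolution (px)	Use case	Token cost
1:1	1024×1024	Social media, avatars	1290
16:9	1344×768	Landscapes, banners	1290
9:16	768×1344	Mobile, portraits	1290
4:3	1152×896	Traditional media	1290
3:4	896×1152	Vertical posters	1290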
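
To reject an unsupported ratio before making an API call, the table can be mirrored in a small lookup. This is only a sketch: the names `ASPECT_RATIOS`, `resolution_for`, and `token_cost` are hypothetical, and the values are copied from the table above rather than queried from the API.

```python
# Values copied from the aspect ratio table above (width, height in pixels).
ASPECT_RATIOS = {
    "1:1": (1024, 1024),
    "16:9": (1344, 768),
    "9:16": (768, 1344),
    "4:3": (1152, 896),
    "3:4": (896, 1152),
}
TOKENS_PER_IMAGE = 1290  # same cost for every ratio in the table


def resolution_for(ratio):
    """Return (width, height) for a listed ratio, or raise ValueError."""
    try:
        return ASPECT_RATIOS[ratio]
    except KeyError:
        choices = ", ".join(ASPECT_RATIOS)
        raise ValueError(f"Unsupported aspect ratio {ratio!r}; choose from {choices}") from None


def token_cost(num_images):
    """Estimate the token cost of `num_images` generated images."""
    return num_images * TOKENS_PER_IMAGE
```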

Response Modalities

['image']: generate an image only
['text']: generate a text description only
['image', 'text']: generate both an image and a description
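
When both modalities are requested, the response mixes text parts and image parts, so it helps to separate them before saving. A minimal sketch follows, assuming each part exposes `text` and `inline_data.data` as in the quick-start example; `split_parts` is a hypothetical helper name.

```python
def split_parts(parts):
    """Split response parts into (texts, images).

    `texts` is a list of strings and `images` a list of raw image bytes.
    Parts that carry neither text nor inline data are ignored.
    """
    texts, images = [], []
    for part in parts:
        text = getattr(part, "text", None)
        inline = getattr(part, "inline_data", None)
        if text:
            texts.append(text)
        elif inline is not None and getattr(inline, "data", None):
            images.append(inline.data)
    return texts, images
```

With a real response, this would be called as `texts, images = split_parts(response.candidates[0].content.parts)`.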

Image Editing

Provide an existing image plus a text instruction describing the change:

import PIL.Image

img = PIL.Image.open('original.png')
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Add a red balloon floating in the sky',
        img
    ]
)

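Multi-Image Composition

Combine up to 3 source images (the recommended maximum):

import PIL.Image  # requires Pillow; `client` is the one created in the quick start

img1 = PIL.Image.open('background.png')
img2 = PIL.Image.open('foreground.png')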

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Combine these images into a cohesive scene',
        img1,
        img2
    ]
)

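Prompt Engineering Tips

An effective prompt has three elements:

Subject: what to generate ("a robot")
Context: the setting ("in a futuristic city")
Style: the artistic treatment ("cyberpunk style, neon lighting")

Example: "A robot in a futuristic city, cyberpunk style, with neon lighting and rain-slicked streets"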
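
The subject, context, and style structure lends itself to a small prompt builder. The sketch below is illustrative only: `build_prompt` is a hypothetical helper, and joining the pieces with commas is one reasonable convention, not something the API requires.

```python
def build_prompt(subject, context="", style="", modifiers=()):
    """Assemble a prompt from subject, context, style, and optional modifiers."""
    if not subject.strip():
        raise ValueError("subject is required")
    # Subject and context read as one phrase: "a robot in a futuristic city".
    head = " ".join(s.strip() for s in (subject, context) if s.strip())
    # Style and any extra modifiers are appended as comma-separated terms.
    tail = [s.strip() for s in (style, *modifiers) if s.strip()]
    return ", ".join([head, *tail])


prompt = build_prompt(
    "a robot",
    context="in a futuristic city",
    style="cyberpunk style, neon lighting",
)
```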

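Quality modifiers:

Add terms such as "4K", "HDR", "high quality", or "professional photography"
Specify camera settings: "35mm lens", "shallow depth of field", "golden hour lighting"

Text in images:

Keep text to 25 characters or fewer
Use at most 3 distinct phrases
Specify the font style: "bold sans-serif title" or "handwritten script"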
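
The text guidelines above can be checked before a prompt is sent. This sketch assumes the 25-character limit applies to each phrase individually, which is an interpretation rather than something the guidelines state; `check_image_text` is a hypothetical helper.

```python
MAX_CHARS = 25    # assumed to apply per phrase
MAX_PHRASES = 3


def check_image_text(phrases):
    """Return a list of guideline problems; an empty list means all checks pass."""
    problems = []
    if len(phrases) > MAX_PHRASES:
        problems.append(f"{len(phrases)} phrases given; use at most {MAX_PHRASES}")
    for phrase in phrases:
        if len(phrase) > MAX_CHARS:
            problems.append(f"{phrase!r} has {len(phrase)} characters; limit is {MAX_CHARS}")
    return problems
```

The function reports problems instead of raising, so a caller can decide whether to shorten the text or send the prompt anyway.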

See references/prompting-guide.md for comprehensive prompt engineering strategies.

Safety Settings

The model includes adjustable safety filters, configured per request:

config = types.GenerateContentConfig(
    response_modalities=['image'],
    safety_settings=[
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE
        )
    ]
)

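See references/safety-settings.md for detailed configuration options.

Output Management

Save all generated images to the ./docs/assets/ directory:

# Create the directory if needed
mkdir -p ./docs/assets

The helper script saves to this location automatically, with timestamped filenames.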
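
For code that does not go through the helper script, a timestamped output path can be built as follows. This is a sketch under assumptions: the filename pattern `prefix-YYYYMMDD-HHMMSS-index.png` is illustrative and may differ from what the helper script actually produces, and `timestamped_path` is a hypothetical name.

```python
from datetime import datetime
from pathlib import Path


def timestamped_path(prefix="generated", index=0, out_dir="./docs/assets", now=None):
    """Return out_dir/prefix-YYYYMMDD-HHMMSS-index.png, creating out_dir if needed."""
    now = now or datetime.now()
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    return directory / f"{prefix}-{now:%Y%m%d-%H%M%S}-{index}.png"
```

Image bytes can then be written with `timestamped_path(index=i).write_bytes(part.inline_data.data)`.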

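Model Specifications

Model: gemini-2.5-flash-image

Input tokens: up to 65,536
Output tokens: up to 32,768
Supported inputs: text and images
Supported outputs: text and images
Knowledge cutoff: June 2025
Capabilities: image generation, structured output, Batch API, caching

Limitations

Up to 3 input images are recommended for best results
For text in images, results are best when the text is generated first, in a separate step
Audio and video inputs are not supported
Uploading images of children is restricted in some regions (EEA, CH, UK)
Best language support: English, Spanish (Mexico), Japanese, Mandarin Chinese, Hindi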

Error Handling

Common problems and fixes:

API key not found:

# Check the environment variable
echo $GEMINI_API_KEY

# Verify that a .env file exists
cat .claude/skills/gemini-image-gen/.env
# or
cat .env

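Blocked by safety filters:

Inspect response.prompt_feedback.block_reason
Adjust the safety settings if appropriate for your use case
Rephrase the prompt to avoid triggering the filter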
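
The block-reason check can be wrapped in a small function so that callers get a readable message instead of an attribute error when `prompt_feedback` is missing. A minimal sketch, assuming the response exposes `prompt_feedback.block_reason` as described above; `explain_block` is a hypothetical helper.

```python
def explain_block(response):
    """Return a short message if the prompt was blocked, otherwise None."""
    feedback = getattr(response, "prompt_feedback", None)
    reason = getattr(feedback, "block_reason", None)
    if reason is None:
        return None
    # block_reason is typically an enum; fall back to str() for plain values.
    label = getattr(reason, "name", None) or str(reason)
    return f"Prompt blocked ({label}); adjust safety settings or rephrase the prompt."
```

A typical call site would run `message = explain_block(response)` right after `generate_content` and stop before reading `response.candidates` when `message` is not `None`.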

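Token limit exceeded:

Shorten the prompt
Use fewer input images
Simplify the image editing instructions

Reference Documentation

For more detail, see:

references/api-reference.md - Complete API specification
references/prompting-guide.md - Advanced prompt engineering
references/safety-settings.md - Safety configuration details
references/code-examples.md - Additional implementation examples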
