gemini-imagen

Image generation with Google's Imagen and Gemini native models. Text-to-image, image editing, and iterative refinement.

Install

git clone https://github.com/majiayu000/claude-skill-registry /tmp/claude-skill-registry && cp -r /tmp/claude-skill-registry/skills/development/gemini-imagen ~/.claude/skills/claude-skill-registry


name: gemini-imagen
description: "Image generation with Google's Imagen and Gemini native models. Text-to-image, image editing, and iterative refinement."

Gemini Image Generation

Source: https://ai.google.dev/gemini-api/docs/image-generation

Google offers two approaches for image generation: Imagen (dedicated image model) and Gemini native (multimodal generation). All generated images include SynthID watermarks.

Model Selection

| Model | API | Use Case | Output |
| --- | --- | --- | --- |
| gemini-2.5-flash-image | generate_content | Fast text-to-image, editing, iteration | 1K (1024 px) |
| gemini-3-pro-image-preview | generate_content | Best quality, complex compositions | Up to 4K |
| imagen-4.0-generate-001 | generate_images | High-fidelity text-to-image | 1K default |
| imagen-4.0-ultra-generate-001 | generate_images | Ultra quality, 2K output | Up to 2K |
| imagen-4.0-fast-generate-001 | generate_images | Fast generation | 1K |
| imagen-3.0-generate-002 | generate_images | Legacy, stable | Up to 4 images |

When to use which:

  • Gemini native (gemini-*-image): Image editing, multi-turn refinement, mixed text+image output
  • Imagen: Pure text-to-image, high-fidelity results, batch generation (1-4 images)
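
The guidance above can be turned into a small selection helper. This is only a sketch: `choose_model` and its two flags are hypothetical names, and the defaults simply pick one model from each family in the table.

```python
def choose_model(needs_editing: bool = False, multi_turn: bool = False) -> tuple[str, str]:
    """Return (model_id, api_method) for a task.

    Hypothetical helper: editing and multi-turn refinement go to Gemini native,
    everything else goes to Imagen, mirroring the "When to use which" list.
    """
    if needs_editing or multi_turn:
        return "gemini-2.5-flash-image", "generate_content"
    return "imagen-4.0-generate-001", "generate_images"


model, method = choose_model(needs_editing=True)
print(model, method)
```

Swap in the Pro or Ultra variants from the table if quality matters more than speed.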

Quick Start

Gemini Native (Text-to-Image)

from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents=["A modern logo for a tech startup called 'Nexus'"],
)

for part in response.parts:
    if part.inline_data is not None:
        image = part.as_image()
        image.save("logo.png")
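
Because Gemini native can return text alongside images, it may help to separate the two kinds of parts. The sketch below assumes each part exposes `text`, `inline_data`, and `as_image()` as in the example above; `collect_output` and the file-naming scheme are hypothetical.

```python
def collect_output(parts, image_prefix="output"):
    """Split response parts into text snippets and saved image paths.

    Hypothetical helper: `parts` is expected to look like `response.parts`
    from the example above (it may also be None when nothing was returned).
    """
    texts, image_paths = [], []
    for part in parts or []:
        if getattr(part, "text", None):
            texts.append(part.text)
        elif getattr(part, "inline_data", None) is not None:
            # Number the files so several images in one response do not collide.
            path = f"{image_prefix}_{len(image_paths)}.png"
            part.as_image().save(path)
            image_paths.append(path)
    return texts, image_paths
```

Typical use would be `texts, paths = collect_output(response.parts, "logo")`.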

Imagen (Dedicated Image Generation)

from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_images(
    model="imagen-4.0-generate-001",
    prompt="A photorealistic sunset over mountains, 4K HDR",
    config=types.GenerateImagesConfig(
        number_of_images=4,
        aspect_ratio="16:9",
    )
)

for i, generated_image in enumerate(response.generated_images):
    generated_image.image.save(f"sunset_{i}.png")

Image Editing (Gemini Native)

from google import genai
from PIL import Image

client = genai.Client()
source_image = Image.open("photo.jpg")

response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents=[
        "Remove the background and replace with a gradient blue sky",
        source_image,
    ],
)

for part in response.parts:
    if part.inline_data is not None:
        part.as_image().save("edited.png")

Best Practices

Prompt Structure

Build prompts with three components:

  1. Subject: The main object, person, or scene
  2. Context: Background, setting, environment
  3. Style: Art style, photography type, quality modifiers

Template:

"A [style] of [subject] in/with [context]"

Example:

"A minimalist logo for a health care company on a solid color background. Include the text 'Journey'."

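The three-part template can be filled programmatically. A minimal sketch, assuming a hypothetical `build_prompt` helper; it does no grammar handling (for example "A" versus "An").

```python
def build_prompt(style: str, subject: str, context: str, joiner: str = "in") -> str:
    """Fill the "A [style] of [subject] in/with [context]" template."""
    if joiner not in ("in", "with"):
        raise ValueError("joiner must be 'in' or 'with'")
    return f"A {style.strip()} of {subject.strip()} {joiner} {context.strip()}"


print(build_prompt("watercolor painting", "a lighthouse", "a storm"))
```
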
Photography Modifiers

| Category | Keywords |
| --- | --- |
| Proximity | close-up, zoomed out, aerial, from below |
| Lighting | natural, dramatic, warm, cold, golden hour |
| Camera | motion blur, soft focus, bokeh, portrait |
| Lens | 35mm, 50mm, fisheye, wide angle, macro |
| Film | black and white, polaroid, film noir |
| Quality | 4K, HDR, high-quality, professional |
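
One way to apply these keywords consistently is to append them from a lookup table. This is a sketch: `PHOTO_MODIFIERS` copies the table above, `add_modifiers` is a hypothetical helper, and restricting input to the listed keywords is a design choice of the sketch, not a limit of the API.

```python
PHOTO_MODIFIERS = {
    "proximity": ["close-up", "zoomed out", "aerial", "from below"],
    "lighting": ["natural", "dramatic", "warm", "cold", "golden hour"],
    "camera": ["motion blur", "soft focus", "bokeh", "portrait"],
    "lens": ["35mm", "50mm", "fisheye", "wide angle", "macro"],
    "film": ["black and white", "polaroid", "film noir"],
    "quality": ["4K", "HDR", "high-quality", "professional"],
}


def add_modifiers(prompt: str, **choices: str) -> str:
    """Append one keyword per chosen category, in the order given."""
    extras = []
    for category, keyword in choices.items():
        allowed = PHOTO_MODIFIERS.get(category)
        if allowed is None:
            raise ValueError(f"unknown category: {category}")
        if keyword not in allowed:
            raise ValueError(f"{keyword!r} is not a listed {category} keyword")
        extras.append(keyword)
    return ", ".join([prompt, *extras])


print(add_modifiers("A portrait of a fisherman", lighting="golden hour", lens="50mm"))
```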

Text in Images

When generating text in images:

  • Keep text under 25 characters for best results
  • Use 2-3 distinct phrases maximum
  • Specify font style (bold, serif, sans-serif)
  • Specify placement (top, center, bottom)

Example:

prompt = 'A poster with the text "Summerland" in bold font as a title, underneath is "Summer never felt so good"'
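
The length and phrase-count guidelines can be checked before sending a prompt. In this sketch `check_text_phrases` is a hypothetical helper, and treating exactly 25 characters as acceptable is an assumption (the guideline says "under 25", but the example slogan above is 25 characters long).

```python
MAX_PHRASES = 3   # "2-3 distinct phrases maximum"
MAX_CHARS = 25    # assumption: 25 itself is allowed


def check_text_phrases(phrases: list[str]) -> list[str]:
    """Return a list of guideline violations (empty if the phrases look fine)."""
    problems = []
    if len(phrases) > MAX_PHRASES:
        problems.append(f"{len(phrases)} phrases; keep to {MAX_PHRASES} or fewer")
    for phrase in phrases:
        if len(phrase) > MAX_CHARS:
            problems.append(
                f"{phrase!r} is {len(phrase)} characters; keep to {MAX_CHARS} or fewer"
            )
    return problems


print(check_text_phrases(["Summerland", "Summer never felt so good"]))
```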

Aspect Ratios

| Ratio | Use Case |
| --- | --- |
| 1:1 | Social media posts, icons |
| 4:3 | Photography, medium format |
| 3:4 | Portrait, vertical scenes |
| 16:9 | Widescreen, landscapes, presentations |
| 9:16 | Mobile, stories, tall subjects |
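
When the target dimensions are known in pixels, the nearest listed ratio can be picked automatically. A minimal sketch with a hypothetical `closest_aspect_ratio` helper; it compares ratios on a log scale so that landscape and portrait shapes are treated symmetrically.

```python
import math

SUPPORTED_RATIOS = ("1:1", "4:3", "3:4", "16:9", "9:16")


def closest_aspect_ratio(width: int, height: int) -> str:
    """Return the listed aspect ratio closest to width / height."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = math.log(width / height)

    def distance(ratio: str) -> float:
        w, h = ratio.split(":")
        return abs(math.log(int(w) / int(h)) - target)

    return min(SUPPORTED_RATIOS, key=distance)


print(closest_aspect_ratio(1920, 1080))
```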

Imagen Configuration

from google.genai import types

config = types.GenerateImagesConfig(
    number_of_images=4,               # 1-4 images per request
    aspect_ratio="16:9",              # 1:1, 4:3, 3:4, 16:9, or 9:16
    image_size="2K",                  # 1K or 2K (Ultra/Standard only)
    person_generation="allow_adult",  # dont_allow, allow_adult, or allow_all
)
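
Validating these values locally catches typos before a request is sent. The sketch below builds a plain dict using the same keys as the config above; `imagen_config` and its defaults are hypothetical, and the allowed values are taken from the comments and tables in this document rather than from the SDK.

```python
ASPECT_RATIOS = {"1:1", "4:3", "3:4", "16:9", "9:16"}
IMAGE_SIZES = {"1K", "2K"}
PERSON_GENERATION = {"dont_allow", "allow_adult", "allow_all"}


def imagen_config(number_of_images=1, aspect_ratio="1:1",
                  image_size="1K", person_generation="allow_adult") -> dict:
    """Return a validated config dict, or raise ValueError on a bad value."""
    if not 1 <= number_of_images <= 4:
        raise ValueError("number_of_images must be between 1 and 4")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio}")
    if image_size not in IMAGE_SIZES:
        raise ValueError(f"unsupported image_size: {image_size}")
    if person_generation not in PERSON_GENERATION:
        raise ValueError(f"unsupported person_generation: {person_generation}")
    return {
        "number_of_images": number_of_images,
        "aspect_ratio": aspect_ratio,
        "image_size": image_size,
        "person_generation": person_generation,
    }
```

The resulting dict can be passed as `config=` in the same way the async example in this document passes `{"number_of_images": 1}`.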

Multi-Turn Refinement (Gemini Native)

Gemini native supports conversational image editing:

from google import genai

client = genai.Client()
chat = client.chats.create(model="gemini-2.5-flash-image")

# Initial generation
response = chat.send_message("Create a cozy coffee shop interior")

# Iterative refinement
response = chat.send_message("Add more plants and warmer lighting")
response = chat.send_message("Make it evening with fairy lights")

# Save final result
for part in response.parts:
    if part.inline_data:
        part.as_image().save("coffee_shop_final.png")
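
To keep every intermediate image rather than only the last one, the refinement loop can save after each turn. A sketch assuming a chat object with `send_message` and parts shaped like the example above; `refine` and the file-naming scheme are hypothetical.

```python
def refine(chat, instructions, out_prefix="step"):
    """Send each instruction in order and save the first image of each turn."""
    saved = []
    for turn, instruction in enumerate(instructions, start=1):
        response = chat.send_message(instruction)
        for part in response.parts or []:
            if part.inline_data is not None:
                path = f"{out_prefix}_{turn:02d}.png"
                part.as_image().save(path)
                saved.append(path)
                break  # keep one image per turn
    return saved
```

For example, `refine(chat, ["Create a cozy coffee shop interior", "Add more plants and warmer lighting"], "coffee_shop")` would write `coffee_shop_01.png` and `coffee_shop_02.png` if each turn returns an image.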

Async Usage

import asyncio
from google import genai

client = genai.Client()

async def generate_variations(prompt: str, count: int = 4):
    """Generate multiple image variations concurrently."""
    tasks = [
        client.aio.models.generate_images(
            model="imagen-4.0-generate-001",
            prompt=prompt,
            config={"number_of_images": 1}
        )
        for _ in range(count)
    ]
    results = await asyncio.gather(*tasks)
    return [r.generated_images[0] for r in results]
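
Launching every request at once may be more concurrency than an account's quota allows (an assumption; the limits depend on your plan). One way to cap it is a semaphore around each call. `gather_limited` is a hypothetical helper that takes zero-argument callables returning coroutines.

```python
import asyncio


async def gather_limited(factories, limit=2):
    """Run coroutine factories with at most `limit` in flight; keep input order."""
    semaphore = asyncio.Semaphore(limit)

    async def run(factory):
        async with semaphore:
            return await factory()

    return await asyncio.gather(*(run(factory) for factory in factories))


# Usage sketch with the client from the example above:
# factories = [
#     lambda: client.aio.models.generate_images(
#         model="imagen-4.0-generate-001",
#         prompt="A photorealistic sunset over mountains",
#         config={"number_of_images": 1},
#     )
#     for _ in range(8)
# ]
# results = asyncio.run(gather_limited(factories, limit=2))
```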

Common Patterns

Logo Generation

prompt = """
A {style} logo for a {industry} company on a solid color background.
Include the text "{company_name}".
Style: minimalist, modern, clean lines
"""

Product Photography

prompt = """
Professional studio photo of {product},
{background} background,
{lighting} lighting,
4K, high detail, commercial quality
"""

Artistic Styles

Reference an art movement or an established style:

  • "in the style of an impressionist painting"
  • "in the style of pop art"
  • "in the style of art deco poster"
  • "digital art, trending on artstation"
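
The logo and product templates above use `{placeholder}` fields, so they can be filled with `str.format`. The sketch below adds a check for missing fields and collapses the template's line breaks into single spaces; `fill_template` is a hypothetical helper, and the sample values are made up.

```python
from string import Formatter

LOGO_TEMPLATE = """
A {style} logo for a {industry} company on a solid color background.
Include the text "{company_name}".
Style: minimalist, modern, clean lines
"""


def fill_template(template: str, **fields: str) -> str:
    """Fill a prompt template, failing loudly if a placeholder is missing."""
    needed = {name for _, name, _, _ in Formatter().parse(template) if name}
    missing = needed - fields.keys()
    if missing:
        raise ValueError(f"missing template fields: {sorted(missing)}")
    # Collapse newlines and repeated spaces into a single-line prompt.
    return " ".join(template.format(**fields).split())


print(fill_template(LOGO_TEMPLATE, style="geometric", industry="fintech", company_name="Nexus"))
```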

Limitations

  • Imagen: English prompts only, max 480 tokens
  • Person generation: allow_all not available in EU, UK, CH, MENA
  • Text rendering: Best under 25 characters
  • Iterative editing: Only with Gemini native, not Imagen

Documentation Index

| Resource | When to Consult |
| --- | --- |
| gemini-image-generation.md | Gemini native models, editing, multi-turn, grounding |
| imagen.md | Imagen API, prompt guide, photography tips, model versions |

Syncing Documentation

cd skills/gemini-imagen
bun run scripts/sync-docs.ts