name: gemini-imagen
description: "Image generation with Google's Imagen and Gemini native models. Text-to-image, image editing, and iterative refinement."

Gemini Image Generation

Source: https://ai.google.dev/gemini-api/docs/image-generation

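Google offers two approaches for image generation. Imagen is a dedicated text-to-image model family called through `generate_images`. Gemini native generation uses multimodal Gemini models that return images from `generate_content`. All generated images include SynthID watermarks.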

Model Selection

Model	API	Use Case	Output
`gemini-2.5-flash-image`	generate_content	Fast text-to-image, editing, iteration	1K (about 1024px)
`gemini-3-pro-image-preview`	generate_content	Best quality, complex compositions	Up to 4K
`imagen-4.0-generate-001`	generate_images	High-fidelity text-to-image	1K default, 2K optional
`imagen-4.0-ultra-generate-001`	generate_images	Highest Imagen quality	Up to 2K
`imagen-4.0-fast-generate-001`	generate_images	Fast generation	1K
`imagen-3.0-generate-002`	generate_images	Legacy, stable	Up to 4 images

When to use which:

Gemini native (`gemini-*-image*` models): image editing, multi-turn refinement, mixed text and image output
Imagen: pure text-to-image, high-fidelity results, 1-4 images per request

Quick Start

Gemini Native (Text-to-Image)

from google import genai

client = genai.Client()  # reads the API key from GEMINI_API_KEY or GOOGLE_API_KEY

response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents=["A modern logo for a tech startup called 'Nexus'"],
)

# A response can mix text parts and image parts; print text, save images.
for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = part.as_image()
        image.save("logo.png")
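Because text and image parts can be interleaved, it is sometimes convenient to separate them first. The sketch below assumes only that each part exposes `text` and `inline_data` attributes, as the loop above does. That means it can be exercised with stand-in objects. `split_parts` is a hypothetical name, not an SDK function. The `parts or []` guard covers a response whose `parts` is `None`, for example when nothing was returned.

```python
from typing import Any, Iterable, Optional


def split_parts(parts: Optional[Iterable[Any]]) -> tuple[list[str], list[Any]]:
    """Separate text parts from image parts of a generate_content response.

    Hypothetical helper: relies only on each part having `text` and
    `inline_data` attributes, either of which may be None.
    """
    texts: list[str] = []
    images: list[Any] = []
    for part in parts or []:
        if getattr(part, "text", None) is not None:
            texts.append(part.text)
        elif getattr(part, "inline_data", None) is not None:
            images.append(part)  # keep the part so .as_image() is still available
    return texts, images
```

For example, `texts, images = split_parts(response.parts)` followed by `images[0].as_image().save("logo.png")` when `images` is non-empty.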

Imagen (Dedicated Image Generation)

from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_images(
    model="imagen-4.0-generate-001",
    prompt="A photorealistic sunset over mountains, 4K HDR",
    config=types.GenerateImagesConfig(
        number_of_images=4,
        aspect_ratio="16:9",
    )
)

for i, generated_image in enumerate(response.generated_images):
    generated_image.image.save(f"sunset_{i}.png")

Image Editing (Gemini Native)

from google import genai
from PIL import Image

client = genai.Client()
source_image = Image.open("photo.jpg")

response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents=[
        "Remove the background and replace with a gradient blue sky",
        source_image,
    ],
)

for part in response.parts:
    if part.inline_data is not None:
        part.as_image().save("edited.png")

Best Practices

Prompt Structure

Build prompts with three components:

Subject: The main object, person, or scene
Context: Background, setting, environment
Style: Art style, photography type, quality modifiers

"A [style] of [subject] in/with [context]"

Example:

"A minimalist logo for a health care company on a solid color background. Include the text 'Journey'."
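The subject/context/style template can also be filled programmatically. The following is a minimal sketch. `build_prompt` and its parameters are illustrative, not part of any SDK, and the a/an choice is a naive first-letter heuristic.

```python
def build_prompt(style: str, subject: str, context: str = "", extras: str = "") -> str:
    """Fill the "A [style] of [subject] in/with [context]" template.

    `context` should carry its own preposition ("in a misty forest",
    "on a solid color background"); `extras` is appended as a separate
    sentence, e.g. an instruction about text to render.
    """
    # Naive a/an choice by first letter; wrong for words like "hour" or "unicorn".
    article = "An" if style[:1].lower() in ("a", "e", "i", "o", "u") else "A"
    prompt = f"{article} {style} of {subject}"
    if context:
        prompt += f" {context}"
    prompt += "."
    if extras:
        prompt += f" {extras}"
    return prompt
```

`build_prompt("watercolor painting", "a lighthouse", "on a rocky coast at dusk")` yields `"A watercolor painting of a lighthouse on a rocky coast at dusk."`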

Photography Modifiers

Category	Keywords
Proximity	close-up, zoomed out, aerial, from below
Lighting	natural, dramatic, warm, cold, golden hour
Camera	motion blur, soft focus, bokeh, portrait
Lens	35mm, 50mm, fisheye, wide angle, macro
Film	black and white, polaroid, film noir
Quality	4K, HDR, high-quality, professional
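Modifiers from the table are usually appended to the base prompt as comma-separated phrases. The sketch below does that and drops blanks and case-insensitive duplicates. The helper name and the "keep caller's order" policy are assumptions.

```python
def add_modifiers(prompt: str, *modifiers: str) -> str:
    """Append photography modifiers (lighting, lens, film, ...) to a prompt.

    Hypothetical helper: strips a trailing period from the base prompt,
    skips empty or duplicate modifiers, and joins the rest with commas.
    """
    base = prompt.rstrip().rstrip(".")
    seen: set[str] = set()
    kept: list[str] = []
    for modifier in modifiers:
        modifier = modifier.strip()
        if modifier and modifier.lower() not in seen:
            seen.add(modifier.lower())
            kept.append(modifier)
    return ", ".join([base, *kept])
```

For instance, `add_modifiers("A close-up portrait of an old fisherman.", "golden hour", "35mm", "bokeh")` returns `"A close-up portrait of an old fisherman, golden hour, 35mm, bokeh"`.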

Text in Images

When generating text in images:

Keep text under 25 characters for best results
Use 2-3 distinct phrases maximum
Specify font style (bold, serif, sans-serif)
Specify placement (top, center, bottom)

prompt = 'A poster with the text "Summerland" in bold font as a title, underneath is "Summer never felt so good"'
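The text guidelines above can be checked before a prompt is sent. This is a sketch: the 25-character and three-phrase limits come from the guidelines, while applying the character limit per phrase (with exactly 25 accepted) is an assumption. The function name and warning wording are hypothetical.

```python
MAX_CHARS = 25   # from "Keep text under 25 characters"
MAX_PHRASES = 3  # from "Use 2-3 distinct phrases maximum"


def check_text_phrases(phrases: list[str]) -> list[str]:
    """Return warnings for on-image text likely to render poorly.

    Assumption: the character guideline is applied to each phrase
    separately, and a phrase of exactly 25 characters is accepted.
    """
    warnings: list[str] = []
    if len(phrases) > MAX_PHRASES:
        warnings.append(f"{len(phrases)} phrases; use at most {MAX_PHRASES}")
    for phrase in phrases:
        if len(phrase) > MAX_CHARS:
            warnings.append(
                f"{phrase!r} has {len(phrase)} characters; aim for about {MAX_CHARS}"
            )
    return warnings
```

The poster example's two phrases, "Summerland" and "Summer never felt so good" (25 characters), produce no warnings.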

Aspect Ratios

Ratio	Use Case
`1:1`	Social media posts, icons
`4:3`	Photography, medium format
`3:4`	Portrait, vertical scenes
`16:9`	Widescreen, landscapes, presentations
`9:16`	Mobile, stories, tall subjects
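When the target surface has arbitrary pixel dimensions, one option is to request the listed ratio closest to it and crop or pad afterwards. The sketch below compares ratios on a log scale, so that 2:1 and 1:2 are equally far from 1:1. That is a design choice, not API behavior, and `nearest_aspect_ratio` is a hypothetical helper.

```python
import math

# The five ratios listed in the table above.
SUPPORTED_RATIOS = {"1:1": 1 / 1, "4:3": 4 / 3, "3:4": 3 / 4, "16:9": 16 / 9, "9:16": 9 / 16}


def nearest_aspect_ratio(width: int, height: int) -> str:
    """Return the supported aspect-ratio string closest to width/height."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = math.log(width / height)
    # Log distance treats "twice as wide" and "twice as tall" symmetrically.
    return min(SUPPORTED_RATIOS, key=lambda r: abs(math.log(SUPPORTED_RATIOS[r]) - target))
```

A 1500x1000 banner (ratio 1.5) maps to `"4:3"`, which is closer on the log scale than `"16:9"`.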

Imagen Configuration

from google.genai import types

config = types.GenerateImagesConfig(
    number_of_images=4,               # 1-4 images per request
    aspect_ratio="16:9",              # 1:1, 3:4, 4:3, 9:16, or 16:9
    image_size="2K",                  # "1K" or "2K"; Imagen 4 Standard/Ultra only
    person_generation="allow_adult",  # dont_allow, allow_adult, allow_all
)
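A few of these constraints are easy to violate. The sketch below is a pre-flight check over a plain config dict, the same form the async example passes as `config=`. It encodes only rules stated on this page: 1-4 images, `image_size` only on Imagen 4 Standard/Ultra, and `allow_all` unavailable in EU/UK/CH/MENA. The function name and the `region_restricted` flag are hypothetical.

```python
SIZED_MODELS = {"imagen-4.0-generate-001", "imagen-4.0-ultra-generate-001"}
PERSON_OPTIONS = {"dont_allow", "allow_adult", "allow_all"}


def check_imagen_config(model: str, config: dict, region_restricted: bool = False) -> list[str]:
    """Return problems found in an Imagen config dict (empty list if none).

    Hypothetical check; set `region_restricted=True` when calling from the
    EU, UK, CH, or MENA, where allow_all is not available.
    """
    problems: list[str] = []
    n = config.get("number_of_images")
    if n is not None and not 1 <= n <= 4:
        problems.append(f"number_of_images={n}; must be 1-4")
    size = config.get("image_size")
    if size is not None:
        if size not in ("1K", "2K"):
            problems.append(f"image_size={size!r}; must be '1K' or '2K'")
        elif model not in SIZED_MODELS:
            problems.append(f"image_size is only supported on Imagen 4 Standard/Ultra, not {model}")
    person = config.get("person_generation")
    if person is not None:
        if person not in PERSON_OPTIONS:
            problems.append(f"person_generation={person!r} is not a known option")
        elif person == "allow_all" and region_restricted:
            problems.append("allow_all is not available in EU, UK, CH, or MENA")
    return problems
```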

Multi-Turn Refinement (Gemini Native)

Gemini native supports conversational image editing:

from google import genai

client = genai.Client()
chat = client.chats.create(model="gemini-2.5-flash-image")

# Initial generation
response = chat.send_message("Create a cozy coffee shop interior")

# Iterative refinement
response = chat.send_message("Add more plants and warmer lighting")
response = chat.send_message("Make it evening with fairy lights")

# Save final result
for part in response.parts:
    if part.inline_data:
        part.as_image().save("coffee_shop_final.png")
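The loop above keeps only the final turn's image. To keep every intermediate version, one option is to save after each `send_message`. In this sketch, `save_turn_images` and its file-naming scheme are hypothetical. It relies only on `response.parts`, `part.inline_data`, and `part.as_image()` as used above.

```python
from typing import Any


def save_turn_images(response: Any, turn: int, prefix: str = "coffee_shop") -> list[str]:
    """Save each image part of one chat response as <prefix>_turn<N>_<i>.png.

    Returns the file names written, in part order.
    """
    paths: list[str] = []
    for part in response.parts or []:
        if part.inline_data is not None:
            path = f"{prefix}_turn{turn}_{len(paths)}.png"
            part.as_image().save(path)
            paths.append(path)
    return paths
```

Called as `save_turn_images(chat.send_message("Add more plants and warmer lighting"), turn=2)`, it writes `coffee_shop_turn2_0.png` when the turn returns one image.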

Async Usage

import asyncio
from google import genai

client = genai.Client()

async def generate_variations(prompt: str, count: int = 4):
    """Generate multiple image variations concurrently."""
    tasks = [
        client.aio.models.generate_images(
            model="imagen-4.0-generate-001",
            prompt=prompt,
            config={"number_of_images": 1}
        )
        for _ in range(count)
    ]
    results = await asyncio.gather(*tasks)
    return [r.generated_images[0] for r in results]
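`asyncio.gather` above starts every request at once. For larger `count` values, capping in-flight requests with an `asyncio.Semaphore` is a common way to stay under rate limits. The helper below is a generic sketch. It takes zero-argument coroutine factories so it can wrap `client.aio.models.generate_images` calls or anything else. The name `gather_limited` and the default limit of 2 are assumptions, not documented quotas.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def gather_limited(
    factories: list[Callable[[], Awaitable[T]]], limit: int = 2
) -> list[T]:
    """Run coroutine factories concurrently, at most `limit` at a time.

    Results are returned in the same order as `factories`, like asyncio.gather.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run_one(factory: Callable[[], Awaitable[T]]) -> T:
        async with semaphore:  # wait for a free slot before creating the coroutine
            return await factory()

    return await asyncio.gather(*(run_one(f) for f in factories))
```

Inside `generate_variations`, the gather could become `await gather_limited([lambda: client.aio.models.generate_images(model="imagen-4.0-generate-001", prompt=prompt, config={"number_of_images": 1}) for _ in range(count)], limit=2)`. The factories are called lazily, so no request starts until a slot is free.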

Common Patterns

Logo Generation

prompt = """
A {style} logo for a {industry} company on a solid color background.
Include the text "{company_name}".
Style: minimalist, modern, clean lines
"""

Product Photography

prompt = """
Professional studio photo of {product},
{background} background,
{lighting} lighting,
4K, high detail, commercial quality
"""
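Both templates above are ordinary `str.format` strings. The sketch below fills one and joins its lines into a single-line prompt; the joining step is optional, and `render_prompt` is a hypothetical name. `str.format` raises `KeyError` when a placeholder is left unfilled, which catches misspelled field names early.

```python
PRODUCT_TEMPLATE = """
Professional studio photo of {product},
{background} background,
{lighting} lighting,
4K, high detail, commercial quality
"""


def render_prompt(template: str, **fields: str) -> str:
    """Fill a prompt template and join its non-blank lines with spaces.

    Raises KeyError if a placeholder in the template has no matching field.
    """
    filled = template.format(**fields)
    return " ".join(line.strip() for line in filled.splitlines() if line.strip())
```

`render_prompt(PRODUCT_TEMPLATE, product="a ceramic coffee mug", background="white", lighting="soft diffused")` returns `"Professional studio photo of a ceramic coffee mug, white background, soft diffused lighting, 4K, high detail, commercial quality"`.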

Artistic Styles

Reference art movements and named styles:

"in the style of an impressionist painting"
"in the style of pop art"
"in the style of art deco poster"
"digital art, trending on artstation"

Limitations

Imagen: English prompts only, max 480 tokens
Person generation: allow_all not available in EU, UK, CH, MENA
Text rendering: Best under 25 characters
Iterative editing: Only with Gemini native, not Imagen

Documentation Index

Resource	When to Consult
gemini-image-generation.md	Gemini native models, editing, multi-turn, grounding
imagen.md	Imagen API, prompt guide, photography tips, model versions

Syncing Documentation

cd skills/gemini-imagen
bun run scripts/sync-docs.ts
