Data Engineering
525 skills in Data & AI > Data Engineering
multi-agent-architecture
Multi-agent system architecture reference. Use when working with agents, pipelines, or understanding the content generation workflow.
telemetry-validator-agent
AI-powered Telemetry Validator Agent that verifies instrumentation works in sandbox environments. Use when: (1) Validating OTel spans are emitted correctly, (2) Verifying correlation headers in Kafka messages, (3) Confirming OpenLineage events for data pipelines, (4) Generating validation evidence for merge approval. Triggers: "validate telemetry", "verify instrumentation", "check OTel spans", "validate correlation headers".
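For a flavor of the kind of check this agent performs, here is a minimal sketch that validates an OTel span in-process using the SDK's in-memory exporter; the span name `checkout_flow` and its attribute are hypothetical, not part of this skill:

```python
# Sandbox span validation sketch, assuming the opentelemetry-sdk package.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter

# Route spans to an in-memory exporter so the test can inspect them directly.
exporter = InMemorySpanExporter()
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(exporter))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("sandbox-validation")
with tracer.start_as_current_span("checkout_flow") as span:  # hypothetical span
    span.set_attribute("pipeline.stage", "validate")

# Assert the instrumented code actually emitted the span we expect.
spans = exporter.get_finished_spans()
assert any(s.name == "checkout_flow" for s in spans), "expected span missing"
print(f"validated {len(spans)} span(s)")
```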
databases
Work with MongoDB (document database, BSON documents, aggregation pipelines, Atlas cloud) and PostgreSQL (relational database, SQL queries, psql CLI, pgAdmin). Use when designing database schemas, writing queries and aggregations, optimizing indexes for performance, performing database migrations, configuring replication and sharding, implementing backup and restore strategies, managing database users and permissions, analyzing query performance, or administering production databases.
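As a small illustration of the query-performance side, a hedged psycopg2 sketch; the DSN and the `orders` table are placeholders for a real environment:

```python
# EXPLAIN ANALYZE executes the query and reports the actual plan,
# the quickest way to confirm an index is being used.
import psycopg2

conn = psycopg2.connect("dbname=shop user=app")  # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute(
        "EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = %s", (42,)
    )
    for (line,) in cur.fetchall():  # each plan row is a single text column
        print(line)
conn.close()
```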
duckdb-lakehouse
DuckDB data lakehouse development with Dagster orchestration and Apache Iceberg integration for floe-platform. Use when: (1) Building data pipelines with DuckDB as compute engine, (2) Configuring dbt-duckdb with Polaris plugin, (3) Reading/writing Iceberg tables via Polaris catalog, (4) Creating Dagster assets with DuckDB, (5) Designing catalog-first lakehouse architecture, (6) Connecting to REST catalogs with inline credentials.
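A minimal sketch of DuckDB as the compute engine, querying Parquet directly; the Iceberg/Polaris catalog wiring is omitted, and the path and column names are hypothetical:

```python
# DuckDB runs in-process: no server, SQL straight over files.
import duckdb

con = duckdb.connect()  # in-memory database
rows = con.execute(
    """
    SELECT customer_id, SUM(amount) AS total
    FROM read_parquet('data/orders/*.parquet')  -- hypothetical path
    GROUP BY customer_id
    ORDER BY total DESC
    LIMIT 10
    """
).fetchall()
for customer_id, total in rows:
    print(customer_id, total)
```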
data-engineer
Data pipelines and analytics infrastructure
reqon
Use when writing or editing .vague files for Reqon declarative API data pipelines
media-processing
Process multimedia files with FFmpeg (video/audio encoding, conversion, streaming, filtering, hardware acceleration) and ImageMagick (image manipulation, format conversion, batch processing, effects, composition). Use when converting media formats, encoding videos with specific codecs (H.264, H.265, VP9), resizing/cropping images, extracting audio from video, applying filters and effects, optimizing file sizes, creating streaming manifests (HLS/DASH), generating thumbnails, batch processing images, creating composite images, or implementing media processing pipelines. Supports 100+ formats, hardware acceleration (NVENC, QSV), and complex filtergraphs.
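For illustration, a hedged transcode sketch driving FFmpeg from Python with commonly used flags; the file names are placeholders and ffmpeg must be on PATH:

```python
# H.264/AAC transcode with a quality-targeted encode.
import subprocess

subprocess.run(
    [
        "ffmpeg",
        "-i", "input.mov",          # source file (placeholder)
        "-c:v", "libx264",          # H.264 video codec
        "-crf", "23",               # constant-rate-factor quality target
        "-preset", "medium",        # speed vs. size trade-off
        "-c:a", "aac",              # AAC audio
        "-movflags", "+faststart",  # web-friendly MP4 layout
        "output.mp4",
    ],
    check=True,  # raise if ffmpeg exits non-zero
)
```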
dlt
dlt (data load tool) patterns for SignalRoom ETL pipelines. Use when creating sources, debugging pipeline failures, understanding schema evolution, or implementing incremental loading.
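A minimal dlt sketch showing a resource with incremental loading; the `events` resource and its `updated_at` cursor are hypothetical stand-ins for a real SignalRoom source:

```python
import dlt

@dlt.resource(table_name="events", write_disposition="merge", primary_key="id")
def events(
    updated_at=dlt.sources.incremental("updated_at", initial_value="2024-01-01")
):
    # A real source would page through an API since updated_at.last_value;
    # this yields one record to keep the sketch runnable.
    yield {"id": 1, "updated_at": "2024-06-01", "payload": "hello"}

pipeline = dlt.pipeline(
    pipeline_name="signalroom_events",  # hypothetical name
    destination="duckdb",
    dataset_name="raw",
)
info = pipeline.run(events())
print(info)  # load summary, including any schema evolution applied
```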
gitlab-ci-debugger
Debug and monitor GitLab CI/CD pipelines for merge requests. Check pipeline status, view job logs, and troubleshoot CI failures. Use this when the user needs to investigate GitLab CI pipeline issues, check job statuses, or view specific job logs.
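A sketch of the triage flow against GitLab's documented v4 REST endpoints; `PROJECT_ID` and the token environment variable are placeholders:

```python
import os
import requests

GITLAB = "https://gitlab.com/api/v4"
PROJECT_ID = "12345"  # hypothetical project
headers = {"PRIVATE-TOKEN": os.environ["GITLAB_TOKEN"]}

# Latest pipelines for the project.
pipelines = requests.get(
    f"{GITLAB}/projects/{PROJECT_ID}/pipelines", headers=headers
).json()
latest = pipelines[0]
print(latest["id"], latest["status"])

# Jobs in that pipeline, then the tail of any failed job's raw log.
jobs = requests.get(
    f"{GITLAB}/projects/{PROJECT_ID}/pipelines/{latest['id']}/jobs", headers=headers
).json()
for job in jobs:
    if job["status"] == "failed":
        trace = requests.get(
            f"{GITLAB}/projects/{PROJECT_ID}/jobs/{job['id']}/trace", headers=headers
        )
        print(f"--- {job['name']} ---\n{trace.text[-2000:]}")
```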
clickhouse-pipeline-optimizer
Analyze and optimize existing ClickHouse pipelines, schemas, and queries. Use when reviewing ClickHouse performance, auditing table designs, troubleshooting slow queries, or improving data ingestion patterns.
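As an example of the kind of audit involved, a hedged clickhouse-connect sketch pulling the slowest recent queries from the built-in `system.query_log` table; connection details are placeholders:

```python
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")  # placeholder host
rows = client.query(
    """
    SELECT query_duration_ms, read_rows, query
    FROM system.query_log
    WHERE type = 'QueryFinish'       -- completed queries only
      AND query_duration_ms > 1000   -- slower than one second
    ORDER BY query_duration_ms DESC
    LIMIT 10
    """
).result_rows
for duration, read_rows, query in rows:
    print(f"{duration} ms, {read_rows} rows read: {query[:80]}")
```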
sql-test-generator
ALWAYS use this skill when users ask to create, generate, or write UNIT TESTS for BigQuery SQL queries. Invoke proactively whenever the request includes "test" or "tests" with a query/table name. This skill is for unit testing ONLY (not data quality checks - use bigconfig-generator for Bigeye monitoring). Works with bigquery-etl-core skill to understand query patterns.
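For illustration, the CTE-mocking pattern a generator like this might emit: the real table is shadowed by a literal fixture and the expectation is asserted in SQL itself. Table and column names are hypothetical, not output of this skill:

```python
# BigQuery unit-test sketch: the `orders` CTE shadows the real table,
# so the query under test runs against a fixed fixture.
test_sql = """
WITH orders AS (  -- mock replaces `project.dataset.orders`
  SELECT 1 AS id, 'EUR' AS currency, 10.0 AS amount UNION ALL
  SELECT 2, 'USD', 20.0
),
actual AS (
  SELECT currency, SUM(amount) AS total FROM orders GROUP BY currency
)
SELECT
  COUNTIF(currency = 'EUR' AND total = 10.0) = 1
  AND COUNTIF(currency = 'USD' AND total = 20.0) = 1 AS passed
FROM actual
"""
print(test_sql)  # a real harness would run this and assert `passed` is true
```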
cicd-workflows
GitHub Actions and CI/CD patterns for Databricks, including automated testing, deployment, and quality gates.
ci-cd-pipelines
Guide for creating and configuring CI/CD pipelines with GitHub Actions or GitLab CI. Use when users need to set up automated workflows for testing, building, deploying applications, or managing secrets. Covers Node.js, Python, Docker, Vercel, Railway, Cloudflare, and multi-environment deployments.
data-engineering
Data pipeline architecture, ETL/ELT patterns, data modeling, and production data platform design
docker
Guide for using Docker - a containerization platform for building, running, and deploying applications in isolated containers. Use when containerizing applications, creating Dockerfiles, working with Docker Compose, managing images/containers, configuring networking and storage, optimizing builds, deploying to production, or implementing CI/CD pipelines with Docker.
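A small sketch of managing images and containers programmatically with the docker-py SDK (`pip install docker`); the build path, tag, and port mapping are placeholders:

```python
import docker

client = docker.from_env()  # talks to the local Docker daemon

# Build an image from ./app/Dockerfile and tag it.
image, _build_logs = client.images.build(path="./app", tag="myapp:dev")

# Run it detached, mapping container port 8000 to host port 8080.
container = client.containers.run(image, detach=True, ports={"8000/tcp": 8080})
print(container.logs().decode())
container.stop()
```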
nushell
Guide for using Nushell (Nu), a modern shell with structured data pipelines, cross-platform compatibility, and programming language features
data-warehousing
Snowflake, BigQuery, Redshift, Delta Lake, and data warehouse design
google-adk
Guide for building AI agents with Google ADK (Agent Development Kit). Use when creating multi-agent pipelines, implementing conditional agent branching, designing agent tools with FunctionTool, or debugging agent data flow issues. Covers SequentialAgent, LoopAgent, ParallelAgent patterns, session.state management, output_key chaining, and transfer_to_agent for control flow. Essential for understanding non-obvious ADK behaviors like why SequentialAgent runs ALL agents even after rejection.
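A hedged sketch of `output_key` chaining under a `SequentialAgent`; the model name, instructions, and agent names are assumptions for illustration, not code from this skill:

```python
from google.adk.agents import LlmAgent, SequentialAgent

drafter = LlmAgent(
    name="drafter",
    model="gemini-2.0-flash",  # hypothetical model choice
    instruction="Draft a short answer to the user's question.",
    output_key="draft",  # result stored in session.state["draft"]
)
reviewer = LlmAgent(
    name="reviewer",
    model="gemini-2.0-flash",
    instruction="Review the draft in {draft} and improve it.",  # reads state
    output_key="final",
)

# Per the caveat above: a SequentialAgent runs ALL sub_agents in order,
# even when an earlier agent's output amounts to a rejection.
root_agent = SequentialAgent(name="draft_then_review", sub_agents=[drafter, reviewer])
```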
svg-theme-system
Use when you need SVGs with selectable/customisable appearance: multiple palettes (WLILO / Obsidian / light), consistent typography, and repeatable agent-friendly pipelines. Triggers: svg theme, tokens, palette, WLILO, obsidian, design system, dark mode, diagram styling.
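An illustrative token-substitution sketch of the one-template-many-palettes idea; the palette names and hex values below are hypothetical stand-ins, not the actual WLILO or Obsidian tokens:

```python
# Swap design tokens into a single SVG template instead of editing the SVG.
PALETTES = {
    "obsidian": {"bg": "#1e1e2e", "fg": "#cdd6f4", "accent": "#89b4fa"},
    "light":    {"bg": "#ffffff", "fg": "#1a1a1a", "accent": "#0055cc"},
}

TEMPLATE = """<svg xmlns="http://www.w3.org/2000/svg" width="200" height="60">
  <rect width="200" height="60" fill="{bg}"/>
  <text x="12" y="36" fill="{fg}" font-family="sans-serif">Hello</text>
  <circle cx="180" cy="30" r="8" fill="{accent}"/>
</svg>"""

def render(theme: str) -> str:
    return TEMPLATE.format(**PALETTES[theme])

print(render("obsidian"))
```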
mongodb
Guide for implementing MongoDB - a document database platform with CRUD operations, aggregation pipelines, indexing, replication, sharding, search capabilities, and comprehensive security. Use when working with MongoDB databases, designing schemas, writing queries, optimizing performance, configuring deployments (Atlas/self-managed/Kubernetes), implementing security, or integrating with applications through 15+ official drivers.
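A minimal pymongo aggregation sketch; the URI, database, collection, and field names are placeholders for a real deployment:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # hypothetical URI
orders = client["shop"]["orders"]

# Revenue per customer, highest first - a typical aggregation pipeline.
pipeline = [
    {"$match": {"status": "paid"}},
    {"$group": {"_id": "$customer_id", "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},
    {"$limit": 5},
]
for doc in orders.aggregate(pipeline):
    print(doc["_id"], doc["total"])
```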