Data Engineering
525 skills in Data & AI > Data Engineering
harness-ci
Harness CI (Continuous Integration) for container-native builds, test intelligence, caching, parallelization, and build infrastructure management. Activate for build pipelines, CI steps, test automation, artifact publishing, and build optimization.
bitbucket-workflow
Create PRs and debug pipeline failures in Bitbucket with Carefeed conventions. Auto-extracts Jira keys from branches, generates PR descriptions, and diagnoses CI failures. Use when user mentions PRs, pipelines, or CI/CD.
uptick-ci-patterns
This skill should be used when the user asks to "set up CI", "configure GitHub Actions", "create a workflow", "pin actions", "use ratchet", "set up Claude code review", "configure AWS OIDC", "deploy with tickforge", or mentions GitHub Actions, CI/CD pipelines, or workflow security. Provides Uptick's security-first GitHub Actions patterns.
agent-workflow-orchestrator
Full integration of 5 agent patterns (Router, Sequential, Parallel, Orchestrator, Evaluator) into automated workflow pipelines. Analyzes tasks, selects optimal pattern combinations, and executes end-to-end with quality validation. Use for complex projects requiring intelligent pattern selection and automatic execution flow.
big-data
Apache Spark, Hadoop, distributed computing, and large-scale data processing
effect-streams-pipelines
Stream creation, transformation, sinks, batching, and resilience. Use when building data pipelines with concurrency and backpressure.
state-flow-visualizer
Analyze and visualize state flow in LangGraph pipelines. Use when debugging state mutations, tracing data flow between nodes, identifying orphan state keys, detecting read/write dependencies, or generating Mermaid diagrams of state transitions. Triggers on phrases like "visualize state flow", "trace state key", "which nodes touch X", "state dependencies", "debug state mutation", "orphan state keys", "state flow diagram".
finish-es-latam-audiobook
Finalizes a ru→es_latam audiobook. Verifies that ALL translations are ready, generates TTS, combines the audio, finds a background image, and creates the video. Use ONLY after claude-translation-pipeline-es_latam has finished.
global-tech-stack
Maintain approved technology stack including TypeScript/Python languages, React/Tailwind frontend, Node.js/FastAPI backend, PostgreSQL/Redis persistence, and Ansible infrastructure automation with enforced quality gates. Use this skill when selecting technologies, adding dependencies, configuring tooling, or ensuring infrastructure-as-code practices. Applies to package.json, requirements.txt, CI/CD pipelines, Ansible playbooks, linters, formatters, testing frameworks, and all technology choices requiring documented approval and migration strategies.
roslyn-source-generators
Create and maintain Roslyn source generators for compile-time code generation. Use when building incremental generators, designing pipelines with ForAttributeWithMetadataName, creating marker attributes, implementing equatable models, testing generators, or debugging generator performance issues.
delta-live-tables
Delta Live Tables (DLT) pipeline patterns and examples for building declarative, self-healing data pipelines with automatic quality enforcement and lineage tracking.
dbt-model-writer
Writes, edits, and creates dbt models following best practices. Use when user needs to create new dbt SQL models, update existing models, or convert raw SQL to dbt format. Handles staging, intermediate, and mart models with proper config blocks, CTEs, and documentation.
spring-boot-full-stack
Complete Java Spring Boot skill set for building enterprise applications. Includes modular architecture with optional components: PostgreSQL database with JPA/Hibernate + Flyway migration; Redis caching (optional); Kafka/RabbitMQ messaging (optional, choose one); JWT + OAuth2 authentication (optional OAuth2); RBAC authorization (optional); TDD with Mockito; Spec-First Development with OpenSpec.
github-actions
GitHub Actions CI/CD reference for workflow templates, caching strategies, and automation patterns. Includes homelab integration with self-hosted runners. Use when creating workflows, debugging CI failures, or setting up deployments. Triggers: github actions, ci, cd, workflow, pipeline, runner, artifact.
raw-workflow-creator
Create and run RAW workflows. Use this skill when the user asks to create a workflow, automate a task, build a data pipeline, generate reports, or asks "How do I build X with RAW?".
dbt-semantic-layer-developer
Provides expert-level assistance with dbt Semantic Layer, MetricFlow, semantic models, metrics, dimensions, entities, measures, and BI tool integrations. Use this skill when building semantic models, creating metrics (simple, ratio, cumulative, derived, conversion), debugging validation errors, or integrating with BI tools. Extracted from official dbt documentation and optimized for data practitioners.
data-pipeline-monitoring
Monitor and troubleshoot dual-pipeline data collection systems on GCP. This skill should be used when checking pipeline health, viewing logs, diagnosing failures, or monitoring long-running operations for data collection workflows. Supports Cloud Run Jobs (batch pipelines) and VM systemd services (real-time streams).
data-warehousing
Snowflake, BigQuery, Redshift, dimensional modeling, and modern data warehouse architecture
dataform-engineering-fundamentals
Use when developing BigQuery Dataform transformations, SQLX files, or source declarations, or when troubleshooting pipelines. Enforces a TDD workflow (tests first), ${ref()} in place of hardcoded table paths at all times, comprehensive columns:{} documentation, safety practices (--schema-suffix dev, --dry-run), proper ref() syntax, .sqlx for new declarations, no schema config in operations/tests, and architecture patterns that prevent technical debt under time pressure.
gcp-cloud
Google Cloud Platform infrastructure patterns and best practices. Use when designing or implementing GCP solutions including Compute Engine, Cloud Functions, Cloud Storage, and BigQuery.