Data Engineering
525 skills in Data & AI > Data Engineering
moai-security-devsecops
SAST/DAST/SCA automation, CI/CD security pipelines, vulnerability management
understanding-flow
Use when learning walkerOS architecture, understanding data flow, or designing composable event pipelines. Covers Source→Collector→Destination pattern and separation of concerns.
update-dataset
End-to-end dataset update workflow with PR creation, snapshot, meadow, garden, and grapher steps. Use when user wants to update a dataset, refresh data, run ETL update, or mentions updating dataset versions.
gcloud
Guide for using the Google Cloud SDK (gcloud CLI), a command-line tool for managing Google Cloud resources. Use when installing/configuring gcloud, authenticating with Google Cloud, managing projects/configurations, deploying applications, working with Compute Engine/GKE/App Engine/Cloud Storage, scripting gcloud operations, implementing CI/CD pipelines, or troubleshooting Google Cloud deployments.
mongodb
Guide for working with MongoDB, a document database platform with CRUD operations, aggregation pipelines, indexing, replication, sharding, search capabilities, and comprehensive security. Use when working with MongoDB databases, designing schemas, writing queries, optimizing performance, configuring deployments (Atlas/self-managed/Kubernetes), implementing security, or integrating with applications through 15+ official drivers.
docker
Guide for using Docker - a containerization platform for building, running, and deploying applications in isolated containers. Use when containerizing applications, creating Dockerfiles, working with Docker Compose, managing images/containers, configuring networking and storage, optimizing builds, deploying to production, or implementing CI/CD pipelines with Docker.
deploy
Sets up deployment, analytics, and health monitoring for projects. Use when the user mentions deploy (デプロイ), Vercel, Netlify, publishing (公開), analytics (アナリティクス), GA, Google Analytics, environment diagnostics (環境診断), or health check. Do NOT load for: implementation work, local development, reviews, or setup.
ci
Diagnoses and fixes CI/CD pipeline failures. Use when the user mentions 'CI', 'GitHub Actions', 'GitLab CI', 'build error' (ビルドエラー), 'test failure' (テスト失敗), 'pipeline' (パイプライン), 'CI failed' (CIが落ちた), or asks to analyze build/test failures. Do NOT load for: local builds, routine implementation work, reviews, or setup.
architecture-paradigm-pipeline
Compose processing stages using a pipes-and-filters model for ETL, media processing, or compiler-like workloads. Triggers: pipeline architecture, pipes and filters, ETL, data transformation, stream processing, CI/CD pipeline, media processing, batch processing. Use when: data flows through a fixed sequence of transformations, stages can be independently developed and tested, or parallel processing of stages is beneficial. DO NOT use when: selecting from multiple paradigms (use architecture-paradigms first), data flow isn't sequential or predictable, or complex branching/merging logic dominates. Consult this skill when designing data pipelines or transformation workflows.
neon-toolkit
Creates and manages ephemeral Neon databases for testing, CI/CD pipelines, and isolated development environments. Use when building temporary databases for automated tests or rapid prototyping.
data-pipeline
Orchestrate marketing data collection, transformation, and reporting workflows. Use when relevant to the task.
ci-cd-pipeline
Set up and maintain continuous integration and continuous deployment pipelines with GitHub Actions, GitLab CI, Jenkins, or similar tools to automate testing, building, and deployment. Use when configuring automated builds, setting up test automation, implementing deployment automation, creating release workflows, managing environment deployments, configuring build caching, implementing blue-green deployments, setting up rollback strategies, or automating the entire software delivery pipeline.
neondb-serverless
Use Neon serverless Postgres with branching, connection pooling, and instant scalability for modern applications with Prisma or Drizzle ORM integration. Use when setting up serverless Postgres databases, implementing database branching for preview environments, configuring connection pooling, optimizing for serverless cold starts, using Prisma with Neon, implementing database migrations, scaling databases automatically, or building applications on Vercel/Netlify with Postgres.
nextflow-pipeline-overview
Efficiently understand unfamiliar Nextflow pipelines without reading all files. Use when encountering a new pipeline or needing to explain pipeline structure.
youtube-title
Generate optimized YouTube video titles that maximize click-through rates by sparking curiosity and complementing thumbnails. This skill should be used when the user asks to create, improve, or brainstorm YouTube video titles, or when working on YouTube content that requires title optimization.
spring-boot-event-driven-patterns
Implement Event-Driven Architecture (EDA) in Spring Boot using ApplicationEvent, @EventListener, and Kafka. Use for building loosely-coupled microservices with domain events, transactional event listeners, and distributed messaging patterns.
discover-data
Automatically discover data pipeline and ETL skills when working with ETL. Activates for data development tasks.
crm-hygiene
Use to enforce stage definitions, next-step requirements, and data completeness across the pipeline.
deal-review
Use to run structured opportunity inspections that align pipeline data with buyer reality.