Data Engineering
525 skills in Data & AI > Data Engineering
Methodology Bootstrapping
Apply Bootstrapped AI Methodology Engineering (BAIME) to develop project-specific methodologies through systematic Observe-Codify-Automate cycles with dual-layer value functions (instance quality + methodology quality). Use when creating testing strategies, CI/CD pipelines, error handling patterns, observability systems, or any reusable development methodology. Provides structured framework with convergence criteria, agent coordination, and empirical validation. Validated in 8 experiments with 100% success rate, 4.9 avg iterations, 10-50x speedup vs ad-hoc. Works for testing, CI/CD, error recovery, dependency management, documentation systems, knowledge transfer, technical debt, cross-cutting concerns.
Tekton
Tekton Pipelines CI/CD best practices for Kubernetes-native workflows. USE WHEN working with Tekton tasks, pipelines, triggers, building container images, GitOps integration, or cloud-native CI/CD.
osiris-component-developer
Create production-ready Osiris ETL components (extractors, writers, processors). Use when building new components, implementing capabilities (discover, streaming, bulkOperations), adding doctor/healthcheck methods, packaging for distribution, validating against 60-rule checklist, or ensuring E2B cloud compatibility. Supports third-party component development in isolated projects.
mailman
Guidance for setting up and configuring mailing list servers with Postfix and Mailman3. This skill should be used when tasks involve configuring email servers, mailing list management, LMTP integration, or mail delivery pipelines. Applies to tasks requiring Postfix-Mailman integration, subscription workflows, or email broadcast functionality.
github-actions
Create and maintain GitHub Actions workflows for CI/CD, testing, deployment, and automation. Use when setting up pipelines, automating tasks, or configuring continuous integration.
dipeo-codegen-pipeline
Router skill for DiPeO code generation pipeline (TypeScript specs → IR → Python/GraphQL). Use when task mentions TypeScript models, IR builders, generated code diagnosis, or codegen workflow. For simple tasks, handle directly; for complex work, escalate to dipeo-codegen-pipeline agent.
multi-source-data-merger
This skill provides guidance for merging data from multiple heterogeneous sources (JSON, CSV, Parquet, XML, etc.) into a unified dataset. Use this skill when tasks involve combining records from different file formats, applying field mappings, resolving conflicts based on priority rules, or generating merged outputs with conflict reports. Applicable to ETL pipelines, data consolidation, and record deduplication scenarios.
harness-step-schema
Creates or updates pipeline step schemas in the harness-schema repository. Use when the user wants to add a new step, modify an existing step's fields, or make a step available in different stages. Triggers for requests about "create step", "add step", "new step schema", "update step", or "step available in stage".
spark-engineer
Use when building Apache Spark applications, distributed data processing pipelines, or optimizing big data workloads. Invoke for DataFrame API, Spark SQL, RDD operations, performance tuning, streaming analytics.
Unnamed Skill
Use when setting up CI/CD pipelines, containerizing applications, or managing infrastructure as code. Invoke for pipelines, Docker, Kubernetes, cloud platforms, GitOps. Keywords: DevOps, CI/CD, Docker, Kubernetes, Terraform, GitHub Actions.
autonomous-coding-agent
Build autonomous coding agents with CLI integration. Use when creating automated coding workflows, managing multi-session development, or building CI/CD code generation pipelines.
crawl4ai
This skill should be used when users need to scrape websites, extract structured data, handle JavaScript-heavy pages, crawl multiple URLs, or build automated web data pipelines. Includes optimized extraction patterns with schema generation for efficient, LLM-free extraction.
container-deployment
Manage containerization and deployment automation using Docker, Kubernetes, and cloud platforms. Use when working with Docker images, container registries, orchestration, deployment pipelines, infrastructure as code, or environment management. Handles container builds, registry publishing, and deployment strategies.
agent-deployment-engineer
Expert deployment engineer specializing in CI/CD pipelines, release automation, and deployment strategies. Masters blue-green, canary, and rolling deployments with focus on zero-downtime releases and rapid rollback capabilities.
bigquery
Instructions for querying Google BigQuery using the bq command-line tool. Useful for running SQL queries, exploring datasets, and exporting results.
agent-iot-engineer
Expert IoT engineer specializing in connected device architectures, edge computing, and IoT platform development. Masters IoT protocols, device management, and data pipelines with focus on building scalable, secure, and reliable IoT solutions.
enterprise-readiness
Assess and enhance software projects for enterprise-grade security, quality, and automation. Use when evaluating projects for production readiness, implementing supply chain security (SLSA, signing, SBOMs), hardening CI/CD pipelines, or establishing quality gates. Aligned with OpenSSF Scorecard, Best Practices Badge (all levels), SLSA, and S2C2F. By Netresearch.
agent-build-engineer
Expert build engineer specializing in build system optimization, compilation strategies, and developer productivity. Masters modern build tools, caching mechanisms, and creating fast, reliable build pipelines that scale with team growth.
agent-data-engineer
Expert data engineer specializing in building scalable data pipelines, ETL/ELT processes, and data infrastructure. Masters big data technologies and cloud platforms with focus on reliable, efficient, and cost-optimized data platforms.
gh-run-failure
Use to analyze failures in GitHub pipelines or jobs.