Data Engineering
525 skills in Data & AI > Data Engineering
create-pof-tscn
Complete POF-to-TSCN conversion pipeline that converts POF files to GLB geometry and generates Godot scenes by combining the GLB with POF metadata extracted by the parse-pof skill.
sqlmesh
SQLMesh patterns for data transformation with column-level lineage and virtual environments. Use when building data pipelines that need advanced features like automatic DAG inference and efficient incremental processing.
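For illustration, a minimal sketch of a SQLMesh Python model, assuming hypothetical model, schema, and column names; the interval bounds that SQLMesh passes in are what enable efficient incremental processing:

```python
import typing as t
from datetime import datetime

import pandas as pd
from sqlmesh import ExecutionContext, model

# Hypothetical model: name, columns, and the upstream query are placeholders.
@model(
    "analytics.daily_orders",
    columns={"order_date": "date", "order_count": "int"},
)
def execute(
    context: ExecutionContext,
    start: datetime,
    end: datetime,
    execution_time: datetime,
    **kwargs: t.Any,
) -> pd.DataFrame:
    # SQLMesh supplies the interval bounds, so only the affected date range is recomputed.
    return context.fetchdf(
        f"SELECT order_date, COUNT(*) AS order_count "
        f"FROM raw.orders "
        f"WHERE order_date BETWEEN '{start:%Y-%m-%d}' AND '{end:%Y-%m-%d}' "
        f"GROUP BY order_date"
    )
```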
deploy
Help users deploy their portfolio to production. Covers Vercel, Netlify, Cloudflare Pages, GitHub Pages, Docker, and MCP integration for AI-driven deployment.
blockchain-data-collection-validation
Empirical validation workflow for blockchain data collection pipelines before production implementation. Use when validating data sources, testing DuckDB integration, building POC collectors, or verifying complete fetch-to-storage pipelines for blockchain data.
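A minimal sketch of the fetch-to-storage check such a validation workflow might run, assuming a hypothetical local DuckDB file and block schema:

```python
import duckdb

# Local DuckDB file standing in for the POC store (path is hypothetical).
con = duckdb.connect("poc_blocks.duckdb")

# Minimal table matching the collector's expected output schema.
con.execute("""
    CREATE TABLE IF NOT EXISTS blocks (
        block_number BIGINT,
        block_hash   VARCHAR,
        ts           TIMESTAMP
    )
""")

# Insert one fetched record and verify it survives the round trip.
con.execute("INSERT INTO blocks VALUES (1, '0xabc', TIMESTAMP '2024-01-01 00:00:00')")
count = con.execute("SELECT count(*) FROM blocks").fetchone()[0]
assert count >= 1, "fetch-to-storage pipeline wrote no rows"
```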
content-ops-netlify
Next.js content model with visual editing for Netlify.
data-quality-test-generator
Generate comprehensive dbt test suites following FF Analytics data quality standards and dbt 1.10+ syntax. This skill should be used when creating tests for new dbt models, adding tests to existing models, standardizing test coverage, or implementing data quality gates. Covers grain uniqueness, FK relationships, enum validation, and freshness tests.
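A hedged sketch of the kind of schema.yml such a generator might emit, using the dbt 1.8+ `data_tests` key; the model, columns, and the dbt_utils package dependency are assumptions:

```python
import yaml  # PyYAML

# Hypothetical model covering grain uniqueness, FK relationships, and enum validation.
schema = {
    "version": 2,
    "models": [{
        "name": "fct_orders",
        "data_tests": [{
            "dbt_utils.unique_combination_of_columns": {
                "combination_of_columns": ["order_id"],  # grain uniqueness
            }
        }],
        "columns": [
            {"name": "order_id", "data_tests": ["unique", "not_null"]},
            {"name": "customer_id", "data_tests": [{
                "relationships": {"to": "ref('dim_customers')", "field": "customer_id"}
            }]},
            {"name": "status", "data_tests": [{
                "accepted_values": {"values": ["placed", "shipped", "returned"]}
            }]},
        ],
    }],
}

print(yaml.safe_dump(schema, sort_keys=False))
```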
atft-pipeline
Manage J-Quants ingestion, feature graph generation, and cache hygiene for the ATFT-GAT-FAN dataset pipeline.
build
Production-grade Go CLI patterns, automated release workflows with Release Please, versioned docs, and coverage enforcement for DevSecOps build pipelines.
dagster-local
Interact with Dagster data orchestration platform running locally or on Kubernetes. Use when Claude needs to monitor Dagster runs, get run logs, list assets/jobs, materialize assets, launch jobs, or debug pipeline failures. Supports both the local Dagster dev server and Kubernetes deployments where each job run is a separate pod.
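A minimal sketch of materializing assets in process with the Dagster Python API (asset names are hypothetical); against a running dev server or Kubernetes deployment, the GraphQL API or dagster CLI would be used instead:

```python
from dagster import asset, materialize

@asset
def raw_events():
    # Placeholder extract step; a real asset would pull from a source system.
    return [{"event_id": 1}]

@asset
def cleaned_events(raw_events):
    # Downstream asset depending on raw_events via its parameter name.
    return [e for e in raw_events if e["event_id"] is not None]

if __name__ == "__main__":
    result = materialize([raw_events, cleaned_events])
    assert result.success
```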
cicd-fix-expert
Analyze and fix CI/CD pipeline failures including build errors, test failures, and linting issues.
aws-infrastructure
AWS infrastructure as code with Terraform and CDK, including VPC design, EKS cluster setup, S3 bucket configuration, RDS databases, DynamoDB tables, Lambda functions, API Gateway, CloudWatch monitoring, IAM policies, security groups, cost optimization, multi-account strategies, CI/CD with CodePipeline, infrastructure testing, disaster recovery, compliance automation, and cloud-native best practices for production workloads.
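As one small, hedged illustration of the CDK side (stack and bucket names are hypothetical; the Terraform path expresses the same resources in HCL):

```python
from aws_cdk import App, RemovalPolicy, Stack
from aws_cdk import aws_s3 as s3
from constructs import Construct

class DataLakeStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # Versioned, encrypted, non-public bucket retained on stack deletion.
        s3.Bucket(
            self,
            "RawZone",
            versioned=True,
            encryption=s3.BucketEncryption.S3_MANAGED,
            block_public_access=s3.BlockPublicAccess.BLOCK_ALL,
            removal_policy=RemovalPolicy.RETAIN,
        )

app = App()
DataLakeStack(app, "DataLakeStack")
app.synth()
```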
backend-development
Build robust backend systems with modern technologies (Node.js, Python, Go, Rust, .NET), frameworks (NestJS, FastAPI, Django, ASP.NET Core), databases (PostgreSQL, MongoDB, Redis), APIs (REST, GraphQL, gRPC, Minimal APIs), authentication (OAuth 2.1, JWT), testing strategies, security best practices (OWASP Top 10), performance optimization, scalability patterns (microservices, caching, sharding), DevOps practices (Docker, Kubernetes, CI/CD), and monitoring. Use when designing APIs, implementing authentication, optimizing database queries, setting up CI/CD pipelines, handling security vulnerabilities, building microservices, or developing production-ready backend systems.
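As one hedged example from the stacks listed, a JWT-protected FastAPI endpoint (the secret and claims are placeholders; production code would load the key from a secret manager):

```python
import jwt  # PyJWT
from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import OAuth2PasswordBearer

app = FastAPI()
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")
SECRET_KEY = "change-me"  # placeholder only

def current_user(token: str = Depends(oauth2_scheme)) -> dict:
    # Reject requests whose bearer token does not verify against the shared secret.
    try:
        return jwt.decode(token, SECRET_KEY, algorithms=["HS256"])
    except jwt.PyJWTError:
        raise HTTPException(status_code=401, detail="Invalid token")

@app.get("/me")
def read_me(user: dict = Depends(current_user)):
    return {"sub": user.get("sub")}
```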
cube-semantic-layer
Research-driven Cube semantic layer development. Injects research steps for data modeling, dbt integration, metrics, dimensions, REST API, GraphQL API, and SQL API (Postgres wire protocol). Use when building semantic layer, consumption APIs, metrics layer, Cube integration tests, or working with Cube MEASURE() syntax and psycopg2 connections.
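A minimal sketch of the SQL API path, assuming a hypothetical `orders` cube with a `count` measure and the default Cube SQL port:

```python
import psycopg2

# Cube's SQL API speaks the Postgres wire protocol (15432 by default);
# credentials come from CUBEJS_SQL_USER / CUBEJS_SQL_PASSWORD.
conn = psycopg2.connect(
    host="localhost", port=15432, user="cube", password="cube", dbname="cube"
)
conn.autocommit = True  # read-only query; avoid an implicit transaction
cur = conn.cursor()

# MEASURE() asks Cube to aggregate the measure defined in the data model.
cur.execute("SELECT status, MEASURE(count) FROM orders GROUP BY status")
for row in cur.fetchall():
    print(row)

conn.close()
```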
pyspark-patterns
PySpark best practices, TableUtilities methods, ETL patterns, logging standards, and DataFrame operations for this project. Use when writing or debugging PySpark code.
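A generic sketch of the shape such code takes (paths and columns are hypothetical, and the project's TableUtilities helpers would normally wrap the read/write steps):

```python
import logging
from pyspark.sql import SparkSession, functions as F

log = logging.getLogger(__name__)

spark = SparkSession.builder.appName("example_etl").getOrCreate()

# Read raw data, drop rows without a key, derive a partition-friendly date column.
df = spark.read.parquet("s3://bucket/raw/orders/")
cleaned = (
    df.filter(F.col("order_id").isNotNull())
      .withColumn("order_date", F.to_date("order_ts"))
)
log.info("Cleaned row count: %d", cleaned.count())

cleaned.write.mode("overwrite").parquet("s3://bucket/curated/orders/")
```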
dbt-development
Proactive skill for validating dbt models against coding conventions. Auto-activates when creating, reviewing, or refactoring dbt models in staging, integration, or warehouse layers. Validates naming, SQL structure, field conventions, testing coverage, and documentation. Supports project-specific convention overrides and sqlfluff integration.
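One piece of that validation can be sketched with sqlfluff's Python API (the model SQL and dialect are assumptions; a real staging model would reference {{ source() }} / {{ ref() }}):

```python
import sqlfluff

# Hypothetical staging-model SQL; the dialect should match the target warehouse.
model_sql = """
select
    order_id,
    customer_id,
    order_ts as ordered_at
from raw_shop.orders
"""

# Returns a list of rule violations with their codes and descriptions.
for violation in sqlfluff.lint(model_sql, dialect="bigquery"):
    print(violation["code"], violation["description"])
```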
oidc-federation-patterns
Secretless authentication to cloud providers using OpenID Connect federation. GCP, Azure, and cloud-agnostic examples with subject claim patterns and trust policies.
backend-migrations
Create and manage database migrations for schema changes, ensuring zero-downtime deployments and data integrity. Use this skill when creating migration files, modifying database schemas, adding or altering tables/columns/indexes, or working with migration tools like Alembic, Flyway, Liquibase, or framework-specific migration systems (Django migrations, Rails migrations, Prisma migrations). Apply this skill when implementing reversible migrations with up/down methods, handling data migrations separately from schema changes, creating indexes on large tables, or planning backwards-compatible schema changes for high-availability systems. This skill ensures migrations are version-controlled, focused, safe to roll back, and compatible with CI/CD pipelines and zero-downtime deployment strategies.
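A hedged sketch of a reversible Alembic migration with a backwards-compatible change (table, column, and revision IDs are placeholders):

```python
"""Add status column to orders (hypothetical example)."""
from alembic import op
import sqlalchemy as sa

# Revision identifiers used by Alembic (placeholder values).
revision = "abc123"
down_revision = None

def upgrade() -> None:
    # Nullable first so running app servers keep working; a backfill and a
    # NOT NULL constraint can follow in separate migrations.
    op.add_column("orders", sa.Column("status", sa.String(32), nullable=True))

def downgrade() -> None:
    op.drop_column("orders", "status")
```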
data-pipeline-patterns
Follow these patterns when implementing data pipelines, ETL, data ingestion, or data validation in OptAIC. Use for point-in-time (PIT) correctness, Arrow schemas, quality checks, and Prefect orchestration.
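A generic (non-OptAIC-specific) sketch of a Prefect flow that pins an Arrow schema and fails fast on drift; the column names and sample data are hypothetical:

```python
from datetime import datetime, timezone

import pyarrow as pa
from prefect import flow, task

# Explicit Arrow schema for the ingested table; the as_of_ts column is what
# point-in-time (PIT) correctness checks key on.
PRICES_SCHEMA = pa.schema([
    ("asset_id", pa.string()),
    ("as_of_ts", pa.timestamp("us", tz="UTC")),
    ("price", pa.float64()),
])

@task
def validate(batch: pa.Table) -> pa.Table:
    # Fail fast if the incoming batch does not match the expected schema.
    if not batch.schema.equals(PRICES_SCHEMA):
        raise ValueError(f"schema drift detected: {batch.schema}")
    return batch

@flow
def ingest_prices(batch: pa.Table) -> pa.Table:
    return validate(batch)

if __name__ == "__main__":
    sample = pa.table(
        {
            "asset_id": ["AAPL"],
            "as_of_ts": [datetime(2024, 1, 2, tzinfo=timezone.utc)],
            "price": [190.0],
        },
        schema=PRICES_SCHEMA,
    )
    ingest_prices(sample)
```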
project-architecture
Detailed architecture, data flow, pipeline execution, dependencies, and system design for the Unify data migration project. Use when you need deep understanding of how components interact.
gcp-bq-data-export
Use when exporting BigQuery data to Cloud Storage, extracting tables to CSV, JSON, Avro, or Parquet formats, or using EXPORT DATA statements. Covers the bq extract command, format options, compression, and wildcard exports.
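A minimal sketch of a wildcard Parquet export with the Python client (project, dataset, and bucket names are hypothetical); the same export can be expressed as an EXPORT DATA statement or a bq extract call:

```python
from google.cloud import bigquery

client = bigquery.Client()

# The wildcard in the destination URI lets BigQuery shard large exports into multiple files.
job_config = bigquery.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.PARQUET,
)
extract_job = client.extract_table(
    "my-project.analytics.events",
    "gs://my-export-bucket/events/*.parquet",
    job_config=job_config,
)
extract_job.result()  # wait for the export job to finish
```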