Data Engineering
525 skills in Data & AI > Data Engineering
streams
Master Node.js streams for memory-efficient processing of large datasets, real-time data handling, and building data pipelines
vsa-pattern-selector
Blazor VSA パターンカタログからの適切なパターン選択支援。新機能追加、 CRUD 操作、クエリ実装、状態遷移、バウンダリー設計などの文脈で、 catalog/index.json の ai_decision_matrix に基づいて最適なパターンを 提案する。Feature Slice、Pipeline Behavior、Domain Pattern、 Query Pattern などから文脈に応じたパターンを選択。
configuring-dapr-pubsub
Configures Dapr pub/sub components for event-driven microservices with Kafka or Redis. Use when wiring agent-to-agent communication, setting up event subscriptions, or integrating Dapr sidecars. Covers component configuration, subscription patterns, publishing events, and Kubernetes deployment. NOT when using direct Kafka clients or non-Dapr messaging patterns.
scraping-data-pipeline
Use this skill when scraping UFC fighter data from UFCStats.com, validating scraped data, loading data into the database, or running the complete scraping pipeline. Handles fighters list, fighter details, events, and fight history. Includes data validation, error recovery, cache invalidation, and progress reporting.
managing-fighter-images
Use this skill when working with UFC fighter images including downloading from multiple sources (Wikimedia, Sherdog, Bing), detecting and replacing placeholder images, handling duplicates, normalizing image sizes, validating image quality, syncing filesystem to database, or running the complete image pipeline. Handles missing images, batch downloads, and multi-source orchestration.
Unnamed Skill
This skill should be used when the user asks about deploying Rails applications, Kamal deployment, Docker containers, production configuration, environment variables, secrets management, CI/CD pipelines, server provisioning, zero-downtime deploys, Kamal Proxy, Thruster, or infrastructure setup. Also use when discussing production optimization, deployment strategies, or hosting options. Examples:
incremental-fetch
Build resilient data ingestion pipelines from APIs. Use when creating scripts that fetch paginated data from external APIs (Twitter, exchanges, any REST API) and need to track progress, avoid duplicates, handle rate limits, and support both incremental updates and historical backfills. Triggers: 'ingest data from API', 'pull tweets', 'fetch historical data', 'sync from X', 'build a data pipeline', 'fetch without re-downloading', 'resume the download', 'backfill older data'. NOT for: simple one-shot API calls, websocket/streaming connections, file downloads, or APIs without pagination.
data-engineer
Data engineering agent for ETL pipelines, data warehousing, and analytics
spec-pipeline
Explains the required sequence and success signals for each stage.
n8n-integration-patterns
Decide when to use n8n workflows versus Next.js server actions for backend logic. Use when implementing complex multi-step workflows, AI agent pipelines, or external service integrations. Provides patterns for both runtime webhook integration and development-time architectural decisions.
no-runtime-code
Guardrails to keep the pipeline pure: specs + SEA + generators only.
cocoindex
Comprehensive toolkit for developing with the CocoIndex library. Use when users need to create data transformation pipelines (flows), write custom functions, or operate flows via CLI or API. Covers building ETL workflows for AI data processing, including embedding documents into vector databases, building knowledge graphs, creating search indexes, or processing data streams with incremental updates.
elite-mvp-master
Top 1% development standard for OC Pipeline and federal-grade construction software. Use when Bill asks to build features, fix bugs, write code, create components, or do any software development work. Activates for requests involving Supabase, Vercel, React, Node.js, database work, API development, or any coding tasks for OC Pipeline or related projects.
jn
Use JN for data transformation and ETL. Read data with 'jn cat', filter with 'jn filter', write with 'jn put'. Convert between CSV/JSON/Excel/YAML formats. Stream data through Unix pipes. Integrate with VisiData for visual exploration. Use when working with data files, format conversion, filtering data, or ETL pipelines.
sbt-benchmark
Use PROACTIVELY for JMH benchmarks in this Spark protobuf project. Handles sbt commands, always cleans before benchmarking, saves output to /tmp logs, and uses log-reader agent to parse results.
ci-pipeline
GitHub Actions CI/CD pipelines with caching, matrix builds, and deployment strategies. Focuses on build speed, reliability, and security. Use when creating or optimizing CI/CD workflows, debugging pipeline failures, or implementing deployment automation.
databricks-python-sdk
Databricks development guidance including Python SDK, Databricks Connect, CLI, and REST API. Use when working with databricks-sdk, databricks-connect, or Databricks APIs.
backend-development
Build robust backend systems with modern technologies (Node.js, Python, Go, Rust), frameworks (NestJS, FastAPI, Django), databases (PostgreSQL, MongoDB, Redis), APIs (REST, GraphQL, gRPC), authentication (OAuth 2.1, JWT), testing strategies, security best practices (OWASP Top 10), performance optimization, scalability patterns (microservices, caching, sharding), DevOps practices (Docker, Kubernetes, CI/CD), and monitoring. Use when designing APIs, implementing authentication, optimizing database queries, setting up CI/CD pipelines, handling security vulnerabilities, building microservices, or developing production-ready backend systems. | Sử dụng khi xây dựng API, server, backend, máy chủ, xử lý dữ liệu, endpoint, microservices.
devops
Deploy and manage cloud infrastructure on Cloudflare (Workers, R2, D1, KV, Pages, Durable Objects, Browser Rendering), Docker containers, and Google Cloud Platform (Compute Engine, GKE, Cloud Run, App Engine, Cloud Storage). Use when deploying serverless functions to the edge, configuring edge computing solutions, managing Docker containers and images, setting up CI/CD pipelines, optimizing cloud infrastructure costs, implementing global caching strategies, working with cloud databases, or building cloud-native applications. | Sử dụng khi: triển khai, Docker, Kubernetes, CI/CD, container, cấu hình server.
stream-chain
Stream-JSON chaining for multi-agent pipelines, data transformation, and sequential workflows