healthsim-trialsim

Generate realistic clinical trial synthetic data including study definitions, sites, subjects, visits, adverse events, efficacy assessments, and disposition. Use when user requests: clinical trial data, CDISC/SDTM/ADaM datasets, trial cohorts (Phase I/II/III/IV), FDA submission test data, or specific therapeutic areas like oncology or biologics/CGT.

$ Install

git clone https://github.com/mark64oswald/healthsim-workspace /tmp/healthsim-workspace && cp -r /tmp/healthsim-workspace/skills/trialsim ~/.claude/skills/healthsim-workspace

// tip: Run this command in your terminal to install the skill


name: healthsim-trialsim
description: "Generate realistic clinical trial synthetic data including study definitions, sites, subjects, visits, adverse events, efficacy assessments, and disposition. Use when user requests: clinical trial data, CDISC/SDTM/ADaM datasets, trial cohorts (Phase I/II/III/IV), FDA submission test data, or specific therapeutic areas like oncology or biologics/CGT."

TrialSim

Status: Active Development

TrialSim generates realistic synthetic clinical trial data for testing, training, and development purposes.

For Claude

Use this skill when the user requests clinical trial data, CDISC-compliant datasets, or regulatory submission test data. This is the primary skill for generating realistic synthetic clinical trial data.

When to apply this skill:

  • User mentions clinical trials, studies, or protocols
  • User requests CDISC, SDTM, or ADaM datasets
  • User specifies trial phases (Phase I, II, III, IV)
  • User mentions FDA/EMA submission data or regulatory requirements
  • User asks for adverse events, safety data, or efficacy endpoints
  • User mentions specific therapeutic areas (oncology, cardiovascular, CNS)
  • User requests SDTM domains (DM, AE, VS, LB, CM, EX, DS, MH)

Key capabilities:

  • Generate complete study definitions with protocol parameters
  • Create multi-site, multi-country trial configurations
  • Produce subject-level longitudinal data with realistic patterns
  • Generate safety data (adverse events, labs, vitals) with MedDRA/LOINC coding
  • Create efficacy endpoints for various therapeutic areas
  • Output CDISC-compliant formats (SDTM, ADaM)

For specific trial phases, therapeutic areas, or SDTM domains, load the appropriate skill from the tables below.

Overview

TrialSim provides:

  • Complete study lifecycle data (protocol to closeout)
  • Multi-site, multi-country trial configurations
  • Subject-level longitudinal data with realistic patterns
  • Safety data (adverse events, labs, vitals)
  • Efficacy endpoints (primary, secondary, exploratory)
  • CDISC-compliant output (SDTM, ADaM)

Trigger Phrases

Activate TrialSim when user mentions:

  • "clinical trial" or "clinical study"
  • "Phase I/II/III/IV" or "pivotal trial"
  • "CDISC", "SDTM", "ADaM"
  • "FDA submission data" or "regulatory data"
  • "adverse events" or "safety data"
  • "efficacy endpoints"
  • Trial therapeutic areas (oncology, cardiology, etc.)
  • SDTM domains (DM, AE, VS, LB, CM, EX, DS, MH)

Quick Links

Core Skills

| Topic | Skill | Description |
| --- | --- | --- |
| Domain Knowledge | clinical-trials-domain.md | Core trial concepts, phases, regulatory |
| Recruitment | recruitment-enrollment.md | Screening funnel, enrollment patterns |

Trial Phase Skills

| Phase | Skill | Description |
| --- | --- | --- |
| Phase 1 | phase1-dose-escalation.md | FIH, dose escalation, MTD (3+3, BOIN, CRM) |
| Phase 2 | phase2-proof-of-concept.md | POC, dose-ranging, futility (Simon's, MCP-Mod) |
| Phase 3 | phase3-pivotal.md | Pivotal registration trials, NDA/BLA |

SDTM Domain Skills

| Domain | Skill | Description |
| --- | --- | --- |
| DM | domains/demographics-dm.md | Subject demographics, treatment arms |
| AE | domains/adverse-events-ae.md | Adverse events with MedDRA coding |
| VS | domains/vital-signs-vs.md | Vital sign measurements |
| LB | domains/laboratory-lb.md | Laboratory results with LOINC |
| CM | domains/concomitant-meds-cm.md | Concomitant medications with ATC |
| EX | domains/exposure-ex.md | Study drug exposure, dose modifications |
| DS | domains/disposition-ds.md | Subject disposition, discontinuation |
| MH | domains/medical-history-mh.md | Medical history, comorbidities |
| Domain Index | domains/README.md | All SDTM domains overview |

Therapeutic Areas

| Area | Skill | Key Endpoints |
| --- | --- | --- |
| Oncology | therapeutic-areas/oncology.md | RECIST, ORR, PFS, OS |
| Cardiovascular | therapeutic-areas/cardiovascular.md | MACE, CV outcomes |
| CNS | therapeutic-areas/cns.md | Cognitive scales, imaging |
| CGT | therapeutic-areas/cgt.md | CAR-T, gene therapy |

Real World Evidence

| Topic | Skill | Description |
| --- | --- | --- |
| RWE Overview | rwe/overview.md | RWE concepts, data sources |
| Synthetic Controls | rwe/synthetic-control.md | External control arm generation |

Output Formats

| Format | Skill | Use Case |
| --- | --- | --- |
| SDTM | ../../formats/cdisc-sdtm.md | Regulatory submission |
| ADaM | ../../formats/cdisc-adam.md | Statistical analysis |
| Dimensional | ../../formats/dimensional-analytics.md | BI dashboards, analytics |
| JSON | Default | API integration |
| CSV | ../../formats/csv.md | Spreadsheet analysis |

Data Models & References

| Resource | Location | Description |
| --- | --- | --- |
| Canonical Models | ../../references/data-models.md#trialsim-models | 15 entity schemas (Subject, Study, Site, AE, etc.) |
| Dimensional Schema | ../../formats/dimensional-analytics.md#trialsim-clinical-trial-analytics | Star schema for BI (7 dims, 6 facts) |
| Code Systems | ../../references/code-systems.md | MedDRA, LOINC, ATC |

Core Entities

TrialSim uses 15 canonical entity schemas. See Data Models Reference for complete JSON schemas.

Entity Overview

| Entity | SDTM Domain | Description |
| --- | --- | --- |
| Subject | DM | Trial participant (extends Person) |
| Study | TS | Protocol definition |
| Site | - | Investigational site |
| TreatmentArm | TA | Study arm definition |
| VisitSchedule | TV | Protocol visits |
| ActualVisit | SV | Subject visit occurrence |
| Randomization | DM/SE | Subject randomization |
| AdverseEvent | AE | Safety events with MedDRA |
| Exposure | EX | Study drug dosing |
| ConcomitantMed | CM | Prior/concomitant meds with ATC |
| TrialLab | LB | Lab results with LOINC |
| EfficacyAssessment | RS/TR | Response assessments |
| MedicalHistory | MH | Pre-existing conditions |
| DispositionEvent | DS | Subject disposition |
| ProtocolDeviation | DV | Protocol deviations |

Key Entity Examples

Study:

{
  "study_id": "ABC-123-001",
  "protocol_title": "A Phase 3, Randomized, Double-Blind Study...",
  "phase": "Phase 3",
  "therapeutic_area": "Oncology",
  "indication": "Non-Small Cell Lung Cancer",
  "sponsor": "Example Pharma Inc.",
  "status": "Ongoing"
}

Subject (with cross-product linking):

{
  "subject_id": "0001",
  "usubjid": "ABC-123-001-001-0001",
  "site_id": "001",
  "patient_ref": "MRN-12345",
  "screening_date": "2024-01-15",
  "randomization_date": "2024-01-22",
  "treatment_arm": "TRT",
  "status": "Active"
}
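
In the Subject example, `usubjid` looks like the study ID, site ID, and subject ID joined with hyphens. The sketch below shows that composition under the assumption that this convention holds across studies; the helper name `make_usubjid` is hypothetical and not part of TrialSim.

```python
def make_usubjid(study_id: str, site_id: str, subject_id: str) -> str:
    """Join study, site, and subject IDs into one unique subject identifier.

    Assumption: USUBJID = <study_id>-<site_id>-<subject_id>, as the
    Subject example above suggests.
    """
    parts = [study_id, site_id, subject_id]
    if any(not part for part in parts):
        raise ValueError("study_id, site_id, and subject_id must all be non-empty")
    return "-".join(parts)


# Values taken from the Study and Subject examples above.
usubjid = make_usubjid("ABC-123-001", "001", "0001")
print(usubjid)
```

Keeping the three components as separate fields and deriving the combined key makes it harder for `usubjid`, `site_id`, and `subject_id` to drift apart.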

Integration with Other Products

TrialSim integrates with other HealthSim products for complete clinical trial data:

| From | To | Integration Pattern |
| --- | --- | --- |
| PatientSim | TrialSim | Patient → Subject (add consent, randomization, protocol visits) |
| NetworkSim | TrialSim | Provider → Investigator (add credentials, training, delegation log) |
| PopulationSim | TrialSim | Demographics → Recruitment pool (geographic, demographic eligibility) |

Cross-Product: PatientSim

Trial subjects are patients with additional trial-specific data.

Integration Pattern: Use PatientSim for baseline clinical characteristics. TrialSim adds protocol-specific assessments (RECIST, NYHA class changes), randomization, and SDTM-formatted data.

Cross-Product: PopulationSim Integration

PopulationSim v2.0 provides embedded real-world data for evidence-based trial planning, site selection, and diversity compliance. When geographies are specified, TrialSim uses actual CDC PLACES, SVI, and ADI data to ground feasibility estimates and enrollment projections.

Data-Driven Trial Planning Pattern

Step 1: Look up real population data for potential sites

# For site feasibility in Houston metro (Harris County, FIPS: 48201)
Read from: skills/populationsim/data/county/places_county_2024.csv
→ DIABETES_CrudePrev: 12.1% (for diabetes trial)
→ CHD_CrudePrev: 6.4% (for CV outcomes trial)
→ CANCER_CrudePrev: 6.2% (for oncology trial)
→ TotalPopulation: 4,731,145

Read from: skills/populationsim/data/county/svi_county_2022.csv
→ RPL_THEMES: 0.68 (moderate-high vulnerability)
→ EP_MINRTY: 72.1% (supports diversity requirements)
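
A county lookup of this kind might be sketched as follows. The column names `DIABETES_CrudePrev` and `TotalPopulation` come from the lookup above; the key column name `CountyFIPS` and the in-memory sample row are assumptions made so the example runs without the real file.

```python
import csv
import io

# Stand-in for places_county_2024.csv. The CountyFIPS header is assumed;
# check the real file's header row before relying on it.
SAMPLE_CSV = """CountyFIPS,TotalPopulation,DIABETES_CrudePrev
48201,4731145,12.1
"""


def lookup_county(csv_text: str, fips: str):
    """Return the row for one county FIPS code, or None if it is absent."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # Compare as strings so leading zeros in FIPS codes are preserved.
        if row["CountyFIPS"] == fips:
            return row
    return None


row = lookup_county(SAMPLE_CSV, "48201")
population = int(row["TotalPopulation"])
# The file stores prevalence as a percentage; convert it to a fraction.
diabetes_prevalence = float(row["DIABETES_CrudePrev"]) / 100
print(population, diabetes_prevalence)
```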

Step 2: Apply to site feasibility estimation

{
  "site_feasibility": {
    "county_fips": "48201",
    "county_name": "Harris County, TX",
    "indication": "Type 2 Diabetes",
    "eligible_population": {
      "total_population": 4731145,
      "disease_prevalence": 0.121,
      "prevalent_patients": 572467,
      "age_eligible_18_75": 458974,
      "funnel_to_screenable": 0.05,
      "annual_screenable": 22949
    },
    "diversity_metrics": {
      "minority_percentage": 0.721,
      "meets_fda_diversity_guidance": true
    },
    "data_provenance": {
      "source": "CDC_PLACES_2024",
      "data_year": 2022
    }
  }
}
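
The `eligible_population` block is a multiplicative funnel: each count is the previous one scaled by a rate. A minimal sketch of that arithmetic follows. The function name and the `age_eligible_share` value of 0.80 are illustrative assumptions (the source gives only the resulting count), so the intermediate figures will differ slightly from the JSON above.

```python
def feasibility_funnel(total_population: int,
                       disease_prevalence: float,
                       age_eligible_share: float,
                       funnel_to_screenable: float) -> dict:
    """Narrow a county population down to an annual screenable pool."""
    prevalent = round(total_population * disease_prevalence)
    age_eligible = round(prevalent * age_eligible_share)
    screenable = round(age_eligible * funnel_to_screenable)
    return {
        "prevalent_patients": prevalent,
        "age_eligible_18_75": age_eligible,
        "annual_screenable": screenable,
    }


# Harris County inputs from the example; 0.80 is an assumed age share.
funnel = feasibility_funnel(4_731_145, 0.121, 0.80, 0.05)
print(funnel)

# The final step of the documented funnel can be checked directly:
# 458,974 age-eligible patients at a 5% funnel rate.
documented_screenable = round(458_974 * 0.05)
print(documented_screenable)
```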

Step 3: Generate realistic enrollment projections

  • Site catchment based on real prevalence (not national averages)
  • Diversity enrollment reflecting actual demographics
  • Screening-to-randomization rates adjusted for SVI (access barriers)

Embedded Data Sources for Trial Planning

| Source | File | Use in TrialSim |
| --- | --- | --- |
| CDC PLACES County | populationsim/data/county/places_county_2024.csv | Disease prevalence for feasibility |
| CDC PLACES Tract | populationsim/data/tract/places_tract_2024.csv | Catchment area analysis |
| SVI County | populationsim/data/county/svi_county_2022.csv | Diversity planning, access barriers |
| SVI Tract | populationsim/data/tract/svi_tract_2022.csv | Site-level vulnerability context |
| Geography Crosswalk | populationsim/data/crosswalks/cbsa_definitions.csv | Metro area site clustering |

Trial-Specific Applications

| Application | Data Used | TrialSim Integration |
| --- | --- | --- |
| Site Feasibility | PLACES disease prevalence + population | Eligible patient pool sizing |
| Diversity Planning | SVI EP_MINRTY, demographics | FDA diversity guidance compliance |
| Enrollment Projection | PLACES + SVI access indicators | Screening/randomization rates |
| Site Selection | Multi-county PLACES comparison | Optimal site network design |
| Catchment Analysis | Tract-level PLACES | Drive-time eligible population |

Example: Data-Grounded Phase III Site Selection

Request: "Identify top 5 US counties for a Phase III NASH trial based on patient availability"

Data Lookup Process:

Query places_county_2024.csv for:
  - High OBESITY_CrudePrev (NASH proxy)
  - High DIABETES_CrudePrev (comorbidity)
  - Large TotalPopulation (volume)

Query svi_county_2022.csv for:
  - EP_MINRTY (diversity potential)
  - EP_UNINSUR (access consideration)

Output with Provenance:

{
  "recommended_sites": [
    {
      "rank": 1,
      "county_fips": "48201",
      "name": "Harris County, TX",
      "obesity_prevalence": 0.328,
      "diabetes_prevalence": 0.121,
      "population": 4731145,
      "minority_pct": 0.721,
      "estimated_eligible": 45000,
      "diversity_score": "excellent"
    }
  ],
  "data_provenance": {
    "sources": ["CDC_PLACES_2024", "CDC_SVI_2022"],
    "methodology": "prevalence_weighted_ranking"
  }
}
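
The source names the methodology `prevalence_weighted_ranking` but does not define it. One plausible reading, sketched below as an assumption, scores each county by population × obesity prevalence × diabetes prevalence and sorts in descending order. Only the Harris County row uses figures from this document; the two placeholder counties are invented purely to make the ranking observable.

```python
def rank_counties(counties: list, top_n: int = 5) -> list:
    """Rank counties by an assumed prevalence-weighted patient-volume score."""
    def score(county: dict) -> float:
        return (county["population"]
                * county["obesity_prevalence"]
                * county["diabetes_prevalence"])

    ranked = sorted(counties, key=score, reverse=True)[:top_n]
    return [
        {"rank": position, **county, "score": round(score(county))}
        for position, county in enumerate(ranked, start=1)
    ]


counties = [
    # Placeholder rows (not real counties or real data).
    {"county_fips": "00001", "name": "Placeholder County A",
     "population": 900_000, "obesity_prevalence": 0.35, "diabetes_prevalence": 0.13},
    {"county_fips": "00002", "name": "Placeholder County B",
     "population": 250_000, "obesity_prevalence": 0.40, "diabetes_prevalence": 0.15},
    # Figures from the example output above.
    {"county_fips": "48201", "name": "Harris County, TX",
     "population": 4_731_145, "obesity_prevalence": 0.328, "diabetes_prevalence": 0.121},
]

recommended = rank_counties(counties)
for site in recommended:
    print(site["rank"], site["name"], site["score"])
```

A product of prevalences overstates overlap if the two conditions are correlated, so a score like this is better suited to relative ranking than to absolute patient counts.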

Integration with Trial-Support Skills

| PopulationSim Skill | TrialSim Application | Data Source |
| --- | --- | --- |
| data-lookup.md | Exact prevalence for feasibility | CDC PLACES 2024 |
| county-profile.md | Site catchment demographics | PLACES + SVI |
| svi-analysis.md | Diversity and access analysis | CDC SVI 2022 |
| feasibility-estimation.md | Protocol feasibility funnel | All sources |
| diversity-planning.md | FDA diversity compliance | SVI demographics |

Key Principle: When planning trials, always ground feasibility and diversity estimates in real PopulationSim data. This enables evidence-based site selection and realistic enrollment projections.

Development Status

| Component | Status |
| --- | --- |
| SKILL.md (this file) | ✅ Complete |
| clinical-trials-domain.md | ✅ Complete |
| recruitment-enrollment.md | ✅ Complete |
| phase3-pivotal.md | ✅ Complete |
| domains/ (DM, AE, VS, LB, CM, EX, DS, MH) | ✅ Complete |
| therapeutic-areas/ | ✅ Complete |
| rwe/ | ✅ Complete |
| phase1-dose-escalation.md | ✅ Complete |
| phase2-proof-of-concept.md | ✅ Complete |

Related Skills

Output Formats

TrialSim supports multiple output formats:

| Format | Use Case | Skill Reference |
| --- | --- | --- |
| Canonical JSON | Internal processing, API integration | data-models.md |
| CDISC SDTM | Regulatory submission, FDA/EMA | cdisc-sdtm.md |
| CDISC ADaM | Analysis datasets, statistical programming | cdisc-adam.md |
| Dimensional (Star Schema) | Analytics, BI dashboards, DuckDB/Databricks | dimensional-analytics.md |

Dimensional Analytics

For trial operations analytics and BI dashboards, request dimensional output:

Generate Phase III trial with 100 subjects as star schema for DuckDB

This produces:

  • Dimensions: dim_study, dim_site, dim_subject, dim_treatment_arm, dim_visit_schedule, dim_meddra, dim_lab_test
  • Facts: fact_enrollment, fact_visit, fact_adverse_event, fact_exposure, fact_efficacy, fact_lab_result

See dimensional-analytics.md for full DDL and example queries.
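
To illustrate how a fact table joins to a dimension in this layout, here is a small sketch using Python's built-in `sqlite3` as a stand-in for DuckDB. The table names `dim_site` and `fact_enrollment` come from the list above; every column name and row is an assumption, since the actual DDL lives in dimensional-analytics.md.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical minimal columns; the real schema is defined elsewhere.
conn.executescript("""
CREATE TABLE dim_site (
    site_key INTEGER PRIMARY KEY,
    site_id  TEXT,
    country  TEXT
);
CREATE TABLE fact_enrollment (
    subject_key INTEGER PRIMARY KEY,
    site_key    INTEGER REFERENCES dim_site(site_key),
    randomized  INTEGER  -- 1 if the subject was randomized, else 0
);
INSERT INTO dim_site VALUES (1, '001', 'USA'), (2, '002', 'USA');
INSERT INTO fact_enrollment VALUES (1, 1, 1), (2, 1, 1), (3, 2, 0);
""")

# Typical star-schema query: aggregate the fact, label with the dimension.
rows = conn.execute("""
SELECT s.site_id,
       COUNT(*)          AS screened,
       SUM(f.randomized) AS randomized
FROM fact_enrollment AS f
JOIN dim_site AS s ON s.site_key = f.site_key
GROUP BY s.site_id
ORDER BY s.site_id
""").fetchall()

for site_id, screened, randomized in rows:
    print(site_id, screened, randomized)
```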

Usage Examples

Example 1: Basic Phase 3 Trial

Prompt: "Generate a Phase 3 oncology trial with 200 subjects"

Output:

{
  "study": {
    "study_id": "ONCO-2025-001",
    "protocol_title": "A Phase 3, Randomized, Double-Blind, Placebo-Controlled Study of ABC-123 in Advanced NSCLC",
    "phase": "Phase 3",
    "therapeutic_area": "Oncology",
    "indication": "Non-Small Cell Lung Cancer",
    "design": "Randomized, Double-Blind, Placebo-Controlled",
    "primary_endpoint": "Overall Survival",
    "target_enrollment": 200,
    "arms": [
      { "arm_id": "A", "name": "ABC-123 + SOC", "allocation_ratio": 1 },
      { "arm_id": "B", "name": "Placebo + SOC", "allocation_ratio": 1 }
    ]
  },
  "sites": [
    { "site_id": "001", "name": "Memorial Cancer Center", "country": "USA", "enrolled": 25 },
    { "site_id": "002", "name": "University Oncology", "country": "USA", "enrolled": 22 }
  ],
  "subjects": [
    {
      "subject_id": "001-0001",
      "site_id": "001",
      "arm": "A",
      "status": "Active",
      "randomization_date": "2025-01-15"
    }
  ]
}
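
The `allocation_ratio` fields above imply a 1:1 split between arms A and B. The source does not say which randomization scheme TrialSim uses; the sketch below assumes permuted-block randomization, a common choice, and the function name and block size are illustrative.

```python
import random


def block_randomize(n_subjects: int, arms: list, block_multiplier: int = 2,
                    seed: int = 0) -> list:
    """Assign subjects to arms with permuted blocks.

    arms is a list of (arm_id, allocation_ratio) pairs. Each block holds
    block_multiplier copies of the ratio pattern, shuffled independently.
    """
    rng = random.Random(seed)  # seeded so the schedule is reproducible
    template = [arm_id for arm_id, ratio in arms for _ in range(ratio)]
    template = template * block_multiplier

    assignments = []
    while len(assignments) < n_subjects:
        block = template[:]
        rng.shuffle(block)
        assignments.extend(block)
    # A final partial block may leave the arms slightly unbalanced.
    return assignments[:n_subjects]


assignments = block_randomize(200, [("A", 1), ("B", 1)])
print(assignments.count("A"), assignments.count("B"))
```

With a block size of 4 and 200 subjects, every block is complete, so the arms end up exactly balanced; other enrollment totals can differ by a few subjects.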

Example 2: Adverse Events with MedDRA

Prompt: "Generate adverse events for a 50-subject immunotherapy trial"

Output:

{
  "domain": "AE",
  "adverse_events": [
    {
      "USUBJID": "IO-001-0023",
      "AESEQ": 1,
      "AETERM": "Fatigue",
      "AEDECOD": "Fatigue",
      "AEBODSYS": "General disorders and administration site conditions",
      "AESEV": "MILD",
      "AESER": "N",
      "AEREL": "POSSIBLY RELATED",
      "AESTDTC": "2025-02-10",
      "AEENDTC": "2025-02-18",
      "AEOUT": "RECOVERED/RESOLVED"
    },
    {
      "USUBJID": "IO-001-0007",
      "AESEQ": 1,
      "AETERM": "Immune-mediated colitis",
      "AEDECOD": "Colitis",
      "AEBODSYS": "Gastrointestinal disorders",
      "AESEV": "SEVERE",
      "AESER": "Y",
      "AESHOSP": "Y",
      "AEREL": "RELATED",
      "AEACN": "DRUG INTERRUPTED",
      "AESTDTC": "2025-03-05",
      "AEOUT": "NOT RECOVERED/NOT RESOLVED"
    }
  ]
}
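
Records like these lend themselves to simple date checks. The sketch below derives an inclusive event duration (end minus start plus one day, the usual analysis convention) and treats a missing `AEENDTC` as an ongoing event, as in the colitis record. The function name is hypothetical.

```python
from datetime import date


def ae_duration_days(ae: dict):
    """Return the inclusive duration in days, or None if the event is ongoing."""
    end_text = ae.get("AEENDTC")
    if not end_text:
        return None  # no end date recorded: event not yet resolved
    start = date.fromisoformat(ae["AESTDTC"])
    end = date.fromisoformat(end_text)
    if end < start:
        raise ValueError(f"AEENDTC precedes AESTDTC for {ae.get('USUBJID')}")
    return (end - start).days + 1


fatigue = {"USUBJID": "IO-001-0023", "AESTDTC": "2025-02-10", "AEENDTC": "2025-02-18"}
colitis = {"USUBJID": "IO-001-0007", "AESTDTC": "2025-03-05"}

fatigue_days = ae_duration_days(fatigue)
colitis_days = ae_duration_days(colitis)
print(fatigue_days, colitis_days)
```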

Example 3: Screening and Enrollment

Prompt: "Generate screening data with 30% screen failure rate for 100 subjects"

Output:

{
  "screening_summary": {
    "total_screened": 143,
    "screen_passed": 100,
    "screen_failed": 43,
    "screen_failure_rate": 0.30
  },
  "screen_failures": [
    {
      "screening_id": "SCR-0015",
      "failure_reason": "Inclusion criteria not met",
      "failure_detail": "EGFR mutation negative",
      "screening_date": "2025-01-20"
    },
    {
      "screening_id": "SCR-0028",
      "failure_reason": "Exclusion criteria met",
      "failure_detail": "Prior immunotherapy within 6 months",
      "screening_date": "2025-01-25"
    }
  ]
}
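
The summary figures follow from the target enrollment and the failure rate: to end up with 100 subjects at a 30% screen failure rate, about 100 / (1 − 0.30) subjects must be screened, rounded up. A minimal sketch of that calculation (the function name is illustrative):

```python
import math


def screening_plan(target_enrolled: int, screen_failure_rate: float) -> dict:
    """Work out how many subjects to screen for a given failure rate."""
    if not 0 <= screen_failure_rate < 1:
        raise ValueError("screen_failure_rate must be in [0, 1)")
    total_screened = math.ceil(target_enrolled / (1 - screen_failure_rate))
    screen_failed = total_screened - target_enrolled
    return {
        "total_screened": total_screened,
        "screen_passed": target_enrolled,
        "screen_failed": screen_failed,
        # Realized rate after rounding up to whole subjects.
        "screen_failure_rate": round(screen_failed / total_screened, 2),
    }


plan = screening_plan(100, 0.30)
print(plan)
```

Because subjects are whole numbers, the realized rate (43 of 143) is only approximately 30%.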

Example 4: DM Domain SDTM Output

Prompt: "Generate DM domain for 10 subjects as SDTM"

Output:

{
  "domain": "DM",
  "records": [
    {
      "STUDYID": "ONCO-2025-001",
      "DOMAIN": "DM",
      "USUBJID": "ONCO-2025-001-001-0001",
      "SUBJID": "0001",
      "SITEID": "001",
      "RFSTDTC": "2025-01-22",
      "RFENDTC": null,
      "BRTHDTC": "1958-05-15",
      "AGE": 66,
      "AGEU": "YEARS",
      "SEX": "M",
      "RACE": "WHITE",
      "ETHNIC": "NOT HISPANIC OR LATINO",
      "ARMCD": "TRT",
      "ARM": "ABC-123 + SOC",
      "COUNTRY": "USA"
    }
  ]
}
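
`AGE` in this record is consistent with age in completed years at `RFSTDTC`. A short sketch of that derivation, assuming full ISO 8601 dates with no partial values (the function name is hypothetical):

```python
from datetime import date


def age_at_reference(brthdtc: str, rfstdtc: str) -> int:
    """Age in completed years at the reference start date."""
    birth = date.fromisoformat(brthdtc)
    reference = date.fromisoformat(rfstdtc)
    # Subtract one year if the birthday has not yet occurred in the reference year.
    had_birthday = (reference.month, reference.day) >= (birth.month, birth.day)
    return reference.year - birth.year - (0 if had_birthday else 1)


age = age_at_reference("1958-05-15", "2025-01-22")
print(age)
```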

Example 5: Laboratory Results with LOINC

Prompt: "Generate LB domain with liver function tests for safety monitoring"

Output:

{
  "domain": "LB",
  "records": [
    {
      "STUDYID": "SAFE-001",
      "DOMAIN": "LB",
      "USUBJID": "SAFE-001-001-0042",
      "LBSEQ": 1,
      "LBTESTCD": "ALT",
      "LBTEST": "Alanine Aminotransferase",
      "LBCAT": "CHEMISTRY",
      "LBORRES": "32",
      "LBORRESU": "U/L",
      "LBSTRESN": 32,
      "LBSTRESU": "U/L",
      "LBSTNRLO": 7,
      "LBSTNRHI": 56,
      "LBNRIND": "NORMAL",
      "LBLOINC": "1742-6",
      "LBBLFL": "Y",
      "VISITNUM": 2,
      "VISIT": "BASELINE"
    }
  ]
}
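
`LBNRIND` can be derived by comparing the standardized result with the reference range. The sketch below covers only the three basic outcomes and assumes numeric results with both limits present; the function name is illustrative.

```python
def reference_range_indicator(value: float, low: float, high: float) -> str:
    """Classify a result against its normal range (limits are inclusive)."""
    if low > high:
        raise ValueError("lower limit exceeds upper limit")
    if value < low:
        return "LOW"
    if value > high:
        return "HIGH"
    return "NORMAL"


# ALT record from the example: LBSTRESN=32, LBSTNRLO=7, LBSTNRHI=56.
indicator = reference_range_indicator(32, 7, 56)
print(indicator)
```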

Generative Framework Integration

TrialSim integrates with the Generative Framework for specification-driven generation at scale.

Profile-Driven Generation

Use profile specifications to generate trial subject populations:

"Generate 150 subjects for an oncology Phase 3 trial"

The Profile Executor will:

  1. Sample demographics meeting I/E criteria
  2. Generate baseline disease characteristics
  3. Apply randomization to treatment arms
  4. Create screening and baseline assessments
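
Step 1 can be pictured as rejection sampling: draw candidate subjects, then keep only those who satisfy the inclusion/exclusion criteria. The sketch below is not the Profile Executor itself. The criteria (age 18 to 75, ECOG 0 or 1) and the uniform candidate distributions are assumptions chosen for illustration.

```python
import random


def sample_eligible_subjects(n: int, min_age: int = 18, max_age: int = 75,
                             max_ecog: int = 1, seed: int = 0) -> list:
    """Draw candidates until n of them meet the assumed I/E criteria."""
    rng = random.Random(seed)
    subjects = []
    while len(subjects) < n:
        candidate = {
            "age": rng.randint(10, 90),
            "sex": rng.choice(["M", "F"]),
            "ecog": rng.choice([0, 1, 2, 3]),
        }
        meets_inclusion = min_age <= candidate["age"] <= max_age
        meets_exclusion = candidate["ecog"] <= max_ecog
        if meets_inclusion and meets_exclusion:
            subjects.append(candidate)
    return subjects


cohort = sample_eligible_subjects(150)
print(len(cohort), cohort[0])
```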

Journey-Driven Generation

Attach protocol journey specifications to create visit sequences:

"Add a 6-cycle treatment protocol journey"

The Journey Executor will:

  1. Generate protocol visits at specified windows
  2. Create assessments per visit schedule
  3. Apply visit variance within windows
  4. Handle protocol deviations and early termination
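
Steps 1 and 3 amount to placing each visit on its planned day and then shifting it by a random offset within the allowed window. A minimal sketch for a 6-cycle schedule follows. The 21-day cycle length, the ±3-day window, and the visit naming are assumptions, not values from the Journey Executor.

```python
import random
from datetime import date, timedelta


def generate_visits(start: date, n_cycles: int = 6, cycle_days: int = 21,
                    window_days: int = 3, seed: int = 0) -> list:
    """Build one visit per cycle, jittered within the visit window."""
    rng = random.Random(seed)
    visits = []
    for cycle in range(1, n_cycles + 1):
        planned = start + timedelta(days=(cycle - 1) * cycle_days)
        # Cycle 1 anchors the schedule, so it gets no jitter.
        offset = 0 if cycle == 1 else rng.randint(-window_days, window_days)
        actual = planned + timedelta(days=offset)
        visits.append({
            "visit": f"CYCLE {cycle} DAY 1",
            "planned_date": planned.isoformat(),
            "actual_date": actual.isoformat(),
            "deviation_days": offset,
        })
    return visits


visits = generate_visits(date(2025, 1, 22))
for visit in visits:
    print(visit["visit"], visit["planned_date"], visit["actual_date"])
```

Planned dates here are anchored to the first dose rather than to the previous actual visit, so jitter does not accumulate across cycles; a protocol that schedules from the prior dose would need the opposite choice.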

Cross-Domain Sync

When generating across products, TrialSim entities are automatically linked:

| TrialSim Entity | Links To |
| --- | --- |
| Subject | PatientSim Patient (via SSN) |
| Site | NetworkSim Facility |
| Investigator | NetworkSim Provider |
| Conmed | RxMemberSim Fill (if applicable) |

See: ../generation/executors/cross-domain-sync.md