You've done 10 sprints.
Are your decisions actually getting better?

harvest analyzes your wheat sprint history to find patterns, score prediction accuracy, and detect stale claims. It turns your sprint archive into a learning system.


What is harvest?

wheat is a research framework where you collect evidence-graded "claims" to make technical decisions. After several sprints, you have a history of predictions and outcomes.

harvest reads that history and tells you which kinds of claims tend to be right, where you are overconfident, and whether using adversarial review (/challenge) actually improves accuracy. It is the feedback loop that turns individual sprints into an organizational learning system.


Quick start — 3 commands
1. Analyze sprints

Point harvest at your sprints directory. It reads all claims and calculates accuracy metrics.

$ npx @grainulation/harvest analyze ./sprints/
Scanning 5 sprints, 127 total claims...
  graphql-migration: 82% accuracy
  auth-scaling:      78% accuracy
  perf-baseline:     61% accuracy
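Under the hood this is simple bookkeeping. A minimal sketch of per-sprint accuracy, assuming each claim records a prediction and an outcome; the `Claim` shape and field names here are illustrative, not harvest's actual schema:

```typescript
// Illustrative claim shape -- not harvest's real schema.
interface Claim {
  sprint: string;
  predicted: boolean; // what the claim asserted would hold
  actual: boolean;    // what the recorded outcome showed
}

// Per-sprint accuracy: matched predictions / total claims.
function accuracyBySprint(claims: Claim[]): Record<string, number> {
  const tally: Record<string, { hits: number; total: number }> = {};
  for (const c of claims) {
    const t = (tally[c.sprint] ??= { hits: 0, total: 0 });
    t.total += 1;
    if (c.predicted === c.actual) t.hits += 1;
  }
  const out: Record<string, number> = {};
  for (const [sprint, t] of Object.entries(tally)) {
    out[sprint] = t.hits / t.total;
  }
  return out;
}
```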
2. Calibrate predictions

Compare predictions against actual outcomes. Score your accuracy over time.

$ npx @grainulation/harvest calibrate ./sprints/
Overconfident:   perf-baseline (predicted 0.82, hit 61%)
Well-calibrated: mobile-rewrite, data-pipeline
Pattern:         /challenge usage = +12% accuracy
3. Generate report

Produce a self-contained HTML retrospective with charts.

$ npx @grainulation/harvest report ./sprints/ -o retro.html
Written: retro.html
Self-contained, dark theme, 6 charts

Share with the team or attach to a wiki.

A realistic harvest workflow

You have finished a sprint and recorded outcomes. Run harvest to measure velocity, find coverage gaps, and score your prediction accuracy.

1. Measure sprint velocity

$ npx @grainulation/harvest velocity claims.json

See how claims accumulated across phases. Identify bottlenecks — did research drag while prototyping was fast?

Sprint velocity: 4.2 claims/day
  define    ████░░░░      3 claims (1 day)
  research  ████████████ 18 claims (4 days)
  prototype ██████░░      6 claims (2 days)
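The headline number is just claims over elapsed days. A sketch, assuming per-phase claim counts and durations are available; the `Phase` shape is hypothetical, not harvest's real format:

```typescript
// Hypothetical per-phase record -- harvest's real claim format may differ.
interface Phase {
  name: string;
  claims: number;
  days: number;
}

// Overall velocity: total claims over total elapsed days.
function sprintVelocity(phases: Phase[]): number {
  const claims = phases.reduce((sum, p) => sum + p.claims, 0);
  const days = phases.reduce((sum, p) => sum + p.days, 0);
  return days === 0 ? 0 : claims / days;
}
```

Comparing per-phase rates (`p.claims / p.days`) against this overall figure is what exposes a slow phase.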
2. Find coverage gaps

$ npx @grainulation/harvest coverage claims.json

Which topics have deep evidence and which are under-researched? Coverage gaps are where blind spots hide.

Topic coverage:
  performance ████████████ 12 claims (3 tiers)
  security    ██████░░░░░░  6 claims (2 tiers)
  cost        ██░░░░░░░░░░  2 claims (1 tier)  [!] gap
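A gap check can be as simple as thresholds on claim count and evidence-tier diversity. A sketch under those assumptions; the `TopicCoverage` shape and the default thresholds are illustrative, not harvest's actual rules:

```typescript
// Illustrative topic summary -- not harvest's real output format.
interface TopicCoverage {
  topic: string;
  claims: number;
  tiers: number; // distinct evidence tiers represented
}

// Flag topics below a minimum claim count or tier diversity.
// Thresholds are hypothetical defaults, tune per project.
function coverageGaps(
  topics: TopicCoverage[],
  minClaims = 4,
  minTiers = 2,
): string[] {
  return topics
    .filter((t) => t.claims < minClaims || t.tiers < minTiers)
    .map((t) => t.topic);
}
```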
3. Score prediction accuracy

$ npx @grainulation/harvest calibrate claims.json --outcome results.json

Compare what you predicted against what actually happened. A calibration score above 0.70 means your research process is working.

Calibration score: 0.73 (above baseline)
Overconfident on:  cost estimates (predicted low, actual 2.1x)
Well-calibrated:   performance, security
Pattern:           /challenge claims had 91% accuracy vs 68% unchallenged
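Overconfidence is the gap between average stated confidence and the observed hit rate, computed per topic. A minimal sketch, assuming each scored claim carries a confidence and a hit/miss outcome; this is not harvest's actual scoring, just the idea:

```typescript
// Illustrative outcome record -- not harvest's real schema.
interface TopicOutcome {
  topic: string;
  predicted: number; // stated confidence, 0..1
  hit: boolean;      // whether the outcome matched the claim
}

// Per-topic gap between mean stated confidence and hit rate.
// Positive gap = overconfident on that topic.
function overconfidence(outcomes: TopicOutcome[]): Record<string, number> {
  const agg: Record<string, { conf: number; hits: number; n: number }> = {};
  for (const o of outcomes) {
    const a = (agg[o.topic] ??= { conf: 0, hits: 0, n: 0 });
    a.conf += o.predicted;
    if (o.hit) a.hits += 1;
    a.n += 1;
  }
  const gaps: Record<string, number> = {};
  for (const [topic, a] of Object.entries(agg)) {
    gaps[topic] = a.conf / a.n - a.hits / a.n;
  }
  return gaps;
}
```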

The outcome: Instead of wondering whether your research process works, you have numbers. Velocity shows throughput, coverage shows gaps, and calibration shows accuracy. Over multiple sprints, you learn which research habits actually improve decisions.


Are your decisions getting better?

The feedback loop that turns research sprints into a learning system.

Prediction Calibration

Score past predictions against actual outcomes. Know your accuracy rate and where you are overconfident. orchard uses these calibration scores to rank conflicting claims when two sprints disagree.

Pattern Detection

Find which research approaches lead to better decisions. Sprints with /challenge have higher accuracy. Accuracy data feeds back into silo so knowledge packs carry confidence scores from real outcomes.
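The /challenge comparison boils down to splitting hit rates by review status. A sketch, assuming each claim records whether it went through adversarial review; the shape is illustrative, not harvest's schema:

```typescript
// Illustrative record -- field names are hypothetical.
interface ReviewedClaim {
  challenged: boolean; // went through /challenge adversarial review
  hit: boolean;        // prediction matched the outcome
}

// Accuracy lift: challenged hit rate minus unchallenged hit rate.
function challengeLift(claims: ReviewedClaim[]): number {
  const rate = (xs: ReviewedClaim[]) =>
    xs.length === 0 ? 0 : xs.filter((c) => c.hit).length / xs.length;
  return (
    rate(claims.filter((c) => c.challenged)) -
    rate(claims.filter((c) => !c.challenged))
  );
}
```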

Claim Decay Detection

Find stale claims that need refreshing. Technology moves fast — old evidence can mislead.
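Staleness can be approximated as an age threshold on when a claim's evidence was last verified. A sketch with a hypothetical 180-day window; harvest's actual decay heuristics may differ:

```typescript
// Flag claims whose evidence is older than a freshness window.
// maxAgeDays is a tunable illustration, not a harvest default.
function staleClaims(
  claims: { id: string; verifiedAt: Date }[],
  now: Date,
  maxAgeDays = 180,
): string[] {
  const maxMs = maxAgeDays * 24 * 60 * 60 * 1000;
  return claims
    .filter((c) => now.getTime() - c.verifiedAt.getTime() > maxMs)
    .map((c) => c.id);
}
```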

HTML Retrospectives

Dark-themed, self-contained HTML reports with charts. Share with the team or attach to a wiki.

Confidence Scoring

Auto-assign confidence levels to new wheat claims based on historical accuracy by topic and claim type. Past calibration data trains future predictions — claims in domains where you have been accurate get higher confidence, while weak areas are flagged for deeper research.
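One reasonable way to seed confidence from history is to blend a topic's observed hit rate with a neutral prior, weighted by sample size, so a topic with two lucky claims doesn't start out with extreme confidence. A sketch of that idea, not harvest's actual formula:

```typescript
// Illustrative history record -- not harvest's real schema.
interface TopicHistory {
  hits: number;  // claims that matched their outcome
  total: number; // claims scored in this topic
}

// Blend observed topic accuracy with a neutral prior, weighted by
// priorWeight pseudo-observations; unseen topics get the neutral value.
function priorConfidence(
  history: Record<string, TopicHistory>,
  topic: string,
  neutral = 0.5,
  priorWeight = 5,
): number {
  const h = history[topic];
  if (!h) return neutral;
  return (h.hits + neutral * priorWeight) / (h.total + priorWeight);
}
```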

Drift Detection

Continuous monitoring that alerts when reality diverges from predictions over time. When outcomes start drifting from what a sprint predicted — a cost estimate creeping up, a performance target slipping — harvest flags it before the gap becomes a crisis.
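A minimal drift check compares the rolling mean of recent observations against the predicted value with a relative tolerance. A sketch; the tolerance and window are illustrative choices, not harvest defaults:

```typescript
// Alert when the mean of recent observations deviates from the
// prediction by more than a relative tolerance (default 20%).
function driftAlert(
  predicted: number,
  recent: number[],
  tolerance = 0.2,
): boolean {
  if (recent.length === 0) return false;
  const mean = recent.reduce((sum, x) => sum + x, 0) / recent.length;
  return Math.abs(mean - predicted) / Math.abs(predicted) > tolerance;
}
```

For a cost estimate of 100, recent actuals of 98-102 stay quiet, while a run of 150+ trips the alert.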


Do I need Node.js?

Yes, Node 18 or later. Your project itself can use any language or framework.

How many sprints do I need?

harvest works with 2 or more completed sprints. Patterns become more reliable around 5-10 sprints.

How does this compare to manual retrospectives?

Manual retros rely on memory and opinion. harvest works from the actual claim data — predictions vs. outcomes, evidence tiers vs. accuracy. It finds patterns humans miss, like which research approaches consistently lead to better decisions.