You've done 10 sprints.
Are your decisions actually getting better?

harvest analyzes your wheat sprint history to find patterns, score prediction accuracy, and detect stale claims. It turns your sprint archive into a learning system.


What is harvest?

wheat is a research framework where you collect evidence-graded "claims" to make technical decisions. After several sprints, you have a history of predictions and outcomes.

harvest reads that history and tells you which kinds of claims tend to be right, where you are overconfident, and whether using adversarial review (/challenge) actually improves accuracy. It is the feedback loop that turns individual sprints into an organizational learning system.


Quick start — 3 commands
1. Analyze sprints

Point harvest at your sprints directory. It reads all claims and calculates accuracy metrics.

$ npx @grainulation/harvest analyze ./sprints/
Scanning 5 sprints, 127 total claims...
  graphql-migration: 82% accuracy
  auth-scaling:      78% accuracy
  perf-baseline:     61% accuracy
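Under the hood this is simple bookkeeping. A minimal sketch of per-sprint accuracy, assuming each claim records a prediction and an outcome; the `Claim` shape and field names here are illustrative, not harvest's actual schema:

```typescript
// Illustrative claim shape -- not harvest's real schema.
interface Claim {
  sprint: string;
  predicted: boolean; // what the claim asserted would hold
  actual: boolean;    // what the recorded outcome showed
}

// Per-sprint accuracy: matched predictions / total claims.
function accuracyBySprint(claims: Claim[]): Record<string, number> {
  const tally: Record<string, { hits: number; total: number }> = {};
  for (const c of claims) {
    const t = (tally[c.sprint] ??= { hits: 0, total: 0 });
    t.total += 1;
    if (c.predicted === c.actual) t.hits += 1;
  }
  const out: Record<string, number> = {};
  for (const [sprint, t] of Object.entries(tally)) {
    out[sprint] = t.hits / t.total;
  }
  return out;
}
```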
2. Calibrate predictions

Compare predictions against actual outcomes. Score your accuracy over time.

$ npx @grainulation/harvest calibrate ./sprints/
Overconfident:   perf-baseline (predicted 0.82, hit 61%)
Well-calibrated: mobile-rewrite, data-pipeline
Pattern:         /challenge usage = +12% accuracy
3. Generate report

Produce a self-contained HTML retrospective with charts.

$ npx @grainulation/harvest report ./sprints/ -o retro.html
Written: retro.html
Self-contained, dark theme, 6 charts

Share with the team or attach to a wiki.

A realistic harvest workflow

You have finished a sprint and recorded outcomes. Run harvest to measure velocity, find coverage gaps, and score your prediction accuracy.

1. Measure sprint velocity

$ npx @grainulation/harvest velocity claims.json

See how claims accumulated across phases. Identify bottlenecks — did research drag while prototyping was fast?

Sprint velocity: 4.2 claims/day
  define    ████░░░░      3 claims (1 day)
  research  ████████████ 18 claims (4 days)
  prototype ██████░░      6 claims (2 days)
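The headline number is just claims over elapsed days. A sketch, assuming per-phase claim counts and durations are available; the `Phase` shape is hypothetical, not harvest's real format:

```typescript
// Hypothetical per-phase record -- harvest's real claim format may differ.
interface Phase {
  name: string;
  claims: number;
  days: number;
}

// Overall velocity: total claims over total elapsed days.
function sprintVelocity(phases: Phase[]): number {
  const claims = phases.reduce((sum, p) => sum + p.claims, 0);
  const days = phases.reduce((sum, p) => sum + p.days, 0);
  return days === 0 ? 0 : claims / days;
}
```

Comparing per-phase rates (`p.claims / p.days`) against this overall figure is what exposes a slow phase.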
2. Find coverage gaps

$ npx @grainulation/harvest coverage claims.json

Which topics have deep evidence and which are under-researched? Coverage gaps are where blind spots hide.

Topic coverage:
  performance ████████████ 12 claims (3 tiers)
  security    ██████░░░░░░  6 claims (2 tiers)
  cost        ██░░░░░░░░░░  2 claims (1 tier)  [!] gap
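A gap check can be as simple as thresholds on claim count and evidence-tier diversity. A sketch under those assumptions; the `TopicCoverage` shape and the default thresholds are illustrative, not harvest's actual rules:

```typescript
// Illustrative topic summary -- not harvest's real output format.
interface TopicCoverage {
  topic: string;
  claims: number;
  tiers: number; // distinct evidence tiers represented
}

// Flag topics below a minimum claim count or tier diversity.
// Thresholds are hypothetical defaults, tune per project.
function coverageGaps(
  topics: TopicCoverage[],
  minClaims = 4,
  minTiers = 2,
): string[] {
  return topics
    .filter((t) => t.claims < minClaims || t.tiers < minTiers)
    .map((t) => t.topic);
}
```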
3. Score prediction accuracy

$ npx @grainulation/harvest calibrate claims.json --outcome results.json

Compare what you predicted against what actually happened. A calibration score above 0.70 means your research process is working.

Calibration score: 0.73 (above baseline)
Overconfident on:  cost estimates (predicted low, actual 2.1x)
Well-calibrated:   performance, security
Pattern:           /challenge claims had 91% accuracy vs 68% unchallenged
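Overconfidence is the gap between average stated confidence and the observed hit rate, computed per topic. A minimal sketch, assuming each scored claim carries a confidence and a hit/miss outcome; this is not harvest's actual scoring, just the idea:

```typescript
// Illustrative outcome record -- not harvest's real schema.
interface TopicOutcome {
  topic: string;
  predicted: number; // stated confidence, 0..1
  hit: boolean;      // whether the outcome matched the claim
}

// Per-topic gap between mean stated confidence and hit rate.
// Positive gap = overconfident on that topic.
function overconfidence(outcomes: TopicOutcome[]): Record<string, number> {
  const agg: Record<string, { conf: number; hits: number; n: number }> = {};
  for (const o of outcomes) {
    const a = (agg[o.topic] ??= { conf: 0, hits: 0, n: 0 });
    a.conf += o.predicted;
    if (o.hit) a.hits += 1;
    a.n += 1;
  }
  const gaps: Record<string, number> = {};
  for (const [topic, a] of Object.entries(agg)) {
    gaps[topic] = a.conf / a.n - a.hits / a.n;
  }
  return gaps;
}
```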

The outcome: Instead of wondering whether your research process works, you have numbers. Velocity shows throughput, coverage shows gaps, and calibration shows accuracy. Over multiple sprints, you learn which research habits actually improve decisions.


Are your decisions getting better?

The feedback loop that turns research sprints into a learning system.

Prediction Calibration

Score past predictions against actual outcomes. Know your accuracy rate and where you are overconfident. orchard uses these calibration scores to rank conflicting claims when two sprints disagree.

Pattern Detection

Find which research approaches lead to better decisions. Sprints with /challenge have higher accuracy. Accuracy data feeds back into silo so knowledge packs carry confidence scores from real outcomes.
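The /challenge comparison boils down to splitting hit rates by review status. A sketch, assuming each claim records whether it went through adversarial review; the shape is illustrative, not harvest's schema:

```typescript
// Illustrative record -- field names are hypothetical.
interface ReviewedClaim {
  challenged: boolean; // went through /challenge adversarial review
  hit: boolean;        // prediction matched the outcome
}

// Accuracy lift: challenged hit rate minus unchallenged hit rate.
function challengeLift(claims: ReviewedClaim[]): number {
  const rate = (xs: ReviewedClaim[]) =>
    xs.length === 0 ? 0 : xs.filter((c) => c.hit).length / xs.length;
  return (
    rate(claims.filter((c) => c.challenged)) -
    rate(claims.filter((c) => !c.challenged))
  );
}
```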

Claim Decay Detection

Find stale claims that need refreshing. Technology moves fast — old evidence can mislead.
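Staleness can be approximated as an age threshold on when a claim's evidence was last verified. A sketch with a hypothetical 180-day window; harvest's actual decay heuristics may differ:

```typescript
// Flag claims whose evidence is older than a freshness window.
// maxAgeDays is a tunable illustration, not a harvest default.
function staleClaims(
  claims: { id: string; verifiedAt: Date }[],
  now: Date,
  maxAgeDays = 180,
): string[] {
  const maxMs = maxAgeDays * 24 * 60 * 60 * 1000;
  return claims
    .filter((c) => now.getTime() - c.verifiedAt.getTime() > maxMs)
    .map((c) => c.id);
}
```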

HTML Retrospectives

Dark-themed, self-contained HTML reports with charts. Share with the team or attach to a wiki.

Confidence Scoring

Auto-assign confidence levels to new wheat claims based on historical accuracy by topic and claim type. Past calibration data trains future predictions — claims in domains where you have been accurate get higher confidence, while weak areas are flagged for deeper research.
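One reasonable way to seed confidence from history is to blend a topic's observed hit rate with a neutral prior, weighted by sample size, so a topic with two lucky claims doesn't start out with extreme confidence. A sketch of that idea, not harvest's actual formula:

```typescript
// Illustrative history record -- not harvest's real schema.
interface TopicHistory {
  hits: number;  // claims that matched their outcome
  total: number; // claims scored in this topic
}

// Blend observed topic accuracy with a neutral prior, weighted by
// priorWeight pseudo-observations; unseen topics get the neutral value.
function priorConfidence(
  history: Record<string, TopicHistory>,
  topic: string,
  neutral = 0.5,
  priorWeight = 5,
): number {
  const h = history[topic];
  if (!h) return neutral;
  return (h.hits + neutral * priorWeight) / (h.total + priorWeight);
}
```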

Drift Detection

Continuous monitoring that alerts when reality diverges from predictions over time. When outcomes start drifting from what a sprint predicted — a cost estimate creeping up, a performance target slipping — harvest flags it before the gap becomes a crisis.
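A minimal drift check compares the rolling mean of recent observations against the predicted value with a relative tolerance. A sketch; the tolerance and window are illustrative choices, not harvest defaults:

```typescript
// Alert when the mean of recent observations deviates from the
// prediction by more than a relative tolerance (default 20%).
function driftAlert(
  predicted: number,
  recent: number[],
  tolerance = 0.2,
): boolean {
  if (recent.length === 0) return false;
  const mean = recent.reduce((sum, x) => sum + x, 0) / recent.length;
  return Math.abs(mean - predicted) / Math.abs(predicted) > tolerance;
}
```

For a cost estimate of 100, recent actuals of 98-102 stay quiet, while a run of 150+ trips the alert.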


Do I need Node.js?

Yes, Node 18 or later. Your project itself can use any language or framework.

How many sprints do I need?

harvest works with 2 or more completed sprints. Patterns become more reliable around 5-10 sprints.

How does this compare to manual retrospectives?

Manual retros rely on memory and opinion. harvest works from the actual claim data — predictions vs. outcomes, evidence tiers vs. accuracy. It finds patterns humans miss, like which research approaches consistently lead to better decisions.