You've done 10 sprints.
Are your decisions actually getting better?
harvest analyzes your wheat sprint history to find patterns, score prediction accuracy, and detect stale claims. It turns your sprint archive into a learning system.
What is harvest?
wheat is a research framework where you collect evidence-graded "claims" to make technical decisions. After several sprints, you have accumulated a history of predictions and outcomes.
harvest reads that history and tells you which kinds of claims tend to be right, where you are overconfident, and whether using adversarial review (/challenge) actually improves accuracy. It is the feedback loop that turns individual sprints into an organizational learning system.
Analyze sprints
Point harvest at your sprints directory. It reads all claims and calculates accuracy metrics.
$ npx @grainulation/harvest analyze ./sprints/
Calibrate predictions
Compare predictions against actual outcomes. Score your accuracy over time.
$ npx @grainulation/harvest calibrate ./sprints/
Generate report
Produce a self-contained HTML retrospective with charts.
$ npx @grainulation/harvest report ./sprints/ -o retro.html
A realistic harvest workflow
You have finished a sprint and recorded outcomes. Run harvest to measure velocity, find coverage gaps, and score your prediction accuracy.
Measure sprint velocity
See how claims accumulated across phases. Identify bottlenecks — did research drag while prototyping was fast?
Find coverage gaps
Which topics have deep evidence and which are under-researched? Coverage gaps are where blind spots hide.
Score prediction accuracy
Compare what you predicted against what actually happened. A calibration score above 0.70 means your research process is working.
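harvest's exact scoring formula isn't documented here, but one common way to turn predictions and outcomes into a 0-1 calibration score is one minus the mean Brier score. The sketch below is illustrative only; the `ScoredClaim` shape and `calibrationScore` name are assumptions, not harvest's actual API.

```typescript
// Illustrative only: one way to compute a 0-1 calibration score,
// where higher is better. Each claim records the confidence you
// stated (0-1) and whether the prediction turned out true.
interface ScoredClaim {
  confidence: number; // stated probability, e.g. 0.8
  correct: boolean;   // actual outcome
}

// 1 minus the mean Brier score: confident, correct predictions score 1.0;
// confident, wrong predictions are penalized hardest.
function calibrationScore(claims: ScoredClaim[]): number {
  if (claims.length === 0) return 0;
  const brier =
    claims.reduce((sum, c) => {
      const outcome = c.correct ? 1 : 0;
      return sum + (c.confidence - outcome) ** 2;
    }, 0) / claims.length;
  return 1 - brier;
}
```

Under this scheme, hedging everything at 0.5 caps your score at 0.75, so clearing a 0.70 bar requires making confident calls and being right about them.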
The outcome: Instead of wondering whether your research process works, you have numbers. Velocity shows throughput, coverage shows gaps, and calibration shows accuracy. Over multiple sprints, you learn which research habits actually improve decisions.
Are your decisions getting better?
The feedback loop that turns research sprints into a learning system.
Prediction Calibration
Score past predictions against actual outcomes. Know your accuracy rate and where you are overconfident. orchard uses these calibration scores to rank conflicting claims when two sprints disagree.
Pattern Detection
Find which research approaches lead to better decisions — for example, whether sprints that used /challenge actually show higher accuracy. Accuracy data feeds back into silo so knowledge packs carry confidence scores from real outcomes.
Claim Decay Detection
Find stale claims that need refreshing. Technology moves fast — old evidence can mislead.
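A minimal sketch of what decay detection could look like: flag claims whose evidence is older than a per-topic shelf life. The `Claim` fields, default threshold, and function name are assumptions for illustration, not harvest's actual schema.

```typescript
// Illustrative sketch: flag claims older than a per-topic shelf life.
interface Claim {
  topic: string;
  recordedAt: Date;
}

const DEFAULT_SHELF_LIFE_DAYS = 180; // assumed default; fast-moving topics can be shorter

function findStaleClaims(
  claims: Claim[],
  now: Date,
  shelfLifeDays: Map<string, number> = new Map()
): Claim[] {
  const msPerDay = 24 * 60 * 60 * 1000;
  return claims.filter((c) => {
    const limit = shelfLifeDays.get(c.topic) ?? DEFAULT_SHELF_LIFE_DAYS;
    return (now.getTime() - c.recordedAt.getTime()) / msPerDay > limit;
  });
}
```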
HTML Retrospectives
Dark-themed, self-contained HTML reports with charts. Share with the team or attach to a wiki.
Confidence Scoring
Auto-assign confidence levels to new wheat claims based on historical accuracy by topic and claim type. Past calibration data trains future predictions — claims in domains where you have been accurate get higher confidence, while weak areas are flagged for deeper research.
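As a sketch of the idea (not harvest's actual formula), a per-topic hit rate can be blended with a neutral prior so that topics with only a few past claims don't get extreme confidence scores — simple Laplace-style smoothing:

```typescript
// Hypothetical sketch: assign confidence to a new claim from historical
// accuracy in its topic. priorWeight acts as pseudo-counts pulling the
// estimate toward 0.5 when there is little history.
function assignConfidence(
  pastCorrect: number, // past claims in this topic that proved right
  pastTotal: number,   // all past claims in this topic
  priorWeight = 4
): number {
  return (pastCorrect + priorWeight * 0.5) / (pastTotal + priorWeight);
}
```

With no history at all this yields 0.5, and a strong track record (say 8 of 10 correct) yields roughly 0.71 rather than a raw 0.8 — weak areas stay visibly uncertain.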
Drift Detection
Continuous monitoring that alerts when reality diverges from predictions over time. When outcomes start drifting from what a sprint predicted — a cost estimate creeping up, a performance target slipping — harvest flags it before the gap becomes a crisis.
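The core check behind drift detection can be sketched as comparing a recent window of observed values against the sprint's prediction and flagging when the relative deviation exceeds a tolerance. The function name, window size, and 20% threshold here are assumptions for illustration:

```typescript
// Illustrative sketch: flag drift when the recent average of an observed
// metric (e.g. monthly cost, p99 latency) diverges from the predicted
// value by more than a relative tolerance.
function isDrifting(
  predicted: number,
  observations: number[], // most recent last
  window = 3,
  tolerance = 0.2 // 20% relative deviation
): boolean {
  const recent = observations.slice(-window);
  if (recent.length === 0 || predicted === 0) return false;
  const avg = recent.reduce((a, b) => a + b, 0) / recent.length;
  return Math.abs(avg - predicted) / Math.abs(predicted) > tolerance;
}
```

Averaging over a window rather than alerting on single data points keeps one noisy measurement from triggering a false alarm, while a sustained creep still trips the flag.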
harvest itself requires Node 18 or later, but the project it analyzes can use any language or framework.
harvest works with 2 or more completed sprints. Patterns become more reliable around 5-10 sprints.
Manual retros rely on memory and opinion. harvest works from the actual claim data — predictions vs. outcomes, evidence tiers vs. accuracy. It finds patterns humans miss, like which research approaches consistently lead to better decisions.
The ecosystem
wheat creates decisions. harvest measures whether they were right.