corn kernel grading with computer vision

corn is one of those crops that looks uniform from a distance — until you put it under a camera. at scale, kernel-level defects separate into distinct failure modes: cracked pericarps, mold colonization, insect boring, heat-discolored endosperm, sprout damage, and the less obvious but operationally critical issue of size inconsistency. each of these maps to measurable impacts on germination rate, handling behavior, and ultimately the USDA grade of the batch.

the system described here is not a demo. it is a repeatable pipeline that scores a batch quickly, annotates defects with bounding boxes, and returns a grade that follows the united states standards for corn (7 cfr §810.404).


why this problem, why computer vision

corn is a useful target because defects are visually meaningful and the downstream impact is immediate. but the choice of computer vision is driven by specific structural properties of the problem:

  • surface defects are observable — mold, cracking, insect boring, and heat discoloration all produce pixel-level signals that survive moderate image compression.
  • defects map to actionable categories — each visual class corresponds to a specific usda damage category, not a vague "unhealthy" label.
  • output ties to a grading policy — the detection results feed directly into a deterministic ruleset based on 7 cfr §810.404.

the practical challenge is that agricultural data is messy. kernels overlap, lighting varies across samples, some defects are subtle (early-stage mold vs. healthy), and a single 12mp image can contain 200+ kernels with multiple defect types simultaneously. this pushes the system toward object detection rather than image-level classification — we need per-kernel counts, not a single "good/bad" score.


system architecture: two-layer design

the system operates in two layers:

layer 1 — perception (object detection):

  • localize each kernel or defect region in the image
  • classify each detection into: normal, mold damage, blue-eye mold damage, insect damage, drier damage, sprout damage, heat damage, cracked
  • produce confidence scores and bounding boxes

layer 2 — decision (grading):

  • aggregate detections into batch-level statistics
  • compute damage percentage and heat-damage percentage
  • compare against usda grade thresholds
  • return a grade: u.s. no. 1 through u.s. no. 5

this split matters. the detector does not pretend to be the grader. the grader does not pretend to understand the image. the boundary keeps the system debuggable: bounding boxes show what the model saw, the grade shows what the system decided, and the raw json lets you audit both.
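
a minimal sketch of that boundary, with hypothetical names (Detection, summarize) rather than the actual module layout: the decision layer only ever receives aggregate counts, never pixels.

from collections import Counter
from dataclasses import dataclass

@dataclass
class Detection:
    # one bounding box from the perception layer (pixel coordinates, class, score)
    x: float
    y: float
    width: float
    height: float
    confidence: float
    cls: str  # one of the eight kernel classes

def summarize(detections: list[Detection]) -> dict:
    # everything the decision layer gets to see: per-class counts and a total
    counts = Counter(d.cls for d in detections)
    return {"counts": dict(counts), "total_kernels": sum(counts.values())}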


usda corn grading standards (7 cfr §810.404)

the united states standards for corn define five numerical grades, plus u.s. sample grade for anything that fails them. each numerical grade has hard limits on four parameters:

grade      | min test weight (lb/bu) | heat-damaged kernels (%) | damaged kernels, total (%) | broken corn & foreign material (%)
u.s. no. 1 | ≥ 56.0                  | ≤ 0.1                    | ≤ 3.0                      | ≤ 2.0
u.s. no. 2 | ≥ 54.0                  | ≤ 0.2                    | ≤ 5.0                      | ≤ 3.0
u.s. no. 3 | ≥ 52.0                  | ≤ 0.5                    | ≤ 7.0                      | ≤ 4.0
u.s. no. 4 | ≥ 49.0                  | ≤ 1.0                    | ≤ 10.0                     | ≤ 5.0
u.s. no. 5 | ≥ 46.0                  | ≤ 3.0                    | ≤ 15.0                     | ≤ 7.0

key definitions (7 cfr §810.402)

damaged kernels — kernels and pieces that are badly ground-damaged, badly weather-damaged, diseased, frost-damaged, germ-damaged, heat-damaged, insect-bored, mold-damaged, sprout-damaged, or otherwise materially damaged.

heat-damaged kernels — a subset of damaged kernels, defined specifically as kernels materially discolored and damaged by heat. the key distinction: heat damage must show visible discoloration extending from the germ toward the sides or back of the kernel.

basis of determination (§810.403) — all determinations are made on the grain after removal of broken corn and foreign material (bcfm).
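
a minimal sketch of that step, assuming the pipeline tags broken corn and foreign material under a hypothetical "bcfm" class (the grain-clf classes listed earlier do not include one, so this is illustrative only):

def apply_basis_of_determination(counts: dict) -> dict:
    # §810.403: grade determinations are made on the grain after bcfm removal,
    # so any bcfm detections are dropped before percentages are computed
    return {cls: n for cls, n in counts.items() if cls != "bcfm"}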

sample size methodology

parameter         | portion size
damage assessment | 250 g
moisture          | 400 g
test weight       | 1,000 g

for small samples (< 100 kernels), the system switches to absolute kernel counts rather than percentages — matching field practice for small seed lots.


dataset and model

dataset construction

this is a custom-trained model, not a pretrained base. the dataset was built specifically for this project — internally referred to as grain-clf — trained end-to-end on roboflow.

kernel selection — samples were collected from real lots with deliberate variety in damage type. selecting visually diverse kernels matters: if the dataset skews toward clean kernels, the model never learns to distinguish early-stage mold from heat discoloration.

annotation — bounding boxes were drawn per-kernel against the eight detection classes (normal plus the seven usda damage categories). annotation standards were verified against department of agricultural research guidelines before training began.

image: batch annotation in progress. roboflow annotation interface showing multiple kernel classes labeled across a full sample tray.

augmentation — augmentations were chosen to simulate real-world input variance. the target users submit images from different devices and lighting conditions, so training data needed to reflect that. heavy geometric distortion was excluded to preserve defect texture.

image: roboflow augmentation pipeline. the full transform chain applied during training: images are auto-oriented and stretched to 416×416 (the training input resolution) before any geometric augmentation runs. per-bounding-box flip and rotation keep label positions consistent with the transformed pixels. brightness (±15%) and minimal blur/noise simulate the variance in phone camera quality across different users.
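
the roboflow config itself is point-and-click; an approximately equivalent chain expressed in code, using albumentations purely as an illustration (not the library roboflow actually runs), would look like this:

import albumentations as A

# illustrative stand-in for the preprocessing + augmentation chain above:
# stretch-resize to 416x416, flips and small rotations, mild photometric noise
transform = A.Compose(
    [
        A.Resize(416, 416),
        A.HorizontalFlip(p=0.5),
        A.VerticalFlip(p=0.5),
        A.Rotate(limit=15, p=0.5),  # heavy geometric distortion deliberately excluded
        A.RandomBrightnessContrast(brightness_limit=0.15, contrast_limit=0.0, p=0.5),
        A.GaussianBlur(blur_limit=(3, 3), p=0.2),
        A.GaussNoise(p=0.2),
    ],
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)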

why yolov11

yolov11m was chosen over yolov8m as the detection architecture:

consideration      | yolov8m        | yolov11m        | advantage
mAP@50-95 (coco)   | 50.2%          | 51.5%           | +1.3 pts
parameters         | 25.9m          | 20.1m           | -22%
inference (t4)     | ~5.0 ms        | ~4.4 ms         | ~12% faster
small-object head  | standard panet | c2psa attention | better fine detail

these are base model comparisons (coco) used to justify architecture selection. the deployed model is the grain-clf custom-trained version.

architecture highlights:

  • c3k2 blocks replace c2f — two 3×3 convolutions instead of one large convolution, cutting parameters while preserving receptive field.
  • c2psa module (cross-stage partial spatial attention) after stage 4 — combines self-attention with csp connections for improved global modeling of small, occluded objects.
  • optimized panet neck — better multi-scale feature fusion for kernels appearing at different image scales.
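
the model in this post was trained on roboflow's hosted training, but an equivalent local run is a thin wrapper around the ultralytics api. a sketch, assuming the dataset export lives at a hypothetical grain-clf/data.yaml and leaving most hyperparameters at their defaults:

from ultralytics import YOLO

# fine-tune coco-pretrained yolo11m weights on the custom kernel dataset
model = YOLO("yolo11m.pt")
model.train(
    data="grain-clf/data.yaml",  # hypothetical path to the roboflow export
    imgsz=416,                   # matches the 416x416 preprocessing above
    epochs=100,                  # illustrative, not the actual training budget
    batch=16,
)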

trained model metrics

metric    | value
mAP@50    | 16.1%
precision | 66.5%
recall    | 20.4%

the precision/recall gap reflects an early training run. high precision (66.5%) means detections the model does make are largely correct — it is not hallucinating random damage. low recall (20.4%) means many real defects are still missed, primarily subtle early-stage mold and cracked pericarps. the practical implication: the current model is useful for flagging obvious damage and building a grading baseline, but the dataset needs more coverage of edge-case kernels before it can drive hard grade decisions at u.s. no. 1 thresholds.
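
for reference, the same kind of evaluation can be reproduced locally from a checkpoint with the ultralytics validation api; the paths below are assumptions, not the actual artifacts from this project:

from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")  # hypothetical checkpoint path
metrics = model.val(data="grain-clf/data.yaml")    # hypothetical dataset path

print("mAP@50   :", metrics.box.map50)  # 0.161 for the run described above
print("precision:", metrics.box.mp)     # mean precision across classes
print("recall   :", metrics.box.mr)     # mean recall across classes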


detection postprocessing

raw yolov11 output contains overlapping boxes and duplicate detections. the postprocessing pipeline cleans this up before grading.

intersection over union

def iou(box_a, box_b):
    """
    compute intersection over union for two boxes.
    boxes are (x0, y0, x1, y1) in pixel coordinates.
    """
    x_a = max(box_a[0], box_b[0])
    y_a = max(box_a[1], box_b[1])
    x_b = min(box_a[2], box_b[2])
    y_b = min(box_a[3], box_b[3])
    
    inter = max(0, x_b - x_a) * max(0, y_b - y_a)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    
    return inter / float(area_a + area_b - inter + 1e-6)

non-maximum suppression

def non_max_suppression(predictions, iou_threshold=0.3):
    boxes = []
    for pred in predictions:
        x0 = pred['x'] - pred['width'] / 2
        y0 = pred['y'] - pred['height'] / 2
        x1 = pred['x'] + pred['width'] / 2
        y1 = pred['y'] + pred['height'] / 2
        boxes.append((x0, y0, x1, y1, pred['confidence'], pred['class']))
    
    boxes = sorted(boxes, key=lambda b: b[4], reverse=True)
    keep = []
    while boxes:
        chosen = boxes.pop(0)
        keep.append(chosen)
        boxes = [b for b in boxes if iou(b, chosen) < iou_threshold]
    
    return [{
        'x': (b[0] + b[2]) / 2, 'y': (b[1] + b[3]) / 2,
        'width': b[2] - b[0], 'height': b[3] - b[1],
        'confidence': b[4], 'class': b[5]
    } for b in keep]

the iou_threshold=0.3 was chosen empirically — corn kernels are roughly circular and don't form dense clusters where a higher threshold would be needed.
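
a quick usage example with the roboflow-style prediction dicts the pipeline works with (the values are made up):

preds = [
    # two overlapping boxes on the same kernel plus one separate kernel
    {'x': 120, 'y': 80, 'width': 40, 'height': 38, 'confidence': 0.91, 'class': 'mold damage'},
    {'x': 123, 'y': 82, 'width': 42, 'height': 40, 'confidence': 0.55, 'class': 'mold damage'},
    {'x': 300, 'y': 90, 'width': 36, 'height': 36, 'confidence': 0.88, 'class': 'normal'},
]

kept = non_max_suppression(preds, iou_threshold=0.3)
assert len(kept) == 2  # the lower-confidence duplicate of the first kernel is dropped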


grading logic

the grading layer converts per-kernel detections into a usda grade. the implementation encodes 7 cfr §810.404 directly:

# format: (grade_name, max_damage_pct, max_damage_kernels, max_heat_pct, max_heat_kernels)
USDA_GRADES = [
    ("u.s. no. 1", 3.0,   1, 0.1,  0),
    ("u.s. no. 2", 5.0,   2, 0.2,  0),
    ("u.s. no. 3", 7.0,   3, 0.5,  0),
    ("u.s. no. 4", 10.0,  5, 1.0,  0),
    ("u.s. no. 5", 15.0,  7, 3.0,  1)
]

def classify_grade(damage_pct, damage_kernels, heat_pct, heat_kernels, total_kernels):
    if total_kernels < 100:
        for grade, _, max_dmg_k, _, max_heat_k in USDA_GRADES:
            if damage_kernels <= max_dmg_k and heat_kernels <= max_heat_k:
                return grade
    else:
        for grade, max_dmg_pct, _, max_heat_pct, _ in USDA_GRADES:
            if damage_pct <= max_dmg_pct and heat_pct <= max_heat_pct:
                return grade
    return "sample grade"  # below u.s. no. 5

damage class grouping

detection class      | counts toward        | usda category
mold damage          | damaged kernels      | §810.402(d)
blue-eye mold damage | damaged kernels      | §810.402(d)
insect damage        | damaged kernels      | §810.402(d)
drier damage         | damaged kernels      | §810.402(d)
sprout damage        | damaged kernels      | §810.402(d)
cracked              | damaged kernels      | §810.402(d)
heat damage          | heat-damaged kernels | §810.402(f)
normal               | not damaged          | n/a

note: heat damage is counted twice — once toward the general damage kernel limit, and again toward the stricter heat-damaged kernel limit.
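
putting the grouping and the ruleset together, a short end-to-end sketch (the class-to-category mapping follows the table above; the counts are hypothetical):

DAMAGE_CLASSES = {
    "mold damage", "blue-eye mold damage", "insect damage",
    "drier damage", "sprout damage", "cracked", "heat damage",
}

def grade_from_counts(counts: dict) -> str:
    total = sum(counts.values())
    damage_k = sum(n for cls, n in counts.items() if cls in DAMAGE_CLASSES)
    heat_k = counts.get("heat damage", 0)  # counted toward both limits
    damage_pct = 100.0 * damage_k / max(total, 1)
    heat_pct = 100.0 * heat_k / max(total, 1)
    return classify_grade(damage_pct, damage_k, heat_pct, heat_k, total)

# 250 kernels, 10 damaged (4.0%), no heat damage -> fails no. 1, passes no. 2
print(grade_from_counts({"normal": 240, "mold damage": 6,
                         "insect damage": 3, "cracked": 1}))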

image: annotated kernel sample. yolov11 inference output showing mold damage (red), blue-eye mold (dark red), and insect damage (orange) on individual kernels.


serving infrastructure

the inference stack is a flask service handling image upload, async inference, and annotated result rendering.

asynchronous processing pipeline

progress_store = {}  # in-memory progress tracker

@app.route("/analyze", methods=["POST"])
def analyze():
    file = request.files['image']
    moisture = float(request.form.get("moisture", 0))
    weight = float(request.form.get("weight", 0))
    progress_id = str(uuid.uuid4())
    
    filename = str(uuid.uuid4()) + os.path.splitext(file.filename)[1]
    filepath = os.path.join(UPLOAD_FOLDER, filename)
    file.save(filepath)
    
    thread = threading.Thread(
        target=process_image_async,
        args=(filepath, filename, moisture, weight, progress_id))
    thread.start()
    return jsonify({'progress_id': progress_id})
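
the client is expected to poll for status using the returned progress_id; a minimal sketch of that companion endpoint (the route name and the shape of the stored entry are assumptions):

@app.route("/progress/<progress_id>", methods=["GET"])
def progress(progress_id):
    # the async worker writes its status into progress_store as it runs
    entry = progress_store.get(progress_id)
    if entry is None:
        return jsonify({'error': 'unknown progress id'}), 404
    return jsonify(entry)  # e.g. {'stage': 'inference', 'percent': 60, 'result': None}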

image compression strategy

import io
from PIL import Image

def compress_image(image_path, max_size_mb=5, quality=85, progress_id=None):
    # downscale very large photos to roughly 1920x1920 pixels before re-encoding
    img = Image.open(image_path).convert("RGB")
    target_pixels = 1920 * 1920
    current_pixels = img.size[0] * img.size[1]
    if current_pixels > target_pixels:
        ratio = (target_pixels / current_pixels) ** 0.5
        img = img.resize((int(img.size[0]*ratio), int(img.size[1]*ratio)),
                         Image.Resampling.LANCZOS)
    
    # re-encode at decreasing jpeg quality until the file fits under max_size_mb
    for _ in range(3):
        buffer = io.BytesIO()
        img.save(buffer, format='JPEG', quality=quality, optimize=True)
        size_mb = buffer.tell() / (1024 * 1024)
        if size_mb <= max_size_mb or quality <= 20:
            with open(image_path, 'wb') as f:
                f.write(buffer.getvalue())
            return size_mb
        quality = max(20, quality - 20)
    
    # target size never reached within three attempts: keep the smallest version
    with open(image_path, 'wb') as f:
        f.write(buffer.getvalue())
    return size_mb

the retry logic handles 413 errors from the inference api by recompressing to 2mb before failing gracefully.
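
a sketch of that retry path, assuming a requests-based call to the hosted inference endpoint (the url handling and helper name are placeholders, not the actual client code):

import requests

def run_inference(image_path, api_url, api_key):
    # try the normally compressed image first; if the api rejects it as too
    # large (413), recompress to a 2 mb budget and retry once before giving up
    for max_size_mb in (5, 2):
        compress_image(image_path, max_size_mb=max_size_mb)
        with open(image_path, 'rb') as f:
            resp = requests.post(api_url, params={'api_key': api_key},
                                 files={'file': f})
        if resp.status_code != 413:
            resp.raise_for_status()
            return resp.json()
    raise RuntimeError("image still too large for the inference api after recompression")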


output format and transparency

the result returns both the interpreted grade and the evidence:

{
  "counts": {"normal": 187, "mold damage": 3, "heat damage": 2, "cracked": 1},
  "grade": "u.s. no. 2",
  "total_damage_pct": 3.1,
  "heat_damage_pct": 1.0,
  "total_kernels": 193
}

the annotated image draws bounding boxes with class-specific colors (blue for normal, red for mold, darkred for blue-eye mold). the raw yolov11 json is also returned for debugging — transparency is essential for a system operators need to trust.
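
a sketch of the annotation step with pillow, using the colors described above (the fallback color and helper name are assumptions):

from PIL import Image, ImageDraw

CLASS_COLORS = {
    "normal": "blue",
    "mold damage": "red",
    "blue-eye mold damage": "darkred",
    "insect damage": "orange",
}

def draw_annotations(image_path, detections, out_path):
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    for det in detections:
        # detections use the center-x/y plus width/height layout from the nms step
        x0 = det['x'] - det['width'] / 2
        y0 = det['y'] - det['height'] / 2
        x1 = det['x'] + det['width'] / 2
        y1 = det['y'] + det['height'] / 2
        color = CLASS_COLORS.get(det['class'], "purple")  # fallback for other classes
        draw.rectangle([x0, y0, x1, y1], outline=color, width=3)
        draw.text((x0, max(y0 - 12, 0)), det['class'], fill=color)
    img.save(out_path)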


what the system gets right

the strongest architectural decision is the separation between perception and decision:

componentresponsibility
yolov11 detectorvisual classification of each kernel/defect
nms postprocessingdeduplication, noise reduction
grading rulesetdeterministic usda compliance check
flask serviceimage handling, async processing, retry logic

this is easier to debug than an end-to-end "quality score" model because each layer can be inspected independently.


limitations and next steps

current limitations:

  • assumes kernels are spread on a flat surface with reasonable lighting. piled kernels or severe occlusion will reduce detection accuracy.
  • heat damage detection has lower precision than mold detection — subtle discoloration is harder to distinguish from normal color variation.
  • the usda grade table is hardcoded; other jurisdictions (eu, thailand, etc.) need different thresholds.

roadmap:

  • moisture and test-weight adjustments (moisture > 15% affects grade)
  • expand to other grains (wheat, soybean) with the same pipeline
  • batch-level estimation by aggregating multiple image samples
  • subscription api model for commercial deployment
  • exportable pdf reports with usda-compliant grading certificates

closing note

the hard part of this project is not drawing bounding boxes. the hard part is turning noisy agricultural images into a decision that is fast, explainable, and compliant with a known standard. yolov11 gives us the perception layer; the usda standards give us the decision layer. the value is in the interface between them — and in making sure the operator can always see why a grade was assigned.
