AI Solution Brief · GSAI · 2026 · 04 · 0017 · Delivered / Production

Turn a handwritten order into structured data ready for accounting in 3 seconds

This is one entry from Wavesteam Technology's AI solution library. For handwritten order recognition, we benchmarked general OCR APIs, multimodal large models, and a domain-specific OCR model, then delivered a hybrid production pipeline. On messy handwriting, folded paper, cross-line corrections, and overlapping fields, field-level accuracy improved from 68.4% to 96.1%.

8 min read · v2.1 · updated May 10, 2026

Field-level accuracy: 96.1% (1,200 complex handwritten samples)
Average processing time: 1.8s (end to end, P95 ≤ 3.0s)
Manual review workload: -82% (before vs. after launch)
Production runtime: 11 months (6 order types / 3 customers)

/ 01 The Problem

The real business problem

The customer is a fast-moving consumer goods distributor serving hundreds of retail outlets. Orders arrive as handwritten paper forms and must be entered into ERP for fulfillment and reconciliation. The actual paper conditions are far messier than a clean demo sample.

P-01 · PAIN POINT

Highly variable handwriting

Sales reps differ in writing style, pen pressure, and slant, and their forms carry cross-line notes, corrections, signatures, and overwritten fields. Traditional OCR often fails on entire rows.

P-02 · PAIN POINT

Ambiguous field meaning

Quantity, unit price, product name, and remarks are often mixed together. Product shorthand must be normalized against business vocabulary and SKU context.

P-03 · PAIN POINT

Manual entry became the bottleneck

The team had to process 1,200–1,800 orders per day within a narrow window. Two operators were working overtime and still produced avoidable entry errors.

P-04 · PAIN POINT

Small errors created financial risk

A wrong quantity, price, or customer name affects fulfillment, reconciliation, and month-end settlement. The impact is not inconvenience; it is real financial exposure.

/ 02 Live Sample

One order through the full pipeline

The sample below shows a desensitized handwritten order from a real workflow. We place the original image, visual recognition overlay, and final structured JSON side by side so stakeholders can see what the AI actually does.

INPUT · Original handwritten order

jpg · 1408×1056

OUTPUT · Recognition output

Folded paper · 100%

Creases shift field positions, but all 12 rows are correctly aligned.

Cross-line correction · 97%

Crossed-out prices and handwritten replacements are interpreted with an audit trail.

Overlapping fields · 94%

Remarks overlapping the price column are reassigned through structured post-processing.
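The brief does not reproduce the structured JSON itself, so the following is only a rough sketch of how one recognized row, including the audit trail for a crossed-out price, might be represented. The class names, field names, and sample values are all hypothetical and are not taken from the delivered system.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class FieldValue:
    """One recognized field with its confidence and any correction history."""
    value: str
    confidence: float
    # Earlier readings that were crossed out on paper, oldest first (hypothetical).
    superseded: list[str] = field(default_factory=list)


@dataclass
class OrderRow:
    """One line of the order after structuring (hypothetical field names)."""
    product: FieldValue
    quantity: FieldValue
    unit_price: FieldValue
    remarks: str = ""


def row_to_json(row: OrderRow) -> str:
    """Serialize a row to JSON, keeping the audit trail next to the final value."""
    return json.dumps(asdict(row), ensure_ascii=False, sort_keys=True)


# Illustrative values only: a price that was crossed out and rewritten by hand.
row = OrderRow(
    product=FieldValue("SKU-EXAMPLE", 0.98),
    quantity=FieldValue("12", 0.99),
    unit_price=FieldValue("35.00", 0.97, superseded=["38.00"]),
    remarks="deliver before noon",
)
print(row_to_json(row))
```

Keeping the superseded reading alongside the accepted one is one way to let a reviewer see what was crossed out without reopening the image.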

/ 03 How We Think

We did not bet on one model. We combined the right tools.

For production AI, we do not choose a model first. We benchmark practical options against real samples, then design a hybrid pipeline where each model handles the part it is best suited for.

Path

General cloud OCR APIs

Tencent Cloud / Alibaba Cloud / Baidu AI Cloud

Strengths:
Fast to integrate
Excellent on printed text and regular tables
Usage-based cost works for early testing
Limitations:
Weak on casual handwriting and connected strokes
Cannot understand customer-specific product shorthand
Field structure must still be rebuilt by the business layer

Printed accuracy: 98%+
This scenario: 68.4%
Structuring: Custom build

Our call · Useful as a baseline and fallback channel

Path

Multimodal LLM reading

GPT-4o / Qwen-VL (Tongyi Qianwen VL)

Strengths:
Strong contextual understanding for corrections and messy forms
Can output structured JSON directly
Good semantic generalization across product vocabulary
Limitations:
Higher cost and latency per order
Field-level numeric accuracy can fluctuate
Private deployment and compliance require additional design

This scenario: 88.7%
Latency: 4.2s
Cost / 1k pages: ≈ ¥48

Our call · Best for difficult samples and semantic correction

Path

Domain-specific OCR model

Wavesteam · GS-OCR-Hand v2

Strengths:
Fine-tuned on real customer samples
Two-stage detection and recognition is controllable and explainable
Lower inference cost and latency
Limitations:
Needs ongoing data-loop maintenance
New form types still need migration samples
Does not fully solve cross-field semantic judgment alone

This scenario: 93.2%
Latency: 0.9s
Cost / 1k pages: ≈ ¥6

Our call · Primary path for more than 90% of daily traffic

Final Decision

Hybrid by design

The delivered system uses a domain OCR primary path, a multimodal semantic correction path, and cloud OCR fallback. Most routine orders finish in under one second. Low-confidence fields are routed to the multimodal model, and poor-quality samples or service failures fall back automatically with manual-review flags.

This structure balances accuracy, cost, latency, and controllability. The engineering principle is simple: use software architecture to turn model uncertainty into business certainty.
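The routing idea described here can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the delivered router: the 0.90 threshold, the `Reading` type, and the `route_field` function are hypothetical, and the three recognizers are passed in as plain callables standing in for the domain OCR, multimodal, and cloud OCR paths.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed threshold; a real system would tune this on an evaluation set.
LOW_CONFIDENCE = 0.90


@dataclass
class Reading:
    value: str
    confidence: float
    source: str
    needs_review: bool = False


Recognizer = Callable[[bytes], Reading]


def route_field(
    crop: bytes,
    primary: Recognizer,
    semantic: Recognizer,
    fallback: Recognizer,
    threshold: float = LOW_CONFIDENCE,
) -> Reading:
    """Try the primary path, escalate low-confidence reads, fall back on failure."""
    try:
        reading = primary(crop)
        if reading.confidence >= threshold:
            return reading
        # Low confidence: ask the multimodal path for a second reading.
        corrected = semantic(crop)
        if corrected.confidence >= threshold:
            return corrected
        # Neither path is confident: keep the better read but flag it.
        best = max(reading, corrected, key=lambda r: r.confidence)
        best.needs_review = True
        return best
    except Exception:
        # Service failure: use the fallback channel and always flag for review.
        reading = fallback(crop)
        reading.needs_review = True
        return reading
```

In this sketch the expensive multimodal call only happens for fields the primary model is unsure about, which is the property that keeps average cost and latency close to the primary path.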

/ 04 How It Works

An explainable, degradable, and improvable AI pipeline

We treat AI as a pipeline, not a black box. Every step has a defined responsibility, input, output, and fallback strategy.

STEP / 01
Capture and layout normalization
Images enter through mobile, scanners, or forms. Distortion, shadow, white balance, and layout are normalized before recognition.
DocAlign / Deskew / Shadow Removal
STEP / 02
Field detection
A layout-aware detector identifies headers, rows, columns, and field roles before recognition.
DBNet++ / Layout-aware / RoI Routing
STEP / 03
Character recognition
GS-OCR-Hand v2 is fine-tuned on real handwritten samples. Low-confidence fields are routed forward for semantic review.
CRNN + Attention / Domain fine-tune / Confidence routing
STEP / 04
Multimodal semantic correction
For corrections, cross-line notes, and context-heavy fields, a multimodal model reads against SKU dictionaries and historical order context.
VLM / RAG · SKU dict / Self-consistency
STEP / 05
Structuring and rule validation
The output is normalized with unit conversion, price-range checks, customer matching, total checks, and audit logs.
Rule engine / Entity resolve / Audit log
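To make the rule-validation step concrete, here is a small sketch of two of the checks it names: a per-SKU price-range check and a total check. The `Line` type, the `validate_order` function, the price-range table, and the one-cent tolerance are assumptions made for illustration; the brief does not describe the actual rule engine at this level of detail.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Line:
    sku: str
    quantity: int
    unit_price: Decimal


def validate_order(
    lines: list[Line],
    stated_total: Decimal,
    price_ranges: dict[str, tuple[Decimal, Decimal]],
    tolerance: Decimal = Decimal("0.01"),
) -> list[str]:
    """Return human-readable issues; an empty list means the order passed."""
    issues = []
    for i, line in enumerate(lines, start=1):
        if line.quantity <= 0:
            issues.append(f"row {i}: quantity must be positive")
        bounds = price_ranges.get(line.sku)
        if bounds is None:
            issues.append(f"row {i}: unknown SKU {line.sku}")
        elif not bounds[0] <= line.unit_price <= bounds[1]:
            issues.append(
                f"row {i}: price {line.unit_price} outside {bounds[0]}-{bounds[1]}"
            )
    # Recompute the total from the rows and compare with the handwritten total.
    computed = sum((l.quantity * l.unit_price for l in lines), Decimal("0"))
    if abs(computed - stated_total) > tolerance:
        issues.append(f"total mismatch: computed {computed}, stated {stated_total}")
    return issues
```

Using `Decimal` rather than floats avoids rounding surprises when amounts are compared, and returning a list of issues rather than raising lets every problem on an order reach the review queue at once.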

System architecture · Layered view

v2.1 · 2026.04

L1 · Access

Capture and access layer

Five business channels with idempotency, rate limits, and desensitization.

Channel: Mobile app
Channel: Scanner webhook
Channel: Feishu form sync
Gateway: API Gateway · JWT
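One common way to get idempotency at an access layer like this is to key each upload by a hash of its content, so a retried or duplicated submission maps back to the first order. The sketch below assumes that approach; the brief does not say how idempotency is actually implemented, and the `IdempotentIntake` class and its in-memory store are hypothetical.

```python
import hashlib


class IdempotentIntake:
    """Accept each distinct upload once, keyed by a hash of the image bytes."""

    def __init__(self) -> None:
        # In-memory map for illustration; production would likely use a shared store.
        self._seen: dict[str, str] = {}

    def submit(self, image_bytes: bytes, order_id: str) -> tuple[str, bool]:
        """Return (order_id, is_new). Re-submitting the same bytes returns the first ID."""
        key = hashlib.sha256(image_bytes).hexdigest()
        if key in self._seen:
            return self._seen[key], False
        self._seen[key] = order_id
        return order_id, True
```

A byte-level hash only catches exact re-uploads; a rescan of the same paper form produces different bytes and would need a business-level key instead.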

L2 · Inference

AI inference layer

Three inference paths plus a confidence-aware router.

Primary: GS-OCR-Hand v2
Semantic: Qwen-VL · LLM
Fallback: Tencent OCR
Router: Confidence router

L3 · Business

Business orchestration layer

Connects OCR output to ERP, reconciliation, and review workflows.

Engine: Field normalization
Match: Customer / SKU matching
Audit: Total validation
Workflow: Manual review queue

L4 · Data Loop

Data-loop layer

Online corrections flow back into datasets so the model improves monthly.

Storage: Sample store
Label: Online labeling
Train: Monthly fine-tune
Monitor: Drift alerts
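A drift alert can be as simple as watching how often reviewers have to correct the model. The sketch below assumes that signal, a fixed-size sliding window, and an alert when the recent correction rate exceeds the baseline by a chosen ratio. The `DriftMonitor` class, window size, and ratio are illustrative assumptions rather than the monitoring actually deployed.

```python
from collections import deque


class DriftMonitor:
    """Alert when the recent manual-correction rate rises well above a baseline."""

    def __init__(self, baseline_rate: float, window: int = 500, ratio: float = 1.5) -> None:
        self.baseline_rate = baseline_rate
        self.ratio = ratio
        # Sliding window of the most recent review outcomes.
        self._recent = deque(maxlen=window)

    def record(self, was_corrected: bool) -> None:
        """Record whether a reviewer had to correct a field."""
        self._recent.append(was_corrected)

    def drifting(self) -> bool:
        """True once the window is full and the rate exceeds baseline * ratio."""
        if len(self._recent) < self._recent.maxlen:
            return False
        rate = sum(self._recent) / len(self._recent)
        return rate > self.baseline_rate * self.ratio
```

Waiting for a full window before alerting avoids false alarms from the first few samples after a deployment.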

/ 05 How We Deliver

Ten weeks to launch, then a continuous data loop

We deliver AI systems as accepted, measurable engineering projects. Before launch, the work is milestone-based; after launch, the data loop keeps improving the model.

Week 0 · Milestone 01
Scenario diagnosis
Walk through the real order flow with business and IT stakeholders.
Process map / Data and compliance checklist / Success metrics
Week 1–2 · Milestone 02
Data cold start
Collect 4,600 real forms and build the first training and evaluation sets.
Labeling guide / 1,200-sample evaluation set / Baseline metrics
Week 3–4 · Milestone 03
Parallel path evaluation
Run cloud OCR, multimodal reading, and custom OCR on the same samples.
Evaluation report / Cost-latency-accuracy quadrant
Week 5–7 · Milestone 04
Hybrid pipeline build
Build confidence routing, semantic correction, fallback, and pressure tests.
Model weights / Inference service v1.0 / Load-test report
Week 8–9 · Milestone 05
Gray release
Run AI and human entry in parallel at one warehouse for reconciliation.
Daily reconciliation / Review interface v1
Week 10 · Milestone 06
Full launch
Roll out to six warehouses and sign off against KPI targets.
Runbook / Rollback plan
Ongoing · Milestone 07
Data loop and monthly iteration
Online errors flow back into the sample store for incremental improvement.
Monthly samples / Model review notes

/ 06 The Result

Turning uncertain recognition into measurable business value

Instead of vague claims, we use same-sample before-and-after metrics and customer feedback to show whether the system solved the real problem.

| Metric | Before | After | Change |
| --- | --- | --- | --- |
| Field-level accuracy | 68.4% | 96.1% | +27.7pp |
| Average processing time | ≈ 90s manual | 1.8s | -98% |
| Daily peak throughput | ≈ 1,800 pages | ≈ 12,000 pages | ×6.7 |
| Reconciliation discrepancies | 0.7% | 0.04% | -94% |
| End-to-end cost per page | ¥0.42 manual | ¥0.06 | -85% |
| Manual review workload | 2 full-time operators | 0.4 FTE review only | -82% |
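The brief does not spell out how field-level accuracy is scored. One plausible reading, assumed here, is an exact match per field after trimming whitespace, averaged over every ground-truth field in the evaluation set. The function names below are hypothetical, and the second helper shows how a percentage-point change such as the one reported for accuracy is computed.

```python
def field_accuracy(predicted: list[dict[str, str]], truth: list[dict[str, str]]) -> float:
    """Share of ground-truth fields whose predicted value matches exactly."""
    total = 0
    correct = 0
    for pred, gold in zip(predicted, truth):
        for name, expected in gold.items():
            total += 1
            # A missing predicted field counts as wrong.
            if pred.get(name, "").strip() == expected.strip():
                correct += 1
    return correct / total if total else 0.0


def percentage_point_change(before: float, after: float) -> float:
    """Absolute change between two rates, expressed in percentage points."""
    return round((after - before) * 100, 1)


truth = [{"qty": "12", "price": "35.00"}, {"qty": "3", "price": "8.50"}]
pred = [{"qty": "12", "price": "38.00"}, {"qty": "3", "price": "8.50"}]
print(field_accuracy(pred, truth))            # 3 of 4 fields match
print(percentage_point_change(0.684, 0.961))  # accuracy before vs. after
```

Scoring per field rather than per order makes a single wrong price visible in the metric without marking the remaining fields on that order as failures.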

“Month-end reconciliation used to be our biggest headache. Since this OCR system went live, orders from six warehouses are basically scanned, structured in seconds, and written into ERP. More importantly, the models and data stay in our private cloud.”
IT Director · FMCG distributor in South China

Sign-Off

Sign-off milestones

UAT passed · 2025.06.20
Full launch · 2025.06.27
First monthly review · 2025.07.31
Annual renewal · 2026.05.08

/ 07 Where It Fits

The same pattern applies to a family of document problems

The hybrid OCR + multimodal correction + fallback pattern can be reused for forms, tickets, handwritten logs, and business documents where structure matters.

Restaurant supply chain · Reusable

Handwritten hotel and restaurant order forms

Stock replenishment forms can go directly into inventory systems.

FMCG distribution · Reusable

Paper return slips from retail outlets

Sales teams can scan slips into monthly ledgers.

Construction · Reusable

Site logs and signed delivery notes

Robust handling for outdoor stains, folds, and handwriting.

Healthcare distribution · Reusable

Clinic or pharmacy prescription and usage forms

Sensitive units and dosage fields can be checked against dictionaries.

Logistics · Reusable

Handwritten delivery and return notes

Signatures and remarks can be separated from operational fields.

Financial documents · Reusable

Checks, receipts, and reconciliation forms

Supports local deployment and end-to-end audit trails.

/ 08 About Us

Why Wavesteam Technology

We are a software engineering and AI implementation team. Over the past three years, we have moved more than 20 AI scenarios from promising demo to stable production operation.

Our Team

Make AI work in production

/ 01

Method

Scenario diagnosis → data cold start → path evaluation → hybrid pipeline → gray release → full launch → data loop
AI projects are managed as engineering projects with PM, milestones, and acceptance criteria.
No endless PoC demos; every system is delivered with measurable outcomes.

/ 02

Stack

Vision: DBNet++ / CRNN / TrOCR / GS-OCR-Hand
Multimodal: Qwen-VL / GPT-4o / Gemini Vision with flexible routing
Engineering: FastAPI / Triton / vLLM / Postgres / Redis / K8s for private deployment

/ 03

Engineering

High-throughput inference service tuning
Data-loop platform for labeling, evaluation, drift monitoring, and fine-tuning
Unified observability across business metrics, model metrics, and cost metrics

If you have a concrete workflow AI hasn't solved yet, let's figure out the right approach together.

We unpack the workflow with you, judge whether AI is worth using and which approach makes the most sense, then come back within 5 business days with a practical initial plan and estimate.

Business email

contact@boilingwater.cn

Office

10F, South Tower, Kingkey Yujing Times, Longgang District, Shenzhen
