Wavesteam Technology (滚水科技)
AI Solution Brief · GSAI · 2026 · 04 · 0017 · Delivered / Production

Turn a handwritten order into structured data ready for accounting in 3 seconds

This is one entry from Wavesteam Technology's AI solution library. For handwritten order recognition, we benchmarked general OCR APIs, multimodal large models, and a domain-specific OCR model, then delivered a hybrid production pipeline. On messy handwriting, folded paper, cross-line corrections, and overlapping fields, field-level accuracy improved from 68.4% to 96.1%.

8 min read · v2.1 · updated May 10, 2026
Field-level accuracy · 96.1% (1,200 complex handwritten samples)
Average processing time · 1.8s (end to end · P95 ≤ 3.0s)
Manual review workload · -82% (before vs. after launch)
Production runtime · 11 months (6 order types / 3 customers)
/ 01 The Problem

The real business problem

The customer is a fast-moving consumer goods distributor serving hundreds of retail outlets. Orders arrive as handwritten paper forms and must be entered into ERP for fulfillment and reconciliation. The actual paper conditions are far messier than a clean demo sample.

Industry · FMCG distribution / restaurant supply chain
Volume · 1,200–1,800 orders per day
Current flow · 2 data-entry operators + manual ERP input
Goal · Structure all orders within 30 minutes
P-01 · PAIN POINT

Highly variable handwriting

Sales reps differ in writing style, pen pressure, and slant, and their forms carry cross-line notes, corrections, signatures, and overwritten fields. Traditional OCR often fails to read a full row correctly.

P-02 · PAIN POINT

Ambiguous field meaning

Quantity, unit price, product name, and remarks are often mixed together. Product shorthand must be normalized against business vocabulary and SKU context.

P-03 · PAIN POINT

Manual entry became the bottleneck

The team had to process 1,200–1,800 orders per day within a narrow window. Two operators were working overtime and still produced avoidable entry errors.

P-04 · PAIN POINT

Small errors created financial risk

A wrong quantity, price, or customer name affects fulfillment, reconciliation, and month-end settlement. The impact is not inconvenience; it is real financial exposure.

/ 02 Live Sample

One order through the full pipeline

The sample below shows a de-identified handwritten order from a real workflow. We place the original image, visual recognition overlay, and final structured JSON side by side so stakeholders can see what the AI actually does.

Sample · ORD-20260427-018
Stage · End-to-end pipeline
Latency · 1.74s
Confidence · avg. 0.94
INPUT · Original handwritten order (jpg · 1408×1056)
OUTPUT · Recognition output
Folded paper · 100%

Creases shift field positions, but all 12 rows are correctly aligned.

Cross-line correction · 97%

Crossed-out prices and handwritten replacements are interpreted with an audit trail.

Overlapping fields · 94%

Remarks overlapping the price column are reassigned through structured post-processing.
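
As a rough illustration of what "interpreted with an audit trail" can mean in the structured output, the sketch below keeps the crossed-out value next to its replacement. The `FieldReading` class, the JSON keys, and the sample values are hypothetical; they are not the delivered schema and are not taken from ORD-20260427-018.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class FieldReading:
    """One recognized field plus the trail of how its value was decided."""
    name: str
    value: str
    confidence: float
    struck_out: Optional[str] = None  # crossed-out text, kept for audit
    audit: List[str] = field(default_factory=list)


def apply_correction(reading: FieldReading, replacement: str, confidence: float) -> FieldReading:
    """Return a new reading with the replacement value and a record of what it overwrote."""
    note = f"{reading.name}: '{reading.value}' struck out, replaced by '{replacement}'"
    return FieldReading(
        name=reading.name,
        value=replacement,
        confidence=confidence,
        struck_out=reading.value,
        audit=reading.audit + [note],
    )


# Hypothetical values, for illustration only.
price = FieldReading(name="unit_price", value="12.50", confidence=0.91)
corrected = apply_correction(price, "11.80", confidence=0.97)
print(json.dumps(asdict(corrected), ensure_ascii=False, indent=2))
```

Returning a new object instead of editing in place keeps the original reading available, which is usually what a reviewer wants when a correction is disputed.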

/ 03 How We Think

We did not bet on one model. We combined the right tools.

For production AI, we do not choose a model first. We benchmark practical options against real samples, then design a hybrid pipeline where each model handles the part it is best suited for.

Evaluation set · 1,200 samples / 4 form types
Metric · Field-level F1 + end-to-end acceptance
Cycle · 2 weeks
Output · Selection report v1.3
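
For readers unfamiliar with the metric, field-level F1 can be computed along these lines. This is a minimal sketch that assumes a field counts as correct only on an exact match after trimming whitespace; the matching rules, field names, and toy data here are illustrative assumptions, not the evaluation harness behind the selection report.

```python
from typing import Dict, List, Tuple

Fields = Dict[str, str]  # field name -> recognized or labeled value


def field_level_f1(pairs: List[Tuple[Fields, Fields]]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over (predicted, truth) field dictionaries."""
    tp = fp = fn = 0
    for predicted, truth in pairs:
        for name, value in predicted.items():
            if name in truth and truth[name].strip() == value.strip():
                tp += 1
            else:
                fp += 1  # wrong value, or a field that should not exist
        for name, value in truth.items():
            if name not in predicted or predicted[name].strip() != value.strip():
                fn += 1  # labeled field missed or misread
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data: 3 of 4 predicted fields match; one labeled field is misread and one is missing.
pairs = [
    (
        {"sku": "A-101", "qty": "12", "price": "3.50"},
        {"sku": "A-101", "qty": "12", "price": "3.50", "unit": "box"},
    ),
    ({"qty": "7"}, {"qty": "1"}),
]
precision, recall, f1 = field_level_f1(pairs)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```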
Path A

General cloud OCR APIs

Tencent Cloud / Alibaba Cloud / Baidu AI Cloud
  • Fast to integrate
  • Excellent on printed text and regular tables
  • Usage-based cost works for early testing
  • Weak on casual handwriting and connected strokes
  • Cannot understand customer-specific product shorthand
  • Field structure must still be rebuilt by the business layer
Printed accuracy · 98%+
This scenario · 68.4%
Structuring · Custom build
Our call · Useful as a baseline and fallback channel
Path B

Multimodal LLM reading

GPT-4o / Qwen-VL / Tongyi Qianwen VL
  • Strong contextual understanding for corrections and messy forms
  • Can output structured JSON directly
  • Good semantic generalization across product vocabulary
  • Higher cost and latency per order
  • Field-level numeric accuracy can fluctuate
  • Private deployment and compliance require additional design
This scenario · 88.7%
Latency · 4.2s
Cost / 1k pages · ≈ ¥48
Our call · Best for difficult samples and semantic correction
Path C

Domain-specific OCR model

Wavesteam · GS-OCR-Hand v2
  • Fine-tuned on real customer samples
  • Two-stage detection and recognition is controllable and explainable
  • Lower inference cost and latency
  • Needs ongoing data-loop maintenance
  • New form types still need migration samples
  • Does not fully solve cross-field semantic judgment alone
This scenario · 93.2%
Latency · 0.9s
Cost / 1k pages · ≈ ¥6
Our call · Primary path for more than 90% of daily traffic
Final Decision

Hybrid by design

The delivered system uses a domain OCR primary path, a multimodal semantic correction path, and cloud OCR fallback. Most routine orders finish in under one second. Low-confidence fields are routed to the multimodal model, and poor-quality samples or service failures fall back automatically with manual-review flags.
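
The routing described above can be sketched roughly as follows. The `FieldResult` class, the `route_fields` function, the 0.85 threshold, and the stub services are all assumptions made for illustration; the production router and its cut-offs are not published here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FieldResult:
    name: str
    value: str
    confidence: float
    source: str = "primary"
    needs_review: bool = False


def route_fields(
    fields: List[FieldResult],
    semantic_fix: Callable[[FieldResult], FieldResult],
    fallback: Callable[[FieldResult], FieldResult],
    threshold: float = 0.85,  # assumed cut-off, not the production value
) -> List[FieldResult]:
    """Keep confident fields, escalate the rest, and fall back on service failure."""
    routed = []
    for f in fields:
        if f.confidence >= threshold:
            routed.append(f)  # routine field: keep the primary OCR output
            continue
        try:
            fixed = semantic_fix(f)
            fixed.source = "semantic"
        except Exception:  # any failure of the semantic service triggers the fallback
            fixed = fallback(f)
            fixed.source = "fallback"
            fixed.needs_review = True
        if fixed.confidence < threshold:
            fixed.needs_review = True  # still uncertain: send to manual review
        routed.append(fixed)
    return routed


def fake_semantic(f: FieldResult) -> FieldResult:
    """Stub standing in for the multimodal model; fails on one field to show fallback."""
    if f.name == "remark":
        raise TimeoutError("semantic service unavailable")
    return FieldResult(f.name, f.value.replace("O", "0"), 0.95)


def fake_fallback(f: FieldResult) -> FieldResult:
    """Stub standing in for the cloud OCR fallback."""
    return FieldResult(f.name, f.value, 0.60)


results = route_fields(
    [
        FieldResult("qty", "12", 0.98),
        FieldResult("price", "3.5O", 0.52),
        FieldResult("remark", "??", 0.30),
    ],
    fake_semantic,
    fake_fallback,
)
for r in results:
    print(r.name, r.value, r.source, r.needs_review)
```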

This structure balances accuracy, cost, latency, and controllability. The engineering principle is simple: use software architecture to turn model uncertainty into business certainty.
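
A back-of-envelope view of that balance, using only the per-path figures quoted above (≈ ¥6 and ≈ ¥48 per 1k pages, 0.9s and 4.2s). The 10% escalation rate is an assumption derived from "more than 90% of daily traffic", and the model is deliberately simplified: every page runs the primary path, escalated pages additionally run the semantic path in sequence, and pre-processing, fallback traffic, and manual review are ignored. It is not expected to reproduce the end-to-end 1.8s or ¥0.06 figures.

```python
def blended(primary: float, semantic: float, escalation_rate: float) -> float:
    """Expected per-page figure when every page runs the primary path and a
    fraction `escalation_rate` additionally runs the semantic path."""
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    return primary + escalation_rate * semantic


ESCALATION_RATE = 0.10  # assumption: share of pages the primary path cannot finish alone

cost_per_1k = blended(6.0, 48.0, ESCALATION_RATE)  # yuan per 1,000 pages
latency_s = blended(0.9, 4.2, ESCALATION_RATE)     # seconds per page, model time only

print(f"inference cost ≈ ¥{cost_per_1k:.1f} per 1k pages")
print(f"model latency ≈ {latency_s:.2f}s per page")
```

Under these assumptions the blend stays much closer to the domain model's cost and latency than to the multimodal model's, which is the point of routing only the hard cases.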

/ 04 How It Works

An explainable, degradable, and improvable AI pipeline

We treat AI as a pipeline, not a black box. Every step has a defined responsibility, input, output, and fallback strategy.

Primary inference · GS-OCR-Hand v2
Semantic correction · Qwen-VL Plus
Fallback · Tencent general OCR
Deployment · Customer private cloud / K8s
  1. STEP / 01

    Capture and layout normalization

    Images enter through mobile, scanners, or forms. Distortion, shadow, white balance, and layout are normalized before recognition.

    DocAlign / Deskew / Shadow Removal
  2. STEP / 02

    Field detection

    A layout-aware detector identifies headers, rows, columns, and field roles before recognition.

    DBNet++ / Layout-aware / RoI Routing
  3. STEP / 03

    Character recognition

    GS-OCR-Hand v2 is fine-tuned on real handwritten samples. Low-confidence fields are routed forward for semantic review.

    CRNN + Attention / Domain fine-tune / Confidence routing
  4. STEP / 04

    Multimodal semantic correction

    For corrections, cross-line notes, and context-heavy fields, a multimodal model reads against SKU dictionaries and historical order context.

    VLM / RAG · SKU dict / Self-consistency
  5. STEP / 05

    Structuring and rule validation

    The output is normalized with unit conversion, price-range checks, customer matching, total checks, and audit logs.

    Rule engine / Entity resolve / Audit log
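
The rule checks in the final step can be sketched as below: unit conversion, a price-range check, and a total check, with every finding written to an audit list. The unit table, price band, SKU code, and field names are hypothetical; a real deployment would presumably read them from ERP master data.

```python
from decimal import Decimal
from typing import Dict, List, Tuple

# Hypothetical reference data, for illustration only.
UNITS_PER_PACK = {"piece": 1, "box": 12, "case": 24}
PRICE_RANGE = {"SKU-001": (Decimal("2.00"), Decimal("5.00"))}


def validate_order(rows: List[Dict[str, str]], written_total: str) -> Tuple[List[int], List[str]]:
    """Return (quantities in base units, audit messages) for one order."""
    base_quantities: List[int] = []
    audit: List[str] = []
    computed_total = Decimal("0")
    for i, row in enumerate(rows, start=1):
        qty, price = Decimal(row["qty"]), Decimal(row["unit_price"])
        # Unit conversion: express every quantity in pieces.
        factor = UNITS_PER_PACK.get(row["unit"])
        if factor is None:
            audit.append(f"row {i}: unknown unit '{row['unit']}'")
            base_quantities.append(0)
        else:
            base_quantities.append(int(qty * factor))
        # Price-range check against the expected band for this SKU.
        band = PRICE_RANGE.get(row["sku"])
        if band and not band[0] <= price <= band[1]:
            audit.append(f"row {i}: price {price} outside {band[0]}-{band[1]}")
        computed_total += qty * price
    # Total check: line items must add up to the handwritten total.
    if computed_total != Decimal(written_total):
        audit.append(f"total mismatch: rows sum to {computed_total}, form says {written_total}")
    return base_quantities, audit


rows = [
    {"sku": "SKU-001", "qty": "2", "unit": "box", "unit_price": "3.50"},
    {"sku": "SKU-001", "qty": "1", "unit": "case", "unit_price": "9.00"},
]
quantities, audit = validate_order(rows, written_total="16.00")
print(quantities, audit)
```

`Decimal` is used instead of `float` so that money comparisons such as the total check are exact.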

System architecture · Layered view

v2.1 · 2026.04
L1 · Access

Capture and access layer

Five business channels with idempotency, rate limits, and data masking.

Channel · Mobile app
Channel · Scanner webhook
Channel · Feishu form sync
Gateway · API Gateway · JWT
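
Idempotency at the access layer could look something like the sketch below, which derives a key from the uploaded image bytes so that a retried upload returns the existing order instead of creating a duplicate. The `IntakeGateway` class, the in-memory store, and the order ID format are assumptions for illustration; how the production gateway derives its keys is not described in this brief.

```python
import hashlib
from typing import Dict, Tuple


class IntakeGateway:
    """In-memory stand-in for an idempotent upload endpoint."""

    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}  # idempotency key -> order id

    def submit(self, image_bytes: bytes) -> Tuple[str, bool]:
        """Return (order_id, created). A repeated upload maps to the same order."""
        key = hashlib.sha256(image_bytes).hexdigest()
        if key in self._seen:
            return self._seen[key], False
        order_id = f"ORD-{len(self._seen) + 1:04d}"  # hypothetical ID format
        self._seen[key] = order_id
        return order_id, True


gateway = IntakeGateway()
first = gateway.submit(b"scan-of-order-page")
retry = gateway.submit(b"scan-of-order-page")   # e.g. a webhook retry after a timeout
other = gateway.submit(b"a-different-page")
print(first, retry, other)
```

A content hash only catches byte-identical retries. A page photographed twice produces different bytes, so catching that case would need a different key, such as a client-supplied request ID.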
L2 · Inference

AI inference layer

Three inference paths plus a confidence-aware router.

Primary · GS-OCR-Hand v2
Semantic · Qwen-VL · LLM
Fallback · Tencent OCR
Router · Confidence router
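
The "Self-consistency" tag on the semantic correction step is not spelled out in this brief. One common reading is to query the model several times for the same field and accept a value only when enough readings agree, as in this sketch. The function name, the 0.6 agreement threshold, and the sample readings are assumptions.

```python
from collections import Counter
from typing import List, Optional, Tuple


def self_consistent_value(
    readings: List[str], min_agreement: float = 0.6
) -> Tuple[Optional[str], float]:
    """Majority-vote several independent readings of one field.

    Returns (value, agreement); value is None when agreement is too low,
    which a caller could treat as a signal for manual review.
    """
    if not readings:
        return None, 0.0
    normalized = [r.strip() for r in readings]
    value, votes = Counter(normalized).most_common(1)[0]
    agreement = votes / len(normalized)
    return (value if agreement >= min_agreement else None), agreement


accepted = self_consistent_value(["11.80", "11.80 ", "11.30"])
rejected = self_consistent_value(["1", "7", "4"])
print(accepted, rejected)
```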
L3 · Business

Business orchestration layer

Connects OCR output to ERP, reconciliation, and review workflows.

Engine · Field normalization
Match · Customer / SKU matching
Audit · Total validation
Workflow · Manual review queue
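
Customer / SKU matching has to map handwritten shorthand onto canonical entries. A minimal sketch using Python's standard `difflib` is shown below; the SKU dictionary, the 0.6 cutoff, and the product names are invented for illustration, and the production matcher may well use a different technique.

```python
import difflib
from typing import Dict, Optional

# Hypothetical SKU dictionary: canonical product name -> SKU code.
SKU_DICT: Dict[str, str] = {
    "cola 330ml can": "SKU-1001",
    "cola 500ml bottle": "SKU-1002",
    "orange juice 1l": "SKU-2001",
}


def match_sku(shorthand: str, cutoff: float = 0.6) -> Optional[str]:
    """Return the SKU whose canonical name is closest to the shorthand, or None."""
    candidates = difflib.get_close_matches(
        shorthand.lower().strip(), list(SKU_DICT), n=1, cutoff=cutoff
    )
    return SKU_DICT[candidates[0]] if candidates else None


print(match_sku("Cola 330 can"))
print(match_sku("zzzz"))
```

Returning `None` instead of the nearest weak match lets the caller push unmatched lines to the manual review queue.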
L4 · Data Loop

Data-loop layer

Online corrections flow back into datasets so the model improves monthly.

Storage · Sample store
Label · Online labeling
Train · Monthly fine-tune
Monitor · Drift alerts
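
A drift alert can be as simple as comparing a rolling mean of field confidence with a baseline. The sketch below assumes that approach; the class name, window size, tolerance, and the choice of confidence as the drift signal are all illustrative assumptions, and a production monitor would likely also track reviewer corrections.

```python
from collections import deque
from statistics import mean


class ConfidenceDriftMonitor:
    """Alert when the rolling mean confidence drops well below a baseline."""

    def __init__(self, baseline: float, window: int = 500, tolerance: float = 0.05) -> None:
        self.baseline = baseline
        self.tolerance = tolerance
        self._recent = deque(maxlen=window)

    def observe(self, confidence: float) -> bool:
        """Record one field confidence; return True when an alert should fire."""
        self._recent.append(confidence)
        window_full = len(self._recent) == self._recent.maxlen
        return window_full and mean(self._recent) < self.baseline - self.tolerance


# Tiny window so the behaviour is visible; 0.94 is used only as an example baseline.
monitor = ConfidenceDriftMonitor(baseline=0.94, window=3, tolerance=0.05)
alerts = [monitor.observe(c) for c in (0.95, 0.93, 0.94, 0.70)]
print(alerts)
```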
/ 05 How We Deliver

Ten weeks to launch, then a continuous data loop

We deliver AI systems as accepted, measurable engineering projects. Before launch, the work is milestone-based; after launch, the data loop keeps improving the model.

Timeline · 10-week launch + ongoing iteration
Team · 2 algorithm / 2 engineering / 1 PM
Deliverables · 23 items
Acceptance · Field-level F1 ≥ 95%
  1. Week 0 · Milestone 01

    Scenario diagnosis

    Walk through the real order flow with business and IT stakeholders.

    Process map / Data and compliance checklist / Success metrics
  2. Week 1–2 · Milestone 02

    Data cold start

    Collect 4,600 real forms and build the first training and evaluation sets.

    Labeling guide / 1,200-sample evaluation set / Baseline metrics
  3. Week 3–4 · Milestone 03

    Parallel path evaluation

    Run cloud OCR, multimodal reading, and custom OCR on the same samples.

    Evaluation report / Cost-latency-accuracy quadrant
  4. Week 5–7 · Milestone 04

    Hybrid pipeline build

    Build confidence routing, semantic correction, and fallback, then load-test the pipeline.

    Model weights / Inference service v1.0 / Load-test report
  5. Week 8–9 · Milestone 05

    Gray release

    Run AI and human entry in parallel at one warehouse for reconciliation.

    Daily reconciliation / Review interface v1
  6. Week 10 · Milestone 06

    Full launch

    Roll out to six warehouses and sign off against KPI targets.

    Runbook / Rollback plan
  7. Ongoing · Milestone 07

    Data loop and monthly iteration

    Online errors flow back into the sample store for incremental improvement.

    Monthly samples / Model review notes
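
During the gray release, AI output and human entry run side by side, so the daily reconciliation comes down to a field-by-field comparison. A minimal sketch follows; the data shapes, the `<missing>` marker, and the order IDs are assumptions rather than the delivered reconciliation tool.

```python
from typing import Dict, List, Tuple

Order = Dict[str, str]  # field name -> value


def reconcile(ai: Dict[str, Order], human: Dict[str, Order]) -> List[Tuple[str, str, str, str]]:
    """Return (order_id, field, ai_value, human_value) for every disagreement."""
    diffs = []
    for order_id in sorted(set(ai) | set(human)):
        a, h = ai.get(order_id, {}), human.get(order_id, {})
        for name in sorted(set(a) | set(h)):
            if a.get(name) != h.get(name):
                diffs.append((order_id, name, a.get(name, "<missing>"), h.get(name, "<missing>")))
    return diffs


# Toy data: one price disagreement and one order the AI path never produced.
ai_entries = {"ORD-1": {"qty": "12", "price": "3.50"}, "ORD-2": {"qty": "7"}}
human_entries = {
    "ORD-1": {"qty": "12", "price": "3.80"},
    "ORD-2": {"qty": "7"},
    "ORD-3": {"qty": "1"},
}
diffs = reconcile(ai_entries, human_entries)
for d in diffs:
    print(d)
```

A disagreement does not say which side is wrong. Each row still needs a reviewer to check the original image, and the resolved cases are natural candidates for the sample store.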
/ 06 The Result

Turning uncertain recognition into measurable business value

Instead of vague claims, we use same-sample before-and-after metrics and customer feedback to show whether the system solved the real problem.

Observation window · 11 months after launch
Customer · FMCG distributor in South China
Review sample · 36,400 pages / quarter
Payback period · ≈ 4.6 months
Metric | Before | After | Change
Field-level accuracy | 68.4% | 96.1% | +27.7pp
Average processing time | ≈ 90s manual | 1.8s | -98%
Daily peak throughput | ≈ 1,800 pages | ≈ 12,000 pages | ×6.7
Reconciliation discrepancies | 0.7% | 0.04% | -94%
End-to-end cost per page | ¥0.42 manual | ¥0.06 | -85%
Manual review workload | 2 full-time operators | 0.4 FTE review only | -82%
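
For readers who want to see how the Change figures relate to the Before and After values, the arithmetic can be recomputed as follows. This works only from the rounded numbers published in the comparison. The cost and review-workload rows come out near -85.7% and -80%, slightly different from the published -85% and -82%, presumably because the published inputs are themselves rounded.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


accuracy_gain_pp = 96.1 - 68.4             # percentage points, not percent
time_change = pct_change(90.0, 1.8)        # seconds per order
throughput_multiple = 12_000 / 1_800       # pages per day
discrepancy_change = pct_change(0.7, 0.04)
cost_change = pct_change(0.42, 0.06)       # yuan per page
workload_change = pct_change(2.0, 0.4)     # full-time equivalents

print(f"accuracy: +{accuracy_gain_pp:.1f}pp")
print(f"processing time: {time_change:.0f}%")
print(f"throughput: x{throughput_multiple:.1f}")
print(f"discrepancies: {discrepancy_change:.0f}%")
print(f"cost per page: {cost_change:.1f}%")
print(f"review workload: {workload_change:.0f}%")
```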
“Month-end reconciliation used to be our biggest headache. Since this OCR system went live, orders from six warehouses are basically scanned, structured in seconds, and written into ERP. More importantly, the models and data stay in our private cloud.”

IT Director · FMCG distributor in South China
Sign-Off

Sign-off milestones

  • UAT passed · 2025.06.20
  • Full launch · 2025.06.27
  • First monthly review · 2025.07.31
  • Annual renewal · 2026.05.08
/ 07 Where It Fits

The same pattern applies to a family of document problems

The hybrid OCR + multimodal correction + fallback pattern can be reused for forms, tickets, handwritten logs, and business documents where structure matters.

Restaurant supply chain · Reusable

Handwritten hotel and restaurant order forms

Stock replenishment forms can go directly into inventory systems.

FMCG distribution · Reusable

Paper return slips from retail outlets

Sales teams can scan slips into monthly ledgers.

Construction · Reusable

Site logs and signed delivery notes

Robust handling for outdoor stains, folds, and handwriting.

Healthcare distribution · Reusable

Clinic or pharmacy prescription and usage forms

Sensitive units and dosage fields can be checked against dictionaries.

Logistics · Reusable

Handwritten delivery and return notes

Signatures and remarks can be separated from operational fields.

Financial documents · Reusable

Checks, receipts, and reconciliation forms

Supports local deployment and end-to-end audit trails.

/ 08 About Us

Why Wavesteam Technology

We are a software engineering and AI implementation team. Over the past three years, we have moved more than 20 AI scenarios from promising demo to stable production operation.

Wavesteam AI delivery team
Our Team
Make AI work in production
/ 01

Method

  • Scenario diagnosis → data cold start → path evaluation → hybrid pipeline → gray release → full launch → data loop
  • AI projects are managed as engineering projects with PM, milestones, and acceptance criteria.
  • No endless PoC demos; every system is delivered with measurable outcomes.
/ 02

Stack

  • Vision: DBNet++ / CRNN / TrOCR / GS-OCR-Hand
  • Multimodal: Qwen-VL / GPT-4o / Gemini Vision with flexible routing
  • Engineering: FastAPI / Triton / vLLM / Postgres / Redis / K8s for private deployment
/ 03

Engineering

  • High-throughput inference service tuning
  • Data-loop platform for labeling, evaluation, drift monitoring, and fine-tuning
  • Unified observability across business metrics, model metrics, and cost metrics

Let's Talk

If you have a concrete workflow AI hasn't solved yet, let's figure out the right approach together.

We unpack the workflow with you, judge whether AI is worth using and which approach makes the most sense, then come back within 5 business days with a practical initial plan and estimate.

Business email
contact@boilingwater.cn
Office
10F, South Tower, Kingkey Yujing Times, Longgang District, Shenzhen


Wavesteam Technology

A production OCR and multimodal AI case study for handwritten order recognition.

© 2026 Wavesteam Technology. All rights reserved.