Automation
How AI Document Parsing Works in Convoya
A deep dive into the vision-language models that power Convoya's automatic document extraction.
Alex Rivera
Every logistics operation deals with documents: BOLs, invoices, PODs, rate confirmations, contracts. Most of these arrive as PDFs, images, or even faxes. Extracting structured data from these documents is one of the highest-value automation opportunities in logistics.
The Challenge
Traditional OCR (Optical Character Recognition) can read text, but it doesn't understand context. It can't tell the difference between a pickup date and a delivery date, or between a carrier name and a shipper name.
Template-based approaches work for standardized documents, but break down when you're dealing with thousands of different formats from hundreds of carriers and customers.
Our Approach: Vision-Language Models
Convoya uses vision-language models (VLMs) that can see and understand documents the way a human would. Instead of looking for text in specific positions, the model understands what it's looking at.
How It Works
- Document Intake — The document arrives via email, API, or manual upload. We normalize formats and enhance image quality automatically.
- Visual Analysis — The VLM examines the document structure, layout, and content as a whole, understanding headers, tables, and semantic relationships.
- Entity Extraction — Key fields are identified and extracted with confidence scores for each value.
- Validation — Extracted data is validated against business rules and cross-referenced with existing records.
- Human Review — Low-confidence extractions are flagged for a human to verify rather than processed on a guess.
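The five steps above can be sketched as a simple routing function. Everything here is illustrative: the field names, the stand-in `extract_entities` stub, and the 0.90 threshold are assumptions for the example, not Convoya's actual API or cutoff.

```python
CONFIDENCE_THRESHOLD = 0.90  # assumed cutoff for automatic acceptance

def extract_entities(document: str) -> dict:
    # Stand-in for the VLM extraction step: maps each field name
    # to an (extracted value, confidence score) pair.
    return {
        "pro_number": ("123456789", 0.99),
        "pickup_date": ("2024-03-14", 0.97),
        "consignee": ("Acme Distribution", 0.72),  # low confidence
    }

def route(fields: dict) -> dict:
    """Split extracted fields into auto-accepted and human-review queues."""
    accepted, review = {}, {}
    for name, (value, conf) in fields.items():
        (accepted if conf >= CONFIDENCE_THRESHOLD else review)[name] = value
    return {"accepted": accepted, "needs_review": review}

result = route(extract_entities("scanned_bol.pdf"))
```

Here the low-confidence `consignee` field lands in the review queue while the high-confidence fields pass straight through, which is the core of the "review rather than guess" policy.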
What We Extract
For a typical Bill of Lading, we extract:
- PRO numbers and reference numbers
- Origin and destination addresses (parsed and geocoded)
- Weight and piece count
- Freight class and NMFC codes
- Pickup and delivery dates/times
- Special instructions and accessorials
- Signatures (presence and timestamp)
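The field list above maps naturally onto a structured record. The dataclass below is a hypothetical shape for an extracted BOL; the field names and types are illustrative, not Convoya's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ExtractedBOL:
    """Illustrative record for the BOL fields listed above (assumed schema)."""
    pro_number: Optional[str] = None
    reference_numbers: list[str] = field(default_factory=list)
    origin_address: Optional[str] = None       # parsed and geocoded downstream
    destination_address: Optional[str] = None
    weight_lbs: Optional[float] = None
    piece_count: Optional[int] = None
    freight_class: Optional[str] = None        # e.g. "70"
    nmfc_code: Optional[str] = None
    pickup_date: Optional[str] = None          # ISO 8601 date string
    delivery_date: Optional[str] = None
    special_instructions: Optional[str] = None
    accessorials: list[str] = field(default_factory=list)
    signature_present: bool = False
    signature_timestamp: Optional[str] = None

bol = ExtractedBOL(pro_number="987654321", weight_lbs=1200.0)
```

Every field defaults to empty, reflecting that any value can be missing from a real document and filled in later by validation or human review.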
Handling Edge Cases
Real-world documents are messy:
- Handwritten notes and corrections
- Poor scan quality or faded text
- Multiple documents combined in one file
- Stamped dates overlapping printed text
- Non-standard layouts and custom forms
Our models are trained on millions of logistics documents, including these edge cases. When confidence is low, the document is routed to human review automatically.
Accuracy by Document Type
Current accuracy rates across document types:
| Document Type | Accuracy |
|---|---|
| Rate Confirmations | 98.2% |
| BOLs (typed) | 96.8% |
| BOLs (handwritten) | 91.4% |
| Invoices | 97.5% |
| PODs | 94.1% |
Continuous Learning
Every correction from human review feeds back into the model, continuously improving accuracy for your specific documents and carriers. Over time, the system learns your partners' document formats and improves extraction rates.
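One way to picture that feedback loop is a correction record logged each time a reviewer overrides the model. This is a minimal sketch under assumed names; the actual retraining mechanism and payload shape are not described in this post.

```python
def record_correction(field_name: str, predicted: str,
                      corrected: str, doc_type: str) -> dict:
    """Log a human correction so it can later serve as a training example
    (hypothetical shape; not Convoya's actual feedback API)."""
    return {
        "field": field_name,
        "predicted": predicted,
        "corrected": corrected,
        "document_type": doc_type,
        "is_error": predicted != corrected,  # only true errors retrain the model
    }

example = record_correction("pickup_date", "2024-03-15", "2024-03-14", "BOL")
```

Accumulating these records per carrier and document type is what would let accuracy improve on the specific formats a given operation sees most often.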