Repeatability
Medium
The target fields are the same for every invoice, which favors automation. However, the 450 invoices come from different vendors with different layouts, so the agent must generalize across highly variable document structures, and scanned images add OCR noise on top of that variability.
Ambiguity Tolerance
Medium
The output format (CSV with defined fields) is crisp, and the fields themselves are well-defined. The ambiguity lives in extraction: what counts as 'total amount' when a document shows subtotal, GST, and grand total separately, or when line items span multiple pages.
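One way to remove this ambiguity is to write the rule down before extraction starts. The sketch below is illustrative only: the label list, its ordering, and the function name pick_total are assumptions, not rules taken from the invoices themselves, and the input is assumed to be a mapping of printed labels to amounts that some earlier extraction step has already produced.

```python
from decimal import Decimal
from typing import Dict, Optional

# Hypothetical priority order: earlier labels win when several are present.
TOTAL_LABEL_PRIORITY = ["grand total", "total due", "amount due", "total"]


def pick_total(labelled_amounts: Dict[str, Decimal]) -> Optional[Decimal]:
    """Return the amount whose label ranks highest in TOTAL_LABEL_PRIORITY."""
    # Normalise labels so "Grand Total " and "grand total" compare equal.
    normalised = {
        label.strip().lower(): amount
        for label, amount in labelled_amounts.items()
    }
    for label in TOTAL_LABEL_PRIORITY:
        if label in normalised:
            return normalised[label]
    # No recognised total label: return None so the invoice can be reviewed.
    return None
```

Because labels are matched exactly after normalisation, a "Subtotal" line is never mistaken for the total; an invoice with no recognised label returns None rather than a guess.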
Data & Tool Availability
High
The PDFs are the primary input and are available. Mature tools exist (AWS Textract, Azure AI Document Intelligence (formerly Form Recognizer), Google Document AI, or open-source alternatives) that can handle both native PDFs and scanned images with OCR. No live system access is required.
Error Cost
High
This feeds directly into financial reconciliation — a misread invoice number, wrong total, or dropped GST figure can cause real accounting errors. Errors are technically reversible but catching them requires manual audit effort, which partially defeats the purpose of automation.
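Some of these errors can be caught cheaply before any manual audit by checking that the extracted figures agree with each other. The following is a minimal sketch, assuming the invoice shows a subtotal, a GST amount, and a total; the function name totals_reconcile and the one-cent tolerance are assumptions chosen for illustration.

```python
from decimal import Decimal


def totals_reconcile(
    subtotal: Decimal,
    gst: Decimal,
    total: Decimal,
    tolerance: Decimal = Decimal("0.01"),  # assumed rounding allowance
) -> bool:
    """Return True when subtotal + GST matches the total within tolerance."""
    return abs(subtotal + gst - total) <= tolerance
```

A failed check does not say which of the three figures was misread, only that the invoice should not pass straight into reconciliation. Decimal is used instead of float so that currency amounts compare exactly.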
Human Judgment Required
Medium
Most extraction is mechanical, but edge cases require judgment: partially legible scans, ambiguous date formats, multi-currency invoices, or invoices where the 'total' field is unclear. A human needs to define rules upfront and review flagged low-confidence extractions.
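The review step can be sketched as a simple threshold on per-field confidence. This is an illustration under assumptions: the extraction tool is assumed to report a confidence between 0 and 1 for each field, and the 0.90 cut-off, the field names, and the function name fields_needing_review are hypothetical values that a human would set upfront.

```python
from typing import Dict, List, Tuple

REVIEW_THRESHOLD = 0.90  # assumed cut-off, to be tuned by the reviewer


def fields_needing_review(
    extracted: Dict[str, Tuple[str, float]],
    threshold: float = REVIEW_THRESHOLD,
) -> List[str]:
    """Return the names of fields whose confidence is below the threshold.

    `extracted` maps a field name to a (value, confidence) pair.
    """
    return sorted(
        name
        for name, (_value, confidence) in extracted.items()
        if confidence < threshold
    )


invoice = {
    "invoice_number": ("INV-1", 0.99),
    "date": ("03/04/2024", 0.85),
    "total": ("110.00", 0.62),
}
flagged = fields_needing_review(invoice)
```

An invoice with an empty result can be written to the CSV directly; any other invoice goes to a human along with the list of fields to check.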