Repeatability
Medium
The core task, extracting tabular data from PDFs, is structurally repetitive, but three different bank layouts mean the agent must adapt its parsing logic per format. Layout drift across two years of statements adds further variability, which reduces true repeatability.
Ambiguity Tolerance
Medium
The output schema is well-defined (date, description, debit, credit, balance), but the success criteria for edge cases such as split transactions, multi-line descriptions, and OCR artifacts are not. Without a reconciliation check, the agent cannot reliably self-certify that the output is clean.
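A minimal sketch of such a reconciliation check follows. It assumes rows are dictionaries keyed by the schema fields, that amounts arrive as strings, and that credits increase the balance while debits decrease it (a deposit-account convention). The function name `reconcile` and the sample rows are hypothetical, made up for illustration only.

```python
from decimal import Decimal


def reconcile(rows, opening_balance):
    """Return indices of rows whose stated balance does not equal
    the previous balance plus credit minus debit."""
    mismatches = []
    running = Decimal(opening_balance)
    for i, row in enumerate(rows):
        # Empty debit/credit cells are treated as zero.
        debit = Decimal(row["debit"] or "0")
        credit = Decimal(row["credit"] or "0")
        expected = running + credit - debit
        stated = Decimal(row["balance"])
        if expected != stated:
            mismatches.append(i)
        # Continue from the stated balance so one bad row does not
        # cascade into every later row.
        running = stated
    return mismatches


# Illustrative data only; the last row's balance is deliberately wrong.
rows = [
    {"date": "2023-01-03", "description": "Deposit",
     "debit": "", "credit": "500.00", "balance": "1500.00"},
    {"date": "2023-01-05", "description": "Card payment",
     "debit": "42.10", "credit": "", "balance": "1457.90"},
    {"date": "2023-01-09", "description": "Transfer",
     "debit": "100.00", "credit": "", "balance": "1375.90"},
]

print(reconcile(rows, "1000.00"))  # [2]
```

Using `Decimal` rather than `float` keeps the comparison exact for currency values. A passing check does not prove the extraction is clean (descriptions and dates are not covered), but a failing row is a concrete signal that something on or just before that row was misread or dropped.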
Data & Tool Availability
High
The input files are self-contained PDFs, and mature OCR tools (e.g., AWS Textract, Azure AI Document Intelligence (formerly Form Recognizer), Tesseract) plus CSV output pipelines are readily available. No live access to bank systems or external permissions are required, though the cloud OCR services need their own credentials.
Error Cost
High
Transposed digits, missed transactions, or misaligned debit/credit columns in financial data can corrupt accounting records, trigger audit issues, or cause incorrect tax filings. Errors are not always obvious and can propagate silently into downstream systems.
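Two of these error types leave a recognizable arithmetic trace when a row fails to reconcile, which can be sketched as below. The sketch assumes the expected and stated balances for the row are already known; `classify_discrepancy` and its labels are hypothetical names. It relies on two facts: an amount placed in the wrong debit/credit column shifts the balance by exactly twice that amount, and swapping digits within a number changes it by a multiple of 9 in its smallest unit. The second test is a necessary condition only, so roughly one in nine unrelated discrepancies will also pass it.

```python
from decimal import Decimal


def classify_discrepancy(expected, stated, amount):
    """Suggest a likely cause for a balance mismatch on one row.

    expected: balance computed from the previous row
    stated:   balance extracted from the statement
    amount:   the row's transaction amount (positive)
    """
    diff_cents = int(abs(Decimal(expected) - Decimal(stated)) * 100)
    amount_cents = int(Decimal(amount) * 100)
    if diff_cents == 0:
        return "ok"
    # Checked first, because twice the amount may itself be divisible by 9.
    if diff_cents == 2 * amount_cents:
        return "possible debit/credit swap"
    if diff_cents % 9 == 0:
        return "possible transposed digits"
    return "unexplained"


print(classify_discrepancy("1357.90", "1375.90", "100.00"))
print(classify_discrepancy("1400.00", "1484.20", "42.10"))
```

The labels are hints for a reviewer, not diagnoses. A missed transaction, for example, will usually land in "unexplained" and still needs to be traced by hand.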
Human Judgment Required
Medium
Most extraction is mechanical, but ambiguous OCR reads (e.g., '0' vs 'O', smudged amounts), multi-line transaction descriptions, and balance reconciliation failures require a human to adjudicate. The agent cannot reliably flag every case it gets wrong.
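One way to route the visibly suspicious reads to a human is a simple pre-filter on the raw OCR text of each amount field, sketched below. The lookalike character set, the accepted amount format (optional thousands commas, two decimal places), and the name `needs_review` are all assumptions for illustration; real statements may use other formats.

```python
import re

# Letters that OCR output may contain in place of digits (assumed list).
LOOKALIKES = set("OoIlSBZ")

# Accepts "1,200.00" or "1200.00"; other formats are treated as suspect.
AMOUNT_PATTERN = re.compile(r"(\d{1,3}(,\d{3})+|\d+)\.\d{2}")


def needs_review(raw_amount):
    """Return a reason string if the raw OCR text of an amount field
    should go to a human, or None if it looks like a clean amount."""
    text = raw_amount.strip()
    if text == "":
        return None  # an empty debit or credit cell is normal
    if any(ch in LOOKALIKES for ch in text):
        return "digit lookalike character"
    if not AMOUNT_PATTERN.fullmatch(text):
        return "not a well-formed amount"
    return None


for raw in ["1,200.00", "1,2O0.00", "12.5", ""]:
    print(repr(raw), "->", needs_review(raw))
```

This only catches reads that look wrong on their face. A digit misread as another digit (an 8 read as a 3, say) produces a well-formed amount and passes the filter, which is why it complements rather than replaces balance reconciliation and human adjudication.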