Digitalizing invoices with OCR
Payments: Scan and attach invoices to payments
Description
While many operations, such as electronic payments, have been streamlined through digitalization, some tasks remain manual and paper-based, and invoice processing is one of them. Each invoice has a different format, and a business can receive hundreds or even thousands of them each month. Processing such a volume can take more than one team, and accuracy is not always guaranteed.
In this use case, we use Google Colab and a few Python libraries to simulate OCR (Optical Character Recognition): we automatically import invoices in PDF format and attach them to the generated payments.
For production use, a commercial OCR document reader is needed to parse the many types of invoices and their different layouts.
In this example, for demonstration purposes only, we use PyPDF and tabula-py to extract data from a single, predefined invoice layout.
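As a minimal sketch of what extracting a predefined layout can look like, the snippet below reads the text layer of a PDF and pulls a few fields out of it with regular expressions. It assumes the `pypdf` package for text extraction. The field labels ("Invoice number", "IBAN", "Total due"), the sample values, and the function names are hypothetical and would need to be adapted to the actual invoice template.

```python
import re
from decimal import Decimal

# Hypothetical labels for one fixed invoice template (assumption, not a standard).
INVOICE_NUMBER_RE = re.compile(r"Invoice\s+number\s*:\s*(\S+)")
IBAN_RE = re.compile(r"IBAN\s*:\s*([A-Z]{2}\d{2}[A-Z0-9 ]{11,30})")
TOTAL_RE = re.compile(r"Total\s+due\s*:\s*([\d,]+\.\d{2})\s*([A-Z]{3})")


def extract_pdf_text(path: str) -> str:
    """Return the text layer of every page of a PDF, joined by newlines."""
    # Imported lazily so that parse_invoice_text can be used without pypdf.
    from pypdf import PdfReader

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def parse_invoice_text(text: str) -> dict:
    """Extract the invoice number, IBAN, amount, and currency from the text."""
    number = INVOICE_NUMBER_RE.search(text)
    iban = IBAN_RE.search(text)
    total = TOTAL_RE.search(text)
    if not (number and iban and total):
        raise ValueError("invoice does not match the expected layout")
    return {
        "invoice_number": number.group(1),
        # IBANs are often printed in groups of four; store them without spaces.
        "iban": iban.group(1).replace(" ", ""),
        # Decimal avoids floating-point rounding on monetary amounts.
        "amount": Decimal(total.group(1).replace(",", "")),
        "currency": total.group(2),
    }
```

In a notebook, `parse_invoice_text(extract_pdf_text("invoice.pdf"))` would return a dictionary of fields for a file that follows this layout. Line-item tables could be read in a similar spirit with tabula-py's `read_pdf`. A parser of this kind fails as soon as the layout changes, which is why a commercial OCR reader is the better fit for production.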
Step-by-step instructions:
- Parse an invoice in PDF format.
- Create a payment with the extracted data.
- Attach the original invoice to the payment, leveraging the Attachment API.
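The three steps above can be sketched as a small pipeline. This is an illustration under stated assumptions, not the actual payment or Attachment API. The payload field names (`creditor_iban`, `content_base64`, and so on) are hypothetical, and the API calls are passed in as functions so that no real endpoint or request schema is assumed.

```python
import base64
from typing import Callable


def build_payment_payload(fields: dict) -> dict:
    """Map extracted invoice fields to a payment request body (hypothetical schema)."""
    return {
        "amount": str(fields["amount"]),
        "currency": fields["currency"],
        "creditor_iban": fields["iban"],
        "reference": fields["invoice_number"],
    }


def build_attachment_payload(pdf_bytes: bytes, filename: str) -> dict:
    """Wrap the original PDF as a base64 attachment body (hypothetical schema)."""
    return {
        "filename": filename,
        "content_type": "application/pdf",
        "content_base64": base64.b64encode(pdf_bytes).decode("ascii"),
    }


def process_invoice(
    pdf_bytes: bytes,
    filename: str,
    parse: Callable[[bytes], dict],
    create_payment: Callable[[dict], str],
    attach: Callable[[str, dict], None],
) -> str:
    """Run the three steps in order and return the payment identifier."""
    fields = parse(pdf_bytes)                                  # 1. parse the invoice
    payment_id = create_payment(build_payment_payload(fields)) # 2. create the payment
    attach(payment_id, build_attachment_payload(pdf_bytes, filename))  # 3. attach the PDF
    return payment_id
```

Injecting `parse`, `create_payment`, and `attach` keeps the flow testable with stand-in functions in a notebook. The real implementations would wrap the PDF parser and the HTTP calls to the payment and Attachment APIs.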