PDF Parsing and Reconciliation

Mission Statement

Our client is an adviser to many of the world’s leading families and wealth creators.

They send investment data to a third-party firm, which creates PDF outputs from that data for reporting purposes.

The current process is manual and prone to human error, and the PDF reports that come back often contain mistakes. The flow catches those mistakes by extracting the data from the PDFs and reconciling it with the original client data. Its findings can then be used to create better guidance that stops the errors from recurring.

Tools Used

Alteryx

- API Connector

- Extraction of tabular data

- Reconciliation against existing data

- Alteryx Server

Nanonets

- PDF Parsing

Detailed Solution

The workflow creates a report of the errors found in the PDF reports produced for our client. That error report can then be used to improve the process and to catch mistakes before the reports are filed.

Our client produces check sheets from internal data; these cover benchmarks, return on equity and performance.

The PDF reports are uploaded to the Nanonets environment once they are received.

The first workflow in Alteryx uses the Nanonets API to check every file in the environment. Any files that have already been processed are not handed on to the second flow.
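
The filtering step of the first flow could be sketched in Python roughly as follows. This is an illustration only, not the Alteryx workflow itself: the record layout (`id`, `name`) and the function name `select_unprocessed` are assumptions, and the call to the Nanonets API is left out, with a hard-coded list standing in for its response.

```python
from typing import Dict, Iterable, List, Set


def select_unprocessed(files: Iterable[Dict[str, str]], processed_ids: Set[str]) -> List[Dict[str, str]]:
    """Return the file records whose id has not been processed yet."""
    return [f for f in files if f["id"] not in processed_ids]


# Stand-in for the file listing returned by the parsing service (hypothetical layout).
listing = [
    {"id": "a1", "name": "report_q1.pdf"},
    {"id": "b2", "name": "report_q2.pdf"},
    {"id": "c3", "name": "report_q3.pdf"},
]
already_done = {"a1"}  # ids recorded by an earlier run (assumed to be stored somewhere)

todo = select_unprocessed(listing, already_done)
print([f["name"] for f in todo])
```

Keeping the processed ids in a set makes each membership test cheap, which matters once the environment holds many files.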

A list of the unprocessed files starts the second workflow, which downloads the tabular data from Nanonets in batches. The data is then prepped and blended and, upon completion, saved under a file structure within our client’s storage.

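
Splitting the list of unprocessed files into batches might look like the sketch below. The batch size of 2 and the helper name `batched` are assumptions made for the example; the source does not state how large the batches are, and the download itself is omitted.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


file_ids = ["a1", "b2", "c3", "d4", "e5"]  # hypothetical file ids
batches = list(batched(file_ids, 2))
print(batches)
```

Each batch would then be passed to whatever step requests the tabular data, so that a single failed request only affects the files in that batch.
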
The third flow checks for changes to the file structure to find new sheets, and hands this data to the fourth flow.

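
One simple way to detect new sheets is to compare two snapshots of the folder tree. The sketch below assumes the sheets are ordinary files on disk and uses a temporary directory with made-up file names; how the real flow detects changes is not described in the source.

```python
import tempfile
from pathlib import Path
from typing import List, Set


def snapshot(root: Path) -> Set[str]:
    """Return the relative paths of all files below `root`."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def new_files(previous: Set[str], current: Set[str]) -> List[str]:
    """Return the paths present in `current` but not in `previous`, sorted."""
    return sorted(current - previous)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "2024").mkdir()
    (root / "2024" / "sheet_a.csv").write_text("x")
    before = snapshot(root)

    # A new sheet arrives between two runs of the flow.
    (root / "2024" / "sheet_b.csv").write_text("y")
    after = snapshot(root)

    added = new_files(before, after)
    print(added)
```
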
The fourth flow uses our client’s original data and the newly created data to perform a reconciliation on two criteria: (A) the internal integrity of the child sheets is checked against our client’s specification, and (B) the child sheets are checked against the check sheets provided by our client.
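
The two reconciliation criteria can be illustrated with a minimal Python sketch. Everything specific here is an assumption: the client's specification is not given in the source, so check (A) is reduced to "required fields are present and numeric", and check (B) compares values against a check sheet with a made-up absolute tolerance. The field names and figures are invented for the example.

```python
import math
from typing import Dict, List

REQUIRED_FIELDS = ("benchmark", "performance")  # hypothetical specification
TOLERANCE = 0.005  # hypothetical absolute tolerance


def is_number(value: object) -> bool:
    """True for ints and floats, but not for booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def integrity_errors(sheet: Dict[str, object]) -> List[str]:
    """Check (A): every required field is present and numeric."""
    return [
        f"{field}: missing or not numeric"
        for field in REQUIRED_FIELDS
        if not is_number(sheet.get(field))
    ]


def check_sheet_errors(sheet: Dict[str, object], check_sheet: Dict[str, float]) -> List[str]:
    """Check (B): parsed values agree with the check sheet within TOLERANCE."""
    errors = []
    for field, expected in check_sheet.items():
        actual = sheet.get(field)
        if not is_number(actual) or not math.isclose(
            actual, expected, rel_tol=0.0, abs_tol=TOLERANCE
        ):
            errors.append(f"{field}: expected {expected}, found {actual}")
    return errors


parsed = {"benchmark": 4.2, "performance": 5.31}  # values extracted from a PDF
check = {"benchmark": 4.2, "performance": 5.13}   # values from the check sheet

report = integrity_errors(parsed) + check_sheet_errors(parsed, check)
print(report)
```

In this example the parsed sheet passes check (A) but fails check (B) on `performance`, which is the kind of transposed figure a manual process can introduce.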

This checks the accuracy of the PDF outputs and allows a greater degree of confidence in them.