The article discusses the importance of extracting table data from documents such as earnings reports, purchase orders, and technical manuals into Excel. It suggests two ways to generate an Excel file from a document’s table data: an Azure Function, and Apache Spark in Azure Synapse Analytics, especially when dealing with large volumes of documents.
The Azure Function takes a document and uses Form Recognizer’s “General Document” model to extract table data and generate an Excel file. Each table on a page is extracted and written to a sheet in the Excel workbook, with the sheet name corresponding to the page number. The function can also capture key-value pairs on the page when the addkeyvalue_pairs flag is set, and it uses the column and row spans that Form Recognizer returns for each cell so that merged cells appear in Excel as they do in the original table.
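To make the role of row and column spans concrete, here is a minimal sketch of how span information could be turned into Excel cell addresses and merge ranges. It is not the article's implementation: the Cell class, column_letter, and layout_table are hypothetical names, and the field names are only assumed to mirror the row index, column index, and span values that Form Recognizer reports for a table cell. It uses only the Python standard library and stops short of writing a workbook.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Cell:
    # Hypothetical container; field names are assumed to mirror the
    # per-cell values returned by the table extraction step.
    row_index: int
    column_index: int
    content: str
    row_span: int = 1
    column_span: int = 1


def column_letter(index: int) -> str:
    """Convert a 0-based column index to an Excel column name (0 -> A, 26 -> AA)."""
    letters = ""
    index += 1
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def layout_table(cells: List[Cell], top_row: int = 1) -> Tuple[Dict[str, str], List[str]]:
    """Map cells to A1-style addresses and collect merge ranges for spanned cells.

    top_row is the 1-based sheet row where the table starts, so several
    tables from the same page can be stacked on one sheet.
    """
    values: Dict[str, str] = {}
    merges: List[str] = []
    for cell in cells:
        first_row = top_row + cell.row_index
        first_col = column_letter(cell.column_index)
        # The content goes in the top-left cell of the spanned area.
        values[f"{first_col}{first_row}"] = cell.content
        if cell.row_span > 1 or cell.column_span > 1:
            last_row = first_row + cell.row_span - 1
            last_col = column_letter(cell.column_index + cell.column_span - 1)
            merges.append(f"{first_col}{first_row}:{last_col}{last_row}")
    return values, merges


if __name__ == "__main__":
    sample = [
        Cell(0, 0, "Revenue by quarter", column_span=2),  # header spanning two columns
        Cell(1, 0, "Q1"),
        Cell(1, 1, "100"),
    ]
    values, merges = layout_table(sample)
    print(values)
    print(merges)
```

The resulting addresses and merge ranges could then be handed to whichever spreadsheet library writes the workbook, with one sheet per page as described above.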
Extracting Table data from documents into an Excel Spreadsheet
The Azure Function and the Synapse Spark notebook are available in this Git repository