Convert only tables from PDF to Excel in C# and .NET
PDF processing is a common task in office automation, document management, and data analytics. Developers are frequently asked to extract tables from PDF documents and convert them into editable Excel spreadsheets.
In this article, we show how to automatically convert only the tables from a PDF to Excel in C# and .NET using the PDF Focus .NET component from SautinSoft. The library provides convenient tools for table recognition and is designed for accurate, fast conversion.
Primary use cases for this method:
- Automated processing of PDF reports in which tables carry the key data, such as financial reports, trading reports, and lists.
- Data extraction for analytics: preparing data for business intelligence systems.
- Data migration: transferring information from legacy PDF documents into organized Excel sheets.
- Processing scanned documents: because PDF Focus can work in conjunction with OCR, you can also extract tables from scanned PDFs.
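For the automated report processing case, the single-file conversion can be wrapped in a loop over many PDFs. The sketch below is only an illustration: `BatchSketch`, `OutputPathFor`, and `ConvertAll` are hypothetical names, and the conversion step is passed in as a delegate so the code compiles without the library. In practice you would pass a lambda that wraps the `OpenPdf` and `ToExcel` calls from the complete code in this article, treating a result of 0 as success, as that code does.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class BatchSketch
{
    // Hypothetical helper: "reports/a.pdf" becomes "<outputDir>/a.xlsx".
    public static string OutputPathFor(string pdfPath, string outputDir) =>
        Path.Combine(outputDir, Path.GetFileNameWithoutExtension(pdfPath) + ".xlsx");

    // Calls convert(pdfPath, xlsxPath) for every input file and returns
    // the inputs whose result code was not 0.
    public static List<string> ConvertAll(
        IEnumerable<string> pdfPaths,
        string outputDir,
        Func<string, string, int> convert)
    {
        var failed = new List<string>();
        foreach (string pdf in pdfPaths)
        {
            int result = convert(pdf, OutputPathFor(pdf, outputDir));
            if (result != 0)
            {
                failed.Add(pdf);
            }
        }
        return failed;
    }

    static void Main()
    {
        // Assumed input list; a real run might use Directory.GetFiles(inputDir, "*.pdf").
        var pdfs = new[] { "q1-report.pdf", "q2-report.pdf" };

        // Stand-in converter that only logs. Replace it with a lambda that
        // performs the actual PDF to Excel conversion and returns its result code.
        Func<string, string, int> convert = (pdf, xlsx) =>
        {
            Console.WriteLine($"{pdf} -> {xlsx}");
            return 0;
        };

        List<string> failed = ConvertAll(pdfs, "out", convert);
        Console.WriteLine($"Failed conversions: {failed.Count}");
    }
}
```

Collecting the failed inputs instead of stopping at the first error lets a scheduled job finish the remaining files and report the problem documents afterwards.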
Complete code
// C# version
using System;
using System.IO;

namespace Sample
{
    class Sample
    {
        static void Main(string[] args)
        {
            // Before starting, we recommend getting a free key:
            // https://sautinsoft.com/start-for-free/

            // Apply the key here:
            // SautinSoft.PdfFocus.SetLicense("...");

            string pathToPdf = Path.GetFullPath(@"..\..\..\Table.pdf");
            string pathToExcel = "Result.xlsx";

            // Convert only tables from PDF to an Excel spreadsheet and skip all textual data.
            SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();

            // The output will be in XLSX (modern Excel format) or in XLS (Excel 97-2003 Workbook).
            f.ExcelOptions.Format = SautinSoft.PdfFocus.Format.Xlsx;
            // f.ExcelOptions.Format = SautinSoft.PdfFocus.Format.Xls;

            // 'true' = Convert all data to the spreadsheet (tabular and even textual).
            // 'false' = Skip textual data and convert only tabular (tables) data.
            f.ExcelOptions.ConvertNonTabularDataToSpreadsheet = false;

            // 'true' = Preserve original page layout.
            // 'false' = Place tables before text.
            f.ExcelOptions.PreservePageLayout = true;

            // The culture controls how dates and numbers are formatted.
            // Here ',' is used as the decimal separator and '.' as the group separator.
            System.Globalization.CultureInfo ci = new System.Globalization.CultureInfo("en-US");
            ci.NumberFormat.NumberDecimalSeparator = ",";
            ci.NumberFormat.NumberGroupSeparator = ".";
            f.ExcelOptions.CultureInfo = ci;

            f.OpenPdf(pathToPdf);

            if (f.PageCount > 0)
            {
                int result = f.ToExcel(pathToExcel);

                // Open the resulting Excel workbook if the conversion succeeded.
                if (result == 0)
                {
                    System.Diagnostics.Process.Start(new System.Diagnostics.ProcessStartInfo(pathToExcel) { UseShellExecute = true });
                }
            }
        }
    }
}
' VB.NET version
Imports System.IO
Imports SautinSoft

Module Sample
    Sub Main()
        ' Before starting, we recommend getting a free key:
        ' https://sautinsoft.com/start-for-free/

        ' Apply the key here:
        ' SautinSoft.PdfFocus.SetLicense("...")

        Dim pathToPdf As String = Path.GetFullPath("..\..\..\Table.pdf")
        Dim pathToExcel As String = "Result.xlsx"

        ' Convert only tables from PDF to an Excel spreadsheet and skip all textual data.
        Dim f As New SautinSoft.PdfFocus()

        ' The output will be in XLSX (modern Excel format) or in XLS (Excel 97-2003 Workbook).
        f.ExcelOptions.Format = SautinSoft.PdfFocus.Format.Xlsx
        ' f.ExcelOptions.Format = SautinSoft.PdfFocus.Format.Xls

        ' True = Convert all data to the spreadsheet (tabular and even textual).
        ' False = Skip textual data and convert only tabular (tables) data.
        f.ExcelOptions.ConvertNonTabularDataToSpreadsheet = False

        ' True = Preserve original page layout.
        ' False = Place tables before text.
        f.ExcelOptions.PreservePageLayout = True

        ' The culture controls how dates and numbers are formatted.
        ' Here "," is used as the decimal separator and "." as the group separator.
        Dim ci As New System.Globalization.CultureInfo("en-US")
        ci.NumberFormat.NumberDecimalSeparator = ","
        ci.NumberFormat.NumberGroupSeparator = "."
        f.ExcelOptions.CultureInfo = ci

        f.OpenPdf(pathToPdf)

        If f.PageCount > 0 Then
            Dim result As Integer = f.ToExcel(pathToExcel)

            ' Open the resulting Excel workbook if the conversion succeeded.
            If result = 0 Then
                System.Diagnostics.Process.Start(New System.Diagnostics.ProcessStartInfo(pathToExcel) With {.UseShellExecute = True})
            End If
        End If
    End Sub
End Module
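The culture settings in the code above swap the usual en-US roles of the comma and the period. The following sketch shows what such a `CultureInfo` means in plain .NET number parsing. It uses only `System.Globalization` and does not call PDF Focus; how the component applies the culture internally is not shown here, and `SeparatorDemo`, `BuildCulture`, and `Parse` are illustrative names.

```csharp
using System;
using System.Globalization;

static class SeparatorDemo
{
    // Builds the same culture as the conversion example: "en-US" with
    // ',' as the decimal separator and '.' as the group separator.
    public static CultureInfo BuildCulture()
    {
        var ci = new CultureInfo("en-US");
        ci.NumberFormat.NumberDecimalSeparator = ",";
        ci.NumberFormat.NumberGroupSeparator = ".";
        return ci;
    }

    // Parses text as a decimal number using the separators of the given culture.
    public static decimal Parse(string text, CultureInfo culture) =>
        decimal.Parse(text, NumberStyles.Number, culture);

    static void Main()
    {
        CultureInfo custom = BuildCulture();
        CultureInfo invariant = CultureInfo.InvariantCulture;

        // With the custom culture, '.' groups thousands and ',' marks decimals.
        Console.WriteLine(Parse("1.234,56", custom).ToString(invariant)); // prints 1234.56
        Console.WriteLine(Parse("1.234", custom).ToString(invariant));    // prints 1234

        // With the invariant culture, the same text "1.234" is a fraction.
        Console.WriteLine(Parse("1.234", invariant).ToString(invariant)); // prints 1.234
    }
}
```

If the numbers in your PDF tables use a period as the decimal separator, you would likely want to leave the separators of the chosen culture unchanged so that values are not misread by a factor of a thousand.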
If you need a new code example or have a question, email us at support@sautinsoft.com.