PDF (Portable Document Format) is a file format developed by Adobe for sharing and exchanging documents digitally. It preserves the original formatting of a document and supports security features such as password protection. As a C# developer, you may need to integrate PDF functionality into your software application. Building this functionality from scratch is time-consuming and complicated, so it is worth weighing that effort against using an existing library, taking into account factors such as the performance and efficiency of the resulting application.
PDF Focus .Net
PDF Focus .Net is an easy-to-use SautinSoft library for converting PDF files in .NET applications. It allows developers to extract text and images from PDF files and to convert PDF documents to other formats such as Word, Excel, HTML, plain text, and image formats.
To read a PDF in C# using PDF Focus .Net, follow these steps:
- Create a new C# Console Application project in Visual Studio.
- Install the PDF Focus .Net library with the NuGet Package Manager; this also adds the reference to your project.
- In your Program.cs file, add the following using statement to import the necessary namespace:
using SautinSoft;
- In the code below, replace the value of the `pdfFile` variable with the path to your PDF file and the value of the `textFile` variable with the path where you want to save the extracted text.
- Run the program. It extracts the text from the PDF file and saves it to the specified text file.
Use the following code to read the PDF file and extract its contents:
using System;
using SautinSoft;

class Program
{
    static void Main(string[] args)
    {
        string pdfFile = @"C:\path\to\your\pdf\file.pdf";
        string textFile = @"C:\path\to\save\text\file.txt";

        PdfFocus pdfFocus = new PdfFocus();
        pdfFocus.OpenPdf(pdfFile);

        if (pdfFocus.PageCount > 0)
        {
            // Extract the text of the whole PDF document into a plain-text file.
            // ToText returns 0 when the conversion succeeds.
            int result = pdfFocus.ToText(textFile);

            if (result == 0 && System.IO.File.Exists(textFile))
            {
                Console.WriteLine($"PDF text extracted successfully to: {textFile}");
            }
            else
            {
                Console.WriteLine("Failed to extract PDF text.");
            }
        }
        else
        {
            Console.WriteLine("No pages found in the PDF document.");
        }

        // Release the opened PDF before exiting.
        pdfFocus.ClosePdf();
        Console.ReadLine();
    }
}
iText software
iText software is a powerful and versatile PDF library that allows developers to create, manipulate, and render PDF documents programmatically. With iText, users have access to a wide range of features and functionalities, such as adding text, images, and tables, creating forms, encrypting and signing documents, extracting and merging pages, and much more.
To read a PDF in C# using iText, follow these steps:
- Install the iTextSharp library with the NuGet Package Manager.
- Create an instance of the PdfReader class to open the PDF file.
- Use the PdfReader object with the PdfTextExtractor class to extract the text of each page and append it to a StringBuilder.
- Close the PdfReader object (a using block does this automatically).
- Read the extracted content from the StringBuilder.
Here is an example code snippet:
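using System;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser; // PdfTextExtractor lives in this namespace

namespace ReadPDFExample
{
    class Program
    {
        static void Main(string[] args)
        {
            string path = "path_to_your_pdf_file.pdf";
            StringBuilder text = new StringBuilder();

            // The using block closes the reader when extraction is done.
            using (PdfReader reader = new PdfReader(path))
            {
                // iTextSharp page numbers start at 1.
                for (int page = 1; page <= reader.NumberOfPages; page++)
                {
                    // AppendLine keeps the text of consecutive pages from running together.
                    text.AppendLine(PdfTextExtractor.GetTextFromPage(reader, page));
                }
            }

            Console.WriteLine(text);
        }
    }
}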
Make sure to replace "path_to_your_pdf_file.pdf" with the actual path to your PDF file.
This code reads the PDF file page by page using the PdfReader and PdfTextExtractor classes from the iTextSharp library. The extracted text from each page is appended to the StringBuilder object, and the combined text is then printed to the console.
Note: iTextSharp (iText 5) is a third-party library that is no longer actively developed. It has been superseded by iText 7 for .NET, which uses a different API.
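Because iTextSharp is no longer actively maintained, the same page-by-page extraction can be sketched against its successor, iText 7 for .NET. This is a minimal sketch, not a definitive implementation. It assumes the iText 7 NuGet package is installed (published as `itext7`; check the current package name for your version), and the file path is a placeholder.

```csharp
using System;
using System.Text;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;

namespace ReadPdfWithIText7
{
    class Program
    {
        static void Main(string[] args)
        {
            // Placeholder path: replace with the actual path to your PDF file.
            string path = "path_to_your_pdf_file.pdf";
            StringBuilder text = new StringBuilder();

            // In iText 7, a PdfDocument wraps the PdfReader; disposing the
            // document also closes the underlying reader.
            using (PdfDocument pdf = new PdfDocument(new PdfReader(path)))
            {
                // Page numbers start at 1, as in iTextSharp.
                for (int page = 1; page <= pdf.GetNumberOfPages(); page++)
                {
                    text.AppendLine(PdfTextExtractor.GetTextFromPage(pdf.GetPage(page)));
                }
            }

            Console.WriteLine(text);
        }
    }
}
```

The structure mirrors the iTextSharp example: the main differences are the namespaces, the PdfDocument wrapper, and method-style accessors such as GetNumberOfPages() in place of the NumberOfPages property.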