Text Search in PDFs with C# and .NET

Finding specific text within a PDF document is a common requirement in document management systems, data extraction tools, and automated workflows. The SautinSoft.PDF library can locate text within PDF files from C# and .NET. This article walks through finding a piece of text in a PDF document and counting how many times it occurs.

Searching text in PDFs is essential for:

  • Data Extraction: Extracting specific information from large documents.
  • Content Management: Indexing and organizing documents based on their content.
  • Automated Workflows: Automating tasks that depend on the presence of specific text within documents.

Step-by-step guide:

  1. Add SautinSoft.PDF from NuGet.
  2. Load the PDF document.
  3. Find the specific text on the first page of the PDF file.
  4. Print the number of occurrences found to the console.

Complete code

using System;
using System.IO;
using SautinSoft;
using SautinSoft.Pdf;
using SautinSoft.Pdf.Content;
using System.Linq;

namespace Sample
{
    class Sample
    {
        /// <summary>
        /// Find text in the PDF.
        /// </summary>
        /// <remarks>
        /// Details: https://sautinsoft.com/products/pdf/help/net/developer-guide/find-text.php
        /// </remarks>
        static void Main(string[] args)
        {
            // Before starting this example, please get a free 100-day trial key:
            // https://sautinsoft.com/start-for-free/

            // Apply the key here:
            // PdfDocument.SetLicense("...");

            string pdfFile = Path.GetFullPath(@"..\..\..\simple text.pdf");

            using (var document = PdfDocument.Load(pdfFile))
            {
                // Find all occurrences of the given text on the first page of the PDF file.
                var text = document.Pages[0].Content.GetText().Find("the");

                Console.WriteLine("Found " + text.Count() + " occurrences of the search text.");
            }
        }
    }
}

The same example in VB.NET:

Option Infer On

Imports System
Imports System.IO
Imports SautinSoft
Imports SautinSoft.Pdf
Imports SautinSoft.Pdf.Content
Imports System.Linq

Namespace Sample
	Friend Class Sample
		''' <summary>
		''' Find text in the PDF.
		''' </summary>
		''' <remarks>
		''' Details: https://sautinsoft.com/products/pdf/help/net/developer-guide/find-text.php
		''' </remarks>
		Shared Sub Main(ByVal args() As String)
			' Before starting this example, please get a free 100-day trial key:
			' https://sautinsoft.com/start-for-free/

			' Apply the key here:
			' PdfDocument.SetLicense("...")

			Dim pdfFile As String = Path.GetFullPath("..\..\..\simple text.pdf")

			Using document = PdfDocument.Load(pdfFile)
				' Find all occurrences of the given text on the first page of the PDF file.
				Dim text = document.Pages(0).Content.GetText().Find("the")

				Console.WriteLine("Found " & text.Count() & " occurrences of the search text.")
			End Using
		End Sub
	End Class
End Namespace
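
The samples above search only the first page and print a single count. To sanity-check that number, or to report counts page by page, it can help to count matches in plain strings without the library. The following is a minimal sketch under stated assumptions: it assumes each page's text has already been extracted into a string (how you obtain those strings is outside this sketch), and the names OccurrenceCounter, Count, and CountPerPage are hypothetical helpers, not part of SautinSoft.PDF. It counts non-overlapping substring matches and ignores case by default, which may or may not match how the library's Find method treats case and word boundaries.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, library-independent helper (not part of SautinSoft.PDF).
// Assumes the text of each page is already available as a plain string.
public static class OccurrenceCounter
{
    // Counts non-overlapping occurrences of `term` in `text`.
    // Case is ignored by default; pass StringComparison.Ordinal for an exact match.
    public static int Count(string text, string term,
        StringComparison comparison = StringComparison.OrdinalIgnoreCase)
    {
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(term))
            return 0;

        int count = 0;
        int index = 0;
        while ((index = text.IndexOf(term, index, comparison)) >= 0)
        {
            count++;
            index += term.Length; // move past this match so matches do not overlap
        }
        return count;
    }

    // Returns one count per page, in the same order as the input.
    public static List<int> CountPerPage(IEnumerable<string> pageTexts, string term)
    {
        return pageTexts.Select(text => Count(text, term)).ToList();
    }
}
```

Calling OccurrenceCounter.CountPerPage with the page strings and the search term gives one count per page, and summing that list gives a document-wide total. Because this is a plain substring match, a search for "the" also counts the match inside a word such as "theatre"; add a word-boundary check if only whole words should be counted.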

If you need a new code example or have a question, email us at support@sautinsoft.com. Questions and suggestions are always welcome!

We have been developing .NET components since 2002 and work with the PDF, DOCX, RTF, HTML, XLSX, and image formats. If you need help creating, modifying, or converting documents in these formats, we can write a code example for you free of charge.