Get text bounds and size from a PDF in C# and VB.NET

Reading additional information about text in a PDF document, such as its position, font, and color, is a common task that the SautinSoft PDF .Net library supports. The library provides an API for working with PDF content, including text, images, and other elements. The example below uses PDF .Net to read the bounds, font properties, and color of each text element.


Step-by-Step Guide

  1. Create a New Project

    Open Visual Studio and create a new Console Application project.

  2. Add PDF .Net Reference

    Download the PDF .Net library and add it to your project: right-click the project in Solution Explorer, select "Add Reference," and browse to the PDF .Net DLL.

  3. Write the Code to Get Text Bounds and Size
  4. Complete code

    using System;
    using System.IO;
    using SautinSoft;
    using SautinSoft.Pdf;
    using SautinSoft.Pdf.Content;
    
    class Program
    {
        static void Main()
        {
            // Apply the trial key.
            //PdfDocument.SetLicense("Your trial key here");
    
            // Specify the path to the PDF file.
            string pdfFile = Path.GetFullPath(@"..\..\..\Asset Recovery Evaluation.pdf");
    
            // Load the PDF document.
            using (var document = PdfDocument.Load(pdfFile))
            {
                // Iterate through all pages in the document.
                foreach (var page in document.Pages)
                {
                    // Get an enumerator for the content elements of the page.
                    var contentEnumerator = page.Content.Elements.All(page.Transform).GetEnumerator();
    
                    // Iterate through the content elements.
                    while (contentEnumerator.MoveNext())
                    {
                        // Check if the current element is a text element.
                        if (contentEnumerator.Current.ElementType == PdfContentElementType.Text)
                        {
                            // Cast the element to PdfTextContent.
                            var textElement = (PdfTextContent)contentEnumerator.Current;
    
                            // Read the text content element's additional information.
                            var text = textElement.ToString();
                            var font = textElement.Format.Text.Font;
                            var color = textElement.Format.Fill.Color;
                            var bounds = textElement.Bounds;
                            contentEnumerator.Transform.Transform(bounds);
    
                            // Output the information to the console.
                            Console.WriteLine($"Unicode text: {text}");
                            Console.WriteLine($"Font name: {font.Face.Family.Name}");
                            Console.WriteLine($"Font size: {font.Size}");
                            Console.WriteLine($"Font style: {font.Face.Style}");
                            Console.WriteLine($"Font weight: {font.Face.Weight}");
    
                            // Check if the color is in RGB format and output it.
                            if (color.TryGetRgb(out double red, out double green, out double blue))
                            {
                                Console.WriteLine($"Color: Red={red}, Green={green}, Blue={blue}");
                            }
    
                            // Output the bounds of the text.
                            Console.WriteLine($"Bounds: Left={bounds.Left:0.00}, Bottom={bounds.Bottom:0.00}, Right={bounds.Right:0.00}, Top={bounds.Top:0.00}");
                            Console.WriteLine();
                        }
                    }
                }
            }
        }
    }


  5. Run the Application

    Build and run your application. If everything is set up correctly, the console lists the text, font name, size, style, weight, color (when it is available as RGB), and bounds of every text element in the specified PDF file.
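
The listing above prints the four edges of each text element's bounds but not its size. As a minimal sketch, the width and height can be derived from those edges. The `TextBox` type and its `Describe` method are hypothetical helpers written for this illustration and are not part of PDF .Net; the results are in whatever units the bounds themselves use.

```csharp
using System;
using System.Globalization;

// Hypothetical helper type (not part of PDF .Net): holds the four edges
// that the sample prints for each text element.
public readonly struct TextBox
{
    public TextBox(double left, double bottom, double right, double top)
    {
        Left = left;
        Bottom = bottom;
        Right = right;
        Top = top;
    }

    public double Left { get; }
    public double Bottom { get; }
    public double Right { get; }
    public double Top { get; }

    // Math.Abs keeps the size positive even if a transform mirrors an axis.
    public double Width => Math.Abs(Right - Left);
    public double Height => Math.Abs(Top - Bottom);

    // Formats the size the same way the sample formats the bounds.
    public string Describe() =>
        string.Format(CultureInfo.InvariantCulture,
            "Size: Width={0:0.00}, Height={1:0.00}", Width, Height);
}
```

Inside the loop of the sample, this could be used as `Console.WriteLine(new TextBox(bounds.Left, bounds.Bottom, bounds.Right, bounds.Top).Describe());`, assuming the four properties are numeric as the sample's format strings suggest.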

Additional Features

    PDF .Net offers other features for handling PDF documents, such as:
  • Extracting images from PDF files.
  • Converting PDF to other formats like DOCX, HTML, and images.
  • Merging and splitting PDF files.
  • Adding and reading interactive forms.

Conclusion

With PDF .Net, reading additional information about text in a PDF document is straightforward. The library provides a clear API to access text properties, which can be invaluable for tasks such as document analysis, content extraction, and automated processing.
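
As one illustration of the document analysis mentioned above, the values collected per text element can be grouped to see which font carries most of the text. This is a sketch under stated assumptions: `TextRun` and `FontStatistics` are hypothetical types introduced here, not part of PDF .Net, and the caller is assumed to fill them from the text, font name, and font size that the sample reads.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical record of one text element (not part of PDF .Net).
public sealed class TextRun
{
    public TextRun(string text, string fontName, double fontSize)
    {
        Text = text;
        FontName = fontName;
        FontSize = fontSize;
    }

    public string Text { get; }
    public string FontName { get; }
    public double FontSize { get; }
}

public static class FontStatistics
{
    // Counts characters per "font name + size" key, largest count first.
    // Ties are ordered by key so the result is deterministic.
    public static List<KeyValuePair<string, int>> CharactersPerFont(IEnumerable<TextRun> runs)
    {
        return runs
            .GroupBy(r => r.FontName + " " +
                          r.FontSize.ToString("0.##", CultureInfo.InvariantCulture))
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Sum(r => r.Text.Length)))
            .OrderByDescending(p => p.Value)
            .ThenBy(p => p.Key, StringComparer.Ordinal)
            .ToList();
    }
}
```

The first entry of the returned list is a rough guess at the body font, and keys with larger sizes but small counts are candidates for headings.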


If you need a new code example or have a question, email us at support@sautinsoft.com.



Questions and suggestions from you are always welcome!

We have been developing .Net components since 2002. We know the PDF, DOCX, RTF, HTML, XLSX, and image formats. If you need any assistance with creating, modifying, or converting documents in various formats, we can help you. We will write any code example for you absolutely free.