Extract all images and vector graphics from PDF in C# and .NET


PDF documents play an important role in modern work: they are used for legal, financial, educational, and many other purposes. It is often necessary to extract images and graphic elements from PDF documents for later analysis, editing, or integration with other systems. In this article, we'll show you how to do this using the SautinSoft PdfFocus .NET library.

Extracting graphic elements from a PDF is relevant in the following cases:

  • Content analytics: analyzing the visual components of documents to determine styles, image types, or how images are used.
  • Data migration: transferring images from PDF to other formats or file management systems.
  • Editing and extension: adding new elements or reformatting existing ones.
  • Archiving and optimization: extracting graphics to reduce document size or to integrate them into multimedia materials.
  • Presentations and reports: reusing images from the original PDF.

PdfFocus is a practical tool for developers, system administrators, and analysts who need to process graphic content from PDF files at scale. The extracted images and vector graphic objects can be used for:

  • Web and mobile applications where dynamic content display is required.
  • Creation of multimedia presentations.
  • Text analysis and recognition (for example, in the OCR process).
  • Data processing for machine learning.
  • Restoration of objects from damaged PDF documents.
  • Migration of data between systems.

Demand for such solutions keeps growing as the volume of digital documents in companies and organizations increases. Tools that automate work with PDF graphics are used in fintech, law, education, and marketing, where extracting and reusing graphics quickly helps make business processes more efficient.

Step-by-Step:

  1. Add SautinSoft.PdfFocus from NuGet.
  2. Load a PDF document.
  3. Enable rasterization of vector graphics.
  4. Extract the images, save them as PNG files, and open the output folder.

Complete code (C#)

using System;
using System.IO;
using System.Collections.Generic;
using SautinSoft;

namespace Sample
{
    class Sample
    {
        static void Main(string[] args)
        {
            // Before starting, we recommend getting a free key:
            // https://sautinsoft.com/start-for-free/
            
            // Apply the key here:
            // SautinSoft.PdfFocus.SetLicense("...");
			
            // Extract all images from PDF
            SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();
			
            string pdfFile = Path.GetFullPath(@"..\..\..\simple text.pdf");
            string imageDir = new DirectoryInfo(Directory.GetCurrentDirectory()).CreateSubdirectory("images").FullName;

            List<PdfFocus.PdfImage> pdfImages = null;

            f.OpenPdf(pdfFile);

            if (f.PageCount > 0)
            {
                // Rasterize all vector graphics
                f.ImageExtractionOptions.RasterizeComplexGraphics = true;

                pdfImages = f.ExtractImages();

                // Save each extracted image as a PNG file.
                if (pdfImages != null && pdfImages.Count > 0)
                {
                    for (int i = 0; i < pdfImages.Count; i++)
                    {
                        string imageFile = Path.Combine(imageDir, String.Format("img{0}.png", i + 1));

                        // Dispose the stream so the file is flushed and closed after encoding.
                        using (FileStream stream = new FileStream(imageFile, FileMode.Create))
                        {
                            pdfImages[i].Picture.Encode(stream, SkiaSharp.SKEncodedImageFormat.Png, 100);
                        }
                    }

                    // Open the output folder in the default file manager.
                    System.Diagnostics.Process.Start(new System.Diagnostics.ProcessStartInfo(imageDir) { UseShellExecute = true });
                }
            }
        }
    }
}

Complete code (VB.NET)

Imports System
Imports System.IO
Imports System.Collections.Generic
Imports SautinSoft

Namespace Sample
	Friend Class Sample
		Shared Sub Main(ByVal args() As String)
			' Before starting, we recommend getting a free key:
			' https://sautinsoft.com/start-for-free/

			' Apply the key here:
			' SautinSoft.PdfFocus.SetLicense("...")

			' Extract all images from PDF
			Dim f As New SautinSoft.PdfFocus()

			Dim pdfFile As String = Path.GetFullPath("..\..\..\simple text.pdf")
			Dim imageDir As String = (New DirectoryInfo(Directory.GetCurrentDirectory())).CreateSubdirectory("images").FullName

			Dim pdfImages As List(Of PdfFocus.PdfImage) = Nothing

			f.OpenPdf(pdfFile)

			If f.PageCount > 0 Then
				' Rasterize all vector graphics
				f.ImageExtractionOptions.RasterizeComplexGraphics = True

				pdfImages = f.ExtractImages()

				' Save each extracted image as a PNG file.
				If pdfImages IsNot Nothing AndAlso pdfImages.Count > 0 Then

					For i As Integer = 0 To pdfImages.Count - 1
						Dim imageFile As String = Path.Combine(imageDir, String.Format("img{0}.png", i + 1))

						' Dispose the stream so the file is flushed and closed after encoding.
						Using stream As New FileStream(imageFile, FileMode.Create)
							pdfImages(i).Picture.Encode(stream, SkiaSharp.SKEncodedImageFormat.Png, 100)
						End Using
					Next i

					' Open the output folder in the default file manager.
					System.Diagnostics.Process.Start(New System.Diagnostics.ProcessStartInfo(imageDir) With {.UseShellExecute = True})
				End If
			End If
		End Sub
	End Class
End Namespace



If you need a new code example or have a question, email us at support@sautinsoft.com.



Questions and suggestions from you are always welcome!

We have been developing .NET components since 2002. We know the PDF, DOCX, RTF, HTML, XLSX, and image formats. If you need any assistance with creating, modifying, or converting documents in various formats, we can help you. We will write any code example for you free of charge.