Extract all images and vector graphics from PDF in C# and .NET
PDF documents occupy an important place in modern document workflows: they are used for legal, financial, educational, and many other purposes. It is often necessary to extract images and graphic elements from PDF documents for later analysis, editing, or integration with other systems. In this article, we'll show you how to do this easily and efficiently using the SautinSoft PdfFocus .NET library.
Extracting graphic elements from a PDF is relevant in the following cases:
- Content analytics: analyzing the visual components of documents to determine styles, image types, or how they are used.
- Data migration: transferring images from PDF to other formats or file management systems.
- Editing and addition: adding new elements or reformatting existing ones.
- Archiving and optimization: extracting graphics to reduce document size or to integrate them into multimedia materials.
- Presentations and reports: reusing images from the original PDF.
The library is an indispensable tool for developers, system administrators, and analysts who need to process PDF graphic content at scale. The extracted images and vector graphic objects can be used for:
- Web and mobile applications where dynamic content display is required.
- Creation of multimedia presentations.
- Text analysis and recognition (for example, in the OCR process).
- Data processing for machine learning.
- Restoring objects from damaged PDF documents.
- Migration of data between systems.
Such solutions are used more and more widely as the volume of digital documents in companies and organizations grows. Tools that automate work with graphics in PDF are in demand in fintech, law, education, and marketing, because the ability to quickly extract and reuse graphics from PDF helps to increase the efficiency of business processes.
Step-by-Step:
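- Add SautinSoft.PdfFocus from NuGet (for example: dotnet add package SautinSoft.PdfFocus).
- Load a PDF document.
- Enable rasterization of vector graphics (RasterizeComplexGraphics).
- Extract the images.
- Save the extracted images as PNG files and open the output folder.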
Complete code (C#)
using System;
using System.IO;
using System.Collections.Generic;
using SautinSoft;

namespace Sample
{
    class Sample
    {
        static void Main(string[] args)
        {
            // Before starting, we recommend getting a free key:
            // https://sautinsoft.com/start-for-free/
            // Apply the key here:
            // SautinSoft.PdfFocus.SetLicense("...");

            // Extract all images from the PDF.
            SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();
            string pdfFile = Path.GetFullPath(@"..\..\..\simple text.pdf");
            string imageDir = new DirectoryInfo(Directory.GetCurrentDirectory()).CreateSubdirectory("images").FullName;
            List<PdfFocus.PdfImage> pdfImages = null;

            f.OpenPdf(pdfFile);
            if (f.PageCount > 0)
            {
                // Rasterize all vector graphics.
                f.ImageExtractionOptions.RasterizeComplexGraphics = true;
                pdfImages = f.ExtractImages();

                // Save every extracted image as a PNG file.
                if (pdfImages != null && pdfImages.Count > 0)
                {
                    for (int i = 0; i < pdfImages.Count; i++)
                    {
                        string imageFile = Path.Combine(imageDir, String.Format("img{0}.png", i + 1));
                        // Dispose the stream so that each file is flushed and closed.
                        using (FileStream fs = new FileStream(imageFile, FileMode.Create))
                        {
                            pdfImages[i].Picture.Encode(fs, SkiaSharp.SKEncodedImageFormat.Png, 100);
                        }
                    }
                    // Show the extracted images by opening the output folder.
                    System.Diagnostics.Process.Start(new System.Diagnostics.ProcessStartInfo(imageDir) { UseShellExecute = true });
                }
            }
        }
    }
}
Complete code (VB.NET)
Imports System
Imports System.IO
Imports System.Collections.Generic
Imports SautinSoft

Namespace Sample
    Friend Class Sample
        Shared Sub Main(ByVal args() As String)
            ' Before starting, we recommend getting a free key:
            ' https://sautinsoft.com/start-for-free/
            ' Apply the key here:
            ' SautinSoft.PdfFocus.SetLicense("...")

            ' Extract all images from the PDF.
            Dim f As New SautinSoft.PdfFocus()
            Dim pdfFile As String = Path.GetFullPath("..\..\..\simple text.pdf")
            Dim imageDir As String = (New DirectoryInfo(Directory.GetCurrentDirectory())).CreateSubdirectory("images").FullName
            Dim pdfImages As List(Of PdfFocus.PdfImage) = Nothing

            f.OpenPdf(pdfFile)
            If f.PageCount > 0 Then
                ' Rasterize all vector graphics.
                f.ImageExtractionOptions.RasterizeComplexGraphics = True
                pdfImages = f.ExtractImages()

                ' Save every extracted image as a PNG file.
                If pdfImages IsNot Nothing AndAlso pdfImages.Count > 0 Then
                    For i As Integer = 0 To pdfImages.Count - 1
                        Dim imageFile As String = Path.Combine(imageDir, String.Format("img{0}.png", i + 1))
                        ' Dispose the stream so that each file is flushed and closed.
                        Using fs As New FileStream(imageFile, FileMode.Create)
                            pdfImages(i).Picture.Encode(fs, SkiaSharp.SKEncodedImageFormat.Png, 100)
                        End Using
                    Next i
                    ' Show the extracted images by opening the output folder.
                    System.Diagnostics.Process.Start(New System.Diagnostics.ProcessStartInfo(imageDir) With {.UseShellExecute = True})
                End If
            End If
        End Sub
    End Class
End Namespace
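The sample above handles a single file. When many PDFs have to be processed, the same calls can be wrapped in a loop. The following is only a minimal sketch under stated assumptions: it uses just the PdfFocus members that appear in the sample above, while the folder names "input" and "output" and the helpers BuildImagePath and ExtractFromFile are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using SautinSoft;

static class BatchExtract
{
    // Hypothetical helper: builds an output path such as "report_img2.png"
    // inside outputDir, so images from different PDFs do not overwrite each other.
    public static string BuildImagePath(string outputDir, string pdfFile, int index)
    {
        string baseName = Path.GetFileNameWithoutExtension(pdfFile);
        return Path.Combine(outputDir, $"{baseName}_img{index}.png");
    }

    // Hypothetical helper: extracts images from one PDF with the same calls
    // as the sample above and returns the number of files written.
    public static int ExtractFromFile(string pdfFile, string outputDir)
    {
        PdfFocus f = new PdfFocus();
        f.OpenPdf(pdfFile);
        if (f.PageCount <= 0)
            return 0; // Same guard as in the sample above.

        f.ImageExtractionOptions.RasterizeComplexGraphics = true;
        List<PdfFocus.PdfImage> images = f.ExtractImages();
        if (images == null)
            return 0;

        for (int i = 0; i < images.Count; i++)
        {
            string imageFile = BuildImagePath(outputDir, pdfFile, i + 1);
            // Dispose the stream so that each file is flushed and closed.
            using (FileStream fs = new FileStream(imageFile, FileMode.Create))
            {
                images[i].Picture.Encode(fs, SkiaSharp.SKEncodedImageFormat.Png, 100);
            }
        }
        return images.Count;
    }

    static void Main(string[] args)
    {
        // Assumed folder layout; adjust it to your project.
        string inputDir = Path.GetFullPath("input");
        if (!Directory.Exists(inputDir))
        {
            Console.WriteLine("Input folder not found: " + inputDir);
            return;
        }
        string outputDir = Directory.CreateDirectory("output").FullName;

        foreach (string pdfFile in Directory.GetFiles(inputDir, "*.pdf"))
        {
            int count = ExtractFromFile(pdfFile, outputDir);
            Console.WriteLine($"{Path.GetFileName(pdfFile)}: {count} image(s)");
        }
    }
}
```

A new PdfFocus instance is created for each file to keep the sketch simple and to avoid relying on members not shown in the sample; whether one instance can be reused across files is not covered here.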
If you need a new code example or have a question, email us at support@sautinsoft.com.