Extracting Embedded Files from PDFs in C# and .NET

PDF files often contain embedded files (attachments) such as images, documents, or other resources, which can be extracted for further use. With the SautinSoft PDF.NET library, developers can extract these embedded files in C# and .NET and then access and process their content programmatically.

Embedded files in PDFs can include important resources such as:
  • Attachments (e.g., Word documents, Excel sheets, or images).
  • Metadata or additional data for processing.
  • Supporting files for specific workflows.

Step-by-step guide:

  1. Add the SautinSoft.PDF package from NuGet.
  2. Load the PDF document.
  3. Create a new ZipArchive over an output file stream, leaving the stream open (leaveOpen: true).
  4. Iterate over all the files embedded in the PDF document.
  5. For each file, use its description or its name as the relative path of the zip archive entry.
  6. Create an entry in the zip archive.
  7. Copy the contents of the embedded file to the zip archive entry.
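The zip-related steps (3, 6 and 7) rely only on System.IO.Compression, so they can be tried without a PDF. The following is a minimal sketch, not the library's own code: it assumes in-memory byte arrays stand in for the embedded file streams that the complete example obtains from embeddedFile.OpenRead(), and the ZipSketch class with its WriteEntries and ReadEntryNames methods is a hypothetical name used only for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: writes in-memory "files" into a zip archive the same way
// the complete example writes embedded files (create an entry, then copy a stream into it).
static class ZipSketch
{
    public static byte[] WriteEntries(params (string Name, byte[] Content)[] files)
    {
        using (var archiveStream = new MemoryStream())
        {
            // leaveOpen: true keeps archiveStream usable after the archive is disposed.
            using (var archive = new ZipArchive(archiveStream, ZipArchiveMode.Create, leaveOpen: true))
            {
                foreach (var (name, content) in files)
                {
                    // Each byte array stands in for embeddedFile.OpenRead() in the real example.
                    var entry = archive.CreateEntry(name, CompressionLevel.Optimal);
                    using (var source = new MemoryStream(content))
                    using (var entryStream = entry.Open())
                        source.CopyTo(entryStream);
                }
            } // Disposing the archive writes the zip central directory.

            return archiveStream.ToArray();
        }
    }

    // Reads the entry paths back, in the order they were written.
    public static string[] ReadEntryNames(byte[] zipBytes)
    {
        using (var archive = new ZipArchive(new MemoryStream(zipBytes), ZipArchiveMode.Read))
        {
            var names = new string[archive.Entries.Count];
            for (int i = 0; i < names.Length; i++)
                names[i] = archive.Entries[i].FullName;
            return names;
        }
    }
}

class Program
{
    static void Main()
    {
        byte[] zip = ZipSketch.WriteEntries(
            ("Attachments/report.txt", Encoding.UTF8.GetBytes("Quarterly report")),
            ("data.csv", Encoding.UTF8.GetBytes("a,b\n1,2")));

        foreach (string name in ZipSketch.ReadEntryNames(zip))
            Console.WriteLine(name);
    }
}
```

Running it prints the two entry paths in the order they were added. The sketch uses CompressionLevel.Optimal for every entry for simplicity; the complete example instead chooses NoCompression for files that were not smaller in compressed form inside the PDF.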

Input file: Embedded Files.pdf

Output result: Embedded Files.zip

Complete code (C#)

using System;
using System.IO;
using System.IO.Compression;
using SautinSoft.Pdf;

class Program
{
    /// <summary>
    /// Extracts all files embedded in a PDF document into a zip archive.
    /// </summary>
    /// <remarks>
    /// Details: https://sautinsoft.com/products/pdf/help/net/developer-guide/extract-embedded-files.php
    /// </remarks>
    static void Main()
    {
        // Before starting this example, please get a free 100-day trial key:
        // https://sautinsoft.com/start-for-free/

        // Apply the key here:
        // PdfDocument.SetLicense("...");

        // Add all files embedded in the PDF document to a zip archive.
        using (var document = PdfDocument.Load(Path.GetFullPath(@"..\..\..\Embedded Files.pdf")))
        using (var archiveStream = File.Create("Embedded Files.zip"))
        using (var archive = new ZipArchive(archiveStream, ZipArchiveMode.Create, leaveOpen: true))
            foreach (var keyFilePair in document.EmbeddedFiles)
            {
                var fileSpecification = keyFilePair.Value;

                // Use the description or the name as the relative path of the entry in the zip archive.
                var entryFullName = fileSpecification.Description;
                if (entryFullName == null || !entryFullName.EndsWith(fileSpecification.Name, StringComparison.Ordinal))
                    entryFullName = fileSpecification.Name;

                var embeddedFile = fileSpecification.EmbeddedFile;

                // Create the zip archive entry.
                // Compress the entry if the embedded file's uncompressed size is unknown, or if the file was smaller in compressed form inside the PDF.
                bool compress = embeddedFile.Size == null || embeddedFile.CompressedSize < embeddedFile.Size.GetValueOrDefault();
                var entry = archive.CreateEntry(entryFullName, compress ? CompressionLevel.Optimal : CompressionLevel.NoCompression);

                // Set the modification date, if it is specified in the embedded file.
                var modificationDate = embeddedFile.ModificationDate;
                if (modificationDate != null)
                    entry.LastWriteTime = modificationDate.GetValueOrDefault();

                // Copy embedded file contents to the zip archive entry.
                using (var embeddedFileStream = embeddedFile.OpenRead())
                using (var entryStream = entry.Open())
                    embeddedFileStream.CopyTo(entryStream);
            }

        System.Diagnostics.Process.Start(new System.Diagnostics.ProcessStartInfo("Embedded Files.zip") { UseShellExecute = true });
    }
}

Complete code (VB.NET)

Option Infer On

Imports System
Imports System.IO
Imports System.IO.Compression
Imports SautinSoft.Pdf

Friend Class Program
	''' <summary>
	''' Extracts all files embedded in a PDF document into a zip archive.
	''' </summary>
	''' <remarks>
	''' Details: https://sautinsoft.com/products/pdf/help/net/developer-guide/extract-embedded-files.php
	''' </remarks>
	Shared Sub Main()
		' Before starting this example, please get a free 100-day trial key:
		' https://sautinsoft.com/start-for-free/

		' Apply the key here:
		' PdfDocument.SetLicense("...")

		' Add all files embedded in the PDF document to a zip archive.
		Using document = PdfDocument.Load(Path.GetFullPath("..\..\..\Embedded Files.pdf"))
		Using archiveStream = File.Create("Embedded Files.zip")
		Using archive = New ZipArchive(archiveStream, ZipArchiveMode.Create, leaveOpen:= True)
			For Each keyFilePair In document.EmbeddedFiles
				Dim fileSpecification = keyFilePair.Value

				' Use the description or the name as the relative path of the entry in the zip archive.
				Dim entryFullName = fileSpecification.Description
				If entryFullName Is Nothing OrElse Not entryFullName.EndsWith(fileSpecification.Name, StringComparison.Ordinal) Then
					entryFullName = fileSpecification.Name
				End If

				Dim embeddedFile = fileSpecification.EmbeddedFile

				' Create the zip archive entry.
				' Compress the entry if the embedded file's uncompressed size is unknown, or if the file was smaller in compressed form inside the PDF.
				Dim compress As Boolean = embeddedFile.Size Is Nothing OrElse embeddedFile.CompressedSize < embeddedFile.Size.GetValueOrDefault()
				Dim entry = archive.CreateEntry(entryFullName, If(compress, CompressionLevel.Optimal, CompressionLevel.NoCompression))

				' Set the modification date, if it is specified in the embedded file.
				Dim modificationDate = embeddedFile.ModificationDate
				If modificationDate IsNot Nothing Then
					entry.LastWriteTime = modificationDate.GetValueOrDefault()
				End If

				' Copy embedded file contents to the zip archive entry.
				Using embeddedFileStream = embeddedFile.OpenRead()
				Using entryStream = entry.Open()
					embeddedFileStream.CopyTo(entryStream)
				End Using
				End Using
			Next keyFilePair
		End Using
		End Using
		End Using

		System.Diagnostics.Process.Start(New System.Diagnostics.ProcessStartInfo("Embedded Files.zip") With {.UseShellExecute = True})
	End Sub
End Class
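The entry-naming and compression rules used in both listings can be checked in isolation, without a PDF file. Below is a minimal sketch that restates them as pure functions; EntryRules, ChooseEntryName and ShouldCompress are hypothetical names, and modeling the sizes as long and long? is an assumption, since the exact property types of the SautinSoft embedded file object are not shown here.

```csharp
using System;

// Hypothetical helpers restating the rules used in the complete example.
static class EntryRules
{
    // Use the description as the entry path only when it ends with the file name
    // (e.g. "Attachments/report.docx" for "report.docx"); otherwise use the name.
    public static string ChooseEntryName(string description, string name)
    {
        if (description == null || !description.EndsWith(name, StringComparison.Ordinal))
            return name;
        return description;
    }

    // Compress when the uncompressed size is unknown, or when the file was smaller
    // in compressed form inside the PDF. Assumption: sizes modeled as long / long?.
    public static bool ShouldCompress(long compressedSize, long? size)
    {
        return size == null || compressedSize < size.GetValueOrDefault();
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(EntryRules.ChooseEntryName("Attachments/report.docx", "report.docx")); // Attachments/report.docx
        Console.WriteLine(EntryRules.ChooseEntryName("Quarterly report", "report.docx"));        // report.docx
        Console.WriteLine(EntryRules.ChooseEntryName(null, "image.png"));                        // image.png
        Console.WriteLine(EntryRules.ShouldCompress(800, 1000));   // True
        Console.WriteLine(EntryRules.ShouldCompress(1000, 1000));  // False
        Console.WriteLine(EntryRules.ShouldCompress(1000, null));  // True
    }
}
```

Keeping these decisions in small functions makes it easy to adjust the naming policy (for example, to sanitize paths) without touching the stream-copying code.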



If you need a new code example or have a question, email us at support@sautinsoft.com.



Questions and suggestions from you are always welcome!

We have been developing .NET components since 2002. We know PDF, DOCX, RTF, HTML, XLSX, and image formats. If you need any assistance with creating, modifying, or converting documents in various formats, we can help. We will write any code example for you absolutely free.