PDF Attachment Annotations Extraction in C# and .NET

PDF files often contain annotations, including file attachment annotations, which embed additional files in the document. Extracting these attachments programmatically is useful for automating workflows, pulling out embedded data, or processing documents in bulk. This article shows how to extract file attachment annotations from PDF files in C# and .NET using the SautinSoft.PDF library.

PDF attachment annotations allow users to insert files such as images, documents, or other data. Extracting these attachments programmatically can help in scenarios such as:

  • Archiving of embedded files.
  • Document content analysis.
  • Automation of data extraction processes.

Below is an example of how this can be done:

  1. Add SautinSoft.PDF from NuGet.
  2. Load the PDF document.
  3. Add all files from the file attachment annotations located on the first page to a zip archive.
  4. Save the archive as "File Attachment Annotation Files.zip".

Complete code (C#)

using System;
using System.IO;
using System.IO.Compression;
using SautinSoft.Pdf.Annotations;
using SautinSoft.Pdf;

class Program
{
    /// <summary>
    /// Extracts the files from file attachment annotations into a zip archive.
    /// </summary>
    /// <remarks>
    /// Details: https://sautinsoft.com/products/pdf/help/net/developer-guide/extract-attachment-annotations.php
    /// </remarks>
    static void Main()
    {
        // Before starting this example, please get a free trial key:
        // https://sautinsoft.com/start-for-free/

        // Apply the key here:
        // PdfDocument.SetLicense("...");

        // Add to zip archive all files from file attachment annotations located on the first page.
        using (var document = PdfDocument.Load(Path.GetFullPath(@"..\..\..\File Attachment Annotations.pdf")))
        using (var archiveStream = File.Create("File Attachment Annotation Files.zip"))
        using (var archive = new ZipArchive(archiveStream, ZipArchiveMode.Create, leaveOpen: true))
            foreach (var annotation in document.Pages[0].Annotations)
                if (annotation.AnnotationType == PdfAnnotationType.FileAttachment)
                {
                    var fileAttachmentAnnotation = (PdfFileAttachmentAnnotation)annotation;

                    var fileSpecification = fileAttachmentAnnotation.File;

                    // Use the description or the file name as the relative path of the entry in the zip archive.
                    var entryFullName = fileAttachmentAnnotation.Description;
                    if (entryFullName == null || !entryFullName.EndsWith(fileSpecification.Name, StringComparison.Ordinal))
                        entryFullName = fileSpecification.Name;

                    var embeddedFile = fileSpecification.EmbeddedFile;

                    // Create zip archive entry.
                    // Zip archive entry is compressed if the embedded file's compressed size is less than its uncompressed size.
                    bool compress = embeddedFile.Size == null || embeddedFile.CompressedSize < embeddedFile.Size.GetValueOrDefault();
                    var entry = archive.CreateEntry(entryFullName, compress ? CompressionLevel.Optimal : CompressionLevel.NoCompression);

                    // Set the modification date, if it is specified in the embedded file.
                    var modificationDate = embeddedFile.ModificationDate;
                    if (modificationDate != null)
                        entry.LastWriteTime = modificationDate.GetValueOrDefault();

                    // Copy embedded file contents to the zip archive entry.
                    using (var embeddedFileStream = embeddedFile.OpenRead())
                    using (var entryStream = entry.Open())
                        embeddedFileStream.CopyTo(entryStream);
                }

        System.Diagnostics.Process.Start(new System.Diagnostics.ProcessStartInfo("File Attachment Annotation Files.zip") { UseShellExecute = true });
    }
}
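
The listing above uses the annotation's description or file name directly as the zip entry path. Both values come from the PDF, so they may contain backslashes, ".." segments, or names that repeat across annotations, and ZipArchive.CreateEntry does not reject duplicates. The following is a minimal sketch of one way to normalize such names before calling CreateEntry. ZipEntryNames and MakeSafe are hypothetical helpers written for this illustration; they are not part of SautinSoft.PDF, and the " (2)" suffix scheme and the "attachment" fallback name are arbitrary choices.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of SautinSoft.PDF): turns an untrusted name
// taken from a PDF into a relative, unique zip entry name.
static class ZipEntryNames
{
    public static string MakeSafe(string candidate, ISet<string> used)
    {
        // Normalize separators, then drop empty, ".", ".." and drive-like
        // segments so the entry cannot point outside the extraction folder.
        var segments = (candidate ?? string.Empty)
            .Replace('\\', '/')
            .Split('/')
            .Select(s => s.Trim())
            .Where(s => s.Length > 0 && s != "." && s != ".."
                        && !s.EndsWith(":", StringComparison.Ordinal))
            .ToArray();

        // Fall back to a fixed name when nothing usable is left.
        var name = segments.Length > 0 ? string.Join("/", segments) : "attachment";

        // Find where the extension of the last segment starts (if any).
        int slash = name.LastIndexOf('/');
        int dot = name.LastIndexOf('.');
        int cut = dot > slash + 1 ? dot : name.Length;

        // Insert " (2)", " (3)", ... before the extension until the name is unused.
        var unique = name;
        for (var i = 2; !used.Add(unique); i++)
            unique = name.Substring(0, cut) + " (" + i + ")" + name.Substring(cut);

        return unique;
    }
}

class Program
{
    static void Main()
    {
        // Case-insensitive set, because the archive may be unpacked on Windows.
        var used = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

        Console.WriteLine(ZipEntryNames.MakeSafe(@"..\..\reports\q1.pdf", used)); // reports/q1.pdf
        Console.WriteLine(ZipEntryNames.MakeSafe("reports/Q1.pdf", used));        // reports/Q1 (2).pdf
        Console.WriteLine(ZipEntryNames.MakeSafe(null, used));                    // attachment
    }
}
```

In the listing above, the result of MakeSafe(entryFullName, used) would be passed to archive.CreateEntry in place of entryFullName, with one set shared across all annotations.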

Complete code (VB.NET)

Option Infer On

Imports System
Imports System.IO
Imports System.IO.Compression
Imports SautinSoft.Pdf.Annotations
Imports SautinSoft.Pdf

Friend Class Program
	''' <summary>
	''' Extracts the files from file attachment annotations into a zip archive.
	''' </summary>
	''' <remarks>
	''' Details: https://sautinsoft.com/products/pdf/help/net/developer-guide/extract-attachment-annotations.php
	''' </remarks>
	Shared Sub Main()
		' Before starting this example, please get a free trial key:
		' https://sautinsoft.com/start-for-free/

		' Apply the key here:
		' PdfDocument.SetLicense("...")

		' Add to zip archive all files from file attachment annotations located on the first page.
		Using document = PdfDocument.Load(Path.GetFullPath("..\..\..\File Attachment Annotations.pdf"))
		Using archiveStream = File.Create("File Attachment Annotation Files.zip")
		Using archive = New ZipArchive(archiveStream, ZipArchiveMode.Create, leaveOpen:= True)
			For Each annotation In document.Pages(0).Annotations
				If annotation.AnnotationType = PdfAnnotationType.FileAttachment Then
					Dim fileAttachmentAnnotation = CType(annotation, PdfFileAttachmentAnnotation)

					Dim fileSpecification = fileAttachmentAnnotation.File

					' Use the description or the file name as the relative path of the entry in the zip archive.
					Dim entryFullName = fileAttachmentAnnotation.Description
					If entryFullName Is Nothing OrElse Not entryFullName.EndsWith(fileSpecification.Name, StringComparison.Ordinal) Then
						entryFullName = fileSpecification.Name
					End If

					Dim embeddedFile = fileSpecification.EmbeddedFile

					' Create zip archive entry.
					' Zip archive entry is compressed if the embedded file's compressed size is less than its uncompressed size.
					Dim compress As Boolean = embeddedFile.Size Is Nothing OrElse embeddedFile.CompressedSize < embeddedFile.Size.GetValueOrDefault()
					Dim entry = archive.CreateEntry(entryFullName, If(compress, CompressionLevel.Optimal, CompressionLevel.NoCompression))

					' Set the modification date, if it is specified in the embedded file.
					Dim modificationDate = embeddedFile.ModificationDate
					If modificationDate IsNot Nothing Then
						entry.LastWriteTime = modificationDate.GetValueOrDefault()
					End If

					' Copy embedded file contents to the zip archive entry.
					Using embeddedFileStream = embeddedFile.OpenRead()
					Using entryStream = entry.Open()
						embeddedFileStream.CopyTo(entryStream)
					End Using
					End Using
				End If
			Next annotation
		End Using
		End Using
		End Using

		System.Diagnostics.Process.Start(New System.Diagnostics.ProcessStartInfo("File Attachment Annotation Files.zip") With {.UseShellExecute = True})
	End Sub
End Class
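
Both listings assign the embedded file's modification date straight to entry.LastWriteTime. The .NET documentation for ZipArchiveEntry.LastWriteTime states that the setter throws ArgumentOutOfRangeException for values earlier than 1980-01-01 00:00:00 or later than 2107-12-31 23:59:59, and a date stored in a PDF is not guaranteed to fall in that range. The C# sketch below shows one possible guard. ZipDates.Clamp is a hypothetical helper, not part of SautinSoft.PDF, and the sketch assumes the modification date is available as (or converts to) a DateTimeOffset, as the assignment in the listings suggests.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper (not part of SautinSoft.PDF): keeps a timestamp inside
// the range documented for ZipArchiveEntry.LastWriteTime.
static class ZipDates
{
    static readonly DateTime Min = new DateTime(1980, 1, 1, 0, 0, 0);
    static readonly DateTime Max = new DateTime(2107, 12, 31, 23, 59, 59);

    public static DateTimeOffset Clamp(DateTimeOffset value)
    {
        // Compare the clock time and keep the original UTC offset.
        if (value.DateTime < Min) return new DateTimeOffset(Min, value.Offset);
        if (value.DateTime > Max) return new DateTimeOffset(Max, value.Offset);
        return value;
    }
}

class Program
{
    static void Main()
    {
        // Stand-in for a modification date read from an embedded file.
        var fromPdf = new DateTimeOffset(1970, 1, 1, 0, 0, 0, TimeSpan.Zero);

        Console.WriteLine(ZipDates.Clamp(fromPdf).Year); // 1980

        using (var stream = new MemoryStream())
        using (var archive = new ZipArchive(stream, ZipArchiveMode.Create, leaveOpen: true))
        {
            var entry = archive.CreateEntry("example.txt");

            // Assigning fromPdf directly would throw; the clamped value is accepted.
            entry.LastWriteTime = ZipDates.Clamp(fromPdf);
        }
    }
}
```

Clamping silently changes the timestamp. Skipping the assignment for out-of-range dates, so the entry keeps its default write time, is an equally reasonable choice.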



If you need a new code example or have a question, email us at support@sautinsoft.com. Questions and suggestions are always welcome!

We have been developing .NET components since 2002. We know the PDF, DOCX, RTF, HTML, XLSX and image formats. If you need any assistance with creating, modifying or converting documents in various formats, we can help you. We will write any code example for you absolutely free.