How to load a PDF document in C# and VB.Net

  1. Load from a file
    
    DocumentCore dc = DocumentCore.Load(@"d:\Book.pdf");
    
    The dc object represents a document loaded into memory. The file format is detected automatically from the file extension: ".pdf".

    To avoid relying on the file extension and to make sure the contents are loaded as PDF, you can pass a PdfLoadOptions parameter.

    
    DocumentCore dc = DocumentCore.Load(@"d:\Book.pdf", new PdfLoadOptions());
    

    PdfLoadOptions
    Specifies that a document is loaded in Adobe Portable Document Format (PDF) and lets you set the loading properties.
    All properties are already set to convenient defaults, but you may change any of them:

    Property | Type | Default | Description
    ---------|------|---------|------------
    ConversionMode | enum {Flowing, Continuous, Exact} | Flowing | The mode used to load a PDF document: Flowing, Continuous or Exact.
    PageIndex | int | 0 | Zero-based index of the first page to load.
    PageCount | int | int.MaxValue | The number of pages to load.
    SelectedPages | int[] | int[0] | Array of page numbers to load (zero-based).
    Password | string | string.Empty | Password used to open a protected / encrypted document.
    KeepCharScaleAndSpacing | bool | true | Whether to keep the original character scaling and spacing or reset it to 100%.
    DetectTables | bool | true | Whether to recreate real tables or leave them as graphical lines. Note that the PDF format has no concept of a table; a table is just a set of graphic lines. «Document .Net» has an intelligent algorithm to recognize tables.
    OptimizeImages | bool | true | Whether to merge adjacent images into one.
    ShapeAnchoring | bool | true | Whether shape coordinates are anchored relative to the paragraph. If false, the coordinates are anchored to the page.
    RasterizeVectorGraphics | bool | true | Whether to rasterize vector graphics or leave them as is.
    ShowInvisibleText | bool | false | Whether to load invisible text or skip it. Some PDF documents contain text as a picture with invisible text on top of it. In that case, we recommend setting this option to 'true' and PreserveImages to 'false'.
    PreserveImages | bool | true | Whether to load images from the PDF or skip them. When you work only with textual data, skipping images can significantly reduce loading time and memory usage.
    PreserveGraphics | bool | true | Whether to load vector graphics from the PDF or skip them. As with images, skipping graphics can significantly reduce loading time and memory usage when you work only with textual data.
    PreserveEmbeddedFonts | bool | false | Whether to load embedded fonts from the PDF and store them in DocumentCore, or to select and use similar TTF fonts from the environment.
    
                // Load only the first page of the PDF.
                PdfLoadOptions plo = new PdfLoadOptions()
                {
                    PageIndex = 0,
                    PageCount = 1
                };
                DocumentCore dc = DocumentCore.Load(@"d:\Book.pdf", plo);
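
    The other properties are set the same way. As a rough sketch, the options below skip images and vector graphics for text-only processing, following the descriptions of PreserveImages, PreserveGraphics and ShowInvisibleText. It assumes the SautinSoft.Document package is referenced and uses only properties named on this page; the file path is a placeholder and the class name is made up for illustration.

```csharp
using System;
using SautinSoft.Document;

class TextOnlyLoadExample
{
    static void Main()
    {
        // Placeholder path: point this at a real PDF file.
        string filePath = @"d:\Book.pdf";

        // Skip images and vector graphics, and load invisible text,
        // as recommended on this page for PDFs that store text as a picture
        // with an invisible text layer on top of it.
        PdfLoadOptions options = new PdfLoadOptions()
        {
            PreserveImages = false,
            PreserveGraphics = false,
            ShowInvisibleText = true
        };

        DocumentCore dc = DocumentCore.Load(filePath, options);
        Console.WriteLine("Loaded: " + (dc != null));
    }
}
```

    Whether this actually saves time and memory depends on how many images and graphics the PDF contains.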
    
  2. Load from a Stream
    
                // Suppose we already have a PDF document as an array of bytes.
                byte[] pdfBytes = null;
                // pdfBytes = ...
    
                DocumentCore dc = null;
                using (MemoryStream pdfStream = new MemoryStream(pdfBytes))
                {
                    dc = DocumentCore.Load(pdfStream, new PdfLoadOptions());
                }
                // Here we can do anything we need with the document 'dc'.
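
    A stream has no file extension, so it can be useful to check that the bytes look like a PDF before handing them to the loader. The sketch below uses only standard .NET classes and tests for the "%PDF-" marker that a PDF file starts with. PdfSignature and LooksLikePdf are hypothetical names chosen for illustration and are not part of «Document .Net».

```csharp
using System;
using System.IO;
using System.Text;

static class PdfSignature
{
    // A PDF file begins with the ASCII marker "%PDF-" followed by a version number.
    private static readonly byte[] Marker = Encoding.ASCII.GetBytes("%PDF-");

    // Returns true if the stream starts with "%PDF-" at its current position.
    // The position is restored afterwards, so the stream must be seekable.
    public static bool LooksLikePdf(Stream stream)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));
        if (!stream.CanSeek) throw new ArgumentException("Stream must be seekable.", nameof(stream));

        long start = stream.Position;
        try
        {
            byte[] head = new byte[Marker.Length];
            int read = 0;
            // Stream.Read may return fewer bytes than requested, so loop until full or end of stream.
            while (read < head.Length)
            {
                int n = stream.Read(head, read, head.Length - read);
                if (n == 0) break;
                read += n;
            }
            if (read < Marker.Length) return false;
            for (int i = 0; i < Marker.Length; i++)
            {
                if (head[i] != Marker[i]) return false;
            }
            return true;
        }
        finally
        {
            stream.Position = start;
        }
    }
}

class SignatureDemo
{
    static void Main()
    {
        byte[] pdfLike = Encoding.ASCII.GetBytes("%PDF-1.7\n");
        byte[] notPdf = Encoding.ASCII.GetBytes("Hello, world");

        using (MemoryStream ms = new MemoryStream(pdfLike))
            Console.WriteLine(PdfSignature.LooksLikePdf(ms)); // True
        using (MemoryStream ms = new MemoryStream(notPdf))
            Console.WriteLine(PdfSignature.LooksLikePdf(ms)); // False
    }
}
```

    This is a strict check of the first five bytes only. It does not validate the rest of the file, and it rejects files that carry extra bytes before the marker.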
 

Complete code

using System;
using System.IO;
using SautinSoft.Document;

namespace Example
{
    class Program
    {
        static void Main(string[] args)
        {
            LoadPDFFromFile();
            //LoadPDFFromStream();
        }
        // From a file
        static void LoadPDFFromFile()
        {
            string filePath = @"d:\Book.pdf";
            // The file format is detected automatically from the file extension: ".pdf".
            // As the stream example below shows, we can also pass PdfLoadOptions as the second parameter
            // to state explicitly that the document being loaded is PDF.
            DocumentCore dc = DocumentCore.Load(filePath);
        }
        static void LoadPDFFromStream()
        {
            // Get document bytes.
            byte[] fileBytes = File.ReadAllBytes(@"d:\Book.pdf");

            DocumentCore dc = null;
            // Create a MemoryStream
            using (MemoryStream ms = new MemoryStream(fileBytes))
            {
                // Passing PdfLoadOptions states explicitly that the document being loaded is PDF.
                // Here we also load only the first page and switch off
                // 'OptimizeImages' so that adjacent images are not merged into one.
                PdfLoadOptions pdfLO = new PdfLoadOptions()
                {
                    PageIndex = 0,
                    PageCount = 1,
                    OptimizeImages = false
                };

                // Load a PDF document from the MemoryStream using the options above.
                dc = DocumentCore.Load(ms, pdfLO);
            }
        }
    }
}
        
Imports System
Imports System.IO
Imports SautinSoft.Document

Module ExampleVB

    Sub Main()
        LoadPDFFromFile()
        'LoadPDFFromStream()
    End Sub
    ' From a file
    Public Sub LoadPDFFromFile()
        Dim filePath As String = "d:\Book.pdf"
        ' The file format is detected automatically from the file extension: ".pdf".
        ' As the stream example below shows, we can also pass PdfLoadOptions as the second parameter
        ' to state explicitly that the document being loaded is PDF.
        Dim dc As DocumentCore = DocumentCore.Load(filePath)
    End Sub
    Public Sub LoadPDFFromStream()
        ' Get document bytes.
        Dim fileBytes() As Byte = File.ReadAllBytes("d:\Book.pdf")

        Dim dc As DocumentCore = Nothing
        ' Create a MemoryStream
        Using ms As New MemoryStream(fileBytes)
            ' Passing PdfLoadOptions states explicitly that the document being loaded is PDF.
            ' Here we also load only the first page and switch off
            ' 'OptimizeImages' so that adjacent images are not merged into one.
            Dim pdfLO As New PdfLoadOptions() With {
                .PageIndex = 0,
                .PageCount = 1,
                .OptimizeImages = False
            }

            ' Load a PDF document from the MemoryStream using the options above.
            dc = DocumentCore.Load(ms, pdfLO)
        End Using
    End Sub
End Module
© SautinSoft 2017