PDF Linearization

"Fast Web View", or linearization, is a way to optimize PDFs so they can be streamed to client applications much like a YouTube video. This allows remote, online documents to be opened almost instantly, eliminating the need to wait minutes or hours for large documents to download in full.

Linearization is especially useful when large documents are accessed from remote URLs by browser, mobile, desktop, or server applications.

PDF .Net supports Linearized PDF, and it is the first to support PDF linearization within a browser viewer (i.e., WebViewer). It is also a simple matter to create linearized documents using our cross-platform PDF SDK.

The following article describes linearization in detail.


What is PDF Linearization?

  • Linearization is a way of organizing a PDF file so that any software can open it efficiently over a network.
  • Pages are served, or "streamed", by the web server in response to byte-range requests from the client.

When Should I Use Linearization?

Any developer working with large, network-bound documents should consider using linearization. Here’s why:

We’ve found that linearization enables large PDFs to open in 7 seconds on average over a 4G connection. And while the open time increases when a document has a very large and complex first page, most documents benefit from linearization so long as they have at least a few pages.


How Linearization Works - Fast Random Access via On-demand Streaming of Pages

Linearization, introduced with PDF 1.2, has a 20+ page appendix dedicated to it in the core PDF reference.

But if you want a quicker explanation, read on.

Linearization works by altering the internal structure of a PDF file in a way that allows for fast on-demand streaming of partial content.

Simply put, each PDF is an object tree that starts at the root node and branches out from there. Pages can refer to other objects hanging from that tree by object number. In non-linearized PDFs, these objects (embedded fonts, for example) are often scattered throughout the file. And because there is no quick way to identify and retrieve the resources for a particular page, traditional viewers must download the entire document before opening it.

In contrast, a linearized PDF is reorganized so that page resources are logically grouped according to the page order of the document (hence the term "linearized"). A linearization dictionary and "hint tables" are also added near the start of the file. These serve as an inventory that specifies the location of the objects needed to render any given page, essentially allowing random online access to any page.
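As a rough illustration of how a client might recognize such a file, the sketch below scans the first 1024 bytes for the "/Linearized" key, since the PDF specification requires the linearization dictionary to sit within that range. The class and method names are hypothetical and not part of any SDK. The check is only a heuristic: it does not parse the dictionary or verify that the file is still validly linearized.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, for illustration only.
static class LinearizationCheck
{
    // Returns true when the "/Linearized" key appears in the first 1024 bytes of the stream.
    public static bool LooksLinearized(Stream pdf)
    {
        byte[] head = new byte[1024];
        int total = 0;
        int read;

        // Stream.Read may return fewer bytes than requested, so loop until
        // the buffer is full or the stream ends.
        while (total < head.Length &&
               (read = pdf.Read(head, total, head.Length - total)) > 0)
        {
            total += read;
        }

        // Decode as ASCII; binary bytes become '?' and do not affect the search.
        string text = Encoding.ASCII.GetString(head, 0, total);
        return text.Contains("/Linearized");
    }
}
```

A caller could pass a FileStream opened on a local file, or a stream over the first bytes received from a server.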

Systems that use linearization typically convert documents to linearized PDF upon upload.

Viewers designed to handle linearized content can request parts of a linearized PDF from a web server via its URL. The server returns each requested byte range as a contiguous "chunk" of the PDF binary.
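The request for one such chunk can be sketched with the standard .NET HttpClient and an HTTP Range header. This is a minimal sketch, not how any particular viewer is implemented: the RangeFetcher class, its method names, and the example URL are assumptions made for illustration, and the server is assumed to support range requests (it answers with status 206 Partial Content).

```csharp
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
static class RangeFetcher
{
    // Builds a GET request that asks only for bytes firstByte..lastByte (inclusive).
    public static HttpRequestMessage BuildRangeRequest(string url, long firstByte, long lastByte)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.Range = new RangeHeaderValue(firstByte, lastByte);
        return request;
    }

    // Sends the range request and returns the chunk of the PDF binary.
    public static async Task<byte[]> FetchChunkAsync(HttpClient client, string url, long firstByte, long lastByte)
    {
        using var request = BuildRangeRequest(url, firstByte, lastByte);
        using var response = await client.SendAsync(request);

        // A server that ignores the Range header replies 200 with the whole file.
        if (response.StatusCode != HttpStatusCode.PartialContent)
            throw new HttpRequestException("Server did not honor the Range header.");

        return await response.Content.ReadAsByteArrayAsync();
    }
}
```

For example, FetchChunkAsync(client, "https://example.com/large.pdf", 0, 1023) would ask for the first 1024 bytes of a placeholder document.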

If the viewer detects linearization, it will stop downloading after receiving the hint table and the first page. The remaining content chunks are prioritized based on how the user navigates. For example, if a user skips to page 475 in a 1000-page document, the viewer may request resources for page 475 and surrounding pages, which will be downloaded first.
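One simple way to express this kind of prioritization is to order pages by their distance from the page the user jumped to. The sketch below is an assumption about how such a policy could look, not a description of any specific viewer. PagePrefetch, Order, and the radius parameter are hypothetical names.

```csharp
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
static class PagePrefetch
{
    // Returns 1-based page numbers in download order: the current page first,
    // then its neighbours at increasing distance, clipped to the document bounds.
    public static List<int> Order(int currentPage, int pageCount, int radius)
    {
        var order = new List<int>();
        if (currentPage < 1 || currentPage > pageCount)
            return order; // nothing to fetch for an out-of-range page

        order.Add(currentPage);
        for (int distance = 1; distance <= radius; distance++)
        {
            // Prefer the following page over the preceding one at equal distance.
            if (currentPage + distance <= pageCount) order.Add(currentPage + distance);
            if (currentPage - distance >= 1) order.Add(currentPage - distance);
        }
        return order;
    }
}
```

With the example from the text, PagePrefetch.Order(475, 1000, 2) yields pages 475, 476, 474, 477, 473, in that order.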

The remainder of the document is gradually downloaded and rendered as the user session continues. In addition, pages that are no longer needed can easily be released from memory.


Linearization and PDF .Net

Linearization therefore provides a much faster online experience overall. There are also several other advantages when working with remote documents:

- Linearization makes the viewing experience more resilient to network interruptions. Without it, a network interruption during a large document download might require the user to restart the download; at the very least, it can significantly delay the first page view.

- It improves reliability where memory or storage is limited and it would be difficult to cache downloaded data locally (for example, when working in a browser, and especially in a mobile browser).

- It reduces network transfer costs. Some viewers such as our PDF Focus SDK can be configured to download only those pages viewed by the user. This is critical when serving very large documents (1 GB+) to mobile devices with limited or costly data plans, and beneficial even when serving smaller documents of 20 MB+.

Complete code

using System.IO;
using SautinSoft.Pdf;

class Program
{
	static void Main()
	{

		// This property is necessary only for the licensed version.
		//SautinSoft.Pdf.Serial = "XXXXXXXXXXX";

		// To convert a loaded PDF file to a linearized PDF file,
		// we just need to save the PdfDocument object with the linearization save option.
		using (PdfDocument doc = PdfDocument.Load("Regular PDF File.pdf"))
		{
			doc.Save("Linearized PDF File.pdf", SaveOptions.e_linearized);
		}
	}
}

            


