Overview PDF Focus .Net

Introduction

PDF Focus .Net is a cross-platform .NET library that lets your applications convert PDF documents to DOCX, RTF, HTML, XML, Excel, image and text formats, with many options for each. After adding a reference to "SautinSoft.PdfFocus.dll" and writing 3-4 lines of C#, you can use the following features in your applications:

Convert PDF to DOCX, RTF:
- Three modes of converting:
  - Flowing - all text is arranged in paragraphs, so the result reads as if it had been typed by a person
  - Exact - all text is placed in small shape blocks at (x,y) coordinates, mirroring the PDF structure
  - Continuous - all text is placed in large shape blocks at (x,y) coordinates
- Recreates real tables with rows and cells from graphic lines
- The produced DOCX document conforms to the Office Open XML specification (ECMA-376)
- The produced RTF document conforms to the RTF 1.8 specification
- Full text formatting with images, colors, backgrounds, shapes, tables, vector graphics, font styles, sizes
- Keep the original character scaling and spacing, or set uniform values
- Full Unicode support

Convert PDF to Text:
- Produces text documents with full Unicode support
- The layout of the text output is similar to that of the original PDF

Convert PDF to Images:
- TIFF, Multipage-TIFF, Multipage-TIFF-CCITT4
- JPEG (JPG)
- PNG
- Bitmap
- System.Drawing.Image
- Ability to set DPI, color depth, image format
- Ability to set a custom width and height in pixels, points or percent
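
One way to relate the pixel, point and percent units listed above is the usual convention of 72 points per inch. The sketch below is a hypothetical helper: `ImageSize`, `ToPixels` and `SizeUnit` are illustrative names and not part of the PDF Focus .Net API, and how the library itself resolves these units is an assumption here.

```csharp
using System;

// Hypothetical helper for illustration only; not part of PDF Focus .Net.
public enum SizeUnit { Pixel, Point, Percent }

public static class ImageSize
{
    // Converts a requested dimension to whole pixels.
    //   value          - the size expressed in 'unit'
    //   dpi            - target resolution, used for points (1 point = 1/72 inch)
    //   originalPixels - size of the page at that DPI, used as the base for percent
    public static int ToPixels(double value, SizeUnit unit, double dpi, int originalPixels)
    {
        if (value <= 0) throw new ArgumentOutOfRangeException(nameof(value));
        if (dpi <= 0) throw new ArgumentOutOfRangeException(nameof(dpi));

        double pixels;
        switch (unit)
        {
            case SizeUnit.Pixel:
                pixels = value;
                break;
            case SizeUnit.Point:
                pixels = value * dpi / 72.0;
                break;
            case SizeUnit.Percent:
                pixels = originalPixels * value / 100.0;
                break;
            default:
                throw new ArgumentOutOfRangeException(nameof(unit));
        }
        return (int)Math.Round(pixels, MidpointRounding.AwayFromZero);
    }
}
```

For example, a US Letter page is 612 points wide, so `ImageSize.ToPixels(612, SizeUnit.Point, 300, 0)` gives 2550 pixels at 300 DPI, and 50 percent of that width is 1275 pixels.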

Convert PDF to HTML:
- HTML5 with CSS
- Two modes of converting:
  - HTML-Fixed - all text is arranged using small shape (x,y) blocks
  - HTML-Flowing - all text is arranged in paragraphs, so the result reads as if it had been typed by a person
- Specify the document Title
- Store images inside the HTML document as binary data or as separate PNG or JPG files
- Set the quality for all images within the HTML document

Convert PDF to Excel:
- Creates .xls workbooks
- Places all pages of a PDF document into a single worksheet or creates a separate sheet for each PDF page
- Two modes of converting:
  - Convert all textual data
  - Convert only tabular data

Convert PDF to XML:
- Creates well-formed XML documents
- Two modes of converting:
  - Convert all textual data
  - Convert only tabular data

Extract Images from PDF:
- Extract all images and vector graphics
- Extract images from specific pages
- Extract only images with specific width or height

Input formats

PDF 1.0-1.7, PDF/A.

Output formats

DOCX, RTF, Text, HTML, JPEG, PNG, BMP, TIFF, Multipage-TIFF, GIF, Excel, XML.

Advanced Features

Convert a PDF document supplied as a file, URI, MemoryStream or array of bytes
Set custom pages or page ranges to convert, for example "1-3, 5, 8-13, 16"
Convert a password-protected PDF when you know the password
Get the number of pages in a PDF and their sizes
Detect tables in a PDF document
Rasterize vector graphics or skip them
Preserve images or skip them
Show or hide invisible text
Add copyright text to each document page
Interface for specifying an OCR (optical character recognition) engine
Supports JBIG2 and JPEG2000 codecs
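
To illustrate the page-range notation mentioned above ("1-3, 5, 8-13, 16"), here is a minimal sketch of how such a string might be expanded into individual page numbers. `PageRangeParser` is a hypothetical helper written for this overview; it is not part of the PDF Focus .Net API, and the library's own parsing rules (for example, how it treats malformed or reversed ranges) are assumptions here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of PDF Focus .Net.
public static class PageRangeParser
{
    // Expands a specification such as "1-3, 5, 8-13, 16" into a sorted
    // list of distinct 1-based page numbers.
    public static List<int> Parse(string spec)
    {
        if (spec == null) throw new ArgumentNullException(nameof(spec));

        var pages = new SortedSet<int>();
        foreach (string part in spec.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string token = part.Trim();
            if (token.Length == 0) continue;

            int dash = token.IndexOf('-');
            int first, last;
            if (dash < 0)
            {
                // Single page, e.g. "5".
                first = last = ParsePage(token);
            }
            else
            {
                // Inclusive range, e.g. "8-13".
                first = ParsePage(token.Substring(0, dash));
                last = ParsePage(token.Substring(dash + 1));
                if (first > last)
                    throw new FormatException($"Range '{token}' runs backwards.");
            }

            for (int page = first; page <= last; page++)
                pages.Add(page);
        }
        return new List<int>(pages);
    }

    // Parses one page number and rejects anything that is not a positive integer.
    private static int ParsePage(string text)
    {
        if (!int.TryParse(text.Trim(), out int page) || page < 1)
            throw new FormatException($"'{text}' is not a valid page number.");
        return page;
    }
}
```

Calling `PageRangeParser.Parse("1-3, 5, 8-13, 16")` returns eleven page numbers: 1, 2, 3, 5, 8 through 13, and 16. Using a sorted set means overlapping ranges such as "1-3, 2-4" collapse into a single ordered sequence.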