Overview PDF Focus .Net


Introduction

PDF Focus .Net is a cross-platform .NET library that lets your applications convert PDF documents to a wide range of formats: DOCX, RTF, HTML, XML, Excel, images and plain text, with many options for each. After adding a reference to "SautinSoft.PdfFocus.dll" and writing 3-4 lines of C#, you can use the API in your applications:

  • Convert PDF to DOCX, RTF:
    • Three modes of converting:
      • Flowing - all text is arranged in paragraphs, as if typed by a person
      • Exact - all text is arranged in small shape blocks positioned by (x,y) coordinates, as in the PDF structure
      • Continuous - all text is arranged in large shape blocks positioned by (x,y) coordinates
    • Recreates real tables with rows and cells from graphic lines
    • The produced DOCX document is fully compatible with the Office Open XML specification (ECMA-376)
    • The produced RTF document is fully compatible with the RTF 1.8 specification
    • Full text formatting with images, colors, backgrounds, shapes, tables, vector graphics, font styles, sizes
    • Keep the original character scaling and spacing, or set uniform values
    • Full Unicode support

  • Convert PDF to Text:
    • Produces text documents with full Unicode support
    • The layout of the text output stays similar to the original PDF

  • Convert PDF to Images:
    • TIFF, Multipage-TIFF, Multipage-TIFF-CCITT4
    • JPEG (JPG)
    • PNG
    • Bitmap
    • System.Drawing.Image
    • Ability to set DPI, color depth, image format
    • Ability to set a custom width and height in pixels, points or percent

  • Convert PDF to HTML:
    • HTML5 with CSS
    • Two modes of converting:
      • HTML-Fixed - all text is arranged in small shape blocks positioned by (x,y) coordinates
      • HTML-Flowing - all text is arranged in paragraphs, as if typed by a person
    • Specify the document Title
    • Store images inside the HTML document as binary data, or as separate PNG or JPG files
    • Set the quality for all images within the HTML document

  • Convert PDF to Excel:
    • Creates .xls workbooks
    • Put all pages of the PDF document into a single worksheet, or create a separate sheet for each PDF page
    • Two modes of converting:
      • Convert all textual data
      • Convert only tabular data

  • Convert PDF to XML:
    • Creates well-formed XML documents
    • Two modes of converting:
      • Convert all textual data
      • Convert only tabular data

  • Extract Images from PDF:
    • Extract all images and vector graphics
    • Extract images from specific pages
    • Extract only images with specific width or height
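
The "3-4 lines of C#" mentioned in the introduction might look like the minimal sketch below, which converts a PDF file to DOCX. It assumes a project that already references "SautinSoft.PdfFocus.dll"; the file names are placeholders, and the member names used here (OpenPdf, PageCount, WordOptions.Format, ToWord) and the meaning of the return code are assumptions that should be checked against the API reference for your version of the library.

```csharp
using System;

class Sample
{
    static void Main()
    {
        // Placeholder paths, for illustration only.
        string pdfFile = "input.pdf";
        string wordFile = "result.docx";

        // Assumed API: create the converter and load the PDF.
        SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();
        f.OpenPdf(pdfFile);

        // A page count of zero is taken to mean the PDF could not be loaded.
        if (f.PageCount > 0)
        {
            // Assumed option: choose DOCX (rather than RTF) as the output format.
            f.WordOptions.Format = SautinSoft.PdfFocus.CWordOptions.eWordDocument.Docx;

            // Assumed convention: ToWord returns 0 on success.
            int result = f.ToWord(wordFile);
            Console.WriteLine(result == 0 ? "Converted." : "Conversion failed.");
        }
        else
        {
            Console.WriteLine("The PDF could not be opened.");
        }
    }
}
```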

Input formats

PDF 1.0-1.7, PDF/A.


Output formats

DOCX, RTF, Text, HTML, JPEG, PNG, BMP, TIFF, Multipage-TIFF, GIF, Excel, XML.


Advanced Features

  • Convert a PDF document supplied as a file, URI, MemoryStream or byte array
  • Set custom pages or page ranges to convert, for example "1-3, 5, 8-13, 16"
  • Convert a password-protected PDF, provided you know the password
  • Get the number of pages in a PDF and their sizes
  • Detect tables in a PDF document
  • Rasterize vector graphics or skip them
  • Preserve images or skip them
  • Show or hide invisible text
  • Add copyright text to each document page
  • Interface for plugging in an OCR (optical character recognition) engine
  • Supports JBIG2 and JPEG2000 codecs
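
To make the page-range notation from the list above ("1-3, 5, 8-13, 16") concrete, the following sketch expands such a string into individual page numbers. PageRanges is a hypothetical helper written for illustration; it is not part of PDF Focus .Net and does not show how the library itself parses ranges.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of PDF Focus .Net.
public static class PageRanges
{
    // Expands a specification such as "1-3, 5, 8-13, 16" into
    // individual 1-based page numbers, in the order written.
    public static List<int> Parse(string spec)
    {
        var pages = new List<int>();
        foreach (string part in spec.Split(','))
        {
            string token = part.Trim();
            if (token.Length == 0)
            {
                continue; // tolerate empty entries such as a trailing comma
            }

            int dash = token.IndexOf('-');
            int first, last;
            if (dash < 0)
            {
                // A single page, e.g. "5".
                first = last = int.Parse(token);
            }
            else
            {
                // A range, e.g. "8-13".
                first = int.Parse(token.Substring(0, dash));
                last = int.Parse(token.Substring(dash + 1));
            }

            if (first < 1 || last < first)
            {
                throw new FormatException("Invalid page range: '" + token + "'");
            }

            for (int p = first; p <= last; p++)
            {
                pages.Add(p);
            }
        }
        return pages;
    }
}
```

Calling PageRanges.Parse("1-3, 5, 8-13, 16") yields pages 1, 2, 3, 5, 8 through 13, and 16. Rejecting ranges whose end precedes their start is a design choice of this sketch, not documented library behavior.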