Introduction
PDF Focus .Net is a cross-platform .NET library that lets your applications convert PDF documents to DOCX, RTF, HTML, XML, Excel, images, and plain text, with many options for controlling the output. After adding a reference to "SautinSoft.PdfFocus.dll" and writing 3-4 lines of C#, you can use the API in your applications:
- Convert PDF to DOCX, RTF:
  - Three conversion modes:
    - Flowing - all text is arranged in paragraphs, as if typed by a person
    - Exact - all text is arranged in small shape blocks with (x,y) coordinates, as in the PDF structure
    - Continuous - all text is arranged in large shape blocks with (x,y) coordinates
  - Recreates real tables with rows and cells from graphic lines
  - The produced DOCX document complies with the Office Open XML specification (ECMA-376)
  - The produced RTF document complies with the RTF 1.8 specification
  - Full text formatting with images, colors, backgrounds, shapes, tables, vector graphics, font styles, sizes
  - Keep the original character scaling and spacing, or set unified values
  - Full Unicode support
- Convert PDF to Text:
  - Produces text documents with full Unicode support
  - The document layout in text mode is similar to the original PDF
- Convert PDF to Images:
  - TIFF, Multipage-TIFF, Multipage-TIFF-CCITT4
  - JPEG (JPG)
  - PNG
  - Bitmap
  - System.Drawing.Image
  - Ability to set DPI, color depth, image format
  - Ability to set a custom width and height in pixels, points, or percent
- Convert PDF to HTML:
  - HTML5 with CSS
  - Two conversion modes:
    - HTML-Fixed - all text is arranged in small shape blocks with (x,y) coordinates
    - HTML-Flowing - all text is arranged in paragraphs, as if typed by a person
  - Specify the document title
  - Store images inside the HTML document as binary data or as separate PNG or JPG files
  - Set the quality for all images within the HTML document
- Convert PDF to Excel:
  - Creates .xls workbooks
  - Puts all pages of the PDF document into a single worksheet or creates a separate sheet for each PDF page
  - Two conversion modes:
    - Convert all textual data
    - Convert only tabular data
- Convert PDF to XML:
  - Creates well-formed XML documents
  - Two conversion modes:
    - Convert all textual data
    - Convert only tabular data
- Extract Images from PDF:
  - Extract all images and vector graphics
  - Extract images from specific pages
  - Extract only images with a specific width or height
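The image options above combine a DPI setting with sizes given in pixels, points, or percent. The sketch below shows how those three units relate, using the fact that a PDF point is 1/72 inch. `SizeUnit` and `ImageSize.ToPixels` are hypothetical names for illustration only and are not part of the PDF Focus .Net API. Treating a percent value as relative to the page size rendered at the chosen DPI is also an assumption.

```csharp
using System;

// Hypothetical helper, NOT part of PDF Focus .Net: illustrates how a size
// given in pixels, points or percent maps to pixels at a given DPI.
public enum SizeUnit { Pixel, Point, Percent }

public static class ImageSize
{
    // value        - requested size in the given unit
    // sourcePoints - original page dimension in PDF points (1 point = 1/72 inch)
    // dpi          - rendering resolution in dots per inch
    public static int ToPixels(double value, SizeUnit unit, double sourcePoints, int dpi)
    {
        double pixels;
        switch (unit)
        {
            case SizeUnit.Pixel:
                pixels = value;                                   // already in pixels
                break;
            case SizeUnit.Point:
                pixels = value * dpi / 72.0;                      // points -> inches -> pixels
                break;
            case SizeUnit.Percent:
                // Assumption: percent is relative to the page rendered at this DPI.
                pixels = sourcePoints * dpi / 72.0 * value / 100.0;
                break;
            default:
                throw new ArgumentOutOfRangeException(nameof(unit));
        }
        return (int)Math.Round(pixels);
    }
}
```

For example, a US Letter page is 612 points wide, so `ImageSize.ToPixels(100, SizeUnit.Percent, 612, 300)` gives 2550 pixels, and `ImageSize.ToPixels(50, SizeUnit.Percent, 612, 300)` gives 1275.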
Input formats
PDF 1.0-1.7, PDF/A.
Output formats
DOCX, RTF, Text, HTML, JPEG, PNG, BMP, TIFF, MultipageTiff, GIF, Excel, XML.
Advanced Features
- Load a PDF document from a file, URI, MemoryStream, or byte array
- Set custom pages or page ranges to convert, for example "1-3, 5, 8-13, 16"
- Convert a password-protected PDF, if you know the password
- Get the number of pages in a PDF and their sizes
- Detect tables in a PDF document
- Rasterize vector graphics or skip them
- Preserve images or skip them
- Show or hide invisible text
- Add copyright text to each document page
- Interface for specifying an OCR (optical character recognition) engine
- Supports JBIG2 and JPEG2000 codecs
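To make the page-range notation from the list above concrete, here is a minimal sketch that expands a string such as "1-3, 5, 8-13, 16" into individual page numbers. `PageRangeParser` is a hypothetical helper written for illustration. It is not part of PDF Focus .Net and does not show how the library parses ranges internally. Rejecting pages beyond the document's page count is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, NOT part of PDF Focus .Net: expands a page-range
// string such as "1-3, 5, 8-13, 16" into sorted, distinct 1-based page numbers.
public static class PageRangeParser
{
    public static List<int> Parse(string spec, int pageCount)
    {
        var pages = new SortedSet<int>();

        foreach (string part in spec.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string token = part.Trim();
            if (token.Length == 0) continue;          // skip whitespace-only entries

            string[] bounds = token.Split('-');
            if (bounds.Length > 2)
                throw new FormatException("Invalid range: '" + token + "'");

            // A single number "5" is treated as the range 5-5.
            int first = int.Parse(bounds[0].Trim());
            int last = bounds.Length == 2 ? int.Parse(bounds[1].Trim()) : first;

            // Assumption: pages outside 1..pageCount are an error, not silently dropped.
            if (first < 1 || last < first || last > pageCount)
                throw new ArgumentOutOfRangeException(nameof(spec),
                    "Range '" + token + "' is outside 1.." + pageCount);

            for (int p = first; p <= last; p++)
                pages.Add(p);
        }

        return pages.ToList();
    }
}
```

Calling `PageRangeParser.Parse("1-3, 5, 8-13, 16", 20)` returns the 11 pages 1, 2, 3, 5, 8, 9, 10, 11, 12, 13, 16. Validating the range up front, before starting a conversion, lets an application report a bad page specification without opening the output file.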